From: Skip Montanaro
To: Ethan Furman
Cc: Python
Date: Fri, 29 Aug 2014 21:14:05 -0500
Subject: Re: Keeping python code and database in sync
Newsgroups: comp.lang.python

On Fri, Aug 29, 2014 at 6:54 PM, Ethan Furman wrote:
> Thus endeth my attempts to train Skip's Polly.  But I am curious -- if
> 'suckitude' is in immediate contact with punctuation such as just now,
> or at the end of a sentence, does it not count?  That would be
> suckitude indeed!  ;)

Thank you all, "suckitude" made it into the corpus, or cruft, or
dustbin, whatever. :-)

? dict /usr/share/dict/words
accessing accounts adapted adding addressed adds adjusted adjusting
advantages advertised aired akumbo algorithms alister allen allowed
...
subscribed suckitude suffered suggested suggestions suggests suited
...

/usr/share/dict/words really isn't a very good dictionary. Note all
the words that are valid but which get flagged, mostly because they
have common suffixes applied to words. I'll fix that shortly.

Yes, "words" are skipped if they contain anything other than lower
case alphabetic characters. Really simple: words = text.split(), then
discard words not meeting the criteria.

Skip
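
For readers following along, the filtering rule described above can be sketched roughly as follows. This is not Polly's actual code: the function name candidate_words is made up for illustration, and reading "lower case alphabetic" as ASCII a-z only is an assumption.

```python
import re

# Assumption: "lower case alphabetic" means ASCII a-z only.
LOWER_ALPHA = re.compile(r"[a-z]+")


def candidate_words(text):
    """Split on whitespace, then keep only all-lowercase alphabetic tokens.

    Hypothetical helper; the name and the exact criteria are illustrative.
    """
    words = text.split()
    # Discard anything with capitals, digits, or attached punctuation.
    return [w for w in words if LOWER_ALPHA.fullmatch(w)]


sample = "That would be suckitude indeed! 'suckitude' made it, suckitude."
print(candidate_words(sample))
```

Under this reading, a token such as 'suckitude' in quotes, or suckitude followed by a period, is dropped because the punctuation stays attached after text.split(). Only the bare occurrence counts.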