From: Chris Angelico
Date: Sat, 30 Aug 2014 12:28:54 +1000
Subject: Re: Keeping python code and database in sync
References: <540112C8.9090100@stoneleaf.us>
Newsgroups: comp.lang.python

On Sat, Aug 30, 2014 at 12:14 PM, Skip Montanaro wrote:
> Yes, "words" are skipped if they contain anything other than lower
> case alphabetic characters. Really simple words = text.split(), then
> discard words not meeting the criteria.

Easy way to catch a few more: Just .strip() off a few common items of
punctuation (quotes (all types), full stop, comma, brackets (all
types), etc). If there are any inside the word, discard the word, but
those at one end or other aren't a problem.

ChrisA
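
A rough sketch of the split-then-strip idea, in case it helps. The
function name extract_words and the exact set of characters in
STRIP_CHARS are my own assumptions, not anything from Skip's code; adjust
the set to whatever counts as "common punctuation" in your text.

```python
import string

# Assumption: ASCII punctuation plus curly single/double quotes is a
# reasonable stand-in for "a few common items of punctuation".
STRIP_CHARS = string.punctuation + "\u2018\u2019\u201c\u201d"
LOWERCASE = set(string.ascii_lowercase)


def extract_words(text):
    """Return tokens that are all lowercase ASCII letters once
    punctuation has been trimmed from both ends."""
    words = []
    for token in text.split():
        # strip() only removes characters from the two ends of the token.
        core = token.strip(STRIP_CHARS)
        # Punctuation or capitals left inside the word fail this check,
        # so the word is discarded; an empty result is skipped as well.
        if core and set(core) <= LOWERCASE:
            words.append(core)
    return words
```

For example, extract_words('he said "hello," (quietly). don\'t Stop now')
gives ['he', 'said', 'hello', 'quietly', 'now']: the quotes, comma, and
brackets at the ends are trimmed, while "don't" (apostrophe inside) and
"Stop" (capital letter) are still dropped.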