From: Dave Angel
Date: Wed, 17 Oct 2012 11:00:11 -0400
Subject: Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?
Newsgroups: comp.lang.python

On 10/17/2012 10:31 AM, nwaits wrote:
> I'm very impressed with Python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
> Thank you.

If you can construct a list of "illegal" characters, then you can simply check each character of the word against the list. If none of the word's characters is in the list, the word is a winner.

If that's not fast enough, you can build a translation table from the list of illegal characters and use translate on each word. Then it becomes a question of checking whether the translated word is all zeroes. That takes more setup time, but the loop over each word is much faster.

--
DaveA
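The first approach in the reply (check every character of a word against a list of illegal characters) might look like this in Python 3. This is only a sketch: the exact set of illegal characters (lower- and upper-case vowels with acute or grave accents) and the function name `has_no_accented_vowel` are assumptions for illustration. It also normalizes each word to NFC, on the assumption that some input text may store an accent as a separate combining mark (for example "e" followed by U+0301) rather than as a single precomposed character.

```python
import unicodedata

# Hypothetical set of "illegal" characters: vowels with an acute or grave
# accent, lower and upper case. Adjust to whatever marks you want to reject.
ILLEGAL = set("áéíóúàèìòùÁÉÍÓÚÀÈÌÒÙ")


def has_no_accented_vowel(word):
    """Return True if no character of word is in ILLEGAL."""
    # NFC turns a decomposed "e" + U+0301 into the single character "é",
    # so it can be found in ILLEGAL.
    word = unicodedata.normalize("NFC", word)
    # Check each character against the set; all() stops at the first hit.
    return all(ch not in ILLEGAL for ch in word)


words = ["cafe", "café", "naïve", "où", "plain"]
print([w for w in words if has_no_accented_vowel(w)])
# -> ['cafe', 'naïve', 'plain']
```

Note that "naïve" passes because a diaeresis is not in the assumed set; only acute and grave accents are rejected.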
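The translation-table idea in the reply can be sketched in Python 3 with `str.maketrans` and `str.translate`. Instead of testing for "all zeroes", this sketch uses a simpler variant of the final check: the table maps every illegal character to None, which tells `str.translate` to delete it, and a word passes if translating it changes nothing. The set of illegal characters and the name `is_winner` are assumptions for illustration. The table is built once, and the per-character work then happens inside `str.translate` rather than in a Python-level loop.

```python
import unicodedata

# Hypothetical list of illegal characters: acute/grave vowels, both cases.
ILLEGAL = "áéíóúàèìòùÁÉÍÓÚÀÈÌÒÙ"

# One-time setup: the third argument of str.maketrans lists characters
# that will be mapped to None, i.e. deleted by str.translate.
DELETE_ILLEGAL = str.maketrans("", "", ILLEGAL)


def is_winner(word):
    """Return True if word contains none of the illegal characters."""
    # Compose decomposed accents (e.g. "e" + U+0301) into single characters.
    word = unicodedata.normalize("NFC", word)
    # If nothing was deleted, the word had no illegal characters.
    return word.translate(DELETE_ILLEGAL) == word


print([w for w in ["résumé", "resume", "à"] if is_winner(w)])
# -> ['resume']
```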