
Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

Newsgroups comp.lang.python
Date Wed, 17 Oct 2012 08:32:52 -0700 (PDT)
From wxjmfauth@gmail.com
Message-ID <mailman.2352.1350487976.27098.python-list@python.org> (permalink)


On Wednesday, 17 October 2012 17:00:46 UTC+2, Dave Angel wrote:
> On 10/17/2012 10:31 AM, nwaits wrote:
> > I'm very impressed with python's wordlist script for plain text.  Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
> > Thank you.
>
> if you can construct a list of "illegal" characters, then you can simply
> check each character of the word against the list, and if it succeeds
> for all of the characters, it's a winner.
>
> If that's not fast enough, you can build a translation table from the
> list of illegal characters, and use translate on each word.  Then it
> becomes a question of checking if the translated word is all zeroes.
> More setup time, but much faster looping for each word.
>
> --
> DaveA
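
For reference, the two approaches described in the quote above could look roughly like this. It is only a sketch: the set of "illegal" characters, the names `ACUTE_VOWELS`, `is_clean` and `is_clean_translate`, and the sample words are all made up for illustration. The translate variant deletes the illegal characters and compares the result with the original word, which is a slightly different check from the "all zeroes" one mentioned in the quote.

```python
import unicodedata

# Hypothetical list of "illegal" characters: vowels with an acute accent.
ACUTE_VOWELS = set('áéíóúýÁÉÍÓÚÝ')

def is_clean(word, illegal=ACUTE_VOWELS):
    """True if no character of the word is in the illegal set."""
    # NFC makes sure accented letters are single precomposed characters.
    word = unicodedata.normalize('NFC', word)
    return not any(ch in illegal for ch in word)

# Translation table that deletes every illegal character (built once).
DELETE_ILLEGAL = str.maketrans('', '', ''.join(ACUTE_VOWELS))

def is_clean_translate(word, table=DELETE_ILLEGAL):
    """True if deleting the illegal characters leaves the word unchanged."""
    word = unicodedata.normalize('NFC', word)
    return word.translate(table) == word

words = ['éléphant', 'elephant', 'café', 'cafe', 'très']
clean = [w for w in words if is_clean(w)]
print(clean)
```

Here 'très' is kept because the set only lists acute accents; add the grave vowels to the set if those should be rejected too.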

Lazy way.
Py3.2

>>> import unicodedata
>>> def HasDiacritics(w):
...     w_decomposed = unicodedata.normalize('NFKD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...     
>>> HasDiacritics('éléphant')
'yes'
>>> HasDiacritics('elephant')
'no'
>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
'yes'
>>> HasDiacritics('U')
'no'
>>>

This should be OK for the Combining Diacritical Marks Unicode block
(U+0300 to U+036F), which covers the common diacritics.
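
A slightly less lazy sketch, in case the length comparison is too coarse. NFKD also expands compatibility characters such as the 'ﬁ' ligature (U+FB01), so a word can change length without carrying any diacritic. The variant below uses NFD and looks at the combining characters themselves, which also allows testing for acute and grave only, as in the original question. The function names and the word list are hypothetical, and the check looks at the whole word, not only at the vowels.

```python
import unicodedata

ACUTE = '\N{COMBINING ACUTE ACCENT}'  # U+0301
GRAVE = '\N{COMBINING GRAVE ACCENT}'  # U+0300

def has_combining_mark(word):
    """True if the NFD form of the word contains any combining character."""
    decomposed = unicodedata.normalize('NFD', word)
    return any(unicodedata.combining(ch) for ch in decomposed)

def has_accent(word, marks=(ACUTE, GRAVE)):
    """True if the NFD form of the word contains one of the given marks."""
    decomposed = unicodedata.normalize('NFD', word)
    return any(ch in marks for ch in decomposed)

# '\ufb01n' starts with the 'fi' ligature, which NFD leaves untouched.
words = ['éléphant', 'elephant', 'très', 'naïve', '\ufb01n']
no_acute_or_grave = [w for w in words if not has_accent(w)]
print(no_acute_or_grave)
```

'naïve' passes the acute/grave filter because the diaeresis is a different combining mark; `has_combining_mark` would still flag it.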

jmf
