Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #31519

Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

Newsgroups comp.lang.python
Date 2012-10-17 08:32 -0700
References <a7454cb7-e6dc-4167-b72a-56a67a5873a7@googlegroups.com> <mailman.2350.1350486045.27098.python-list@python.org>
Message-ID <748e561a-7e75-4b13-be6b-91831d3b59c4@googlegroups.com> (permalink)
Subject Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?
From wxjmfauth@gmail.com

Show all headers | View raw


Le mercredi 17 octobre 2012 17:00:46 UTC+2, Dave Angel a écrit :
> On 10/17/2012 10:31 AM, nwaits wrote:
> 
> > I'm very impressed with python's wordlist script for plain text.  Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?  
> 
> > Thank you.
> 
> 
> 
> if you can construct a list of "illegal" characters, then you can simply
> 
> check each character of the word against the list, and if it succeeds
> 
> for all of the characters, it's a winner.
> 
> 
> 
> If that's not fast enough, you can build a translation table from the
> 
> list of illegal characters, and use translate on each word.  Then it
> 
> becomes a question of checking if the translated word is all zeroes.  
> 
> More setup time, but much faster looping for each word.
> 
> 
> 
> -- 
> 
> 
> 
> DaveA

Lazy way.
Py3.2

>>> import unicodedata
>>> def HasDiacritics(w):
...     w_decomposed = unicodedata.normalize('NFKD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...     
>>> HasDiacritics('éléphant')
'yes'
>>> HasDiacritics('elephant')
'no'
>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
'yes'
>>> HasDiacritics('U')
'no'
>>>

Should be ok for the CombiningDiacriticalMarks unicode range
(common diacritics)

jmf

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Script for finding words of any size that do NOT contain vowels with acute diacritic marks? nwaits <nowaits@gmail.com> - 2012-10-17 07:31 -0700
  Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? Dave Angel <d@davea.name> - 2012-10-17 11:00 -0400
    Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? wxjmfauth@gmail.com - 2012-10-17 08:32 -0700
      Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? Ian Kelly <ian.g.kelly@gmail.com> - 2012-10-17 11:07 -0600
        Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? wxjmfauth@gmail.com - 2012-10-17 11:17 -0700
          Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? Chris Angelico <rosuav@gmail.com> - 2012-10-18 05:22 +1100
          Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? Ian Kelly <ian.g.kelly@gmail.com> - 2012-10-17 12:27 -0600
            Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? wxjmfauth@gmail.com - 2012-10-17 11:33 -0700
            Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? wxjmfauth@gmail.com - 2012-10-17 11:33 -0700
        Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? wxjmfauth@gmail.com - 2012-10-17 11:17 -0700
      Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? David Robinow <drobinow@gmail.com> - 2012-10-17 13:16 -0400
        Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-17 23:18 +0000
    Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? wxjmfauth@gmail.com - 2012-10-17 08:32 -0700

csiph-web