| Newsgroups | comp.lang.python |
|---|---|
| Date | 2012-10-17 08:32 -0700 |
| References | <a7454cb7-e6dc-4167-b72a-56a67a5873a7@googlegroups.com> <mailman.2350.1350486045.27098.python-list@python.org> |
| Message-ID | <748e561a-7e75-4b13-be6b-91831d3b59c4@googlegroups.com> |
| Subject | Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks? |
| From | wxjmfauth@gmail.com |
On Wednesday, 17 October 2012 17:00:46 UTC+2, Dave Angel wrote:
> On 10/17/2012 10:31 AM, nwaits wrote:
> > I'm very impressed with python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
> > Thank you.
>
> if you can construct a list of "illegal" characters, then you can simply
> check each character of the word against the list, and if it succeeds
> for all of the characters, it's a winner.
>
> If that's not fast enough, you can build a translation table from the
> list of illegal characters, and use translate on each word. Then it
> becomes a question of checking if the translated word is all zeroes.
> More setup time, but much faster looping for each word.
>
> --
> DaveA
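
For reference, the "list of illegal characters" idea quoted above might look roughly like the sketch below. The character set, the function names and the sample sentence are my own assumptions, not anything from Dave's post, and the translate variant deletes the illegal characters and checks whether the word changed, rather than testing for "all zeroes". Input is normalized to NFC first so that a decomposed "e" plus combining acute is also caught.

```python
import unicodedata

# Assumed set of "illegal" characters: precomposed vowels with an acute accent.
ACUTE_VOWELS = set("áéíóúýÁÉÍÓÚÝ")

def has_no_acute(word):
    """True if no character of the NFC-normalized word is in ACUTE_VOWELS."""
    return ACUTE_VOWELS.isdisjoint(unicodedata.normalize("NFC", word))

# Translation table that deletes every illegal character.
DELETE_ACUTE = str.maketrans("", "", "".join(ACUTE_VOWELS))

def has_no_acute_translate(word):
    """Same test via str.translate: the word is clean if nothing was deleted."""
    word = unicodedata.normalize("NFC", word)
    return word.translate(DELETE_ACUTE) == word

words = "el café está cerrado".split()
clean = [w for w in words if has_no_acute(w)]
print(clean)  # ['el', 'cerrado']
```

The table is built once, so the per-word cost is a single translate call, which is the trade-off (more setup, faster loop) described in the quoted text.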
The lazy way, with Python 3.2:
>>> import unicodedata
>>> def HasDiacritics(w):
...     w_decomposed = unicodedata.normalize('NFKD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...
>>> HasDiacritics('éléphant')
'yes'
>>> HasDiacritics('elephant')
'no'
>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
'yes'
>>> HasDiacritics('U')
'no'
>>>
This should be fine for characters that decompose into marks from the
Combining Diacritical Marks Unicode range (U+0300 to U+036F), i.e. the
common diacritics.
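
One caveat with the length comparison: NFKD also expands compatibility characters, so a ligature such as 'ﬁ' (U+FB01) changes the length without carrying any diacritic. A slightly less lazy sketch, assuming canonical decomposition (NFD) is good enough for the word list, looks at the combining characters directly and can single out the acute accent the original question asked about. The names combining_marks and has_acute are only illustrative.

```python
import unicodedata

ACUTE = "\N{COMBINING ACUTE ACCENT}"  # U+0301

def combining_marks(word):
    """Combining characters present after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", word)
    # unicodedata.combining() returns 0 for non-combining characters.
    return [c for c in decomposed if unicodedata.combining(c)]

def has_acute(word):
    """True if any character of the word carries an acute accent."""
    return ACUTE in unicodedata.normalize("NFD", word)

words = ["éléphant", "elephant", "où", "ﬁn"]
no_acute = [w for w in words if not has_acute(w)]
print(no_acute)  # ['elephant', 'où', 'ﬁn']
```

Here 'où' passes because its mark is a grave accent, and 'ﬁn' passes because NFD leaves the ligature alone. Letters whose stroke is not a separate mark in Unicode (for example 'ø') are not detected by either approach.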
jmf