Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!selfless.tophat.at!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Date: Fri, 13 May 2011 18:34:55 +0100
From: MRAB <python@mrabarnett.plus.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: Regular Expression for words (with umlauts, without numbers)
References: <878vua4mjp.fsf@pcwi7557.uni-muenster.de> <BANLkTinVM0v-Ujku5ZgXYXM0oP6VATEHQg@mail.gmail.com>
In-Reply-To: <BANLkTinVM0v-Ujku5ZgXYXM0oP6VATEHQg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: list
Reply-To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.1520.1305308103.9059.python-list@python.org>
Lines: 22
NNTP-Posting-Host: 82.94.164.166
Xref: x330-a1.tempe.blueboxinc.net comp.lang.python:5314

On 13/05/2011 17:14, Tim Chon wrote:
> Hallo Jens,
>
> In the current Python re module, you have to do something like:
>
> ((?!\d|_)\w)+ which uses a negative lookahead to match all word
> characters except digits and the underscore. Of course, if you turn on
> the Unicode flag re.U, or use it inline as (?u), this will also match
> your desired umlauts.
>
> I'd actually recommend, however, that if you have an extra 20 minutes,
> you use Regexp 2.7:
> http://bugs.python.org/issue2636
>
> It's a much-needed improvement over F. Lundh's re implementation (from
> 1999!) and it's 40% faster. Moreover, you can do exactly what you are
> requesting like so:
>
> (?u)[[:alpha:]]+
>
The latest release is here:

     http://pypi.python.org/pypi/regex
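
For anyone who wants to stay with the standard re module, here is a minimal sketch of the negative-lookahead approach quoted above. The names WORD, find_words and sample are only illustrative, and the code assumes Python 3, where str patterns are already Unicode-aware (re.UNICODE is passed only to make the intent explicit; under Python 2 it, or an inline (?u), is required).

```python
import re

# A word character (\w) that is neither a digit nor an underscore,
# repeated one or more times. The lookahead rejects \d and "_" before
# \w is allowed to consume the character.
WORD = re.compile(r"(?:(?!\d|_)\w)+", re.UNICODE)


def find_words(text):
    """Return the runs of letters in text, skipping digits and underscores."""
    return WORD.findall(text)


sample = "Die Größe von Tür_3 beträgt 42 cm"
words = find_words(sample)
print(words)  # ['Die', 'Größe', 'von', 'Tür', 'beträgt', 'cm']
```

With the regex module from the link above installed, the same text should be matched by the shorter pattern [[:alpha:]]+ passed to regex.findall, though I have not included that here since it depends on a third-party install.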