
Re: Letter class in re

Started by	Albert-Jan Roskam <fomcl@yahoo.com>
First post	2015-03-09 06:33 -0700
Last post	2015-03-09 06:33 -0700
Articles	1 — 1 participant


#87197 — Re: Letter class in re

From	Albert-Jan Roskam <fomcl@yahoo.com>
Date	2015-03-09 06:33 -0700
Subject	Re: Letter class in re
Message-ID	<mailman.207.1425908185.21433.python-list@python.org>

--------------------------------------------
On Mon, 3/9/15, Tim Chase <python.list@tim.thechases.com> wrote:

 Subject: Re: Letter class in re
 To: python-list@python.org
 Date: Monday, March 9, 2015, 12:17 PM
 
 On 2015-03-09 11:37, Wolfgang Maier wrote:
 > On 03/09/2015 11:23 AM, Antoon Pardon wrote:
 >> Does anyone know what regular expression to use for a sequence of
 >> letters? There is a class for alphanumerics but I can't find one
 >> for just letters, which I find odd.
 >
 > how about [a-zA-Z] ?

 That breaks if you have Unicode letters.  While ugly, since "\w" is
 composed of "letters, numbers, and underscores", you can assert that
 the "\w" you find is not a number or underscore by using

   (?:(?!_|\d)\w)
 

I was going to make the same remark, but with a slightly different solution: 
In [1]: repr(re.search("[a-zA-Z]", "é"))
Out[1]: 'None'
 
In [2]: repr(re.search(u"[^\d\W_]+", u"é", re.I | re.U))
Out[2]: '<_sre.SRE_Match object at 0x027CDB10>'

"[^\d\W_]+" means something like "one or more (+) of 'not (a digit, a non-word character, or an underscore)'", i.e. a run of one or more letters.
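
For anyone reading this on Python 3, here is a rough sketch that puts the two patterns from this thread side by side. It assumes Python 3, where str patterns are Unicode-aware by default, so the re.U flag is dropped. The names LETTERS_NEG_CLASS and LETTERS_LOOKAHEAD and the sample string are made up for illustration, and Tim's lookahead group is wrapped in "+" here so that it matches runs of letters rather than a single character.

```python
import re

# Assumption: Python 3, where str patterns match Unicode by default (no re.U needed).

# Negated class: anything that is not a digit, not a non-word character, not "_".
LETTERS_NEG_CLASS = re.compile(r"[^\d\W_]+")

# Lookahead variant: a word character that is neither "_" nor a digit, repeated.
LETTERS_LOOKAHEAD = re.compile(r"(?:(?!_|\d)\w)+")

# Hypothetical sample text mixing accented letters, digits and an underscore.
sample = "été_2015 naïve x9"

print(LETTERS_NEG_CLASS.findall(sample))   # ['été', 'naïve', 'x']
print(LETTERS_LOOKAHEAD.findall(sample))   # ['été', 'naïve', 'x']

# The ASCII-only class splits words at every accented letter.
print(re.findall(r"[a-zA-Z]+", sample))    # ['t', 'na', 've', 'x']
```

Both Unicode-aware patterns give the same result on this sample; which one reads better is a matter of taste.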
