Groups > comp.lang.python > #2402 > unrolled thread

Alphabetics respect to a given locale

Started by	candide <candide@free.invalid>
First post	2011-04-01 22:55 +0200
Last post	2011-04-02 15:18 +0200
Articles	3 — 2 participants


  Alphabetics respect to a given locale candide <candide@free.invalid> - 2011-04-01 22:55 +0200
    Re: Alphabetics respect to a given locale Emile van Sebille <emile@fenx.com> - 2011-04-01 16:16 -0700
    Re: Alphabetics respect to a given locale candide <candide@free.invalid> - 2011-04-02 15:18 +0200

#2402 — Alphabetics respect to a given locale

From	candide <candide@free.invalid>
Date	2011-04-01 22:55 +0200
Subject	Alphabetics respect to a given locale
Message-ID	<4d963c2b$0$1584$426a34cc@news.free.fr>

How can I retrieve the list of all characters defined as alphabetic for the
current locale?


#2416

From	Emile van Sebille <emile@fenx.com>
Date	2011-04-01 16:16 -0700
Message-ID	<mailman.111.1301699907.2990.python-list@python.org>
In reply to	#2402

On 4/1/2011 1:55 PM candide said...
> How can I retrieve the list of all characters defined as alphabetic for the
> current locale?

I think this is supposed to work, but for whatever reason it does not
work for me when I test after changing my locale (though I think that's
a CentOS thing):

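import locale
import string

# Adopt the default locale from the environment (Python 2 code).
locale.setlocale(locale.LC_ALL, '')
# Python 2 documents string.lowercase as locale-dependent.
print string.lowercase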

I don't see where else this might live in Python.

However, you can test if something is alpha:

 >>> val = u'caf' u'\xE9'
 >>> val.isalpha()
True
 >>>

... and check its Unicode general category:

 >>> import unicodedata
 >>> unicodedata.category(u'a')
'Ll' # Letter, lowercase
 >>> unicodedata.category(u'A')
'Lu' # Letter, uppercase
 >>> unicodedata.category(u'1')
'Nd' # Number, decimal digit
 >>> unicodedata.category(u'\x01')
'Cc' # Other, control
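
For readers on Python 3, the two checks above can be combined into one small helper. This is only a sketch: the name is_letter is made up for illustration, and it treats any general category beginning with "L" as a letter, which is also how str.isalpha() is documented to behave.

```python
import unicodedata

def is_letter(ch):
    """Return True if the single character ch has a Unicode 'Letter' category."""
    # General categories starting with "L" are Lu, Ll, Lt, Lm and Lo.
    return unicodedata.category(ch).startswith("L")

# Print the category and both tests side by side for a few sample characters.
for ch in ["a", "A", "\xe9", "1", "\x01"]:
    print(repr(ch), unicodedata.category(ch), is_letter(ch), ch.isalpha())
```

Note that neither test consults the current locale; both rely only on the Unicode character database.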


HTH,

Emile


#2453

From	candide <candide@free.invalid>
Date	2011-04-02 15:18 +0200
Message-ID	<4d972283$0$4785$426a74cc@news.free.fr>
In reply to	#2402

On 01/04/2011 22:55, candide wrote:
> How can I retrieve the list of all characters defined as alphabetic for the
> current locale?


Thanks for the responses. Alas, neither solution works.

Under Ubuntu:

# ----------------------
import string
import locale

print locale.getdefaultlocale()
print locale.getpreferredencoding()

locale.setlocale(locale.LC_ALL, "")

print string.letters

letter_class = u"[" + u"".join(
    unichr(c) for c in range(0x10000) if unichr(c).isalpha()) + u"]"

#print letter_class
# ----------------------

prints the following:


('fr_FR', 'UTF8')
UTF-8
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz


I commented out the printing of letter_class because it outputs a flood
of characters that do not belong to the usual French character set.
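
To see where that flood comes from, here is a rough Python 3 sketch (not the code from the post) that walks the same range and groups the alphabetic characters by the first word of their Unicode name, used here as an approximate script label. The names bmp_letters and counts are illustrative only, and the exact totals depend on the Unicode version shipped with the interpreter.

```python
import unicodedata
from collections import Counter

def bmp_letters():
    """Yield every character below U+10000 for which str.isalpha() is true."""
    for code in range(0x10000):
        ch = chr(code)
        if ch.isalpha():
            yield ch

# Group by the first word of the Unicode character name, a rough script label.
counts = Counter(unicodedata.name(ch, "UNNAMED").split()[0] for ch in bmp_letters())

# Show the five largest groups.
for label, n in counts.most_common(5):
    print(label, n)
```

The Latin letters are a small minority here, since isalpha() on a Unicode string is true for letters of every script regardless of the locale.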


More or less the same problem occurs under Windows: for instance,
string.letters includes "latin capital letter eth" (U+00D0) as an
alphabetic character, which is wrong for French, since we never use this
letter in true French words.
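
The Windows result is consistent with the letters being taken from an 8-bit code page rather than from the language. The Python 3 sketch below assumes that code page is cp1252 (an assumption, the post does not say) and lists every byte that decodes to an alphabetic character. It then compares that with FRENCH_ACCENTED, a hand-written and purely illustrative list of accented letters used in French, not something the locale module provides.

```python
import string

def codec_letters(encoding):
    """Return the alphabetic characters among the 256 byte values of an 8-bit codec."""
    letters = []
    for byte in range(256):
        # Bytes left undefined by the codec decode to an empty string here.
        ch = bytes([byte]).decode(encoding, errors="ignore")
        if ch.isalpha():
            letters.append(ch)
    return "".join(letters)

# Assumption: cp1252 stands in for the code page of a Western European Windows locale.
cp1252_letters = codec_letters("cp1252")

# Hand-written, illustrative list of accented letters and ligatures used in French.
FRENCH_ACCENTED = "àâæçéèêëîïôœùûüÿ"
french_letters = string.ascii_letters + FRENCH_ACCENTED + FRENCH_ACCENTED.upper()

print("\u00d0" in cp1252_letters)   # eth is a letter of the code page
print("\u00d0" in french_letters)   # but not of the hand-written French list
```

A code page only says which letters can be encoded, so a list like french_letters has to come from language data (or be maintained by hand) rather than from string.letters or isalpha().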

