To: python-list@python.org
From: Dave Angel <davea@davea.name>
Subject: Re: Default scope of variables
Date: Thu, 04 Jul 2013 22:03:52 -0400

On 07/04/2013 09:24 PM, Steven D'Aprano wrote:
> On Thu, 04 Jul 2013 17:54:20 +0100, Rotwang wrote:
> [...]
>> Anyway, none of the calculations that has been given takes into account
>> the fact that names can be /less/ than one million characters long.
>
>
> Not in *my* code they don't!!!
>
> *wink*
>
>
>> The
>> actual number of non-empty strings of length at most 1000000 characters,
>> that consist only of ascii letters, digits or underscores, and that
>> don't start with a digit, is
>>
>> sum(53*63**i for i in range(1000000)) == 53*(63**1000000 - 1)//62
>
>
> I take my hat off to you sir, or possibly madam. That is truly an inspired
> piece of pedantry.
>
>
>> It's perhaps worth mentioning that some non-ascii characters are allowed
>> in identifiers in Python 3, though I don't know which ones.
>
> PEP 3131 describes the rules:
>
> http://www.python.org/dev/peps/pep-3131/
>
> For example:
>
> py> import unicodedata as ud
> py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
> ...     print(c, ud.name(c), c.isidentifier(), ud.category(c))
> ...
> é LATIN SMALL LETTER E WITH ACUTE True Ll
> æ LATIN SMALL LETTER AE True Ll
> ¥ YEN SIGN False Sc
> µ MICRO SIGN True Ll
> ¿ INVERTED QUESTION MARK False Po
> μ GREEK SMALL LETTER MU True Ll
> Ж CYRILLIC CAPITAL LETTER ZHE True Lu
> ᚃ OGHAM LETTER FEARN True Lo
> ‰ PER MILLE SIGN False Po
> ⇄ RIGHTWARDS ARROW OVER LEFTWARDS ARROW False So
> ∞ INFINITY False Sm
>
>
>

The isidentifier() method will let you weed out the characters that 
cannot start an identifier.  But there are other groups of characters 
that can appear after the starting "letter".  So a more reasonable 
sample might be something like:

py> import unicodedata as ud
py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
...     xc = "X" + c
...     print(c, ud.name(c), xc.isidentifier(), ud.category(c))
...

In particular,
     http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers

has a definition for id_continue that includes several interesting 
categories.  I expected the non-ASCII digits (Nd), but there's other stuff 
there, like "nonspacing marks" (Mn), that is surprising.
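
To make the start/continue distinction concrete, here is a minimal sketch.  The sample characters and the classify() helper are my own picks for illustration, not anything from the docs; each character is tested on its own and again after a leading "X":

```python
import unicodedata as ud

# Illustrative samples (my own choice): one character from each of
# a few categories that id_continue mentions, plus one that is excluded.
samples = [
    "\u0301",  # COMBINING ACUTE ACCENT, category Mn (nonspacing mark)
    "\u0663",  # ARABIC-INDIC DIGIT THREE, category Nd (decimal digit)
    "\u203f",  # UNDERTIE, category Pc (connector punctuation)
    "\u00a5",  # YEN SIGN, category Sc (currency symbol)
]

def classify(c):
    """Return (can_start, can_continue) for the single character c."""
    can_start = c.isidentifier()             # valid as a one-character name?
    can_continue = ("X" + c).isidentifier()  # valid after a starting letter?
    return can_start, can_continue

for c in samples:
    print(ud.name(c), ud.category(c), *classify(c))
```

If I have read the grammar correctly, the first three cannot start an identifier but are accepted after the "X", while the yen sign is rejected in both positions.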

I'm pretty much speculating here, so please correct me if I'm way off.

-- 
DaveA