In-Reply-To: <kr59i3$8jb$1@ger.gmane.org>
References: <51d4eb9c$0$29999$c3e8da3$5496439d@news.astraweb.com> <mailman.4200.1372910878.3114.python-list@python.org> <51d508ed$0$6512$c3e8da3$5496439d@news.astraweb.com> <mailman.4211.1372924504.3114.python-list@python.org> <kr491h$g2t$1@dont-email.me> <51d62039$0$29999$c3e8da3$5496439d@news.astraweb.com> <kr59i3$8jb$1@ger.gmane.org>
From: Joshua Landau <joshua.landau.ws@gmail.com>
Date: Fri, 5 Jul 2013 03:27:18 +0100
Subject: Re: Default scope of variables
To: Dave Angel <davea@davea.name>
Content-Type: text/plain; charset=UTF-8
Cc: python-list <python-list@python.org>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.4265.1372991287.3114.python-list@python.org>

On 5 July 2013 03:03, Dave Angel <davea@davea.name> wrote:
> On 07/04/2013 09:24 PM, Steven D'Aprano wrote:
>> On Thu, 04 Jul 2013 17:54:20 +0100, Rotwang wrote:
>>> It's perhaps worth mentioning that some non-ascii characters are allowed
>>> in identifiers in Python 3, though I don't know which ones.
>>
>> PEP 3131 describes the rules:
>>
>> http://www.python.org/dev/peps/pep-3131/
>
> The isidentifier() method will let you weed out the characters that cannot
> start an identifier.  But there are other groups of characters that can
> appear after the starting "letter".  So a more reasonable sample might be
> something like:
...
> In particular,
>     http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers
>
> has a definition for id_continue that includes several interesting
> categories.  I expected the non-ASCII digits, but there's other stuff there,
> like "nonspacing marks" that are surprising.
>
> I'm pretty much speculating here, so please correct me if I'm way off.

For my calculation above, I used this code, which I quickly mocked up:

> import unicodedata as unidata
> from sys import maxunicode
> from itertools import chain
>
> def get():
>     xid_starts = set()
>     xid_continues = set()
>
>     # General categories from the id_start and id_continue definitions.
>     # The underscore as a start character and the Other_ID_Start /
>     # Other_ID_Continue properties are not handled here.
>     id_start_categories = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
>     id_continue_categories = {"Mn", "Mc", "Nd", "Pc"}
>
>     characters = (chr(n) for n in range(maxunicode + 1))
>
>     print("Making normalized characters")
>
>     # NFKC can expand one character into several, so flatten the results
>     # into a set of the individual characters left after normalization.
>     normalized = (unidata.normalize("NFKC", character) for character in characters)
>     normalized = set(chain.from_iterable(normalized))
>
>     print("Assigning to categories")
>
>     for character in normalized:
>         category = unidata.category(character)
>
>         # The elif keeps start characters out of xid_continues.
>         if category in id_start_categories:
>             xid_starts.add(character)
>         elif category in id_continue_categories:
>             xid_continues.add(character)
>
>     return xid_starts, xid_continues

Please note that "xid_continues" actually represents "xid_continue - xid_start":
it holds only the characters that can continue an identifier but cannot start one.
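
As a rough cross-check on the category-based approach, here is a minimal sketch that asks the interpreter directly via str.isidentifier(). This is not the code I used for the calculation: the name classify() and the sample range are made up for illustration, and the results can differ from the category tables because isidentifier() also accepts the underscore as a start character and honours the Other_ID_Start / Other_ID_Continue properties.

```python
def classify(codepoints):
    """Split code points into (start, continue-only) sets using str.isidentifier().

    classify() is a hypothetical helper for illustration only.
    """
    starts = set()
    continue_only = set()
    for n in codepoints:
        c = chr(n)
        if c.isidentifier():
            # A one-character string is an identifier only if c can start one.
            starts.add(c)
        elif ("a" + c).isidentifier():
            # c cannot start an identifier but is accepted after a start character.
            continue_only.add(c)
    return starts, continue_only


# Small sample: up to the end of the Combining Diacritical Marks block.
# Pass range(sys.maxunicode + 1) instead to cover everything (slower).
starts, continue_only = classify(range(0x370))
print(len(starts), "start characters,", len(continue_only), "continue-only characters")
```

The nonspacing marks Dave mentioned show up in the second set: for example, U+0301 (combining acute accent) cannot start an identifier but is accepted after a letter.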