From: Joshua Landau
Date: Fri, 5 Jul 2013 03:27:18 +0100
Subject: Re: Default scope of variables
To: Dave Angel
Cc: python-list
Newsgroups: comp.lang.python

On 5 July 2013 03:03, Dave Angel wrote:
> On 07/04/2013 09:24 PM, Steven D'Aprano wrote:
>> On Thu, 04 Jul 2013 17:54:20 +0100, Rotwang wrote:
>>> It's perhaps worth mentioning that some non-ascii characters are allowed
>>> in identifiers in Python 3, though I don't know which ones.
>>
>> PEP 3131 describes the rules:
>>
>> http://www.python.org/dev/peps/pep-3131/
>
> The isidentifier() method will let you weed out the characters that cannot
> start an identifier. But there are other groups of characters that can
> appear after the starting "letter". So a more reasonable sample might be
> something like:
...
> In particular,
> http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers
>
> has a definition for id_continue that includes several interesting
> categories. I expected the non-ASCII digits, but there's other stuff there,
> like "nonspacing marks" that are surprising.
>
> I'm pretty much speculating here, so please correct me if I'm way off.

For my calculation above, I used this code I quickly mocked up:

    import unicodedata as unidata
    from sys import maxunicode
    from collections import defaultdict
    from itertools import chain

    def get():
        xid_starts = set()
        xid_continues = set()

        # General categories taken from the id_start / id_continue
        # definitions in the language reference.
        id_start_categories = "Lu, Ll, Lt, Lm, Lo, Nl".split(", ")
        id_continue_categories = "Mn, Mc, Nd, Pc".split(", ")

        characters = (chr(n) for n in range(maxunicode + 1))

        print("Making normalized characters")

        # NFKC can expand one character into several, so flatten the
        # normalized strings back into a set of single characters.
        normalized = (unidata.normalize("NFKC", character) for character in characters)
        normalized = set(chain.from_iterable(normalized))

        print("Assigning to categories")

        for character in normalized:
            category = unidata.category(character)

            if category in id_start_categories:
                xid_starts.add(character)
            elif category in id_continue_categories:
                xid_continues.add(character)

        return xid_starts, xid_continues

Please note that "xid_continues" actually represents "xid_continue - xid_start".
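
The category lists above can be spot-checked against the interpreter's own rules with str.isidentifier(). What follows is only a minimal sketch, not part of the original calculation: the helper names can_start and can_continue are made up for illustration, and the check assumes that prefixing a known-good start character ("a") is a fair way to test a continuation character.

```python
import unicodedata


def can_start(ch):
    """True if the single character ch is accepted as a whole identifier."""
    return ch.isidentifier()


def can_continue(ch):
    """True if ch is accepted after a known-good starting character."""
    return ("a" + ch).isidentifier()


# A few hand-picked samples: ASCII letter, ASCII digit, an Arabic-Indic
# digit (category Nd), a combining acute accent (category Mn), a hyphen.
for ch in ["a", "1", "\u0663", "\u0301", "-"]:
    print(repr(ch), unicodedata.category(ch), can_start(ch), can_continue(ch))
```

Characters such as the non-ASCII digit and the nonspacing mark are rejected as the first character but accepted afterwards, which matches the "continue minus start" set the code above collects.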