From: Ian Kelly
Date: Tue, 1 Apr 2014 09:53:41 -0600
Subject: Re: unicode as valid naming symbols
Newsgroups: comp.lang.python

On Tue, Apr 1, 2014 at 7:44 AM, Chris Angelico wrote:
> On Wed, Apr 2, 2014 at 12:33 AM, Ned Batchelder wrote:
>> Maybe I'm misunderstanding the discussion... It seems like we're talking
>> about a hypothetical definition of identifiers based on Unicode character
>> categories, but there's no need: Python 3 has defined precisely that. From
>> the docs
>> (https://docs.python.org/3/reference/lexical_analysis.html#identifiers):
>>
>
> "Python 3.0 introduces **additional characters** from outside the
> ASCII range" - emphasis mine.
>
> Python currently has - at least, per that documentation - a hybrid
> system with ASCII characters defined in the classic way, and non-ASCII
> characters defined by their Unicode character classes. I'm talking
> about a system that's _purely_ defined by Unicode character classes.
> It may turn out that the class list exactly encompasses the ASCII
> characters listed, though, in which case you'd be right: it's not
> hypothetical.

The only ASCII character not encompassed is the underscore: _ is
explicitly permitted to start an identifier (for obvious reasons),
whereas characters in category Pc (connector punctuation) are generally
only permitted to continue identifiers.

There are also explicit lists of extra permitted characters in
PropList.txt (the Other_ID_Start and Other_ID_Continue properties) for
backward compatibility: once a character is permitted, it should remain
permitted even if its Unicode category changes. There are currently 4
extra starting characters and 12 extra continuing characters, but none
of these are ASCII.
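To make the difference concrete, here is a small sketch of the hypothetical "purely category-based" rule, checked against Python's own str.isidentifier(). The category sets come from the id_start/id_continue definitions in the language reference. The function name category_only_identifier and the sample strings are made up for illustration. The sketch deliberately leaves out NFKC normalization and the XID_Start/XID_Continue refinements, so it approximates the hypothetical rule and is not CPython's actual algorithm.

```python
import unicodedata

# General categories listed for id_start / id_continue in the Python 3
# language reference ("Identifiers and keywords").
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def category_only_identifier(name: str) -> bool:
    """Hypothetical 'purely category-based' identifier check.

    Deliberately omits the underscore special case, the Other_ID_Start /
    Other_ID_Continue extras from PropList.txt, and NFKC normalization.
    """
    if not name:
        return False
    if unicodedata.category(name[0]) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in name[1:])


samples = [
    "spam",      # plain ASCII letters: both rules agree
    "_spam",     # '_' is Pc: only the explicit underscore rule lets it start
    "spam_2",    # '_' (Pc) and '2' (Nd) are fine as continuing characters
    "\u203fx",   # U+203F UNDERTIE is also Pc, but gets no special case
    "x\u203f",   # ...so it may only continue an identifier
    "\u2118x",   # U+2118 SCRIPT CAPITAL P is Sm, allowed via Other_ID_Start
    "a\u00b7b",  # U+00B7 MIDDLE DOT is Po, allowed via Other_ID_Continue
]

for s in samples:
    print(f"{s!r:12} category-only={category_only_identifier(s)!s:5} "
          f"isidentifier={s.isidentifier()}")
```

Only "_spam", "\u2118x" and "a\u00b7b" should disagree. That matches the two gaps described above: the underscore special case and the backward-compatibility extras. Keywords are ignored here, just as str.isidentifier() ignores them.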