From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: unicode as valid naming symbols
Date: Tue, 1 Apr 2014 21:58:14 +1100
Cc: "python-list@python.org"

On Tue, Apr 1, 2014 at 9:37 PM, Antoon Pardon wrote:
> Python also uses symbols for names of operations, like '+'.
> And when someone suggested python might consider increasing the number of
> operations and gave some symbols for those extra operations, nobody
> suggested that would make python unreadable, though it would be far
> more like the path taken by APL than what we are discussing now.

Actually, people did. But mainly the thread (look up "Time we switched
to unicode?") went off looking at how hard it'd be to type those
operators, and from there at the more serious point that there would
be either hard-to-type language elements or duplicate syntactic tokens
("lambda" as well as "λ", etc).

That isn't an issue with names, because any name has only one, well,
name. If you choose to use both "alpha" and "α" as names, that's fine,
and they're distinct names. You can make your code unreadable, and it
doesn't impact my code at all. Language-level features like operators
have stronger concerns.

But because, in the future, Python may choose to create new operators,
the simplest and safest approach is to put a boundary on what can be
operators and what can be names; Unicode character classes are perfect
for this. It's also possible that all Unicode whitespace characters
might become legal for indentation and separation (maybe they are
already?), so obviously they're ruled out as identifiers; anyway, I
honestly do not think people would want to use U+2007 FIGURE SPACE
inside a name. So if we deny whitespace, and accept letters and digits,
it makes good sense to deny mathematical symbols so as to keep them
available for operators.

(It also makes reasonable sense to *permit* mathematical symbols, thus
allowing you to use them for functions/methods, in the same way that
you can use "n", "o", and "t", but not "not"; but with word operators,
the entire word has to be used as-is before it's a collision - with a
symbolic one, any instance of that symbol inside a name will change
parsing entirely.
It's a trade-off, and Python's made a decision one way and not the
other.)

ChrisA
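
To make the character-class boundary concrete, here is a rough sketch (assuming Python 3; the helper name describe() is made up for illustration) that reports each character's Unicode general category alongside what str.isidentifier() and keyword.iskeyword() say about the whole string. It only inspects strings; it is not a description of how the tokenizer itself is implemented.

```python
import keyword
import unicodedata


def describe(name):
    """Return (categories, is_identifier, is_keyword) for a candidate name.

    Hypothetical helper for illustration only.
    """
    # General category of each character, e.g. "Ll" (lowercase letter),
    # "Sm" (math symbol), "Zs" (space separator).
    categories = [unicodedata.category(ch) for ch in name]
    return categories, name.isidentifier(), keyword.iskeyword(name)


if __name__ == "__main__":
    candidates = [
        "alpha",      # plain ASCII name
        "\u03b1",     # GREEK SMALL LETTER ALPHA
        "\u03bb",     # GREEK SMALL LETTER LAMDA
        "\u2211x",    # N-ARY SUMMATION followed by a letter
        "a\u2007b",   # FIGURE SPACE inside a name
        "not",        # word operator: a valid identifier shape, but reserved
        "note",       # contains "not", yet no collision
    ]
    for name in candidates:
        print(ascii(name), *describe(name))
```

Letters such as "α" come out as identifier material, while the math symbol (category Sm) and FIGURE SPACE (category Zs) do not. "note" is fine even though it contains "not", which is the whole-word point made above.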