From: Chris Angelico
Date: Sat, 3 May 2014 11:15:49 +1000
Subject: Re: Unicode 7
Cc: "python-list@python.org"
Newsgroups: comp.lang.python

On Sat, May 3, 2014 at 10:58 AM, Rustom Mody wrote:
> You think this
>
> >>> (ﬁne, fine) = (1,2) # and no issue about it
>
> is fine?

Not sure which part you're objecting to. Are you saying that this
should be an error:

>>> a, a = 1, 2 # simple ASCII identifier used twice

or that Python should take the exact sequence of codepoints, rather
than normalizing?

Python 3.5.0a0 (default:6a0def54c63d, Mar 26 2014, 01:11:09)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ﬁne = 1
>>> vars()
{'__package__': None, '__spec__': None, '__doc__': None, 'fine': 1, '__loader__': , '__builtins__': , '__name__': '__main__'}

As regards normalization, I would be happy with either "keep it
exactly as you provided" or "normalize according to ", as long as it's consistent.
It's like what happens with SQL identifiers: according to the
standard, an unquoted name should be uppercased, but some databases
instead lowercase them. It doesn't break code (modulo quoted names,
not applicable here), as long as it's consistent.

(My reading of PEP 3131 is that NFKC is used; is that what's
implemented, or was that a temporary measure and/or something for Py2
to consider?)

ChrisA
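
For anyone who wants to poke at this without typing the ligature, here is a minimal sketch, assuming CPython 3. The helper names nfkc and names_bound_by are made up for illustration; unicodedata.normalize and exec are the only real APIs involved. It checks that NFKC folds U+FB01 plus "ne" to plain "fine", and that the compiler binds the normalized name, which is consistent with the vars() output in the session above.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI followed by "ne"; written as an escape
# so the example does not depend on how the ligature survives transport.
LIGATURE_NAME = "\ufb01ne"


def nfkc(name):
    """Return the NFKC-normalized form of a name (hypothetical helper)."""
    return unicodedata.normalize("NFKC", name)


def names_bound_by(source):
    """Run source in a fresh namespace and return the names it bound.

    Hypothetical helper: exec() adds __builtins__ to the namespace,
    so dunder keys are filtered out.
    """
    namespace = {}
    exec(source, namespace)
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


# The ligature spelling and the ASCII spelling agree after NFKC.
print(nfkc(LIGATURE_NAME) == "fine")

# Assigning through the ligature spelling binds the normalized name.
print(names_bound_by(LIGATURE_NAME + " = 1"))

# Both targets name the same variable, so this behaves like "a, a = 1, 2":
# targets are assigned left to right and the last value wins.
print(names_bound_by("(" + LIGATURE_NAME + ", fine) = (1, 2)"))
```

So the quoted (ﬁne, fine) = (1,2) case appears to be the same situation as a, a = 1, 2: one name assigned twice, not two distinct names.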