From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Unicode normalisation [was Re: [beginner] What's wrong?]
Date: Fri, 8 Apr 2016 14:54:00 +1000

On Fri, Apr 8, 2016 at 2:43 PM, Rustom Mody wrote:
> No I am not clever/criminal enough to know how to write a text that is visually
> close to
> print "Hello World"
> but is internally closer to
> rm -rf /
>
> For me this:
> >>> Α = 1
> >>> A = 2
> >>> Α + 1 == A
> True
> >>>
>
>
> is cure enough that I am not amused

To me, the above is a contrived example. And you can contrive examples
that are just as confusing while still being ASCII-only, like
swimmer/swirnmer in many fonts, or I and l, or any number of other
visually-confusing glyphs. I propose that we ban the letters 'r' and
'l' from identifiers, to ensure that people can't mess with
themselves.

> Specifically as far as I am concerned if python were to throw back say
> a ligature in an identifier as a syntax error -- exactly what python2 does --
> I think it would be perfectly fine and a more sane choice

The ligature is handled straight-forwardly: it gets decomposed into
its component letters. I'm not seeing a problem here.

ChrisA
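
For anyone who wants to poke at both cases, here is a minimal sketch, assuming Python 3 (where identifiers are NFKC-normalised at parse time, per PEP 3131). The constant names and the nfkc() helper are made up for illustration. It shows that the Greek and Latin capitals stay distinct under normalisation, while an "fi" ligature in an identifier folds into the plain letters.

```python
import unicodedata

GREEK_ALPHA = "\u0391"   # GREEK CAPITAL LETTER ALPHA, looks like Latin "A"
LATIN_A = "A"            # LATIN CAPITAL LETTER A
LIGATURE_FI = "\ufb01"   # LATIN SMALL LIGATURE FI


def nfkc(text):
    """Return text in Normalization Form KC (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)


# The look-alikes are different code points, and NFKC does not merge them,
# so they remain two separate identifiers.
print(unicodedata.name(GREEK_ALPHA))
print(unicodedata.name(LATIN_A))
print(nfkc(GREEK_ALPHA) == LATIN_A)

# The ligature has a compatibility decomposition into its component letters.
print(nfkc(LIGATURE_FI + "le") == "file")

# The parser applies the same normalisation to identifiers: assigning to a
# name spelled with the ligature binds the plain-ASCII name.
namespace = {}
exec(LIGATURE_FI + "le = 1", namespace)
print("file" in namespace)
```

So the ligature never survives as a distinct name, whereas the Alpha/A pair is a look-alike problem that normalisation does not attempt to solve.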