From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Unicode normalisation [was Re: [beginner] What's wrong?]
Date: Fri, 8 Apr 2016 16:13:07 +1000

On Fri, Apr 8, 2016 at 4:00 PM, Steven D'Aprano wrote:
> Or for that matter:
>
> a = akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqwe9fhlcjbqvcbhsiauy37wkg() + 100
> b = 100 + akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqew9fhlcjbqvcbhsiauy37wkg()
>
> How easily can you tell them apart at a glance?

Ouch! Can't even align them top and bottom. This is evil.

> I think that, beyond normalisation, the compiler need not be too concerned
> by confusables. I wouldn't *object* to the compiler raising a warning if it
> detected confusable identifiers, or mixed script identifiers, but I think
> that's more the job for a linter or human code review.

The compiler should treat as identical anything that an editor should
reasonably treat as identical. I'm not sure whether multiple combining
characters on a single base character are forced into some order prior
to comparison or are kept in the order they were typed, but my gut
feeling is that they should be considered identical.

> They are not, and never have been, in the typesetting business. Perhaps
> characters are not the only things easily confused *wink*

Peter is definitely a character. So are you. QUITE a character. :)

> But really, why should we object? Is "pile-of-poo" any more silly than any
> of the other dingbats, graphics characters, and other non-alphabetical
> characters?
> Unicode is not just for "letters of the alphabet".

It's less silly than "ZERO WIDTH NO-BREAK SPACE", which isn't a space at
all, it's a joiner. Go figure. (History's a wonderful thing, ain't it?
So's backward compatibility and a guarantee that names will never be
changed.)

ChrisA
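
A footnote on the combining-character question above, as a minimal sketch
rather than a claim about what any particular compiler or editor does.
Unicode normalisation applies a canonical ordering that sorts adjacent
combining marks by combining class, so marks with different classes compare
equal after normalisation whichever way they were typed, while marks that
share a class keep their typed order and stay distinct. The choice of NFD
and of these particular marks is mine, purely for illustration. The last
line prints the official name of U+FEFF.

```python
import unicodedata

DOT_BELOW = "\u0323"   # COMBINING DOT BELOW, combining class 220
DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE, combining class 230
DIAERESIS = "\u0308"   # COMBINING DIAERESIS, combining class 230

# Same base letter, same two marks, typed in opposite orders.
typed_one_way = "a" + DOT_BELOW + DOT_ABOVE
typed_other_way = "a" + DOT_ABOVE + DOT_BELOW

# The raw strings differ, because they hold different code point sequences.
print(typed_one_way == typed_other_way)

# After normalisation the marks are sorted by combining class,
# so both spellings end up as the same sequence.
nfd_a = unicodedata.normalize("NFD", typed_one_way)
nfd_b = unicodedata.normalize("NFD", typed_other_way)
print(nfd_a == nfd_b)

# Two marks with the SAME combining class are not reordered, so typing
# order still matters for them even after normalisation.
same_class_a = unicodedata.normalize("NFD", "a" + DOT_ABOVE + DIAERESIS)
same_class_b = unicodedata.normalize("NFD", "a" + DIAERESIS + DOT_ABOVE)
print(same_class_a == same_class_b)

# The character that "isn't a space at all":
print(unicodedata.name("\ufeff"))
```

So the gut feeling holds for marks that attach in different positions, and
only if both sides of the comparison are normalised first.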