From: Chris Angelico
Date: Sun, 1 Jun 2014 18:31:09 +1000
Subject: Re: IDE for python
Newsgroups: comp.lang.python

On Sun, Jun 1, 2014 at 5:58 PM, Marko Rauhamaa wrote:
> As a Finnish-speaker, I hope that patch doesn't become default behavior.
> Too many times, we have been victimized by the German conventions. A
> Finnish-speaker would much rather see
>
>     Järvenpää => Jarvenpaa
>     Öllölä => Ollola
>     Kärkkäinen => Karkkainen
>
> than
>
>     Järvenpää => Jaervenpaeae
>     Öllölä => Oelloelae
>     Kärkkäinen => Kaerkkaeinen

It's even worse than that. The rules for ASCIIfying adorned characters vary according to context - Müller and Mueller are different names, and in many contexts should sort and compare differently, and I remember reading somewhere that there's a context in which it's more useful to decompose ü to u rather than ue.
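To make the two conventions concrete, here is a minimal Python 3 sketch of each. The helper names `strip_marks` and `german_digraphs` are hypothetical, and the German table is an assumption that covers only the six umlauted letters (real German transliteration also has to deal with ß and more). The last two rows show the loss: under the German scheme, Müller and Mueller turn into the same string.

```python
import unicodedata

# Hypothetical helpers for illustration; neither is a standard library API.

def strip_marks(text: str) -> str:
    """Finnish-friendly scheme: decompose (NFKD) and drop combining marks, so ä -> a."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Assumed German-style digraph table: only the six umlauted letters are mapped.
GERMAN_DIGRAPHS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def german_digraphs(text: str) -> str:
    """German-style scheme: ä -> ae, ö -> oe, ü -> ue; other characters unchanged."""
    return text.translate(GERMAN_DIGRAPHS)

if __name__ == "__main__":
    for name in ["Järvenpää", "Öllölä", "Kärkkäinen", "Müller", "Mueller"]:
        print(f"{name:12} {strip_marks(name):12} {german_digraphs(name)}")
```

Neither function is reversible. Each one throws information away, and the two throw away different information.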
There is no "safe" lossy transformation that can be done to any language's words, and this is no exception. ASCIIfication has to be accepted as flawed; this issue (an inability to handle non-ASCII labels) is similar to a lot of blog URLs - http://rosuav.blogspot.com/2013/08/20th-international-g-festival-awards.html is talking about the "International G&S Festival" awards, but the URL drops the "&S" part.

(If you absolutely have to transmit something losslessly in pure ASCII, you need a scheme like Punycode, which is a lot less clean and readable than a decomposition scheme.)

Of course, the better solution is to permit the full Unicode alphabet in identifiers...

ChrisA
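The lossless-versus-readable trade-off can be shown with Python's built-in `punycode` codec, which implements RFC 3492. The helper names here are hypothetical wrappers, and the lossy function repeats the decomposition approach for comparison. Punycode leaves the plain ASCII letters in place and appends an encoded tail that records where the non-ASCII characters go. That is why the round-trip is exact, and also why the result is hard to read.

```python
import unicodedata

# Hypothetical wrapper names; the "punycode" codec itself ships with Python.

def lossless_ascii(text: str) -> bytes:
    """Encode with the built-in 'punycode' codec (RFC 3492); output is pure ASCII."""
    return text.encode("punycode")

def from_lossless_ascii(data: bytes) -> str:
    """Invert lossless_ascii exactly."""
    return data.decode("punycode")

def lossy_ascii(text: str) -> str:
    """Decompose and drop combining marks; readable but NOT reversible."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

if __name__ == "__main__":
    for name in ["Järvenpää", "Öllölä", "Müller"]:
        encoded = lossless_ascii(name)
        assert from_lossless_ascii(encoded) == name  # exact round-trip
        print(f"{name:10} punycode={encoded.decode('ascii'):16} lossy={lossy_ascii(name)}")
```

The lossy version maps "Järvenpää" and "Jarvenpaa" to the same string. The Punycode forms stay distinct, but nobody would want to read them in a URL.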