From: Chris “Kwpolska” Warrick
To: Marko Rauhamaa
Cc: python-list@python.org
Newsgroups: comp.lang.python
Date: Tue, 25 Mar 2014 18:24:10 +0100
Subject: Re: Time we switched to unicode?

On Tue, Mar 25, 2014 at 9:05 AM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> On Tue, Mar 25, 2014 at 4:14 PM, Mark H Harris wrote:
>>> >>> Π¹ = pi
>>
>> That's good! (Although typing Π¹ quicker than pi is majorly pushing it.
>
> It don't think that's good. The lower-case letter π² should be used. The
> upper-case letter is used for a product, although unicode dedicates a
> separate character for the purpose: ∏³.
>
> I often see Americans, especially, confuse upper and lower-case letters
> in symbols ("KM" for "km", "L" for "l" etc).

“L” is actually valid, and so is “l”. This happens mainly because
humans (and computers) tend to write “1 l” (one liter, one-ell) in a
way that makes it harder to distinguish (becoming eleven or ell-ell),
especially if you don’t include the space (which is invalid).

On Tue, Mar 25, 2014 at 9:23 AM, Chris Angelico wrote:
> If you can type a capital ∏³, you can type a lower-case π², unless
> there's something very weird going on.

Nitpick time! (because we all love it so much!)

Π¹ = U+03A0 GREEK CAPITAL LETTER PI
π² = U+03C0 GREEK SMALL LETTER PI
∏³ = U+220F N-ARY PRODUCT

“If you can type an N-ARY PRODUCT, you can type a GREEK SMALL LETTER
PI, unless there’s something very weird going on.”

…like, the user is in the past and is using ISO 8859-7 (instead of a
21st-century encoding, like UTF-8). An encoding which has support for
Π¹ and π², but not for ∏³… (of course, this assumes that, if we add
those new characters into python, we allow any encoding, somehow.)

That’s not too weird, other than the ancient encoding being used.
(though that’s a bit less weird on Windows, but that’d be
Windows-1253.)

Oh: and speaking of fancy Unicode characters that are worthless
~duplicates, spot the difference here:

    µ μ

If you are lucky enough (and, luckiness may involve reading this
e-mail in Helvetica (not Neue though) on a Mac), you can clearly see
that they are different.
If you are using a font that does not differentiate them, you may
think they’re the same. If you ask some intelligent software (like
`unicodedata.name()` in Python), you’ll quickly find out the first is
MICRO SIGN, and the other is GREEK SMALL LETTER MU.

Such craziness is what makes Unicode Unicode.

-- 
Chris “Kwpolska” Warrick
PGP: 5EAAEA16
stop html mail | always bottom-post | only UTF-8 makes sense
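
For anyone who wants to check the nitpicks above without trusting their
font, here is a minimal sketch in Python 3 using only the standard
library. The helper names `describe` and `encodable` are made up for
this example and are not part of any library. The characters are
written as escapes so that look-alike glyphs cannot be mixed up when
copying.

```python
import unicodedata

# The five characters discussed in the message, written as escapes.
CAPITAL_PI = "\u03a0"      # the character marked ¹ above
SMALL_PI = "\u03c0"        # the character marked ² above
NARY_PRODUCT = "\u220f"    # the character marked ³ above
MICRO_SIGN = "\u00b5"      # first of the two look-alikes
SMALL_MU = "\u03bc"        # second of the two look-alikes


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return "U+{:04X} {}".format(ord(ch), unicodedata.name(ch))


def encodable(ch, encoding):
    """Return True if the character can be represented in the encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for ch in (CAPITAL_PI, SMALL_PI, NARY_PRODUCT, MICRO_SIGN, SMALL_MU):
        print(describe(ch))

    # The legacy Greek encodings mentioned in the message.
    for encoding in ("iso-8859-7", "cp1253", "utf-8"):
        for ch in (CAPITAL_PI, SMALL_PI, NARY_PRODUCT):
            print(encoding, describe(ch), encodable(ch, encoding))
```

Running it should show that the two look-alikes are distinct code
points with different names, and that the N-ARY PRODUCT is the only one
of the three pi-like characters the legacy Greek encodings cannot
represent.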