Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: Chris Angelico <rosuav@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Should stdlib files contain 'narrow non breaking space' U+202F?
Date: Fri, 18 Dec 2015 17:51:32 +1100
Lines: 90
Message-ID: <mailman.45.1450421501.30845.python-list@python.org>
References: <n4vf3d$rer$1@ger.gmane.org> <CAPTjJmoa5dBiDJ5zDu4jEy4MjOm6vp=J3zLqJw7ZfEUf9da=DQ@mail.gmail.com> <n509h0$p6h$1@ger.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <n509h0$p6h$1@ger.gmane.org>
Precedence: list
Xref: csiph.com comp.lang.python:100581

On Fri, Dec 18, 2015 at 5:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:
> Last I knew, Guido still wanted stdlib files to be all-ascii, especially
> possibly in special cases. There is no good reason I can think of for there
> to be an invisible non-ascii space in a comment.  It strikes me as most
> likely an accident (typo) that should be fixed.  I suspect the same of most
> of the following.  Perhaps you should file an issue (and patch?) on the
> tracker.

You're probably right on that one. Here are the others, along with the
script I used to find them.

import os

for root, dirs, files in os.walk("."):
    if "test" in root: continue # Skip test directories
    for fn in files:
        if not fn.endswith(".py"): continue
        if "test" in fn: continue
        with open(os.path.join(root, fn), "rb") as f:
            # l is zero-based, so the first line of a file is reported as 0
            for l, line in enumerate(f):
                try:
                    line.decode("ascii")
                    continue # Ignore the ASCII lines
                except UnicodeDecodeError:
                    line = line.rstrip(b"\n")
                    # If it's not UTF-8 either, show it as b'...'
                    try: line = line.decode("UTF-8")
                    except UnicodeDecodeError: line = repr(line)
                    print("%s:%d: %s" % (fn, l, line))
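
Since the connection.py space is invisible in the output, it can help to
print the code point and Unicode name of each offender as well. A rough
sketch, not part of the script above; describe_non_ascii and the sample
line are made up for illustration:

```python
import unicodedata

def describe_non_ascii(text):
    """Return (column, 'U+XXXX', name) for each non-ASCII character in text."""
    found = []
    for col, ch in enumerate(text):
        if ord(ch) > 127:
            # Fall back to a placeholder for characters without a name
            name = unicodedata.name(ch, "<unnamed>")
            found.append((col, "U+%04X" % ord(ch), name))
    return found

# Hypothetical sample line with U+202F right after the second '#'
sample = "# Issue #\u202f20540: concatenate before sending"
for col, code, name in describe_non_ascii(sample):
    print("col %d: %s %s" % (col, code, name))
```

Calling this on each decoded line in the except branch would make the
invisible characters stand out in the report.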


shlex.py:37:             self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
shlex.py:38:                                'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')
functools.py:7: # and Łukasz Langa <lukasz at langa.pl>.
heapq.py:34: [explanation by François Pinard]
getopt.py:21: # Peter Åstrand <astrand@lysator.liu.se> added gnu_getopt().
sre_compile.py:26:     (0x69, 0x131), # iı
sre_compile.py:28:     (0x73, 0x17f), # sſ
sre_compile.py:30:     (0xb5, 0x3bc), # µμ
sre_compile.py:32:     (0x345, 0x3b9, 0x1fbe), # \u0345ιι
sre_compile.py:34:     (0x390, 0x1fd3), # ΐΐ
sre_compile.py:36:     (0x3b0, 0x1fe3), # ΰΰ
sre_compile.py:38:     (0x3b2, 0x3d0), # βϐ
sre_compile.py:40:     (0x3b5, 0x3f5), # εϵ
sre_compile.py:42:     (0x3b8, 0x3d1), # θϑ
sre_compile.py:44:     (0x3ba, 0x3f0), # κϰ
sre_compile.py:46:     (0x3c0, 0x3d6), # πϖ
sre_compile.py:48:     (0x3c1, 0x3f1), # ρϱ
sre_compile.py:50:     (0x3c2, 0x3c3), # ςσ
sre_compile.py:52:     (0x3c6, 0x3d5), # φϕ
sre_compile.py:54:     (0x1e61, 0x1e9b), # ṡẛ
sre_compile.py:56:     (0xfb05, 0xfb06), # ſtst
punycode.py:2: Written by Martin v. Löwis.
koi8_t.py:2: # http://ru.wikipedia.org/wiki/КОИ-8
__init__.py:0: # Copyright (C) 2005 Martin v. Löwis
client.py:737:         a Date representing the file’s last-modified time, a
client.py:739:         containing a guess at the file’s type. See also the
bdist_msi.py:0: # Copyright (C) 2005, 2006 Martin von Löwis
connection.py:399:             # Issue # 20540: concatenate before sending, to avoid delays due
message.py:531:                        filename=('utf-8', '', Fußballer.ppt'))
message.py:533:                        filename='Fußballer.ppt'))
request.py:181:     * geturl() — return the URL of the resource retrieved, commonly used to
request.py:184:     * info() — return the meta-information of the page, such as headers, in the
request.py:188:     * getcode() – return the HTTP status code of the response.  Raises URLError
dbapi2.py:2: # Copyright (C) 2004-2005 Gerhard Häring <gh@ghaering.de>
__init__.py:2: # Copyright (C) 2005 Gerhard Häring <gh@ghaering.de>

Nearly all of them are in comments; a few are in string literals.

I would be inclined to ASCIIfy the apostrophes, dashes, and the
connection.py space that started this thread. People's names, URLs,
and demonstrative characters I'm more inclined to leave. Agreed?
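
For the punctuation part, something like this would probably do. It's
only a sketch: the mapping and the name asciify_punctuation are my own
assumptions, and turning the narrow no-break space into a plain space
(rather than dropping it) is a guess at what the fix should be. Names
and other non-ASCII text pass through untouched.

```python
# Hypothetical mapping; covers only the punctuation discussed above.
ASCII_REPLACEMENTS = str.maketrans({
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK (client.py)
    "\u2014": "--",  # EM DASH (request.py)
    "\u2013": "-",   # EN DASH (request.py)
    "\u202f": " ",   # NARROW NO-BREAK SPACE (connection.py)
})

def asciify_punctuation(text):
    """Replace the listed punctuation with ASCII; leave everything else alone."""
    return text.translate(ASCII_REPLACEMENTS)

print(asciify_punctuation("a guess at the file\u2019s type"))
print(asciify_punctuation("Written by Martin v. L\u00f6wis."))
```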

ChrisA