From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Should stdlib files contain 'narrow non breaking space' U+202F?
Date: Fri, 18 Dec 2015 17:51:32 +1100

On Fri, Dec 18, 2015 at 5:36 PM, Terry Reedy wrote:
> Last I knew, Guido still wanted stdlib files to be all-ascii, especially
> possibly in special cases. There is no good reason I can think of for there
> to be an invisible non-ascii space in a comment. It strikes me as most
> likely an accident (typo) that should be fixed. I suspect the same of most
> of the following. Perhaps you should file an issue (and patch?) on the
> tracker.

You're probably right on that one.
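Terry's point that the stray character is invisible is easy to confirm with the standard unicodedata module, which names every non-ASCII character in a line. A minimal sketch; the helper name describe_non_ascii is hypothetical, not a stdlib function:

```python
import unicodedata


def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    found = []
    for i, ch in enumerate(text):
        if ord(ch) > 127:
            # unicodedata.name() raises ValueError for unnamed code points
            # unless a default is supplied.
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, "U+%04X" % ord(ch), name))
    return found


comment = "# Issue #\u202f20540: concatenate before sending"
for index, codepoint, name in describe_non_ascii(comment):
    print(index, codepoint, name)  # prints: 9 U+202F NARROW NO-BREAK SPACE
```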
Here are the others, along with the script I used to find them:

import os

for root, dirs, files in os.walk("."):
    if "test" in root:
        continue  # Skip the test suites
    for fn in files:
        if not fn.endswith(".py") or "test" in fn:
            continue
        with open(os.path.join(root, fn), "rb") as f:
            # enumerate() counts from 0, so the line numbers printed
            # are one less than an editor would show.
            for lineno, line in enumerate(f):
                try:
                    line.decode("ascii")
                    continue  # Ignore the ASCII lines
                except UnicodeDecodeError:
                    pass
                line = line.rstrip(b"\n")
                try:
                    line = line.decode("UTF-8")
                except UnicodeDecodeError:
                    # If it's not UTF-8 either, show it as b'...'
                    line = repr(line)
                # Only the bare file name is printed, not its directory.
                print("%s:%d: %s" % (fn, lineno, line))

shlex.py:37: self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
shlex.py:38: 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')
functools.py:7: # and Łukasz Langa .
heapq.py:34: [explanation by François Pinard]
getopt.py:21: # Peter Åstrand added gnu_getopt().
sre_compile.py:26: (0x69, 0x131), # iı
sre_compile.py:28: (0x73, 0x17f), # sſ
sre_compile.py:30: (0xb5, 0x3bc), # µμ
sre_compile.py:32: (0x345, 0x3b9, 0x1fbe), # \u0345ιι
sre_compile.py:34: (0x390, 0x1fd3), # ΐΐ
sre_compile.py:36: (0x3b0, 0x1fe3), # ΰΰ
sre_compile.py:38: (0x3b2, 0x3d0), # βϐ
sre_compile.py:40: (0x3b5, 0x3f5), # εϵ
sre_compile.py:42: (0x3b8, 0x3d1), # θϑ
sre_compile.py:44: (0x3ba, 0x3f0), # κϰ
sre_compile.py:46: (0x3c0, 0x3d6), # πϖ
sre_compile.py:48: (0x3c1, 0x3f1), # ρϱ
sre_compile.py:50: (0x3c2, 0x3c3), # ςσ
sre_compile.py:52: (0x3c6, 0x3d5), # φϕ
sre_compile.py:54: (0x1e61, 0x1e9b), # ṡẛ
sre_compile.py:56: (0xfb05, 0xfb06), # ﬅﬆ
punycode.py:2: Written by Martin v. Löwis.
koi8_t.py:2: # http://ru.wikipedia.org/wiki/КОИ-8
__init__.py:0: # Copyright (C) 2005 Martin v. Löwis
client.py:737: a Date representing the file’s last-modified time, a
client.py:739: containing a guess at the file’s type. See also the
bdist_msi.py:0: # Copyright (C) 2005, 2006 Martin von Löwis
connection.py:399: # Issue # 20540: concatenate before sending, to avoid delays due
message.py:531: filename=('utf-8', '', Fußballer.ppt'))
message.py:533: filename='Fußballer.ppt'))
request.py:181: * geturl() — return the URL of the resource retrieved, commonly used to
request.py:184: * info() — return the meta-information of the page, such as headers, in the
request.py:188: * getcode() – return the HTTP status code of the response. Raises URLError
dbapi2.py:2: # Copyright (C) 2004-2005 Gerhard Häring
__init__.py:2: # Copyright (C) 2005 Gerhard Häring

They're nearly all comments, plus a few string literals. I would be inclined
to ASCIIfy the apostrophes, dashes, and the connection.py space that started
this thread.
People's names, URLs, and demonstrative characters I'm more inclined to
leave. Agreed?

ChrisA
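A minimal sketch of the proposed ASCIIfication, assuming the fix is a fixed translation table applied line by line with str.translate. The table contents (in particular spelling the em dash as "--") and the name asciify_punctuation are illustrative assumptions, not anything agreed in the thread:

```python
# Hypothetical mapping: only the typographic punctuation listed above is
# replaced; everything else passes through untouched.
ASCII_REPLACEMENTS = str.maketrans({
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK, as in "file's"
    "\u2014": "--",  # EM DASH (the "--" spelling is an assumption)
    "\u2013": "-",   # EN DASH
    "\u202f": " ",   # NARROW NO-BREAK SPACE, as in connection.py
})


def asciify_punctuation(line):
    """Return line with the mapped punctuation replaced by ASCII."""
    return line.translate(ASCII_REPLACEMENTS)


print(asciify_punctuation("# Issue #\u202f20540: concatenate before sending"))
# -> # Issue # 20540: concatenate before sending
print(asciify_punctuation("# Copyright (C) 2005 Martin v. L\u00f6wis"))
# -> # Copyright (C) 2005 Martin v. Löwis
```

Because the table lists only the punctuation, characters such as ß, Ł or ö in names are left alone, which matches the suggestion to keep names, URLs and demonstrative characters as they are.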