From: Serhiy Storchaka
Newsgroups: comp.lang.python
Subject: Re: Should stdlib files contain 'narrow non breaking space' U+202F?
Date: Fri, 18 Dec 2015 09:12:50 +0200

On 18.12.15 08:51, Chris Angelico wrote:
> On Fri, Dec 18, 2015 at 5:36 PM, Terry Reedy wrote:
>> Last I knew, Guido still wanted stdlib files to be all-ascii, especially
>> possibly in special cases. There is no good reason I can think of for there
>> to be an invisible non-ascii space in a comment. It strikes me as most
>> likely an accident (typo) that should be fixed. I suspect the same of most
>> of the following. Perhaps you should file an issue (and patch?) on the
>> tracker.
>
> You're probably right on that one. Here's others - and the script I
> used to find them.
>
> import os
> for root, dirs, files in os.walk("."):
>     if "test" in root: continue
>     for fn in files:
>         if not fn.endswith(".py"): continue
>         if "test" in fn: continue
>         with open(os.path.join(root,fn),"rb") as f:
>             for l,line in enumerate(f):
>                 try:
>                     line.decode("ascii")
>                     continue # Ignore the ASCII lines
>                 except UnicodeDecodeError:
>                     line = line.rstrip(b"\n")
>                     try: line = line.decode("UTF-8")
>                     except UnicodeDecodeError: line = repr(line) # If it's not UTF-8 either, show it as b'...'
>                     print("%s:%d: %s" % (fn,l,line))
>
>
> shlex.py:37: self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
> shlex.py:38: 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')
> functools.py:7: # and Łukasz Langa .
> heapq.py:34: [explanation by François Pinard]
> getopt.py:21: # Peter Åstrand added gnu_getopt().
> sre_compile.py:26: (0x69, 0x131), # iı
> sre_compile.py:28: (0x73, 0x17f), # sſ
> sre_compile.py:30: (0xb5, 0x3bc), # µμ
> sre_compile.py:32: (0x345, 0x3b9, 0x1fbe), # \u0345ιι
> sre_compile.py:34: (0x390, 0x1fd3), # ΐΐ
> sre_compile.py:36: (0x3b0, 0x1fe3), # ΰΰ
> sre_compile.py:38: (0x3b2, 0x3d0), # βϐ
> sre_compile.py:40: (0x3b5, 0x3f5), # εϵ
> sre_compile.py:42: (0x3b8, 0x3d1), # θϑ
> sre_compile.py:44: (0x3ba, 0x3f0), # κϰ
> sre_compile.py:46: (0x3c0, 0x3d6), # πϖ
> sre_compile.py:48: (0x3c1, 0x3f1), # ρϱ
> sre_compile.py:50: (0x3c2, 0x3c3), # ςσ
> sre_compile.py:52: (0x3c6, 0x3d5), # φϕ
> sre_compile.py:54: (0x1e61, 0x1e9b), # ṡẛ
> sre_compile.py:56: (0xfb05, 0xfb06), # ſtst
> punycode.py:2: Written by Martin v. Löwis.
> koi8_t.py:2: # http://ru.wikipedia.org/wiki/КОИ-8
> __init__.py:0: # Copyright (C) 2005 Martin v. Löwis
> client.py:737: a Date representing the file’s last-modified time, a
> client.py:739: containing a guess at the file’s type. See also the
> bdist_msi.py:0: # Copyright (C) 2005, 2006 Martin von Löwis
> connection.py:399: # Issue # 20540: concatenate before sending, to avoid delays due
> message.py:531: filename=('utf-8', '', Fußballer.ppt'))
> message.py:533: filename='Fußballer.ppt'))
> request.py:181: * geturl() — return the URL of the resource retrieved, commonly used to
> request.py:184: * info() — return the meta-information of the page, such as headers, in the
> request.py:188: * getcode() – return the HTTP status code of the response. Raises URLError
> dbapi2.py:2: # Copyright (C) 2004-2005 Gerhard Häring
> __init__.py:2: # Copyright (C) 2005 Gerhard Häring
>
> They're nearly all comments. A few string literals.
>
> I would be inclined to ASCIIfy the apostrophes, dashes, and the
> connection.py space that started this thread. People's names, URLs,
> and demonstrative characters I'm more inclined to leave. Agreed?

Agreed. Please open an issue.

Using non-ASCII apostrophes and the like in docstrings may be considered a bug.
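
For whoever prepares the patch, here is a rough sketch (not part of the script quoted above) of how one might name the offending characters and replace only the typographic ones, leaving people's names and demonstrative characters alone. The ASCII_REPLACEMENTS table and both function names are my own assumptions for illustration; which characters to replace, and with what, would be decided on the tracker.

```python
import unicodedata

# Hypothetical mapping, chosen for illustration only: typographic
# characters of the kind reported above and an ASCII stand-in for each.
ASCII_REPLACEMENTS = {
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK (apostrophe)
    "\u2013": "-",    # EN DASH
    "\u2014": "--",   # EM DASH
    "\u202f": " ",    # NARROW NO-BREAK SPACE
}

def describe_non_ascii(line):
    """Return (column, code point, Unicode name) for each non-ASCII character."""
    return [
        (col, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for col, ch in enumerate(line)
        if ord(ch) > 127
    ]

def asciify_punctuation(line):
    """Replace only the characters listed in ASCII_REPLACEMENTS."""
    return line.translate(str.maketrans(ASCII_REPLACEMENTS))

# A made-up sample line, not taken from any stdlib file.
sample = "# Issue #\u202f20540: the file\u2019s type \u2014 by Martin v. L\u00f6wis"
for item in describe_non_ascii(sample):
    print(item)
print(asciify_punctuation(sample))
```

Printing the Unicode name makes an invisible character such as U+202F obvious in the report, and because the replacement is driven by an explicit table, anything not listed (for example the "ö" in a name) passes through unchanged.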