Re: Should stdlib files contain 'narrow non breaking space' U+202F?

Started by	Chris Angelico <rosuav@gmail.com>
First post	2015-12-18 17:51 +1100
Last post	2015-12-19 00:52 -0800
Articles	7 — 5 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Should stdlib files contain 'narrow non breaking space' U+202F? Chris Angelico <rosuav@gmail.com> - 2015-12-18 17:51 +1100
    Re: Should stdlib files contain 'narrow non breaking space' U+202F? Steven D'Aprano <steve@pearwood.info> - 2015-12-18 20:49 +1100
      Re: Should stdlib files contain 'narrow non breaking space' U+202F? wxjmfauth@gmail.com - 2015-12-18 07:55 -0800
      Re: Should stdlib files contain 'narrow non breaking space' U+202F? Terry Reedy <tjreedy@udel.edu> - 2015-12-18 16:56 -0500
        Re: Should stdlib files contain 'narrow non breaking space' U+202F? Marko Rauhamaa <marko@pacujo.net> - 2015-12-19 00:16 +0200
          Re: Should stdlib files contain 'narrow non breaking space' U+202F? Chris Angelico <rosuav@gmail.com> - 2015-12-19 10:58 +1100
        Re: Should stdlib files contain 'narrow non breaking space' U+202F? wxjmfauth@gmail.com - 2015-12-19 00:52 -0800

#100581 — Re: Should stdlib files contain 'narrow non breaking space' U+202F?

From	Chris Angelico <rosuav@gmail.com>
Date	2015-12-18 17:51 +1100
Subject	Re: Should stdlib files contain 'narrow non breaking space' U+202F?
Message-ID	<mailman.45.1450421501.30845.python-list@python.org>

On Fri, Dec 18, 2015 at 5:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:
> Last I knew, Guido still wanted stdlib files to be all-ascii, except
> possibly in special cases. There is no good reason I can think of for there
> to be an invisible non-ascii space in a comment.  It strikes me as most
> likely an accident (typo) that should be fixed.  I suspect the same of most
> of the following.  Perhaps you should file an issue (and patch?) on the
> tracker.

You're probably right on that one. Here are the others, and the script I
used to find them.

import os
for root, dirs, files in os.walk("."):
    if "test" in root: continue
    for fn in files:
        if not fn.endswith(".py"): continue
        if "test" in fn: continue
        with open(os.path.join(root,fn),"rb") as f:
            for l,line in enumerate(f):
                try:
                    line.decode("ascii")
                    continue # Ignore the ASCII lines
                except UnicodeDecodeError:
                    line = line.rstrip(b"\n")
                    # If it's not UTF-8 either, show it as b'...'
                    try: line = line.decode("UTF-8")
                    except UnicodeDecodeError: line = repr(line)
                    print("%s:%d: %s" % (fn,l,line))


shlex.py:37:             self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
shlex.py:38:                                'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')
functools.py:7: # and Łukasz Langa <lukasz at langa.pl>.
heapq.py:34: [explanation by François Pinard]
getopt.py:21: # Peter Åstrand <astrand@lysator.liu.se> added gnu_getopt().
sre_compile.py:26:     (0x69, 0x131), # iı
sre_compile.py:28:     (0x73, 0x17f), # sſ
sre_compile.py:30:     (0xb5, 0x3bc), # µμ
sre_compile.py:32:     (0x345, 0x3b9, 0x1fbe), # \u0345ιι
sre_compile.py:34:     (0x390, 0x1fd3), # ΐΐ
sre_compile.py:36:     (0x3b0, 0x1fe3), # ΰΰ
sre_compile.py:38:     (0x3b2, 0x3d0), # βϐ
sre_compile.py:40:     (0x3b5, 0x3f5), # εϵ
sre_compile.py:42:     (0x3b8, 0x3d1), # θϑ
sre_compile.py:44:     (0x3ba, 0x3f0), # κϰ
sre_compile.py:46:     (0x3c0, 0x3d6), # πϖ
sre_compile.py:48:     (0x3c1, 0x3f1), # ρϱ
sre_compile.py:50:     (0x3c2, 0x3c3), # ςσ
sre_compile.py:52:     (0x3c6, 0x3d5), # φϕ
sre_compile.py:54:     (0x1e61, 0x1e9b), # ṡẛ
sre_compile.py:56:     (0xfb05, 0xfb06), # ﬅﬆ
punycode.py:2: Written by Martin v. Löwis.
koi8_t.py:2: # http://ru.wikipedia.org/wiki/КОИ-8
__init__.py:0: # Copyright (C) 2005 Martin v. Löwis
client.py:737:         a Date representing the file’s last-modified time, a
client.py:739:         containing a guess at the file’s type. See also the
bdist_msi.py:0: # Copyright (C) 2005, 2006 Martin von Löwis
connection.py:399:             # Issue # 20540: concatenate before
sending, to avoid delays due
message.py:531:                        filename=('utf-8', '', Fußballer.ppt'))
message.py:533:                        filename='Fußballer.ppt'))
request.py:181:     * geturl() — return the URL of the resource retrieved, commonly used to
request.py:184:     * info() — return the meta-information of the page, such as headers, in the
request.py:188:     * getcode() – return the HTTP status code of the response.  Raises URLError
dbapi2.py:2: # Copyright (C) 2004-2005 Gerhard Häring <gh@ghaering.de>
__init__.py:2: # Copyright (C) 2005 Gerhard Häring <gh@ghaering.de>

They're nearly all comments. A few string literals.
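
One way to check the "comments vs. string literals" split mechanically is to tokenize each file and look at the type of every token that contains non-ASCII text. The following is only a minimal sketch: the helper name non_ascii_token_kinds and the sample source are made up for illustration, and it assumes Python 3.7+ (for str.isascii) and UTF-8 encoded input.

```python
import io
import tokenize
from collections import Counter


def non_ascii_token_kinds(source: bytes) -> Counter:
    """Count tokens containing non-ASCII text, keyed by token type name.

    Hypothetical helper for illustration; `source` is the raw bytes of a
    Python file that the tokenizer can decode (UTF-8 by default).
    """
    kinds = Counter()
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if tok.type == tokenize.ENCODING:
            continue  # The first token only reports the detected encoding
        if not tok.string.isascii():
            kinds[tokenize.tok_name[tok.type]] += 1
    return kinds


# Made-up sample: one non-ASCII comment, one non-ASCII string literal,
# and one line that is entirely ASCII.
sample = (
    "# Written by Martin v. L\u00f6wis.\n"
    "name = 'Fu\u00dfballer.ppt'\n"
    "x = 1  # plain ASCII comment\n"
).encode("utf-8")

print(non_ascii_token_kinds(sample))
```

Anything reported under a type other than COMMENT or STRING (for example NAME) would be a non-ASCII identifier and probably deserves a closer look.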

I would be inclined to ASCIIfy the apostrophes, dashes, and the
connection.py space that started this thread. People's names, URLs,
and demonstrative characters I'm more inclined to leave. Agreed?
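
A rough sketch of what that ASCIIfying could look like, assuming the characters involved are the curly apostrophes, the en and em dashes, and U+202F seen in the listing. The table and both function names are hypothetical, not an existing tool, and anything not in the table (names, URLs, demonstrative characters) is deliberately left alone.

```python
import unicodedata

# Hypothetical replacement table covering only the punctuation kinds
# mentioned above; extend or shrink it as needed.
ASCII_REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u2013": "-",   # EN DASH
    "\u2014": "--",  # EM DASH
    "\u202f": " ",   # NARROW NO-BREAK SPACE
})


def asciify_punctuation(text: str) -> str:
    """Replace the listed punctuation with ASCII; leave everything else."""
    return text.translate(ASCII_REPLACEMENTS)


def remaining_non_ascii(text: str) -> list:
    """List the distinct non-ASCII characters still present, with names."""
    return sorted(
        {"U+%04X %s" % (ord(c), unicodedata.name(c, "?"))
         for c in text if ord(c) > 127}
    )


line = "# Martin v. L\u00f6wis: the file\u2019s type \u2013 Issue #\u202f20540"
fixed = asciify_punctuation(line)
print(fixed)
print(remaining_non_ascii(fixed))
```

The second function is there so that whatever survives the translation can be reviewed by hand rather than silently rewritten.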

ChrisA

#100592

From	Steven D'Aprano <steve@pearwood.info>
Date	2015-12-18 20:49 +1100
Message-ID	<5673d6ac$0$1612$c3e8da3$5496439d@news.astraweb.com>
In reply to	#100581

On Fri, 18 Dec 2015 05:51 pm, Chris Angelico wrote:

> I would be inclined to ASCIIfy the apostrophes, dashes, and the
> connection.py space that started this thread. People's names, URLs,
> and demonstrative characters I'm more inclined to leave. Agreed?


No.



-- 
Steven

#100602

From	wxjmfauth@gmail.com
Date	2015-12-18 07:55 -0800
Message-ID	<bc36cf67-83c7-462b-aefa-2b9db711c523@googlegroups.com>
In reply to	#100592

On Friday, 18 December 2015 at 10:49:45 UTC+1, Steven D'Aprano wrote:
> On Fri, 18 Dec 2015 05:51 pm, Chris Angelico wrote:
> 
> > I would be inclined to ASCIIfy the apostrophes, dashes, and the
> > connection.py space that started this thread. People's names, URLs,
> > and demonstrative characters I'm more inclined to leave. Agreed?
> 
> 
> No.
> 

You know,

Narrow-minded ascii users/code developers are narrow-
minded ascii users/code developers and they will always
be and stay narrow-minded ascii users/code developers.

There is a very simple way to check it: They are
not even able to make their product, e.g. Python 3.5.1,
crash with an "é", U+00E9.

As a Unicode lover (and a little bit more), this is what
is very interesting with this language.

jmf

#100605

From	Terry Reedy <tjreedy@udel.edu>
Date	2015-12-18 16:56 -0500
Message-ID	<mailman.57.1450475780.30845.python-list@python.org>
In reply to	#100592

On 12/18/2015 4:49 AM, Steven D'Aprano wrote:
> On Fri, 18 Dec 2015 05:51 pm, Chris Angelico wrote:
>
>> I would be inclined to ASCIIfy the apostrophes, dashes, and the
>> connection.py space that started this thread. People's names, URLs,
>> and demonstrative characters I'm more inclined to leave. Agreed?
>
> No.

No in the sense of a blanket rule.  But in at least some cases, yes.  In 
idlelib/README.txt, ' somehow got changed to a latin-1 encoded 
slanted apostrophe (by Notepad++ I think) when I edited the file.  Since 
IDLE *assumes* that the file is ascii-only and does not specify an 
encoding, display failed on Serhiy's non-Windows system.  Issue 25905. 
I changed it back. Other accidents should be fixed.
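
To illustrate that failure mode: a minimal sketch, assuming the stray apostrophe was stored as the single byte 0x92, which is how Windows-1252 encodes U+2019. The actual bytes in the README are not given here, so the sample data and the helper try_decode are hypothetical.

```python
# Hypothetical bytes: an otherwise-ASCII line with one 0x92 byte in it.
data = b"IDLE\x92s README"


def try_decode(raw: bytes, encoding: str) -> str:
    """Return the decoded text, or a short failure note if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc.reason


# A reader that assumes ASCII (or UTF-8) cannot decode the byte at all;
# only a reader that guesses the Windows code page sees the apostrophe.
for enc in ("ascii", "utf-8", "cp1252"):
    print(enc, "->", try_decode(data, enc))
```

Strict latin-1 would decode the same byte without an error, but to a C1 control character rather than a visible apostrophe, so the result depends heavily on which encoding the reading program assumes.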

Guido also wants syntax chars and identifiers in stdlib code kept to 
ascii only for universal readability.  Maybe that will change someday.

-- 
Terry Jan Reedy

#100606

From	Marko Rauhamaa <marko@pacujo.net>
Date	2015-12-19 00:16 +0200
Message-ID	<8760zvjtbm.fsf@elektro.pacujo.net>
In reply to	#100605

Terry Reedy <tjreedy@udel.edu>:

> Guido also wants syntax chars and identifiers in stdlib code kept to
> ascii only for universal readability.

Readability, or writability? Most people would have no idea how to
produce the characters with their keyboards.


Marko

#100608

From	Chris Angelico <rosuav@gmail.com>
Date	2015-12-19 10:58 +1100
Message-ID	<mailman.58.1450483121.30845.python-list@python.org>
In reply to	#100606

On Sat, Dec 19, 2015 at 9:16 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Terry Reedy <tjreedy@udel.edu>:
>
>> Guido also wants syntax chars and identifiers in stdlib code kept to
>> ascii only for universal readability.
>
> Readability, or writability? Most people would have no idea how to
> produce the characters with their keyboards.

Most of the non-ASCII in the CPython source is people's names, which
will simply be copied and pasted from somewhere.

ChrisA

#100611

From	wxjmfauth@gmail.com
Date	2015-12-19 00:52 -0800
Message-ID	<70cab636-0323-4693-a6d4-882a0de53950@googlegroups.com>
In reply to	#100605

On Friday, 18 December 2015 at 22:56:44 UTC+1, Terry Reedy wrote:
> On 12/18/2015 4:49 AM, Steven D'Aprano wrote:
> > On Fri, 18 Dec 2015 05:51 pm, Chris Angelico wrote:
> >
> >> I would be inclined to ASCIIfy the apostrophes, dashes, and the
> >> connection.py space that started this thread. People's names, URLs,
> >> and demonstrative characters I'm more inclined to leave. Agreed?
> >
> > No.
> 
> No in the sense of a blanket rule.  But in at least some cases, yes.  In 
> idlelib/README.txt, ' somehow got changed to a latin-1 encoded 
> slanted apostrophe (by Notepad++ I think) when I edited the file.  Since 
> IDLE *assumes* that the file is ascii-only and does not specify an 
> encoding, display failed on Serhiy's non-Windows system.  Issue 25905. 
> I changed it back. Other accidents should be fixed.
> 
> Guido also wants syntax chars and identifiers in stdlib code kept to 
> ascii only for universal readability.  Maybe that will change someday.
> 
> -- 
> Terry Jan Reedy

A lot of hypocrisy.

If you are handling Unicode properly and correctly,
there are no problems.

Solving these problems is precisely the purpose of
Unicode, and it works.

It's not a surprise that you are not able to make "your" IDLE
crash.

The good news for users (app users or devs): all
products are working very well, except Python.

jmf
