


Re: Should stdlib files contain 'narrow non breaking space' U+202F?

Started by: Mark Lawrence <breamoreboy@yahoo.co.uk>
First post: 2015-12-18 00:02 +0000
Last post: 2015-12-18 04:35 -0600
Articles: 3 (3 participants)


This discussion began before the indexed window, so earlier articles are not shown. The article listed under "Started by" is the oldest one visible, not the original post.


Contents

  Re: Should stdlib files contain 'narrow non breaking space' U+202F? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-12-18 00:02 +0000
    Re: Should stdlib files contain 'narrow non breaking space' U+202F? Steven D'Aprano <steve@pearwood.info> - 2015-12-18 20:51 +1100
      Re: Should stdlib files contain 'narrow non breaking space' U+202F? eryk sun <eryksun@gmail.com> - 2015-12-18 04:35 -0600

#100576 — Re: Should stdlib files contain 'narrow non breaking space' U+202F?

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2015-12-18 00:02 +0000
Subject: Re: Should stdlib files contain 'narrow non breaking space' U+202F?
Message-ID: <mailman.41.1450396996.30845.python-list@python.org>

On 17/12/2015 23:18, Chris Angelico wrote:
> On Fri, Dec 18, 2015 at 10:05 AM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
>> The culprit character is hidden between "Issue #" and "20540" at line 400 of
>> C:\Python35\Lib\multiprocessing\connection.py.
>> https://bugs.python.org/issue20540 and
>> https://hg.python.org/cpython/rev/125c24f47f3c refers.
>>
>> I'm asking as I've just spent 30 minutes tracking down why my debug code
>> would bomb when running on 3.5, but not 2.7 or 3.2 through 3.4.
>
> I'm curious as to why this character should bomb your code at all -
> it's in a comment. Is it that your program was expecting ASCII, or is
> it something about that particular character?
>

I'm playing with ASTs and using the stdlib as test data.  I was trying 
to avoid going down this particular route, but...
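To make the first half of the problem concrete: the character is U+202F NARROW NO-BREAK SPACE, and because it sits inside a comment it is harmless to the parser. The sketch below uses a made-up sample string modelled on the comment described above (it is not copied from connection.py, whose line numbering may differ between releases); it lists every non-ASCII character with its position and Unicode name, then shows that ast.parse accepts the source anyway. The names SAMPLE and find_non_ascii are hypothetical, chosen for illustration.

```python
import ast
import unicodedata

# Hypothetical sample modelled on the comment described in the thread;
# it is NOT copied from multiprocessing/connection.py.
SAMPLE = (
    "def f():\n"
    "    # See Issue #\u202f20540 for details.\n"
    "    return 1\n"
)


def find_non_ascii(source):
    """Yield (line, column, character, Unicode name) for each non-ASCII char."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ord(ch) > 127:
                yield lineno, col, ch, unicodedata.name(ch, "<unnamed>")


hits = list(find_non_ascii(SAMPLE))
for lineno, col, ch, name in hits:
    print(f"line {lineno}, col {col}: U+{ord(ch):04X} {name}")
# prints "line 2, col 17: U+202F NARROW NO-BREAK SPACE"

# The character lives in a comment, so the source still parses cleanly.
tree = ast.parse(SAMPLE)
print(type(tree.body[0]).__name__)  # prints "FunctionDef"
```

In other words, reading and parsing the file is not where things go wrong; the failure only appears once the text is written out again.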

A lot of it is down to Windows, as the actual complaint is:-

     six.print_(source)
   File "C:\Python35\lib\encodings\cp1252.py", line 19, in encode
     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u202f' in position 407: character maps to <undefined>
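The traceback comes down to encoding a str with the cp1252 codec, which has no byte for U+202F. The sketch below reproduces the same exception without six or a console, using a short hypothetical fragment of the comment text. It also shows two standard ways around it: encoding as UTF-8, or keeping cp1252 and using the built-in "backslashreplace" error handler. Neither is necessarily what Mark's code ended up doing.

```python
text = "Issue #\u202f20540"  # hypothetical fragment of the comment in question

# Reproduce the failure from the traceback: cp1252 has no byte for U+202F.
error = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error = exc
    print(error)

# UTF-8 can represent every code point, so this always succeeds.
utf8_bytes = text.encode("utf-8")

# Or keep cp1252 and substitute an escape sequence for unmappable characters.
escaped = text.encode("cp1252", errors="backslashreplace")

print(utf8_bytes)  # b'Issue #\xe2\x80\xaf20540'
print(escaped)     # b'Issue #\\u202f20540'
```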

And as usual I've answered my own question.  The cp1252 encoding shows up 
even though my console is set to code page 65001, *BUT* I'm piping the 
output to a file as it's so much faster.  Without the pipe the code takes 
five minutes to run, but everything runs to completion.

I suppose the original question still holds, but I for one certainly 
won't be losing any sleep over it.  Talking of which, good night all :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



#100593

From: Steven D'Aprano <steve@pearwood.info>
Date: 2015-12-18 20:51 +1100
Message-ID: <5673d713$0$1612$c3e8da3$5496439d@news.astraweb.com>
In reply to: #100576

On Fri, 18 Dec 2015 11:02 am, Mark Lawrence wrote:

> A lot of it is down to Windows, as the actual complaint is:-
> 
>      six.print_(source)

Looks like a bug in six to me.

See, without Unicode comments in the std lib, you never would have found
that bug.


-- 
Steven



#100596

From: eryk sun <eryksun@gmail.com>
Date: 2015-12-18 04:35 -0600
Message-ID: <mailman.53.1450434985.30845.python-list@python.org>
In reply to: #100593

On Fri, Dec 18, 2015 at 3:51 AM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Fri, 18 Dec 2015 11:02 am, Mark Lawrence wrote:
>
>> A lot of it is down to Windows, as the actual complaint is:-
>>
>>      six.print_(source)
>
> Looks like a bug in six to me.
>
> See, without Unicode comments in the std lib, you never would have found
> that bug.

I think Mark said he's piping the output. In that case Python doesn't
look at the current console/terminal encoding; instead it defaults to
the platform's preferred encoding. On Windows that's the system ANSI
encoding, such as codepage 1252. You can set PYTHONIOENCODING=UTF-8 to
override this for stdin, stdout, and stderr.
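The effect described here can be checked with a child interpreter whose stdout is a pipe rather than a console. The sketch below assumes a Python 3.5+ interpreter at sys.executable (subprocess.run first appeared in 3.5). The child script in CHILD and the helper run_child are hypothetical, made up for illustration. The first run uses the platform default; the second sets PYTHONIOENCODING=UTF-8 in the child's environment, so U+202F arrives as its three UTF-8 bytes.

```python
import os
import subprocess
import sys

# Hypothetical child script: report the stdout encoding, then print U+202F.
# The backslash is doubled so the character never appears on the command line.
CHILD = "import sys; print(sys.stdout.encoding); print('\\u202f')"


def run_child(extra_env=None):
    """Run CHILD with its stdout connected to a pipe, not a console."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from the platform default
    env.update(extra_env or {})
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )


# Platform default: on Windows with an ANSI code page such as 1252 this may
# exit non-zero with UnicodeEncodeError; under a UTF-8 locale it succeeds.
default = run_child()
print("default exit code:", default.returncode)

# Override the stdio encoding for the child, as suggested above.
forced = run_child({"PYTHONIOENCODING": "UTF-8"})
encoding_line, char_line = forced.stdout.splitlines()
print(encoding_line, char_line)  # the second value is b'\xe2\x80\xaf'
```

Setting the variable per process, as here, leaves the interactive console untouched while still making piped output to a file safe.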




