Groups > comp.lang.python > #74987 > unrolled thread

Re: Unicode, stdout, and stderr

Started by: Peter Otten <__peter__@web.de>
First post: 2014-07-22 11:09 +0200
Last post: 2014-07-22 02:33 -0700
Articles: 2 (2 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Unicode, stdout, and stderr Peter Otten <__peter__@web.de> - 2014-07-22 11:09 +0200
    Re: Unicode, stdout, and stderr wxjmfauth@gmail.com - 2014-07-22 02:33 -0700

#74987 — Re: Unicode, stdout, and stderr

From: Peter Otten <__peter__@web.de>
Date: 2014-07-22 11:09 +0200
Subject: Re: Unicode, stdout, and stderr
Message-ID: <mailman.12175.1406020191.18130.python-list@python.org>

Frank Millman wrote:

> 
> "Peter Otten" <__peter__@web.de> wrote in message
> news:lql3am$2q7$1@ger.gmane.org...
>> Frank Millman wrote:
>>
>>> Hi all
>>>
>>> This is not important, but I would appreciate it if someone could
>>> explain the following, run from cmd.exe on Windows Server 2003 -
>>>
>>> C:\>python
>>> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> x = '\u2119'
>>>>>> x  # this uses stderr
>>> '\u2119'
>>
>> No, both print to stdout, but just
>>
>>>>> x
>>
>> is passed to the display hook of the interactive interpreter. This applies
>> repr() and then tries to print the result. If this fails it makes
>> another effort, roughly (the actual code is written in C)
>>
>> sys.stdout.buffer.write(repr(x).encode(
>>    sys.stdout.encoding, "backslashreplace"))
>>
>>
> 
> Thanks, Peter. Very interesting.
> 
> Out of interest, does the same thing happen when writing to sys.stderr?

If you are asking about the fallback mechanism, that is specific to 
sys.displayhook in the interactive interpreter. 
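
For illustration, the fallback can be sketched in pure Python, loosely following the pseudo-code given in the sys.displayhook documentation. The name display_with_fallback and the explicit stream parameter are assumptions made so the sketch can be tried on any text stream; the real hook is written in C, always uses sys.stdout, and also maintains builtins._, which is left out here.

```python
import io


def display_with_fallback(value, stream):
    """Approximate the interactive display hook for one value (hypothetical helper)."""
    if value is None:
        return
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Second attempt: escape whatever the stream's encoding cannot represent.
        data = text.encode(stream.encoding, "backslashreplace")
        if hasattr(stream, "buffer"):
            stream.buffer.write(data)
        else:
            stream.write(data.decode(stream.encoding, "strict"))
    stream.write("\n")


# Demo on an in-memory stream that behaves like a strict cp437 console.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp437", errors="strict", newline="\n")
display_with_fallback("\u2119", console)
console.flush()
print(raw.getvalue())
```

The demo stream stands in for a cp437 console; the bytes it collects contain the escaped form of the character instead of a traceback.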

But stdout and stderr do handle errors differently:

>>> import sys
>>> sys.stdout.errors
'strict'
>>> sys.stderr.errors
'backslashreplace'

So a code point written to stdout that cannot be encoded with
sys.stdout.encoding raises a UnicodeEncodeError, while one written to stderr
that cannot be encoded with sys.stderr.encoding is replaced by a backslash
escape such as \u2119.
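
The difference can be reproduced without touching the real console. The following is a minimal sketch, assuming cp437 as the target encoding (as in the traceback in this post); encode_through is a hypothetical helper that pushes text through an io.TextIOWrapper with a given error handler and returns the bytes that come out.

```python
import io


def encode_through(text, errors, encoding="cp437"):
    """Write text through a TextIOWrapper and return the resulting bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Like stderr: the unencodable character is escaped.
print(encode_through("\u2119", "backslashreplace"))

# Like stdout: the same character raises.
try:
    encode_through("\u2119", "strict")
except UnicodeEncodeError as exc:
    print("strict handler raised:", exc.reason)
```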

Another way to make stdout more forgiving:

>>> import sys
>>> print("\u2119")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/encodings/cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
>>> sys.stdout = open(1, mode="w", errors="xmlcharrefreplace",
...                   encoding=sys.stdout.encoding, closefd=False)
>>> print("\u2119")
&#8473;
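
On Python 3.7 and later, the same effect should be available without replacing the stream object, through TextIOWrapper.reconfigure(). A minimal sketch, where make_forgiving is a hypothetical helper name and the fallback branch simply repeats the open() call from the session above; the demo uses an in-memory cp437 stream as a stand-in for the console.

```python
import io


def make_forgiving(stream, errors="xmlcharrefreplace"):
    """Return a text stream that substitutes unencodable characters (hypothetical helper)."""
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: change the error handler in place.
        stream.reconfigure(errors=errors)
        return stream
    # Older versions: wrap the same file descriptor again, as in the session above.
    return open(stream.fileno(), mode="w", encoding=stream.encoding,
                errors=errors, closefd=False)


raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp437", errors="strict", newline="\n")
console = make_forgiving(console)
print("\u2119", file=console)
console.flush()
print(raw.getvalue())
```

For the real console this would be used as sys.stdout = make_forgiving(sys.stdout).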



#74990

From: wxjmfauth@gmail.com
Date: 2014-07-22 02:33 -0700
Message-ID: <4b57b362-93b9-4654-b366-5ce635475aca@googlegroups.com>
In reply to: #74987

On Tuesday, 22 July 2014 11:09:37 UTC+2, Peter Otten wrote:
> Another way to make stdout more forgiving:
>
> >>> import sys
> >>> print("\u2119")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.4/encodings/cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
> >>> sys.stdout = open(1, mode="w", errors="xmlcharrefreplace",
> ...                   encoding=sys.stdout.encoding, closefd=False)
> >>> print("\u2119")
> &#8473;

=====

or in a similar way

>>> print(ascii('abcéœ€\u2119'))
'abc\xe9\u0153\u20ac\u2119'
>>> sys.stdout.write(ascii('abcéœ€\u2119') + '\n')
'abc\xe9\u0153\u20ac\u2119'
>>> sys.stderr.write(ascii('abcéœ€\u2119') + '\n')
'abc\xe9\u0153\u20ac\u2119'
>>> 
>>> sys.stdout.write((ascii('abcéœ€\u2119').strip("'") + '\n'))
abc\xe9\u0153\u20ac\u2119
>>>
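
One caveat with this approach: ascii() returns a repr-style string, so it adds quotes (not always single quotes) and also doubles backslashes already present in the text. A sketch of an alternative that escapes only the non-ASCII characters, using str.encode() with the backslashreplace handler; to_ascii_escaped is a hypothetical name.

```python
def to_ascii_escaped(text):
    """Escape non-ASCII characters and leave everything else untouched."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


sample = "abc\u00e9\u0153\u20ac\u2119"

print(to_ascii_escaped(sample))   # no surrounding quotes
print(ascii(sample))              # same escapes, wrapped in quotes

# ascii() switches to double quotes when the text contains a single quote,
# so strip("'") would not remove them; the encode-based helper adds none.
print(ascii("it's"), to_ascii_escaped("it's"))
```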

jmf
