| Started by | Peter Otten <__peter__@web.de> |
|---|---|
| First post | 2014-07-22 11:09 +0200 |
| Last post | 2014-07-22 02:33 -0700 |
| Articles | 2 — 2 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Unicode, stdout, and stderr Peter Otten <__peter__@web.de> - 2014-07-22 11:09 +0200
Re: Unicode, stdout, and stderr wxjmfauth@gmail.com - 2014-07-22 02:33 -0700
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-07-22 11:09 +0200 |
| Subject | Re: Unicode, stdout, and stderr |
| Message-ID | <mailman.12175.1406020191.18130.python-list@python.org> |
Frank Millman wrote:
>
> "Peter Otten" <__peter__@web.de> wrote in message
> news:lql3am$2q7$1@ger.gmane.org...
>> Frank Millman wrote:
>>
>>> Hi all
>>>
>>> This is not important, but I would appreciate it if someone could
>>> explain the following, run from cmd.exe on Windows Server 2003 -
>>>
>>> C:\>python
>>> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> x = '\u2119'
>>> >>> x  # this uses stderr
>>> '\u2119'
>>
>> No, both print to stdout, but just
>>
>> >>> x
>>
>> is passed to the display hook of the interactive interpreter. This applies
>> repr() and then tries to print the result. If this fails it makes
>> another effort, roughly (the actual code is written in C)
>>
>> sys.stdout.buffer.write(repr(x).encode(
>>     sys.stdout.encoding, "backslashreplace"))
>>
>>
>
> Thanks, Peter. Very interesting.
>
> Out of interest, does the same thing happen when writing to sys.stderr?
If you are asking about the fallback mechanism, that is specific to
sys.displayhook in the interactive interpreter.
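
The fallback described above can be sketched in pure Python. This is only an illustration, not the interpreter's C code: `display_with_fallback` and `demo` are hypothetical names, and an in-memory cp437 stream stands in for the Windows console.

```python
import io

def display_with_fallback(value, stream):
    """Write repr(value) to stream; escape what the stream cannot encode."""
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Second attempt: encode by hand with a forgiving error handler
        # and hand the bytes to the underlying binary buffer.
        stream.flush()
        data = text.encode(stream.encoding, "backslashreplace")
        stream.buffer.write(data)
    stream.write("\n")
    stream.flush()

def demo():
    # Stand-in for a console whose encoding cannot represent U+2119.
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="cp437", errors="strict",
                              newline="\n")
    display_with_fallback("\u2119", stream)
    return raw.getvalue()
```

Calling `demo()` returns the bytes that would have reached the console: the escaped repr followed by a newline, with no exception raised.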
But stdout and stderr do handle errors differently:
>>> import sys
>>> sys.stdout.errors
'strict'
>>> sys.stderr.errors
'backslashreplace'
So a codepoint written to stdout that cannot be encoded with stdout.encoding
raises an error while a codepoint written to stderr that cannot be encoded
with stderr.encoding is escaped.
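
To see the two error handlers side by side without touching the real streams, here is a small sketch. It assumes an in-memory cp437 text stream as a stand-in for the console; `write_bytes` and `strict_fails` are hypothetical helper names.

```python
import io

def write_bytes(text, errors, encoding="cp437"):
    """Push text through a text stream and return the bytes it produced."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors,
                              newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()

def strict_fails(text):
    """Return True if a 'strict' stream (like stdout here) rejects text."""
    try:
        write_bytes(text, "strict")
    except UnicodeEncodeError:
        return True
    return False
```

With these helpers, `strict_fails("\u2119")` is true, while `write_bytes("\u2119", "backslashreplace")` yields the escape sequence as plain ASCII bytes, which mirrors the stdout and stderr behaviour shown above.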
Another way to make stdout more forgiving:
>>> import sys
>>> print("\u2119")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/encodings/cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in
position 0: character maps to <undefined>
>>> sys.stdout = open(1, mode="w", errors="xmlcharrefreplace",
...                   encoding=sys.stdout.encoding, closefd=False)
>>> print("\u2119")
&#8473;
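
On Python 3.7 and later, which postdates this thread, text streams also offer a `reconfigure()` method that changes the error handler in place instead of replacing `sys.stdout`. A minimal sketch under that assumption; `make_forgiving` and `demo` are hypothetical names, and the cp437 stream is again an in-memory stand-in.

```python
import io

def make_forgiving(stream, errors="backslashreplace"):
    """Change a text stream's error handler in place; True on success."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Older Python, or a replacement stream without reconfigure().
        return False
    reconfigure(errors=errors)
    return True

def demo():
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="cp437", errors="strict",
                              newline="\n")
    make_forgiving(stream, errors="xmlcharrefreplace")
    stream.write("\u2119\n")
    stream.flush()
    return raw.getvalue()
```

In a real session the call would be `make_forgiving(sys.stdout)` after `import sys`. The helper returns `False` rather than raising when the stream has no `reconfigure()`, so the `open(1, ...)` approach above remains the fallback.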
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-07-22 02:33 -0700 |
| Message-ID | <4b57b362-93b9-4654-b366-5ce635475aca@googlegroups.com> |
| In reply to | #74987 |
On Tuesday, 22 July 2014 11:09:37 UTC+2, Peter Otten wrote:
> Another way to make stdout more forgiving:
>
> >>> import sys
> >>> print("\u2119")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.4/encodings/cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in
> position 0: character maps to <undefined>
> >>> sys.stdout = open(1, mode="w", errors="xmlcharrefreplace",
> ...                   encoding=sys.stdout.encoding, closefd=False)
> >>> print("\u2119")
> &#8473;
=====
Or, in a similar way, with ascii():
>>> print(ascii('abcéœ€\u2119'))
'abc\xe9\u0153\u20ac\u2119'
>>> sys.stdout.write(ascii('abcéœ€\u2119') + '\n')
'abc\xe9\u0153\u20ac\u2119'
>>> sys.stderr.write(ascii('abcéœ€\u2119') + '\n')
'abc\xe9\u0153\u20ac\u2119'
>>>
>>> sys.stdout.write((ascii('abcéœ€\u2119').strip("'") + '\n'))
abc\xe9\u0153\u20ac\u2119
>>>
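
A variation on the last line, offered as a sketch rather than as part of the session above: encoding to ASCII with the `backslashreplace` handler and decoding again gives the escaped text without the surrounding quotes, so no `strip("'")` is needed. `escape_non_ascii` and `SAMPLE` are hypothetical names.

```python
def escape_non_ascii(text):
    """Return text with every non-ASCII character written as an escape."""
    return text.encode("ascii", "backslashreplace").decode("ascii")

# Same characters as in the session above, spelled with escapes so the
# source file itself stays ASCII-only.
SAMPLE = "abc\u00e9\u0153\u20ac\u2119"
```

Unlike `ascii()`, this does not add quotes or double any backslashes already present in the string, so it suits output meant for reading rather than for pasting back as a Python literal.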
jmf