

Groups > comp.lang.python > #74968 > unrolled thread

Unicode, stdout, and stderr

Started by: "Frank Millman" <frank@chagford.com>
First post: 2014-07-22 08:18 +0200
Last post: 2014-07-22 00:07 -0700
Articles: 8, 5 participants



Contents

  Unicode, stdout, and stderr "Frank Millman" <frank@chagford.com> - 2014-07-22 08:18 +0200
    Re: Unicode, stdout, and stderr wxjmfauth@gmail.com - 2014-07-21 23:54 -0700
    Re: Unicode, stdout, and stderr Steven D'Aprano <steve@pearwood.info> - 2014-07-22 06:58 +0000
      Re: Unicode, stdout, and stderr "Frank Millman" <frank@chagford.com> - 2014-07-22 09:15 +0200
      Re: Unicode, stdout, and stderr Lele Gaifax <lele@metapensiero.it> - 2014-07-22 09:36 +0200
      Re: Unicode, stdout, and stderr Akira Li <4kir4.1i@gmail.com> - 2014-07-23 05:01 +0400
        Re: Unicode, stdout, and stderr wxjmfauth@gmail.com - 2014-07-23 00:35 -0700
    Re: Unicode, stdout, and stderr wxjmfauth@gmail.com - 2014-07-22 00:07 -0700

#74968 — Unicode, stdout, and stderr

From: "Frank Millman" <frank@chagford.com>
Date: 2014-07-22 08:18 +0200
Subject: Unicode, stdout, and stderr
Message-ID: <mailman.12161.1406009902.18130.python-list@python.org>

Hi all

This is not important, but I would appreciate it if someone could explain 
the following, run from cmd.exe on Windows Server 2003 -

C:\>python
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = '\u2119'
>>> x  # this uses stderr
'\u2119'
>>> print(x)  # this uses stdout
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
>>>

It seems that there is a difference between writing to stdout and writing to 
stderr. My questions are -

1. What is the difference?

2. Is there an easy way to get stdout to behave the same as stderr?

Thanks

Frank Millman




#74973

From: wxjmfauth@gmail.com
Date: 2014-07-21 23:54 -0700
Message-ID: <a2b6ec66-f323-4954-93c9-54417c7e5d41@googlegroups.com>
In reply to: #74968

On Tuesday, 22 July 2014 08:18:08 UTC+2, Frank Millman wrote:
> Hi all
>
> This is not important, but I would appreciate it if someone could explain
> the following, run from cmd.exe on Windows Server 2003 -
>
> C:\>python
> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x = '\u2119'
> >>> x  # this uses stderr
> '\u2119'
> >>> print(x)  # this uses stdout
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
> >>>
>
> It seems that there is a difference between writing to stdout and writing to
> stderr. My questions are -
>
> 1. What is the difference?
>
> 2. Is there an easy way to get stdout to behave the same as stderr?
>
%%%%%%%%%%


This is an example of what I explained in my
last message in the "Python 3 is killing Python" thread.

Quote of my comment:

"Generally speaking, this is a perpetual annoyance
(to put it politely) in Python. Python is always attempting
to find a solution for the "Python user", to enforce a
coding usage instead of letting the user/programmer
do the task correctly.

I'm not alone in thinking this, and I have seen
people complain about it many times."

----

Something different.

>>> x = '\u2119'
>>> x  # this uses stderr

This is not stderr, but stdout (I doubt you redirected it).
What you see is the *representation* of x.

>>> print(x)  # this uses stdout

Correct. This is supposed to print, that is display,
"x as a literal" (I cannot find a proper name for it).

Solution:
Work properly. Understand the character-encoding ecosystem as a whole,
correctly.

In short: *encode*

I have explained this many times (including on the wx list).

jmf



#74974

From: Steven D'Aprano <steve@pearwood.info>
Date: 2014-07-22 06:58 +0000
Message-ID: <53ce0b96$0$29897$c3e8da3$5496439d@news.astraweb.com>
In reply to: #74968

On Tue, 22 Jul 2014 08:18:08 +0200, Frank Millman wrote:

> Hi all
> 
> This is not important, but I would appreciate it if someone could
> explain the following, run from cmd.exe on Windows Server 2003 -
> 
> C:\>python
> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x = '\u2119'
> >>> x  # this uses stderr
> '\u2119'


What makes you think it uses stderr? To the best of my knowledge, it uses 
stdout.


> >>> print(x)  # this uses stdout
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in
> position 0: character maps to <undefined>

I think your problem is that print tries to encode the string to your 
terminal's encoding, which appears to be CP-437 (the "MS-DOS" code page). Can 
you convince cmd.exe to use UTF-8? That should fix the problem. (Although 
apparently Windows' handling of UTF-8 is buggy, so it will create many 
wonderful new problems, yay!)

http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how

http://stackoverflow.com/questions/14109024/how-to-make-unicode-charset-in-cmd-exe-by-default

http://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
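
The failure described above can be reproduced on any platform without touching the console, because it is purely an encoding question. A minimal sketch, assuming only that the terminal encoding is cp437 as the traceback suggests (the variable names are illustrative):

```python
# U+2119 (DOUBLE-STRUCK CAPITAL P) has no slot in code page 437,
# so a strict encode raises the same error that print() hit.
x = "\u2119"

try:
    x.encode("cp437")  # errors="strict" is the default
    strict_ok = True
except UnicodeEncodeError as exc:
    strict_ok = False
    reason = exc.reason  # short description supplied by the codec

# The same character encodes without trouble as UTF-8.
utf8_bytes = x.encode("utf-8")
print(strict_ok, utf8_bytes)
```

Nothing here depends on Windows: the error comes from the codec, so changing the console's code page only helps if Python's stream encoding changes with it.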



> It seems that there is a difference between writing to stdout and
> writing to stderr. 

I would be surprised if that were the case, but I don't have a Windows 
box to test it. Try this:


import sys
print(x, file=sys.stderr)  # I expect this will fail
print(repr(x), file=sys.stdout)  # I expect this will succeed



-- 
Steven



#74978

From: "Frank Millman" <frank@chagford.com>
Date: 2014-07-22 09:15 +0200
Message-ID: <mailman.12166.1406013356.18130.python-list@python.org>
In reply to: #74974

"Steven D'Aprano" <steve@pearwood.info> wrote in message 
news:53ce0b96$0$29897$c3e8da3$5496439d@news.astraweb.com...
> On Tue, 22 Jul 2014 08:18:08 +0200, Frank Millman wrote:
>
>> Hi all
>>
>> This is not important, but I would appreciate it if someone could
>> explain the following, run from cmd.exe on Windows Server 2003 -
>>
>> C:\>python
>> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> x = '\u2119'
>> >>> x  # this uses stderr
>> '\u2119'
>
>
> What makes you think it uses stderr? To the best of my knowledge, it uses
> stdout.
>

This is from the docs on sys.stdxxx

sys.stdin
sys.stdout
sys.stderr

File objects used by the interpreter for standard input, output and errors:
  - stdin is used for all interactive input (including calls to input());
  - stdout is used for the output of print() and expression statements and
    for the prompts of input();
  - The interpreter's own prompts and its error messages go to stderr.
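
One way to check the documented behaviour for expression statements is to call the display hook by hand against an in-memory stream. The sketch below assumes a cp437 stream with strict error handling, as in the session above; `raw`, `fake_stdout` and `captured` are illustrative names. `sys.__displayhook__` is the interpreter's default hook, which the `sys.displayhook` documentation describes as retrying with the `backslashreplace` handler when the repr cannot be encoded.

```python
import io
import sys

# An in-memory stand-in for a cp437 console with strict error handling.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="cp437", errors="strict", newline="\n")

real_stdout = sys.stdout
sys.stdout = fake_stdout
try:
    # What the interactive prompt does with the value of an expression.
    sys.__displayhook__("\u2119")
    fake_stdout.flush()
finally:
    sys.stdout = real_stdout

captured = raw.getvalue()
print(captured)
```

If this behaves as documented, the captured bytes hold the escaped form and no exception is raised, which would account for the echo at the prompt succeeding on stdout while print() fails.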

>
>> It seems that there is a difference between writing to stdout and
>> writing to stderr.
>
> I would be surprised if that were the case, but I don't have a Windows
> box to test it. Try this:
>
>
> import sys
> print(x, file=sys.stderr)  # I expect this will fail

It does not fail.

> print(repr(x), file=sys.stdout)  # I expect this will succeed
>

It fails.

The clue that led me to stderr is that the logging module displays unicode 
strings to the console without a problem. I delved into the source code, and 
found that it writes to stderr. When I changed mine to stderr, it also 
worked.

Frank




#74981

From: Lele Gaifax <lele@metapensiero.it>
Date: 2014-07-22 09:36 +0200
Message-ID: <mailman.12169.1406014585.18130.python-list@python.org>
In reply to: #74974

"Frank Millman" <frank@chagford.com> writes:

> "Steven D'Aprano" <steve@pearwood.info> wrote in message 
> news:53ce0b96$0$29897$c3e8da3$5496439d@news.astraweb.com...
>> I would be surprised if that were the case, but I don't have a Windows
>> box to test it. Try this:
>>
>>
>> import sys
>> print(x, file=sys.stderr)  # I expect this will fail
>
> It does not fail.

Indeed it does not, but for some reason it actually prints the
repr() of the string.

>> print(repr(x), file=sys.stdout)  # I expect this will succeed
>>
>
> It fails.

This surprises me as well, why does it fail here?

>>> repr('\u2119')
"'\u2119'"
>>> print(repr('\u2119'))
Traceback ... UnicodeEncodeError ...

On GNU/Linux, I get:

>>> repr('\u2119')
"'ℙ'"
>>> print(repr('\u2119'))
'ℙ'

Uhm, it must be related to the fact that on Py3 the repr() of something
is a unicode object too, so the output machinery tries to encode it to
the output encoding... Still, I don't see the difference between stdout and
stderr (both are cp437, according to sys.xxx.encoding).
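
The encoding is only half of what a text stream uses to turn str into bytes; the other half is its error handler, exposed as the `errors` attribute. A small sketch for comparing streams, where `codec_settings` is a hypothetical helper and the values printed for the real streams depend on the platform and Python version:

```python
import io
import sys

def codec_settings(stream):
    """Return (encoding, errors) for a text stream, or None where missing."""
    return (getattr(stream, "encoding", None), getattr(stream, "errors", None))

# Two in-memory streams that share an encoding but differ in error handling.
strict = io.TextIOWrapper(io.BytesIO(), encoding="cp437", errors="strict")
lenient = io.TextIOWrapper(io.BytesIO(), encoding="cp437", errors="backslashreplace")
print(codec_settings(strict), codec_settings(lenient))

# The real streams; the values depend on platform and Python version.
print(codec_settings(sys.stdout), codec_settings(sys.stderr))
```

Two streams can therefore report the same encoding and still treat an unencodable character differently.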

ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it  |                 -- Fortunato Depero, 1929.



#75045

From: Akira Li <4kir4.1i@gmail.com>
Date: 2014-07-23 05:01 +0400
Message-ID: <mailman.12209.1406077282.18130.python-list@python.org>
In reply to: #74974

"Frank Millman" <frank@chagford.com> writes:

> "Steven D'Aprano" <steve@pearwood.info> wrote in message 
> news:53ce0b96$0$29897$c3e8da3$5496439d@news.astraweb.com...
>> On Tue, 22 Jul 2014 08:18:08 +0200, Frank Millman wrote:
>>
>>> This is not important, but I would appreciate it if someone could
>>> explain the following, run from cmd.exe on Windows Server 2003 -
>>>
>>> C:\>python
>>> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> x = '\u2119'
>>> >>> x  # this uses stderr
>>> '\u2119'
>>

>>> It seems that there is a difference between writing to stdout and
>>> writing to stderr.
>>
>> I would be surprised if that were the case, but I don't have a Windows
>> box to test it. Try this:
>>
>>
>> import sys
>> print(x, file=sys.stderr)  # I expect this will fail
>
> It does not fail.
>> print(repr(x), file=sys.stdout)  # I expect this will succeed
>
> It fails.

Check the sys.stderr.errors attribute. Try:

    >>> import sys
    >>> x = '\u2119'
    >>> x.encode(sys.stderr.encoding, sys.stderr.errors)  # succeeds
    >>> x.encode(sys.stdout.encoding, sys.stdout.errors)  # fails

sys.stderr uses the 'backslashreplace' error handler, which is why you see
\u2119 instead of ℙ.
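
As for making stdout behave like stderr, one option is to give the output stream the same error handler. The following is a sketch rather than a tested recipe for cmd.exe: `lenient_writer` is a hypothetical helper, and an in-memory `BytesIO` stands in for the console so that the example runs anywhere.

```python
import io

def lenient_writer(binary_stream, encoding="cp437"):
    """Wrap a binary stream so unencodable characters are escaped, not fatal."""
    return io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="backslashreplace",  # the handler sys.stderr uses
        newline="\n",
        line_buffering=True,
    )

raw = io.BytesIO()            # stand-in for the console's byte stream
out = lenient_writer(raw)
print("a\u2119z", file=out)   # no UnicodeEncodeError this time
out.flush()

result = raw.getvalue()
print(result)
```

In a real session the binary stream would be `sys.stdout.buffer`. On Python 3.7 and later, `sys.stdout.reconfigure(errors="backslashreplace")` changes the handler in place; that method did not exist in the 3.4 release discussed in this thread.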

On Linux with utf-8 locale:

  >>> print('\u2119')
  ℙ
  >>> print(repr('\u2119'))
  'ℙ'
  >>> print(ascii('\u2119'))
  '\u2119'
  >>> '\u2119'
  'ℙ'
  >>> repr('\u2119')
  "'ℙ'"
  >>> ascii('\u2119')
  "'\\u2119'"

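The difference between repr() and ascii() shown in that session does not depend on the console, so it can be checked directly. A short sketch (the variable names are illustrative):

```python
x = "\u2119"

as_repr = repr(x)    # keeps printable non-ASCII characters as they are
as_ascii = ascii(x)  # escapes every non-ASCII character

# The ascii() form contains only ASCII characters, so an
# ASCII-compatible code page such as cp437 can encode it.
encoded = as_ascii.encode("cp437")
print(encoded)
```

Printing `ascii(x)` instead of `x` is therefore a way to get output on a console whose code page lacks the character, at the cost of seeing the escape rather than the glyph.
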
On Windows, try https://pypi.python.org/pypi/win_unicode_console

  C:\> pip install win-unicode-console
  C:\> py -i -m run

It is alpha, but your feedback may help improve it:
https://github.com/Drekin/win-unicode-console/issues

You could also use a GUI console, e.g.:

  C:\> py -3 -m idlelib

Or http://ipython.org/notebook.html

There are many other IDEs for Python e.g.,

http://stackoverflow.com/q/81584/what-ide-to-use-for-python


--
Akira



#75065

From: wxjmfauth@gmail.com
Date: 2014-07-23 00:35 -0700
Message-ID: <067c5d5c-7306-464b-bdd1-fa457ff17960@googlegroups.com>
In reply to: #75045

On Wednesday, 23 July 2014 03:01:08 UTC+2, Akira Li wrote:
> 
> 

 
win_unicode_console:

I tested it.

---------------

@ Terry

tab, tabstop, EM Quad: not good

-----------

jmf



#74977

From: wxjmfauth@gmail.com
Date: 2014-07-22 00:07 -0700
Message-ID: <c580232e-5b2d-45b7-a957-fab088ec5fa2@googlegroups.com>
In reply to: #74968

On Tuesday, 22 July 2014 08:18:08 UTC+2, Frank Millman wrote:
> Hi all
>
> This is not important, but I would appreciate it if someone could explain
> the following, run from cmd.exe on Windows Server 2003 -
>
> C:\>python
> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x = '\u2119'
> >>> x  # this uses stderr
> '\u2119'
> >>> print(x)  # this uses stdout
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
> >>>
>
> It seems that there is a difference between writing to stdout and writing to
> stderr. My questions are -
>
> 1. What is the difference?
>
> 2. Is there an easy way to get stdout to behave the same as stderr?
>
%%%%%%%%%%%

Again, from my "magic" interactive interpreter.

>>> x = 'a\u2119z'
>>> sys.stdout.encoding
'<unicode>'
>>> x
'aℙz'
>>> sys.stdout.encoding = 'cp437'
>>> print(x)
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
  File "D:\jm\jmpy\eta\eta41beta1\etastdio.py", line 158, in write
    s = s.encode(self.pencoding).decode('cp1252')
  File "c:\python32\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 1:
 character maps to <undefined>
>>> print(x.encode(sys.stdout.encoding, 'replace'))
'a?z'
>>> # voilà, no error
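
The 'replace' handler used in that last step can be tried in a stock Python 3 interpreter too. The session above comes from a custom interpreter; in standard Python 3, str.encode() returns bytes, so the sketch below decodes again before printing. The encoding is hard-coded to cp437 as an assumption.

```python
x = "a\u2119z"

# 'replace' substitutes '?' for anything the code page cannot represent.
replaced = x.encode("cp437", "replace")

# Decode back to str so print() shows text rather than a bytes literal.
text = replaced.decode("cp437")
print(text)
```

Unlike 'backslashreplace', this handler is lossy: the original character cannot be recovered from the question mark.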

jmf


