| Started by | "Frank Millman" <frank@chagford.com> |
|---|---|
| First post | 2014-07-22 08:18 +0200 |
| Last post | 2014-07-22 00:07 -0700 |
| Articles | 8 — 5 participants |
Unicode, stdout, and stderr "Frank Millman" <frank@chagford.com> - 2014-07-22 08:18 +0200
Re: Unicode, stdout, and stderr wxjmfauth@gmail.com - 2014-07-21 23:54 -0700
Re: Unicode, stdout, and stderr Steven D'Aprano <steve@pearwood.info> - 2014-07-22 06:58 +0000
Re: Unicode, stdout, and stderr "Frank Millman" <frank@chagford.com> - 2014-07-22 09:15 +0200
Re: Unicode, stdout, and stderr Lele Gaifax <lele@metapensiero.it> - 2014-07-22 09:36 +0200
Re: Unicode, stdout, and stderr Akira Li <4kir4.1i@gmail.com> - 2014-07-23 05:01 +0400
Re: Unicode, stdout, and stderr wxjmfauth@gmail.com - 2014-07-23 00:35 -0700
Re: Unicode, stdout, and stderr wxjmfauth@gmail.com - 2014-07-22 00:07 -0700
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2014-07-22 08:18 +0200 |
| Subject | Unicode, stdout, and stderr |
| Message-ID | <mailman.12161.1406009902.18130.python-list@python.org> |
Hi all
This is not important, but I would appreciate it if someone could explain
the following, run from cmd.exe on Windows Server 2003 -
C:\>python
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = '\u2119'
>>> x # this uses stderr
'\u2119'
>>> print(x) # this uses stdout
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
>>>
It seems that there is a difference between writing to stdout and writing to
stderr. My questions are -
1. What is the difference?
2. Is there an easy way to get stdout to behave the same as stderr?
Thanks
Frank Millman
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-07-21 23:54 -0700 |
| Message-ID | <a2b6ec66-f323-4954-93c9-54417c7e5d41@googlegroups.com> |
| In reply to | #74968 |
On Tuesday, 22 July 2014 08:18:08 UTC+2, Frank Millman wrote:
> [...]
> It seems that there is a difference between writing to stdout and writing to
> stderr. My questions are -
> 1. What is the difference?
> 2. Is there an easy way to get stdout to behave the same as stderr?
%%%%%%%%%%
This is an example of what I explained in my last message in the "Python 3 is killing Python" thread. To quote my comment:
"Generally speaking, this is a perpetual annoyance (to put it politely) in Python. Python is always attempting to find a solution for the "Python user", to enforce a coding usage instead of letting the user/programmer do the task correctly. I am not alone in thinking this, and I have often seen people complain about it."
----
Something different.
>>> x = '\u2119'
>>> x # this uses stderr
This is not stderr but stdout (I doubt you redirected it). What you see is the *representation* of x.
>>> print(x) # this uses stdout
Correct. This is supposed to print, that is, display, "x as a literal" (I cannot find a proper name for it).
Solution: work properly. Understand the character-encoding ecosystem as a whole.
In short: *encode*. I have explained this many times (including on the wx list).
jmf
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2014-07-22 06:58 +0000 |
| Message-ID | <53ce0b96$0$29897$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #74968 |
On Tue, 22 Jul 2014 08:18:08 +0200, Frank Millman wrote:
> Hi all
>
> This is not important, but I would appreciate it if someone could
> explain the following, run from cmd.exe on Windows Server 2003 -
>
> C:\>python
> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>>>> x = '\u2119'
>>>> x # this uses stderr
> '\u2119'
What makes you think it uses stderr? To the best of my knowledge, it uses
stdout.
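For readers following along: a bare expression at the prompt is echoed through sys.displayhook, which writes the repr() to sys.stdout and, if that raises UnicodeEncodeError, retries with the 'backslashreplace' error handler. The following is a simplified sketch of that behaviour, not the interpreter's actual code. The names show_like_prompt and fake_stdout are made up for illustration, and the fake stream only assumes a cp437 console like the one in the original post.

```python
import io


def show_like_prompt(value, out):
    """Simplified sketch of sys.displayhook, writing to `out` instead of sys.stdout."""
    if value is None:
        return
    text = repr(value)
    try:
        out.write(text)
    except UnicodeEncodeError:
        # Fallback: escape whatever the stream's encoding cannot represent,
        # then write the resulting bytes straight to the underlying buffer.
        data = text.encode(out.encoding, "backslashreplace")
        out.flush()
        out.buffer.write(data)
    out.write("\n")
    out.flush()


# Hypothetical stand-in for a strict cp437 console stream.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="cp437", errors="strict", newline="\n")

show_like_prompt("\u2119", fake_stdout)
print(raw.getvalue())
```

If this reading is right, the echoed '\u2119' in the original session came from stdout after all, just via the escaping fallback rather than a plain write.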
>>>> print(x) # this uses stdout
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in
> position 0: character maps to <undefined>
I think your problem is that print tries to encode the string to your
terminal's encoding, which appears to be CP-437 ("MS DOS" code page). Can
you convince cmd.exe to use UTF-8? That should fix the problem. (Although
apparently Windows' handling of UTF-8 is buggy, so it will create many
wonderful new problems, yay!)
http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how
http://stackoverflow.com/questions/14109024/how-to-make-unicode-charset-in-cmd-exe-by-default
http://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8
> It seems that there is a difference between writing to stdout and
> writing to stderr.
I would be surprised if that were the case, but I don't have a Windows
box to test it. Try this:
import sys
print(x, file=sys.stderr) # I expect this will fail
print(repr(x), file=sys.stdout) # I expect this will succeed
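For anyone without a Windows box, the experiment can be approximated on any platform with two in-memory text streams. This is only a sketch under an assumption: that the two console streams share the cp437 encoding but differ in their error handler ('strict' for the stdout stand-in, 'backslashreplace' for the stderr stand-in). The helper make_stream and the fake_* names are invented for the example.

```python
import io


def make_stream(errors):
    """Return (byte buffer, text wrapper) emulating a cp437 console stream."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="cp437", errors=errors, newline="\n")
    return raw, text


x = "\u2119"

# Stand-in for stderr: unencodable characters are escaped instead of raising.
raw_err, fake_stderr = make_stream("backslashreplace")
print(x, file=fake_stderr)
fake_stderr.flush()

# Stand-in for stdout: unencodable characters raise UnicodeEncodeError.
raw_out, fake_stdout = make_stream("strict")
try:
    print(repr(x), file=fake_stdout)
    stdout_failed = False
except UnicodeEncodeError:
    stdout_failed = True

print(raw_err.getvalue(), stdout_failed)
```

Under that assumption, the first print succeeds with an escaped character and the second fails, even though repr() is used, because in Python 3 the repr of a printable non-ASCII character is the character itself.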
--
Steven
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2014-07-22 09:15 +0200 |
| Message-ID | <mailman.12166.1406013356.18130.python-list@python.org> |
| In reply to | #74974 |
"Steven D'Aprano" <steve@pearwood.info> wrote in message
news:53ce0b96$0$29897$c3e8da3$5496439d@news.astraweb.com...
> On Tue, 22 Jul 2014 08:18:08 +0200, Frank Millman wrote:
>>>>> x = '\u2119'
>>>>> x # this uses stderr
>> '\u2119'
>
> What makes you think it uses stderr? To the best of my knowledge, it uses
> stdout.
This is from the docs on sys.stdxxx:
sys.stdin
sys.stdout
sys.stderr
File objects used by the interpreter for standard input, output and errors:
- stdin is used for all interactive input (including calls to input());
- stdout is used for the output of print() and expression statements and for the prompts of input();
- The interpreter's own prompts and its error messages go to stderr.
>> It seems that there is a difference between writing to stdout and
>> writing to stderr.
>
> I would be surprised if that were the case, but I don't have a Windows
> box to test it. Try this:
>
> import sys
> print(x, file=sys.stderr) # I expect this will fail
It does not fail.
> print(repr(x), file=sys.stdout) # I expect this will succeed
It fails.
The clue that led me to stderr is that the logging module displays unicode strings to the console without a problem. I delved into the source code, and found that it writes to stderr. When I changed mine to stderr, it also worked.
Frank
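The observation about the logging module can be checked without reading its source: a logging.StreamHandler created without a stream argument attaches itself to sys.stderr. A minimal check (the variable names are just for illustration):

```python
import logging
import sys

# No stream argument given, so the handler falls back to its default stream.
handler = logging.StreamHandler()
uses_stderr = handler.stream is sys.stderr
print(uses_stderr)
```

That would explain why log messages containing non-cp437 characters reach the console while the same text sent through print() to stdout does not.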
| From | Lele Gaifax <lele@metapensiero.it> |
|---|---|
| Date | 2014-07-22 09:36 +0200 |
| Message-ID | <mailman.12169.1406014585.18130.python-list@python.org> |
| In reply to | #74974 |
"Frank Millman" <frank@chagford.com> writes:
> "Steven D'Aprano" <steve@pearwood.info> wrote in message
> news:53ce0b96$0$29897$c3e8da3$5496439d@news.astraweb.com...
>> I would be surprised if that were the case, but I don't have a Windows
>> box to test it. Try this:
>>
>>
>> import sys
>> print(x, file=sys.stderr) # I expect this will fail
>
> It does not fail.
Indeed it does not, but for some reason it actually prints the
repr() of the string.
>> print(repr(x), file=sys.stdout) # I expect this will succeed
>>
>
> It fails.
This surprises me as well, why does it fail here?
>>> repr('\u2119')
"'\u2119'"
>>> print(repr('\u2119'))
Traceback ... UnicodeEncodeError ...
On GNU/Linux, I get:
>>> repr('\u2119')
"'ℙ'"
>>> print(repr('\u2119'))
'ℙ'
Uhm, it must be related to the fact that on Py3 the repr() of something
is a unicode object too, so the output machinery tries to encode it to
the output encoding.... Still, I don't see the difference between stdout and
stderr (both are cp437, according to sys.xxx.encoding).
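The encoding is only half of a text stream's configuration; the other half is its error handler, exposed as the .errors attribute. A small sketch for printing both side by side (stream_settings is a made-up helper name, and the values you see depend on your platform and Python version):

```python
import sys


def stream_settings(stream):
    """Return (encoding, errors) for a text stream, or None where missing."""
    return getattr(stream, "encoding", None), getattr(stream, "errors", None)


for name in ("stdout", "stderr"):
    print(name, stream_settings(getattr(sys, name)))
```

Two streams that report the same encoding can still behave differently on unencodable characters if their .errors values differ.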
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it | -- Fortunato Depero, 1929.
| From | Akira Li <4kir4.1i@gmail.com> |
|---|---|
| Date | 2014-07-23 05:01 +0400 |
| Message-ID | <mailman.12209.1406077282.18130.python-list@python.org> |
| In reply to | #74974 |
"Frank Millman" <frank@chagford.com> writes:
> "Steven D'Aprano" <steve@pearwood.info> wrote in message
> news:53ce0b96$0$29897$c3e8da3$5496439d@news.astraweb.com...
>> On Tue, 22 Jul 2014 08:18:08 +0200, Frank Millman wrote:
>>
>>> This is not important, but I would appreciate it if someone could
>>> explain the following, run from cmd.exe on Windows Server 2003 -
>>>
>>> C:\>python
>>> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> x = '\u2119'
>>>>>> x # this uses stderr
>>> '\u2119'
>>
>>> It seems that there is a difference between writing to stdout and
>>> writing to stderr.
>>
>> I would be surprised if that were the case, but I don't have a Windows
>> box to test it. Try this:
>>
>>
>> import sys
>> print(x, file=sys.stderr) # I expect this will fail
>
> It does not fail.
>> print(repr(x), file=sys.stdout) # I expect this will succeed
>
> It fails.
Check the sys.stderr.errors attribute. Try:
>>> import sys
>>> x = '\u2119'
>>> x.encode(sys.stderr.encoding, sys.stderr.errors) # succeed
>>> x.encode(sys.stdout.encoding, sys.stdout.errors) # fail
sys.stderr uses the 'backslashreplace' error handler, which is why you see
\u2119 instead of ℙ.
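As for the second question in the original post (getting stdout to behave like stderr), one possible approach is to re-wrap the stream's byte buffer with the same encoding but the 'backslashreplace' handler. This is a sketch, not a tested recipe for cmd.exe: with_backslashreplace is a hypothetical helper, and the demonstration uses an in-memory stand-in rather than the real console.

```python
import io


def with_backslashreplace(stream):
    """Return a new text wrapper over stream's buffer that escapes unencodable characters."""
    stream.flush()
    return io.TextIOWrapper(
        stream.buffer,
        encoding=stream.encoding,
        errors="backslashreplace",
        line_buffering=True,
    )


# Stand-in for a strict cp437 stdout; keep a reference to the old wrapper alive.
raw = io.BytesIO()
strict = io.TextIOWrapper(raw, encoding="cp437", errors="strict")
lenient = with_backslashreplace(strict)

lenient.write("\u2119")
lenient.flush()
print(raw.getvalue())

# Possible use in a real session (assumes sys.stdout has a .buffer):
# import sys
# sys.stdout = with_backslashreplace(sys.stdout)
```

Two alternatives that need no code: setting the environment variable PYTHONIOENCODING to cp437:backslashreplace before starting Python, or, on Python 3.7 and later, calling sys.stdout.reconfigure(errors='backslashreplace').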
On Linux with utf-8 locale:
>>> print('\u2119')
ℙ
>>> print(repr('\u2119'))
'ℙ'
>>> print(ascii('\u2119'))
'\u2119'
>>> '\u2119'
'ℙ'
>>> repr('\u2119')
"'ℙ'"
>>> ascii('\u2119')
"'\\u2119'"
On Windows, try https://pypi.python.org/pypi/win_unicode_console
C:\> pip install win-unicode-console
C:\> py -i -m run
It is alpha, but your feedback may help improve it:
https://github.com/Drekin/win-unicode-console/issues
You could also use a GUI console, e.g.:
C:\> py -3 -m idlelib
Or http://ipython.org/notebook.html
There are many other IDEs for Python e.g.,
http://stackoverflow.com/q/81584/what-ide-to-use-for-python
--
Akira
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-07-23 00:35 -0700 |
| Message-ID | <067c5d5c-7306-464b-bdd1-fa457ff17960@googlegroups.com> |
| In reply to | #75045 |
On Wednesday, 23 July 2014 03:01:08 UTC+2, Akira Li wrote:
> [...]
win_unicode_console: I tested it.
---------------
@ Terry
tab, tabstop, EM Quad: not good
-----------
jmf
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2014-07-22 00:07 -0700 |
| Message-ID | <c580232e-5b2d-45b7-a957-fab088ec5fa2@googlegroups.com> |
| In reply to | #74968 |
On Tuesday, 22 July 2014 08:18:08 UTC+2, Frank Millman wrote:
> Hi all
>
> This is not important, but I would appreciate it if someone could explain
> the following, run from cmd.exe on Windows Server 2003 -
>
> C:\>python
> Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> x = '\u2119'
> >>> x # this uses stderr
> '\u2119'
> >>> print(x) # this uses stdout
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 0: character maps to <undefined>
> >>>
>
> It seems that there is a difference between writing to stdout and writing to
> stderr. My questions are -
>
> 1. What is the difference?
>
> 2. Is there an easy way to get stdout to behave the same as stderr?
%%%%%%%%%%%
Again, from my "magic" interactive interpreter.
>>> x = 'a\u2119z'
>>> sys.stdout.encoding
'<unicode>'
>>> x
'aℙz'
>>> sys.stdout.encoding = 'cp437'
>>> print(x)
Traceback (most recent call last):
File "<eta last command>", line 1, in <module>
File "D:\jm\jmpy\eta\eta41beta1\etastdio.py", line 158, in write
s = s.encode(self.pencoding).decode('cp1252')
File "c:\python32\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u2119' in position 1: character maps to <undefined>
>>> print(x.encode(sys.stdout.encoding, 'replace'))
'a?z'
>>> # voilà, no error
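A note for readers trying this in a stock Python 3 interpreter rather than the custom one shown above: str.encode() returns a bytes object, so printing it directly shows a bytes literal. To get ordinary text with the unencodable character replaced, decode the bytes again. A minimal sketch, with cp437 assumed as the target encoding:

```python
x = "a\u2119z"

# 'replace' substitutes '?' for anything cp437 cannot represent.
encoded = x.encode("cp437", "replace")

# Decode back to str so print() shows plain text instead of a bytes literal.
printable = encoded.decode("cp437")

print(encoded)
print(printable)
```

This avoids the exception, at the cost of losing the original character.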
jmf