Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #19114

Re: sys.argv as a list of bytes

From Nobody <nobody@nowhere.com>
Subject Re: sys.argv as a list of bytes
Date 2012-01-19 05:05 +0000
Message-Id <pan.2012.01.19.05.05.31.519000@nowhere.com>
Newsgroups comp.lang.python
References <20120118081612.13745187@bigfoot.com> <mailman.4823.1326873960.27778.python-list@python.org> <20120118111627.14d490ac@bigfoot.com>
Organization Zen Internet

Show all headers | View raw


On Wed, 18 Jan 2012 09:05:42 +0100, Peter Otten wrote:

>> Python has a special errorhandler, "surrogateescape" to deal with
>> bytes that are not valid UTF-8.

On Wed, 18 Jan 2012 11:16:27 +0100, Olive wrote:

> But is it safe even if the locale is not UTF-8?

Yes. Peter's reference to UTF-8 is misleading. The surrogateescape
mechanism is used to represent anything which cannot be decoded according
to the locale's encoding. E.g. in the "C" locale, any byte >= 128 will be
encoded as a surrogate.

On Wed, 18 Jan 2012 09:05:42 +0100, Peter Otten wrote:

> It is still possible to get the original bytes:
> 
> python3 -c'import sys; print(sys.argv[1].encode("utf-8", "surrogateescape"))'

Except, it isn't. Because the Python dev's can't make up their mind which
encoding sys.argv uses, or even document it.

AFAICT:

On Windows, there never was a bytes version of sys.argv to start with
(the OS supplies the command line using wide strings).

On Mac OS X, the command line is always decoded using UTF-8.

On Unix, the command line is decoded using mbstowcs(). There isn't a
Python function to query which encoding this used (if there even _is_ a
corresponding Python encoding).

Except on Windows (where OS APIs take wide string parameters), if a
library function needs to pass a Unicode string to an API function, it
will normally decode it using sys.getfilesystemencoding(), which isn't
guaranteed to be the encoding which was used to fabricate sys.argv in
the first place.

In short: if you need to write "system" scripts on Unix, and you need them
to work reliably, you need to stick with Python 2.x.

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

sys.argv as a list of bytes Olive <diolu@bigfoot.com> - 2012-01-18 08:16 +0100
  Re: sys.argv as a list of bytes Peter Otten <__peter__@web.de> - 2012-01-18 09:05 +0100
    Re: sys.argv as a list of bytes Olive <diolu@bigfoot.com> - 2012-01-18 11:16 +0100
      Re: sys.argv as a list of bytes Peter Otten <__peter__@web.de> - 2012-01-18 15:01 +0100
      Re: sys.argv as a list of bytes Nobody <nobody@nowhere.com> - 2012-01-19 05:05 +0000
        Re: sys.argv as a list of bytes jmfauth <wxjmfauth@gmail.com> - 2012-01-19 02:40 -0800

csiph-web