


Re: Decoding a process output

From Peter Otten <__peter__@web.de>
Subject Re: Decoding a process output
Date Tue, 04 Mar 2014 17:05:50 +0100
Newsgroups comp.lang.python
Message-ID <mailman.7728.1393949173.18130.python-list@python.org> (permalink)



Francis Moreau wrote:

> Hi,
> 
> In my understanding (I'm relatively new to python), I need to decode any
> bytes data provided, in my case, by a shell command (such as findmnt)
> started by the subprocess module. The goal of my application is to parse
> the command outputs.
> 
> My application runs only on linux BTW and should run fine on both python
> 2.7 and py3k.
> 
> My question is when decoding the output bytes data of the external
> command, which encoding should I use ?
> 
> Should I guess the encoding by inspecting LANG or any LC_* environment
> variables ?
> 
> Should I force one of those environment variable to a specific value
> before running my external command ?
> 
> Thanks for any tips.

You can use locale.getpreferredencoding(), which follows the locale
environment (LANG in the example below; LC_ALL and LC_CTYPE take precedence
when they are set):

$ python3 -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
$ LANG= python3 -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968
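
A minimal sketch of how that could be applied to a subprocess's output. The helper name run_and_decode and the choice of the "replace" error handler are my own assumptions, not something from the question; the calls used exist on both Python 2.7 and 3.

```python
import locale
import subprocess
import sys


def run_and_decode(argv):
    """Run argv and return its stdout decoded with the locale's encoding.

    run_and_decode is a hypothetical helper name used for illustration.
    """
    # check_output() returns undecoded bytes on both Python 2.7 and 3.x.
    raw = subprocess.check_output(argv)
    # Same call as in the shell example above; fall back to ASCII if empty.
    encoding = locale.getpreferredencoding() or "ascii"
    # "replace" is an assumption: undecodable bytes become U+FFFD instead
    # of raising UnicodeDecodeError.
    return raw.decode(encoding, "replace")


if __name__ == "__main__":
    # Run a child Python so the example does not depend on external tools.
    print(run_and_decode([sys.executable, "-c", "print('hello')"]))
```

Note that "replace" loses information, so it is only reasonable when the text is meant for display or loose parsing.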

I haven't seen a Linux system that doesn't use UTF-8 for a while, but you 
have to remember that filenames are still arbitrary byte sequences. You can 
cope with this in Python either by not decoding the bytes at all or by using 
the surrogateescape error handler:
>>> import os
>>> os.mkdir(bytes([i for i in range(1, 256) if i != b"/"[0]]))
>>> os.listdir(b".")
[b'\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-.0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff']
>>> os.listdir(".")
['\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-.0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\udc80\udc81\udc82\udc83\udc84\udc85\udc86\udc87\udc88\udc89\udc8a\udc8b\udc8c\udc8d\udc8e\udc8f\udc90\udc91\udc92\udc93\udc94\udc95\udc96\udc97\udc98\udc99\udc9a\udc9b\udc9c\udc9d\udc9e\udc9f\udca0\udca1\udca2\udca3\udca4\udca5\udca6\udca7\udca8\udca9\udcaa\udcab\udcac\udcad\udcae\udcaf\udcb0\udcb1\udcb2\udcb3\udcb4\udcb5\udcb6\udcb7\udcb8\udcb9\udcba\udcbb\udcbc\udcbd\udcbe\udcbf\udcc0\udcc1\udcc2\udcc3\udcc4\udcc5\udcc6\udcc7\udcc8\udcc9\udcca\udccb\udccc\udccd\udcce\udccf\udcd0\udcd1\udcd2\udcd3\udcd4\udcd5\udcd6\udcd7\udcd8\udcd9\udcda\udcdb\udcdc\udcdd\udcde\udcdf\udce0\udce1\udce2\udce3\udce4\udce5\udce6\udce7\udce8\udce9\udcea\udceb\udcec\udced\udcee\udcef\udcf0\udcf1\udcf2\udcf3\udcf4\udcf5\udcf6\udcf7\udcf8\udcf9\udcfa\udcfb\udcfc\udcfd\udcfe\udcff']
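
The same mapping can be applied by hand to bytes read from a process. The following is a small Python 3 sketch (surrogateescape does not exist on 2.7); the sample name and the helper names decode_name and encode_name are made up for illustration, and UTF-8 is assumed as the encoding.

```python
def decode_name(raw, encoding="utf-8"):
    """Decode bytes to str, smuggling undecodable bytes as lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


def encode_name(text, encoding="utf-8"):
    """Reverse decode_name(): turn the lone surrogates back into raw bytes."""
    return text.encode(encoding, "surrogateescape")


# A hypothetical name containing two bytes that are not valid UTF-8.
raw_name = b"caf\xe9 \xff.txt"

text_name = decode_name(raw_name)
# 0xE9 and 0xFF come out as U+DCE9 and U+DCFF, as in the listing above.
assert text_name == "caf\udce9 \udcff.txt"

# The round trip is lossless, so the str can be handed back to the OS.
assert encode_name(text_name) == raw_name
```

On Linux, os.fsdecode() and os.fsencode() do the same thing with the filesystem encoding. Be aware that such a string cannot be encoded with the default strict handler, so printing it or writing it to a UTF-8 file raises UnicodeEncodeError.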

However, the typical shell tools have problems with names containing a 
newline or even a space (and even if the tool itself copes, you may introduce 
such problems when parsing its output), so I'd like to see how findmnt 
responds to a mount point like the above directory.
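
To illustrate the parsing side of the problem: where a tool can emit NUL-terminated records, splitting on NUL is safe because NUL is the one byte (besides "/") that cannot occur in a name. The sketch below uses find's -print0 as the example tool; whether findmnt offers something comparable is not something I have checked, and split_nul and list_paths are hypothetical helper names.

```python
import subprocess


def split_nul(raw):
    """Split NUL-terminated records, dropping the empty tail after the last NUL."""
    return [item for item in raw.split(b"\0") if item]


def list_paths(top="."):
    """Return the paths below top as bytes, one entry per file.

    Relies on find's -print0, which terminates every path with NUL
    instead of a newline.
    """
    return split_nul(subprocess.check_output(["find", top, "-print0"]))


# Newlines and spaces inside a name survive; a line-based split would
# have produced three bogus entries here instead of two real ones.
sample = b"./a\nb\0./c d\0"
assert split_nul(sample) == [b"./a\nb", b"./c d"]
assert sample.split(b"\n") == [b"./a", b"b\0./c d\0"]
```

The results are kept as bytes on purpose; decode them (for example with surrogateescape) only at the point where text is actually needed.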
