| Started by | Albert-Jan Roskam <fomcl@yahoo.com> |
|---|---|
| First post | 2015-03-07 19:03 +0000 |
| Last post | 2015-03-07 19:03 +0000 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Newbie question about text encoding Albert-Jan Roskam <fomcl@yahoo.com> - 2015-03-07 19:03 +0000
| From | Albert-Jan Roskam <fomcl@yahoo.com> |
|---|---|
| Date | 2015-03-07 19:03 +0000 |
| Subject | Re: Newbie question about text encoding |
| Message-ID | <mailman.161.1425755314.21433.python-list@python.org> |
----- Original Message -----
> From: Chris Angelico <rosuav@gmail.com>
> To:
> Cc: "python-list@python.org" <python-list@python.org>
> Sent: Saturday, March 7, 2015 6:26 PM
> Subject: Re: Newbie question about text encoding
>
> On Sun, Mar 8, 2015 at 4:14 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> See:
>>
>> $ mkdir /tmp/xyz
>> $ touch /tmp/xyz/$'\x80'
>> $ python3
>> Python 3.3.2 (default, Dec 4 2014, 12:49:00)
>> [GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import os
>> >>> os.listdir('/tmp/xyz')
>> ['\udc80']
>> >>> open(os.listdir('/tmp/xyz')[0])
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> FileNotFoundError: [Errno 2] No such file or directory: '\udc80'
>>
>> File names encoded with Latin-X are quite commonplace even in UTF-8
>> locales.
>
> That is not a problem with UTF-8, though. I don't understand how
> you're blaming UTF-8 for that. There are two things happening here:
>
> 1) The underlying file system is not UTF-8, and you can't depend on
> that, ergo the decode to Unicode has to have some special handling of
> failing bytes.
> 2) You forgot to put the path on that, so it failed to find the file.
> Here's my version of your demo:
>
> >>> open("/tmp/xyz/"+os.listdir('/tmp/xyz')[0])
> <_io.TextIOWrapper name='/tmp/xyz/\udc80' mode='r' encoding='UTF-8'>
>
> Looks fine to me.
>
> Alternatively, if you pass a byte string to os.listdir, you get back a
> list of byte string file names:
>
> >>> os.listdir(b"/tmp/xyz")
> [b'\x80']
Nice, I did not know that. And glob.glob works the same way: given a str pattern it returns a list of str paths, and given a bytes pattern it returns a list of bytes paths.
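
For anyone who wants to try this, here is a minimal sketch of both points, assuming Python 3. It uses a throwaway temporary directory and an ordinary file name ("plain.txt", picked only for illustration) because not every file system accepts a name like b'\x80'. The helper names list_both_ways and glob_both_ways are made up for this example. The '\udc80' in the session above is consistent with the "surrogateescape" error handler, which the sketch applies explicitly to the raw byte instead of relying on the locale.

```python
import glob
import os
import tempfile


def list_both_ways(directory):
    """Return (str names, bytes names) for the same directory."""
    str_names = sorted(os.listdir(directory))
    # os.fsencode turns the str path into bytes, so listdir answers in bytes.
    bytes_names = sorted(os.listdir(os.fsencode(directory)))
    return str_names, bytes_names


def glob_both_ways(directory):
    """Return (str matches, bytes matches) for every entry in directory."""
    str_hits = sorted(glob.glob(os.path.join(directory, "*")))
    bytes_hits = sorted(glob.glob(os.path.join(os.fsencode(directory), b"*")))
    return str_hits, bytes_hits


# A byte that is not valid UTF-8 is mapped to a lone surrogate when decoded
# with the surrogateescape handler, and maps back to the same byte on encode.
RAW = b"\x80"
ESCAPED = RAW.decode("utf-8", "surrogateescape")         # '\udc80'
ROUND_TRIP = ESCAPED.encode("utf-8", "surrogateescape")  # b'\x80'

with tempfile.TemporaryDirectory() as demo_dir:
    # Create one empty file so there is something to list.
    open(os.path.join(demo_dir, "plain.txt"), "w").close()
    STR_NAMES, BYTES_NAMES = list_both_ways(demo_dir)
    STR_HITS, BYTES_HITS = glob_both_ways(demo_dir)

print(STR_NAMES, BYTES_NAMES)
print(repr(ESCAPED), ROUND_TRIP)
```

The round trip is the reason the open() call with the full path works in the quoted session: the str name still carries the original byte, so it can be handed back to the OS unchanged.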