| From | Ferrous Cranus <nikos@superhost.gr> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: UnicodeDecodeError issue |
| Date | 2013-09-04 14:38 +0300 |
| Organization | A noiseless patient Spider |
| Message-ID | <l07641$vh2$1@dont-email.me> |
| References | (11 earlier) <mailman.487.1378124525.19984.python-list@python.org> <l029ev$2a1g$1@news.ntua.gr> <mailman.511.1378146537.19984.python-list@python.org> <3e549761-4323-4379-b4e4-ce51597d59c0@googlegroups.com> <mailman.38.1378294002.5461.python-list@python.org> |
On 4/9/2013 2:26 PM, Dave Angel wrote:
> On 4/9/2013 04:35, Ferrous Cranus wrote:
>
>> On Monday, 2 September 2013 9:28:36 PM UTC+3, Dave Angel wrote:
>>> On 2/9/2013 11:05, Ferrous Cranus wrote:
>>>
>>>> On 2/9/2013 3:21 PM, Dave Angel wrote:
>>>>> Starting with the byte string in the error message:
>>>>>>>> f = open("junk.txt", "w")
>>>>>>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>>>>>>> f.close()
>>>>
>>>> Indeed, but yet again, 'file' checks the encoding of the filename that
>>>> consists of these lines above, not of the actual strings.
>>>
>>> 'file' does nothing interesting with the filename, it just opens it and
>>> examines the contents. For example,
>>>
>>> file www/cgi-bin/files.py
>>>
>>> will examine the Python source file, not run it.
>>>
>>> So first in the interpreter, I ran
>>>
>>>>>>> f = open("junk.txt", "w")
>>>>>>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>>>>>>> f.close()
>>>
>>> then at the bash prompt, I ran:
>>>
>>> davea@think2:~$ file junk.txt
>>> junk.txt: ISO-8859 text
>>
>>
>> That is one clever idea, Dave.
>>
>> I take it that the charset of the file 'junk.txt' gets identified from the encoding of the characters read from within the file?
>
> 'file' only guesses the most likely encoding for 'junk.txt'. But at
> least it can know it's not utf-8, since that would give a decoding
> error.
>
> That's why, whenever 'file' makes its verdict, it's up to you to check
> it by displaying the data after decoding it with that tentative
> encoding.
>
>>
>> But wait a minute: what editor do you use to write these 3 lines?
>> I mean, I am a bit confused.
>
> As I said right above, "in the interpreter, I ran"...
> And if that's not clear enough, you can see the >>> prompts that the
> Python interpreter uses. By interpreter, I mean I ran Python with no
> parameters. I did not run IDLE or any other IDE that might take it
> upon itself to interfere.
>
>
>>
>> I, for example, run 'nano tets.py', which has within:
>>
>> f = open("junk.txt", "w")
>> f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')
>> f.close()
>>
>> then when I save the file within nano, for example by default in the utf-8 charset,
>
> That's the encoding for the file tets.py, and you'll notice that it's
> actually ASCII. Notice that the string I copied from the error message
> uses escape sequences for all non-ASCII bytes.
>
>>
>> how would it be able to detect that the bytestring within is supposed to be in a Greek ISO encoding?
>
> I wouldn't be running 'file' on the tets.py file, but on the junk.txt
> file created when you run
> python tets.py
>
> So since the tets.py file was a sidetrack, I just ran those three lines
> in the interpreter.
>
I'm still confused about this.
Say we save those 3 lines inside junk.txt and we save it by default as utf-8.
When we run 'file junk.txt',
what will file respond with?
The filename's charset?
Or
will it look at the bytestring within to decide what encoding it uses?
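
To make the two cases concrete, here is a minimal sketch of the experiment discussed above. It is only an illustration under assumptions: the helper names (write_junk, try_decode) are made up for this post, and ISO-8859-7 is my guess at the Greek charset, since 'file' only reported "ISO-8859 text". Note also that on Python 3 the file has to be opened with "wb", because text mode ("w") refuses bytes.

```python
import os
import tempfile

# Byte string copied from the error message quoted above.
DATA = (b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 '
        b'\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')

# The same write line as it would appear as *source text* saved in an editor.
# The raw string keeps the backslash escapes as literal characters.
SOURCE_LINE = (r"f.write(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 "
               r"\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2\n')")


def write_junk(path):
    """Write the raw bytes to path (hypothetical helper).

    "wb" is required on Python 3; text mode would raise TypeError for bytes.
    """
    with open(path, "wb") as f:
        f.write(DATA)


def try_decode(raw, encoding):
    """Return the decoded text, or None if raw is not valid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "junk.txt")
    write_junk(path)
    with open(path, "rb") as f:
        raw = f.read()

    # Case 1: the file holds the raw bytes. They are not valid UTF-8, so a
    # guesser can at least rule UTF-8 out.
    print("utf-8:", try_decode(raw, "utf-8"))

    # ISO-8859-7 is an assumption; check the guess by looking at the result.
    print("iso-8859-7:", try_decode(raw, "iso-8859-7"))

    # Case 2: the file holds the three lines as source text. The escapes are
    # plain ASCII characters, so there are no Greek bytes to detect.
    print("source line is ASCII:", SOURCE_LINE.encode("ascii") is not None)
```

So the two cases differ only in what bytes end up in the file: running the lines puts the raw non-ASCII bytes into junk.txt, while saving the lines as text puts only ASCII characters there, whatever charset the editor claims to save in.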
--
Webhost <http://superhost.gr>