| Date | 2015-03-09 08:15 +1100 |
|---|---|
| Subject | Opaque error message on UTF-8 decode |
| From | Chris Angelico <rosuav@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.176.1425849330.21433.python-list@python.org> |
>>> b"\xed\xb4\x80".decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
But 0xED is not a continuation byte, it's a start byte. And it's a
perfectly valid one:
>>> b"\xed\x9f\xbf".decode()
'\ud7ff'
Pike is more explicit about what the problem is:
> utf8_to_string("\xed\xb4\x80");
UTF-8 sequence beginning with 0xed 0xb4 at index 0 would decode to a
UTF-16 surrogate character.
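
To make the difference between the two messages concrete, here is a minimal Python sketch of how a more explicit diagnosis could be derived. The helper names `decode_3byte` and `describe` are hypothetical, invented for illustration; they are not part of Python or Pike. The sketch assumes a three-byte sequence starting with 0xED and simply recombines the payload bits to see whether the result lands in the surrogate range U+D800..U+DFFF.

```python
def decode_3byte(seq: bytes) -> int:
    """Combine a 3-byte UTF-8 sequence into a code point, with no validity checks."""
    b0, b1, b2 = seq
    # 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyyyyzzzzzz
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def describe(seq: bytes) -> str:
    """Hypothetical helper: report why a byte string fails strict UTF-8 decoding."""
    try:
        seq.decode("utf-8")
        return "valid"
    except UnicodeDecodeError as exc:
        # Special-case the situation from the post: lead byte 0xED followed by
        # two bytes that are well-formed continuation bytes (0x80..0xBF).
        if (
            len(seq) >= 3
            and seq[0] == 0xED
            and all(0x80 <= b <= 0xBF for b in seq[1:3])
        ):
            cp = decode_3byte(seq[:3])
            if 0xD800 <= cp <= 0xDFFF:
                return f"would decode to surrogate U+{cp:04X}"
        # Otherwise fall back to the reason Python itself reports.
        return exc.reason


print(describe(b"\xed\xb4\x80"))  # would decode to surrogate U+DD00
print(describe(b"\xed\x9f\xbf"))  # valid

# The standard 'surrogatepass' error handler confirms which code point is involved.
print(ascii(b"\xed\xb4\x80".decode("utf-8", "surrogatepass")))  # '\udd00'
```

After a lead byte of 0xED, only second bytes in the range 0x80..0x9F yield a non-surrogate code point, which would explain why the decoder flags 0xB4 as an unacceptable continuation even though its top bits look like one.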
Is this something where Python's error message could do with
improvement, or is it not worth the hassle? Should I raise a tracker
issue about this?
ChrisA