
Why does unicode-escape decode escape symbols that are already escaped?

Date 2015-05-10 17:53 +0200
Subject Why does unicode-escape decode escape symbols that are already escaped?
From "Somelauw ." <somelauw@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.312.1431273197.12865.python-list@python.org>


In Python 3, decoding "€" with unicode-escape returns 'â\x82¬', which in my
opinion doesn't make sense.
The € is already decoded; if it were escaped, it would look like this:
'\u20ac'.
So why does the codec do this?

In Python 2 the behaviour is similar, but slightly different.

$ python3 -S
Python 3.3.3 (default, Nov 27 2013, 17:12:35)
[GCC 4.8.2] on linux
>>> import codecs
>>> codecs.decode('€', 'unicode-escape')
'â\x82¬'
>>> codecs.encode('€', 'unicode-escape')
b'\\u20ac'
>>>
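The Python 3 result above can be reproduced step by step. This is only a sketch of what appears to happen, assuming that a str passed to the unicode-escape decoder is first encoded as UTF-8 and that the resulting bytes are then read as Latin-1 with backslash escapes expanded. The variable names are made up for illustration.

```python
import codecs

euro = '€'

# Assumption: the codec works on bytes, so the str is turned into UTF-8 first.
utf8_bytes = euro.encode('utf-8')            # b'\xe2\x82\xac'

# Bytes that are not part of an escape sequence are read as Latin-1.
as_latin1 = utf8_bytes.decode('latin-1')     # 'â\x82¬'

# Decoding the same bytes with unicode-escape gives the same text,
# because the input contains no backslash escapes at all.
via_codec = utf8_bytes.decode('unicode-escape')

# The call from the session above, passing the str directly.
direct = codecs.decode(euro, 'unicode-escape')

# Encoding goes the other way and yields a pure ASCII escape sequence,
# which decodes back to the original character.
escaped = codecs.encode(euro, 'unicode-escape')   # b'\\u20ac'
round_trip = escaped.decode('unicode-escape')

print(repr(as_latin1), repr(via_codec), repr(direct), repr(round_trip))
```

If that assumption holds, the three UTF-8 bytes of the euro sign are simply being shown as three Latin-1 characters, while only the escaped form b'\\u20ac' round-trips to '€'.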

$ python2 -S
Python 2.7.5+ (default, Sep 17 2013, 15:31:50)
[GCC 4.8.1] on linux2
>>> import codecs
>>> codecs.decode('€', 'unicode-escape')
u'\xe2\x82\xac'
>>> codecs.encode('€', 'unicode-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
>>>
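The Python 2 traceback can be imitated in Python 3. This is a rough sketch, assuming the '€' literal in the Python 2 session is a byte string holding the UTF-8 bytes and that Python 2 implicitly decodes it with the ASCII codec before it can encode anything. The names below are illustrative only.

```python
# The UTF-8 bytes that a Python 2 byte-string literal '€' would hold
# on a UTF-8 terminal (assumption about the poster's environment).
py2_literal = b'\xe2\x82\xac'

try:
    # Stand-in for the implicit ASCII decode Python 2 performs before encoding.
    py2_literal.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

Under that assumption, the error comes from the implicit ASCII step and not from the unicode-escape codec itself.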
