To: python-list@python.org
From: Terry Reedy <tjreedy@udel.edu>
Subject: Re: Why does unicode-escape decode escape symbols that are already escaped?
Date: Sun, 10 May 2015 13:00:52 -0400

On 5/10/2015 11:53 AM, Somelauw . wrote:
> In Python 3, decoding "€" with unicode-escape returns 'â\x82¬' which in
> my opinion doesn't make sense.

Agreed. I think this is a bug, in that it should raise an exception
instead. Decoding a str (rather than bytes) only makes sense for a
text-to-text codec such as rot-13.

> The € already is decoded; if it were encoded it would look like this:
> '\u20ac'.
> So why is it doing this?

> $ python3 -S
> Python 3.3.3 (default, Nov 27 2013, 17:12:35)
> [GCC 4.8.2] on linux
>  >>> import codecs
>  >>> codecs.decode('€', 'unicode-escape')
> 'â\x82¬'
>  >>> codecs.encode('€', 'unicode-escape')
> b'\\u20ac'
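
The output quoted above is consistent with the str being turned into
UTF-8 bytes before the escape decoder sees it. The following is only a
sketch under that assumption (the thread itself does not state the
mechanism); it does the UTF-8 step explicitly instead of passing a str
to the decoder, and the variable names are made up for illustration.

```python
# Assumption (not stated in the thread): a str handed to the
# unicode-escape decoder is first encoded to UTF-8 bytes.
euro = '\u20ac'  # the euro sign

# Step 1: the three UTF-8 bytes of the euro sign.
utf8_bytes = euro.encode('utf-8')

# Step 2: unicode-escape decoding reads each non-escape byte as a
# Latin-1 character, so three bytes become three characters.
mojibake = utf8_bytes.decode('unicode-escape')

# The intended direction: escape the text, then unescape the bytes.
escaped = euro.encode('unicode-escape')
round_trip = escaped.decode('unicode-escape')

print(ascii(mojibake))    # three separate characters, not one euro sign
print(escaped)
print(ascii(round_trip))
```

Under this reading, unicode-escape is a bytes-to-text codec, so the
meaningful round trip is str -> bytes via encode and bytes -> str via
decode, which is what the last two assignments show.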

-- 
Terry Jan Reedy