
Re: Why does unicode-escape decode escape symbols that are already escaped?

Started by	Terry Reedy <tjreedy@udel.edu>
First post	2015-05-10 13:00 -0400
Last post	2015-05-11 14:00 +1000
Articles	2 — 2 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Why does unicode-escape decode escape symbols that are already escaped? Terry Reedy <tjreedy@udel.edu> - 2015-05-10 13:00 -0400
    Re: Why does unicode-escape decode escape symbols that are already escaped? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-11 14:00 +1000

#90299 — Re: Why does unicode-escape decode escape symbols that are already escaped?

From	Terry Reedy <tjreedy@udel.edu>
Date	2015-05-10 13:00 -0400
Subject	Re: Why does unicode-escape decode escape symbols that are already escaped?
Message-ID	<mailman.319.1431277269.12865.python-list@python.org>

On 5/10/2015 11:53 AM, Somelauw . wrote:
> In Python 3, decoding "€" with unicode-escape returns 'â\x82¬' which in
> my opinion doesn't make sense.

Agreed. I think this is a bug in that it should raise an exception
instead. Decoding a string only makes sense for rot-13.

> The € already is decoded; if it were encoded it would look like this:
> '\u20ac'.
> So why is it doing this?

> $ python3 -S
> Python 3.3.3 (default, Nov 27 2013, 17:12:35)
> [GCC 4.8.2] on linux
>  >>> import codecs
>  >>> codecs.decode('€', 'unicode-escape')
> 'â\x82¬'
>  >>> codecs.encode('€', 'unicode-escape')
> b'\\u20ac'
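A minimal sketch of what appears to be happening in the session above, assuming CPython 3: the str argument seems to be turned into its UTF-8 bytes first, and unicode-escape then treats every byte that is not part of a backslash escape as a Latin-1 code point. The sketch starts from bytes explicitly so that it does not depend on that implicit conversion; the variable names are only illustrative.

```python
# The three UTF-8 bytes of the euro sign.
raw = "€".encode("utf-8")

# unicode-escape maps non-escape bytes to the code points with the
# same numeric value, which is what Latin-1 decoding does as well.
via_escape = raw.decode("unicode-escape")
via_latin1 = raw.decode("latin-1")

# A real escape sequence, by contrast, round-trips cleanly.
escaped = "€".encode("unicode-escape")
round_trip = escaped.decode("unicode-escape")

print(repr(raw), repr(via_escape), repr(via_latin1))
print(repr(escaped), repr(round_trip))
```

Under that reading, the surprising result is ordinary mojibake: UTF-8 bytes interpreted as Latin-1.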

-- 
Terry Jan Reedy


#90344 — Re: Why does unicode-escape decode escape symbols that are already escaped?

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-05-11 14:00 +1000
Message-ID	<55502951$0$12997$c3e8da3$5496439d@news.astraweb.com>
In reply to	#90299

On Mon, 11 May 2015 03:00 am, Terry Reedy wrote:

> Decoding a string only makes sense for rot-13

Or any other string-to-string encoding.

As has been discussed on python-ideas and python-dev many times, the idea of
a codec is much more general than just bytes -> string and string -> bytes.
It can deal with any transformation of data. The codec machinery can, I
believe, operate on any suitable type, and it can certainly operate on
bytes -> bytes and str -> str.
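As a small illustration of the point above, here is a sketch using two codecs that ship with the standard library: rot_13 (str -> str) and hex_codec (bytes -> bytes). It goes through codecs.encode() and codecs.decode() rather than the str and bytes methods. The sample values and variable names are made up for the example.

```python
import codecs

# str -> str: rot_13 maps text to text.
scrambled = codecs.encode("Hello", "rot_13")
restored = codecs.decode(scrambled, "rot_13")

# bytes -> bytes: hex_codec maps bytes to their hex digits and back.
hexed = codecs.encode(b"abc", "hex_codec")
unhexed = codecs.decode(hexed, "hex_codec")

print(scrambled, restored)
print(hexed, unhexed)
```

Both transformations stay within a single type, so neither fits the bytes -> string or string -> bytes pattern.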


I have gradually come to agree that bytes and str objects should only
support decode() and encode() operations respectively, but str->str and
bytes->bytes codecs are useful too.


-- 
Steven

