Path: csiph.com!usenet.pasdenom.info!news.redatomik.org!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <CAPTjJmpQ_25UVGVpg5MUsO1+sUeBp+20t5Txo+XhXkSkYPMjqw@mail.gmail.com>
References: <CA+gt_a82WGXHUZhcdbTUWG+TRV1Ys1ZSrkGjOxgavZGjAh9FiQ@mail.gmail.com> <CAPTjJmpQ_25UVGVpg5MUsO1+sUeBp+20t5Txo+XhXkSkYPMjqw@mail.gmail.com>
Date: Mon, 11 May 2015 01:56:46 +0200
Subject: Re: Why does unicode-escape decode escape symbols that are already escaped?
From: "Somelauw ." <somelauw@gmail.com>
To: Chris Angelico <rosuav@gmail.com>
Cc: "python-list@python.org" <python-list@python.org>
Content-Type: text/plain; charset=UTF-8
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.328.1431302215.12865.python-list@python.org>
Lines: 23
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:90314

2015-05-10 18:06 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
> Whenever you start encoding and decoding, you need to know whether
> you're working with bytes->text, text->bytes, or something else. In
> the case of unicode-escape, it expects to encode text into bytes, as
> you can see with your second example - you give it a Unicode string,
> and get back a byte string. When you attempt to *decode* a Unicode
> string, that doesn't actually make sense, so it first gets *encoded*
> to bytes, before being decoded. What you're actually seeing there is
> that the one-character string is being encoded into a three-byte UTF-8
> sequence, and then the unicode-escape decode takes those bytes and
> interprets them as characters; as it happens, that's equivalent to a
> Latin-1 decode:

Thanks for your response.
I was using unicode-escape to handle escape sequences, for example
converting "\\n" to an actual newline.
My input argument is already in string format and the decoding from
bytes to string has already been done a couple of layers deeper, so I
really needed a string to string conversion.
I guess that it's not possible to do this operation without converting
to bytes first (even if I use the codecs module, it will convert to
bytes implicitly as you just told me).
What I'm probably going to do is write my own parser to perform this task.
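
A minimal sketch of what such a string-to-string parser might look like,
assuming only a handful of single-character escapes plus \xNN and \uNNNN
need to be handled. The names `unescape`, `_SIMPLE` and `_ESCAPE` are
hypothetical, and the set of supported escapes is an assumption, not
something taken from the codecs module:

```python
import re

# Assumed set of single-character escapes; extend as needed.
_SIMPLE = {
    "n": "\n",
    "t": "\t",
    "r": "\r",
    "\\": "\\",
    '"': '"',
    "'": "'",
}

# A backslash followed by xNN, uNNNN, or any single character.
_ESCAPE = re.compile(
    r"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|(.))", re.DOTALL
)


def unescape(text):
    """Replace backslash escapes in a str without going through bytes."""

    def replace(match):
        hex2, hex4, char = match.groups()
        if hex2 or hex4:
            return chr(int(hex2 or hex4, 16))
        # Unknown escapes are left exactly as they were written.
        return _SIMPLE.get(char, match.group(0))

    return _ESCAPE.sub(replace, text)
```

Because the input never passes through an encode step, non-ASCII
characters already present in the string are left alone, e.g.
unescape("caf\u00e9\\n") keeps the "é" and only turns the trailing
backslash-n into a newline.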