Groups > comp.lang.python > #90314 > unrolled thread

Re: Why does unicode-escape decode escape symbols that are already escaped?

Started by	"Somelauw ." <somelauw@gmail.com>
First post	2015-05-11 01:56 +0200
Last post	2015-05-11 01:56 +0200
Articles	1 — 1 participant

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

#90314 — Re: Why does unicode-escape decode escape symbols that are already escaped?

From	"Somelauw ." <somelauw@gmail.com>
Date	2015-05-11 01:56 +0200
Subject	Re: Why does unicode-escape decode escape symbols that are already escaped?
Message-ID	<mailman.328.1431302215.12865.python-list@python.org>

2015-05-10 18:06 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
> Whenever you start encoding and decoding, you need to know whether
> you're working with bytes->text, text->bytes, or something else. In
> the case of unicode-escape, it expects to encode text into bytes, as
> you can see with your second example - you give it a Unicode string,
> and get back a byte string. When you attempt to *decode* a Unicode
> string, that doesn't actually make sense, so it first gets *encoded*
> to bytes, before being decoded. What you're actually seeing there is
> that the one-character string is being encoded into a three-byte UTF-8
> sequence, and then the unicode-escape decode takes those bytes and
> interprets them as characters; as it happens, that's equivalent to a
> Latin-1 decode:
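
The example that followed the quoted text is not preserved in this reply. As a rough stand-in, here is a minimal sketch of the behaviour described, assuming Python 3. The character "€" is my own choice of a character with a three-byte UTF-8 encoding, not necessarily the one used in the original post, and the UTF-8 encode that happens implicitly is written out explicitly.

```python
# Hypothetical input: any character whose UTF-8 form is three bytes will do.
text = "\u20ac"  # EURO SIGN

# Step 1: the text is turned into bytes (written out explicitly here).
utf8_bytes = text.encode("utf-8")

# Step 2: unicode-escape decodes those bytes one byte per character.
via_escape = utf8_bytes.decode("unicode_escape")

# For bytes that contain no backslash, Latin-1 gives the same result.
via_latin1 = utf8_bytes.decode("latin-1")

print(len(utf8_bytes))            # number of UTF-8 bytes
print(via_escape == via_latin1)   # same result either way
print(via_escape == text)         # the round trip does not restore the input
```

The round trip turns one character into three, which is why non-ASCII input comes back garbled.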

Thanks for your response.
I was using unicode-escape to handle escape sequences, such as
converting "\\n" to actual newlines.
My input is already a string; the decoding from bytes to string has
already been done a couple of layers deeper, so I really need a
string-to-string conversion.
I guess that it's not possible to do this operation without converting
to bytes first (even if I use the codecs module, it will convert to
bytes implicitly as you just told me).
What I'm probably going to do is write my own parser to perform this task.
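
A string-to-string parser of the kind mentioned above could look roughly like the following. This is only a sketch, assuming Python 3. The function name `unescape`, the set of supported escapes (\n, \t, \r, \\, quotes, \xHH, \uHHHH) and the choice to leave unknown escapes untouched are all assumptions made for illustration, not something taken from the thread.

```python
import re

# Hypothetical table of single-character escapes this sketch understands.
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"', "'": "'"}

# A backslash followed by \xHH, \uHHHH, or any single character.
_ESCAPE = re.compile(
    r"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|(.))", re.DOTALL
)


def unescape(text):
    """Replace backslash escapes in a str without going through bytes."""

    def replace(match):
        hex2, hex4, simple = match.groups()
        if hex2 or hex4:
            # Numeric escape: convert the hex digits to a code point.
            return chr(int(hex2 or hex4, 16))
        # Known one-letter escape, or leave the unknown escape unchanged.
        return _SIMPLE.get(simple, match.group(0))

    return _ESCAPE.sub(replace, text)


print(repr(unescape("first\\nsecond")))
print(repr(unescape("caf\u00e9 costs 3 \u20ac\\n")))
```

Because the text never becomes bytes, non-ASCII characters that are already in the string pass through unchanged, and an escaped backslash followed by "n" stays a literal backslash and "n".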
