Groups > comp.lang.python > #90314 > unrolled thread
| Started by | "Somelauw ." <somelauw@gmail.com> |
|---|---|
| First post | 2015-05-11 01:56 +0200 |
| Last post | 2015-05-11 01:56 +0200 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Why does unicode-escape decode escape symbols that are already escaped? "Somelauw ." <somelauw@gmail.com> - 2015-05-11 01:56 +0200
| From | "Somelauw ." <somelauw@gmail.com> |
|---|---|
| Date | 2015-05-11 01:56 +0200 |
| Subject | Re: Why does unicode-escape decode escape symbols that are already escaped? |
| Message-ID | <mailman.328.1431302215.12865.python-list@python.org> |
2015-05-10 18:06 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
> Whenever you start encoding and decoding, you need to know whether
> you're working with bytes->text, text->bytes, or something else. In
> the case of unicode-escape, it expects to encode text into bytes, as
> you can see with your second example - you give it a Unicode string,
> and get back a byte string. When you attempt to *decode* a Unicode
> string, that doesn't actually make sense, so it first gets *encoded*
> to bytes, before being decoded. What you're actually seeing there is
> that the one-character string is being encoded into a three-byte UTF-8
> sequence, and then the unicode-escape decode takes those bytes and
> interprets them as characters; as it happens, that's equivalent to a
> Latin-1 decode:

Thanks for your response.

I was using unicode-escape to handle escape sequences, such as converting "\\n" to an actual newline. My input is already a string; the decoding from bytes to string happens a couple of layers deeper, so what I really need is a string-to-string conversion.

I guess it's not possible to do this with unicode-escape without converting to bytes first (even if I use the codecs module, it will convert to bytes implicitly, as you just told me).

What I'm probably going to do is write my own parser to perform this task.
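
For illustration, a parser like the one mentioned above could be sketched roughly as follows. This is only a minimal sketch and not code from the thread: the name `unescape`, the table of supported escapes, and the choice to leave unknown escapes untouched are all assumptions. It works purely on strings, so non-ASCII characters never pass through a bytes round trip.

```python
import re

# Hypothetical table of single-character escapes (an assumption, not from the thread).
_SIMPLE_ESCAPES = {
    "n": "\n",
    "t": "\t",
    "r": "\r",
    "0": "\0",
    "\\": "\\",
    "'": "'",
    '"': '"',
}

# A backslash followed by \uXXXX, \xXX, or any single character.
_ESCAPE_RE = re.compile(r"\\(u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|.)", re.DOTALL)


def unescape(text):
    """Replace backslash escapes in a str, returning a str (no bytes involved)."""

    def replace(match):
        body = match.group(1)
        # \uXXXX and \xXX: convert the hex digits to a code point.
        if body[0] in "ux" and len(body) > 1:
            return chr(int(body[1:], 16))
        # Known one-character escape, or leave an unknown escape unchanged.
        return _SIMPLE_ESCAPES.get(body, match.group(0))

    return _ESCAPE_RE.sub(replace, text)


if __name__ == "__main__":
    print(repr(unescape("first\\nsecond")))
    print(repr(unescape("caf\u00e9\\tdone")))
```

Because the regular expression consumes an escaped backslash as one unit, an already-escaped sequence such as `\\n` in the input (two backslashes, then `n`) comes out as a literal backslash followed by `n` rather than a newline.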