Re: Why does unicode-escape decode escape symbols that are already escaped?

Path	csiph.com!usenet.pasdenom.info!news.redatomik.org!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path	<somelauw@gmail.com>
X-Original-To	python-list@python.org
Delivered-To	python-list@mail.python.org
MIME-Version	1.0
X-Received	by 10.55.54.136 with SMTP id d130mr16359221qka.22.1431302206900; Sun, 10 May 2015 16:56:46 -0700 (PDT)
In-Reply-To	<CAPTjJmpQ_25UVGVpg5MUsO1+sUeBp+20t5Txo+XhXkSkYPMjqw@mail.gmail.com>
References	<CA+gt_a82WGXHUZhcdbTUWG+TRV1Ys1ZSrkGjOxgavZGjAh9FiQ@mail.gmail.com> <CAPTjJmpQ_25UVGVpg5MUsO1+sUeBp+20t5Txo+XhXkSkYPMjqw@mail.gmail.com>
Date	Mon, 11 May 2015 01:56:46 +0200
Subject	Re: Why does unicode-escape decode escape symbols that are already escaped?
From	"Somelauw ." <somelauw@gmail.com>
To	Chris Angelico <rosuav@gmail.com>
Cc	"python-list@python.org" <python-list@python.org>
Content-Type	text/plain; charset=UTF-8
X-BeenThere	python-list@python.org
X-Mailman-Version	2.1.20+
Precedence	list
List-Id	General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe	<https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive	<http://mail.python.org/pipermail/python-list/>
List-Post	<mailto:python-list@python.org>
List-Help	<mailto:python-list-request@python.org?subject=help>
List-Subscribe	<https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups	comp.lang.python
Message-ID	<mailman.328.1431302215.12865.python-list@python.org>
Lines	23
NNTP-Posting-Host	2001:888:2000:d::a6
X-Trace	1431302215 news.xs4all.nl 2936 [2001:888:2000:d::a6]:42662
X-Complaints-To	abuse@xs4all.nl
Xref	csiph.com comp.lang.python:90314


2015-05-10 18:06 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
> Whenever you start encoding and decoding, you need to know whether
> you're working with bytes->text, text->bytes, or something else. In
> the case of unicode-escape, it expects to encode text into bytes, as
> you can see with your second example - you give it a Unicode string,
> and get back a byte string. When you attempt to *decode* a Unicode
> string, that doesn't actually make sense, so it first gets *encoded*
> to bytes, before being decoded. What you're actually seeing there is
> that the one-character string is being encoded into a three-byte UTF-8
> sequence, and then the unicode-escape decode takes those bytes and
> interprets them as characters; as it happens, that's equivalent to a
> Latin-1 decode:
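The example that followed this explanation was trimmed from the quote. Below is a minimal Python 3 sketch of the behaviour Chris describes. The euro sign (U+20AC) is an assumed stand-in for the original character, chosen only because it also encodes to three UTF-8 bytes. Decoding a str with unicode-escape implicitly UTF-8-encodes it first. Because those bytes contain no backslashes, the result is the same as a Latin-1 decode.

```python
import codecs

s = "\u20ac"  # assumed example: a one-character str (the euro sign)

# unicode-escape is a bytes -> text decoder. Given a str, the codec first
# encodes it to UTF-8, here the three bytes b"\xe2\x82\xac", and then
# decodes each byte as one character.
decoded = codecs.decode(s, "unicode-escape")

# The same two steps written out explicitly.
manual = s.encode("utf-8").decode("unicode-escape")

# With no backslashes in those bytes, this matches a plain Latin-1 decode.
latin1 = s.encode("utf-8").decode("latin-1")

print(repr(decoded))  # three characters, not the original one
```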

Thanks for your response.
I was using unicode-escape for handling escape characters like
converting "\\n" to actual newlines.
My input argument is already a string, and the decoding from bytes
to string has already been done a couple of layers deeper, so I
really need a string-to-string conversion.
I guess it's not possible to do this operation without converting
to bytes first (even if I use the codecs module, it will convert to
bytes implicitly, as you just told me).
What I'll probably do is write my own parser to perform this task.
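Here are two hedged Python 3 sketches of the string-to-string conversion described above.

The first sketch still passes through bytes internally, which the thread agrees the codec cannot avoid. It encodes to Latin-1 with the "backslashreplace" error handler so that every character survives the round trip, then decodes with unicode-escape.

The second sketch is a small hand-written parser along the lines the poster plans. Both function names are hypothetical. The set of escapes the parser recognises (\n, \t, \r, backslash, quotes, \xNN, \uNNNN) is an illustrative assumption, not a full reimplementation of Python's escape rules.

```python
import re


# Round trip through bytes. Characters U+0000..U+00FF become single bytes
# under Latin-1, and unicode-escape maps non-escape bytes straight back to
# the same code points. Characters above U+00FF cannot be Latin-1 encoded,
# so "backslashreplace" writes them as \uXXXX or \UXXXXXXXX escapes, which
# unicode-escape then decodes back to the original character.
def unescape_via_codec(s: str) -> str:
    return s.encode("latin-1", "backslashreplace").decode("unicode-escape")


# Hypothetical hand-written alternative. Only the escapes listed here are
# recognised. Any other backslash sequence (and a trailing lone backslash)
# is kept verbatim.
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'", '"': '"'}

# A backslash followed by \xNN, \uNNNN, or any single character.
_ESCAPE_RE = re.compile(r"\\(x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|.)", re.DOTALL)


def unescape_manual(s: str) -> str:
    def replace(m: re.Match) -> str:
        esc = m.group(1)
        if esc[0] in "xu" and len(esc) > 1:
            # Hex escape: convert the digits after 'x' or 'u' to a code point.
            return chr(int(esc[1:], 16))
        # Known single-character escape, otherwise leave the text unchanged.
        return _SIMPLE.get(esc, m.group(0))

    return _ESCAPE_RE.sub(replace, s)


print(repr(unescape_via_codec("a\\nb")))
print(repr(unescape_manual("tab:\\t euro:\\u20ac")))
```

The codec route follows Python's own escape rules exactly. It raises UnicodeDecodeError on malformed input such as a trailing lone backslash. The manual parser changes only the escapes it knows about and leaves everything else as it is, which may suit input that is not Python source.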

