


Re: Why does unicode-escape decode escape symbols that are already escaped?

Date Mon, 11 May 2015 01:56:46 +0200
Subject Re: Why does unicode-escape decode escape symbols that are already escaped?
From "Somelauw ." <somelauw@gmail.com>
To Chris Angelico <rosuav@gmail.com>
Cc "python-list@python.org" <python-list@python.org>



2015-05-10 18:06 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
> Whenever you start encoding and decoding, you need to know whether
> you're working with bytes->text, text->bytes, or something else. In
> the case of unicode-escape, it expects to encode text into bytes, as
> you can see with your second example - you give it a Unicode string,
> and get back a byte string. When you attempt to *decode* a Unicode
> string, that doesn't actually make sense, so it first gets *encoded*
> to bytes, before being decoded. What you're actually seeing there is
> that the one-character string is being encoded into a three-byte UTF-8
> sequence, and then the unicode-escape decode takes those bytes and
> interprets them as characters; as it happens, that's equivalent to a
> Latin-1 decode:
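
A minimal sketch of the behaviour described in the quote above, with the implicit encoding step written out explicitly. The euro sign is only an assumed stand-in for the one-character string; the original example is not shown here.

```python
# Assumed sample input: a one-character string whose UTF-8 form is three bytes.
text = "\u20ac"

# Step 1: the text is encoded to bytes (UTF-8), as the quote describes.
utf8_bytes = text.encode("utf-8")

# Step 2: unicode-escape decoding maps each non-escape byte to the
# code point with the same value, which is what Latin-1 decoding does.
via_escape = utf8_bytes.decode("unicode-escape")
via_latin1 = utf8_bytes.decode("latin-1")

print(len(utf8_bytes))           # 3
print(via_escape == via_latin1)  # True
```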

Thanks for your response.
I was using unicode-escape to handle escape sequences, for example
converting "\\n" to an actual newline.
My input argument is already a string; the decoding from
bytes to string has already been done a couple of layers deeper, so I
really needed a string-to-string conversion.
I guess that it's not possible to do this operation without converting
to bytes first (even if I use the codecs module, it will convert to
bytes implicitly as you just told me).
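
For what it's worth, one way to keep the round trip through bytes lossless is to do the encoding step explicitly with Latin-1 and the backslashreplace error handler. This is only a sketch: the function name unescape_via_bytes is made up for illustration, and it assumes the input contains only escape sequences that unicode-escape recognises.

```python
def unescape_via_bytes(text):
    """Interpret backslash escapes in a str by an explicit trip through bytes."""
    # Characters outside Latin-1 cannot be encoded directly, so the
    # backslashreplace handler rewrites them as \uXXXX / \UXXXXXXXX escapes.
    raw = text.encode("latin-1", errors="backslashreplace")
    # unicode-escape then decodes plain bytes as Latin-1 and interprets
    # every escape, including the ones introduced in the previous step.
    return raw.decode("unicode-escape")


print(repr(unescape_via_bytes("caf\u00e9\\n\u20ac")))
```

Because every byte produced by the Latin-1 step maps back to the same code point, non-ASCII characters survive instead of being split into UTF-8 bytes.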
I'll probably write my own parser to perform this task.
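
A rough sketch of what such a parser could look like, working on str only and never touching bytes. The names unescape, _SIMPLE and _ESCAPE are hypothetical, and the set of supported escapes is an assumption (a small subset of Python's: single-character escapes, \xHH and \uHHHH).

```python
import re

# Single-character escapes handled by this sketch (a subset of Python's).
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'", '"': '"'}

# Matches \xHH, \uHHHH, or a backslash followed by any single character.
_ESCAPE = re.compile(r"\\(x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|.)", re.DOTALL)


def unescape(text):
    """Replace backslash escapes in a str without going through bytes."""
    def replace(match):
        body = match.group(1)
        if body[0] in "xu" and len(body) > 1:
            # Hex escape: convert the digits after x/u to a code point.
            return chr(int(body[1:], 16))
        # Unknown escapes are left untouched, backslash included.
        return _SIMPLE.get(body, match.group(0))
    return _ESCAPE.sub(replace, text)


print(repr(unescape("caf\u00e9\\t\u20ac\\n")))
```

Since the regex consumes an escaped backslash as one unit, input such as "\\\\n" (two backslashes, then n) comes out as a single backslash followed by n rather than a newline.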
