| Path | csiph.com!usenet.pasdenom.info!news.redatomik.org!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <somelauw@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| MIME-Version | 1.0 |
| X-Received | by 10.55.54.136 with SMTP id d130mr16359221qka.22.1431302206900; Sun, 10 May 2015 16:56:46 -0700 (PDT) |
| In-Reply-To | <CAPTjJmpQ_25UVGVpg5MUsO1+sUeBp+20t5Txo+XhXkSkYPMjqw@mail.gmail.com> |
| References | <CA+gt_a82WGXHUZhcdbTUWG+TRV1Ys1ZSrkGjOxgavZGjAh9FiQ@mail.gmail.com> <CAPTjJmpQ_25UVGVpg5MUsO1+sUeBp+20t5Txo+XhXkSkYPMjqw@mail.gmail.com> |
| Date | Mon, 11 May 2015 01:56:46 +0200 |
| Subject | Re: Why does unicode-escape decode escape symbols that are already escaped? |
| From | "Somelauw ." <somelauw@gmail.com> |
| To | Chris Angelico <rosuav@gmail.com> |
| Cc | "python-list@python.org" <python-list@python.org> |
| Content-Type | text/plain; charset=UTF-8 |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.20+ |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.328.1431302215.12865.python-list@python.org> (permalink) |
| Lines | 23 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1431302215 news.xs4all.nl 2936 [2001:888:2000:d::a6]:42662 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:90314 |
2015-05-10 18:06 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
> Whenever you start encoding and decoding, you need to know whether
> you're working with bytes->text, text->bytes, or something else. In
> the case of unicode-escape, it expects to encode text into bytes, as
> you can see with your second example - you give it a Unicode string,
> and get back a byte string. When you attempt to *decode* a Unicode
> string, that doesn't actually make sense, so it first gets *encoded*
> to bytes, before being decoded. What you're actually seeing there is
> that the one-character string is being encoded into a three-byte UTF-8
> sequence, and then the unicode-escape decode takes those bytes and
> interprets them as characters; as it happens, that's equivalent to a
> Latin-1 decode:

Thanks for your response.

I was using unicode-escape to handle escape sequences, for example
converting "\\n" to an actual newline. My input argument is already a
string; the decoding from bytes to string has been done a couple of
layers deeper, so I really needed a string-to-string conversion.

I guess it's not possible to do this operation without converting to
bytes first (even if I use the codecs module, it will convert to bytes
implicitly, as you just told me).

What I'm probably going to do is write my own parser to perform this
task.
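For illustration only, here is a minimal sketch of what such a string-to-string parser might look like in Python 3. The names `unescape`, `_SIMPLE` and `_ESCAPE_RE` are made up for this example, and the set of escapes it handles (a few single-character escapes, `\xHH` and `\uHHHH`) is an assumption, not the full set the unicode-escape codec understands. It never goes through bytes, so non-ASCII characters already in the string are left alone.

```python
import re

# Hypothetical table of single-character escapes; extend as needed.
_SIMPLE = {
    'n': '\n', 't': '\t', 'r': '\r', '0': '\0',
    '\\': '\\', '"': '"', "'": "'",
}

# A backslash followed by \uHHHH, \xHH, or any single character.
_ESCAPE_RE = re.compile(r'\\(u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|.)', re.DOTALL)


def unescape(text):
    """Replace backslash escapes in a str, returning a str."""
    def replace(match):
        body = match.group(1)
        if len(body) > 1:
            # \xHH or \uHHHH: the hex digits follow the first letter.
            return chr(int(body[1:], 16))
        # Unknown escapes are left exactly as they were written.
        return _SIMPLE.get(body, match.group(0))
    return _ESCAPE_RE.sub(replace, text)


print(repr(unescape('caf\u00e9\\n')))  # non-ASCII text is preserved
print(repr(unescape('\\\\n')))         # an escaped backslash stays a backslash
```

Because the regular expression consumes a backslash together with the character after it, an already-escaped backslash (`\\`) is turned into one literal backslash and the following `n` is not treated as part of a newline escape.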