Date: Mon, 11 May 2015 01:56:46 +0200
Subject: Re: Why does unicode-escape decode escape symbols that are already escaped?
From: "Somelauw ."
To: Chris Angelico
Cc: "python-list@python.org"
Newsgroups: comp.lang.python

2015-05-10 18:06 GMT+02:00 Chris Angelico:
> Whenever you start encoding and decoding, you need to know whether
> you're working with bytes->text, text->bytes, or something else. In
> the case of unicode-escape, it expects to encode text into bytes, as
> you can see with your second example - you give it a Unicode string,
> and get back a byte string. When you attempt to *decode* a Unicode
> string, that doesn't actually make sense, so it first gets *encoded*
> to bytes, before being decoded. What you're actually seeing there is
> that the one-character string is being encoded into a three-byte UTF-8
> sequence, and then the unicode-escape decode takes those bytes and
> interprets them as characters; as it happens, that's equivalent to a
> Latin-1 decode:

Thanks for your response.

I was using unicode-escape to handle escape sequences, for example
converting "\\n" to an actual newline. My input argument is already a
string; the decoding from bytes to string has already been done a couple
of layers deeper, so I really needed a string-to-string conversion.

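To make the problem concrete, here is a small sketch of the round trip described in the quoted explanation, with the implicit encode step written out explicitly. The sample string (a euro sign followed by a literal backslash and "n") is only an illustration and is not taken from the thread.

```python
# Sample input chosen for illustration: EURO SIGN, then a literal
# backslash followed by the letter 'n'.
text = "\u20ac\\n"

# Step 1: the string is encoded to UTF-8 bytes (the euro sign becomes 3 bytes).
raw = text.encode("utf-8")

# Step 2: unicode-escape decodes those bytes.
decoded = raw.decode("unicode_escape")

# The escape sequence itself is handled as intended ...
assert decoded.endswith("\n")

# ... but each UTF-8 byte of the euro sign is read as one Latin-1 character.
assert decoded[:-1] == "\u20ac".encode("utf-8").decode("latin-1")
assert decoded != "\u20ac\n"
```

So the escape handling works, but any non-ASCII character in the input is mangled along the way.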
I guess it's not possible to do this operation without converting to
bytes first (even if I use the codecs module, it will convert to bytes
implicitly, as you just told me). What I'm probably going to do is write
my own parser to perform this task.
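A minimal sketch of what such a parser could look like, assuming only a handful of escapes are needed. The function name `unescape`, the table `_SIMPLE`, and the set of supported sequences are all hypothetical choices made for illustration. The function works purely on strings, so non-ASCII characters are never passed through bytes.

```python
import re

# Hypothetical table of single-character escapes; extend as needed.
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"', "'": "'"}

# A backslash followed by \uXXXX, \xXX, or any single character.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|.)", re.DOTALL)


def unescape(text):
    """Replace backslash escapes in a str, returning a str."""

    def replace(match):
        body = match.group(1)
        if body[0] in "ux" and len(body) > 1:
            # Numeric escape: the hex digits give the code point.
            return chr(int(body[1:], 16))
        # Known simple escape, or leave unknown sequences unchanged.
        return _SIMPLE.get(body, match.group(0))

    return _ESCAPE.sub(replace, text)


# Non-ASCII text survives, and an already-escaped backslash is not
# interpreted a second time.
assert unescape("\u20ac\\n") == "\u20ac\n"
assert unescape("\\\\n") == "\\n"
```

Because the regular expression consumes the input left to right, a doubled backslash is matched as one unit, so the "n" after it stays a plain letter.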