From: anatoly techtonik
Date: Fri, 29 May 2015 11:05:07 +0300
Subject: Re: Fwd: Lossless bulletproof conversion to unicode (backslashing) (fwd)
To: Laura Creighton
Cc: "python-list@python.org"
Newsgroups: comp.lang.python

On Wed, May 27, 2015 at 3:57 PM, Laura Creighton wrote:
> ------- Forwarded Message
>
> From: Chris Angelico
> Cc: "python-list@python.org"
>
> On Wed, May 27, 2015 at 9:52 PM, anatoly techtonik wrote:
>> And the short answer is that we need unicode because we are printing this
>> information to the stdout, and stdout is opened in text mode at least on
>> Windows, and without explicit conversion, Python will try to decode stuff
>> as being `ascii` and fail anyway.
>
> So you're working with text.

No. It is unknown. I am printing the Nodes of an SCons build graph, and
I don't know how Nodes are represented. In my case it turned out that a
Node contained Russian text, which crashed SCons. It could be Russian
text in cp1251, in utf-8, or in KOI-8, and I can't guess all possible
encodings there. I just need to print that tree without a crash or
information loss.

> That means you HAVE to decode it somehow;
> you fundamentally cannot print bytes to the console. Lossless
> concealment of arbitrary bytes won't help you.

Won't help me with what? I am debugging build scripts to find out the
*structure* of my dependencies, and then all of a sudden Python crashes
with a UnicodeDecodeError, leaving me cursing aloud in Russian. It is
not only less forgiving than Java, it is also more treacherous, because
the errors only surface at run time. It would surely help preserve my
zen if Python could just flow through the nodes of this graph. Garbage
is okay - I can clean it up or remove it if it gets in the way; just
don't disrupt my flow or tell me that now I have to deal with
UnicodeDecode errors. Because I don't want to.

> If you can't adequately
> decode everything, either backslash-escape the rest, or use a
> replacement character; you can't print out those bytes.

Yes. How do I backslash-escape the rest in Python 2? In Python 3 there
is some freaky "surrogateescape" error handler, but what do I do in
Python 2?
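For illustration, here is a minimal sketch of what "backslash the rest"
could look like with a custom decode error handler. It assumes the data
is tried as utf-8 first; the handler name "backslash_bytes" is made up
for this example and is not a built-in. It relies only on
codecs.register_error and should also run unchanged on Python 2.7,
though that is an assumption rather than something tested here.

```python
import codecs


def backslash_bytes(exc):
    """Hypothetical error handler: turn undecodable bytes into \\xNN text."""
    if isinstance(exc, UnicodeDecodeError):
        # exc.object is the original byte string; start/end mark the bad span.
        bad = exc.object[exc.start:exc.end]
        escaped = u"".join(u"\\x%02x" % b for b in bytearray(bad))
        # Return the replacement text and the position to resume decoding at.
        return escaped, exc.end
    raise TypeError("backslash_bytes only handles decode errors")


# Register under an illustrative name so .decode() can refer to it.
codecs.register_error("backslash_bytes", backslash_bytes)

# Invalid bytes are escaped instead of raising UnicodeDecodeError.
raw = b"ok \xff\xfe end"
text = raw.decode("utf-8", "backslash_bytes")

# Valid utf-8 (here Russian text) passes through untouched.
mixed = u"\u041f\u0440\u0438\u0432\u0435\u0442".encode("utf-8") + b"\xff"
mixed_text = mixed.decode("utf-8", "backslash_bytes")
```

Note that this alone is not strictly reversible: a literal backslash
that was already in the input looks the same as an escaped byte, so a
truly lossless round trip would also need to escape backslashes.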
A replacement character is not a solution, because it loses data: if I
want to post-process the graph log, I won't be able to recover the
missing bits.

> And no, I will not cc you. Subscribe to the list if you're going to
> ask a question.

Added Mailman to my suxx tracker:
https://github.com/techtonik/suxx-tracker#mailman
-- 
anatoly t.