From: Chris Angelico
Newsgroups: comp.lang.python
To: python-list@python.org
Date: Sat, 11 Jun 2011 08:07:04 +1000
Subject: Re: the stupid encoding problem to stdout

2011/6/11 Sérgio Monteiro Basto:
> ok after thinking about this, this problem exist because Python want be
> smart with ttys

The *anomaly* (not problem) exists because Python has a way of being told a target encoding. If two parties agree on an encoding, they can send characters to each other. I had this discussion at work a while ago; my boss was talking about being "binary-safe" (which really meant "8-bit safe"), while I was saying that we should support, verify, and demand properly-formed UTF-8.
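
As a rough illustration of the difference between "8-bit safe" and "properly-formed UTF-8", here is a minimal Python 3 sketch. The helper name `require_utf8` is made up for this example; it relies only on the strict error handling of `bytes.decode`, and is not code from the discussion at work.

```python
def require_utf8(data: bytes) -> str:
    """Decode data as UTF-8, refusing anything that is not well-formed."""
    # errors="strict" is the default; it is spelled out to make the demand explicit.
    return data.decode("utf-8", errors="strict")


good = "Sérgio".encode("utf-8")    # bytes in the encoding both sides agreed on
bad = "Sérgio".encode("latin-1")   # passes an "8-bit safe" channel, but is not UTF-8

text = require_utf8(good)
print("accepted:", len(text), "characters")

try:
    require_utf8(bad)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

A channel that is only "8-bit safe" would pass both byte strings through untouched; only the check against the agreed encoding notices that the second one is not what was promised.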
The main significance is that agreeing on an encoding means we can change the encoding any time it's convenient, without having to document that we've changed the data - because we haven't.

I can take the number "twelve thousand three hundred and forty-five" and render that as a string of decimal digits as "12345", or as hexadecimal digits as "3039", but I haven't changed the number. If you know that I'm giving you a string of decimal digits, and I give you "12345", you will get the same number at the far side.

Python has agreed with stdout that it will send it characters encoded in UTF-8. Having made that agreement, Python and stdout can happily communicate in characters, not bytes. You don't need to explicitly encode your characters into bytes - and in fact, this would be a very bad thing to do, because you don't know _what_ encoding stdout is using. If it's expecting UTF-16, you'll get a whole lot of rubbish if you send it UTF-8 - but it'll look fine if you send it Unicode.

Chris Angelico
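
To make the two analogies above concrete, here is a rough Python 3 sketch. The `send` helper, the sample string, and the choice of UTF-8 and UTF-16 are assumptions made for illustration; an in-memory `io.TextIOWrapper` stands in for a stdout that has been told its target encoding.

```python
import io
import sys

# One number, two renderings: the digits differ, the number does not.
n = 12345
as_decimal = format(n, "d")   # "12345"
as_hex = format(n, "x")       # "3039"
assert int(as_decimal, 10) == int(as_hex, 16) == n


def send(text: str, encoding: str) -> bytes:
    """Write characters to a text stream and return the bytes it produced.

    Hypothetical helper: the stream plays the role of stdout, and
    `encoding` is the agreement between Python and the far side.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)        # characters go in; the wrapper does the encoding
    stream.flush()
    return raw.getvalue()


text = "S\u00e9rgio"
utf8_bytes = send(text, "utf-8")
utf16_bytes = send(text, "utf-16")

# Different bytes on the wire, same characters at the far side.
assert utf8_bytes != utf16_bytes
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16") == text

# Reading the bytes under the wrong agreement gives rubbish.
assert utf8_bytes.decode("latin-1") != text

print("this stdout was told:", sys.stdout.encoding)
print(utf8_bytes)
print(utf16_bytes)
```

The caller of `send` never touches bytes; swapping the encoding changes what goes over the wire without changing the data, which is the point of handing `print` a string rather than pre-encoded bytes.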