From: Chris Angelico
Cc: "python-list@python.org"
Newsgroups: comp.lang.python
Subject: Re: Python 3.2 has some deadly infection
Date: Fri, 6 Jun 2014 03:36:14 +1000

On Fri, Jun 6, 2014 at 2:54 AM, Rustom Mody wrote:
> On Thursday, June 5, 2014 9:42:28 PM UTC+5:30, Chris Angelico wrote:
>> On Fri, Jun 6, 2014 at 1:33 AM, Steven D'Aprano wrote:
>> > In the Unix world, text formats and text
>> > processing is much more common in user-space apps than binary processing.
>> > Perhaps the definitive explanation and celebration of the Unix way is
>> > Eric Raymond's "The Art Of Unix Programming":
>> > http://www.catb.org/esr/writings/taoup/html/ch05s01.html
>
>> Specifically, this from the opening paragraph:
>> """
>> Text streams are a valuable universal format because they're easy for
>> human beings to read, write, and edit without specialized tools. These
>> formats are (or can be designed to be) transparent.
>> """
>
> A fact that stops being true when you tie up text with encodings.
> For two reasons:
>
> 1. The function/pair encode/decode mapping between byte-string and text
> cannot be a bijection because the byte-string set is larger than the text
> set. This is the error that Armin was hit by
>
> 2. Since there is not one but a zillion encodings possible we are not
> talking of one (possibly universal) data structure but a zillion
> ones: "Text streams are a universal format" - which encoding-ed
> form of text??

As soon as you store or transmit ANY form of information, you need to
worry about encodings. Ever heard of this thing called "network byte
order"? It's part of taming the wilds of integer encodings. The theory
is that the LC environment variables will carry all that crucial
out-of-band information about encodings, and while the practice isn't
perfect, it does still mean that there is such a thing as a text
stream.

ChrisA
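
To make the comparison concrete, here is a minimal sketch (not part of
the original exchange) of how integers and text both need an agreed
encoding once they become bytes. The sample values 1025 and "naive"
with a diaeresis are arbitrary choices for illustration; only the
standard library's struct module and the built-in str/bytes codecs are
used.

```python
import struct

# --- Integers: the same number has more than one byte encoding ---
n = 1025  # 0x00000401, an arbitrary sample value

big = struct.pack("!I", n)     # "!" = network byte order (big-endian)
little = struct.pack("<I", n)  # "<" = little-endian

# Reading network-order bytes with the wrong convention does not fail;
# it quietly produces a different integer.
wrong = struct.unpack("<I", big)[0]

# --- Text: the same string has more than one byte encoding ---
text = "na\u00efve"  # "naive" with a diaeresis on the i

utf8 = text.encode("utf-8")      # two bytes for the accented letter
latin1 = text.encode("latin-1")  # one byte for the accented letter

# Decoding UTF-8 bytes as Latin-1 also "succeeds", but yields mojibake.
misread = utf8.decode("latin-1")

# --- Not every byte string is text in a given encoding ---
# This is the non-bijection point from the quoted message: a lone 0xFF
# byte is not valid UTF-8, so a strict decode rejects it.
try:
    b"\xff".decode("utf-8")
    invalid_rejected = False
except UnicodeDecodeError:
    invalid_rejected = True

print(big, little, wrong)
print(utf8, latin1, misread)
print(invalid_rejected)
```

In both halves the bytes alone do not say how to read them; the
convention (network byte order, or the encoding named out of band by
the locale) has to travel separately.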