From: Ethan Furman
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: "More About Unicode in Python 2 and 3"
Date: Sun, 05 Jan 2014 18:23:57 -0800

On 01/05/2014 05:48 PM, Chris Angelico wrote:
> On Mon, Jan 6, 2014 at 12:16 PM, Ned Batchelder wrote:
>> So now we have two revered developers vocally having trouble with Python 3.
>> You can dismiss their concerns as niche because it's only network
>> programming, but that would be a mistake.
>
> IMO, network programming (at least on the internet) is even more Py3's
> domain (pun not intended).

The issue is not how to handle text, the issue is how to handle ascii when it's in a bytes object.

Using my own project [1] as a reference: good ol' dbf files -- character fields, numeric fields, logic fields, time fields, and of course the metadata that describes these fields and the dbf as a whole.

The character fields I turn into unicode, no sweat. The metadata fields are simple ascii, and in Py2 something like `if header[FIELD_TYPE] == 'C'` did the job just fine. In Py3 that compares an int (67) to the unicode letter 'C' and returns False. For me this is simply a major annoyance, but I only have a handful of places where I have to deal with this. Dealing with protocols where bytes is the norm and embedded ascii is prevalent -- well, I can easily imagine the nightmare.

The most unfortunate aspect is that even if we did "fix" it in 3.5, it wouldn't help anybody who has to support multiple versions... unless, of course, a backport could also be made.

--
~Ethan~
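
For anyone who wants to see the comparison problem concretely, here is a minimal sketch. The 32-byte descriptor and the `FIELD_TYPE` offset are made up for illustration and are not copied from the project; only the `header[FIELD_TYPE] == 'C'` test comes from the post. The slice and decode forms should behave the same on Py2 and Py3, while the `ord()` form is Py3-only.

```python
# Hypothetical field descriptor: an 11-byte name, one type byte, padding.
# The layout and the FIELD_TYPE offset are assumptions for illustration.
FIELD_TYPE = 11
header = b'NAME'.ljust(11, b'\x00') + b'C' + b'\x00' * 20

# Py3: indexing bytes yields an int, so this is 67 == 'C', which is False.
# Py2: indexing a str yields a one-character str, so the same test is True.
naive = header[FIELD_TYPE] == 'C'

# Slicing keeps the bytes type, so this compares bytes to bytes.
by_slice = header[FIELD_TYPE:FIELD_TYPE + 1] == b'C'

# Decoding the one-byte slice gives text that can be compared to 'C'.
by_decode = header[FIELD_TYPE:FIELD_TYPE + 1].decode('ascii') == 'C'

# Py3 only: compare the int against the code point instead.
by_ord = header[FIELD_TYPE] == ord('C')

print(naive, by_slice, by_decode, by_ord)
```

The slice form is probably the least intrusive change when a single code base has to run on both versions, since `b'C'` is just a plain str literal on Py2.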