From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: "More About Unicode in Python 2 and 3"
Date: Mon, 6 Jan 2014 13:55:34 +1100
References: <52CA13BD.4050708@stoneleaf.us>

On Mon, Jan 6, 2014 at 1:23 PM, Ethan Furman wrote:
> The metadata fields are simple ascii, and in Py2 something like `if
> header[FIELD_TYPE] == 'C'` did the job just fine. In Py3 that compares an
> int (67) to the unicode letter 'C' and returns False. For me this is simply
> a major annoyance, but I only have a handful of places where I have to deal
> with this. Dealing with protocols where bytes is the norm and embedded
> ascii is prevalent -- well, I can easily imagine the nightmare.

It can't be both things. It's either bytes or it's text.
If it's text, then decoding it as ASCII will give you a Unicode string;
if it's small unsigned integers that just happen to correspond to ASCII
values, then I would say the right thing to do is integer constants -
or, in Python 3.4, an integer enumeration:

>>> socket.AF_INET
<AddressFamily.AF_INET: 2>
>>> socket.AF_INET == 2
True

I'm not sure what FIELD_TYPE of 'C' means, but my guess is that it's a
CHAR field. I'd just have that as the name, something like:

CHAR = b'C'[0]
if header[FIELD_TYPE] == CHAR:
    # handle char field

If nothing else, this would reduce the number of places where you
actually have to handle this. Plus, the code above will work on many
versions of Python (I'm not sure how far back the b'' prefix is
allowed - probably 2.6).

ChrisA
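
A minimal sketch of the two options described above, for Python 3.4 or
later. The header layout (type code at offset 11), the NUMERIC member
and the helper names are assumptions made up for illustration; they
are not taken from Ethan's code.

```python
from enum import IntEnum

# Hypothetical layout: the field-type code sits at offset 11 of a raw
# header. This offset is an assumption for the sketch only.
FIELD_TYPE = 11


class FieldType(IntEnum):
    # Small integers that happen to coincide with ASCII letters.
    CHAR = b'C'[0]      # 67 on Python 3
    NUMERIC = b'N'[0]   # 78 on Python 3 (hypothetical second type)


def field_kind(header):
    """Bytes view: indexing bytes yields an int, so compare ints."""
    return FieldType(header[FIELD_TYPE])


def field_kind_as_text(header):
    """Text view: slice one byte and decode it to a one-character str."""
    return header[FIELD_TYPE:FIELD_TYPE + 1].decode('ascii')


# Made-up 16-byte header with b'C' at offset 11.
header = b'NAME' + b'\x00' * 7 + b'C' + b'\x00' * 4

if field_kind(header) == FieldType.CHAR:
    pass  # handle char field
```

Because IntEnum members compare equal to plain integers,
`field_kind(header) == 67` also holds, so existing integer comparisons
keep working while the names document what each value means.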