Re: "More About Unicode in Python 2 and 3"

Date Mon, 6 Jan 2014 13:55:34 +1100
Subject Re: "More About Unicode in Python 2 and 3"
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python


On Mon, Jan 6, 2014 at 1:23 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> The metadata fields are simple ascii, and in Py2 something like `if
> header[FIELD_TYPE] == 'C'` did the job just fine.  In Py3 that compares an
> int (67) to the unicode letter 'C' and returns False.  For me this is simply
> a major annoyance, but I only have a handful of places where I have to deal
> with this.  Dealing with protocols where bytes is the norm and embedded
> ascii is prevalent -- well, I can easily imagine the nightmare.

It can't be both things. It's either bytes or it's text. If it's text,
then decoding it as ASCII will give you a Unicode string; if it's
small unsigned integers that just happen to correspond to ASCII
values, then I would say the right thing to do is integer constants,
or, in Python 3.4, an integer enumeration:

>>> socket.AF_INET
<AddressFamily.AF_INET: 2>
>>> socket.AF_INET == 2
True
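
A minimal sketch of the same idea applied to a header byte, for Python 3.4 or later. The FieldType class, its members, the header contents and the FIELD_TYPE offset are all made up for illustration; none of them come from Ethan's code:

```python
from enum import IntEnum

class FieldType(IntEnum):
    # Hypothetical field-type codes; each value is the ASCII code of a letter.
    CHAR = ord('C')
    NUMERIC = ord('N')

header = b'NAME\x00C'   # made-up header bytes
FIELD_TYPE = 5          # hypothetical offset of the type byte

# On Python 3, indexing bytes yields an int, which maps straight onto the enum.
kind = FieldType(header[FIELD_TYPE])

if kind is FieldType.CHAR:
    pass  # handle char field
```

Because IntEnum members are also ints, `kind == 67` still holds, so older integer comparisons keep working.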

I'm not sure what FIELD_TYPE of 'C' means, but my guess is that it's a
CHAR field. I'd just have that as the name, something like:

CHAR = b'C'[0]

if header[FIELD_TYPE] == CHAR:
    # handle char field

If nothing else, this would reduce the number of places where you
actually have to handle the bytes-versus-int difference. Plus, the code
above will work on both Python 2 and Python 3, since the b'' prefix is
accepted from 2.6 onward.
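
A small sketch of why that constant compares correctly on both major versions. The header bytes and the FIELD_TYPE offset are invented for illustration, and the decode step at the end is only an alternative for the case where the field really is text:

```python
# Indexing bytes gives a one-character str on Python 2 and an int on Python 3,
# so build the constant with the same indexing step used to read the field.
CHAR = b'C'[0]

header = b'NAME\x00C'   # made-up header bytes
FIELD_TYPE = 5          # hypothetical offset of the type byte

# 'C' == 'C' on Python 2, 67 == 67 on Python 3.
is_char = header[FIELD_TYPE] == CHAR

# If the field is really text, slice (not index) and decode, then compare to a str.
type_letter = header[FIELD_TYPE:FIELD_TYPE + 1].decode('ascii')
```

Slicing rather than indexing keeps a bytes object on both versions, which is what makes the decode call possible.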

ChrisA
