


Re: "More About Unicode in Python 2 and 3"

References <lablra$1mc$2@ger.gmane.org> <labmaj$8u2$1@ger.gmane.org> <lad05k$gf6$1@ger.gmane.org> <CAPTjJmqBeoTLxXiKVcsvk395qgKt+Qv+jF_sOpzi7CgZmBjQcw@mail.gmail.com> <52CA13BD.4050708@stoneleaf.us>
Date 2014-01-06 13:55 +1100
Subject Re: "More About Unicode in Python 2 and 3"
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.5001.1388976943.18130.python-list@python.org>



On Mon, Jan 6, 2014 at 1:23 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> The metadata fields are simple ascii, and in Py2 something like `if
> header[FIELD_TYPE] == 'C'` did the job just fine.  In Py3 that compares an
> int (67) to the unicode letter 'C' and returns False.  For me this is simply
> a major annoyance, but I only have a handful of places where I have to deal
> with this.  Dealing with protocols where bytes is the norm and embedded
> ascii is prevalent -- well, I can easily imagine the nightmare.
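A minimal Python 3 sketch of the behaviour described above. The record `b"NAMEC"` and the offset `FIELD_TYPE = 4` are hypothetical stand-ins for a real header layout; the point is only that indexing a `bytes` object yields an `int`, so comparing it with the one-character string `'C'` is always False.

```python
# Hypothetical header record and offset, for illustration only.
header = b"NAMEC"
FIELD_TYPE = 4

# On Python 3, indexing bytes returns an int (the byte's value).
print(header[FIELD_TYPE])              # 67
# Comparing an int with a str never raises; it simply returns False.
print(header[FIELD_TYPE] == 'C')       # False
# Comparing against the ASCII code of 'C' works.
print(header[FIELD_TYPE] == ord('C'))  # True
```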

It can't be both things. It's either bytes or it's text. If it's text,
then decoding it as ascii will give you a Unicode string; if it's
small unsigned integers that just happen to correspond to ASCII
values, then I would say the right thing to do is integer constants -
or, in Python 3.4, an integer enumeration:

>>> socket.AF_INET
<AddressFamily.AF_INET: 2>
>>> socket.AF_INET == 2
True
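Both options from the paragraph above can be sketched as follows, assuming Python 3.4 or later (when the `enum` module was added). The `FieldType` enumeration, its `NUMERIC` member, the record and the offset are all hypothetical; an `IntEnum` member compares equal to the raw byte value, and decoding the whole header treats it as text instead.

```python
from enum import IntEnum

class FieldType(IntEnum):
    # Hypothetical field-type codes: each value is the ASCII code
    # of the letter stored in the header's type byte.
    CHAR = ord('C')
    NUMERIC = ord('N')

header = b"NAMEC"  # hypothetical record
FIELD_TYPE = 4     # hypothetical offset of the type byte

# Option 1: treat the byte as a small integer and look it up by value.
field_type = FieldType(header[FIELD_TYPE])
print(field_type is FieldType.CHAR)          # True
print(header[FIELD_TYPE] == FieldType.CHAR)  # True

# Option 2: if the header really is text, decode it first.
print(header.decode('ascii')[FIELD_TYPE] == 'C')  # True
```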

I'm not sure what a FIELD_TYPE of 'C' means, but my guess is that it's a
CHAR field. I'd just use that as the constant's name, something like:

CHAR = b'C'[0]

if header[FIELD_TYPE] == CHAR:
    pass  # handle char field

If nothing else, this would reduce the number of places where you
actually have to handle this. Plus, the code above will work on many
versions of Python: the b'' prefix has been accepted since Python 2.6.
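A self-contained sketch of the named-constant approach, with the record, offset and helper names (`is_char_field`, `is_char_field_slice`) invented for illustration. `b'C'[0]` is `67` on Python 3 and `'C'` on Python 2, so it matches whatever indexing returns on either version. The second helper swaps in a different, commonly used technique: slicing a single byte, which yields a length-1 bytes object on both versions.

```python
# b'C'[0] is 67 on Python 3 and 'C' on Python 2, matching what
# header[i] returns on each version.
CHAR = b'C'[0]

header = b"NAMEC"  # hypothetical record
FIELD_TYPE = 4     # hypothetical offset of the type byte

def is_char_field(header, offset):
    # Compare the indexed element against the named constant.
    return header[offset] == CHAR

def is_char_field_slice(header, offset):
    # Alternative: a one-byte slice is bytes on Py3 (str on Py2),
    # so it can be compared with a b'' literal on both.
    return header[offset:offset + 1] == b'C'

print(is_char_field(header, FIELD_TYPE))        # True
print(is_char_field_slice(header, FIELD_TYPE))  # True
```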

ChrisA



Thread

Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-06 13:55 +1100
  Re: "More About Unicode in Python 2 and 3" Roy Smith <roy@panix.com> - 2014-01-05 23:24 -0500
    Re: "More About Unicode in Python 2 and 3" Tim Chase <python.list@tim.thechases.com> - 2014-01-05 22:41 -0600
      Re: "More About Unicode in Python 2 and 3" Roy Smith <roy@panix.com> - 2014-01-05 23:49 -0500
        Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-06 15:59 +1100
    Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-06 15:51 +1100
    Re: "More About Unicode in Python 2 and 3" Tim Chase <python.list@tim.thechases.com> - 2014-01-06 05:49 -0600
    Re: "More About Unicode in Python 2 and 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-07 03:24 +1100
      Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-07 03:30 +1100
    Re: "More About Unicode in Python 2 and 3" Serhiy Storchaka <storchaka@gmail.com> - 2014-01-06 22:20 +0200
    Re: "More About Unicode in Python 2 and 3" Serhiy Storchaka <storchaka@gmail.com> - 2014-01-06 22:21 +0200
    Re: "More About Unicode in Python 2 and 3" Tim Chase <python.list@tim.thechases.com> - 2014-01-06 14:42 -0600
    Re: "More About Unicode in Python 2 and 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-06 20:47 +0000
    Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-07 10:06 +1100
