

Groups > comp.lang.python > #63261 > unrolled thread

Re: "More About Unicode in Python 2 and 3"

Started by: Ethan Furman <ethan@stoneleaf.us>
First post: 2014-01-05 18:23 -0800
Last post: 2014-01-07 12:05 +1100
Articles: 6 (5 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: "More About Unicode in Python 2 and 3" Ethan Furman <ethan@stoneleaf.us> - 2014-01-05 18:23 -0800
    Re: "More About Unicode in Python 2 and 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-07 03:43 +1100
      Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-07 03:54 +1100
      Re: "More About Unicode in Python 2 and 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-06 17:07 +0000
      Re: "More About Unicode in Python 2 and 3" Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-01-06 19:23 -0500
      Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-07 12:05 +1100

#63261 — Re: "More About Unicode in Python 2 and 3"

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2014-01-05 18:23 -0800
Subject: Re: "More About Unicode in Python 2 and 3"
Message-ID: <mailman.5000.1388976376.18130.python-list@python.org>

On 01/05/2014 05:48 PM, Chris Angelico wrote:
> On Mon, Jan 6, 2014 at 12:16 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
>> So now we have two revered developers vocally having trouble with Python 3.
>> You can dismiss their concerns as niche because it's only network
>> programming, but that would be a mistake.
>
> IMO, network programming (at least on the internet) is even more Py3's
> domain (pun not intended).

The issue is not how to handle text, the issue is how to handle ascii when it's in a bytes object.

Using my own project [1] as a reference:  good ol' dbf files -- character fields, numeric fields, logic fields, time 
fields, and of course the metadata that describes these fields and the dbf as a whole.  The character fields I turn into 
unicode, no sweat.  The metadata fields are simple ascii, and in Py2 something like `if header[FIELD_TYPE] == 'C'` did 
the job just fine.  In Py3 that compares an int (67) to the unicode letter 'C' and returns False.  For me this is simply 
a major annoyance, but I only have a handful of places where I have to deal with this.  Dealing with protocols where 
bytes is the norm and embedded ascii is prevalent -- well, I can easily imagine the nightmare.
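
A minimal sketch of the behavior described above, assuming Python 3. The `header` bytes and the `FIELD_TYPE = 11` offset are made up for illustration and are not taken from the dbf project:

```python
# Hypothetical 12-byte field descriptor; the layout is illustrative only.
header = b"NAME\x00\x00\x00\x00\x00\x00\x00C"
FIELD_TYPE = 11  # assumed offset of the type byte

type_byte = header[FIELD_TYPE]

# In Python 3, indexing a bytes object yields an int, not a 1-byte string.
index_is_int = isinstance(type_byte, int)

# So the Py2-style comparison against 'C' is silently False...
matches_str = (type_byte == 'C')

# ...while comparing against the ASCII code point succeeds.
matches_int = (type_byte == 67)
```

In Python 2 the same index returns the one-character `str` `'C'`, which is why the original comparison worked there.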

The most unfortunate aspect is that even if we did "fix" it in 3.5, it wouldn't help anybody who has to support 
multiple versions... unless, of course, a backport could also be made.

--
~Ethan~



#63297

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-01-07 03:43 +1100
Message-ID: <52cadd49$0$29999$c3e8da3$5496439d@news.astraweb.com>
In reply to: #63261

Ethan Furman wrote:

> Using my own project [1] as a reference:  good ol' dbf files -- character
> fields, numeric fields, logic fields, time fields, and of course the
> metadata that describes these fields and the dbf as a whole.  The
> character fields I turn into unicode, no sweat.  The metadata fields are
> simple ascii, and in Py2 something like `if header[FIELD_TYPE] == 'C'` did
> the job just fine.  In Py3 that compares an int (67) to the unicode letter
> 'C' and returns False.  

Why haven't you converted the headers to text too? You're using them as if
they were text. They might happen to merely contain the small subset of
Unicode which matches the ASCII encoding, but that in itself is no good
reason to keep it as bytes. If you want to work with stuff as if it were
text, convert it to text.
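
One way the "convert it to text" suggestion might look, as a sketch only. The `raw_header` bytes and the offset are hypothetical, and it assumes the header really is pure ASCII:

```python
# Hypothetical descriptor bytes; assumed to contain ASCII only.
raw_header = b"NAME\x00\x00\x00\x00\x00\x00\x00C"
FIELD_TYPE = 11  # assumed offset of the type character

# Decode once; each ASCII byte maps to the code point with the same value.
text_header = raw_header.decode("ascii")

# Indexing a str yields a 1-character str, so the Py2-style test works again.
is_character_field = (text_header[FIELD_TYPE] == 'C')
```

If the header also carried binary bytes above 127, `decode("ascii")` would raise `UnicodeDecodeError`, which is one situation where keeping the data as bytes is the better choice.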

If you do have a good reason for keeping them as bytes, say because you need
to do a bunch of bitwise operations on it, it's not that hard to do the job
correctly: instead of defining FIELD_TYPE as 3 (for example), define it as
slice(3,4). Then:

    if header[FIELD_TYPE] == b'C':

will work. For sure, this is a bit of a nuisance, and slightly error-prone,
since Python won't complain if you forget the b prefix, it will silently
return False. Which is the right thing to do, inconvenient though it may be
in this case. But it is workable, with a bit of discipline.
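
A runnable sketch of the slice approach, assuming Python 3 and a made-up header (the bytes and the `slice(11, 12)` offset are illustrative):

```python
# Hypothetical descriptor bytes; the layout is illustrative only.
header = b"NAME\x00\x00\x00\x00\x00\x00\x00C"
FIELD_TYPE = slice(11, 12)  # one-byte slice instead of a plain index

# Slicing bytes yields bytes, so the result is b'C' rather than the int 67.
field = header[FIELD_TYPE]

with_prefix = (field == b'C')    # bytes compared to bytes
without_prefix = (field == 'C')  # bytes compared to str: always unequal
```

Running the interpreter with the `-b` option makes the forgotten prefix easier to catch, since it issues a `BytesWarning` when bytes are compared with str.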

Or define a helper, and use that:

    def eq(byte, char):
        return byte == ord(char)


    if eq(header[FIELD_TYPE], 'C'):


Worried about the cost of all those function calls, all those ord()'s? I'll
give you the benefit of the doubt and assume that this is not premature
optimisation. So do it yourself:

    C = ord('C')  # Convert it once.
    if header[FIELD_TYPE] == C:  # And use it many times.
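
The convert-once idea extends to several type codes. A sketch under assumptions: the names `CHAR`, `NUMERIC`, `LOGICAL`, `KINDS`, and `field_kind` are hypothetical, as are the header bytes and offset:

```python
# Hypothetical descriptor bytes; the layout is illustrative only.
header = b"NAME\x00\x00\x00\x00\x00\x00\x00C"
FIELD_TYPE = 11  # assumed offset of the type byte

# Convert each ASCII type code to its int value once, at import time.
CHAR, NUMERIC, LOGICAL = (ord(c) for c in "CNL")

# Map int codes to readable names so call sites never touch ord().
KINDS = {CHAR: "character", NUMERIC: "numeric", LOGICAL: "logical"}

def field_kind(descriptor):
    """Return a name for the descriptor's type byte, or 'unknown'."""
    return KINDS.get(descriptor[FIELD_TYPE], "unknown")

kind = field_kind(header)
```

The comparison stays int against int, so no per-call conversion is needed and nothing silently compares unequal because of a missing `b` prefix.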


[Note to self: when I'm BDFL, encourage much more compile-time
optimisations.]


> For me this is simply a major annoyance, but I 
> only have a handful of places where I have to deal with this.  Dealing
> with protocols where bytes is the norm and embedded ascii is prevalent --
> well, I can easily imagine the nightmare.

Is it one of those nightmares where you're being chased down an endless long
corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
terror...


-- 
Steven



#63301

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-01-07 03:54 +1100
Message-ID: <mailman.5032.1389027302.18130.python-list@python.org>
In reply to: #63297

On Tue, Jan 7, 2014 at 3:43 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>> For me this is simply a major annoyance, but I
>> only have a handful of places where I have to deal with this.  Dealing
>> with protocols where bytes is the norm and embedded ascii is prevalent --
>> well, I can easily imagine the nightmare.
>
> Is it one of those nightmares where you're being chased down an endless long
> corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
> terror...

Uhh, I think you're the only one here who has that nightmare, like
Chris Knight with his sun-god robes and naked women throwing pickles
at him.

ChrisA



#63305

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-01-06 17:07 +0000
Message-ID: <mailman.5034.1389028055.18130.python-list@python.org>
In reply to: #63297

On 06/01/2014 16:43, Steven D'Aprano wrote:
> Ethan Furman wrote:
>
>> For me this is simply a major annoyance, but I
>> only have a handful of places where I have to deal with this.  Dealing
>> with protocols where bytes is the norm and embedded ascii is prevalent --
>> well, I can easily imagine the nightmare.
>
> Is it one of those nightmares where you're being chased down an endless long
> corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
> terror...
>

Great minds think alike? :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence



#63382

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2014-01-06 19:23 -0500
Message-ID: <mailman.5099.1389054215.18130.python-list@python.org>
In reply to: #63297

On Tue, 7 Jan 2014 03:54:53 +1100, Chris Angelico <rosuav@gmail.com>
declaimed the following:

>On Tue, Jan 7, 2014 at 3:43 AM, Steven D'Aprano
><steve+comp.lang.python@pearwood.info> wrote:
>>> For me this is simply a major annoyance, but I
>>> only have a handful of places where I have to deal with this.  Dealing
>>> with protocols where bytes is the norm and embedded ascii is prevalent --
>>> well, I can easily imagine the nightmare.
>>
>> Is it one of those nightmares where you're being chased down an endless long
>> corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
>> terror...
>
	The kitten's father is Kzin?

>Uhh, I think you're the only one here who has that nightmare, like
>Chris Knight with his sun-god robes and naked women throwing pickles
>at him.
>

	Will somebody please wash out my brain... "Pickles straight from the
jar, or somewhat 'used'?"
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/



#63391

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-01-07 12:05 +1100
Message-ID: <mailman.5103.1389056713.18130.python-list@python.org>
In reply to: #63297

On Tue, Jan 7, 2014 at 11:23 AM, Dennis Lee Bieber
<wlfraed@ix.netcom.com> wrote:
>>Uhh, I think you're the only one here who has that nightmare, like
>>Chris Knight with his sun-god robes and naked women throwing pickles
>>at him.
>>
>
>         Will somebody please wash out my brain... "Pickles straight from the
> jar, or somewhat 'used'?"

I was making a reference to the movie "Real Genius", which involves
lasers, popcorn, and geeks. And it's been explored by Mythbusters. If
you haven't seen it, do!

ChrisA




