Groups > comp.lang.python > #63261 > unrolled thread
| Started by | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| First post | 2014-01-05 18:23 -0800 |
| Last post | 2014-01-07 12:05 +1100 |
| Articles | 6 — 5 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: "More About Unicode in Python 2 and 3" Ethan Furman <ethan@stoneleaf.us> - 2014-01-05 18:23 -0800
Re: "More About Unicode in Python 2 and 3" Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-07 03:43 +1100
Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-07 03:54 +1100
Re: "More About Unicode in Python 2 and 3" Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-06 17:07 +0000
Re: "More About Unicode in Python 2 and 3" Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-01-06 19:23 -0500
Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-07 12:05 +1100
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2014-01-05 18:23 -0800 |
| Subject | Re: "More About Unicode in Python 2 and 3" |
| Message-ID | <mailman.5000.1388976376.18130.python-list@python.org> |
On 01/05/2014 05:48 PM, Chris Angelico wrote:
> On Mon, Jan 6, 2014 at 12:16 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
>> So now we have two revered developers vocally having trouble with Python 3.
>> You can dismiss their concerns as niche because it's only network
>> programming, but that would be a mistake.
>
> IMO, network programming (at least on the internet) is even more Py3's
> domain (pun not intended).
The issue is not how to handle text, the issue is how to handle ascii when
it's in a bytes object.
Using my own project [1] as a reference: good ol' dbf files -- character
fields, numeric fields, logic fields, time fields, and of course the
metadata that describes these fields and the dbf as a whole. The
character fields I turn into unicode, no sweat. The metadata fields are
simple ascii, and in Py2 something like `if header[FIELD_TYPE] == 'C'` did
the job just fine. In Py3 that compares an int (67) to the unicode letter
'C' and returns False. For me this is simply a major annoyance, but I
only have a handful of places where I have to deal with this. Dealing
with protocols where bytes is the norm and embedded ascii is prevalent --
well, I can easily imagine the nightmare.
The most unfortunate aspect is that even if we did "fix" it in 3.5, it
wouldn't help anybody who has to support multiple versions... unless, of
course, a backport could also be made.
--
~Ethan~
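The behaviour described in this post can be reproduced with a short sketch. The four-byte header and the offset 3 are invented for illustration and are not taken from the dbf project mentioned above; the point is only that indexing a bytes object in Python 3 yields an int, while slicing yields bytes.

```python
# Hypothetical 4-byte header record; the layout is invented for illustration.
header = b"NAMC"
FIELD_TYPE = 3  # assumed offset of the field-type byte

indexed = header[FIELD_TYPE]                 # an int in Python 3
sliced = header[FIELD_TYPE:FIELD_TYPE + 1]   # a length-1 bytes object

print(type(indexed).__name__, indexed)  # int 67
print(indexed == 'C')                   # False: an int never equals a str
print(sliced == b'C')                   # True
```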
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-01-07 03:43 +1100 |
| Message-ID | <52cadd49$0$29999$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #63261 |
Ethan Furman wrote:
> Using my own project [1] as a reference: good ol' dbf files -- character
> fields, numeric fields, logic fields, time fields, and of course the
> metadata that describes these fields and the dbf as a whole. The
> character fields I turn into unicode, no sweat. The metadata fields are
> simple ascii, and in Py2 something like `if header[FIELD_TYPE] == 'C'` did
> the job just fine. In Py3 that compares an int (67) to the unicode letter
> 'C' and returns False.
Why haven't you converted the headers to text too? You're using them as if
they were text. They might happen to merely contain the small subset of
Unicode which matches the ASCII encoding, but that in itself is no good
reason to keep it as bytes. If you want to work with stuff as if it were
text, convert it to text.
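A minimal sketch of the "convert it to text" suggestion, assuming the header really is pure ASCII. The header contents and the offset are made up for the example; `bytes.decode('ascii')` is strict by default, so stray non-ASCII bytes fail loudly instead of comparing unequal in silence.

```python
raw_header = b"NAMC"   # hypothetical header bytes read from a file
FIELD_TYPE = 3         # assumed offset of the field-type character

# Decode once; after this the header is ordinary text.
header = raw_header.decode('ascii')
print(header[FIELD_TYPE] == 'C')   # True

# A byte outside the ASCII range is rejected at decode time.
try:
    b"NAM\xff".decode('ascii')
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)                    # True
```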
If you do have a good reason for keeping them as bytes, say because you need
to do a bunch of bitwise operations on it, it's not that hard to do the job
correctly: instead of defining FIELD_TYPE as 3 (for example), define it as
slice(3,4). Then:
    if header[FIELD_TYPE] == b'C':
will work. For sure, this is a bit of a nuisance, and slightly error-prone,
since Python won't complain if you forget the b prefix, it will silently
return False. Which is the right thing to do, inconvenient though it may be
in this case. But it is workable, with a bit of discipline.
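The slice approach can be sketched as follows, again with an invented header. It also shows the silent False that results from forgetting the b prefix. As far as I know, running the interpreter with the -b option turns such bytes/str comparisons into warnings, which can help enforce the discipline.

```python
FIELD_TYPE = slice(3, 4)   # a slice instead of the plain index 3
header = b"NAMC"           # hypothetical header bytes

field_type = header[FIELD_TYPE]   # slicing keeps the result as bytes
print(field_type)                 # b'C'
print(field_type == b'C')         # True
print(field_type == 'C')          # False, with no error by default
```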
Or define a helper, and use that:
    def eq(byte, char):
        return byte == ord(char)

    if eq(header[FIELD_TYPE], 'C'):
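Here is the helper in a runnable form, assuming FIELD_TYPE is a plain integer index and using a made-up header. One side effect worth noting: `ord()` accepts only a single character, so a typo such as a two-character string raises TypeError rather than quietly returning False.

```python
def eq(byte, char):
    """Compare one int taken from a bytes object with a one-character str."""
    return byte == ord(char)

header = b"NAMC"   # hypothetical header bytes
FIELD_TYPE = 3     # assumed integer offset

print(eq(header[FIELD_TYPE], 'C'))   # True
print(eq(header[FIELD_TYPE], 'N'))   # False

# ord() rejects anything that is not exactly one character.
try:
    eq(header[FIELD_TYPE], 'CC')
    raised = False
except TypeError:
    raised = True
print(raised)                        # True
```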
Worried about the cost of all those function calls, all those ord()'s? I'll
give you the benefit of the doubt and assume that this is not premature
optimisation. So do it yourself:
    C = ord('C')  # Convert it once.
    if header[FIELD_TYPE] == C:  # And use it many times.
[Note to self: when I'm BDFL, encourage much more compile-time
optimisations.]
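Extending the convert-once idea to several type codes might look like the sketch below. The constant names, the lookup table, and the `field_type_name` function are hypothetical; they stand in for whatever the real dbf code does and only illustrate computing the `ord()` values at import time.

```python
# Computed once at import time; the names are illustrative only.
CHARACTER, NUMERIC, LOGICAL = (ord(c) for c in 'CNL')

TYPE_NAMES = {
    CHARACTER: 'character',
    NUMERIC: 'numeric',
    LOGICAL: 'logical',
}

def field_type_name(header, offset=3):
    """Look up the int at `offset` (an assumed position) in the table."""
    return TYPE_NAMES.get(header[offset], 'unknown')

print(field_type_name(b"NAMC"))   # character
print(field_type_name(b"AGEN"))   # numeric
print(field_type_name(b"XYZ?"))   # unknown
```

The dictionary lookup compares ints with ints, so no `ord()` call or function-call overhead is paid per comparison beyond the lookup itself.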
> For me this is simply a major annoyance, but I
> only have a handful of places where I have to deal with this. Dealing
> with protocols where bytes is the norm and embedded ascii is prevalent --
> well, I can easily imagine the nightmare.
Is it one of those nightmares where you're being chased down an endless long
corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
terror...
--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-07 03:54 +1100 |
| Message-ID | <mailman.5032.1389027302.18130.python-list@python.org> |
| In reply to | #63297 |
On Tue, Jan 7, 2014 at 3:43 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>> For me this is simply a major annoyance, but I
>> only have a handful of places where I have to deal with this. Dealing
>> with protocols where bytes is the norm and embedded ascii is prevalent --
>> well, I can easily imagine the nightmare.
>
> Is it one of those nightmares where you're being chased down an endless long
> corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
> terror...
Uhh, I think you're the only one here who has that nightmare, like
Chris Knight with his sun-god robes and naked women throwing pickles
at him.
ChrisA
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-01-06 17:07 +0000 |
| Message-ID | <mailman.5034.1389028055.18130.python-list@python.org> |
| In reply to | #63297 |
On 06/01/2014 16:43, Steven D'Aprano wrote:
> Ethan Furman wrote:
>
>> For me this is simply a major annoyance, but I
>> only have a handful of places where I have to deal with this. Dealing
>> with protocols where bytes is the norm and embedded ascii is prevalent --
>> well, I can easily imagine the nightmare.
>
> Is it one of those nightmares where you're being chased down an endless long
> corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
> terror...
>
Great minds think alike? :)
--
My fellow Pythonistas, ask not what our language can do for you, ask what
you can do for our language.
Mark Lawrence
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2014-01-06 19:23 -0500 |
| Message-ID | <mailman.5099.1389054215.18130.python-list@python.org> |
| In reply to | #63297 |
On Tue, 7 Jan 2014 03:54:53 +1100, Chris Angelico <rosuav@gmail.com>
declaimed the following:
>On Tue, Jan 7, 2014 at 3:43 AM, Steven D'Aprano
><steve+comp.lang.python@pearwood.info> wrote:
>>> For me this is simply a major annoyance, but I
>>> only have a handful of places where I have to deal with this. Dealing
>>> with protocols where bytes is the norm and embedded ascii is prevalent --
>>> well, I can easily imagine the nightmare.
>>
>> Is it one of those nightmares where you're being chased down an endless long
>> corridor by a small kitten wanting hugs? 'Cos so far I'm not seeing the
>> terror...
>
The kitten's father is Kzin?
>Uhh, I think you're the only one here who has that nightmare, like
>Chris Knight with his sun-god robes and naked women throwing pickles
>at him.
>
Will somebody please wash out my brain... "Pickles straight from the
jar, or somewhat 'used'?"
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-07 12:05 +1100 |
| Message-ID | <mailman.5103.1389056713.18130.python-list@python.org> |
| In reply to | #63297 |
On Tue, Jan 7, 2014 at 11:23 AM, Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:
>>Uhh, I think you're the only one here who has that nightmare, like
>>Chris Knight with his sun-god robes and naked women throwing pickles
>>at him.
>>
>
> Will somebody please wash out my brain... "Pickles straight from the
> jar, or somewhat 'used'?"
I was making a reference to the movie "Real Genius", which involves lasers,
popcorn, and geeks. And it's been explored by Mythbusters. If you haven't
seen it, do!
ChrisA