| Newsgroups | comp.lang.python |
|---|---|
| Date | 2014-01-08 08:08 -0800 |
| References | <52cbe15a$0$29993$c3e8da3$5496439d@news.astraweb.com> <mailman.5135.1389107956.18130.python-list@python.org> <52cc278c$0$29979$c3e8da3$5496439d@news.astraweb.com> <lahll7$f9c$1@ger.gmane.org> <mailman.5162.1389179166.18130.python-list@python.org> |
| Message-ID | <7d2d5d85-afa2-474d-8739-c33745b7c00b@googlegroups.com> |
| Subject | Re: Bytes indexing returns an int |
| From | wxjmfauth@gmail.com |
On Wednesday, 8 January 2014 12:05:49 UTC+1, Robin Becker wrote:
> On 07/01/2014 19:48, Serhiy Storchaka wrote:
> ........
> > data[0] == b'\xE1'[0] works as expected in both Python 2.7 and 3.x.
>
> I have been porting a lot of python 2 only code to a python2.7 + 3.3 version for
> a few months now. Bytes indexing was a particular problem. PDF uses quite a lot
> of single byte indicators so code like
>
>     if text[k] == 'R':
>         .....
>
> or
>
>     dispatch_dict.get(text[k],error)()
>
> is much harder to make compatible because of this issue. I think this change was
> a mistake.
>
> To get round this I have tried the following class to resurrect the old style
> behaviour
>
>     if isPy3:
>         class RLBytes(bytes):
>             '''simply ensures that B[x] returns a bytes type object and not an int'''
>             def __getitem__(self,x):
>                 if isinstance(x,int):
>                     return RLBytes([bytes.__getitem__(self,x)])
>                 else:
>                     return RLBytes(bytes.__getitem__(self,x))
>
> I'm not sure if that covers all possible cases, but it works for my dispatching
> cases. Unfortunately you can't do simple class assignment to change the
> behaviour so you have to copy the text.
>
> I find a lot of the "so glad we got rid of byte strings" fervour a bit silly.
> Bytes, chars, words etc etc were around long before unicode. Byte strings could
> already represent unicode in efficient ways that happened to be useful for
> western languages. Having two string types is inconvenient and error prone,
> swapping their labels and making subtle changes is a real pain.
--
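For readers following the quoted porting problem, here is a minimal sketch of one common way to keep single-byte comparisons and dispatch working on both Python 2.7 and 3.x without subclassing: take a one-element slice instead of indexing. The helper `byte_at` and the handler names are hypothetical and only illustrate the idea; this is not the code used in the quoted project.

```python
def byte_at(data, k):
    """Return the byte at index k as a length-1 bytes object.

    Slicing yields bytes on Python 3 and str on Python 2.7,
    whereas data[k] yields an int on Python 3.
    """
    return data[k:k + 1]


# Hypothetical handlers, for illustration only.
def handle_ref():
    return "ref"


def error():
    return "error"


text = b"0 0 R"
k = 4

# Comparison against a single-byte literal.
is_ref = byte_at(text, k) == b"R"

# Dispatch keyed on single-byte values.
dispatch_dict = {b"R": handle_ref}
result = dispatch_dict.get(byte_at(text, k), error)()
```

Unlike indexing, a slice past the end of the data returns an empty bytes object rather than raising IndexError, so bounds checks may need to be made explicit.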
Byte strings (encoded code points) versus native unicode is one
thing.
The real problem, however, lies elsewhere. These very
talented, ASCII-narrow-minded, unicode-illiterate devs only
succeeded in producing this (I really do not wish to be rude).
>>> import sys
>>> import timeit
>>> import unicodedata
>>> unicodedata.name('ǟ')
'LATIN SMALL LETTER A WITH DIAERESIS AND MACRON'
>>> sys.getsizeof('a')
26
>>> sys.getsizeof('ǟ')
40
>>> timeit.timeit("unicodedata.normalize('NFKD', 'ǟ')", "import unicodedata")
0.8040018888575129
>>> timeit.timeit("unicodedata.normalize('NFKD', 'zzz')", "import unicodedata")
0.3073749330963995
>>> timeit.timeit("unicodedata.normalize('NFKD', 'z')", "import unicodedata")
0.2874013282653962
>>>
>>> timeit.timeit("len(unicodedata.normalize('NFKD', 'zzz'))", "import unicodedata")
0.3803570633857589
>>> timeit.timeit("len(unicodedata.normalize('NFKD', 'ǟ'))", "import unicodedata")
0.9359970320201683
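As a rough aid for reading the session above, the following sketch shows what each call is actually operating on: the code points that NFKD normalization produces, next to the object size of the original string. The helper name `describe` is hypothetical, and the reported sizes are CPython implementation details that vary by version and platform.

```python
import sys
import unicodedata


def describe(s):
    """Return (hex code points of the NFKD form of s, sys.getsizeof(s))."""
    decomposed = unicodedata.normalize('NFKD', s)
    return [hex(ord(c)) for c in decomposed], sys.getsizeof(s)


# '\u01df' is LATIN SMALL LETTER A WITH DIAERESIS AND MACRON.
for s in ('z', 'zzz', '\u01df'):
    code_points, size = describe(s)
    print(repr(s), code_points, size)
```

The ASCII inputs come back unchanged, while U+01DF is decomposed into a base letter followed by two combining marks, so the timed calls are not doing equivalent work.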
Keep pdf, typography, linguistics, scripts, ... in mind; in other words, the real
*unicode* world.
jmf