


Bytes indexing returns an int

Started by: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
First post: 2014-01-07 22:13 +1100
Last post: 2014-01-07 16:37 -0800
Articles: 20 on this page of 21, 13 participants



Contents

  Bytes indexing returns an int Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-07 22:13 +1100
    Re: Bytes indexing returns an int Ervin Hegedüs <airween@gmail.com> - 2014-01-07 12:53 +0100
      Re: Bytes indexing returns an int Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-07 23:04 +1100
    Re: Bytes indexing returns an int Terry Reedy <tjreedy@udel.edu> - 2014-01-07 09:29 -0500
    Re: Bytes indexing returns an int David Robinow <drobinow@gmail.com> - 2014-01-07 10:19 -0500
      Re: Bytes indexing returns an int Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-08 03:12 +1100
        Re: Bytes indexing returns an int Serhiy Storchaka <storchaka@gmail.com> - 2014-01-07 21:48 +0200
        Re: Bytes indexing returns an int Robin Becker <robin@reportlab.com> - 2014-01-08 11:05 +0000
          Re: Bytes indexing returns an int wxjmfauth@gmail.com - 2014-01-08 08:08 -0800
            Re: Bytes indexing returns an int Ned Batchelder <ned@nedbatchelder.com> - 2014-01-08 12:19 -0500
              Re: Bytes indexing returns an int Piet van Oostrum <piet@vanoostrum.org> - 2014-01-09 18:05 +0100
                Re: Bytes indexing returns an int Ethan Furman <ethan@stoneleaf.us> - 2014-01-09 09:28 -0800
                Re: Bytes indexing returns an int Serhiy Storchaka <storchaka@gmail.com> - 2014-01-09 21:36 +0200
            Re: Bytes indexing returns an int Michael Torrie <torriem@gmail.com> - 2014-01-08 10:25 -0700
    Re: Bytes indexing returns an int David Robinow <drobinow@gmail.com> - 2014-01-07 10:23 -0500
    Re: Bytes indexing returns an int Ethan Furman <ethan@stoneleaf.us> - 2014-01-07 09:02 -0800
      Re: Bytes indexing returns an int Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-08 11:15 +1100
        Re: Bytes indexing returns an int Chris Angelico <rosuav@gmail.com> - 2014-01-08 11:30 +1100
          Re: Bytes indexing returns an int Grant Edwards <invalid@invalid.invalid> - 2014-01-08 02:34 +0000
            Re: Bytes indexing returns an int Chris Angelico <rosuav@gmail.com> - 2014-01-08 14:46 +1100
        Re: Bytes indexing returns an int Ethan Furman <ethan@stoneleaf.us> - 2014-01-07 16:37 -0800



#63420 — Bytes indexing returns an int

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-01-07 22:13 +1100
Subject: Bytes indexing returns an int
Message-ID: <52cbe15a$0$29993$c3e8da3$5496439d@news.astraweb.com>

Does anyone know what the rationale behind making byte-string indexing
return an int rather than a byte-string of length one?

That is, given b = b'xyz', b[1] returns 121 rather than b'y'.

This is especially surprising when one considers that it's easy to extract
the ordinal value of a byte:

ord(b'y') => 121
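
A minimal sketch of the behaviour in question, assuming Python 3 (the variable names item and piece are illustrative only). It shows that indexing yields an int, slicing yields a bytes object of length one, and that ord() and bytes([...]) convert between the two:

```python
# Python 3: indexing a bytes object yields an int; slicing yields bytes.
b = b'xyz'

item = b[1]      # an int in the range 0..255
piece = b[1:2]   # a bytes object of length one

print(type(item).__name__, item)    # int 121
print(type(piece).__name__, piece)  # bytes b'y'

# ord() and bytes([...]) convert between the two forms.
print(ord(piece) == item)       # True
print(bytes([item]) == piece)   # True
```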



-- 
Steven



#63423

From: Ervin Hegedüs <airween@gmail.com>
Date: 2014-01-07 12:53 +0100
Message-ID: <mailman.5131.1389095781.18130.python-list@python.org>
In reply to: #63420

hi,

On Tue, Jan 07, 2014 at 10:13:29PM +1100, Steven D'Aprano wrote:
> Does anyone know what the rationale behind making byte-string indexing
> return an int rather than a byte-string of length one?
> 
> That is, given b = b'xyz', b[1] returns 121 rather than b'y'.
> 
> This is especially surprising when one considers that it's easy to extract
> the ordinal value of a byte:
> 
> ord(b'y') => 121

Which Python version?

http://docs.python.org/2/reference/lexical_analysis.html#strings
"A prefix of 'b' or 'B' is ignored in Python 2;"

if you want to store the string literal as a byte array, you have
to use the "bytearray()" function:

>>> a = bytearray('xyz')
>>> a
bytearray(b'xyz')
>>> a[0]
120
>>> a[1]
121


http://docs.python.org/2/library/stdtypes.html
5.6. Sequence Types


hth,


a.



#63424

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-01-07 23:04 +1100
Message-ID: <52cbed4a$0$29979$c3e8da3$5496439d@news.astraweb.com>
In reply to: #63423

Ervin Hegedüs wrote:

> hi,
> 
> On Tue, Jan 07, 2014 at 10:13:29PM +1100, Steven D'Aprano wrote:
>> Does anyone know what the rationale behind making byte-string indexing
>> return an int rather than a byte-string of length one?
>> 
>> That is, given b = b'xyz', b[1] returns 121 rather than b'y'.
>> 
>> This is especially surprising when one considers that it's easy to
>> extract the ordinal value of a byte:
>> 
>> ord(b'y') => 121
> 
> Which Python version?

My apologies... I've been so taken up with various threads on this list
discussing Python 3, I forgot to mention that I'm talking about Python 3.

I understand the behaviour of bytes and bytearray, I'm asking *why* that
specific behaviour was chosen.



-- 
Steven



#63428

From: Terry Reedy <tjreedy@udel.edu>
Date: 2014-01-07 09:29 -0500
Message-ID: <mailman.5133.1389104999.18130.python-list@python.org>
In reply to: #63420

On 1/7/2014 6:13 AM, Steven D'Aprano wrote:
> Does anyone know what the rationale behind making byte-string indexing
> return an int rather than a byte-string of length one?
>
> That is, given b = b'xyz', b[1] returns 121 rather than b'y'.

The former is the normal behavior of sequences; the latter is peculiar
to strings, because there is no separate character class. A byte is a
count n, 0 <= n < 256, and bytes and bytearrays are sequences of bytes.
It was ultimately Guido's decision after some discussion and debate on, 
I believe, the py3k list. I do not remember enough to be any more specific.
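
The distinction drawn here can be sketched as follows, assuming Python 3; the names numbers, text, data and rejected are illustrative only. A list and a bytes object follow the same indexing rule, while str is the special case:

```python
# Indexing a generic sequence returns an element; slicing returns a sequence.
numbers = [10, 20, 30]
assert numbers[1] == 20        # an element
assert numbers[1:2] == [20]    # a shorter list

# str is the odd one out: there is no separate character type, so an
# element of a str is itself a str of length one.
text = 'xyz'
assert text[1] == 'y' and text[1:2] == 'y'

# bytes follows the generic rule: its elements are ints n with 0 <= n < 256.
data = b'xyz'
assert data[1] == 121 and data[1:2] == b'y'
assert all(0 <= n < 256 for n in data)

# Values outside that range are rejected when a bytes object is built.
rejected = False
try:
    bytes([256])
except ValueError:
    rejected = True
print(rejected)  # True
```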

-- 
Terry Jan Reedy



#63430

From: David Robinow <drobinow@gmail.com>
Date: 2014-01-07 10:19 -0500
Message-ID: <mailman.5135.1389107956.18130.python-list@python.org>
In reply to: #63420

"treating bytes as chars" considered harmful?
 I don't know the answer to your question but the behavior seems right to me.
Python 3 grudgingly allows the "abomination" of byte strings (is that
what they're called? I haven't fully embraced Python3 yet). If you
want a substring you use a slice.
   b = b'xyz'
   b[1:2] => b'y'

also, chr(121) => 'y'   which is really what the Python 3 gods prefer.

On Tue, Jan 7, 2014 at 6:13 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Does anyone know what the rationale behind making byte-string indexing
> return an int rather than a byte-string of length one?
>
> That is, given b = b'xyz', b[1] returns 121 rather than b'y'.
>
> This is especially surprising when one considers that it's easy to extract
> the ordinal value of a byte:
>
> ord(b'y') => 121



#63433

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-01-08 03:12 +1100
Message-ID: <52cc278c$0$29979$c3e8da3$5496439d@news.astraweb.com>
In reply to: #63430

David Robinow wrote:

> "treating bytes as chars" considered harmful?

Who is talking about treating bytes as chars? You're making assumptions that
aren't justified by my question.


>  I don't know the answer to your question but the behavior seems right to
>  me.

This issue was raised in an earlier discussion about *binary data* in Python
3. (The earlier discussion also involved some ASCII-encoded text, but
that's actually irrelevant to the issue.) In Python 2.7, if you have a
chunk of binary data, you can easily do this:

data = b'\xE1\xE2\xE3\xE4'
data[0] == b'\xE1'

and it returns True just as expected. It even works if that binary data
happens to look like ASCII text:

data = b'\xE1a\xE2\xE3\xE4'
data[1] == b'a'

But in Python 3, the same code silently returns False in both cases, because
indexing a bytes object gives an int. So you have to write something like
these, all of which are ugly or inelegant:

data = b'\xE1a\xE2\xE3\xE4'
data[1] == 0x61
data[1] == ord(b'a')
chr(data[1]) == 'a'
data[1:2] == b'a'


I believe that only the last one, the one with the slice, works in both
Python 2.7 and Python 3.x.
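
A minimal sketch of the slice-based comparison wrapped in a function, so the intent is spelled out at each call site. The helper name byte_at is hypothetical and not part of any library; it assumes a non-negative index:

```python
def byte_at(buf, index):
    """Return the byte at a non-negative index as a length-one byte string.

    Hypothetical helper: slicing returns a byte string on both 2.7 and 3.x,
    whereas indexing returns a str on 2.7 and an int on 3.x.
    """
    return buf[index:index + 1]


data = b'\xE1a\xE2\xE3\xE4'

print(byte_at(data, 0) == b'\xE1')  # True
print(byte_at(data, 1) == b'a')     # True

# Unlike indexing, a slice past the end is empty rather than an IndexError.
print(byte_at(data, 99) == b'')     # True
```

The cost of this form is that an out-of-range index no longer raises, so a caller that relies on IndexError needs an explicit length check.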


> Python 3 grudgingly allows the "abomination" of byte strings (is that
> what they're called? I haven't fully embraced Python3 yet).

They're not abominations. They exist for processing bytes (hence the name)
and other binary data. They are necessary for low-level protocols, for
dealing with email, web, files, and similar. Application code may not need
to deal with bytes, but that is only because the libraries you call do the
hard work for you.

People trying to port these libraries from 2.7 to 3 run into this problem,
and it causes them grief. This little difference between bytes in 2.7 and
bytes in 3.x is a point of friction which makes porting harder, and I'm
trying to understand the reason for it.


-- 
Steven



#63447

From: Serhiy Storchaka <storchaka@gmail.com>
Date: 2014-01-07 21:48 +0200
Message-ID: <mailman.5149.1389124101.18130.python-list@python.org>
In reply to: #63433

07.01.14 18:12, Steven D'Aprano wrote:
> In Python 2.7, if you have a
> chunk of binary data, you can easily do this:
>
> data = b'\xE1\xE2\xE3\xE4'
> data[0] == b'\xE1'
>
> and it returns True just as expected.

data[0] == b'\xE1'[0] works as expected in both Python 2.7 and 3.x.
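
A short sketch of this idiom, with illustrative names (MARKER, starts_with_marker, second_is_known). Indexing the literal as well means both sides of the comparison have the same type on either version:

```python
data = b'\xE1\xE2\xE3\xE4'

# b'\xE1'[0] is a str of length one on 2.7 and the int 225 on 3.x,
# which is exactly what data[0] is on the same interpreter.
MARKER = b'\xE1'[0]

starts_with_marker = data[0] == MARKER
second_is_known = data[1] in (b'\xE2'[0], b'\xE3'[0])

print(starts_with_marker, second_is_known)  # True True
```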



#63467

From: Robin Becker <robin@reportlab.com>
Date: 2014-01-08 11:05 +0000
Message-ID: <mailman.5162.1389179166.18130.python-list@python.org>
In reply to: #63433

On 07/01/2014 19:48, Serhiy Storchaka wrote:
........
> data[0] == b'\xE1'[0] works as expected in both Python 2.7 and 3.x.
>
>
I have been porting a lot of python 2 only code to a python2.7 + 3.3 version for 
a few months now. Bytes indexing was a particular problem. PDF uses quite a lot 
of single byte indicators so code like

if text[k] == 'R':
    .....

or

dispatch_dict.get(text[k],error)()

is much harder to make compatible because of this issue. I think this change was 
a mistake.

To get round this I have tried the following class to resurrect the old style 
behaviour

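import sys
isPy3 = sys.version_info[0] == 3  # assumed definition; the original flag is defined elsewhere

if isPy3:
    class RLBytes(bytes):
        '''simply ensures that B[x] returns a bytes type object and not an int'''
        def __getitem__(self, x):
            if isinstance(x, int):
                # wrap the int in a list so that bytes() builds a length-one object
                return RLBytes([bytes.__getitem__(self, x)])
            else:
                # a slice already yields bytes; re-wrap it to keep the subclass
                return RLBytes(bytes.__getitem__(self, x))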

I'm not sure if that covers all possible cases, but it works for my dispatching 
cases. Unfortunately you can't do simple class assignment to change the 
behaviour so you have to copy the text.
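
For the dispatching case, two alternatives that avoid copying the text can be sketched as below. This is only an illustration under assumed names (handle_reference, handle_unknown and the sample text are made up), not the code being ported:

```python
def handle_reference():
    return 'reference'


def handle_unknown():
    return 'unknown'


# Assumed sample input: a PDF-style token sequence ending in the indicator R.
text = b'12 0 R'
k = 5  # position of the single-byte indicator

# Option 1: key the table by length-one byte strings and look up with a slice.
dispatch_by_slice = {b'R': handle_reference}
result_slice = dispatch_by_slice.get(text[k:k + 1], handle_unknown)()

# Option 2: key the table by whatever indexing returns on this interpreter
# (a str of length one on 2.7, an int on 3.x).
dispatch_by_item = {b'R'[0]: handle_reference}
result_item = dispatch_by_item.get(text[k], handle_unknown)()

print(result_slice, result_item)  # reference reference
```

Both options leave the input as a plain bytes object; the trade-off is that every lookup site or every table key has to be touched.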

I find a lot of the "so glad we got rid of byte strings" fervour a bit silly. 
Bytes, chars,  words etc etc were around long before unicode. Byte strings could 
already represent unicode in efficient ways that happened to be useful for 
western languages. Having two string types is inconvenient and error prone, 
swapping their labels and making subtle changes is a real pain.
-- 
Robin Becker



#63492

From: wxjmfauth@gmail.com
Date: 2014-01-08 08:08 -0800
Message-ID: <7d2d5d85-afa2-474d-8739-c33745b7c00b@googlegroups.com>
In reply to: #63467

On Wednesday, 8 January 2014 12:05:49 UTC+1, Robin Becker wrote:
> On 07/01/2014 19:48, Serhiy Storchaka wrote:
> ........
> > data[0] == b'\xE1'[0] works as expected in both Python 2.7 and 3.x.
> >
> >
> I have been porting a lot of python 2 only code to a python2.7 + 3.3 version for
> a few months now. Bytes indexing was a particular problem. PDF uses quite a lot
> of single byte indicators so code like
>
> if text[k] == 'R':
>     .....
>
> or
>
> dispatch_dict.get(text[k],error)()
>
> is much harder to make compatible because of this issue. I think this change was
> a mistake.
>
> To get round this I have tried the following class to resurrect the old style
> behaviour
>
> if isPy3:
>     class RLBytes(bytes):
>         '''simply ensures that B[x] returns a bytes type object and not an int'''
>         def __getitem__(self,x):
>             if isinstance(x,int):
>                 return RLBytes([bytes.__getitem__(self,x)])
>             else:
>                 return RLBytes(bytes.__getitem__(self,x))
>
> I'm not sure if that covers all possible cases, but it works for my dispatching
> cases. Unfortunately you can't do simple class assignment to change the
> behaviour so you have to copy the text.
>
> I find a lot of the "so glad we got rid of byte strings" fervour a bit silly.
> Bytes, chars,  words etc etc were around long before unicode. Byte strings could
> already represent unicode in efficient ways that happened to be useful for
> western languages. Having two string types is inconvenient and error prone,
> swapping their labels and making subtle changes is a real pain.

Byte strings (encoded code points) or native unicode is one
thing.

But on the other side, the problem is elsewhere. These very
talented ascii narrow minded, unicode illiterate devs only
succeeded to produce this (I, really, do not wish to be rude).

>>> import sys, timeit, unicodedata
>>> unicodedata.name('ǟ')
'LATIN SMALL LETTER A WITH DIAERESIS AND MACRON'
>>> sys.getsizeof('a')
26
>>> sys.getsizeof('ǟ')
40
>>> timeit.timeit("unicodedata.normalize('NFKD', 'ǟ')", "import unicodedata")
0.8040018888575129
>>> timeit.timeit("unicodedata.normalize('NFKD', 'zzz')", "import unicodedata")
0.3073749330963995
>>> timeit.timeit("unicodedata.normalize('NFKD', 'z')", "import unicodedata")
0.2874013282653962
>>> 
>>> timeit.timeit("len(unicodedata.normalize('NFKD', 'zzz'))", "import unicodedata")
0.3803570633857589
>>> timeit.timeit("len(unicodedata.normalize('NFKD', 'ǟ'))", "import unicodedata")
0.9359970320201683

pdf, typography, linguistic, scripts, ... in mind, in other words the real
*unicode* world.

jmf



#63500

From: Ned Batchelder <ned@nedbatchelder.com>
Date: 2014-01-08 12:19 -0500
Message-ID: <mailman.5185.1389201567.18130.python-list@python.org>
In reply to: #63492

On 1/8/14 11:08 AM, wxjmfauth@gmail.com wrote:
> Byte strings (encoded code points) or native unicode is one
> thing.
>
> But on the other side, the problem is elsewhere. These very
> talented ascii narrow minded, unicode illiterate devs only
> succeeded to produce this (I, really, do not wish to be rude).

If you don't want to be rude, you are failing.  You've been told a 
number of times that your obscure micro-benchmarks are meaningless.  Now 
you've taken to calling the core devs narrow-minded and Unicode 
illiterate.  They are neither of these things.

Continuing to post these comments with no interest in learning is rude.
Other recent threads have contained detailed rebuttals of your views,
which you have ignored.  This is rude. Please stop.

--Ned.

>
> >>> import unicodedata
> >>> unicodedata.name('ǟ')
> 'LATIN SMALL LETTER A WITH DIAERESIS AND MACRON'
> >>> sys.getsizeof('a')
> 26
> >>> sys.getsizeof('ǟ')
> 40
> >>> timeit.timeit("unicodedata.normalize('NFKD', 'ǟ')", "import unicodedata")
> 0.8040018888575129
> >>> timeit.timeit("unicodedata.normalize('NFKD', 'zzz')", "import unicodedata")
> 0.3073749330963995
> >>> timeit.timeit("unicodedata.normalize('NFKD', 'z')", "import unicodedata")
> 0.2874013282653962
> >>>
> >>> timeit.timeit("len(unicodedata.normalize('NFKD', 'zzz'))", "import unicodedata")
> 0.3803570633857589
> >>> timeit.timeit("len(unicodedata.normalize('NFKD', 'ǟ'))", "import unicodedata")
> 0.9359970320201683
>
> pdf, typography, linguistic, scripts, ... in mind, in other words the real
> *unicode* world.
>
> jmf
>


-- 
Ned Batchelder, http://nedbatchelder.com



#63619

From: Piet van Oostrum <piet@vanoostrum.org>
Date: 2014-01-09 18:05 +0100
Message-ID: <m2mwj5uluu.fsf@cochabamba.vanoostrum.org>
In reply to: #63500

Ned Batchelder <ned@nedbatchelder.com> writes:

> On 1/8/14 11:08 AM, wxjmfauth@gmail.com wrote:
>> Byte strings (encoded code points) or native unicode is one
>> thing.
>>
>> But on the other side, the problem is elsewhere. These very
>> talented ascii narrow minded, unicode illiterate devs only
>> succeeded to produce this (I, really, do not wish to be rude).
>
> If you don't want to be rude, you are failing.  You've been told a
> number of times that your obscure micro-benchmarks are meaningless.  Now
> you've taken to calling the core devs narrow-minded and Unicode
> illiterate.  They are neither of these things.
>
> Continuing to post these comments with no interest in learning is rude.
> Other recent threads have contained detailed rebuttals of your views,
> which you have ignored.  This is rude. Please stop.

Please ignore jmf's repeated nonsense.
-- 
Piet van Oostrum <piet@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]



#63621

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2014-01-09 09:28 -0800
Message-ID: <mailman.5274.1389288493.18130.python-list@python.org>
In reply to: #63619

On 01/09/2014 09:05 AM, Piet van Oostrum wrote:
> Ned Batchelder <ned@nedbatchelder.com> writes:
>
>> On 1/8/14 11:08 AM, wxjmfauth@gmail.com wrote:
>>> Byte strings (encoded code points) or native unicode is one
>>> thing.
>>>
>>> But on the other side, the problem is elsewhere. These very
>>> talented ascii narrow minded, unicode illiterate devs only
>>> succeeded to produce this (I, really, do not wish to be rude).
>>
>> If you don't want to be rude, you are failing.  You've been told a
>> number of times that your obscure micro-benchmarks are meaningless.  Now
>> you've taken to calling the core devs narrow-minded and Unicode
>> illiterate.  They are neither of these things.
>>
>> Continuing to post these comments with no interest in learning is rude.
>> Other recent threads have contained detailed rebuttals of your views,
>> which you have ignored.  This is rude. Please stop.
>
> Please ignore jmf's repeated nonsense.

Or ban him.  His one, minor, contribution has been completely swamped by the rest of his belligerent, unfounded, refuted 
posts.

--
~Ethan~



#63627

From: Serhiy Storchaka <storchaka@gmail.com>
Date: 2014-01-09 21:36 +0200
Message-ID: <mailman.5280.1389296224.18130.python-list@python.org>
In reply to: #63619

09.01.14 19:28, Ethan Furman wrote:
> On 01/09/2014 09:05 AM, Piet van Oostrum wrote:
>> Please ignore jmf's repeated nonsense.
>
> Or ban him.  His one, minor, contribution has been completely swamped by
> the rest of his belligerent, unfounded, refuted posts.

Please don't. I get some fun out of his every appearance.



#63501

From: Michael Torrie <torriem@gmail.com>
Date: 2014-01-08 10:25 -0700
Message-ID: <mailman.5186.1389201950.18130.python-list@python.org>
In reply to: #63492

On 01/08/2014 09:08 AM, wxjmfauth@gmail.com wrote:
> Byte strings (encoded code points) or native unicode is one
> thing.

Byte strings are not necessarily "encoded code points."  Most byte
streams I work with are definitely not unicode! They are in fact things
such as BER-encoded ASN.1 data structures.  Or PDF data streams.  Or
Gzip data streams.  The issue in this thread has nothing to do with
unicode.



#63431

From: David Robinow <drobinow@gmail.com>
Date: 2014-01-07 10:23 -0500
Message-ID: <mailman.5136.1389108189.18130.python-list@python.org>
In reply to: #63420

Sorry for top-posting. I thought I'd mastered gmail.



#63444

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2014-01-07 09:02 -0800
Message-ID: <mailman.5146.1389117017.18130.python-list@python.org>
In reply to: #63420

On 01/07/2014 07:19 AM, David Robinow wrote:
>
> Python 3 grudgingly allows the "abomination" of byte strings (is that
> what they're called?)

No, that is *not* what they're called.  If you find any place in the Python3 docs that does call them bytestrings please 
submit a bug report.


On 01/07/2014 08:12 AM, Steven D'Aprano wrote:
> People trying to port these libraries from 2.7 to 3 run into this problem,
> and it causes them grief. This little difference between bytes in 2.7 and
> bytes in 3.x is a point of friction which makes porting harder, and I'm
> trying to understand the reason for it.

If I recall correctly the way it was explained to me:

bytes (lists, arrays, etc.) is a container, and when a container is indexed you get whatever the container held.  If you 
slice the container you get a smaller container with the appropriate items.

bytes (and bytearrays) are containers of ints, so indexing returns an int.  One big problem with this whole scenario is 
that bytes then lies about what it contains.  (And I hate lies! [1])

Anyway, I believe that's the rationale behind the change.
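
A small sketch of both sides of that description, assuming Python 3 (the names are illustrative). Iteration and indexing treat bytes as a container of ints, while containment tests and string-style methods still accept and return byte strings:

```python
data = b'xyz'

# Container-of-ints view: iterating or converting yields ints.
as_ints = list(data)
print(as_ints)           # [120, 121, 122]
print(121 in data)       # True: membership of a single int

# String-like view: substring containment and str-style methods also work.
print(b'yz' in data)     # True: substring test with a bytes operand
print(data.upper())      # b'XYZ'

# Rebuilding from the ints gives back an equal object.
print(bytes(as_ints) == data)  # True
```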

--
~Ethan~

[1] http://www.quickmeme.com/meme/3ts325



#63455

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-01-08 11:15 +1100
Message-ID: <52cc988f$0$29976$c3e8da3$5496439d@news.astraweb.com>
In reply to: #63444

Ethan Furman wrote:

> On 01/07/2014 07:19 AM, David Robinow wrote:
>>
>> Python 3 grudgingly allows the "abomination" of byte strings (is that
>> what they're called?)
> 
> No, that is *not* what they're called.  If you find any place in the
> Python3 docs that does call them bytestrings please submit a bug report.

The name of the class is "bytes", but what they represent *is* a string of
bytes, hence "byte-string". It's a standard computer science term for
distinguishing strings of text from strings of bytes.


> On 01/07/2014 08:12 AM, Steven D'Aprano wrote:
>> People trying to port these libraries from 2.7 to 3 run into this
>> problem, and it causes them grief. This little difference between bytes
>> in 2.7 and bytes in 3.x is a point of friction which makes porting
>> harder, and I'm trying to understand the reason for it.
> 
> If I recall correctly the way it was explained to me:
> 
> bytes (lists, arrays, etc.) is a container, and when a container is
> indexed you get whatever the container held.  If you slice the container
> you get a smaller container with the appropriate items.

(There's also a bytearray type, which is best considered as an array. Hence
the name.) Why decide that the bytes type is best considered as a list of
bytes rather than a string of bytes? It doesn't have any list methods, it
looks like a string and people use it as a string. As you have discovered,
it is an inconvenient annoyance that indexing returns an int instead of a
one-byte byte-string.

I think that, in hindsight, this was a major screw-up in Python 3.



> bytes (and bytearrays) are containers of ints, so indexing returns an int.
> One big problem with this whole scenario is
> that bytes then lies about what it contains.  (And I hate lies! [1])
> 
> Anyway, I believe that's the rationale behind the change.
> 
> --
> ~Ethan~
> 
> [1] http://www.quickmeme.com/meme/3ts325

-- 
Steven



#63456

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-01-08 11:30 +1100
Message-ID: <mailman.5155.1389141054.18130.python-list@python.org>
In reply to: #63455

On Wed, Jan 8, 2014 at 11:15 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Why decide that the bytes type is best considered as a list of
> bytes rather than a string of bytes? It doesn't have any list methods, it
> looks like a string and people use it as a string. As you have discovered,
> it is an inconvenient annoyance that indexing returns an int instead of a
> one-byte byte-string.
>
> I think that, in hindsight, this was a major screw-up in Python 3.

Which part was? The fact that it can be represented with a (prefixed)
quoted string?

bytes_value = (0x41, 0x42, 0x43, 0x44)  # hypothetical tuple-style notation for b'ABCD'
string = bytes_value.decode()  # "ABCD"

I think it's more convenient to let people use a notation similar to
what was used in Py2, but perhaps this is an attractive nuisance, if
it gives rise to issues like this. If a bytes were more like a tuple
of ints (not a list - immutability is closer) than it is like a
string, would that be clearer?

Perhaps the solution isn't even a code one, but a mental one. "A bytes
is like a tuple of ints" might be a useful mantra.
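
The mantra can be checked directly in Python 3. A minimal sketch, with illustrative names (values, data, mutated):

```python
values = (0x41, 0x42, 0x43, 0x44)
data = bytes(values)

# Same elements, same indexing and slicing behaviour as the tuple.
assert data == b'ABCD'
assert tuple(data) == values
assert data[0] == values[0]
assert data[1:3] == bytes(values[1:3])

# Decoding is what turns the ints into text.
assert data.decode('ascii') == 'ABCD'

# Like a tuple, a bytes object is immutable.
mutated = True
try:
    data[0] = 0x5A
except TypeError:
    mutated = False
print(mutated)  # False
```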

ChrisA



#63460

From: Grant Edwards <invalid@invalid.invalid>
Date: 2014-01-08 02:34 +0000
Message-ID: <laidfl$do7$1@reader1.panix.com>
In reply to: #63456

On 2014-01-08, Chris Angelico <rosuav@gmail.com> wrote:
> On Wed, Jan 8, 2014 at 11:15 AM, Steven D'Aprano
><steve+comp.lang.python@pearwood.info> wrote:
>> Why decide that the bytes type is best considered as a list of
>> bytes rather than a string of bytes? It doesn't have any list methods, it
>> looks like a string and people use it as a string. As you have discovered,
>> it is an inconvenient annoyance that indexing returns an int instead of a
>> one-byte byte-string.
>>
>> I think that, in hindsight, this was a major screw-up in Python 3.
>
> Which part was?

The fact that b'ASDF'[0] in Python2 yields something different than it
does in Python3 -- one yields b'A' and the other yields 0x41.  It
makes portable code a lot harder to write.  I don't really have any
preference for one over the other, but changing it for no apparent
reason was a horrible idea.
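
One way to get the same answer on both versions is to go through a slice, since slicing returns a byte string everywhere, and then convert. A sketch under that assumption; the helper name byte_value is hypothetical:

```python
def byte_value(data, index):
    """Return the byte at a non-negative index as an int (hypothetical helper).

    bytearray indexing yields an int on 2.7 and 3.x alike, and the
    length-one slice keeps the copy small. Raises IndexError when the
    index is past the end, because the slice is then empty.
    """
    return bytearray(data[index:index + 1])[0]


print(byte_value(b'ASDF', 0) == 0x41)  # True

# ord() on a length-one slice is an equivalent spelling.
print(ord(b'ASDF'[0:1]) == 0x41)       # True
```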

-- 
Grant



#63462

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-01-08 14:46 +1100
Message-ID: <mailman.5159.1389152788.18130.python-list@python.org>
In reply to: #63460

On Wed, Jan 8, 2014 at 1:34 PM, Grant Edwards <invalid@invalid.invalid> wrote:
> On 2014-01-08, Chris Angelico <rosuav@gmail.com> wrote:
>>> I think that, in hindsight, this was a major screw-up in Python 3.
>>
>> Which part was?
>
> The fact that b'ASDF'[0] in Python2 yields something different than it
> does in Python3 -- one yields b'A' and the other yields 0x41.

Fair enough. Either can be justified, changing is awkward.

ChrisA


