


"convert" string to bytes without changing data (encoding)

Started by: Peter Daum <gator@cs.tu-berlin.de>
First post: 2012-03-28 10:56 +0200
Last post: 2012-03-28 13:16 -0400
Articles 20 on this page of 57 — 22 participants



Contents

  "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-28 10:56 +0200
    Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-28 20:02 +1100
      Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-28 11:43 +0200
        Re: "convert" string to bytes without changing data (encoding) Heiko Wundram <modelnine@modelnine.org> - 2012-03-28 12:42 +0200
          Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-28 19:43 +0200
            Re: "convert" string to bytes without changing data (encoding) Heiko Wundram <modelnine@modelnine.org> - 2012-03-28 20:13 +0200
            Re: "convert" string to bytes without changing data (encoding) Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2012-03-28 21:13 +0300
              RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-28 18:31 +0000
              Re: "convert" string to bytes without changing data (encoding) Ethan Furman <ethan@stoneleaf.us> - 2012-03-28 11:49 -0700
            RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-28 18:20 +0000
            Re: "convert" string to bytes without changing data (encoding) Ian Kelly <ian.g.kelly@gmail.com> - 2012-03-28 12:20 -0600
            Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-28 18:26 +0000
              Re: "convert" string to bytes without changing data (encoding) Grant Edwards <invalid@invalid.invalid> - 2012-03-28 19:40 +0000
            Re: "convert" string to bytes without changing data (encoding) Ethan Furman <ethan@stoneleaf.us> - 2012-03-28 11:17 -0700
            Re: "convert" string to bytes without changing data (encoding) John Nagle <nagle@animats.com> - 2012-03-28 12:30 -0700
            Re: "convert" string to bytes without changing data (encoding) Terry Reedy <tjreedy@udel.edu> - 2012-03-28 17:37 -0400
              Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-29 16:57 +0200
              Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-29 16:57 +0200
            Re: "convert" string to bytes without changing data (encoding) Serhiy Storchaka <storchaka@gmail.com> - 2012-03-30 22:06 +0300
            Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-31 06:10 +1100
        Re: "convert" string to bytes without changing data (encoding) Stefan Behnel <stefan_ml@behnel.de> - 2012-03-28 13:25 +0200
        Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-28 18:12 +0000
      Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 11:36 -0400
        Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-29 03:18 +1100
          Re: "convert" string to bytes without changing data (encoding) Grant Edwards <invalid@invalid.invalid> - 2012-03-28 16:33 +0000
          Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 14:05 -0400
            Re: "convert" string to bytes without changing data (encoding) Tim Chase <python.list@tim.thechases.com> - 2012-03-28 13:49 -0500
              Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 15:10 -0400
            Re: "convert" string to bytes without changing data (encoding) "Albert W. Hopkins" <marduk@letterboxes.org> - 2012-03-28 15:22 -0400
        Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-28 17:54 +0000
          Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 14:22 -0400
            Re: Re: "convert" string to bytes without changing data (encoding) Evan Driscoll <driscoll@cs.wisc.edu> - 2012-03-28 14:20 -0500
              Re: Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 15:43 -0400
                Re: "convert" string to bytes without changing data (encoding) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-03-28 21:44 +0100
                Re: "convert" string to bytes without changing data (encoding) Neil Cerutti <neilc@norwich.edu> - 2012-03-28 20:56 +0000
                Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-29 00:02 +0000
                Re: Re: Re: "convert" string to bytes without changing data (encoding) Evan Driscoll <driscoll@cs.wisc.edu> - 2012-03-28 19:11 -0500
                  Re: Re: Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 23:04 -0400
                    Re: Re: Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-29 14:31 +1100
                      Re: Re: Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 23:58 -0400
                        Re: "convert" string to bytes without changing data (encoding) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-03-29 07:01 +0100
                        Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-29 06:51 +0000
                          Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-29 11:30 -0400
                            Re: "convert" string to bytes without changing data (encoding) Terry Reedy <tjreedy@udel.edu> - 2012-03-29 12:49 -0400
                              Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-29 14:00 -0400
                                Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-30 07:41 +1100
                            Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-30 01:16 +0000
                    Re: Re: Re: Re: "convert" string to bytes without changing data (encoding) Evan Driscoll <driscoll@cs.wisc.edu> - 2012-03-29 11:31 -0500
            RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-28 19:02 +0000
              Re: "convert" string to bytes without changing data (encoding) Grant Edwards <invalid@invalid.invalid> - 2012-03-28 19:44 +0000
            Re: "convert" string to bytes without changing data (encoding) MRAB <python@mrabarnett.plus.com> - 2012-03-28 20:50 +0100
            RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-29 17:36 +0000
              Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-30 01:10 +0000
                Re: "convert" string to bytes without changing data (encoding) Michael Ströder <michael@stroeder.com> - 2012-03-30 09:04 +0200
        Re: "convert" string to bytes without changing data (encoding) Terry Reedy <tjreedy@udel.edu> - 2012-03-28 14:11 -0400
    Re: "convert" string to bytes without changing data (encoding) Stefan Behnel <stefan_ml@behnel.de> - 2012-03-28 11:08 +0200
    Re: "convert" string to bytes without changing data (encoding) Dave Angel <d@davea.name> - 2012-03-28 13:16 -0400



#22273

From: Stefan Behnel <stefan_ml@behnel.de>
Date: 2012-03-28 13:25 +0200
Message-ID: <mailman.1070.1332933946.3037.python-list@python.org>
In reply to: #22270
Peter Daum, 28.03.2012 11:43:
> What I am looking for is a general way to just copy the raw data
> from a "string" object to a "byte" object without any attempt to
> "decode" or "encode" anything ...

That's why I asked about your use case - where does the data come from and
why is it contained in a character string in the first place? If you could
provide that information, we can help you further.

Stefan



#22291

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-03-28 18:12 +0000
Message-ID: <4f7354a9$0$29981$c3e8da3$5496439d@news.astraweb.com>
In reply to: #22270
On Wed, 28 Mar 2012 11:43:52 +0200, Peter Daum wrote:

> ... in my example, the variable s points to a "string", i.e. a series of
> bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters.

No. Strings are not sequences of bytes (except in the trivial sense that 
everything in computer memory is made of bytes). They are sequences of 
CODE POINTS. (Roughly speaking, code points are *almost* but not quite 
the same as characters.)
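
To make the distinction concrete, here is a minimal sketch, assuming Python 3. The sample string is a hypothetical example, not one from the thread. Iterating over a str yields code points; bytes only appear once an encoding has been chosen.

```python
# Sketch, assuming Python 3: a str is a sequence of code points, not bytes.
s = "caf\u00e9"                         # hypothetical sample: c, a, f, e-with-acute

code_points = [ord(ch) for ch in s]     # one integer per code point
utf8 = s.encode("utf-8")                # bytes exist only after picking an encoding

print(len(s))            # number of code points
print(code_points)       # the code points as integers
print(len(utf8))         # number of bytes in this particular encoding
```

The code point count and the byte count differ here because U+00E9 takes two bytes in UTF-8; a different encoding would give a different byte count for the same string.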

I suggest that you need to reset your understanding of strings and bytes. 
I suggest you start by reading this:

http://www.joelonsoftware.com/articles/Unicode.html

Then come back and try to explain what actual problem you are trying to 
solve.


-- 
Steven



#22280

From: Ross Ridge <rridge@csclub.uwaterloo.ca>
Date: 2012-03-28 11:36 -0400
Message-ID: <jkvb5a$88d$1@rumours.uwaterloo.ca>
In reply to: #22267
Chris Angelico  <rosuav@gmail.com> wrote:
>What is a string? It's not a series of bytes.

Of course it is.  Conceptually you're not supposed to think of it that
way, but a string is stored in memory as a series of bytes.

What he's asking for may not be very useful or practical, but if that's
your problem here then that's what you should be addressing, not
pretending that it's fundamentally impossible.

					Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //	  



#22283

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-03-29 03:18 +1100
Message-ID: <mailman.1081.1332951527.3037.python-list@python.org>
In reply to: #22280
On Thu, Mar 29, 2012 at 2:36 AM, Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Chris Angelico  <rosuav@gmail.com> wrote:
>>What is a string? It's not a series of bytes.
>
> Of course it is.  Conceptually you're not supposed to think of it that
> way, but a string is stored in memory as a series of bytes.

Note that distinction. I said that a string "is not" a series of
bytes; you say that it "is stored" as bytes.

> What he's asking for may not be very useful or practical, but if that's
> your problem here then that's what you should be addressing, not
> pretending that it's fundamentally impossible.

That's equivalent to taking a 64-bit integer and trying to treat it as
a 64-bit floating point number. They're all just bits in memory, and
in C it's quite easy to cast a pointer to a different type and
dereference it. But a Python Unicode string might be stored in several
ways; for all you know, it might actually be stored as a sequence of
apples in a refrigerator, just as long as they can be referenced
correctly. There's no logical Python way to turn that into a series of
bytes.
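
The integer-versus-float analogy can be sketched with the standard struct module. This is only an illustration of reinterpreting the same eight bytes under two types; the chosen integer is an arbitrary example value, not something from the thread.

```python
import struct

# Sketch: the same 8 bytes read first as a 64-bit integer, then as a double.
n = 0x3FF0000000000000                    # arbitrary example value
raw = struct.pack("<q", n)                # 8 bytes of the signed 64-bit integer
as_float = struct.unpack("<d", raw)[0]    # the same bytes read as an IEEE 754 double

print(n)          # the integer value
print(as_float)   # what those bytes mean as a float
print(float(n))   # an actual int -> float conversion, for comparison
```

The reinterpreted value and the converted value have nothing to do with each other, which is the point of the analogy: the bits alone do not say what they mean.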

ChrisA



#22285

From: Grant Edwards <invalid@invalid.invalid>
Date: 2012-03-28 16:33 +0000
Message-ID: <jkveg9$85p$1@reader1.panix.com>
In reply to: #22283
On 2012-03-28, Chris Angelico <rosuav@gmail.com> wrote:

> for all you know, it might actually be stored as a sequence of
> apples in a refrigerator

[...]

> There's no logical Python way to turn that into a series of bytes.

There's got to be a joke there somewhere about how to eat an apple...

-- 
Grant Edwards               grant.b.edwards        Yow! Somewhere in DOWNTOWN
                                  at               BURBANK a prostitute is
                              gmail.com            OVERCOOKING a LAMB CHOP!!



#22290

From: Ross Ridge <rridge@csclub.uwaterloo.ca>
Date: 2012-03-28 14:05 -0400
Message-ID: <jkvjsn$rgh$1@rumours.uwaterloo.ca>
In reply to: #22283
Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Of course it is.  Conceptually you're not supposed to think of it that
> way, but a string is stored in memory as a series of bytes.

Chris Angelico  <rosuav@gmail.com> wrote:
>Note that distinction. I said that a string "is not" a series of
>bytes; you say that it "is stored" as bytes.

The distinction is meaningless.  I'm not going to argue with you about what
you or I meant by the word "is".

>But a Python Unicode string might be stored in several
>ways; for all you know, it might actually be stored as a sequence of
>apples in a refrigerator, just as long as they can be referenced
>correctly.

But it is in fact only stored in one particular way, as a series of bytes.

>There's no logical Python way to turn that into a series of bytes.

Nonsense.  Play all the semantic games you want, it already is a series
of bytes.

					Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //	  



#22302

From: Tim Chase <python.list@tim.thechases.com>
Date: 2012-03-28 13:49 -0500
Message-ID: <mailman.1091.1332960542.3037.python-list@python.org>
In reply to: #22290
On 03/28/12 13:05, Ross Ridge wrote:
> Ross Ridge<rridge@csclub.uwaterloo.ca>  wrote:
>> But a Python Unicode string might be stored in several
>> ways; for all you know, it might actually be stored as a sequence of
>> apples in a refrigerator, just as long as they can be referenced
>> correctly.
>
> But it is in fact only stored in one particular way, as a series of bytes.
>
>> There's no logical Python way to turn that into a series of bytes.
>
> Nonsense.  Play all the semantic games you want, it already is a series
> of bytes.

Internally, they're a series of bytes, but they are MEANINGLESS 
bytes unless you know how they are encoded internally.  Those 
bytes could be UTF-8, UTF-16, UTF-32, or any of a number of other 
possible encodings[1].  If you get the internal byte stream, 
there's no way to meaningfully operate on it unless you also know 
how it's encoded (or you're willing to sacrifice the ability to 
reliably get the string back).
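
A short sketch of why the bytes are meaningless without the encoding, assuming Python 3 and a hypothetical sample word. The same byte sequence decodes without error under two codecs, but only the matching codec gives the original text back.

```python
# Sketch: identical bytes, two different interpretations.
original = "na\u00efve"                  # hypothetical sample text
raw = original.encode("utf-8")           # bytes produced with a known encoding

as_utf8 = raw.decode("utf-8")            # right guess: the original text comes back
as_latin1 = raw.decode("latin-1")        # wrong guess: no error, but different text

print(as_utf8 == original)
print(as_latin1 == original)
print(len(original), len(as_latin1))
```

Note that the wrong guess fails silently, which is why the encoding has to travel with the bytes rather than be inferred from them.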

-tkc

[1]
http://docs.python.org/library/codecs.html#standard-encodings





#22303

From: Ross Ridge <rridge@csclub.uwaterloo.ca>
Date: 2012-03-28 15:10 -0400
Message-ID: <jkvnmv$50i$1@rumours.uwaterloo.ca>
In reply to: #22302
Tim Chase  <python.list@tim.thechases.com> wrote:
>Internally, they're a series of bytes, but they are MEANINGLESS 
>bytes unless you know how they are encoded internally.  Those 
>bytes could be UTF-8, UTF-16, UTF-32, or any of a number of other 
>possible encodings[1].  If you get the internal byte stream, 
>there's no way to meaningfully operate on it unless you also know 
>how it's encoded (or you're willing to sacrifice the ability to 
>reliably get the string back).

In practice the number of ways that CPython (the only Python 3
implementation) represents strings is much more limited.  Pretending
otherwise really isn't helpful.

Still, if Chris Angelico had used your much less misleading explanation,
then this could've been resolved much quicker.  The original poster
didn't buy Chris's bullshit for a minute, instead he had to find out on
his own that the internal representation of strings wasn't what he
expected it to be.

					Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //	  



#22305

From: "Albert W. Hopkins" <marduk@letterboxes.org>
Date: 2012-03-28 15:22 -0400
Message-ID: <mailman.1093.1332962567.3037.python-list@python.org>
In reply to: #22290
On Wed, 2012-03-28 at 14:05 -0400, Ross Ridge wrote:
> Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> > Of course it is.  Conceptually you're not supposed to think of it that
> > way, but a string is stored in memory as a series of bytes.
> 
> Chris Angelico  <rosuav@gmail.com> wrote:
> >Note that distinction. I said that a string "is not" a series of
> >bytes; you say that it "is stored" as bytes.
> 
> The distinction is meaningless.  I'm not going to argue with you about what
> you or I meant by the word "is".
> 

Off topic, but obligatory:

https://www.youtube.com/watch?v=j4XT-l-_3y0



#22288

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-03-28 17:54 +0000
Message-ID: <4f73504c$0$29981$c3e8da3$5496439d@news.astraweb.com>
In reply to: #22280
On Wed, 28 Mar 2012 11:36:10 -0400, Ross Ridge wrote:

> Chris Angelico  <rosuav@gmail.com> wrote:
>>What is a string? It's not a series of bytes.
> 
> Of course it is.  Conceptually you're not supposed to think of it that
> way, but a string is stored in memory as a series of bytes.

You don't know that. They might be stored as a tree, or a rope, or some 
even more complex data structure. In fact, in Python, they are stored as 
an object.

But even if they were stored as a simple series of bytes, you don't know 
what bytes they are. That is an implementation detail of the particular 
Python build being used, and since Python doesn't give direct access to 
memory (at least not in pure Python) there's no way to retrieve those 
bytes using Python code.

Saying that strings are stored in memory as bytes is no more sensible 
than saying that dicts are stored in memory as bytes. Yes, they are. So 
what? Taken out of context in a running Python interpreter, those bytes 
are pretty much meaningless.


> What he's asking for may not be very useful or practical, but if that's
> your problem here then that's what you should be addressing, not
> pretending that it's fundamentally impossible.

The right way to convert bytes to strings, and vice versa, is via 
encoding and decoding operations. What the OP is asking for is as silly 
as somebody asking to turn a float 1.3792 into a string without calling 
str() or any equivalent float->string conversion. They're both made up of 
bytes, right? Yeah, they are. So what?

Even if you do a hex dump of float 1.3792, the result will NOT be the 
string "1.3792". And likewise, even if you somehow did a hex dump of the 
memory representation of a string, the result will NOT be the equivalent 
sequence of bytes except *maybe* for some small subset of possible 
strings.
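
The float comparison can be sketched as follows, assuming Python 3 and using the standard struct module to stand in for a "hex dump" of the float's memory. The variable names are illustrative only.

```python
import struct

x = 1.3792
memory_bytes = struct.pack("<d", x)       # the 8 bytes of the IEEE 754 double
text_bytes = str(x).encode("ascii")       # explicit float -> str -> bytes conversion

print(memory_bytes.hex())                 # dump of the float's binary representation
print(text_bytes)                         # the characters "1.3792" as ASCII bytes

# The same holds for strings: go through an encoding, and decode to get back.
s = "1.3792"
round_trip = s.encode("utf-8").decode("utf-8")
print(round_trip == s)
```

The dump of the double and the ASCII text of the number are unrelated byte sequences; only the explicit conversion produces the bytes a reader would expect.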



-- 
Steven



#22299

From: Ross Ridge <rridge@csclub.uwaterloo.ca>
Date: 2012-03-28 14:22 -0400
Message-ID: <jkvktq$u16$1@rumours.uwaterloo.ca>
In reply to: #22288
Steven D'Aprano  <steve+comp.lang.python@pearwood.info> wrote:
>The right way to convert bytes to strings, and vice versa, is via 
>encoding and decoding operations.

If you want to dictate to the original poster the correct way to do
things then you don't need to do anything more than that.  You don't need to
pretend like Chris Angelico that there isn't a direct mapping from
his Python 3 implementation's internal representation of strings
to bytes in order to label what he's asking for as being "silly".

					Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //	  



#22304

From: Evan Driscoll <driscoll@cs.wisc.edu>
Date: 2012-03-28 14:20 -0500
Message-ID: <mailman.1092.1332962455.3037.python-list@python.org>
In reply to: #22299
Ross Ridge wrote:
> Steven D'Aprano<steve+comp.lang.python@pearwood.info>  wrote:
>> The right way to convert bytes to strings, and vice versa, is via
>> encoding and decoding operations.
>
> If you want to dictate to the original poster the correct way to do
> things then you don't need to do anything more than that.  You don't need to
> pretend like Chris Angelico that there isn't a direct mapping from
> his Python 3 implementation's internal representation of strings
> to bytes in order to label what he's asking for as being "silly".

That mapping may as well be:

   import random

   def get_bytes(some_string):
       # Deliberately ignores the content: a random number of random byte values.
       length = random.randint(len(some_string), 5 * len(some_string))
       return [random.randint(0, 255) for _ in range(length)]

Of course this is hyperbole, but it's essentially about as much 
guarantee as to what the result is.

As many others have said, the encoding isn't defined, and I would guess 
varies between implementations. (E.g. if Jython and IronPython use their 
host platforms' native strings, both have 16-bit chars and thus probably 
use UTF-16 encoding. I am not sure what CPython uses, but I bet it's 
*not* that.)

It's not even guaranteed that the byte representation won't change! If 
something is lazily evaluated or you have a COW string or something, the 
bytes backing it will differ.


So yes, you can say that pretending there's not a mapping of strings to 
internal representation is silly, because there is. However, there's 
nothing you can say about that mapping.

Evan



#22312

From: Ross Ridge <rridge@csclub.uwaterloo.ca>
Date: 2012-03-28 15:43 -0400
Message-ID: <jkvpl3$8tg$1@rumours.uwaterloo.ca>
In reply to: #22304
Evan Driscoll  <driscoll@cs.wisc.edu> wrote:
>So yes, you can say that pretending there's not a mapping of strings to 
>internal representation is silly, because there is. However, there's 
>nothing you can say about that mapping.

I'm not the one labeling anything as being silly.  I'm the one labeling
the things as bullshit, and that's what you're doing here.  I can in
fact say what the internal byte string representation of strings is for any
given build of Python 3.  Just because I can't say what it would be in
an imaginary hypothetical implementation doesn't mean I can never say
anything about it.

					Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //	  



#22313

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2012-03-28 21:44 +0100
Message-ID: <mailman.1097.1332967498.3037.python-list@python.org>
In reply to: #22312
On 28/03/2012 20:43, Ross Ridge wrote:
> Evan Driscoll<driscoll@cs.wisc.edu>  wrote:
>> So yes, you can say that pretending there's not a mapping of strings to
>> internal representation is silly, because there is. However, there's
>> nothing you can say about that mapping.
>
> I'm not the one labeling anything as being silly.  I'm the one labeling
> the things as bullshit, and that's what you're doing here.  I can in
> fact say what the internal byte string representation of strings is for any
> given build of Python 3.  Just because I can't say what it would be in
> an imaginary hypothetical implementation doesn't mean I can never say
> anything about it.
>
> 					Ross Ridge
>

Bytes is bytes and strings is strings
And the wrong one I have chose
Let's go where they keep on wearin'
Those frills and flowers and buttons and bows
Rings and things and buttons and bows.

No guessing the tune.

-- 
Cheers.

Mark Lawrence.



#22314

From: Neil Cerutti <neilc@norwich.edu>
Date: 2012-03-28 20:56 +0000
Message-ID: <9thc8hFiu9U1@mid.individual.net>
In reply to: #22312
On 2012-03-28, Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Evan Driscoll  <driscoll@cs.wisc.edu> wrote:
>> So yes, you can say that pretending there's not a mapping of
>> strings to internal representation is silly, because there is.
>> However, there's nothing you can say about that mapping.
>
> I'm not the one labeling anything as being silly.  I'm the one
> labeling the things as bullshit, and that's what you're doing
> here.  I can in fact say what the internal byte string
> representation of strings is for any given build of Python 3.  Just
> because I can't say what it would be in an imaginary
> hypothetical implementation doesn't mean I can never say
> anything about it.

I am in a similar situation vis-à-vis my wife's undergarments.

-- 
Neil Cerutti



#22317

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-03-29 00:02 +0000
Message-ID: <4f73a69c$0$29981$c3e8da3$5496439d@news.astraweb.com>
In reply to: #22312
On Wed, 28 Mar 2012 15:43:31 -0400, Ross Ridge wrote:

> I can in
> fact say what the internal byte string representation of strings is for any
> given build of Python 3.

Don't keep us in suspense! Given:

Python 3.2.2 (default, Mar  4 2012, 10:50:33)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2

what *is* the internal byte representation of the string "a∫©πz"?

(lowercase a, integral sign, copyright symbol, lowercase Greek pi, 
lowercase z)


And more importantly, given that internal byte representation, what could 
you do with it?
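
As a rough sketch of why the question has no single answer from pure Python, the code below prints the code points of that string and three candidate byte layouts obtained through explicit encodings. None of these is claimed to be what that particular build stores in memory; the codec list is only an illustrative selection.

```python
# Sketch: one string, its code points, and several possible byte layouts.
s = "a\u222b\u00a9\u03c0z"   # a, integral sign, copyright sign, Greek pi, z

print([hex(ord(ch)) for ch in s])        # the code points, independent of storage

layouts = {name: s.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}
for name, data in layouts.items():
    print(name, len(data), data.hex())   # a different byte sequence per encoding
```

Each layout round-trips through its own codec, so the only byte sequence worth relying on is one produced by an encoding the program chose itself.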


-- 
Steven



#22319

From: Evan Driscoll <driscoll@cs.wisc.edu>
Date: 2012-03-28 19:11 -0500
Message-ID: <mailman.1104.1332980161.3037.python-list@python.org>
In reply to: #22312
On 3/28/2012 14:43, Ross Ridge wrote:
> Evan Driscoll  <driscoll@cs.wisc.edu> wrote:
>> So yes, you can say that pretending there's not a mapping of strings to 
>> internal representation is silly, because there is. However, there's 
>> nothing you can say about that mapping.
> 
> I'm not the one labeling anything as being silly.  I'm the one labeling
> the things as bullshit, and that's what you're doing here.  I can in
> fact say what the internal byte string representation of strings is for any
> given build of Python 3.  Just because I can't say what it would be in
> an imaginary hypothetical implementation doesn't mean I can never say
> anything about it.

People like you -- who write to assumptions which are not even remotely
guaranteed by the spec -- are part of the reason software sucks.

People like you hold back progress, because system implementers aren't
free to make changes without breaking backwards compatibility. Enormous
amounts of effort are expended to test programs and diagnose problems
which are caused by unwarranted assumptions like "the encoding of a
string is UTF-8". In the worst case, assumptions like that lead to
security fixes that don't go as far as they could, like the recent
discussion about hashing.

Python is definitely closer to the "willing to break backwards
compatibility to improve" end of the spectrum than some other projects
(*cough* Windows *cough*), but that still doesn't mean that you can make
assumptions like that.


This email is a bit harsher than it deserves -- but I feel not by much.

Evan



#22323

From: Ross Ridge <rridge@csclub.uwaterloo.ca>
Date: 2012-03-28 23:04 -0400
Message-ID: <jl0jf8$303$1@rumours.uwaterloo.ca>
In reply to: #22319
Evan Driscoll  <driscoll@cs.wisc.edu> wrote:
>People like you -- who write to assumptions which are not even remotely
>guaranteed by the spec -- are part of the reason software sucks.
...
>This email is a bit harsher than it deserves -- but I feel not by much.

I don't see how you could feel the least bit justified.  Well meaning,
if unhelpful, lies about the nature of Python strings in order to try to
convince someone to follow what you think are good programming practices
is one thing.  Maliciously lying about someone else's code that you've
never seen is another thing entirely.

						Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //	  



#22324

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-03-29 14:31 +1100
Message-ID: <mailman.1107.1332991922.3037.python-list@python.org>
In reply to: #22323
On Thu, Mar 29, 2012 at 2:04 PM, Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Evan Driscoll  <driscoll@cs.wisc.edu> wrote:
>>People like you -- who write to assumptions which are not even remotely
>>guaranteed by the spec -- are part of the reason software sucks.
> ...
>>This email is a bit harsher than it deserves -- but I feel not by much.
>
> I don't see how you could feel the least bit justified.  Well meaning,
> if unhelpful, lies about the nature of Python strings in order to try to
> convince someone to follow what you think are good programming practices
> is one thing.  Maliciously lying about someone else's code that you've
> never seen is another thing entirely.

Actually, he is justified. It's one thing to work in C or assembly and
write code that depends on certain bit-pattern representations of data
(although even that causes trouble - assuming that
sizeof(int)==sizeof(int*) isn't good for portability), but in a high
level language, you cannot assume any correlation between objects and
bytes. Any code that depends on implementation details is risky.
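
The C portability point can be checked from Python as a rough sketch, assuming the standard ctypes module is available on the interpreter in use. The result depends on the platform, so no particular output is assumed here.

```python
import ctypes

# Sketch: compare the size of a C int with the size of a C pointer.
int_size = ctypes.sizeof(ctypes.c_int)
ptr_size = ctypes.sizeof(ctypes.c_void_p)

print(int_size, ptr_size)
print("sizeof(int) == sizeof(void*):", int_size == ptr_size)
```

On platforms where the two sizes differ, code that stores a pointer in an int silently truncates it, which is the kind of implementation-dependent assumption being warned about.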

ChrisA



#22325

From: Ross Ridge <rridge@csclub.uwaterloo.ca>
Date: 2012-03-28 23:58 -0400
Message-ID: <jl0mlt$9q6$1@rumours.uwaterloo.ca>
In reply to: #22324
Chris Angelico  <rosuav@gmail.com> wrote:
>Actually, he is justified. It's one thing to work in C or assembly and
>write code that depends on certain bit-pattern representations of data
>(although even that causes trouble - assuming that
>sizeof(int)==sizeof(int*) isn't good for portability), but in a high
>level language, you cannot assume any correlation between objects and
>bytes. Any code that depends on implementation details is risky.

How does that in any way justify Evan Driscoll maliciously lying about
code he's never seen?

					Ross Ridge

-- 
 l/  //	  Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //	  





