Groups > comp.lang.python > #22266 > unrolled thread
| Started by | Peter Daum <gator@cs.tu-berlin.de> |
|---|---|
| First post | 2012-03-28 10:56 +0200 |
| Last post | 2012-03-28 13:16 -0400 |
| Articles | 20 on this page of 57 — 22 participants |
"convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-28 10:56 +0200
Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-28 20:02 +1100
Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-28 11:43 +0200
Re: "convert" string to bytes without changing data (encoding) Heiko Wundram <modelnine@modelnine.org> - 2012-03-28 12:42 +0200
Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-28 19:43 +0200
Re: "convert" string to bytes without changing data (encoding) Heiko Wundram <modelnine@modelnine.org> - 2012-03-28 20:13 +0200
Re: "convert" string to bytes without changing data (encoding) Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2012-03-28 21:13 +0300
RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-28 18:31 +0000
Re: "convert" string to bytes without changing data (encoding) Ethan Furman <ethan@stoneleaf.us> - 2012-03-28 11:49 -0700
RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-28 18:20 +0000
Re: "convert" string to bytes without changing data (encoding) Ian Kelly <ian.g.kelly@gmail.com> - 2012-03-28 12:20 -0600
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-28 18:26 +0000
Re: "convert" string to bytes without changing data (encoding) Grant Edwards <invalid@invalid.invalid> - 2012-03-28 19:40 +0000
Re: "convert" string to bytes without changing data (encoding) Ethan Furman <ethan@stoneleaf.us> - 2012-03-28 11:17 -0700
Re: "convert" string to bytes without changing data (encoding) John Nagle <nagle@animats.com> - 2012-03-28 12:30 -0700
Re: "convert" string to bytes without changing data (encoding) Terry Reedy <tjreedy@udel.edu> - 2012-03-28 17:37 -0400
Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-29 16:57 +0200
Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-29 16:57 +0200
Re: "convert" string to bytes without changing data (encoding) Serhiy Storchaka <storchaka@gmail.com> - 2012-03-30 22:06 +0300
Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-31 06:10 +1100
Re: "convert" string to bytes without changing data (encoding) Stefan Behnel <stefan_ml@behnel.de> - 2012-03-28 13:25 +0200
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-28 18:12 +0000
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 11:36 -0400
Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-29 03:18 +1100
Re: "convert" string to bytes without changing data (encoding) Grant Edwards <invalid@invalid.invalid> - 2012-03-28 16:33 +0000
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 14:05 -0400
Re: "convert" string to bytes without changing data (encoding) Tim Chase <python.list@tim.thechases.com> - 2012-03-28 13:49 -0500
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 15:10 -0400
Re: "convert" string to bytes without changing data (encoding) "Albert W. Hopkins" <marduk@letterboxes.org> - 2012-03-28 15:22 -0400
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-28 17:54 +0000
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 14:22 -0400
Re: Re: "convert" string to bytes without changing data (encoding) Evan Driscoll <driscoll@cs.wisc.edu> - 2012-03-28 14:20 -0500
Re: Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 15:43 -0400
Re: "convert" string to bytes without changing data (encoding) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-03-28 21:44 +0100
Re: "convert" string to bytes without changing data (encoding) Neil Cerutti <neilc@norwich.edu> - 2012-03-28 20:56 +0000
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-29 00:02 +0000
Re: Re: Re: "convert" string to bytes without changing data (encoding) Evan Driscoll <driscoll@cs.wisc.edu> - 2012-03-28 19:11 -0500
Re: Re: Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 23:04 -0400
Re: Re: Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-29 14:31 +1100
Re: Re: Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 23:58 -0400
Re: "convert" string to bytes without changing data (encoding) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-03-29 07:01 +0100
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-29 06:51 +0000
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-29 11:30 -0400
Re: "convert" string to bytes without changing data (encoding) Terry Reedy <tjreedy@udel.edu> - 2012-03-29 12:49 -0400
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-29 14:00 -0400
Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-30 07:41 +1100
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-30 01:16 +0000
Re: Re: Re: Re: "convert" string to bytes without changing data (encoding) Evan Driscoll <driscoll@cs.wisc.edu> - 2012-03-29 11:31 -0500
RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-28 19:02 +0000
Re: "convert" string to bytes without changing data (encoding) Grant Edwards <invalid@invalid.invalid> - 2012-03-28 19:44 +0000
Re: "convert" string to bytes without changing data (encoding) MRAB <python@mrabarnett.plus.com> - 2012-03-28 20:50 +0100
RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-29 17:36 +0000
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-30 01:10 +0000
Re: "convert" string to bytes without changing data (encoding) Michael Ströder <michael@stroeder.com> - 2012-03-30 09:04 +0200
Re: "convert" string to bytes without changing data (encoding) Terry Reedy <tjreedy@udel.edu> - 2012-03-28 14:11 -0400
Re: "convert" string to bytes without changing data (encoding) Stefan Behnel <stefan_ml@behnel.de> - 2012-03-28 11:08 +0200
Re: "convert" string to bytes without changing data (encoding) Dave Angel <d@davea.name> - 2012-03-28 13:16 -0400
Page 2 of 3 — ← Prev page 1 [2] 3 Next page →
| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Date | 2012-03-28 13:25 +0200 |
| Message-ID | <mailman.1070.1332933946.3037.python-list@python.org> |
| In reply to | #22270 |
Peter Daum, 28.03.2012 11:43:
> What I am looking for is a general way to just copy the raw data
> from a "string" object to a "byte" object without any attempt to
> "decode" or "encode" anything ...

That's why I asked about your use case - where does the data come from and
why is it contained in a character string in the first place? If you could
provide that information, we can help you further.

Stefan
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-03-28 18:12 +0000 |
| Message-ID | <4f7354a9$0$29981$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #22270 |
On Wed, 28 Mar 2012 11:43:52 +0200, Peter Daum wrote:

> ... in my example, the variable s points to a "string", i.e. a series of
> bytes, (0x61,0x62 ...) interpreted as ascii/unicode characters.

No. Strings are not sequences of bytes (except in the trivial sense that
everything in computer memory is made of bytes). They are sequences of
CODE POINTS. (Roughly speaking, code points are *almost* but not quite the
same as characters.)

I suggest that you need to reset your understanding of strings and bytes.
I suggest you start by reading this:

http://www.joelonsoftware.com/articles/Unicode.html

Then come back and try to explain what actual problem you are trying to
solve.

-- 
Steven
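[Editorial illustration, not part of the original post.] The code point versus byte distinction drawn above can be sketched roughly as follows. The sample string is an arbitrary assumption; the point is only that a str yields integers (code points) when inspected, and that bytes appear only once an encoding is chosen.

```python
# Illustrative sketch only; the sample string is an assumption, not from the thread.
s = "caf\u00e9"  # 'café': four code points

# A str is indexed by code point; ord() returns each code point as an int.
code_points = [ord(ch) for ch in s]

# Bytes only exist once an encoding is chosen, and different encodings
# produce different byte sequences for the same string.
as_utf8 = s.encode("utf-8")
as_latin1 = s.encode("latin-1")
as_utf16 = s.encode("utf-16-le")

print(code_points)
print(as_utf8, as_latin1, as_utf16)
```

The three byte strings have different lengths even though the str has the same four code points in each case.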
| From | Ross Ridge <rridge@csclub.uwaterloo.ca> |
|---|---|
| Date | 2012-03-28 11:36 -0400 |
| Message-ID | <jkvb5a$88d$1@rumours.uwaterloo.ca> |
| In reply to | #22267 |
Chris Angelico <rosuav@gmail.com> wrote:
>What is a string? It's not a series of bytes.

Of course it is. Conceptually you're not supposed to think of it that
way, but a string is stored in memory as a series of bytes.

What he's asking for may not be very useful or practical, but if that's
your problem here then that's what you should be addressing, not
pretending that it's fundamentally impossible.

Ross Ridge

-- 
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-03-29 03:18 +1100 |
| Message-ID | <mailman.1081.1332951527.3037.python-list@python.org> |
| In reply to | #22280 |
On Thu, Mar 29, 2012 at 2:36 AM, Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Chris Angelico <rosuav@gmail.com> wrote:
>>What is a string? It's not a series of bytes.
>
> Of course it is. Conceptually you're not supposed to think of it that
> way, but a string is stored in memory as a series of bytes.

Note that distinction. I said that a string "is not" a series of
bytes; you say that it "is stored" as bytes.

> What he's asking for may not be very useful or practical, but if that's
> your problem here then that's what you should be addressing, not
> pretending that it's fundamentally impossible.

That's equivalent to taking a 64-bit integer and trying to treat it as
a 64-bit floating point number. They're all just bits in memory, and
in C it's quite easy to cast a pointer to a different type and
dereference it. But a Python Unicode string might be stored in several
ways; for all you know, it might actually be stored as a sequence of
apples in a refrigerator, just as long as they can be referenced
correctly. There's no logical Python way to turn that into a series of
bytes.

ChrisA
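[Editorial illustration, not part of the original post.] The integer-as-float analogy can be tried out with the standard struct module. This is a minimal sketch with an arbitrarily chosen sample value and byte order: it reinterprets the 8 bytes of a 64-bit integer as a double and compares that with an actual conversion.

```python
import struct

n = 1  # arbitrary sample value (an assumption for illustration)

# Take the 8 bytes of a little-endian signed 64-bit integer...
raw = struct.pack("<q", n)

# ...and reinterpret those same 8 bytes as an IEEE 754 double.
reinterpreted = struct.unpack("<d", raw)[0]

# A real conversion goes through the value, not the bit pattern.
converted = float(n)

print(raw.hex(), reinterpreted, converted)
```

The reinterpreted value is a tiny subnormal number rather than 1.0, which is the sense in which the raw bits carry no meaning without knowing the type they encode.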
| From | Grant Edwards <invalid@invalid.invalid> |
|---|---|
| Date | 2012-03-28 16:33 +0000 |
| Message-ID | <jkveg9$85p$1@reader1.panix.com> |
| In reply to | #22283 |
On 2012-03-28, Chris Angelico <rosuav@gmail.com> wrote:
> for all you know, it might actually be stored as a sequence of
> apples in a refrigerator
[...]
> There's no logical Python way to turn that into a series of bytes.
There's got to be a joke there somewhere about how to eat an apple...
-- 
Grant Edwards               grant.b.edwards        Yow! Somewhere in DOWNTOWN
                                  at               BURBANK a prostitute is
                              gmail.com            OVERCOOKING a LAMB CHOP!!
| From | Ross Ridge <rridge@csclub.uwaterloo.ca> |
|---|---|
| Date | 2012-03-28 14:05 -0400 |
| Message-ID | <jkvjsn$rgh$1@rumours.uwaterloo.ca> |
| In reply to | #22283 |
Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Of course it is. Conceptually you're not supposed to think of it that
> way, but a string is stored in memory as a series of bytes.

Chris Angelico <rosuav@gmail.com> wrote:
>Note that distinction. I said that a string "is not" a series of
>bytes; you say that it "is stored" as bytes.

The distinction is meaningless. I'm not going to argue with you about what
you or I meant by the word "is".

>But a Python Unicode string might be stored in several
>ways; for all you know, it might actually be stored as a sequence of
>apples in a refrigerator, just as long as they can be referenced
>correctly.

But it is in fact only stored in one particular way, as a series of bytes.

>There's no logical Python way to turn that into a series of bytes.

Nonsense. Play all the semantic games you want, it already is a series
of bytes.

Ross Ridge

-- 
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2012-03-28 13:49 -0500 |
| Message-ID | <mailman.1091.1332960542.3037.python-list@python.org> |
| In reply to | #22290 |
On 03/28/12 13:05, Ross Ridge wrote:
> Chris Angelico wrote:
>> But a Python Unicode string might be stored in several
>> ways; for all you know, it might actually be stored as a sequence of
>> apples in a refrigerator, just as long as they can be referenced
>> correctly.
>
> But it is in fact only stored in one particular way, as a series of bytes.
>
>> There's no logical Python way to turn that into a series of bytes.
>
> Nonsense. Play all the semantic games you want, it already is a series
> of bytes.

Internally, they're a series of bytes, but they are MEANINGLESS
bytes unless you know how they are encoded internally. Those
bytes could be UTF-8, UTF-16, UTF-32, or any of a number of other
possible encodings[1]. If you get the internal byte stream,
there's no way to meaningfully operate on it unless you also know
how it's encoded (or you're willing to sacrifice the ability to
reliably get the string back).

-tkc

[1] http://docs.python.org/library/codecs.html#standard-encodings
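[Editorial illustration, not part of the original post.] A small sketch of why a byte stream is only meaningful together with its encoding. The sample character is an assumption; the same two bytes decode cleanly under two codecs but give different text.

```python
text = "\u00e9"  # 'é', an arbitrary sample character (assumption)

# Encode once, with a known encoding.
raw = text.encode("utf-8")

# Decoding with the encoding that was actually used recovers the string.
right = raw.decode("utf-8")

# Decoding with a different encoding raises no error here, but it silently
# produces different text, so the original string is lost.
wrong = raw.decode("latin-1")

print(raw, repr(right), repr(wrong))
```

Nothing in the bytes themselves says which decoding is the intended one; that knowledge has to come from elsewhere.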
| From | Ross Ridge <rridge@csclub.uwaterloo.ca> |
|---|---|
| Date | 2012-03-28 15:10 -0400 |
| Message-ID | <jkvnmv$50i$1@rumours.uwaterloo.ca> |
| In reply to | #22302 |
Tim Chase <python.list@tim.thechases.com> wrote:
>Internally, they're a series of bytes, but they are MEANINGLESS
>bytes unless you know how they are encoded internally. Those
>bytes could be UTF-8, UTF-16, UTF-32, or any of a number of other
>possible encodings[1]. If you get the internal byte stream,
>there's no way to meaningfully operate on it unless you also know
>how it's encoded (or you're willing to sacrifice the ability to
>reliably get the string back).

In practice the number of ways that CPython (the only Python 3
implementation) represents strings is much more limited. Pretending
otherwise really isn't helpful.

Still, if Chris Angelico had used your much less misleading explanation,
then this could've been resolved much quicker. The original poster
didn't buy Chris's bullshit for a minute, instead he had to find out on
his own that the internal representation of strings wasn't what he
expected it to be.

Ross Ridge

-- 
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //
| From | "Albert W. Hopkins" <marduk@letterboxes.org> |
|---|---|
| Date | 2012-03-28 15:22 -0400 |
| Message-ID | <mailman.1093.1332962567.3037.python-list@python.org> |
| In reply to | #22290 |
On Wed, 2012-03-28 at 14:05 -0400, Ross Ridge wrote:
> Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> > Of course it is. Conceptually you're not supposed to think of it that
> > way, but a string is stored in memory as a series of bytes.
>
> Chris Angelico <rosuav@gmail.com> wrote:
> >Note that distinction. I said that a string "is not" a series of
> >bytes; you say that it "is stored" as bytes.
>
> The distinction is meaningless. I'm not going to argue with you about what
> you or I meant by the word "is".
>

Off topic, but obligatory:

https://www.youtube.com/watch?v=j4XT-l-_3y0
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-03-28 17:54 +0000 |
| Message-ID | <4f73504c$0$29981$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #22280 |
On Wed, 28 Mar 2012 11:36:10 -0400, Ross Ridge wrote:

> Chris Angelico <rosuav@gmail.com> wrote:
>>What is a string? It's not a series of bytes.
>
> Of course it is. Conceptually you're not supposed to think of it that
> way, but a string is stored in memory as a series of bytes.

You don't know that. They might be stored as a tree, or a rope, or some
even more complex data structure. In fact, in Python, they are stored as
an object.

But even if they were stored as a simple series of bytes, you don't know
what bytes they are. That is an implementation detail of the particular
Python build being used, and since Python doesn't give direct access to
memory (at least not in pure Python) there's no way to retrieve those
bytes using Python code.

Saying that strings are stored in memory as bytes is no more sensible
than saying that dicts are stored in memory as bytes. Yes, they are. So
what? Taken out of context in a running Python interpreter, those bytes
are pretty much meaningless.

> What he's asking for may not be very useful or practical, but if that's
> your problem here then that's what you should be addressing, not
> pretending that it's fundamentally impossible.

The right way to convert bytes to strings, and vice versa, is via
encoding and decoding operations. What the OP is asking for is as silly
as somebody asking to turn a float 1.3792 into a string without calling
str() or any equivalent float->string conversion. They're both made up of
bytes, right? Yeah, they are. So what?

Even if you do a hex dump of float 1.3792, the result will NOT be the
string "1.3792". And likewise, even if you somehow did a hex dump of the
memory representation of a string, the result will NOT be the equivalent
sequence of bytes except *maybe* for some small subset of possible
strings.

-- 
Steven
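[Editorial illustration, not part of the original post.] The float example can be checked with a short sketch. It assumes the standard struct module and a big-endian IEEE 754 double as the "hex dump"; the actual in-memory layout of a Python float object is not something this code inspects.

```python
import struct

x = 1.3792

# "Hex dump" style view: the 8 bytes of an IEEE 754 double (big-endian here).
dump = struct.pack(">d", x)

# Proper conversion: float -> str, then str -> bytes via an explicit encoding.
text_bytes = str(x).encode("ascii")

print(dump.hex())
print(text_bytes)

# The same idea for strings: encode() and decode() are the round-trip pair.
s = "1.3792"
assert s.encode("utf-8").decode("utf-8") == s
```

The dump is 8 bytes of binary data, while the converted text is the 6 ASCII bytes of "1.3792"; only the second came from an actual conversion.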
| From | Ross Ridge <rridge@csclub.uwaterloo.ca> |
|---|---|
| Date | 2012-03-28 14:22 -0400 |
| Message-ID | <jkvktq$u16$1@rumours.uwaterloo.ca> |
| In reply to | #22288 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>The right way to convert bytes to strings, and vice versa, is via
>encoding and decoding operations.

If you want to dictate to the original poster the correct way to do
things then you don't need to do anything more than that. You don't need to
pretend like Chris Angelico that there isn't a direct mapping from
his Python 3 implementation's internal representation of strings
to bytes in order to label what he's asking for as being "silly".

Ross Ridge

-- 
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //
| From | Evan Driscoll <driscoll@cs.wisc.edu> |
|---|---|
| Date | 2012-03-28 14:20 -0500 |
| Message-ID | <mailman.1092.1332962455.3037.python-list@python.org> |
| In reply to | #22299 |
Ross Ridge wrote:
> Steven D'Aprano<steve+comp.lang.python@pearwood.info> wrote:
>> The right way to convert bytes to strings, and vice versa, is via
>> encoding and decoding operations.
>
> If you want to dictate to the original poster the correct way to do
> things then you don't need to do anything more than that. You don't need to
> pretend like Chris Angelico that there isn't a direct mapping from
> his Python 3 implementation's internal representation of strings
> to bytes in order to label what he's asking for as being "silly".
That mapping may as well be:
    def get_bytes(some_string):
        import random
        # Arbitrary length, arbitrary contents: nothing about the result is guaranteed.
        length = random.randint(len(some_string), 5*len(some_string))
        bytes = [0] * length
        for i in range(length):
            bytes[i] = random.randint(0, 255)
        return bytes
Of course this is hyperbole, but it's essentially about as much
guarantee as to what the result is.
As many others have said, the encoding isn't defined, and I would guess
varies between implementations. (E.g. if Jython and IronPython use their
host platforms' native strings, both have 16-bit chars and thus probably
use UTF-16 encoding. I am not sure what CPython uses, but I bet it's
*not* that.)
It's not even guaranteed that the byte representation won't change! If
something is lazily evaluated or you have a COW string or something, the
bytes backing it will differ.
So yes, you can say that pretending there's not a mapping of strings to
internal representation is silly, because there is. However, there's
nothing you can say about that mapping.
Evan
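[Editorial illustration, not part of the original post.] One way to observe that the in-memory form is an implementation detail is to compare object sizes. This sketch assumes CPython 3.3 or later (which postdates this thread), where strings of equal length can occupy different amounts of memory depending on the widest code point they contain. The helper name `sample_sizes` is hypothetical, and other interpreters may report entirely different numbers.

```python
import sys

def sample_sizes(n=100):
    """Return sys.getsizeof() for three strings of n code points each."""
    ascii_s = "a" * n            # code points below 128
    bmp_s = "\u0100" * n         # code points that need more than one byte
    astral_s = "\U00010000" * n  # code points outside the Basic Multilingual Plane
    return (sys.getsizeof(ascii_s),
            sys.getsizeof(bmp_s),
            sys.getsizeof(astral_s))

sizes = sample_sizes()
print(sys.implementation.name, sizes)
```

All three strings have the same len(), yet the reported sizes need not match, and the exact figures are not promised by the language.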
| From | Ross Ridge <rridge@csclub.uwaterloo.ca> |
|---|---|
| Date | 2012-03-28 15:43 -0400 |
| Message-ID | <jkvpl3$8tg$1@rumours.uwaterloo.ca> |
| In reply to | #22304 |
Evan Driscoll <driscoll@cs.wisc.edu> wrote:
>So yes, you can say that pretending there's not a mapping of strings to
>internal representation is silly, because there is. However, there's
>nothing you can say about that mapping.

I'm not the one labeling anything as being silly. I'm the one labeling
the things as bullshit, and that's what you're doing here. I can in
fact say what the internal byte string representation of strings is in any
given build of Python 3. Just because I can't say what it would be in
an imaginary hypothetical implementation doesn't mean I can never say
anything about it.

Ross Ridge

-- 
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-03-28 21:44 +0100 |
| Message-ID | <mailman.1097.1332967498.3037.python-list@python.org> |
| In reply to | #22312 |
On 28/03/2012 20:43, Ross Ridge wrote:
> Evan Driscoll<driscoll@cs.wisc.edu> wrote:
>> So yes, you can say that pretending there's not a mapping of strings to
>> internal representation is silly, because there is. However, there's
>> nothing you can say about that mapping.
>
> I'm not the one labeling anything as being silly. I'm the one labeling
> the things as bullshit, and that's what you're doing here. I can in
> fact say what the internal byte string representation of strings is in any
> given build of Python 3. Just because I can't say what it would be in
> an imaginary hypothetical implementation doesn't mean I can never say
> anything about it.
>
> Ross Ridge
>

Bytes is bytes and strings is strings
And the wrong one I have chose
Let's go where they keep on wearin'
Those frills and flowers and buttons and bows
Rings and things and buttons and bows.

No guessing the tune.

-- 
Cheers.

Mark Lawrence.
| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Date | 2012-03-28 20:56 +0000 |
| Message-ID | <9thc8hFiu9U1@mid.individual.net> |
| In reply to | #22312 |
On 2012-03-28, Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Evan Driscoll <driscoll@cs.wisc.edu> wrote:
>> So yes, you can say that pretending there's not a mapping of
>> strings to internal representation is silly, because there is.
>> However, there's nothing you can say about that mapping.
>
> I'm not the one labeling anything as being silly. I'm the one
> labeling the things as bullshit, and that's what you're doing
> here. I can in fact say what the internal byte string
> representation of strings is in any given build of Python 3. Just
> because I can't say what it would be in an imaginary
> hypothetical implementation doesn't mean I can never say
> anything about it.

I am in a similar situation vis-à-vis my wife's undergarments.

-- 
Neil Cerutti
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-03-29 00:02 +0000 |
| Message-ID | <4f73a69c$0$29981$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #22312 |
On Wed, 28 Mar 2012 15:43:31 -0400, Ross Ridge wrote:

> I can in
> fact say what the internal byte string representation of strings is in any
> given build of Python 3.

Don't keep us in suspense! Given:

Python 3.2.2 (default, Mar  4 2012, 10:50:33)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2

what *is* the internal byte representation of the string "a∫©πz"?

(lowercase a, integral sign, copyright symbol, lowercase Greek pi,
lowercase z)

And more importantly, given that internal byte representation, what could
you do with it?

-- 
Steven
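[Editorial illustration, not part of the original post.] The question can be made concrete by listing a few byte sequences that the string "a∫©πz" might plausibly map to. The choice of candidate encodings is an assumption for illustration; which of them, if any, matches a given interpreter's internal layout is not something this code can tell.

```python
s = "a\u222b\u00a9\u03c0z"  # "a∫©πz", the string from the post

# A few candidate encodings (an arbitrary selection, for illustration).
names = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le")
candidates = {name: s.encode(name) for name in names}

for name, data in candidates.items():
    print(f"{name:10} {len(data):2d} bytes  {data.hex()}")

# Each candidate decodes back to the same string, but only with its own codec.
assert all(data.decode(name) == s for name, data in candidates.items())
```

Every candidate is a different byte sequence for the same five code points, which is why the bytes alone answer nothing without naming the encoding.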
| From | Evan Driscoll <driscoll@cs.wisc.edu> |
|---|---|
| Date | 2012-03-28 19:11 -0500 |
| Message-ID | <mailman.1104.1332980161.3037.python-list@python.org> |
| In reply to | #22312 |
On 3/28/2012 14:43, Ross Ridge wrote:
> Evan Driscoll <driscoll@cs.wisc.edu> wrote:
>> So yes, you can say that pretending there's not a mapping of strings to
>> internal representation is silly, because there is. However, there's
>> nothing you can say about that mapping.
>
> I'm not the one labeling anything as being silly. I'm the one labeling
> the things as bullshit, and that's what you're doing here. I can in
> fact say what the internal byte string representation of strings is in any
> given build of Python 3. Just because I can't say what it would be in
> an imaginary hypothetical implementation doesn't mean I can never say
> anything about it.

People like you -- who write to assumptions which are not even remotely
guaranteed by the spec -- are part of the reason software sucks.

People like you hold back progress, because system implementers aren't
free to make changes without breaking backwards compatibility. Enormous
amounts of effort are expended to test programs and diagnose problems
which are caused by unwarranted assumptions like "the encoding of a
string is UTF-8". In the worst case, assumptions like that lead to
security fixes that don't go as far as they could, like the recent
discussion about hashing.

Python is definitely closer to the "willing to break backwards
compatibility to improve" end of the spectrum than some other projects
(*cough* Windows *cough*), but that still doesn't mean that you can make
assumptions like that.

This email is a bit harsher than it deserves -- but I feel not by much.

Evan
| From | Ross Ridge <rridge@csclub.uwaterloo.ca> |
|---|---|
| Date | 2012-03-28 23:04 -0400 |
| Message-ID | <jl0jf8$303$1@rumours.uwaterloo.ca> |
| In reply to | #22319 |
Evan Driscoll <driscoll@cs.wisc.edu> wrote:
>People like you -- who write to assumptions which are not even remotely
>guaranteed by the spec -- are part of the reason software sucks.
...
>This email is a bit harsher than it deserves -- but I feel not by much.

I don't see how you could feel the least bit justified. Well meaning,
if unhelpful, lies about the nature of Python strings in order to try to
convince someone to follow what you think are good programming practices
is one thing. Maliciously lying about someone else's code that you've
never seen is another thing entirely.

Ross Ridge

-- 
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-03-29 14:31 +1100 |
| Message-ID | <mailman.1107.1332991922.3037.python-list@python.org> |
| In reply to | #22323 |
On Thu, Mar 29, 2012 at 2:04 PM, Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Evan Driscoll <driscoll@cs.wisc.edu> wrote:
>>People like you -- who write to assumptions which are not even remotely
>>guaranteed by the spec -- are part of the reason software sucks.
> ...
>>This email is a bit harsher than it deserves -- but I feel not by much.
>
> I don't see how you could feel the least bit justified. Well meaning,
> if unhelpful, lies about the nature of Python strings in order to try to
> convince someone to follow what you think are good programming practices
> is one thing. Maliciously lying about someone else's code that you've
> never seen is another thing entirely.

Actually, he is justified. It's one thing to work in C or assembly and
write code that depends on certain bit-pattern representations of data
(although even that causes trouble - assuming that
sizeof(int)==sizeof(int*) isn't good for portability), but in a high
level language, you cannot assume any correlation between objects and
bytes. Any code that depends on implementation details is risky.

ChrisA
| From | Ross Ridge <rridge@csclub.uwaterloo.ca> |
|---|---|
| Date | 2012-03-28 23:58 -0400 |
| Message-ID | <jl0mlt$9q6$1@rumours.uwaterloo.ca> |
| In reply to | #22324 |
Chris Angelico <rosuav@gmail.com> wrote:
>Actually, he is justified. It's one thing to work in C or assembly and
>write code that depends on certain bit-pattern representations of data
>(although even that causes trouble - assuming that
>sizeof(int)==sizeof(int*) isn't good for portability), but in a high
>level language, you cannot assume any correlation between objects and
>bytes. Any code that depends on implementation details is risky.

How does that in any way justify Evan Driscoll maliciously lying about
code he's never seen?

Ross Ridge

-- 
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //