Newsgroup: comp.lang.python, thread #22266 (unrolled)
| Started by | Peter Daum <gator@cs.tu-berlin.de> |
|---|---|
| First post | 2012-03-28 10:56 +0200 |
| Last post | 2012-03-28 13:16 -0400 |
| Articles | 17 on this page of 57 — 22 participants |
"convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-28 10:56 +0200
Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-28 20:02 +1100
Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-28 11:43 +0200
Re: "convert" string to bytes without changing data (encoding) Heiko Wundram <modelnine@modelnine.org> - 2012-03-28 12:42 +0200
Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-28 19:43 +0200
Re: "convert" string to bytes without changing data (encoding) Heiko Wundram <modelnine@modelnine.org> - 2012-03-28 20:13 +0200
Re: "convert" string to bytes without changing data (encoding) Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2012-03-28 21:13 +0300
RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-28 18:31 +0000
Re: "convert" string to bytes without changing data (encoding) Ethan Furman <ethan@stoneleaf.us> - 2012-03-28 11:49 -0700
RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-28 18:20 +0000
Re: "convert" string to bytes without changing data (encoding) Ian Kelly <ian.g.kelly@gmail.com> - 2012-03-28 12:20 -0600
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-28 18:26 +0000
Re: "convert" string to bytes without changing data (encoding) Grant Edwards <invalid@invalid.invalid> - 2012-03-28 19:40 +0000
Re: "convert" string to bytes without changing data (encoding) Ethan Furman <ethan@stoneleaf.us> - 2012-03-28 11:17 -0700
Re: "convert" string to bytes without changing data (encoding) John Nagle <nagle@animats.com> - 2012-03-28 12:30 -0700
Re: "convert" string to bytes without changing data (encoding) Terry Reedy <tjreedy@udel.edu> - 2012-03-28 17:37 -0400
Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-29 16:57 +0200
Re: "convert" string to bytes without changing data (encoding) Peter Daum <gator@cs.tu-berlin.de> - 2012-03-29 16:57 +0200
Re: "convert" string to bytes without changing data (encoding) Serhiy Storchaka <storchaka@gmail.com> - 2012-03-30 22:06 +0300
Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-31 06:10 +1100
Re: "convert" string to bytes without changing data (encoding) Stefan Behnel <stefan_ml@behnel.de> - 2012-03-28 13:25 +0200
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-28 18:12 +0000
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 11:36 -0400
Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-29 03:18 +1100
Re: "convert" string to bytes without changing data (encoding) Grant Edwards <invalid@invalid.invalid> - 2012-03-28 16:33 +0000
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 14:05 -0400
Re: "convert" string to bytes without changing data (encoding) Tim Chase <python.list@tim.thechases.com> - 2012-03-28 13:49 -0500
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 15:10 -0400
Re: "convert" string to bytes without changing data (encoding) "Albert W. Hopkins" <marduk@letterboxes.org> - 2012-03-28 15:22 -0400
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-28 17:54 +0000
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 14:22 -0400
Re: Re: "convert" string to bytes without changing data (encoding) Evan Driscoll <driscoll@cs.wisc.edu> - 2012-03-28 14:20 -0500
Re: Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 15:43 -0400
Re: "convert" string to bytes without changing data (encoding) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-03-28 21:44 +0100
Re: "convert" string to bytes without changing data (encoding) Neil Cerutti <neilc@norwich.edu> - 2012-03-28 20:56 +0000
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-29 00:02 +0000
Re: Re: Re: "convert" string to bytes without changing data (encoding) Evan Driscoll <driscoll@cs.wisc.edu> - 2012-03-28 19:11 -0500
Re: Re: Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 23:04 -0400
Re: Re: Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-29 14:31 +1100
Re: Re: Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-28 23:58 -0400
Re: "convert" string to bytes without changing data (encoding) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-03-29 07:01 +0100
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-29 06:51 +0000
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-29 11:30 -0400
Re: "convert" string to bytes without changing data (encoding) Terry Reedy <tjreedy@udel.edu> - 2012-03-29 12:49 -0400
Re: "convert" string to bytes without changing data (encoding) Ross Ridge <rridge@csclub.uwaterloo.ca> - 2012-03-29 14:00 -0400
Re: "convert" string to bytes without changing data (encoding) Chris Angelico <rosuav@gmail.com> - 2012-03-30 07:41 +1100
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-30 01:16 +0000
Re: Re: Re: Re: "convert" string to bytes without changing data (encoding) Evan Driscoll <driscoll@cs.wisc.edu> - 2012-03-29 11:31 -0500
RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-28 19:02 +0000
Re: "convert" string to bytes without changing data (encoding) Grant Edwards <invalid@invalid.invalid> - 2012-03-28 19:44 +0000
Re: "convert" string to bytes without changing data (encoding) MRAB <python@mrabarnett.plus.com> - 2012-03-28 20:50 +0100
RE: "convert" string to bytes without changing data (encoding) "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-03-29 17:36 +0000
Re: "convert" string to bytes without changing data (encoding) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-03-30 01:10 +0000
Re: "convert" string to bytes without changing data (encoding) Michael Ströder <michael@stroeder.com> - 2012-03-30 09:04 +0200
Re: "convert" string to bytes without changing data (encoding) Terry Reedy <tjreedy@udel.edu> - 2012-03-28 14:11 -0400
Re: "convert" string to bytes without changing data (encoding) Stefan Behnel <stefan_ml@behnel.de> - 2012-03-28 11:08 +0200
Re: "convert" string to bytes without changing data (encoding) Dave Angel <d@davea.name> - 2012-03-28 13:16 -0400
Page 3 of 3
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-03-29 07:01 +0100 |
| Message-ID | <mailman.1109.1333000850.3037.python-list@python.org> |
| In reply to | #22325 |
On 29/03/2012 04:58, Ross Ridge wrote:
> Chris Angelico <rosuav@gmail.com> wrote:
>> Actually, he is justified. It's one thing to work in C or assembly and
>> write code that depends on certain bit-pattern representations of data
>> (although even that causes trouble - assuming that
>> sizeof(int)==sizeof(int*) isn't good for portability), but in a high
>> level language, you cannot assume any correlation between objects and
>> bytes. Any code that depends on implementation details is risky.
>
> How does that in any way justify Evan Driscoll maliciously lying about
> code he's never seen?
>
> Ross Ridge
>

We appear to have a case of "would you stand up please, your voice is rather muffled". I can hear all the *plonks* from miles away.

--
Cheers.

Mark Lawrence.
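Chris's aside about `sizeof(int)==sizeof(int*)` can be checked from Python, because the standard `struct` module reports native C sizes. A small sketch; the numbers depend on the platform and compiler (which is the point), so no particular output is assumed beyond what the comments say:

```python
import struct

# Native sizes, in bytes, of a C int and a C void* on this platform.
int_size = struct.calcsize("i")
ptr_size = struct.calcsize("P")

print("sizeof(int) =", int_size, "sizeof(void*) =", ptr_size)

# On common 64-bit builds these differ (typically 4 and 8), so C code that
# assumes they are equal is not portable.
print("equal on this platform:", int_size == ptr_size)
```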
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-03-29 06:51 +0000 |
| Message-ID | <4f740687$0$29884$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #22325 |
On Wed, 28 Mar 2012 23:58:53 -0400, Ross Ridge wrote:

> How does that in any way justify Evan Driscoll maliciously lying about
> code he's never seen?

You are perfectly justified to complain about Evan making sweeping generalisations about your code when he has not seen it; you are NOT justified in making your own sweeping generalisations that he is not just lying but *maliciously* lying. He might be just confused by the strength of his emotions and so making an honest mistake. Or he might have guessed perfectly accurately about your code, and you are the one being dishonest. Who knows?

Evan's impassioned rant is based on his estimate of your mindset, namely that you are the sort of developer who writes code making assumptions about implementation details even when explicitly told not to by the library authors. I have no idea whether Evan's estimate is right or not, but I don't think it is justified based on the little amount we've seen of you.

Your reaction is to make an equally unjustified estimate of Evan's mindset, namely that he is not just wrong about you, but *deliberately and maliciously* lying about you in the full knowledge that he is wrong. If anything, I would say that you have less justification for calling Evan a malicious liar than he has for calling you the sort of person who would write to an implementation instead of an interface.

--
Steven
| From | Ross Ridge <rridge@csclub.uwaterloo.ca> |
|---|---|
| Date | 2012-03-29 11:30 -0400 |
| Message-ID | <jl1v6b$67a$1@rumours.uwaterloo.ca> |
| In reply to | #22328 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>Your reaction is to make an equally unjustified estimate of Evan's
>mindset, namely that he is not just wrong about you, but *deliberately
>and maliciously* lying about you in the full knowledge that he is wrong.

No, Evan in his own words admitted that his post was meant to be harsh, "a bit harsher than it deserves", showing his malicious intent. He made accusations that were neither supported by anything I've said in this thread nor by the code I actually write. His accusations about me were completely made up; he was not telling the truth and had no reasonable basis to believe he was telling the truth. He was maliciously lying and I'm completely justified in saying so.

Just to make it clear to all you zealots: I've not once advocated writing any sort of "risky code" in this thread. I have not once advocated writing any style of code in this thread. Just because I refuse to drink the "it's impossible to represent strings as a series of bytes" kool-aid doesn't mean that I'm a heretic who must oppose everything you believe in.

Ross Ridge

--
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-03-29 12:49 -0400 |
| Message-ID | <mailman.1126.1333039783.3037.python-list@python.org> |
| In reply to | #22345 |
On 3/29/2012 11:30 AM, Ross Ridge wrote:

> No, Evan in his own words admitted that his post was meant to be harsh,

I agree that he should have restrained and censored his writing.

> Just because I refuse to drink the
> "it's impossible to represent strings as a series of bytes" kool-aid

I do not believe *anyone* has made that claim. Is this meant to be a wild exaggeration? As wild as Evan's?

In my first post on this thread, I made three truthful claims.

1. A 3.x text string is logically a sequence of unicode 'characters' (codepoints).

2. The Python language definition does not require that a string be bytes or become bytes unless and until it is explicitly encoded.

3. The intentionally hidden byte implementation of strings on byte machines is version and system dependent. The bytes used for a particular character are (in 3.3) context dependent.

As it turns out, the OP had mistakenly assumed that the hidden byte implementation of 3.3 strings was both well-defined and something (UTF-8) that it is not and (almost certainly) never will be. Guido and most other devs strongly want string indexing (and hence slice endpoint finding) to be O(1).

So all of the above is moot as far as the OP's problem is concerned. I already gave him the three standard solutions.

--
Terry Jan Reedy
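Terry's third claim, and his remark about O(1) indexing, can be observed in CPython 3.3 and later, which (per PEP 393) stores each string with a fixed 1, 2 or 4 bytes per character, chosen by the widest character in that string. A fixed width is what lets indexing be a constant-time offset calculation; a variable-width layout such as UTF-8 would not give that directly. The sketch below is CPython-specific: it assumes `sys.getsizeof` reports a fixed header plus the character data, which holds for CPython but is not a language guarantee. Both helper names are hypothetical.

```python
import sys


def pep393_width(s: str) -> int:
    """Hypothetical helper: bytes per character CPython 3.3+ should use for s.

    Under PEP 393 the widest code point picks the width for the whole string.
    """
    widest = max(map(ord, s), default=0)
    if widest < 0x100:
        return 1  # ASCII or Latin-1 range
    if widest < 0x10000:
        return 2  # Basic Multilingual Plane
    return 4      # beyond the BMP


def measured_width(s: str) -> int:
    """Hypothetical, CPython-only helper for a non-empty s.

    Appending one of s's own characters keeps the storage width, so the
    growth in sys.getsizeof between two fresh strings is bytes per character.
    """
    t = s + s[-1]
    return sys.getsizeof(t + s[-1]) - sys.getsizeof(t)


# The same 'a' costs 1, 2 or 4 bytes depending on its neighbours.
for s in ("a" * 10, "a" * 9 + "\u20ac", "a" * 9 + "\U0001F600"):
    print(ascii(s[-1]), pep393_width(s), measured_width(s))
# On CPython 3.3+ this prints:
# 'a' 1 1
# '\u20ac' 2 2
# '\U0001f600' 4 4
```

Other implementations (PyPy, Jython, IronPython) are free to store strings differently, which is Terry's point.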
| From | Ross Ridge <rridge@csclub.uwaterloo.ca> |
|---|---|
| Date | 2012-03-29 14:00 -0400 |
| Message-ID | <jl280a$pr5$1@rumours.uwaterloo.ca> |
| In reply to | #22350 |
Ross Ridge wrote:
> Just because I refuse to drink the
> "it's impossible to represent strings as a series of bytes" kool-aid

Terry Reedy <tjreedy@udel.edu> wrote:
>I do not believe *anyone* has made that claim. Is this meant to be a
>wild exaggeration? As wild as Evan's?

Sorry, it would've been more accurate to label the flavour of kool-aid
Chris Angelico was trying to push as "it's impossible ... without
encoding":

       What is a string? It's not a series of bytes. You can't convert
       it without encoding those characters into bytes in some way.

>In my first post on this thread, I made three truthful claims.

I'm not objecting to every post made in this thread. If your post had been made before the original poster had figured it out on his own, I would've hoped he would have found it much more convincing than what I quoted above.

Ross Ridge

--
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rridge@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-03-30 07:41 +1100 |
| Message-ID | <mailman.1135.1333053694.3037.python-list@python.org> |
| In reply to | #22354 |
On Fri, Mar 30, 2012 at 5:00 AM, Ross Ridge <rridge@csclub.uwaterloo.ca> wrote:
> Sorry, it would've been more accurate to label the flavour of kool-aid
> Chris Angelico was trying to push as "it's impossible ... without
> encoding":
>
>        What is a string? It's not a series of bytes. You can't convert
>        it without encoding those characters into bytes in some way.

I still stand by that statement. Do you try to convert a "dictionary of filename to open file object" into a "series of bytes" inside Python? It doesn't matter that, on some level, it's *stored as* a series of bytes; the actual object *is not* a series of bytes. There is no logical equivalency, ergo it is illogical and nonsensical to expect to turn one into the other without some form of encoding.

Python does include an encoding that can handle lists and dictionaries. It's called Pickle, and it returns (in Python 3) a bytes object - which IS a series of bytes. It doesn't simply return some internal representation.

ChrisA
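Chris's pickle point in code: a minimal sketch (the sample dictionary is made up for illustration) showing that `pickle.dumps` produces a real `bytes` object whose layout is fixed by the chosen pickle protocol, not by how the interpreter happens to hold the dict in memory:

```python
import pickle

# A nested object: a dict mapping strings to lists.
data = {"spam": [1, 2, 3], "eggs": ["ham", "\u20ac"]}

# pickle.dumps *encodes* the object into bytes ...
blob = pickle.dumps(data)
print(type(blob))  # <class 'bytes'>

# ... and pickle.loads decodes those bytes back into an equal object.
restored = pickle.loads(blob)
print(restored == data)  # True

# The same object encodes differently under different protocols, so the
# bytes are a property of the encoding, not of the object in memory.
print(pickle.dumps(data, protocol=2) == pickle.dumps(data, protocol=4))  # False
```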
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-03-30 01:16 +0000 |
| Message-ID | <4f750965$0$29981$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #22345 |
On Thu, 29 Mar 2012 11:30:19 -0400, Ross Ridge wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>>Your reaction is to make an equally unjustified estimate of Evan's
>>mindset, namely that he is not just wrong about you, but *deliberately
>>and maliciously* lying about you in the full knowledge that he is wrong.
>
> No, Evan in his own words admitted that his post was meant to be harsh,
> "a bit harsher than it deserves", showing his malicious intent.
Being harsher than it deserves is not synonymous with malicious. You are
making assumptions about Evan's mental state that are not supported by
the evidence. Evan may believe that by "punishing" (for some feeble sense
of punishment) you harshly, he is teaching you better behaviour that will
be to your own benefit; or that it will act as a warning to others.
Either way he may believe that he is actually doing good.
And then he entirely undermined his own actions by admitting that he was
over-reacting. This suggests that, in fact, he wasn't really motivated by
either malice or beneficence but mere frustration.
It is quite clear that Evan let his passions about writing maintainable
code get the best of him. His rant was more about "people like you" than
you personally.
Evan, if you're reading this, I think you owe Ross an apology for flying
off the handle. Ross, I think you owe Evan an apology for unjustified
accusations of malice.
> He made
> accusations that were neither supported by anything I've said
Now that is not actually true. Your posts have defended the idea that
copying the raw internal byte representation of strings is a reasonable
thing to do. You even claimed to know how to do so, for any version of
Python (but so far have ignored my request for you to demonstrate).
> in this
> thread nor by the code I actually write. His accusations about me were
> completely made up, he was not telling the truth and had no reasonable
> basis to believe he was telling the truth. He was maliciously lying and
> I'm completely justified in saying so.
No, they were not completely made up. Your posts give many signs of being
somebody who might very well write code to the implementation rather than
the interface. Whether you are or not is a separate question, but your
posts in this thread indicate that you very likely could be.
If this is not the impression you want to give, then you should
reconsider your posting style.
Ross, to be frank, your posting style in this thread has been cowardly
and pedantic, an obnoxious combination. Please take this as constructive
criticism and not an attack -- you have alienated people in this thread,
leading at least one person to publicly kill-file your future posts. I
choose to assume you aren't aware of why that is than that you are doing
so deliberately.
Without actually coming out and making a clear, explicit statement that
you approve or disapprove of the OP's attempt to use implementation
details, you *imply* support without explicitly giving it; you criticise
others for saying it can't be done without demonstrating that it can be
done. If this is a deliberate rhetorical trick, then shame on you for
being a coward without the conviction to stand behind concrete
expressions of your opinion. If not, then you should be aware that you
are using a rhetorical style that will make many people predisposed to
think you are a twat.
You *might* have said
    Guys, you're technically wrong about this. This is how you can
    retrieve the internal representation of a string as a sequence
    of bytes: ...code... but you shouldn't use this in production
    code because it is fragile and depends on implementation details
    that may break in PyPy and Jython and IronPython.
But you didn't.
You *might* have said
    Wrong, you can convert a string into a sequence of bytes without
    encoding or decoding: ...code... but don't do this.
But you didn't.
Instead you puffed yourself up as a big shot who was more technically
correct than everyone else, but without *actually* demonstrating that you
can do what you said you can do. You labelled as "bullshit" our attempts
to discourage the OP from his misguided approach.
If your intention was to put people off-side, you succeeded very well. If
not, you should be aware that you have, and consider how you might avoid
this in the future.
--
Steven
| From | Evan Driscoll <driscoll@cs.wisc.edu> |
|---|---|
| Date | 2012-03-29 11:31 -0500 |
| Message-ID | <mailman.1124.1333038695.3037.python-list@python.org> |
| In reply to | #22323 |
Ross Ridge wrote:
> Evan Driscoll<driscoll@cs.wisc.edu> wrote:
>> People like you -- who write to assumptions which are not even remotely
>> guaranteed by the spec -- are part of the reason software sucks.
> ...
>> This email is a bit harsher than it deserves -- but I feel not by much.
>
> I don't see how you could feel the least bit justified. Well meaning,
> if unhelpful, lies about the nature of Python strings in order to try to
> convince someone to follow what you think are good programming practices
> is one thing. Maliciously lying about someone else's code that you've
> never seen is another thing entirely.
I'm not even talking about code that you or the OP has written. I'm
talking about your suggestion that
    I can in fact say what the internal byte string representation
    of strings is in any given build of Python 3.
Aside from the questionable truth of this assertion (there's no
guarantee that an implementation uses one encoding or data structure
representation consistently), that's of no consequence because
you can't depend on what the representation is. So why even bring it up?
Also irrelevant is:
    In practice the number of ways that CPython (the only Python 3
    implementation) represents strings is much more limited.
    Pretending otherwise really isn't helpful.
If you can't depend on CPython's implementation (and, I would argue,
your code is broken if you do), then it *is* helpful. Saying that "you
can just look at what CPython does" is what is unhelpful.
That said, looking again I did misread your post that I sent that harsh
reply to; I was looking at it perhaps a bit too much through the lens of
the CPython comment I said above, and interpreting it as "I can say what
the internal representation is of CPython, so just give me that" and
launched into my spiel. If that's not what was intended, I retract my
statement. As long as everyone is clear on the fact that Python 3
implementations can use whatever encoding and data structures they want,
perhaps even different encodings or data structures for equal strings,
and that as a consequence saying "what's the internal representation of
this string" is a meaningless question as far as Python itself is
concerned, I'm happy.
Evan
| From | "Prasad, Ramit" <ramit.prasad@jpmorgan.com> |
|---|---|
| Date | 2012-03-28 19:02 +0000 |
| Message-ID | <mailman.1094.1332962975.3037.python-list@python.org> |
| In reply to | #22299 |
> >The right way to convert bytes to strings, and vice versa, is via
> >encoding and decoding operations.
>
> If you want to dictate to the original poster the correct way to do
> things then you don't need to do anything more than that. You don't need to
> pretend like Chris Angelico that there isn't a direct mapping from
> his Python 3 implementation's internal representation of strings
> to bytes in order to label what he's asking for as being "silly".

It might be technically possible to recreate internal implementation,
or get the byte data. That does not mean it will make any sense or
be understood in a meaningful manner. I think Ian summarized it
very well:

>You can't generally just "deal with the ascii portions" without
>knowing something about the encoding. Say you encounter a byte
>greater than 127. Is it a single non-ASCII character, or is it the
>leading byte of a multi-byte character? If the next character is less
>than 127, is it an ASCII character, or a continuation of the previous
>character? For UTF-8 you could safely assume ASCII, but without
>knowing the encoding, there is no way to be sure. If you just assume
>it's ASCII and manipulate it as such, you could be messing up
>non-ASCII characters.

Technically, ASCII goes up to 256 but they are not A-z letters.

Ramit

Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423

--
This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email.
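Ian's argument is easy to reproduce: the bytes alone do not say which encoding produced them. A short sketch (the sample strings are illustrative choices, not from the thread) showing the same two high bytes read as one character under UTF-8 and as two under Latin-1, why UTF-8 in particular keeps the "ASCII portions" recognisable, and a UTF-16 counterexample where ASCII-looking bytes belong to a non-ASCII character:

```python
data = b"caf\xc3\xa9"  # ends in two bytes greater than 127

as_utf8 = data.decode("utf-8")      # the two bytes form ONE character
as_latin1 = data.decode("latin-1")  # the two bytes are TWO characters
print(ascii(as_utf8), len(as_utf8))      # 'caf\xe9' 4
print(ascii(as_latin1), len(as_latin1))  # 'caf\xc3\xa9' 5

# In UTF-8 every byte of a multi-byte sequence is >= 0x80, so a byte
# below 0x80 really is an ASCII character.
print(all(b >= 0x80 for b in "\u00e9\u20ac\U0001F600".encode("utf-8")))  # True

# UTF-16 gives no such guarantee: the CJK character U+4E41 encodes to two
# bytes that look like the ASCII letters 'A' and 'N'.
print("\u4e41".encode("utf-16-le"))  # b'AN'
```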
| From | Grant Edwards <invalid@invalid.invalid> |
|---|---|
| Date | 2012-03-28 19:44 +0000 |
| Message-ID | <jkvpm2$9nk$2@reader1.panix.com> |
| In reply to | #22306 |
On 2012-03-28, Prasad, Ramit <ramit.prasad@jpmorgan.com> wrote:
>
>>You can't generally just "deal with the ascii portions" without
>>knowing something about the encoding. Say you encounter a byte
>>greater than 127. Is it a single non-ASCII character, or is it the
>>leading byte of a multi-byte character? If the next character is less
>>than 127, is it an ASCII character, or a continuation of the previous
>>character? For UTF-8 you could safely assume ASCII, but without
>>knowing the encoding, there is no way to be sure. If you just assume
>>it's ASCII and manipulate it as such, you could be messing up
>>non-ASCII characters.
>
> Technically, ASCII goes up to 256
No, ASCII only defines 0-127. Values >=128 are not ASCII.
From https://en.wikipedia.org/wiki/ASCII:
  ASCII includes definitions for 128 characters: 33 are non-printing
  control characters (now mostly obsolete) that affect how text and
  space is processed and 95 printable characters, including the space
  (which is considered an invisible graphic).
--
Grant Edwards grant.b.edwards Yow! Used staples are good
at with SOY SAUCE!
gmail.com
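Grant's correction can be checked with Python's strict `ascii` codec. A minimal sketch: the 128 code points split into exactly the 95 printable and 33 control characters quoted above (using `str.isprintable`, which counts the space as printable), and byte 0x80 is rejected:

```python
# ASCII defines exactly 128 code points, 0 through 127.
all_ascii = bytes(range(128)).decode("ascii")

printable = sum(c.isprintable() for c in all_ascii)  # space (0x20) to '~' (0x7E)
control = len(all_ascii) - printable                 # 0x00-0x1F plus DEL (0x7F)
print(len(all_ascii), printable, control)  # 128 95 33

# Byte values of 128 and above are not ASCII, so strict decoding rejects them.
try:
    bytes([0x80]).decode("ascii")
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)  # not ASCII: ordinal not in range(128)
```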
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2012-03-28 20:50 +0100 |
| Message-ID | <mailman.1096.1332964201.3037.python-list@python.org> |
| In reply to | #22299 |
On 28/03/2012 20:02, Prasad, Ramit wrote:
>> >The right way to convert bytes to strings, and vice versa, is via
>> >encoding and decoding operations.
>>
>> If you want to dictate to the original poster the correct way to do
>> things then you don't need to do anything more than that. You don't need to
>> pretend like Chris Angelico that there isn't a direct mapping from
>> his Python 3 implementation's internal representation of strings
>> to bytes in order to label what he's asking for as being "silly".
>
> It might be technically possible to recreate internal implementation,
> or get the byte data. That does not mean it will make any sense or
> be understood in a meaningful manner. I think Ian summarized it
> very well:
>
>>You can't generally just "deal with the ascii portions" without
>>knowing something about the encoding. Say you encounter a byte
>>greater than 127. Is it a single non-ASCII character, or is it the
>>leading byte of a multi-byte character? If the next character is less
>>than 127, is it an ASCII character, or a continuation of the previous
>>character? For UTF-8 you could safely assume ASCII, but without
>>knowing the encoding, there is no way to be sure. If you just assume
>>it's ASCII and manipulate it as such, you could be messing up
>>non-ASCII characters.
>
> Technically, ASCII goes up to 256 but they are not A-z letters.
>
Technically, ASCII is 7-bit, so it goes up to 127.
| From | "Prasad, Ramit" <ramit.prasad@jpmorgan.com> |
|---|---|
| Date | 2012-03-29 17:36 +0000 |
| Message-ID | <mailman.1128.1333042614.3037.python-list@python.org> |
| In reply to | #22299 |
> > Technically, ASCII goes up to 256 but they are not A-z letters.
> >
> Technically, ASCII is 7-bit, so it goes up to 127.

> No, ASCII only defines 0-127. Values >=128 are not ASCII.
>
> >From https://en.wikipedia.org/wiki/ASCII:
>
>   ASCII includes definitions for 128 characters: 33 are non-printing
>   control characters (now mostly obsolete) that affect how text and
>   space is processed and 95 printable characters, including the space
>   (which is considered an invisible graphic).

Doh! I was mistaking extended ASCII for ASCII. Thanks for the correction.

Ramit
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-03-30 01:10 +0000 |
| Message-ID | <4f75080a$0$29981$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #22352 |
On Thu, 29 Mar 2012 17:36:34 +0000, Prasad, Ramit wrote:

>> > Technically, ASCII goes up to 256 but they are not A-z letters.
>> >
>> Technically, ASCII is 7-bit, so it goes up to 127.
>
>> No, ASCII only defines 0-127. Values >=128 are not ASCII.
>>
>> >From https://en.wikipedia.org/wiki/ASCII:
>>
>>   ASCII includes definitions for 128 characters: 33 are non-printing
>>   control characters (now mostly obsolete) that affect how text and
>>   space is processed and 95 printable characters, including the space
>>   (which is considered an invisible graphic).
>
>
> Doh! I was mistaking extended ASCII for ASCII. Thanks for the
> correction.

There actually is no such thing as "extended ASCII" -- there is a whole series of many different "extended ASCIIs". If you look at the encodings available in (for example) Thunderbird, many of the ISO-8859-* and Windows-* encodings are "extended ASCII" in the sense that they extend ASCII to include bytes 128-255. Unfortunately they all extend ASCII in a different way (hence they are different encodings).

--
Steven
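Steven's point can be shown with Python's built-in codecs. A small sketch; the byte value 0xA4 and the three codecs (Latin-1, Latin-9 and the IBM PC code page 437) are arbitrary choices for illustration. One byte above 127 decodes to three different characters, while all three codecs agree with plain ASCII on bytes 0 to 127:

```python
RAW = bytes([0xA4])  # a single byte above 127
CODECS = ("iso-8859-1", "iso-8859-15", "cp437")

decoded = {codec: RAW.decode(codec) for codec in CODECS}
for codec, char in decoded.items():
    # ascii() escapes the result so it prints on any terminal.
    print(codec, ascii(char))
# iso-8859-1 '\xa4'     (CURRENCY SIGN)
# iso-8859-15 '\u20ac'  (EURO SIGN)
# cp437 '\xf1'          (LATIN SMALL LETTER N WITH TILDE)

# All three agree with plain ASCII on the lower half.
lower_half = bytes(range(128))
print(all(lower_half.decode(c) == lower_half.decode("ascii") for c in CODECS))  # True
```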
| From | Michael Ströder <michael@stroeder.com> |
|---|---|
| Date | 2012-03-30 09:04 +0200 |
| Message-ID | <jl4pru$r1a$1@dont-email.me> |
| In reply to | #22366 |
Steven D'Aprano wrote:
> On Thu, 29 Mar 2012 17:36:34 +0000, Prasad, Ramit wrote:
>
>>>> Technically, ASCII goes up to 256 but they are not A-z letters.
>>>>
>>> Technically, ASCII is 7-bit, so it goes up to 127.
>>
>>> No, ASCII only defines 0-127. Values >=128 are not ASCII.
>>>
>>> From https://en.wikipedia.org/wiki/ASCII:
>>>
>>> ASCII includes definitions for 128 characters: 33 are non-printing control characters (now mostly obsolete) that affect how text and space is processed and 95 printable characters, including the space (which is considered an invisible graphic).
>>
>>
>> Doh! I was mistaking extended ASCII for ASCII. Thanks for the correction.
>
> There actually is no such thing as "extended ASCII" -- there is a whole series of many different "extended ASCIIs". If you look at the encodings available in (for example) Thunderbird, many of the ISO-8859-* and Windows-* encodings are "extended ASCII" in the sense that they extend ASCII to include bytes 128-255. Unfortunately they all extend ASCII in a different way (hence they are different encodings).

Yupp. Looking at RFC 1345 some years ago (while having to deal with EBCDIC) made this all pretty clear to me. I appreciate that someone did this heavy work of collecting historical encodings.

Ciao, Michael.
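EBCDIC, which Michael mentions, offers a useful contrast because it is not an extension of ASCII at all. Python ships several EBCDIC codecs. This sketch uses cp037 as one example (cp500 and others also exist) to show that even letters, digits and the space get different byte values than in ASCII.

```python
# Sketch: compare ASCII and EBCDIC (code page 037) byte values.
# cp037 is just one EBCDIC variant that ships with Python.
text = "A1 z"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")

print(ascii_bytes.hex())   # 4131207a
print(ebcdic_bytes.hex())  # c1f140a9

# Each codec round-trips its own output, but the bytes are not interchangeable.
assert ebcdic_bytes.decode("cp037") == text
assert ebcdic_bytes.decode("latin-1") != text
```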
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-03-28 14:11 -0400 |
| Message-ID | <mailman.1087.1332959217.3037.python-list@python.org> |
| In reply to | #22280 |
On 3/28/2012 11:36 AM, Ross Ridge wrote:
> Chris Angelico <rosuav@gmail.com> wrote:
>> What is a string? It's not a series of bytes.
>
> Of course it is. Conceptually you're not supposed to think of it that way, but a string is stored in memory as a series of bytes.

*If* it is stored in byte memory. If you execute a 3.x program mentally or on paper, then there are no bytes. If you execute a 3.3 program on a byte-oriented computer, then the 'a' in the string might be represented by 1, 2, or 4 bytes, depending on the other characters in the string. The actual logical bit pattern will depend on the big versus little endianness of the system.

My impression is that if you go down to the physical bit level, then again there are, possibly, no 'bytes' as a physical construct, as the bits, possibly, are stored in parallel on multiple RAM chips.

> What he's asking for may not be very useful or practical, but if that's your problem here, then that's what you should be addressing, not pretending that it's fundamentally impossible.

The Python-level way to get the bytes of an object that supports the buffer interface is memoryview(). 3.x strings intentionally do not support the buffer interface, as there is not any particular correspondence between characters (codepoints) and bytes.

The OP could get the ordinal for each character and decide how *he* wants to convert them to bytes.

    ba = bytearray()
    for c in s:
        i = ord(c)
        <append bytes to ba corresponding to i>

To get the particular bytes used for a particular string on a particular system, the OP should use the C API, possibly through ctypes.

-- Terry Jan Reedy
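Terry's loop deliberately leaves the per-character conversion open. The sketch below fills that gap with one arbitrary choice: a fixed 4-byte big-endian integer per code point. Both function names, `to_bytes_by_ordinal` and `supports_buffer`, are hypothetical helpers written for this example. The sketch also checks his remark that `str` does not expose the buffer interface, so `memoryview()` rejects it.

```python
def to_bytes_by_ordinal(s: str) -> bytes:
    """Fill in Terry's placeholder with one arbitrary choice.

    Assumption (hypothetical, not prescribed by Python): every code point
    is stored as a fixed-width 4-byte big-endian unsigned integer.
    """
    ba = bytearray()
    for c in s:
        i = ord(c)
        # 4 bytes are enough for the largest code point, 0x10FFFF
        ba.extend(i.to_bytes(4, "big"))
    return bytes(ba)


def supports_buffer(obj) -> bool:
    """Return True if memoryview() accepts obj (buffer protocol)."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


s = "abc\u20ac"
encoded = to_bytes_by_ordinal(s)
print(encoded.hex())  # 000000610000006200000063000020ac

# For strings without lone surrogates this choice coincides with UTF-32-BE.
assert encoded == s.encode("utf-32-be")

print(supports_buffer(b"abc"), supports_buffer(s))  # True False
```

A fixed width keeps decoding trivial (split every 4 bytes). A variable-width scheme such as UTF-8 would be more compact, which is exactly the kind of decision Terry leaves to the OP.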
| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Date | 2012-03-28 11:08 +0200 |
| Message-ID | <mailman.1066.1332925721.3037.python-list@python.org> |
| In reply to | #22266 |
Peter Daum, 28.03.2012 10:56:
> is there any way to convert a string to bytes without interpreting the data in any way? Something like:
>
> s='abcde'
> b=bytes(s, "unchanged")

If you can tell us what you actually want to achieve, i.e. why you want to do this, we may be able to tell you how to do what you want.

Stefan
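Stefan's question matters because Python has no codec named "unchanged", so `bytes(s, "unchanged")` fails with `LookupError`. One common interpretation is a one-to-one mapping of code points 0-255 to bytes of the same value, which is exactly what the Latin-1 codec does. This is offered as an assumption about what the OP might want, not as the thread's answer. `codec_exists` and `latin1_bytes` are hypothetical helper names.

```python
import codecs


def codec_exists(name: str) -> bool:
    """Return True if Python's codec registry recognises `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def latin1_bytes(text: str) -> bytes:
    """Map each code point 0-255 to the byte with the same value.

    Raises UnicodeEncodeError for code points >= 256, where a
    one-character-one-byte "unchanged" conversion is not possible.
    """
    return text.encode("latin-1")


s = "abcde"
print(codec_exists("unchanged"))  # False: bytes(s, "unchanged") raises LookupError
b = latin1_bytes(s)
print(b)  # b'abcde'
assert list(b) == [ord(c) for c in s]
```

This only round-trips cleanly if the string was itself produced by decoding with Latin-1, or otherwise contains only code points below 256.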
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-03-28 13:16 -0400 |
| Message-ID | <mailman.1083.1332955038.3037.python-list@python.org> |
| In reply to | #22266 |
On 03/28/2012 04:56 AM, Peter Daum wrote:
> Hi,
>
> is there any way to convert a string to bytes without interpreting the data in any way? Something like:
>
> s='abcde'
> b=bytes(s, "unchanged")
>
> Regards,
> Peter

You needed to specify that you are using Python 3.x. In Python 2.x, a string is indeed a series of bytes. But in Python 3.x, you have to be much more specific.

For example, if that string is coming from a literal, then you can usually convert it back to bytes simply by encoding it with the same encoding as the one declared for the source file. So look at the encoding line at the top of the file.

-- DaveA
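Dave's advice can be automated. The standard library's `tokenize.detect_encoding()` reports the encoding Python would use for a source file. It returns the PEP 263 coding declaration if one appears on the first two lines, and otherwise falls back to UTF-8, the Python 3 default. The in-memory source snippets and the helper name `source_encoding` are made up for this sketch.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding Python would use to decode `source`.

    tokenize.detect_encoding() looks for a UTF-8 BOM or a PEP 263 coding
    declaration in the first two lines and otherwise falls back to UTF-8.
    """
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Hypothetical source files, held in memory for the example.
plain = b"s = 'abcde'\n"
declared = b"# -*- coding: latin-1 -*-\ns = 'caf\xe9'\n"

print(source_encoding(plain))     # utf-8
print(source_encoding(declared))  # iso-8859-1 (the normalized name)

# The literal in `declared` (raw byte 0xE9), re-encoded with the file's own
# encoding, gives back exactly the bytes that were in the file.
literal = "caf\xe9"
assert literal.encode(source_encoding(declared)) == b"caf\xe9"
# Encoding it with a different codec gives different bytes.
assert literal.encode(source_encoding(plain)) == b"caf\xc3\xa9"
```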