comp.lang.python
| Started by | Chris Angelico <rosuav@gmail.com> |
|---|---|
| First post | 2014-01-07 02:46 +1100 |
| Last post | 2014-01-07 02:46 +1100 |
| Articles | 1 — 1 participant |
Re: "More About Unicode in Python 2 and 3" Chris Angelico <rosuav@gmail.com> - 2014-01-07 02:46 +1100
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-07 02:46 +1100 |
| Subject | Re: "More About Unicode in Python 2 and 3" |
| Message-ID | <mailman.5023.1389023179.18130.python-list@python.org> |
On Tue, Jan 7, 2014 at 2:10 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
> On 01/05/2014 06:55 PM, Chris Angelico wrote:
>>
>>
>> It can't be both things. It's either bytes or it's text.
>
>
> Of course it can be:
>
> 0000000: 0372 0106 0000 0000 6100 1d00 0000 0000  .r......a.......
> 0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0000020: 4e41 4d45 0000 0000 0000 0043 0100 0000  NAME.......C....
> 0000030: 1900 0000 0000 0000 0000 0000 0000 0000  ................
> 0000040: 4147 4500 0000 0000 0000 004e 1a00 0000  AGE........N....
> 0000050: 0300 0000 0000 0000 0000 0000 0000 0000  ................
> 0000060: 0d1a 0a                                  ...
>
> And there we are, mixed bytes and ascii data. As I said earlier, my example
> is minimal, but still very frustrating in that normal operations no longer
> work. Incidentally, if you were thinking that NAME and AGE were part of the
> ascii text, you'd be wrong -- the field names are also encoded, as are the
> Character and Memo fields.

That's alternating between encoded text and non-text bytes. Each individual piece is either text or non-text, not both. The ideal way to manipulate it would most likely be a simple decode operation that turns this into (probably) a dictionary, decoding both the structure/layout and the UTF-8 text in a single operation. A less ideal (but more convenient) solution might involve what's currently under discussion elsewhere: a (possibly partial) percent-formatting or .format() method for bytes.

None of this changes the fact that there are bytes used to store/transmit stuff, and abstract concepts used to manipulate them. Just as nobody expects to be able to write a dict to a file without some form of encoding (pickle, JSON, whatever), you shouldn't expect to write a character string without first turning it into bytes.

ChrisA
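To make the "single decode operation" idea concrete, here is a minimal sketch that turns the quoted dump into a dictionary. It assumes the bytes are the header of a dBase-style (.dbf) table. Nothing in the thread says this outright, but the NAME/`C` and AGE/`N` field descriptors and the `0x0D` terminator fit that layout. The function name `decode_dbf_header`, the exact byte layout, and the default `encoding="ascii"` are all illustrative assumptions. A real file's codepage (UTF-8 in Chris's framing) would have to come from the format or the caller. Structural bytes become ints, and only the regions that hold encoded text are decoded to `str`.

```python
import datetime
import struct

# The hex dump quoted above, with the offset and ASCII columns removed.
DUMP = """
0372 0106 0000 0000 6100 1d00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
4e41 4d45 0000 0000 0000 0043 0100 0000
1900 0000 0000 0000 0000 0000 0000 0000
4147 4500 0000 0000 0000 004e 1a00 0000
0300 0000 0000 0000 0000 0000 0000 0000
0d1a 0a
"""
RAW = bytes.fromhex("".join(DUMP.split()))


def decode_dbf_header(raw, encoding="ascii"):
    """Hypothetical decoder for an assumed dBase-style header layout.

    Structural bytes become ints; only the field names and type codes,
    which hold encoded text, are decoded with `encoding` (an assumption).
    """
    # Bytes 0-11 (assumed layout): version, last-update date (year since
    # 1900, month, day), record count (uint32), header length and record
    # length (uint16 each), all little-endian.
    version, yy, mm, dd, n_records, header_len, record_len = struct.unpack_from(
        "<4BIHH", raw, 0
    )
    fields = []
    offset = 32  # 32-byte field descriptors follow the 32-byte file header
    while raw[offset] != 0x0D:  # 0x0D ends the descriptor array
        # 11-byte NUL-padded name, 1-byte type code, 4 bytes skipped,
        # then field length and decimal count.
        name, ftype, length, decimals = struct.unpack_from("<11sc4xBB", raw, offset)
        fields.append({
            "name": name.rstrip(b"\x00").decode(encoding),
            "type": ftype.decode(encoding),
            "length": length,
            "decimals": decimals,
        })
        offset += 32
    return {
        "version": version,
        "last_update": datetime.date(1900 + yy, mm, dd),
        "records": n_records,
        "header_length": header_len,
        "record_length": record_len,
        "fields": fields,
    }


header = decode_dbf_header(RAW)
for field in header["fields"]:
    print(field["name"], field["type"], field["length"])
```

Run against the quoted dump, this reports NAME as a 25-byte `C` field and AGE as a 3-byte `N` field. The record length is 29, which is consistent with one leading status byte per record plus 25 + 3. Each byte range is handled as exactly one kind of data, either a number or encoded text, never both.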