Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #5216
| References | (3 earlier) <mailman.1435.1305157329.9059.python-list@python.org> <KDGyp.180$0t1.7@newsfe04.iad> <mailman.1439.1305167541.9059.python-list@python.org> <874o50k1eb.fsf@benfinney.id.au> <U8Lyp.1000$dL5.14@newsfe08.iad> |
|---|---|
| Date | 2011-05-12 17:58 +1000 |
| Subject | Re: unicode by default |
| From | "John Machin" <sjmachin@lexicon.net> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1450.1305187110.9059.python-list@python.org> (permalink) |
On Thu, May 12, 2011 4:31 pm, harrismh777 wrote:
>
> So, the UTF-16 UTF-32 is INTERNAL only, for Python
NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are
encodings for the EXTERNAL representation of Unicode characters in byte
streams.
> I also was not aware that UTF-8 chars could be up to six(6) byes long
> from left to right.
It could be, once upon a time in ISO faerieland, when it was thought that
Unicode could grow to 2**32 codepoints. However ISO and the Unicode
consortium have agreed that 17 planes is the utter max, and accordingly a
valid UTF-8 byte sequence can be no longer than 4 bytes ... see below
>>> chr(17 * 65536)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(0x110000)
>>> chr(17 * 65536 - 1)
'\U0010ffff'
>>> _.encode('utf8')
b'\xf4\x8f\xbf\xbf'
>>> b'\xf5\x8f\xbf\xbf'.decode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\python32\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 0:
invalid start byte
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar
unicode by default harrismh777 <harrismh777@charter.net> - 2011-05-11 16:37 -0500
Re: unicode by default Ian Kelly <ian.g.kelly@gmail.com> - 2011-05-11 16:09 -0600
Re: unicode by default harrismh777 <harrismh777@charter.net> - 2011-05-11 17:51 -0500
Re: unicode by default "John Machin" <sjmachin@lexicon.net> - 2011-05-12 09:32 +1000
Re: unicode by default harrismh777 <harrismh777@charter.net> - 2011-05-11 20:22 -0500
Re: unicode by default MRAB <python@mrabarnett.plus.com> - 2011-05-12 03:31 +0100
Re: unicode by default Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-05-12 03:16 +0000
Re: unicode by default harrismh777 <harrismh777@charter.net> - 2011-05-11 22:44 -0500
Re: unicode by default Terry Reedy <tjreedy@udel.edu> - 2011-05-12 00:12 -0400
Re: unicode by default harrismh777 <harrismh777@charter.net> - 2011-05-12 01:43 -0500
Re: unicode by default "John Machin" <sjmachin@lexicon.net> - 2011-05-12 14:14 +1000
Re: unicode by default Benjamin Kaplan <benjamin.kaplan@case.edu> - 2011-05-11 21:14 -0700
Re: unicode by default "John Machin" <sjmachin@lexicon.net> - 2011-05-12 14:41 +1000
Re: unicode by default harrismh777 <harrismh777@charter.net> - 2011-05-12 01:14 -0500
Re: unicode by default TheSaint <nobody@nowhere.net.no> - 2011-05-12 20:40 +0800
Re: unicode by default Ben Finney <ben+python@benfinney.id.au> - 2011-05-12 14:07 +1000
Re: unicode by default harrismh777 <harrismh777@charter.net> - 2011-05-12 01:31 -0500
Re: unicode by default "John Machin" <sjmachin@lexicon.net> - 2011-05-12 17:58 +1000
Re: unicode by default Ian Kelly <ian.g.kelly@gmail.com> - 2011-05-12 10:17 -0600
Re: unicode by default jmfauth <wxjmfauth@gmail.com> - 2011-05-12 23:28 -0700
Re: unicode by default harrismh777 <harrismh777@charter.net> - 2011-05-13 14:53 -0500
Re: unicode by default Robert Kern <robert.kern@gmail.com> - 2011-05-13 15:18 -0500
Re: unicode by default Terry Reedy <tjreedy@udel.edu> - 2011-05-13 21:41 -0400
Re: unicode by default harrismh777 <harrismh777@charter.net> - 2011-05-14 02:41 -0500
Re: unicode by default jmfauth <wxjmfauth@gmail.com> - 2011-05-14 03:26 -0700
Re: unicode by default Terry Reedy <tjreedy@udel.edu> - 2011-05-14 16:26 -0400
Re: unicode by default Ben Finney <ben+python@benfinney.id.au> - 2011-05-15 09:47 +1000
Re: unicode by default Nobody <nobody@nowhere.com> - 2011-05-14 09:34 +0100
Re: unicode by default Terry Reedy <tjreedy@udel.edu> - 2011-05-12 16:42 -0400
Re: unicode by default Ian Kelly <ian.g.kelly@gmail.com> - 2011-05-12 16:25 -0600
Re: unicode by default "John Machin" <sjmachin@lexicon.net> - 2011-05-12 13:54 +1000
Re: unicode by default Benjamin Kaplan <benjamin.kaplan@case.edu> - 2011-05-11 15:34 -0700
csiph-web