Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Content-Type: text/plain; charset=windows-1252
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
Subject: Re: How to turn a string into a list of integers?
From: Kurt Mueller <kurt.alfred.mueller@gmail.com>
In-Reply-To: <DD329F57-4675-464F-92F0-65BB85DF207E@gmail.com>
Date: Fri, 5 Sep 2014 22:41:16 +0200
Content-Transfer-Encoding: quoted-printable
References: <h2ejdb-mdk.ln1@chris.zbmc.eu> <mailman.13738.1409748804.18130.python-list@python.org> <1amjdb-p3n.ln1@chris.zbmc.eu> <mailman.13776.1409864831.18130.python-list@python.org> <1k9odb-1qs.ln1@chris.zbmc.eu> <E915BA7F-DAFA-4D9A-B70B-0C6EECD68484@gmail.com> <CAMw+j7LZu2YWw1UkAYvBnX7LZDJncP5Euh5aYjxYVvWnH=CgwA@mail.gmail.com> <DD329F57-4675-464F-92F0-65BB85DF207E@gmail.com>
To: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.13809.1409949686.18130.python-list@python.org>
Lines: 60
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:77615

On 05.09.2014 at 21:16, Kurt Mueller <kurt.alfred.mueller@gmail.com> wrote:
> On 05.09.2014 at 20:25, Chris "Kwpolska" Warrick <kwpolska@gmail.com> wrote:
>> On Sep 5, 2014 7:57 PM, "Kurt Mueller" <kurt.alfred.mueller@gmail.com> wrote:
>>> Could someone please explain the following behavior to me:
>>> Python 2.7.7, MacOS 10.9 Mavericks
>>>
>>> >>> import sys
>>> >>> sys.getdefaultencoding()
>>> 'ascii'
>>> >>> [ord(c) for c in 'AÄ']
>>> [65, 195, 132]
>>> >>> [ord(c) for c in u'AÄ']
>>> [65, 196]
>>>
>>> My obviously wrong understanding:
>>> 'AÄ' in 'ascii' are two characters
>>>     one with ord A=65 and
>>>     one with ord Ä=196 ISO8859-1 <depends on code table>
>>>     --> why [65, 195, 132]
>>> u'AÄ' is a Unicode string
>>>     --> why [65, 196]
>>>
>>> It is just the other way round as I would expect.
>>
>> Basically, the first string is just a bunch of bytes, as provided by
>> your terminal, which sounds like UTF-8 (perfectly logical in 2014).
>> The second one is converted into a real Unicode representation. The
>> codepoint for Ä is U+00C4 (196 decimal). It's just a coincidence that
>> it also matches latin1 aka ISO 8859-1 as Unicode starts with all 256
>> latin1 codepoints. Please kindly forget encodings other than UTF-8.
>
> So:
> 'AÄ' is a UTF-8 string represented by 3 bytes:
> A -> 41   -> 65  first byte decimal
> Ä -> c384 -> 195 and 132 second and third byte decimal
>
> u'AÄ' is a Unicode string represented by 2 bytes?:
> A -> U+0041 -> 65 first byte decimal, 00 is omitted or not yielded by ord()?
> Ä -> U+00C4 -> 196 second byte decimal, 00 is omitted or not yielded by ord()?

After reading the ord() manual:
The second case should read:
u'AÄ' is a Unicode string represented by 2 Unicode characters:
If Python was built with UCS2 Unicode, then the character's code point
must be in the range [0..65535, 16 bits, U+0000..U+FFFF]
A -> U+0041 ->  65 first  character decimal (code point)
Ä -> U+00C4 -> 196 second character decimal (code point)
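
To check this reading, here is a small sketch of how I understand it now.
It is written in Python 3 syntax, which is an assumption on my part: there
the bytes object plays the role of the plain 'AÄ' literal under 2.7, and
str plays the role of u'AÄ'. The variable names are only for illustration.

```python
# Python 3 sketch: str holds code points, bytes holds encoded bytes.
text = "A\u00c4"                    # 'AÄ' as a Unicode string (u'AÄ' in Python 2)
utf8_bytes = text.encode("utf-8")   # roughly what a UTF-8 terminal hands to a Python 2 str

code_points = [ord(ch) for ch in text]  # one integer per character (code point)
byte_values = list(utf8_bytes)          # iterating bytes yields one integer per byte

print(code_points)   # [65, 196]
print(byte_values)   # [65, 195, 132]

# Decoding the three bytes as UTF-8 gives back the two characters.
assert utf8_bytes.decode("utf-8") == text

# Latin-1 encodes U+00C4 as the single byte 196, which is why the
# code point happens to match the ISO 8859-1 value.
latin1_values = list(text.encode("latin-1"))
print(latin1_values)  # [65, 196]
```

So ord() works on whatever unit the string is made of: bytes in the first
case, code points in the second.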


Am I right now?

-- 
Kurt Mueller, kurt.alfred.mueller@gmail.com