From: Kurt Mueller
Subject: Re: How to turn a string into a list of integers?
Date: Fri, 5 Sep 2014 22:41:16 +0200
Newsgroups: comp.lang.python
To: python-list@python.org

On 05.09.2014 at 21:16, Kurt Mueller wrote:
> On 05.09.2014 at 20:25, Chris "Kwpolska" Warrick wrote:
>> On Sep 5, 2014 7:57 PM, "Kurt Mueller" wrote:
>>> Could someone please explain the following behavior to me:
>>> Python 2.7.7, MacOS 10.9 Mavericks
>>>
>>> >>> import sys
>>> >>> sys.getdefaultencoding()
>>> 'ascii'
>>> >>> [ord(c) for c in 'AÄ']
>>> [65, 195, 132]
>>> >>> [ord(c) for c in u'AÄ']
>>> [65, 196]
>>>
>>> My obviously wrong understanding:
>>> 'AÄ' in 'ascii' are two characters
>>> one with ord A=65 and
>>> one with ord Ä=196 ISO8859-1
>>> --> why [65, 195, 132]
>>> u'AÄ' is a Unicode string
>>> --> why [65, 196]
>>>
>>> It is just the other way round as I would expect.
>>
>> Basically, the first string is just a bunch of bytes, as provided by
>> your terminal, which sounds like UTF-8 (perfectly logical in 2014).
>> The second one is converted into a real Unicode representation. The
>> codepoint for Ä is U+00C4 (196 decimal). It's just a coincidence that
>> it also matches latin1 aka ISO 8859-1 as Unicode starts with all 256
>> latin1 codepoints. Please kindly forget encodings other than UTF-8.
>
> So:
> 'AÄ' is a UTF-8 string represented by 3 bytes:
> A -> 41 -> 65 first byte decimal
> Ä -> c384 -> 195 and 132 second and third byte decimal
>
> u'AÄ' is a Unicode string represented by 2 bytes?:
> A -> U+0041 -> 65 first byte decimal, 00 is omitted or not yielded by ord()?
> Ä -> U+00C4 -> 196 second byte decimal, 00 is omitted or not yielded by ord()?

After reading the ord() manual, the second case should read:

u'AÄ' is a Unicode string represented by 2 Unicode characters.
If Python was built with UCS2 Unicode, then the character's code point
must be in the range [0..65535] (16 bits, U+0000..U+FFFF).
A -> U+0041 -> 65 first character decimal (code point)
Ä -> U+00C4 -> 196 second character decimal (code point)

Am I right now?
-- 
Kurt Mueller, kurt.alfred.mueller@gmail.com
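
The distinction discussed above (bytes of an encoded string versus code points of a Unicode string) can be checked with a small sketch. It assumes Python 3 rather than the Python 2.7.7 used in the thread: there, str is always Unicode and bytes plays the role of the Python 2 byte string. The variable names are illustrative only, and the escape \u00c4 is used so the result does not depend on the terminal or source-file encoding.

```python
# Python 3 sketch (assumption: the thread itself used Python 2.7, where
# 'AÄ' is a byte string and u'AÄ' is a Unicode string).
text = "A\u00c4"  # 'A' followed by U+00C4 (Ä)

# ord() on a Unicode string yields one code point per character.
code_points = [ord(c) for c in text]

# Encoding to UTF-8 yields the bytes a UTF-8 terminal would have sent.
utf8_bytes = list(text.encode("utf-8"))

# Latin-1 maps the first 256 code points to single bytes of the same value.
latin1_bytes = list(text.encode("latin-1"))

print(code_points)   # [65, 196]
print(utf8_bytes)    # [65, 195, 132]
print(latin1_bytes)  # [65, 196]

# Decoding the UTF-8 bytes gives back the two-character string.
round_trip = bytes(utf8_bytes).decode("utf-8")
print(round_trip == text)
```

The two-versus-three element lists correspond to the two results in the original question: [65, 196] counts characters, [65, 195, 132] counts UTF-8 bytes.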