Subject: Re: How to turn a string into a list of integers?
From: Kurt Mueller
Date: Fri, 5 Sep 2014 21:16:54 +0200
Newsgroups: comp.lang.python

On 05.09.2014 at 20:25, Chris "Kwpolska" Warrick wrote:
> On Sep 5, 2014 7:57 PM, "Kurt Mueller" wrote:
> > Could someone please explain the following behavior to me:
> > Python 2.7.7, MacOS 10.9 Mavericks
> >
> > >>> import sys
> > >>> sys.getdefaultencoding()
> > 'ascii'
> > >>> [ord(c) for c in 'AÄ']
> > [65, 195, 132]
> > >>> [ord(c) for c in u'AÄ']
> > [65, 196]
> >
> > My obviously wrong understanding:
> > 'AÄ' in 'ascii' are two characters
> > one with ord A=65 and
> > one with ord Ä=196 ISO8859-1
> > --> why [65, 195, 132]
> > u'AÄ' is a Unicode string
> > --> why [65, 196]
> >
> > It is just the other way round as I would expect.
>
> Basically, the first string is just a bunch of bytes, as provided by
> your terminal, which sounds like UTF-8 (perfectly logical in 2014).
> The second one is converted into a real Unicode representation. The
> codepoint for Ä is U+00C4 (196 decimal).
> It's just a coincidence that it also matches latin1 aka ISO 8859-1, as
> Unicode starts with all 256 latin1 codepoints. Please kindly forget
> encodings other than UTF-8.

So:
'AÄ' is a UTF-8 string represented by 3 bytes:
A -> 41   -> 65           first byte decimal
Ä -> c384 -> 195 and 132  second and third byte decimal

u'AÄ' is a Unicode string represented by 2 bytes?:
A -> U+0041 -> 65   first byte decimal, 00 is omitted or not yielded by ord()?
Ä -> U+00C4 -> 196  second byte decimal, 00 is omitted or not yielded by ord()?

> BTW: ASCII covers only the first 128 bytes.

ACK

-- 
Kurt Mueller, kurt.alfred.mueller@gmail.com
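
For reference, the same comparison can be sketched in Python 3. This is only an illustration under a stated assumption: the session quoted above was Python 2.7, where a plain '...' literal is a byte string, whereas in Python 3 every str is Unicode and the bytes have to be produced explicitly with encode(). The variable names are made up for the example.

```python
# Python 3 sketch (assumption: the quoted session was Python 2.7).
text = "A\u00c4"  # 'AÄ' as a Unicode string: two code points

# ord() on a Unicode string yields code points (integers), not bytes.
code_points = [ord(c) for c in text]

# Encoding turns the code points into bytes; the result depends on the codec.
utf8_bytes = list(text.encode("utf-8"))
latin1_bytes = list(text.encode("latin-1"))

print(code_points)   # [65, 196]
print(utf8_bytes)    # [65, 195, 132]
print(latin1_bytes)  # [65, 196]
```

Read this way, ord() returns the code point as a plain integer, so there is no leading 00 byte for it to omit; how many bytes the string occupies is decided only when an encoding is chosen.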