Subject: Re: How to turn a string into a list of integers?
From: Kurt Mueller
Date: Sat, 6 Sep 2014 14:15:26 +0200
Newsgroups: comp.lang.python

On 06.09.2014 at 07:47, Steven D'Aprano wrote:
> Kurt Mueller wrote:
>> Could someone please explain the following behavior to me:
>> Python 2.7.7, MacOS 10.9 Mavericks
[snip]

Thanks for the detailed explanation. I think I understand a bit better now.
But the part about the two Python builds is still somewhat unclear to me.

> If you could peer under the hood, and see what implementation Python uses to
> store that string, you would see something version dependent.
> In Python 2.7, you would see an object more or less something vaguely like this:
>
> [object header containing various fields]
> [length = 2]
> [array of bytes = 0x0041 0x00C4]
>
> That's for a so-called "narrow build" of Python. If you have a "wide build",
> it will look something like this:
>
> [object header containing various fields]
> [length = 2]
> [array of bytes = 0x00000041 0x000000C4]
>
> In Python 3.3, "narrow builds" and "wide builds" are gone, and you'll have
> something conceptually like this:
>
> [object header containing various fields]
> [length = 2]
> [tag = one byte per character]
> [array of bytes = 0x41 0xC4]
>
> Some other implementations of Python could use UTF-8 internally:
>
> [object header containing various fields]
> [length = 2]
> [array of bytes = 0x41 0xC3 0x84]
>
> or even something more complex. But the important thing is, regardless of
> the internal implementation, Python guarantees that a Unicode string is
> treated as a fixed array of code points. Each code point has a value
> between 0 and, not 127, not 255, not 65535, but 1114111.

In Python 2.7:

As I learned from the ord() manual:
If a unicode argument is given and Python was built with UCS2 Unicode
(I suppose this is the narrow build in your terms),
then the character’s code point must be in the range [0..65535] inclusive;

I understand: In a UCS2 build each character of a Unicode string uses
16 bits and can represent code points from U+0000..U+FFFF.

From the unichr(i) manual I learn:
The valid range for the argument depends how Python was configured
– it may be either UCS2 [0..0xFFFF] or UCS4 [0..0x10FFFF].
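The ord() and unichr() ranges quoted from the manual depend on how the running interpreter was built, and the standard `sys.maxunicode` attribute exposes that limit: 65535 (0xFFFF) on a narrow build, 1114111 (0x10FFFF) on a wide build, and always 1114111 from Python 3.3 on. A minimal sketch that should run unchanged on Python 2.7 and Python 3 (the `unicode_build` helper is hypothetical, made up for illustration):

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Hypothetical helper: classify a build by its largest code point."""
    if maxunicode == 0xFFFF:
        # Narrow build: one 16-bit unit per stored character.
        return "narrow (UCS2)"
    if maxunicode == 0x10FFFF:
        # Wide build, or any Python 3.3+ (which has no narrow/wide split).
        return "wide (UCS4), or Python 3.3+"
    return "unknown"


# Works with the print statement of Python 2.7 and print() of Python 3.
print("%d %s" % (sys.maxunicode, unicode_build()))
```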
I understand: narrow build is UCS2, wide build is UCS4

- In a UCS2 build each character of a Unicode string uses 16 bits and can
  represent code points from U+0000..U+FFFF (0..65535)
- In a UCS4 build each character of a Unicode string uses 32 bits and can
  represent code points from U+0000..U+10FFFF (0..1114111)

Am I right?

-- 
Kurt Mueller, kurt.alfred.mueller@gmail.com
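The two bullet points can be checked from Python 3 without a second Python 2.7 build. The following is a minimal sketch, not CPython's actual storage code: the helpers `code_units` and `layouts` are hypothetical, and the 16-bit, 32-bit and UTF-8 layouts from the quoted explanation are only emulated with the standard `utf-16-le`, `utf-32-le` and `utf-8` codecs.

```python
# Minimal sketch: emulate the storage units of a narrow (16-bit) build, a
# wide (32-bit) build and a UTF-8 implementation using Python 3 codecs.
# code_units() and layouts() are hypothetical helpers for illustration;
# nothing here reads an interpreter's real internal string buffer.


def code_units(text, encoding, width):
    """Encode text and split the bytes into little-endian ints of `width` bytes."""
    raw = text.encode(encoding)
    return [int.from_bytes(raw[i:i + width], "little")
            for i in range(0, len(raw), width)]


def layouts(text):
    """Return the code points of text and three possible storage layouts."""
    return {
        "code points": [ord(c) for c in text],
        "16-bit units (narrow-style)": code_units(text, "utf-16-le", 2),
        "32-bit units (wide-style)": code_units(text, "utf-32-le", 4),
        "UTF-8 bytes": list(text.encode("utf-8")),
    }


if __name__ == "__main__":
    # "A\u00c4" is the two-character string from the quoted explanation;
    # "\U0001F600" is a code point above U+FFFF.
    for sample in ("A\u00c4", "\U0001F600"):
        print(ascii(sample))
        for name, units in layouts(sample).items():
            print("  %-28s %s" % (name, [hex(u) for u in units]))
```

For "A\u00c4" all three fixed-width views agree (0x41, 0xc4), and only UTF-8 needs an extra byte. For U+1F600 the 16-bit layout needs two units (the surrogate pair 0xd83d 0xde00), which is why a Python 2.7 narrow build reports `len(u'\U0001F600') == 2`, while the 32-bit layout always uses exactly one unit per code point.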