From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: How to turn a string into a list of integers?
Date: Fri, 5 Sep 2014 12:15:14 +1000

On Fri, Sep 5, 2014 at 12:09 PM, Ian Kelly wrote:
> On Thu, Sep 4, 2014 at 6:12 PM, Chris Angelico wrote:
>> If it's a Unicode string (which is the default in Python 3), all
>> Unicode characters will work correctly.
>
> Assuming the library that needs this is expecting codepoints and will
> accept integers greater than 255.

They're still valid integers. It's just that someone might not know
how to work with them. Everyone has limits - I don't think repr()
would like to be fed Graham's Number, for instance, but we still say
that it accepts integers :)

>> If it's a byte string (the default in Python 2), then you can't
>> actually have any Unicode characters in it at all, you have bytes;
>> Py2 lets you be a bit sloppy with the ASCII range, but technically,
>> you still have bytes, not characters.
>
> In that case the library will almost certainly accept it, but could be
> expecting a different encoding.

Yeah.
Either way, the problem isn't "be careful about Unicode characters".
One option has Unicode characters, the other doesn't, and you need to
know which one it is. I just don't like people talking about "Unicode
characters" as if they were somehow different from "normal text", and
something you need to be careful with. It's not that there are some
characters that behave nicely, and then other ones ("Unicode" ones)
that don't.

ChrisA
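A minimal sketch of the distinction drawn in this thread, assuming Python 3. The helper names codepoints() and byte_values() are hypothetical, and the sample string is only an illustration. The same text yields one list of integers if you ask for Unicode code points (values can exceed 255, which is Ian's concern) and different lists if you ask for the bytes of an encoding (values stay in 0-255, but depend on which encoding the receiving library expects).

```python
def codepoints(text):
    """Return the Unicode code point of each character in a str."""
    # ord() accepts any single character; results may be far above 255.
    return [ord(ch) for ch in text]


def byte_values(text, encoding="utf-8"):
    """Encode text, then return the integer value of each resulting byte."""
    # In Python 3, iterating over a bytes object yields ints in 0..255.
    return list(text.encode(encoding))


s = "naïve €"
print(codepoints(s))                # [110, 97, 239, 118, 101, 32, 8364]
print(byte_values(s))               # [110, 97, 195, 175, 118, 101, 32, 226, 130, 172]
print(byte_values(s, "cp1252"))     # [110, 97, 239, 118, 101, 32, 128]
```

Note that this relies on Python 3 semantics: in Python 2, iterating over a byte string yields one-character strings rather than integers, so list(b) would not give numbers there.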