From: Tim Chase
Date: Wed, 28 Mar 2012 13:49:19 -0500
To: Ross Ridge
Subject: Re: "convert" string to bytes without changing data (encoding)
Cc: python-list@python.org
Newsgroups: comp.lang.python

On 03/28/12 13:05, Ross Ridge wrote:
> Ross Ridge wr=
>> But a Python Unicode string might be stored in several
>> ways; for all you know, it might actually be stored as a sequence of
>> apples in a refrigerator, just as long as they can be referenced
>> correctly.
>
> But it is in fact only stored in one particular way, as a series of bytes.
>
>> There's no logical Python way to turn that into a series of bytes.
>
> Nonsense.  Play all the semantic games you want, it already is a series
> of bytes.

Internally, they're a series of bytes, but they are MEANINGLESS
bytes unless you know how they are encoded internally.  Those
bytes could be UTF-8, UTF-16, UTF-32, or any of a number of other
possible encodings[1].  If you get the internal byte stream,
there's no way to meaningfully operate on it unless you also know
how it's encoded (or you're willing to sacrifice the ability to
reliably get the string back).

-tkc

[1] http://docs.python.org/library/codecs.html#standard-encodings
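
To make the point about "meaningless bytes" concrete, here is a small sketch, assuming Python 3. The sample string and the variable names are made up for illustration and are not from the thread. It shows one string producing three different byte sequences, and what happens when those bytes are read back with the wrong codec:

```python
# Illustrative sketch only: the sample text and names are assumptions.
text = "caf\u00e9"  # 'cafe' with an acute accent on the e: four code points

# The same four code points become different bytes under each encoding.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")
print(len(utf8), len(utf16), len(utf32))

# Knowing the encoding, the bytes round-trip back to the original string.
assert utf16.decode("utf-16-le") == text

# Guessing the wrong encoding can silently produce a different string...
misread = utf8.decode("latin-1")
print(repr(misread))

# ...or fail outright, because the bytes are not valid in that encoding.
try:
    utf16.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)
```

Either way, the bytes alone don't tell you which of these interpretations was intended; you only get the string back reliably if the encoding travels with them.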