From: Evan Driscoll
Date: Wed, 28 Mar 2012 14:20:50 -0500
Subject: Re: Re: "convert" string to bytes without changing data (encoding)
Newsgroups: comp.lang.python
Ross Ridge wrote:
> Steven D'Aprano wrote:
>> The right way to convert bytes to strings, and vice versa, is via
>> encoding and decoding operations.
>
> If you want to dictate to the original poster the correct way to do
> things then you don't need to do anything more than that. You don't need to
> pretend like Chris Angelico that there isn't a direct mapping from
> his Python 3 implementation's internal representation of strings
> to bytes in order to label what he's asking for as being "silly".

That mapping may as well be:

    import random

    def get_bytes(some_string):
        # Deliberately absurd: a "mapping" that promises nothing about its output.
        length = random.randint(len(some_string), 5 * len(some_string))
        return bytes(random.randint(0, 255) for _ in range(length))

Of course this is hyperbole, but it gives you essentially as much of a guarantee about the result. As many others have said, the encoding isn't defined, and I would guess it varies between implementations. (E.g., if Jython and IronPython use their host platforms' native strings, both have 16-bit chars and thus probably use UTF-16 encoding. I am not sure what CPython uses, but I bet it's *not* that.)

It's not even guaranteed that the byte representation won't change! If something is lazily evaluated, or you have a copy-on-write (COW) string or something similar, the bytes backing it can change.

So yes, you can say that pretending there's no mapping from strings to an internal representation is silly, because there is one. However, there's nothing you can say about that mapping.

Evan
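By contrast, the conversion Steven D'Aprano describes, going through an explicit encoding, does have a defined result. A minimal sketch (the sample string and the choice of UTF-8 and UTF-16-LE are arbitrary, picked only for illustration): the same str yields different bytes under different encodings, yet each one decodes back to the identical string.

```python
# Sample string with two non-ASCII characters (an arbitrary example).
text = "na\u00efve caf\u00e9"  # "naïve café", 10 code points

# Explicit encodings: the resulting bytes depend only on the codec chosen,
# not on how the interpreter happens to store the string internally.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # "-le" variant: no byte-order mark

# Same string, different byte sequences.
print(len(utf8), len(utf16))  # prints: 12 20

# Each encoding round-trips back to the original string.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text
```

Whichever encoding you pick, you get a documented, repeatable answer, which is exactly what peeking at an implementation's internal buffer cannot give you.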