From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: How to turn a string into a list of integers?
Date: Mon, 8 Sep 2014 10:12:13 +1000

On Mon, Sep 8, 2014 at 1:40 AM, Roy Smith wrote:
> Well, technically, what you store is something which has the right
> behavior.
> If I wrote:
>
> my_huffman_coded_list = [0] * 1000000
>
> I don't know of anything which requires Python to actually generate a
> million 0's and store them somewhere (even ignoring interning of
> integers). As long as it generated an object (perhaps a subclass of
> list) which responded to all of list's methods the same way a real list
> would, it could certainly build a more compact representation.

Steven hinted at it, but I'll say one thing more explicitly here:
There's actually something that requires Python to *not* generate a
million separate 0 integers. What you get is a million references to
the *same* zero.

>>> another_list = [object()] * 1000000
>>> sum(id(x) for x in another_list)
140287290433648000000
>>> id(another_list[0]) * len(another_list)
140287290433648000000

The two figures are guaranteed to be the same, because these are all
the same object.

But what you're talking about here is an alternative encoding. And
it's definitely possible for different Pythons to encode strings
differently.

uPy (MicroPython) uses UTF-8 internally, which gives different
performance characteristics but guarantees the same semantics.

I could imagine someone implementing a Python interpreter in Pike, and
using the Pike string type to store Python strings. The semantics will
all be correct, as it's a Unicode string; the most notable difference
is that Pike strings are guaranteed to be interned, so all equality
comparisons are identity checks.

If you wanted to, I'm sure you could build a Python that uses a
dictionary of words (added to every time you create a string, of
course), and actually represents entire words as short integers, which
would mean individual characters aren't necessarily represented
directly.

But somehow, you have to turn the concept of "sequence of Unicode
characters" into some well-defined sequence of bytes, and that's an
encoding.

ChrisA
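
A minimal sketch of the two points above, assuming Python 3. The names
(`shared`, `rows`, `text`, `encoded`) and the sample string are
illustrative only and do not come from the thread. The first part shows
that list repetition stores one object many times; the second shows
that one sequence of code points maps to different byte sequences
depending on the encoding chosen.

```python
# Part 1: list repetition copies references, not objects.
n = 1000
shared = [object()] * n

# Every slot refers to the one object created before the repetition.
assert all(item is shared[0] for item in shared)
assert len({id(item) for item in shared}) == 1

# The same rule applies to mutable elements, where it becomes visible:
rows = [[]] * 3
rows[0].append("x")          # mutates the single shared inner list
print(rows)                  # all three slots show the same list

# Part 2: one string, several encodings.
# Sample text with non-ASCII characters, written with escapes so the
# source file itself stays ASCII (LATIN SMALL LETTER I WITH DIAERESIS
# and EURO SIGN).
text = "na\u00efve \u20ac"

# The abstract view: a sequence of Unicode code points.
code_points = [ord(ch) for ch in text]
print(len(code_points), code_points)

# Concrete views: each encoding yields a different list of byte values.
encoded = {
    name: list(text.encode(name))
    for name in ("utf-8", "utf-16-le", "utf-32-le")
}
for name, byte_values in encoded.items():
    # Decoding with the matching codec recovers the identical string.
    assert bytes(byte_values).decode(name) == text
    print(name, len(byte_values), byte_values)
```

Whether "a list of integers" should mean `code_points` or one of the
byte lists depends on whether the code points or a particular encoding
of them is wanted.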