From: Dave Angel
Date: Sun, 02 Sep 2012 16:07:35 -0400
To: Michael Torrie
Subject: Re: Flexible string representation, unicode, typography, ...
Newsgroups: comp.lang.python

On 09/02/2012 03:45 PM, Michael Torrie wrote:
> :
> In the worst case, Python's strings are as slow as Go because Python
> does the exact same thing as Go, but chooses between three encodings
> instead of just one. Best case scenario, Python's strings could be
> much faster than Go's because indexing through 2 of the 3 encodings is
> O(1) because they are constant-width encodings. If as you say, the
> latin-1 subset of UTF-8 is used, then UTF-8 indexing is O(1) too,
> otherwise it's probably O(n).

I'm afraid you have it backwards. The UTF-8 encoding of the
Latin-1-compatible characters is variable length: U+0000 through U+007F
take one byte each, but U+0080 through U+00FF take two.
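To make the indexing difference concrete, here is a rough sketch. The helpers `utf8_index` and `one_byte_index` are hypothetical and exist only for illustration; CPython does not expose its internal string storage this way. Finding the n-th code point in UTF-8 bytes means scanning from the start, while a fixed one-byte-per-code-point buffer can be indexed directly.

```python
# Hypothetical helpers for illustration only; not CPython internals.

def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 `data` by scanning from the start: O(n)."""
    count = -1
    for i, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if (byte & 0xC0) != 0x80:
            count += 1
            if count == n:
                # Extend over this code point's continuation bytes.
                end = i + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1
                return data[i:end].decode("utf-8")
    raise IndexError("code point index out of range")


def one_byte_index(buf: bytes, n: int) -> str:
    """Fixed width: each code point (< 256) is one byte, so index directly: O(1)."""
    return chr(buf[n])


text = "café au lait"
utf8 = text.encode("utf-8")
# Valid only because every ord(ch) < 256; these bytes coincidentally
# equal text.encode("latin-1").
narrow = bytes(ord(ch) for ch in text)

print([len(ch.encode("utf-8")) for ch in "aé"])        # [1, 2]
print(len(text), len(utf8), len(narrow))               # 12 13 12
print(utf8_index(utf8, 3), one_byte_index(narrow, 3))  # é é
```

The scan in `utf8_index` is what makes UTF-8 indexing O(n) even when every character falls in the Latin-1 range, since any character from U+0080 upward shifts all later byte offsets.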
But my understanding of the PEP (PEP 393) is that the internal one-byte
format simply stores the lowest-order byte of each code point, and is used
only after confirming that every code point in that particular string is
below 256. That coincidentally matches Latin-1's encoding, but since it is
an internal form, the resemblance is irrelevant. In any case, indexing those
one-byte values is naturally O(1): no encoding is involved, and there is no
searching or expanding.

--
DaveA