To: python-list@python.org
From: Terry Reedy
Subject: Re: Flexible string representation, unicode, typography, ...
Date: Sun, 02 Sep 2012 16:38:34 -0400
Newsgroups: comp.lang.python

On 9/2/2012 3:45 PM, Michael Torrie wrote:

> In the worst case, Python's strings are as slow as Go because Python
> does the exact same thing as Go, but chooses between three encodings
> instead of just one. Best case scenario, Python's strings could be much
> faster than Go's because indexing through 2 of the 3 encodings is O(1)

In CPython 3.3, indexing a str (text string) object is always O(1), and
it always indexes and counts code points rather than code units. In 3.2
and before, narrow builds indexed and counted code units instead. As a
result, a string holding a single extended-plane character (one code
point) had a length of 2 rather than 1 on those builds. 3.3 corrects this.

--
Terry Jan Reedy
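
A minimal sketch of the code point versus code unit difference described above, assuming CPython 3.3 or later. The sample character U+1F600 and the variable names are arbitrary choices for illustration. The UTF-16 encoding is used here only to show the code unit count that a narrow build would have reported.

```python
import sys

# U+1F600 lies outside the Basic Multilingual Plane (an extended-plane character).
s = "\U0001F600"

# Length in code points: 1 on CPython 3.3+ (a narrow 3.2 build reported 2).
code_points = len(s)

# Length in UTF-16 code units, i.e. what narrow builds used to count.
utf16_units = len(s.encode("utf-16-le")) // 2

# Indexing is by code point, so position 2 is the character after U+1F600.
text = "a\U0001F600b"
third = text[2]

print(code_points, utf16_units, repr(third), hex(sys.maxunicode))
```

Since 3.3 there is no narrow/wide build distinction, so sys.maxunicode is 0x10ffff on every build.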