Path: csiph.com!usenet.pasdenom.info!gegeweb.org!de-l.enfer-du-nord.net!feeder2.enfer-du-nord.net!newsfeed.eweka.nl!eweka.nl!feeder3.eweka.nl!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Date: Thu, 28 Mar 2013 14:51:56 +0000
From: MRAB <python@mrabarnett.plus.com>
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20130307 Thunderbird/17.0.4
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
References: <mailman.3703.1364248275.2939.python-list@python.org> <a52fbe9d-db14-4ed2-bb49-adfb4b56f973@k4g2000yqn.googlegroups.com> <mailman.3771.1364324590.2939.python-list@python.org> <0b779c80-4f50-4716-8c30-47755c15f304@m12g2000yqp.googlegroups.com> <kit1kg$g2u$1@ger.gmane.org> <nad-98F0A4.17004226032013@news.gmane.org> <kitdqr$4m4$2@ger.gmane.org> <nad-8CB9C0.18315026032013@news.gmane.org> <mailman.3805.1364385073.2939.python-list@python.org> <5153a12d$0$29998$c3e8da3$5496439d@news.astraweb.com> <mailman.3845.1364441182.2939.python-list@python.org> <d2cc443a-e049-42ed-abc6-66b5ea600fe7@j1g2000pbq.googlegroups.com> <mailman.3860.1364451682.2939.python-list@python.org> <987c4bd9-0e5e-4387-9c78-1075a77d3c47@c6g2000yqh.googlegroups.com> <mailman.3863.1364463394.2939.python-list@python.org> <rOednY4OeOjbqcnMnZ2dnUVZ_oWdnZ2d@westnet.com.au>
In-Reply-To: <rOednY4OeOjbqcnMnZ2dnUVZ_oWdnZ2d@westnet.com.au>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: list
Reply-To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.3879.1364482498.2939.python-list@python.org>
Lines: 16
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:42137

On 28/03/2013 12:11, Neil Hodgson wrote:
> Ian Foote:
>
>> Specifically, indexing a variable-length encoding like utf-8 is not
>> as efficient as indexing a fixed-length encoding.
>
> Many common string operations do not require indexing by character
> which reduces the impact of this inefficiency. UTF-8 seems like a
> reasonable choice for an internal representation to me. One benefit
> of UTF-8 over Python's flexible representation is that it is, on
> average, more compact over a wide set of samples.
>
Implementing the regex module (http://pypi.python.org/pypi/regex) would
have been more difficult if the internal representation had been UTF-8,
because the matcher would have had to decode variable-length sequences
to get at each character, and that decoding would also have made the
implementation slower.
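
As a rough illustration of the decoding cost (a sketch only, not code
from the regex module; the helper names utf8_seq_len and utf8_char_at
are made up for this example), fetching the character at a given index
from UTF-8 bytes means walking over every sequence before it, whereas a
fixed-width representation can jump straight to it:

```python
def utf8_seq_len(lead: int) -> int:
    """Return the length in bytes of a UTF-8 sequence, given its lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2          # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3          # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4          # 11110xxx
    raise ValueError("not a UTF-8 lead byte: %#x" % lead)


def utf8_char_at(data: bytes, index: int) -> str:
    """Return the code point at position `index` in UTF-8 encoded `data`.

    This is O(index): every preceding sequence has to be stepped over,
    because the byte offset of a character is not known in advance.
    """
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    pos = 0
    for _ in range(index):
        pos += utf8_seq_len(data[pos])   # IndexError if we run off the end
    end = pos + utf8_seq_len(data[pos])
    return data[pos:end].decode("utf-8")


text = "na\u00efve caf\u00e9 \U0001F600"
data = text.encode("utf-8")

# With a fixed-width representation, text[i] is a direct lookup; with
# UTF-8 bytes the same lookup needs the scan above.
for i in range(len(text)):
    assert utf8_char_at(data, i) == text[i]
```

A matcher that backtracks or looks behind has to do this kind of
stepping in both directions, which is where the extra complexity comes
from.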