Date: Thu, 27 Sep 2012 03:04:50 +1000
Subject: Re: Article on the future of Python
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.1454.1348679093.27098.python-list@python.org>

On Thu, Sep 27, 2012 at 2:52 AM, Paul Rubin <no.email@nospam.invalid> wrote:
> Chris Angelico <rosuav@gmail.com> writes:
>> When you compare against a wide build, semantics of 3.2 and 3.3 are
>> identical, and then - and ONLY then - can you sanely compare
>> performance. And 3.3 stacks up much better.
>
> I'd like to have seen real-world benchmarks against a pure UTF-8
> implementation.  That means O(n) access to the n'th character of a
> string which could theoretically slow some programs down terribly, but I
> wonder how often that actually matters in ways that can't easily be
> worked around.

That's pretty much what we have with the PHP parts of our web site.
We've decreed that everything should be UTF-8 byte streams (it took
some major campaigning from me to get rid of the underlying thinking
that "binary-safe" and "UTF-8" and "characters" and so on were all
equivalent), but there are very few places where we actually index
strings in PHP. There's a small amount of parsing, but it's all done
by splitting on particular strings, and all of our structural elements
fit inside ASCII. If you search for 0x0A in a UTF-8 bytestream and
split at that index, it's the same as searching for U+000A in a
Unicode string and splitting there, because UTF-8 never uses bytes
below 0x80 inside a multi-byte sequence. The few times we actually
care about character length (e.g. limiting user-specified rule names
to N characters), we don't much care about performance, because
they're unusual checks.
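
To make that concrete, here's a rough sketch in Python rather than PHP
(this is not our actual code, and the helper names are made up for
illustration). It shows splitting a UTF-8 byte stream on 0x0A without
decoding it, and the O(n) character count you'd fall back on for the
occasional length check:

```python
# Illustrative sketch only: helper names are hypothetical, not from any real codebase.

def split_lines_bytes(data):
    """Split a UTF-8 byte stream on the byte 0x0A without decoding it."""
    return data.split(b"\x0a")


def split_lines_text(text):
    """Split a Unicode string on U+000A."""
    return text.split("\u000a")


def char_length(data):
    """Count code points in valid UTF-8 by skipping continuation bytes.

    Continuation bytes look like 0b10xxxxxx; every other byte starts a
    new code point. This is an O(n) scan over the bytes.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "na\u00efve\n\u65e5\u672c\u8a9e\nplain ASCII"
encoded = sample.encode("utf-8")

# Splitting the bytes and then decoding gives the same pieces as
# splitting the decoded string.
from_bytes = [part.decode("utf-8") for part in split_lines_bytes(encoded)]
assert from_bytes == split_lines_text(sample)

# The byte-level count agrees with the string length in code points.
assert char_length(encoded) == len(sample)
```

Note that char_length counts code points, which is only one possible
meaning of "characters"; it also assumes the input is valid UTF-8.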

So, I don't actually have any stats for you, because it's really easy
to just not index strings at all.

ChrisA