Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!goblin3!goblin1!goblin.stu.neva.ru!feeds.phibee-telecom.net!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <051dde5c-5293-4a9a-85c8-aa6714db4f69@googlegroups.com>
References: <mailman.1294.1348560867.27098.python-list@python.org> <mailman.1333.1348581385.27098.python-list@python.org> <k3ssuq$78m$1@reader1.panix.com> <cc2771fd-0b2b-4721-9ae0-657bc722ebad@googlegroups.com> <ef917cfd-43a5-4620-a9b4-1c6934624bc4@googlegroups.com> <5062ad83$0$29997$c3e8da3$5496439d@news.astraweb.com> <693ac61b-b1d3-4192-9e50-5166fd119278@googlegroups.com> <mailman.1420.1348653316.27098.python-list@python.org> <447851a9-bc63-4711-a4e6-bff565e28f1f@googlegroups.com> <mailman.1438.1348669456.27098.python-list@python.org> <2b2d20f5-2807-4a61-b284-8075e900db22@googlegroups.com> <k3v5p5$uuo$1@ger.gmane.org> <mailman.1443.1348672708.27098.python-list@python.org> <mailman.1445.1348674333.27098.python-list@python.org> <50641d6d$0$29997$c3e8da3$5496439d@news.astraweb.com> <50642DE0.8030102@mweb.co.za> <CALwzid=AAYNzSebweg6ry1TGBBom++5CqEMnfTHen3ZF3+JdjQ@mail.gmail.com> <mailman.1503.1348766274.27098.python-list@python.org> <051dde5c-5293-4a9a-85c8-aa6714db4f69@googlegroups.com>
Date: Fri, 28 Sep 2012 08:00:16 +1000
Subject: Re: Article on the future of Python
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Content-Type: text/plain; charset=ISO-8859-1
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.1520.1348783220.27098.python-list@python.org>
Lines: 36
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:30331

You're posting to both comp.lang.python and python-list. Are you aware
that that's redundant?

On Fri, Sep 28, 2012 at 5:09 AM,  <wxjmfauth@gmail.com> wrote:
> This flexible string representation is wrong by design.
> Expecting to divide "Unicode" in chunks and to gain something
> is an illusion.
> It has been created by a computer scientist who thinks "bytes"
> when on that field one has to think "bytes" and usage of the
> characters at the same time.

There's another range of numbers that, in some languages, is split
for efficiency's sake: integers, at 1<<[bit size]. In Python 2, the
two halves were entirely different data types (int vs long); other
languages let you use the same data type for both, but "(1<<5)+1" will
be executed much faster than "(1<<500)+1". (As far as I know, a
conforming Python 3 implementation should be allowed to do that; 3.2
on Windows doesn't seem to, though.) That's all PEP 393 is: a
performance improvement for a particular subset of values that happens
to fit conveniently into the underlying machine's data storage.
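
A rough way to see the integer side of that comparison, assuming a
recent CPython (3.5 or later for the globals argument to timeit). The
helper name int_cost is made up for this sketch, and the timings will
vary by machine and build, so treat them as indicative only:

```python
import sys
import timeit


def int_cost(shift, repeats=200_000):
    """Return (size in bytes, seconds for `repeats` additions) for (1 << shift) + 1.

    `int_cost` is a hypothetical helper for this illustration, not a
    standard function. The addition is timed through a variable so the
    compiler cannot fold it into a constant ahead of time.
    """
    base = 1 << shift
    size = sys.getsizeof(base + 1)
    seconds = timeit.timeit("base + 1", globals={"base": base}, number=repeats)
    return size, seconds


small_size, small_time = int_cost(5)
big_size, big_time = int_cost(500)

# Same type either way in Python 3; only the storage and cost differ.
print("(1<<5)+1   :", small_size, "bytes,", round(small_time, 4), "s")
print("(1<<500)+1 :", big_size, "bytes,", round(big_time, 4), "s")
```

Both values are plain int objects; the larger one simply needs more
storage, which is the same trade PEP 393 makes for strings.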

If Python were implemented on a 9-bit computer, I wouldn't be
surprised if the PEP 393 optimizations were applied differently. It's
nothing to do with Latin-1, except insofar as the narrowest form of
string _happens_ to contain everything that's in Latin-1.
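
For the curious, here is a small sketch of how the storage width can
be observed, assuming CPython 3.3 or later (where PEP 393 is
implemented). The helper name bytes_per_char is invented for this
example; it measures the growth in sys.getsizeof as the string gets
longer, so the fixed object header cancels out:

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate bytes used per character for a string made only of `ch`.

    `bytes_per_char` is a hypothetical helper for illustration. It
    compares two lengths of the same repeated character so that the
    fixed per-object overhead drops out of the difference.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


samples = [
    ("U+0061", "a"),           # ASCII
    ("U+00FF", "\xff"),        # last code point of the Latin-1 range
    ("U+0100", "\u0100"),      # first code point past Latin-1
    ("U+20AC", "\u20ac"),      # euro sign, still in the BMP
    ("U+10000", "\U00010000"), # first code point outside the BMP
]

for label, ch in samples:
    print(label, bytes_per_char(ch), "byte(s) per character")
```

The boundaries fall at 1, 2 and 4 bytes, which are machine storage
sizes; the narrowest one just happens to cover U+0000 to U+00FF.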

Go blame the Unicode Consortium for picking that.

> The latin-1 chunk illustrates this wonderfully.

Aside from replace(), as mentioned in this thread, are there any other
ways that this is so wonderfully illustrated? Or is it "wonderfully"
as in "I wonder if people will believe me if I keep spouting
unsubstantiated claims"?

ChrisA