Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <987c4bd9-0e5e-4387-9c78-1075a77d3c47@c6g2000yqh.googlegroups.com>
References: <mailman.3703.1364248275.2939.python-list@python.org> <a52fbe9d-db14-4ed2-bb49-adfb4b56f973@k4g2000yqn.googlegroups.com> <mailman.3771.1364324590.2939.python-list@python.org> <0b779c80-4f50-4716-8c30-47755c15f304@m12g2000yqp.googlegroups.com> <kit1kg$g2u$1@ger.gmane.org> <nad-98F0A4.17004226032013@news.gmane.org> <kitdqr$4m4$2@ger.gmane.org> <nad-8CB9C0.18315026032013@news.gmane.org> <mailman.3805.1364385073.2939.python-list@python.org> <5153a12d$0$29998$c3e8da3$5496439d@news.astraweb.com> <mailman.3845.1364441182.2939.python-list@python.org> <d2cc443a-e049-42ed-abc6-66b5ea600fe7@j1g2000pbq.googlegroups.com> <mailman.3860.1364451682.2939.python-list@python.org> <987c4bd9-0e5e-4387-9c78-1075a77d3c47@c6g2000yqh.googlegroups.com>
Date: Thu, 28 Mar 2013 21:30:27 +1100
Subject: Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Content-Type: text/plain; charset=ISO-8859-1
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3868.1364466636.2939.python-list@python.org>
Lines: 31
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:42112

On Thu, Mar 28, 2013 at 8:03 PM, jmfauth <wxjmfauth@gmail.com> wrote:
> Example of a good Unicode understanding.
> If you wish 1) to preserve memory, 2) to cover the whole range
> of Unicode, 3) to keep maximum performance while preserving the
> good work Unicode.org as done (normalization, sorting), there
> is only one solution: utf-8. For this you have to understand,
> what is really a "unicode transformation format".

You really REALLY need to sort out in your head the difference between
correctness and performance. I still haven't seen one single piece of
evidence from you that Python 3.3 fails on any point of Unicode
correctness. Covering the whole range of Unicode has never been a
problem.

In terms of memory usage and performance, though, there's one obvious
solution. Fork CPython 3.3 (or the current branch head[1]), change the
internal representation of a string to be UTF-8 (by the way, that's
the official spelling), and run the string benchmarks. Then post your
code and benchmark figures so other people can replicate your results.

> Python has certainly and definitvely not "revolutionize"
> Unicode.

This is one place where you're actually correct, though, because PEP
393 isn't the first instance of this kind of format - Pike's had it
for years. Funny though, I don't think that was your point :)

[1] Apologies if my terminology is wrong, I'm a git user and did one
quick Google search to see if hg uses the same term.

ChrisA