Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <987098b6-d79c-4597-b656-9b3e983740e8@z3g2000vbg.googlegroups.com>
References: <d618f760-67d0-4c5f-865b-406e9a58a611@h11g2000vbf.googlegroups.com> <kicmmc$1eb$1@reader2.panix.com> <987098b6-d79c-4597-b656-9b3e983740e8@z3g2000vbg.googlegroups.com>
Date: Thu, 21 Mar 2013 08:02:53 +1100
Subject: Re: "monty" < "python"
From: Tim Delaney <tim.delaney@aptare.com>
To: Python-List <python-list@python.org>
Content-Type: multipart/alternative; boundary="e89a8fb1fd74ad5ff404d8618b6b"
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3576.1363813553.2939.python-list@python.org>
Lines: 77
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:41623

--e89a8fb1fd74ad5ff404d8618b6b
Content-Type: text/plain; charset="UTF-8"

On 21 March 2013 06:40, jmfauth <wxjmfauth@gmail.com> wrote:

> ----
> [snip usual rant from jmf]


Franz, please pay no attention to jmf. He has become obsessed with a single
small performance regression in Python 3.3's string handling, one that
affects a very narrow domain and rarely shows up in practice (although, as
he has demonstrated, it is easy to create a microbenchmark that makes it
appear to be much worse than it is).

The regression is a consequence of the decision in Python 3.3 to
*correctly* support the full range of Unicode characters whilst also
reducing the required memory where possible (PEP 393). In the vast majority
of cases this is a performance *improvement*. It is only "optimised for the
ascii user" in the sense that ASCII characters (in fact all code points
below U+0100) need only 1 byte per code point in the new representation and
hence can be stored in less memory than other Unicode code points. The
possible character widths are 1, 2 and 4 bytes, and every character in a
given string is stored at the same width.
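
A rough way to see those widths for yourself, assuming CPython 3.3 or later
(other implementations store strings differently). The helper name
bytes_per_code_point is made up for this illustration; it compares the sizes
of two strings built from the same character so that the fixed object header
cancels out:

```python
import sys

def bytes_per_code_point(ch, n=1000):
    """Estimate storage per code point for a string made only of `ch`.

    Subtracting the sizes of two strings of different length cancels the
    fixed per-object header, leaving only the per-character cost.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# One sample from each range: ASCII, Latin-1, rest of the BMP, astral plane.
for ch in ("a", "\u00e9", "\u20ac", "\U0001d11e"):
    print("U+%04X: %d byte(s) per code point"
          % (ord(ch), bytes_per_code_point(ch)))
```

On CPython 3.3+ I would expect this to report 1, 1, 2 and 4 bytes
respectively.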

The actual regression occurs when concatenating/replacing/etc. a character
into a string when that character is wider than every character currently
in the string. In this situation the new string needs to be widened (the
number of bytes used by every character increases), which is a much more
expensive operation than simply creating a new string at the existing width
(which is what would happen if the character was the same size or smaller).
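
The widening case can be sketched as follows, again assuming CPython 3.3 or
later. The names N, ascii_text and append are only for illustration, and the
timings will vary from machine to machine, so treat them as indicative only:

```python
import sys
import timeit

N = 100000
ascii_text = "a" * N            # every code point fits in 1 byte

def append(ch):
    """Build a new string from ascii_text plus one extra character."""
    return ascii_text + ch

same_width = append("b")        # result still needs 1 byte per code point
widened = append("\u20ac")      # EURO SIGN forces 2 bytes per code point

print("same width:", sys.getsizeof(same_width), "bytes")
print("widened:   ", sys.getsizeof(widened), "bytes")

if __name__ == "__main__":
    # Time both cases; only the relative difference is of interest.
    for label, ch in (("same width", "b"), ("widened", "\u20ac")):
        seconds = timeit.timeit(lambda: append(ch), number=200)
        print("%-10s %.4f s for 200 appends" % (label, seconds))
```

The widened result should occupy roughly twice the memory of the same-width
one, because all N existing characters are re-stored at the wider width
rather than just the one character that was added.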

It has been acknowledged as a real regression, but he keeps hijacking every
thread where strings are mentioned to harp on about it. He has shown no
inclination to attempt to *fix* the regression and is rapidly coming to be
regarded as a troll by most participants in this list.

Tim Delaney

--e89a8fb1fd74ad5ff404d8618b6b--