


Re: String performance regression from python 3.2 to 3.3

Return-Path <rosuav@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
MIME-Version 1.0
X-Received by 10.58.253.161 with SMTP id ab1mr129028ved.55.1363220351863; Wed, 13 Mar 2013 17:19:11 -0700 (PDT)
In-Reply-To <2992273.neLn1eVAPo@PointedEars.de>
References <23a42297-9262-4ace-87ad-138999b1ddd6@z3g2000vbg.googlegroups.com> <a1a6394a-e9c7-407b-9f6d-ff44de1b65de@y2g2000pbg.googlegroups.com> <eabe27a9-099a-4e2c-92fb-bdf3819c2561@kw7g2000pbb.googlegroups.com> <mailman.3259.1363172350.2939.python-list@python.org> <2992273.neLn1eVAPo@PointedEars.de>
Date Thu, 14 Mar 2013 11:19:11 +1100
Subject Re: String performance regression from python 3.2 to 3.3
From Chris Angelico <rosuav@gmail.com>
To python-list@python.org
Content-Type text/plain; charset=windows-1252
Content-Transfer-Encoding quoted-printable
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.3278.1363220353.2939.python-list@python.org>
Lines 97
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1363220353 news.xs4all.nl 6939 [2001:888:2000:d::a6]:42229
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:41201



On Thu, Mar 14, 2013 at 4:42 AM, Thomas 'PointedEars' Lahn
<PointedEars@web.de> wrote:
> Chris Angelico wrote:
>
>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompmody@gmail.com> wrote:
>>> Uhhh..
>>> Making the subject line useful for all readers
>>
>> I should have read this one before replying in the other thread.
>>
>> jmf, I'd like to see evidence that there has been a performance
>> regression compared against a wide build of Python 3.2. You still have
>> never answered this fundamental point: the narrow builds of Python are
>> *BUGGY* in the same way that JavaScript/ECMAScript is.
>
> Interesting.  From my work I was under the impression that I knew ECMAScript
> and its implementations fairly well, yet I have never heard of this before.
>
> What do you mean by “narrow build” and “wide build” and what exactly is the
> bug “narrow builds” of Python 3.2 have in common with JavaScript/ECMAScript?
> To which implementation of ECMAScript are you referring – or are you
> referring to the Specification as such?

The ECMAScript spec says that strings are sequences of 16-bit code
units, stored and represented as UTF-16. Python versions up to 3.2 came
in two varieties: narrow builds, which stored strings as 16-bit code
units just like ECMAScript, and wide builds, which stored one 32-bit
unit per code point. Narrow included (I believe) the Windows builds
available on python.org; wide was (again, I think) the usual Linux
config. The problem predates Python 3 and its default string being
Unicode; the Py2 unicode type has the same issue:

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
>>> u"\U00012345"
u'\U00012345'
>>> len(_)
2

Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u"\U00012345"
u'\U00012345'
>>> len(_)
1


That's the python.org MSI installer build, and the default system
Python from Ubuntu 10.10. The exact same code does different things on
different platforms, and on the Windows (narrow) build it's possible to
split a single character into its two surrogates:

>>> u"\U00012345"[0]
u'\ud808'
>>> u"\U00012345"[1]
u'\udf45'
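For readers wondering where those two values come from, here is a minimal sketch of the standard UTF-16 surrogate arithmetic, written for Python 3. The helper names `to_surrogates` and `from_surrogates` are hypothetical, chosen for illustration; the cross-check against the built-in "utf-16-be" codec is what any UTF-16 encoder produces.

```python
def to_surrogates(cp):
    """Split an astral code point (above U+FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only astral code points need surrogates")
    offset = cp - 0x10000             # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate (D800-DBFF)
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate (DC00-DFFF)
    return high, low

def from_surrogates(high, low):
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x12345)
print(hex(high), hex(low))   # 0xd808 0xdf45, the same two halves the narrow build shows
# Cross-check with the standard codec: big-endian UTF-16 bytes d8 08 df 45
print("\U00012345".encode("utf-16-be").hex())
```

A narrow build indexes those two 16-bit halves separately, which is exactly why `[0]` and `[1]` above return lone surrogates.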

You can see the same thing in Javascript too. Here's a little demo I
just knocked together:

<script>
// List every UTF-16 code unit of the input as "[index]: decimal hex"
function foo()
{
	var txt=document.getElementById("in").value;
	var msg="";
	for (var i=0;i<txt.length;++i)
		msg+="["+i+"]: "+txt.charCodeAt(i)+" "+txt.charCodeAt(i).toString(16)+"\n";
	document.getElementById("out").value=msg;
}
</script>
<input id=in><input type=button onclick="foo()" value="Show"><br>
<textarea id=out rows=25 cols=80></textarea>


Give it an ASCII string and you'll see, as expected, one index (based
on string indexing or charCodeAt, same thing) for each character. The
same goes for a string made entirely of BMP characters. But put an
astral character in and you'll get two entries for it: a high surrogate
in the range d800-dbff followed by a low surrogate in dc00-dfff (call
the first range 00.00.d8.00/22; oh wait, CIDR notation doesn't work in
Unicode). I raised this issue on the Google V8 list and on the
ECMAScript list, es-discuss@mozilla.org, and was basically told that
since JavaScript has been buggy for so long, there's no chance of ever
making it bug-free:

https://mail.mozilla.org/pipermail/es-discuss/2012-December/027384.html
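The same experiment can be mirrored in Python. This is a minimal sketch assuming Python 3.3 or later, where `len()` counts code points; the helpers `code_units` and `show` are hypothetical names, and the UTF-16 view is recovered with the standard "utf-16-le" codec (which writes no BOM), so the listing matches what `charCodeAt` or a narrow build would report.

```python
def code_units(text):
    """Return the UTF-16 code units of text, as JavaScript or a narrow build sees them."""
    data = text.encode("utf-16-le")   # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def show(text):
    """Mimic the JavaScript demo: one line per UTF-16 code unit."""
    for i, unit in enumerate(code_units(text)):
        print("[%d]: %d %x" % (i, unit, unit))

# ASCII, a BMP character (euro sign), and a string containing an astral character
for sample in ["abc", "\u20ac", "a\U00012345b"]:
    print(repr(sample), "code points:", len(sample),
          "UTF-16 units:", len(code_units(sample)))
    show(sample)
```

For the ASCII and BMP samples the two counts agree; for the astral sample Python 3.3 reports 3 code points while the UTF-16 view has 4 units, with d808 and df45 sitting at indexes 1 and 2.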

Fortunately for Python, there are version numbers, and policies that
allow bugs to actually get fixed. (That is why, for instance, Debian
Squeeze still ships Python 2.6 rather than upgrading to 2.7: in case
some script is broken by that change. You can't do that with web
browsers.) As of Python 3.3 (PEP 393), all Pythons behave the same way:
strings are semantically a "wide build" (UTF-32, one index per code
point), but as a memory usage optimization each string is stored with
1, 2, or 4 bytes per character, depending on the widest character it
contains. That's how it needs to be.
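To make the "memory usage optimization" concrete, here is a rough model of the PEP 393 rule in Python. `pep393_width` is a hypothetical helper that predicts the storage width from the widest code point; it is not a CPython API. The `sys.getsizeof` figures are CPython implementation details and vary between versions, so only their relative order is meaningful.

```python
import sys

def pep393_width(text):
    """Model of PEP 393: bytes per character CPython 3.3+ uses to store text."""
    widest = max(map(ord, text)) if text else 0
    if widest < 0x100:
        return 1   # ASCII / Latin-1 only: 1 byte per character
    if widest < 0x10000:
        return 2   # widest character is in the BMP: 2 bytes per character
    return 4       # at least one astral character: 4 bytes per character

# Same length (1000 code points) in every case, different storage widths
for sample in ["a" * 1000, "\u20ac" * 1000, "\U0001F600" * 1000]:
    print(pep393_width(sample), len(sample), sys.getsizeof(sample))
```

Whatever the storage width, `len()` and indexing always count code points, so the narrow-build surrogate split above cannot happen.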

ChrisA



Thread

A reply for rusi (FSR) jmfauth <wxjmfauth@gmail.com> - 2013-03-13 02:36 -0700
  Re: A reply for rusi (FSR) rusi <rustompmody@gmail.com> - 2013-03-13 03:07 -0700
    String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-13 03:11 -0700
      Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-13 21:59 +1100
        Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-13 09:49 -0700
          Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 10:43 +1100
          Re: String performance regression from python 3.2 to 3.3 MRAB <python@mrabarnett.plus.com> - 2013-03-14 00:52 +0000
          Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 11:55 +1100
          Re: String performance regression from python 3.2 to 3.3 MRAB <python@mrabarnett.plus.com> - 2013-03-14 02:01 +0000
            Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-14 04:05 +0000
              Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 17:47 +1100
                Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-14 03:48 -0700
                Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-14 19:14 -0400
                Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-14 20:48 -0400
                Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 10:07 -0700
                RE: String performance regression from python 3.2 to 3.3 Andriy Kornatskyy <andriy.kornatskyy@live.com> - 2013-03-15 21:04 +0300
          Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-13 22:35 -0400
          Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 17:21 +1100
        Re: String performance regression from python 3.2 to 3.3 Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2013-03-13 18:42 +0100
          Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 11:19 +1100
            Re: String performance regression from python 3.2 to 3.3 Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2013-03-16 03:44 +0100
              Re: String performance regression from python 3.2 to 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-16 03:56 +0000
                Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 21:26 -0700
                Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-16 08:47 +0000
                Re: String performance regression from python 3.2 to 3.3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-03-17 09:00 +1100
                Re: String performance regression from python 3.2 to 3.3 Roy Smith <roy@panix.com> - 2013-03-16 18:10 -0400
              Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-16 14:59 +1100
                Re: String performance regression from python 3.2 to 3.3 Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2013-03-16 05:12 +0100
                Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-16 15:20 +1100
                Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 22:21 -0700
              Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-16 15:09 +1100
                Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 21:35 -0700
                Re: String performance regression from python 3.2 to 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-16 04:56 +0000
                Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-16 01:05 -0400
                Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-16 05:38 +0000
                Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-16 05:25 +0000
                Re: String performance regression from python 3.2 to 3.3 Roy Smith <roy@panix.com> - 2013-03-16 09:29 -0400
                Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-16 09:39 -0700
                Re: String performance regression from python 3.2 to 3.3 Roy Smith <roy@panix.com> - 2013-03-16 14:00 -0400
                Re: String performance regression from python 3.2 to 3.3 jmfauth <wxjmfauth@gmail.com> - 2013-03-16 13:42 -0700
  Re: A reply for rusi (FSR) Chris Angelico <rosuav@gmail.com> - 2013-03-13 21:32 +1100
