A reply for rusi (FSR)

Started by	jmfauth <wxjmfauth@gmail.com>
First post	2013-03-13 02:36 -0700
Last post	2013-03-13 21:32 +1100
Articles	20 on this page of 41 — 11 participants

  A reply for rusi (FSR) jmfauth <wxjmfauth@gmail.com> - 2013-03-13 02:36 -0700
    Re: A reply for rusi (FSR) rusi <rustompmody@gmail.com> - 2013-03-13 03:07 -0700
      String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-13 03:11 -0700
        Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-13 21:59 +1100
          Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-13 09:49 -0700
            Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 10:43 +1100
            Re: String performance regression from python 3.2 to 3.3 MRAB <python@mrabarnett.plus.com> - 2013-03-14 00:52 +0000
            Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 11:55 +1100
            Re: String performance regression from python 3.2 to 3.3 MRAB <python@mrabarnett.plus.com> - 2013-03-14 02:01 +0000
              Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-14 04:05 +0000
                Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 17:47 +1100
                  Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-14 03:48 -0700
                    Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-14 19:14 -0400
                    Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-14 20:48 -0400
                    Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 10:07 -0700
                      RE: String performance regression from python 3.2 to 3.3 Andriy Kornatskyy <andriy.kornatskyy@live.com> - 2013-03-15 21:04 +0300
            Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-13 22:35 -0400
            Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 17:21 +1100
          Re: String performance regression from python 3.2 to 3.3 Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2013-03-13 18:42 +0100
            Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 11:19 +1100
              Re: String performance regression from python 3.2 to 3.3 Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2013-03-16 03:44 +0100
                Re: String performance regression from python 3.2 to 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-16 03:56 +0000
                  Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 21:26 -0700
                    Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-16 08:47 +0000
                      Re: String performance regression from python 3.2 to 3.3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-03-17 09:00 +1100
                        Re: String performance regression from python 3.2 to 3.3 Roy Smith <roy@panix.com> - 2013-03-16 18:10 -0400
                Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-16 14:59 +1100
                  Re: String performance regression from python 3.2 to 3.3 Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2013-03-16 05:12 +0100
                    Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-16 15:20 +1100
                    Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 22:21 -0700
                Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-16 15:09 +1100
                  Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 21:35 -0700
                    Re: String performance regression from python 3.2 to 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-16 04:56 +0000
                    Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-16 01:05 -0400
                    Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-16 05:38 +0000
                  Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-16 05:25 +0000
                    Re: String performance regression from python 3.2 to 3.3 Roy Smith <roy@panix.com> - 2013-03-16 09:29 -0400
                      Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-16 09:39 -0700
                        Re: String performance regression from python 3.2 to 3.3 Roy Smith <roy@panix.com> - 2013-03-16 14:00 -0400
                          Re: String performance regression from python 3.2 to 3.3 jmfauth <wxjmfauth@gmail.com> - 2013-03-16 13:42 -0700
    Re: A reply for rusi (FSR) Chris Angelico <rosuav@gmail.com> - 2013-03-13 21:32 +1100

#41164 — A reply for rusi (FSR)

From	jmfauth <wxjmfauth@gmail.com>
Date	2013-03-13 02:36 -0700
Subject	A reply for rusi (FSR)
Message-ID	<23a42297-9262-4ace-87ad-138999b1ddd6@z3g2000vbg.googlegroups.com>

As a reply to rusi's comment:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/a7689b158fdca29e#

From string creation to the itertools usage. A medley. Some timings.

Important:
The real/absolute values of these experiments are not important. I do
not care and I'm not complaining at all.

These values are expected; they only confirm (*FOR ME*) my
understanding of the coding of the characters (and Unicode).

#~                   py323                    py330

#~ test   1:         0.015357737412819        0.019290216142579
#~ test   2:         0.015698801667198        0.020386269052436
#~ test   3:         0.015613338684288        0.018769561472500
#~ test   4:         0.023235297708529        0.032253414679390
#~ test   5:         0.023327062109534        0.029621391108935
#~ test   6:         1.119958127076760        1.095467665651482
#~ test   7:         0.420158472788311        0.565518010043673
#~ test   8:         0.649444234615974        1.061556978013171
#~ test   9:         0.712335144072079        1.211614222458175
#~ test  10:         0.704622996001357        1.160909074081441
#~ test  11:         0.614674584923621        1.053985430333688
#~ test  12:         0.660336235792764        1.059443246081010
#~ test  13:         4.821435927771570        5.795325214218677
#~ test  14:         0.494012668213403        0.729330462512273
#~ test  15:         0.504894429585788        0.879966255906103
#~ test  16:         0.693093370081103        1.132884304782264
#~ test  17:         0.749076743789461        3.013804437852462
#~ test  18:         7.467055989281286       13.387841650089342
#~ test  19:         7.581776062566778       13.593412812594643
#~ test  20:         9.477877493343140       15.235388291413805
#~ test  21:         0.022614608026196        0.020984116094176
#~ test  22:         6.685022041178975       12.687538276191944
#~ test  23:         6.946794763994170       12.986701250949636
#~ test  24:         0.097796827314760        0.156285014715777
#~ test  25:         0.024915807146677        0.034190706904894
#~ test  26:         0.024996544066013        0.032191582014335
#~ test  27:         0.000693943667684        0.001315421027272
#~ test  28:         0.000679765476967        0.001305968900141
#~ test  29:         0.001614344548152        0.025543979763000
#~ test  30:         0.000204008410812        0.000286714523313
#~ test  31:         0.000213460537964        0.000301286552656
#~ test  32:         0.000204008410819        0.000291440586878
#~ test  33:         0.249692904327539        0.497374474766957
#~ test  34:         0.248750448483740        0.513947598194790
#~ test  35:         0.099810130396032        0.249129715085319

jmf

#41165

From	rusi <rustompmody@gmail.com>
Date	2013-03-13 03:07 -0700
Message-ID	<a1a6394a-e9c7-407b-9f6d-ff44de1b65de@y2g2000pbg.googlegroups.com>
In reply to	#41164

On Mar 13, 2:36 pm, jmfauth <wxjmfa...@gmail.com> wrote:
> As a reply to rusi's comment:http://groups.google.com/group/comp.lang.python/browse_thread/thread/...
>
> From string creation to the itertools usage. A medley. Some timings.
>
> Important:
> The real/absolute values of these experiments are not important. I do
> not care and I'm not complaining at all.
>
> These values are expected, I expected such values and they are only
> confirming (*FOR ME*) my understanding of the coding of the characters
> (and Unicode).
>
> #~ py323                                      py330
>
> #~ test   1:         0.015357737412819        0.019290216142579
> #~ test   2:         0.015698801667198        0.020386269052436
> #~ test   3:         0.015613338684288        0.018769561472500
> #~ test   4:         0.023235297708529        0.032253414679390
> #~ test   5:         0.023327062109534        0.029621391108935
> #~ test   6:         1.119958127076760        1.095467665651482
> #~ test   7:         0.420158472788311        0.565518010043673
> #~ test   8:         0.649444234615974        1.061556978013171
> #~ test   9:         0.712335144072079        1.211614222458175
> #~ test  10:         0.704622996001357        1.160909074081441
> #~ test  11:         0.614674584923621        1.053985430333688
> #~ test  12:         0.660336235792764        1.059443246081010
> #~ test  13:         4.821435927771570        5.795325214218677
> #~ test  14:         0.494012668213403        0.729330462512273
> #~ test  15:         0.504894429585788        0.879966255906103
> #~ test  16:         0.693093370081103        1.132884304782264
> #~ test  17:         0.749076743789461        3.013804437852462
> #~ test  18:         7.467055989281286       13.387841650089342
> #~ test  19:         7.581776062566778       13.593412812594643
> #~ test  20:         9.477877493343140       15.235388291413805
> #~ test  21:         0.022614608026196        0.020984116094176
> #~ test  22:         6.685022041178975       12.687538276191944
> #~ test  23:         6.946794763994170       12.986701250949636
> #~ test  24:         0.097796827314760        0.156285014715777
> #~ test  25:         0.024915807146677        0.034190706904894
> #~ test  26:         0.024996544066013        0.032191582014335
> #~ test  27:         0.000693943667684        0.001315421027272
> #~ test  28:         0.000679765476967        0.001305968900141
> #~ test  29:         0.001614344548152        0.025543979763000
> #~ test  30:         0.000204008410812        0.000286714523313
> #~ test  31:         0.000213460537964        0.000301286552656
> #~ test  32:         0.000204008410819        0.000291440586878
> #~ test  33:         0.249692904327539        0.497374474766957
> #~ test  34:         0.248750448483740        0.513947598194790
> #~ test  35:         0.099810130396032        0.249129715085319
>
> jmf

Thank you jmf. I believe that for the first time you have moved beyond
a single point of complaint to a swathe of data points which evidently
show performance regression.  You would need to say what these
tests 1-35 actually are.

#41166 — String performance regression from python 3.2 to 3.3

From	rusi <rustompmody@gmail.com>
Date	2013-03-13 03:11 -0700
Subject	String performance regression from python 3.2 to 3.3
Message-ID	<eabe27a9-099a-4e2c-92fb-bdf3819c2561@kw7g2000pbb.googlegroups.com>
In reply to	#41165

On Mar 13, 3:07 pm, rusi <rustompm...@gmail.com> wrote:
> On Mar 13, 2:36 pm, jmfauth <wxjmfa...@gmail.com> wrote:
>
>
>
>
>
>
>
>
>
> > As a reply to rusi's comment:http://groups.google.com/group/comp.lang.python/browse_thread/thread/...
>
> > From string creation to the itertools usage. A medley. Some timings.
>
> > Important:
> > The real/absolute values of these experiments are not important. I do
> > not care and I'm not complaining at all.
>
> > These values are expected, I expected such values and they are only
> > confirming (*FOR ME*) my understanding of the coding of the characters
> > (and Unicode).
>
> > #~ py323                                      py330
>
> > #~ test   1:         0.015357737412819        0.019290216142579
> > #~ test   2:         0.015698801667198        0.020386269052436
> > #~ test   3:         0.015613338684288        0.018769561472500
> > #~ test   4:         0.023235297708529        0.032253414679390
> > #~ test   5:         0.023327062109534        0.029621391108935
> > #~ test   6:         1.119958127076760        1.095467665651482
> > #~ test   7:         0.420158472788311        0.565518010043673
> > #~ test   8:         0.649444234615974        1.061556978013171
> > #~ test   9:         0.712335144072079        1.211614222458175
> > #~ test  10:         0.704622996001357        1.160909074081441
> > #~ test  11:         0.614674584923621        1.053985430333688
> > #~ test  12:         0.660336235792764        1.059443246081010
> > #~ test  13:         4.821435927771570        5.795325214218677
> > #~ test  14:         0.494012668213403        0.729330462512273
> > #~ test  15:         0.504894429585788        0.879966255906103
> > #~ test  16:         0.693093370081103        1.132884304782264
> > #~ test  17:         0.749076743789461        3.013804437852462
> > #~ test  18:         7.467055989281286       13.387841650089342
> > #~ test  19:         7.581776062566778       13.593412812594643
> > #~ test  20:         9.477877493343140       15.235388291413805
> > #~ test  21:         0.022614608026196        0.020984116094176
> > #~ test  22:         6.685022041178975       12.687538276191944
> > #~ test  23:         6.946794763994170       12.986701250949636
> > #~ test  24:         0.097796827314760        0.156285014715777
> > #~ test  25:         0.024915807146677        0.034190706904894
> > #~ test  26:         0.024996544066013        0.032191582014335
> > #~ test  27:         0.000693943667684        0.001315421027272
> > #~ test  28:         0.000679765476967        0.001305968900141
> > #~ test  29:         0.001614344548152        0.025543979763000
> > #~ test  30:         0.000204008410812        0.000286714523313
> > #~ test  31:         0.000213460537964        0.000301286552656
> > #~ test  32:         0.000204008410819        0.000291440586878
> > #~ test  33:         0.249692904327539        0.497374474766957
> > #~ test  34:         0.248750448483740        0.513947598194790
> > #~ test  35:         0.099810130396032        0.249129715085319
>
> > jmf
>
> Thank you jmf. I believe that for the first time you have moved beyond
> a single point of complaint to a swathe of data points which evidently
> show performance regression.  You would need to provide data of what
> these tests 1-35 are.

Uhhh..
Making the subject line useful for all readers

#41170 — Re: String performance regression from python 3.2 to 3.3

From	Chris Angelico <rosuav@gmail.com>
Date	2013-03-13 21:59 +1100
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3259.1363172350.2939.python-list@python.org>
In reply to	#41166

On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompmody@gmail.com> wrote:
> Uhhh..
> Making the subject line useful for all readers

I should have read this one before replying in the other thread.

jmf, I'd like to see evidence that there has been a performance
regression compared against a wide build of Python 3.2. You still have
never addressed this fundamental point: the narrow builds of Python are
*BUGGY* in the same way that JavaScript/ECMAScript is. And believe you
me, the utterly unnecessary hassles I have had to deal with when
permitting user-provided .js code to script my engine have wasted
rather more dev hours than you would believe - there are rather a lot
of stupid edge cases to deal with.
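
To make the narrow-build problem concrete, here is a small sketch,
assuming Python 3.3 or later. It does not run a narrow build; it counts
UTF-16 code units to imitate what a narrow build (or JavaScript) would
report as the length. The helper name utf16_units is made up for this
illustration.

```python
def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


bmp = "caf\u00e9"        # every character is inside the BMP
astral = "a\U0001F600b"  # U+1F600 lies outside the BMP

# Python 3.3+ counts code points; a UTF-16 based string counts code units.
print(len(bmp), utf16_units(bmp))
print(len(astral), utf16_units(astral))

# Indexing by code unit lands on half of a surrogate pair, not a character.
units = astral.encode("utf-16-le")
second_unit = int.from_bytes(units[2:4], "little")
print(hex(second_unit))
```

The two counts agree for the BMP-only string and differ for the string
holding an astral character, which is where indexing and slicing by code
unit go wrong.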

The PEP 393 string is simply a memory-optimized version of UTF-32. It
guarantees O(1) indexing and slicing, while still remaining tight in
many cases. Its worst case is a constant amount larger than pure
UTF-32 (the overhead of recording the string width), its best case is
equivalent to ASCII (if all strings are seven-bit).
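
A rough way to observe the per-string width, assuming CPython 3.3 or
later (sys.getsizeof reports implementation-specific sizes, so other
interpreters may behave differently). The helper bytes_per_code_point
is a name invented for this sketch; it differences two string sizes so
that the fixed object header cancels out.

```python
import sys


def bytes_per_code_point(ch, n=1000):
    """Estimate storage per code point for a string made only of ch.

    Subtracting the sizes of two strings of different length removes
    the fixed per-object header from the estimate.
    """
    longer = sys.getsizeof(ch * (2 * n))
    shorter = sys.getsizeof(ch * n)
    return (longer - shorter) // n


samples = [
    ("ASCII", "a"),
    ("Latin-1", "\u00e9"),
    ("BMP", "\u20ac"),
    ("astral", "\U0001F600"),
]
for label, ch in samples:
    print(label, bytes_per_code_point(ch))
```

On CPython 3.3+ this should report 1, 1, 2 and 4 bytes respectively:
the width is chosen per string from its largest code point, and Latin-1
text shares the one-byte form with ASCII.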

The flexible string representation is not brand new. It has been
tested and proven in another language, one very similar to Python; and
its performance has been provably sufficient for everyday operations.
Pike's string type behaves just as Python 3.3's, and has done for
longer than I can trace backward. In terms of Unicode compliance, it
is perfect; in terms of performance, quite acceptable; the worst-case
operation is taking an ASCII string and overwriting one character in
it with an astral character - which Python flat-out doesn't permit,
but Pike does, as a known-slow operation. (It triggers a copy of the
string, so it's always going to be slow.)

There are two broad areas of complaint that you have raised. One is of
Unicode compliance and correctness. I believe those complaints are
utterly unfounded, and you have yet to show any serious evidence to
support them. Py 3.3 is perfectly compliant with everything I have yet
checked. The other complaint is of performance, and the issue of being
US-centric. While it's true that ASCII and Latin-1 strings will be
smaller/faster under Py 3.3 than 3.2, this is not purely to the
benefit of the US at the cost of everyone else; it's also a benefit to
the myriad non-US programs that use a lot of ASCII strings - for
instance, delimiters, HTML tags, builtin function names... all of
these are ASCII, even if the rest of the code isn't. And there's no
penalty for non-English speakers, when compared against a non-buggy
wide build. The very worst case is only a constant factor worse, and
that assumes astral characters in every single string... which does
not happen, trust me on that.

ChrisA

#41184 — Re: String performance regression from python 3.2 to 3.3

From	rusi <rustompmody@gmail.com>
Date	2013-03-13 09:49 -0700
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<ee2062d1-658a-4bf5-8a56-5fe9c0991bef@o9g2000pbt.googlegroups.com>
In reply to	#41170

On Mar 13, 3:59 pm, Chris Angelico <ros...@gmail.com> wrote:
> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm...@gmail.com> wrote:
> > Uhhh..
> > Making the subject line useful for all readers
>
> I should have read this one before replying in the other thread.
>
> jmf, I'd like to see evidence that there has been a performance
> regression compared against a wide build of Python 3.2. You still have
> never answered this fundamental, that the narrow builds of Python are
> *BUGGY* in the same way that JavaScript/ECMAScript is. And believe you
> me, the utterly unnecessary hassles I have had to deal with when
> permitting user-provided .js code to script my engine have wasted
> rather more dev hours than you would believe - there are rather a lot
> of stupid edge cases to deal with.

This assumes that there are only three choices:
- narrow build that is buggy (surrogate pairs for astral characters)
- wide build that is 4-fold space inefficient for a wide variety of
common (ASCII) use-cases
- flexible string engine that chooses a small tradeoff of space
efficiency over time efficiency.

There is a fourth choice: narrow build that chooses to be partial over
being buggy, i.e. when an astral character is encountered, an exception
is thrown rather than trying to fudge it into a 16-bit
representation.
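
As a sketch only of what "partial rather than buggy" could look like at
the Python level (no released CPython behaves this way, and require_bmp
is a hypothetical helper, not an existing API):

```python
def require_bmp(text):
    """Return text unchanged, or raise if it holds a non-BMP character."""
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise ValueError(
                "astral character U+%04X at index %d" % (ord(ch), index)
            )
    return text


ok = require_bmp("na\u00efve \u20ac")
print("accepted", len(ok), "characters")

try:
    require_bmp("smile \U0001F600")
except ValueError as exc:
    print("rejected:", exc)
```

Every string that passes the check fits in 16 bits per character, so
indexing is exact; the price is that some valid Unicode text is refused.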

I am hardly a Unicode expert, but my impression is this: while in
today's internationalized world going back to ASCII is not an option,
most actual uses of Unicode stay within the BMP.

Further if the choice is not between two python executables but
between string-engines chosen at startup by command-line switches or
equivalent, the price may be quite small.

#41199 — Re: String performance regression from python 3.2 to 3.3

From	Chris Angelico <rosuav@gmail.com>
Date	2013-03-14 10:43 +1100
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3274.1363218247.2939.python-list@python.org>
In reply to	#41184

On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:
> On Mar 13, 3:59 pm, Chris Angelico <ros...@gmail.com> wrote:
>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm...@gmail.com> wrote:
>> > Uhhh..
>> > Making the subject line useful for all readers
>>
>> I should have read this one before replying in the other thread.
>>
>> jmf, I'd like to see evidence that there has been a performance
>> regression compared against a wide build of Python 3.2. You still have
>> never answered this fundamental, that the narrow builds of Python are
>> *BUGGY* in the same way that JavaScript/ECMAScript is. And believe you
>> me, the utterly unnecessary hassles I have had to deal with when
>> permitting user-provided .js code to script my engine have wasted
>> rather more dev hours than you would believe - there are rather a lot
>> of stupid edge cases to deal with.
>
> This assumes that there are only three choices:
> - narrow build that is buggy (surrogate pairs for astral characters)
> - wide build that is 4-fold space inefficient for wide variety of
> common (ASCII) use-cases
> - flexible string engine that chooses a small tradeoff of space
> efficiency over time efficiency.
>
> There is a fourth choice: narrow build that chooses to be partial over
> being buggy. ie when an astral character is encountered, an exception
> is thrown rather than trying to fudge it into a 16-bit
> representation.

As a simple factual matter, narrow builds of Python 3.2 don't do that.
So it doesn't factor into my original statement. But if you're talking
about a proposal for 3.4, then sure, that's a theoretical possibility.
It wouldn't be "buggy" in the sense of "string indexing/slicing
unexpectedly does the wrong thing", but it would still be incomplete
Unicode support, and I don't think people would appreciate it. Much
better to have graceful degradation: if there are non-BMP characters
in the string, then instead of throwing an exception, it just makes
the string wider.

> I am hardly a unicode expert, my impression is this: While in today's
> internationalized world, going back to ASCII is not an option, most
> actual uses of unicode stay within the BMP

That's a valid line of argument for an optimization, but not for a
hard limitation. A general-purpose language, function, system,
whatever, will need to cope with astral characters at some point; it
just won't need them *often*.

> Further if the choice is not between two python executables but
> between string-engines chosen at startup by command-line switches or
> equivalent, the price may be quite small.

It's a complexity cost, though, and people would need to know when it
would be worth giving Python that switch to change its string format.
Plus, every C extension would need to cope with both formats. I
personally doubt it'd be worth it, but if you want to knock together a
patched CPython and get some timing stats, I'm sure this list or
python-dev will be happy to discuss the matter. :)

ChrisA

#41203 — Re: String performance regression from python 3.2 to 3.3

From	MRAB <python@mrabarnett.plus.com>
Date	2013-03-14 00:52 +0000
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3280.1363222327.2939.python-list@python.org>
In reply to	#41184

On 13/03/2013 23:43, Chris Angelico wrote:
> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:
>> On Mar 13, 3:59 pm, Chris Angelico <ros...@gmail.com> wrote:
>>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm...@gmail.com> wrote:
>>> > Uhhh..
>>> > Making the subject line useful for all readers
>>>
>>> I should have read this one before replying in the other thread.
>>>
>>> jmf, I'd like to see evidence that there has been a performance
>>> regression compared against a wide build of Python 3.2. You still have
>>> never answered this fundamental, that the narrow builds of Python are
>>> *BUGGY* in the same way that JavaScript/ECMAScript is. And believe you
>>> me, the utterly unnecessary hassles I have had to deal with when
>>> permitting user-provided .js code to script my engine have wasted
>>> rather more dev hours than you would believe - there are rather a lot
>>> of stupid edge cases to deal with.
>>
>> This assumes that there are only three choices:
>> - narrow build that is buggy (surrogate pairs for astral characters)
>> - wide build that is 4-fold space inefficient for wide variety of
>> common (ASCII) use-cases
>> - flexible string engine that chooses a small tradeoff of space
>> efficiency over time efficiency.
>>
>> There is a fourth choice: narrow build that chooses to be partial over
>> being buggy. ie when an astral character is encountered, an exception
>> is thrown rather than trying to fudge it into a 16-bit
>> representation.
>
> As a simple factual matter, narrow builds of Python 3.2 don't do that.
> So it doesn't factor into my original statement. But if you're talking
> about a proposal for 3.4, then sure, that's a theoretical possibility.
> It wouldn't be "buggy" in the sense of "string indexing/slicing
> unexpectedly does the wrong thing", but it would still be incomplete
> Unicode support, and I don't think people would appreciate it. Much
> better to have graceful degradation: if there are non-BMP characters
> in the string, then instead of throwing an exception, it just makes
> the string wider.
>
[snip]
Do you mean that instead of switching between 1/2/4 bytes per codepoint
it would switch between 2/4 bytes per codepoint?

#41204 — Re: String performance regression from python 3.2 to 3.3

From	Chris Angelico <rosuav@gmail.com>
Date	2013-03-14 11:55 +1100
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3281.1363222550.2939.python-list@python.org>
In reply to	#41184

On Thu, Mar 14, 2013 at 11:52 AM, MRAB <python@mrabarnett.plus.com> wrote:
> On 13/03/2013 23:43, Chris Angelico wrote:
>>
>> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:
>>>
>>> On Mar 13, 3:59 pm, Chris Angelico <ros...@gmail.com> wrote:
>>>>
>>>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm...@gmail.com> wrote:
>>>> > Uhhh..
>>>> > Making the subject line useful for all readers
>>>>
>>>> I should have read this one before replying in the other thread.
>>>>
>>>> jmf, I'd like to see evidence that there has been a performance
>>>> regression compared against a wide build of Python 3.2. You still have
>>>> never answered this fundamental, that the narrow builds of Python are
>>>> *BUGGY* in the same way that JavaScript/ECMAScript is. And believe you
>>>> me, the utterly unnecessary hassles I have had to deal with when
>>>> permitting user-provided .js code to script my engine have wasted
>>>> rather more dev hours than you would believe - there are rather a lot
>>>> of stupid edge cases to deal with.
>>>
>>>
>>> This assumes that there are only three choices:
>>> - narrow build that is buggy (surrogate pairs for astral characters)
>>> - wide build that is 4-fold space inefficient for wide variety of
>>> common (ASCII) use-cases
>>> - flexible string engine that chooses a small tradeoff of space
>>> efficiency over time efficiency.
>>>
>>> There is a fourth choice: narrow build that chooses to be partial over
>>> being buggy. ie when an astral character is encountered, an exception
>>> is thrown rather than trying to fudge it into a 16-bit
>>> representation.
>>
>>
>> As a simple factual matter, narrow builds of Python 3.2 don't do that.
>> So it doesn't factor into my original statement. But if you're talking
>> about a proposal for 3.4, then sure, that's a theoretical possibility.
>> It wouldn't be "buggy" in the sense of "string indexing/slicing
>> unexpectedly does the wrong thing", but it would still be incomplete
>> Unicode support, and I don't think people would appreciate it. Much
>> better to have graceful degradation: if there are non-BMP characters
>> in the string, then instead of throwing an exception, it just makes
>> the string wider.
>>
> [snip]
> Do you mean that instead of switching between 1/2/4 bytes per codepoint
> it would switch between 2/4 bytes per codepoint?

That's my point. We already have the better version. :)

ChrisA

#41206 — Re: String performance regression from python 3.2 to 3.3

From	MRAB <python@mrabarnett.plus.com>
Date	2013-03-14 02:01 +0000
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3282.1363226492.2939.python-list@python.org>
In reply to	#41184

On 14/03/2013 00:55, Chris Angelico wrote:
> On Thu, Mar 14, 2013 at 11:52 AM, MRAB <python@mrabarnett.plus.com> wrote:
>> On 13/03/2013 23:43, Chris Angelico wrote:
>>>
>>> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:
>>>>
>>>> On Mar 13, 3:59 pm, Chris Angelico <ros...@gmail.com> wrote:
>>>>>
>>>>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm...@gmail.com> wrote:
>>>>> > Uhhh..
>>>>> > Making the subject line useful for all readers
>>>>>
>>>>> I should have read this one before replying in the other thread.
>>>>>
>>>>> jmf, I'd like to see evidence that there has been a performance
>>>>> regression compared against a wide build of Python 3.2. You still have
>>>>> never answered this fundamental, that the narrow builds of Python are
>>>>> *BUGGY* in the same way that JavaScript/ECMAScript is. And believe you
>>>>> me, the utterly unnecessary hassles I have had to deal with when
>>>>> permitting user-provided .js code to script my engine have wasted
>>>>> rather more dev hours than you would believe - there are rather a lot
>>>>> of stupid edge cases to deal with.
>>>>
>>>>
>>>> This assumes that there are only three choices:
>>>> - narrow build that is buggy (surrogate pairs for astral characters)
>>>> - wide build that is 4-fold space inefficient for wide variety of
>>>> common (ASCII) use-cases
>>>> - flexible string engine that chooses a small tradeoff of space
>>>> efficiency over time efficiency.
>>>>
>>>> There is a fourth choice: narrow build that chooses to be partial over
>>>> being buggy. ie when an astral character is encountered, an exception
>>>> is thrown rather than trying to fudge it into a 16-bit
>>>> representation.
>>>
>>>
>>> As a simple factual matter, narrow builds of Python 3.2 don't do that.
>>> So it doesn't factor into my original statement. But if you're talking
>>> about a proposal for 3.4, then sure, that's a theoretical possibility.
>>> It wouldn't be "buggy" in the sense of "string indexing/slicing
>>> unexpectedly does the wrong thing", but it would still be incomplete
>>> Unicode support, and I don't think people would appreciate it. Much
>>> better to have graceful degradation: if there are non-BMP characters
>>> in the string, then instead of throwing an exception, it just makes
>>> the string wider.
>>>
>> [snip]
>> Do you mean that instead of switching between 1/2/4 bytes per codepoint
>> it would switch between 2/4 bytes per codepoint?
>
> That's my point. We already have the better version. :)
>
If a later version of Python switched between 2/4 bytes per codepoint,
how much difference would it make in terms of memory and speed compared
to Python 3.2 (fixed width) and Python 3.3 (3 widths)?

The vast majority of the time, 2 bytes per codepoint is sufficient, but
would that result in less switching between widths and therefore higher
performance, or would the use of more memory (2 bytes when 1 byte would
do) offset that?

(And I'm talking about significant differences here.)
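
To put rough numbers on the memory side of that question, here is a back-of-envelope sketch that estimates the character payload of a string under each scheme under discussion. The function name `payload_bytes` and the scheme labels are made up for illustration. It ignores object headers and models only per-character storage, and it assumes Python 3.4 or later, where `len()` counts code points and `max()` accepts `default`.

```python
def payload_bytes(text):
    """Estimate per-scheme character storage for text, ignoring object headers."""
    n = len(text)
    widest = max(map(ord, text), default=0)

    # 3.3-style flexible storage: 1, 2 or 4 bytes per code point,
    # chosen once from the widest code point in the string.
    flex_124 = n * (1 if widest < 0x100 else 2 if widest < 0x10000 else 4)

    # Hypothetical 2/4 scheme from the question above.
    flex_24 = n * (2 if widest < 0x10000 else 4)

    # Narrow build: UTF-16, astral code points take two 16-bit units.
    narrow = 2 * sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)

    # Wide build: always 4 bytes per code point.
    wide = 4 * n

    return {"1/2/4": flex_124, "2/4": flex_24, "narrow": narrow, "wide": wide}


for sample in ("asdf" * 10000, "\u1234sdf" * 10000, "\U00012345sdf" * 10000):
    print(ascii(sample[:4]), payload_bytes(sample))
```

This says nothing about speed. It only shows that a 2/4 scheme would cost twice the payload of 1/2/4 for pure ASCII or Latin-1 text and the same for everything else.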


#41209 — Re: String performance regression from python 3.2 to 3.3

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-03-14 04:05 +0000
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<51414c75$0$29965$c3e8da3$5496439d@news.astraweb.com>
In reply to	#41206

On Thu, 14 Mar 2013 02:01:35 +0000, MRAB wrote:

> On 14/03/2013 00:55, Chris Angelico wrote:
>> On Thu, Mar 14, 2013 at 11:52 AM, MRAB <python@mrabarnett.plus.com>
>> wrote:
>>> On 13/03/2013 23:43, Chris Angelico wrote:
>>>>
>>>> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:
>>>>>
>>>>> On Mar 13, 3:59 pm, Chris Angelico <ros...@gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm...@gmail.com>
>>>>>> wrote:
>>>>>> > Uhhh..
>>>>>> > Making the subject line useful for all readers
>>>>>>
>>>>>> I should have read this one before replying in the other thread.
>>>>>>
>>>>>> jmf, I'd like to see evidence that there has been a performance
>>>>>> regression compared against a wide build of Python 3.2. You still
>>>>>> have never answered this fundamental, that the narrow builds of
>>>>>> Python are *BUGGY* in the same way that JavaScript/ECMAScript is.
>>>>>> And believe you me, the utterly unnecessary hassles I have had to
>>>>>> deal with when permitting user-provided .js code to script my
>>>>>> engine have wasted rather more dev hours than you would believe -
>>>>>> there are rather a lot of stupid edge cases to deal with.
>>>>>
>>>>>
>>>>> This assumes that there are only three choices: - narrow build that
>>>>> is buggy (surrogate pairs for astral characters) - wide build that
>>>>> is 4-fold space inefficient for wide variety of common (ASCII)
>>>>> use-cases
>>>>> - flexible string engine that chooses a small tradeoff of space
>>>>> efficiency over time efficiency.
>>>>>
>>>>> There is a fourth choice: narrow build that chooses to be partial
>>>>> over being buggy. ie when an astral character is encountered, an
>>>>> exception is thrown rather than trying to fudge it into a 16-bit
>>>>> representation.
>>>>
>>>>
>>>> As a simple factual matter, narrow builds of Python 3.2 don't do
>>>> that. So it doesn't factor into my original statement. But if you're
>>>> talking about a proposal for 3.4, then sure, that's a theoretical
>>>> possibility. It wouldn't be "buggy" in the sense of "string
>>>> indexing/slicing unexpectedly does the wrong thing", but it would
>>>> still be incomplete Unicode support, and I don't think people would
>>>> appreciate it. Much better to have graceful degradation: if there are
>>>> non-BMP characters in the string, then instead of throwing an
>>>> exception, it just makes the string wider.
>>>>
>>> [snip]
>>> Do you mean that instead of switching between 1/2/4 bytes per
>>> codepoint it would switch between 2/4 bytes per codepoint?
>>
>> That's my point. We already have the better version. :)
>>
> If a later version of Python switched between 2/4 bytes per codepoint,
> how much difference would it make in terms of memory and speed compared
> to Python 3.2 (fixed width) and Python 3.3 (3 widths)?
> 
> The vast majority of the time, 2 bytes per codepoint is sufficient, but
> would that result in less switching between widths and therefore higher
> performance, or would the use of more memory (2 bytes when 1 byte would
> do) offset that?
> 
> (And I'm talking about significant differences here.)

That depends on how you use the strings. Because strings are immutable, 
there isn't really anything like "switching between widths" -- the width 
is set when the string is created, and then remains fixed.

It is true that when you create a string, Python sometimes has to do some 
work to determine what width it needs, but that's effectively a fixed-
cost per string. It's relatively trivial compared to the cost of other 
string operations, but it is a real cost. If all you do is create the 
strings then throw them away, as JMF tends to do in his benchmarks, you 
repeatedly pay the cost without seeing the benefit.

On the other hand, Python is *full* of large numbers of ASCII strings, 
and many users use lots of Latin1 strings. Both of these save significant 
amounts of memory: almost 50% of what they would otherwise use in a 
narrow build, and almost 75% in a wide build.
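
The per-character saving can be checked directly. The sketch below is an illustration, not part of the original measurements: `bytes_per_char` is a made-up helper, and it assumes CPython 3.3 or later, where `sys.getsizeof` reflects the flexible representation. Subtracting the sizes of two strings of different lengths cancels the fixed object header and leaves the marginal cost of each character.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Measure the marginal storage cost of one more copy of ch."""
    # Subtracting two sizes cancels the fixed per-object header.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n


for label, ch in [("ASCII", "a"), ("Latin-1", "\xe9"),
                  ("BMP", "\u1234"), ("astral", "\U00012345")]:
    print(label, bytes_per_char(ch))
```

On a CPython with the flexible representation this should report 1, 1, 2 and 4 bytes respectively. A narrow 3.2 build would store the first three at 2 bytes each, and a wide build would store all of them at 4.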

This memory saving has real consequences, performance-wise. Python's 
memory management can be more efficient, since objects in the heap are 
smaller. I'm not sure if objects ever move in the heap (I think Java's 
memory manager does move objects around, hence Jython will do so, but I'm 
not sure about CPython), but even if they don't, it's obviously faster to 
allocate a certain sized block of memory the more free memory you have, 
and you'll have more free memory if any pre-existing objects in the heap 
are smaller.

I expect that traversing a block of memory byte-by-byte may be faster 
than traversing it 2x or 4x bytes at a time. My testing suggests that 
iterating over a 1-byte width string is about three times faster than 
iterating over a 2-byte or 4-byte wide string. But that may depend on 
your OS and hardware.
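
Anyone wanting to repeat that kind of measurement could start from something like the following. This is only a sketch: `time_iteration` is a hypothetical helper, the string length and repeat counts are arbitrary, and the ratio you see may differ a lot from the "three times" figure above depending on Python version, OS and hardware.

```python
import timeit


def time_iteration(ch, length=100000, number=20):
    """Best-of-three time to iterate over a string made of repeated ch."""
    setup = "s = {!r} * {}".format(ch, length)
    return min(timeit.repeat("for c in s: pass", setup, repeat=3, number=number))


for label, ch in [("1-byte", "a"), ("2-byte", "\u1234"), ("4-byte", "\U00012345")]:
    print(label, time_iteration(ch))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.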

Finally, there may be CPU effects, to do with how quickly strings can be 
passed through the CPU pipelines, whether data is found in the CPU cache 
or not, etc. Obviously this too will depend on the size of the strings. 
You can squeeze 1K of data through the CPU faster than 4K of data.

In practice, how much of an effect will this have? It's hard to say 
without testing, but reports from real-world applications suggest 
that Python 3.3 not only saves significant memory over 3.2 narrow builds, 
but for real-world code, it can often be a little faster as well.

-- 
Steven


#41212 — Re: String performance regression from python 3.2 to 3.3

From	Chris Angelico <rosuav@gmail.com>
Date	2013-03-14 17:47 +1100
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3286.1363243635.2939.python-list@python.org>
In reply to	#41209

On Thu, Mar 14, 2013 at 3:05 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> That depends on how you use the strings. Because strings are immutable,
> there isn't really anything like "switching between widths" -- the width
> is set when the string is created, and then remains fixed.

The nearest thing to "switching" is where you repeatedly replace() or
append/slice to add/remove the one non-ASCII character that your
contrived test is using. Let's see...

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600
32 bit (Intel)] on win32

ASCII -> ASCII:
>>> timeit.timeit("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
0.14999895238081962

ASCII -> BMP:
>>> timeit.timeit("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
1.7513426985832012

BMP -> BMP:
>>> timeit.timeit("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
0.22562895563542895

ASCII -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
1.9037101084076369

BMP -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
1.9659967956821163

SMP -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
0.7214749360603037

So there *is* cost to "changing size". Trying them again in Python 2.6 Narrow:

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit
(Intel)] on win32

ASCII -> ASCII:
>>> timeit.timeit("s=s[:-1]+u'\u0034'","s=u'asdf'*10000",number=10000)
0.53506213778566547

ASCII -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'asdf'*10000",number=10000)
0.57752172412974268

BMP -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'\u1234sdf'*10000",number=10000)
0.53309121690045913

ASCII -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'asdf'*10000",number=10000)
0.55128347317885584

BMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\u1234sdf'*10000",number=10000)
0.55610140394938412

SMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\U00012345sdf'*10000",number=10000)
0.6599570615818493

Much more consistent. (Note that the SMP timings are quite probably a
bit off as the string will continue to grow - I'm taking off one
16-bit character and putting on two.)

I don't have a 2.6 wide build on the same hardware, so these times
don't truly compare to the above ones. This is slower hardware than
the above tests.

Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2

>>> timeit.timeit("s=s[:-1]+u'\u0034'","s=u'asdf'*10000",number=10000)
1.5774970054626465
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'asdf'*10000",number=10000)
1.5743560791015625
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'\u1234sdf'*10000",number=10000)
1.6072981357574463
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'asdf'*10000",number=10000)
1.6745591163635254
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\u1234sdf'*10000",number=10000)
1.6705770492553711
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\U00012345sdf'*10000",number=10000)
1.7078530788421631

Here's my reading of all these stats. Python 3.3's str is faster than
2.6's unicode when the copy can be done directly (ie when the size
isn't changing), but converting sizes costs a lot (suggestion: memcpy
is blazingly fast, no surprise there). Since MOST string operations
won't change the size, this is a benefit to most programs.
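
For anyone who wants to rerun the six cases on their own interpreter, they can be wrapped in a small harness. This is a sketch for Python 3 only (the 2.x runs above need the `u` prefix on every literal). `CASES` and `run_cases` are names invented here. It takes the best of several repeats, which the one-shot `timeit.timeit` calls above do not.

```python
import timeit

# (label, repeated unit of the starting string, character appended each pass)
CASES = [
    ("ASCII -> ASCII", "asdf", "\u0034"),
    ("ASCII -> BMP", "asdf", "\u1234"),
    ("BMP -> BMP", "\u1234sdf", "\u1234"),
    ("ASCII -> SMP", "asdf", "\U00012345"),
    ("BMP -> SMP", "\u1234sdf", "\U00012345"),
    ("SMP -> SMP", "\U00012345sdf", "\U00012345"),
]


def run_cases(number=10000, repeat=3):
    """Return {label: best time in seconds} for each replace-last-char case."""
    results = {}
    for label, unit, tail in CASES:
        # ascii() turns each character back into an escaped source literal.
        stmt = "s = s[:-1] + {}".format(ascii(tail))
        setup = "s = {} * 10000".format(ascii(unit))
        results[label] = min(timeit.repeat(stmt, setup, repeat=repeat, number=number))
    return results
```

Calling `run_cases()` returns a dict keyed by the same labels used above, so results from different builds can be compared case by case.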

I expect that Python 3.2 will behave comparably to the 2.6 stats, but
I don't have 3.2s handy - can someone confirm please?

ChrisA


#41225 — Re: String performance regression from python 3.2 to 3.3

From	rusi <rustompmody@gmail.com>
Date	2013-03-14 03:48 -0700
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<7cd57e96-663b-4a3b-a2c8-2fdbf730fa9b@hd10g2000pbc.googlegroups.com>
In reply to	#41212

On Mar 14, 11:47 am, Chris Angelico <ros...@gmail.com> wrote:
<snipped>
> I expect that Python 3.2 will behave comparably to the 2.6 stats, but
> I don't have 3.2s handy - can someone confirm please?

I have 3.2 but not 3.3. Can run it later today if no one does.
But better if someone with both on the same machine do the comparison.

jmf will you please run Chris' examples on all your pythons?


#41250 — Re: String performance regression from python 3.2 to 3.3

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-03-14 19:14 -0400
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3325.1363302886.2939.python-list@python.org>
In reply to	#41225

On 3/14/2013 6:48 AM, rusi wrote:
> On Mar 14, 11:47 am, Chris Angelico <ros...@gmail.com> wrote:
> <snipped>
>> I expect that Python 3.2 will behave comparably to the 2.6 stats, but
>> I don't have 3.2s handy - can someone confirm please?
>
> I have 3.2 but not 3.3. Can run it later today if no one does.
> But better if someone with both on the same machine do the comparison.

The python devs use the microbenchmarks in 
Tools/stringbench/stringbench.py, which covers all string operations, as 
the basis for improving particular string functions. Overall, Unicode is 
nearly as fast as bytes and 3.3 as fast as 3.2. Find/replace is the 
notable exception in stringbench, so it is an anomaly. Other things are 
faster in 3.3.  In selecting the new implementation, the devs also 
considered space and speed gains that do not show up in microbenchmarks.

-- 
Terry Jan Reedy


#41251 — Re: String performance regression from python 3.2 to 3.3

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-03-14 20:48 -0400
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3326.1363308513.2939.python-list@python.org>
In reply to	#41225

On 3/14/2013 7:14 PM, Terry Reedy wrote:
> On 3/14/2013 6:48 AM, rusi wrote:
>> On Mar 14, 11:47 am, Chris Angelico <ros...@gmail.com> wrote:
>> <snipped>
>>> I expect that Python 3.2 will behave comparably to the 2.6 stats, but
>>> I don't have 3.2s handy - can someone confirm please?
>>
>> I have 3.2 but not 3.3. Can run it later today if no one does.
>> But better if someone with both on the same machine do the comparison.
>
> The python devs use the microbenchmarks in
> Tools/stringbench/stringbench.py, which covers all string operations, as
> the basis for improving particular string functions. Overall, Unicode is
> nearly as fast as bytes and 3.3 as fast as 3.2. Find/replace is the
> notable exception in stringbench, so it is an anomaly. Other things are
> faster in 3.3.  In selecting the new implementation, the devs also
> considered space and speed gains that do not show up in microbenchmarks.

Links to the readme and code for stringbench can be found here:
http://hg.python.org/cpython/file/c25bc2587c48/Tools/stringbench


-- 
Terry Jan Reedy


#41282 — Re: String performance regression from python 3.2 to 3.3

From	rusi <rustompmody@gmail.com>
Date	2013-03-15 10:07 -0700
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<4eb54568-3135-4e81-9784-ff3ed989916b@mz7g2000pbb.googlegroups.com>
In reply to	#41225

3.2 and 2.7 results on my desktop using Chris's examples
(Hope I cut-pasted them correctly)
-----------------------------
Welcome to the Emacs shell

~ $ python3
Python 3.2.3 (default, Feb 20 2013, 17:02:41)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> timeit("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
0.2893378734588623
>>> timeit("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
0.2842249870300293

>>> timeit("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
0.28406381607055664
>>> timeit("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
0.28420209884643555
>>> timeit("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
0.2853250503540039
>>> timeit("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
0.283905029296875
>>>

~ $ python
Python 2.7.3 (default, Jan  2 2013, 16:53:07)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> timeit("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
0.20418286323547363
>>> timeit("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
0.20579099655151367
>>> timeit("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
0.5055279731750488
>>> timeit("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
0.28449511528015137
>>> timeit("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
0.6001529693603516
>>> timeit("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
0.8430721759796143


#41285 — RE: String performance regression from python 3.2 to 3.3

From	Andriy Kornatskyy <andriy.kornatskyy@live.com>
Date	2013-03-15 21:04 +0300
Subject	RE: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3349.1363370733.2939.python-list@python.org>
In reply to	#41282

$ python3.2
Python 3.2.3 (default, Jun 25 2012, 22:55:05) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import repeat
>>> repeat("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
[0.2566258907318115, 0.14485502243041992, 0.14464998245239258]
>>> repeat("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
[0.25584888458251953, 0.1340939998626709, 0.1338820457458496]
>>> repeat("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
[0.2571289539337158, 0.13403892517089844, 0.13388800621032715]
>>> repeat("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
[0.5022759437561035, 0.3970041275024414, 0.3764481544494629]
>>> repeat("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
[0.5213770866394043, 0.38585615158081055, 0.40251588821411133]
>>> repeat("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
[0.768744945526123, 0.5852570533752441, 0.6029140949249268]

$ python3.3
Python 3.3.0 (default, Sep 29 2012, 15:35:49) 
[GCC 4.7.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import repeat
>>> repeat("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
[0.0985728640225716, 0.0984080360212829, 0.07457763599813916]
>>> repeat("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
[0.901988381985575, 0.7517840950167738, 0.7540924890199676]
>>> repeat("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
[0.3069786810083315, 0.17701858800137416, 0.1769046070112381]
>>> repeat("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
[1.081760977016529, 0.9099628589756321, 0.9926943230093457]
>>> repeat("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
[1.2101859120011795, 1.1039280130062252, 0.9306247030035593]
>>> repeat("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
[0.4759294819959905, 0.35435649199644104, 0.3540659479913302]




#41207 — Re: String performance regression from python 3.2 to 3.3

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-03-13 22:35 -0400
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3283.1363228557.2939.python-list@python.org>
In reply to	#41184

On 3/13/2013 7:43 PM, Chris Angelico wrote:
> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:
>
>> This assumes that there are only three choices:
>> - narrow build that is buggy (surrogate pairs for astral characters)
>> - wide build that is 4-fold space inefficient for wide variety of
>> common (ASCII) use-cases
>> - flexible string engine that chooses a small tradeoff of space
>> efficiency over time efficiency.

Wrong. Python almost certainly runs faster with the new string 
representation. This has been explained previously more than once.

>> There is a fourth choice: narrow build that chooses to be partial over
>> being buggy. ie when an astral character is encountered, an exception
>> is thrown rather than trying to fudge it into a 16-bit
>> representation.

This is what tcl/tk does, and it is a damned nuisance. Completely 
unacceptable for Python's string type.
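
For readers unfamiliar with the proposal, the "partial" behaviour being rejected here would look roughly like the check below, applied wherever a string is built. This is purely illustrative: `BMPOnlyError` and `require_bmp` are hypothetical names and nothing like them exists in CPython.

```python
class BMPOnlyError(ValueError):
    """Hypothetical error for a string type limited to the BMP."""


def require_bmp(text):
    """Return text unchanged, or raise if it holds any astral character."""
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise BMPOnlyError(
                "astral character U+{:04X} at index {}".format(ord(ch), index))
    return text


print(require_bmp("asdf\u1234"))       # BMP-only text passes through
try:
    require_bmp("asdf\U00012345")      # astral text is refused outright
except BMPOnlyError as exc:
    print("rejected:", exc)
```

Every program that might ever receive a non-BMP character would need to handle that exception, which is the nuisance being described.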
...
> It's complexity cost, though, and people would need to know when it
> would be worth giving Python that switch to change its string format.
> Plus, every C extension would need to cope with both formats. I
> personally doubt it'd be worth it, but if you want to knock together a
> patched CPython and get some timing stats, I'm sure this list or
> python-dev will be happy to discuss the matter. :)

I presume the smiley indicates that you know that python developers are 
too busy with real problems to have any interest in bogus solutions to 
bogus problems.

-- 
Terry Jan Reedy


#41211 — Re: String performance regression from python 3.2 to 3.3

From	Chris Angelico <rosuav@gmail.com>
Date	2013-03-14 17:21 +1100
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3285.1363242063.2939.python-list@python.org>
In reply to	#41184

On Thu, Mar 14, 2013 at 1:35 PM, Terry Reedy <tjreedy@udel.edu> wrote:
>On 3/13/2013 7:43 PM, Chris Angelico wrote:
>> It's complexity cost, though, and people would need to know when it
>> would be worth giving Python that switch to change its string format.
>> Plus, every C extension would need to cope with both formats. I
>> personally doubt it'd be worth it, but if you want to knock together a
>> patched CPython and get some timing stats, I'm sure this list or
>> python-dev will be happy to discuss the matter. :)
>
>
> I presume the smiley indicates that you know that python developers are too
> busy with real problems to have any interest in bogus solutions to bogus
> problems.

It indicates more that the list(s) would almost certainly open up with
quite a bit of discussion - especially this one. It's not hard to get
talk happening, as evidenced by the number of times we've already
discussed this very topic. Frankly, I doubt there'll be anything to
discuss - that the patched version will be consistently worse; but if
I've learned one thing about timings, it's that there are surprises
*everywhere*, so I'm not prepared to state categorically that it
*cannot* be better. (I will, however, state that I do not expect any
such improvement to be worth the trouble of writing it.)

ChrisA


#41188 — Re: String performance regression from python 3.2 to 3.3

From	Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date	2013-03-13 18:42 +0100
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<2992273.neLn1eVAPo@PointedEars.de>
In reply to	#41170

Chris Angelico wrote:

> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompmody@gmail.com> wrote:
>> Uhhh..
>> Making the subject line useful for all readers
> 
> I should have read this one before replying in the other thread.
> 
> jmf, I'd like to see evidence that there has been a performance
> regression compared against a wide build of Python 3.2. You still have
> never answered this fundamental, that the narrow builds of Python are
> *BUGGY* in the same way that JavaScript/ECMAScript is.

Interesting.  From my work I was under the impression that I knew ECMAScript 
and its implementations fairly well, yet I have never heard of this before.

What do you mean by “narrow build” and “wide build” and what exactly is the 
bug “narrow builds” of Python 3.2 have in common with JavaScript/ECMAScript?  
To which implementation of ECMAScript are you referring – or are you 
referring to the Specification as such?

-- 
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.


#41201 — Re: String performance regression from python 3.2 to 3.3

From	Chris Angelico <rosuav@gmail.com>
Date	2013-03-14 11:19 +1100
Subject	Re: String performance regression from python 3.2 to 3.3
Message-ID	<mailman.3278.1363220353.2939.python-list@python.org>
In reply to	#41188

On Thu, Mar 14, 2013 at 4:42 AM, Thomas 'PointedEars' Lahn
<PointedEars@web.de> wrote:
> Chris Angelico wrote:
>
>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompmody@gmail.com> wrote:
>>> Uhhh..
>>> Making the subject line useful for all readers
>>
>> I should have read this one before replying in the other thread.
>>
>> jmf, I'd like to see evidence that there has been a performance
>> regression compared against a wide build of Python 3.2. You still have
>> never answered this fundamental, that the narrow builds of Python are
>> *BUGGY* in the same way that JavaScript/ECMAScript is.
>
> Interesting.  From my work I was under the impression that I knew ECMAScript
> and its implementations fairly well, yet I have never heard of this before.
>
> What do you mean by “narrow build” and “wide build” and what exactly is the
> bug “narrow builds” of Python 3.2 have in common with JavaScript/ECMAScript?
> To which implementation of ECMAScript are you referring – or are you
> referring to the Specification as such?

The ECMAScript spec says that strings are stored and represented in
UTF-16. Python versions up to 3.2 came in two varieties: narrow, which
included (I believe) the Windows builds available on python.org, and
wide, which was (again, I think) the default Linux config. The problem
predates Python 3 and its default string being Unicode - the Py2
unicode type has the same issue:

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit
(Intel)] on win32
>>> u"\U00012345"
u'\U00012345'
>>> len(_)
2

Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u"\U00012345"
u'\U00012345'
>>> len(_)
1

That's the Python msi installer, and the default system Python from an
Ubuntu 10.10. The exact same code does different things on different
platforms, and on the Windows (narrow-build), it's possible to split
surrogates:

>>> u"\U00012345"[0]
u'\ud808'
>>> u"\U00012345"[1]
u'\udf45'
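
Those two values are the UTF-16 surrogate pair for U+12345. The arithmetic comes from the UTF-16 encoding rules and can be reproduced in a few lines. The helper name `utf16_surrogates` is made up for this sketch, which is written for Python 3.

```python
def utf16_surrogates(code_point):
    """Return the (high, low) UTF-16 surrogate pair for an astral code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = utf16_surrogates(0x12345)
print(hex(high), hex(low))

# Cross-check against the standard library's UTF-16 codec.
expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
assert "\U00012345".encode("utf-16-be") == expected
```

A narrow build exposes these two 16-bit units as separate string elements, which is why `len()` returns 2 and indexing can split the pair.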

You can see the same thing in Javascript too. Here's a little demo I
just knocked together:

<script>
function foo()
{
	var txt=document.getElementById("in").value;
	var msg="";
	for (var i=0;i<txt.length;++i)
		msg+="["+i+"]: "+txt.charCodeAt(i)+" "+txt.charCodeAt(i).toString(16)+"\n";
	document.getElementById("out").value=msg;
}
</script>
<input id=in><input type=button onclick="foo()"
value="Show"><br><textarea id=out rows=25 cols=80></textarea>

Give it an ASCII string and you'll see, as expected, one index (based
on string indexing or charCodeAt, same thing) for each character. Same
if it's all BMP. But put an astral character in and you'll see
00.00.d8.00/24 (oh wait, CIDR notation doesn't work in Unicode) come
up. I raised this issue on the Google V8 list and on the ECMAScript
list es-discuss@mozilla.org, and was basically told that since
JavaScript has been buggy for so long, there's no chance of ever
making it bug-free:

https://mail.mozilla.org/pipermail/es-discuss/2012-December/027384.html

Fortunately for Python, there are version numbers, and policies that
permit bugs to actually get fixed. (Which is why, for instance, Debian
Squeeze still ships Python 2.6 rather than upgrading to 2.7 - in case
some script is broken by that change. Can't do that with web
browsers.) As of Python 3.3, all Pythons function the same way: it's
semantically a "wide build" (UTF-32), but with a memory usage
optimization. That's how it needs to be.
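
A quick way to confirm which behaviour an interpreter has is to repeat the earlier test in Python 3 syntax. This is a small sketch. On 3.3 or later it should behave like the wide build on every platform, while a narrow 3.2 would report a length of 2 and a `sys.maxunicode` of 0xffff.

```python
import sys

astral = "\U00012345"

print(len(astral))            # number of code points, not UTF-16 units
print(ascii(astral[0]))       # indexing yields the whole character
print(hex(sys.maxunicode))    # largest code point a str can hold
```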

ChrisA

