Groups > comp.lang.python > #41164 > unrolled thread
| Started by | jmfauth <wxjmfauth@gmail.com> |
|---|---|
| First post | 2013-03-13 02:36 -0700 |
| Last post | 2013-03-13 21:32 +1100 |
| Articles | 20 on this page of 41 — 11 participants |
A reply for rusi (FSR) jmfauth <wxjmfauth@gmail.com> - 2013-03-13 02:36 -0700
Re: A reply for rusi (FSR) rusi <rustompmody@gmail.com> - 2013-03-13 03:07 -0700
String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-13 03:11 -0700
Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-13 21:59 +1100
Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-13 09:49 -0700
Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 10:43 +1100
Re: String performance regression from python 3.2 to 3.3 MRAB <python@mrabarnett.plus.com> - 2013-03-14 00:52 +0000
Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 11:55 +1100
Re: String performance regression from python 3.2 to 3.3 MRAB <python@mrabarnett.plus.com> - 2013-03-14 02:01 +0000
Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-14 04:05 +0000
Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 17:47 +1100
Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-14 03:48 -0700
Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-14 19:14 -0400
Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-14 20:48 -0400
Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 10:07 -0700
RE: String performance regression from python 3.2 to 3.3 Andriy Kornatskyy <andriy.kornatskyy@live.com> - 2013-03-15 21:04 +0300
Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-13 22:35 -0400
Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 17:21 +1100
Re: String performance regression from python 3.2 to 3.3 Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2013-03-13 18:42 +0100
Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-14 11:19 +1100
Re: String performance regression from python 3.2 to 3.3 Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2013-03-16 03:44 +0100
Re: String performance regression from python 3.2 to 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-16 03:56 +0000
Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 21:26 -0700
Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-16 08:47 +0000
Re: String performance regression from python 3.2 to 3.3 Neil Hodgson <nhodgson@iinet.net.au> - 2013-03-17 09:00 +1100
Re: String performance regression from python 3.2 to 3.3 Roy Smith <roy@panix.com> - 2013-03-16 18:10 -0400
Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-16 14:59 +1100
Re: String performance regression from python 3.2 to 3.3 Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2013-03-16 05:12 +0100
Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-16 15:20 +1100
Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 22:21 -0700
Re: String performance regression from python 3.2 to 3.3 Chris Angelico <rosuav@gmail.com> - 2013-03-16 15:09 +1100
Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-15 21:35 -0700
Re: String performance regression from python 3.2 to 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-16 04:56 +0000
Re: String performance regression from python 3.2 to 3.3 Terry Reedy <tjreedy@udel.edu> - 2013-03-16 01:05 -0400
Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-16 05:38 +0000
Re: String performance regression from python 3.2 to 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-16 05:25 +0000
Re: String performance regression from python 3.2 to 3.3 Roy Smith <roy@panix.com> - 2013-03-16 09:29 -0400
Re: String performance regression from python 3.2 to 3.3 rusi <rustompmody@gmail.com> - 2013-03-16 09:39 -0700
Re: String performance regression from python 3.2 to 3.3 Roy Smith <roy@panix.com> - 2013-03-16 14:00 -0400
Re: String performance regression from python 3.2 to 3.3 jmfauth <wxjmfauth@gmail.com> - 2013-03-16 13:42 -0700
Re: A reply for rusi (FSR) Chris Angelico <rosuav@gmail.com> - 2013-03-13 21:32 +1100
| From | jmfauth <wxjmfauth@gmail.com> |
|---|---|
| Date | 2013-03-13 02:36 -0700 |
| Subject | A reply for rusi (FSR) |
| Message-ID | <23a42297-9262-4ace-87ad-138999b1ddd6@z3g2000vbg.googlegroups.com> |
As a reply to rusi's comment: http://groups.google.com/group/comp.lang.python/browse_thread/thread/a7689b158fdca29e#

From string creation to the itertools usage. A medley. Some timings.

Important: The real/absolute values of these experiments are not important. I do not care and I'm not complaining at all.

These values are expected, I expected such values and they are only confirming (*FOR ME*) my understanding of the coding of the characters (and Unicode).

| test | py323 | py330 |
|---|---|---|
| 1 | 0.015357737412819 | 0.019290216142579 |
| 2 | 0.015698801667198 | 0.020386269052436 |
| 3 | 0.015613338684288 | 0.018769561472500 |
| 4 | 0.023235297708529 | 0.032253414679390 |
| 5 | 0.023327062109534 | 0.029621391108935 |
| 6 | 1.119958127076760 | 1.095467665651482 |
| 7 | 0.420158472788311 | 0.565518010043673 |
| 8 | 0.649444234615974 | 1.061556978013171 |
| 9 | 0.712335144072079 | 1.211614222458175 |
| 10 | 0.704622996001357 | 1.160909074081441 |
| 11 | 0.614674584923621 | 1.053985430333688 |
| 12 | 0.660336235792764 | 1.059443246081010 |
| 13 | 4.821435927771570 | 5.795325214218677 |
| 14 | 0.494012668213403 | 0.729330462512273 |
| 15 | 0.504894429585788 | 0.879966255906103 |
| 16 | 0.693093370081103 | 1.132884304782264 |
| 17 | 0.749076743789461 | 3.013804437852462 |
| 18 | 7.467055989281286 | 13.387841650089342 |
| 19 | 7.581776062566778 | 13.593412812594643 |
| 20 | 9.477877493343140 | 15.235388291413805 |
| 21 | 0.022614608026196 | 0.020984116094176 |
| 22 | 6.685022041178975 | 12.687538276191944 |
| 23 | 6.946794763994170 | 12.986701250949636 |
| 24 | 0.097796827314760 | 0.156285014715777 |
| 25 | 0.024915807146677 | 0.034190706904894 |
| 26 | 0.024996544066013 | 0.032191582014335 |
| 27 | 0.000693943667684 | 0.001315421027272 |
| 28 | 0.000679765476967 | 0.001305968900141 |
| 29 | 0.001614344548152 | 0.025543979763000 |
| 30 | 0.000204008410812 | 0.000286714523313 |
| 31 | 0.000213460537964 | 0.000301286552656 |
| 32 | 0.000204008410819 | 0.000291440586878 |
| 33 | 0.249692904327539 | 0.497374474766957 |
| 34 | 0.248750448483740 | 0.513947598194790 |
| 35 | 0.099810130396032 | 0.249129715085319 |

jmf
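For readers who want to compare the two columns, the relative slowdown is simply the py330 time divided by the py323 time. The sketch below does that arithmetic for three rows copied from the post. The names `timings` and `slowdown` are illustrative only, and the post does not say what each numbered test measures, so the ratios say nothing about the cause.

```python
# Ratio of the Python 3.3 time to the Python 3.2 time for a few rows of
# the table above. Values are copied from the post; what each test
# exercises is not stated there.
timings = {
    6: (1.119958127076760, 1.095467665651482),
    17: (0.749076743789461, 3.013804437852462),
    29: (0.001614344548152, 0.025543979763000),
}


def slowdown(py32_seconds, py33_seconds):
    """Return how many times slower (>1) or faster (<1) 3.3 was than 3.2."""
    return py33_seconds / py32_seconds


ratios = {test: slowdown(*pair) for test, pair in timings.items()}

for test, ratio in sorted(ratios.items()):
    print("test %2d: %.2fx" % (test, ratio))
```

A ratio below 1 means the 3.3 run was faster for that row; a ratio above 1 means it was slower.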
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-03-13 03:07 -0700 |
| Message-ID | <a1a6394a-e9c7-407b-9f6d-ff44de1b65de@y2g2000pbg.googlegroups.com> |
| In reply to | #41164 |
On Mar 13, 2:36 pm, jmfauth <wxjmfa...@gmail.com> wrote:

> As a reply to rusi's comment: http://groups.google.com/group/comp.lang.python/browse_thread/thread/...
>
> From string creation to the itertools usage. A medley. Some timings.
>
> Important:
> The real/absolute values of these experiments are not important. I do not care and I'm not complaining at all.
>
> These values are expected, I expected such values and they are only confirming (*FOR ME*) my understanding of the coding of the characters (and Unicode).
>
> [snip: timing table for tests 1-35, py323 vs py330]
>
> jmf

Thank you jmf. I believe that for the first time you have moved beyond a single point of complaint to a swathe of data points which evidently show performance regression. You would need to provide data of what these tests 1-35 are.
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-03-13 03:11 -0700 |
| Subject | String performance regression from python 3.2 to 3.3 |
| Message-ID | <eabe27a9-099a-4e2c-92fb-bdf3819c2561@kw7g2000pbb.googlegroups.com> |
| In reply to | #41165 |
On Mar 13, 3:07 pm, rusi <rustompm...@gmail.com> wrote:

> On Mar 13, 2:36 pm, jmfauth <wxjmfa...@gmail.com> wrote:
>
> > [snip: jmf's post and timing table for tests 1-35]
>
> Thank you jmf. I believe that for the first time you have moved beyond a single point of complaint to a swathe of data points which evidently show performance regression. You would need to provide data of what these tests 1-35 are.

Uhhh.. Making the subject line useful for all readers
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-13 21:59 +1100 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3259.1363172350.2939.python-list@python.org> |
| In reply to | #41166 |
On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompmody@gmail.com> wrote:

> Uhhh..
> Making the subject line useful for all readers

I should have read this one before replying in the other thread.

jmf, I'd like to see evidence that there has been a performance regression compared against a wide build of Python 3.2. You still have never answered this fundamental point, that the narrow builds of Python are *BUGGY* in the same way that JavaScript/ECMAScript is. And believe you me, the utterly unnecessary hassles I have had to deal with when permitting user-provided .js code to script my engine have wasted rather more dev hours than you would believe - there are rather a lot of stupid edge cases to deal with.

The PEP 393 string is simply a memory-optimized version of UTF-32. It guarantees O(1) indexing and slicing, while still remaining tight in many cases. Its worst case is a constant amount larger than pure UTF-32 (the overhead of recording the string width); its best case is equivalent to ASCII (if all strings are seven-bit).

The flexible string representation is not brand new. It has been tested and proven in another language, one very similar to Python; and its performance has been provably sufficient for everyday operations. Pike's string type behaves just as Python 3.3's does, and has done so for longer than I can trace back. In terms of Unicode compliance, it is perfect; in terms of performance, quite acceptable; the worst-case operation is taking an ASCII string and overwriting one character in it with an astral character - which Python flat-out doesn't permit, but Pike does, as a known-slow operation. (It triggers a copy of the string, so it's always going to be slow.)

There are two broad areas of complaint that you have raised. One is of Unicode compliance and correctness. I believe those complaints are utterly unfounded, and you have yet to show any serious evidence to support them. Py 3.3 is perfectly compliant with everything I have yet checked.

The other complaint is of performance, and the issue of being US-centric. While it's true that ASCII and Latin-1 strings will be smaller/faster under Py 3.3 than 3.2, this is not purely to the benefit of the US at the cost of everyone else; it's also a benefit to the myriad non-US programs that use a lot of ASCII strings - for instance, delimiters, HTML tags, builtin function names... all of these are ASCII, even if the rest of the code isn't. And there's no penalty for non-English speakers, when compared against a non-buggy wide build. The very worst case is only a constant factor worse, and that assumes astral characters in every single string... which does not happen, trust me on that.

ChrisA
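The storage widths described above can be observed from Python itself. The following is a rough sketch, assuming CPython 3.3 or later (other implementations may store strings differently): it estimates the per-code-point cost by subtracting the sizes of two strings of different lengths, which cancels the fixed object header. The helper name `bytes_per_char` is made up for this illustration. On CPython one would expect the four samples to come out at 1, 1, 2 and 4 bytes respectively.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per code point by differencing two string sizes.

    Subtracting the size of an n-character string from that of a
    2n-character string cancels the fixed object header, leaving only
    the per-character cost. CPython-specific.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\u00e9",
    "BMP U+20AC": "\u20ac",
    "astral U+1F600": "\U0001F600",
}

for label, ch in samples.items():
    print(label, bytes_per_char(ch), "byte(s) per code point")

# Indexing is by code point whatever the internal width.
s = "a\u20ac\U0001F600"
print(len(s), hex(ord(s[2])))
```

The last two lines illustrate the O(1) indexing point: the astral character counts as one element and is retrieved whole.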
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-03-13 09:49 -0700 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <ee2062d1-658a-4bf5-8a56-5fe9c0991bef@o9g2000pbt.googlegroups.com> |
| In reply to | #41170 |
On Mar 13, 3:59 pm, Chris Angelico <ros...@gmail.com> wrote:

> [snip]
>
> jmf, I'd like to see evidence that there has been a performance regression compared against a wide build of Python 3.2. You still have never answered this fundamental, that the narrow builds of Python are *BUGGY* in the same way that JavaScript/ECMAScript is. And believe you me, the utterly unnecessary hassles I have had to deal with when permitting user-provided .js code to script my engine have wasted rather more dev hours than you would believe - there are rather a lot of stupid edge cases to deal with.

This assumes that there are only three choices:

- narrow build that is buggy (surrogate pairs for astral characters)
- wide build that is 4-fold space inefficient for a wide variety of common (ASCII) use-cases
- flexible string engine that chooses a small tradeoff of space efficiency over time efficiency.

There is a fourth choice: a narrow build that chooses to be partial over being buggy, i.e. when an astral character is encountered, an exception is thrown rather than trying to fudge it into a 16-bit representation.

I am hardly a Unicode expert, but my impression is this: while in today's internationalized world going back to ASCII is not an option, most actual uses of Unicode stay within the BMP.

Further, if the choice is not between two Python executables but between string engines chosen at startup by command-line switches or equivalent, the price may be quite small.
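The "partial rather than buggy" behaviour proposed here can be sketched at the Python level. This is only an illustration of the idea, not how any CPython build works: `require_bmp` and `AstralCharacterError` are hypothetical names, and a real implementation would live in the string constructor rather than in a helper function.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


class AstralCharacterError(ValueError):
    """Hypothetical error that a 'partial' 16-bit build might raise."""


def require_bmp(text):
    """Return text unchanged, or raise if any code point is outside the BMP."""
    for index, ch in enumerate(text):
        if ord(ch) > BMP_MAX:
            raise AstralCharacterError(
                "code point U+%04X at index %d does not fit in 16 bits"
                % (ord(ch), index)
            )
    return text


print(require_bmp("price: \u20ac100"))   # BMP-only text passes through

try:
    require_bmp("smile \U0001F600")
except AstralCharacterError as exc:
    print("rejected:", exc)
```

The trade-off is visible in the second call: indexing never goes wrong, but some valid Unicode text cannot be represented at all.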
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-14 10:43 +1100 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3274.1363218247.2939.python-list@python.org> |
| In reply to | #41184 |
On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:

> [snip]
>
> This assumes that there are only three choices:
> - narrow build that is buggy (surrogate pairs for astral characters)
> - wide build that is 4-fold space inefficient for wide variety of common (ASCII) use-cases
> - flexible string engine that chooses a small tradeoff of space efficiency over time efficiency.
>
> There is a fourth choice: narrow build that chooses to be partial over being buggy. ie when an astral character is encountered, an exception is thrown rather than trying to fudge it into a 16-bit representation.

As a simple factual matter, narrow builds of Python 3.2 don't do that. So it doesn't factor into my original statement. But if you're talking about a proposal for 3.4, then sure, that's a theoretical possibility. It wouldn't be "buggy" in the sense of "string indexing/slicing unexpectedly does the wrong thing", but it would still be incomplete Unicode support, and I don't think people would appreciate it. Much better to have graceful degradation: if there are non-BMP characters in the string, then instead of throwing an exception, it just makes the string wider.

> I am hardly a unicode expert, my impression is this: While in today's internationalized world, going back to ASCII is not an option, most actual uses of unicode stay within the BMP

That's a valid line of argument for an optimization, but not for a hard limitation. A general-purpose language, function, system, whatever, will need to cope with astral characters at some point; it just won't need them *often*.

> Further if the choice is not between two python executables but between string-engines chosen at startup by command-line switches or equivalent, the price may be quite small.

It's a complexity cost, though, and people would need to know when it would be worth giving Python that switch to change its string format. Plus, every C extension would need to cope with both formats. I personally doubt it'd be worth it, but if you want to knock together a patched CPython and get some timing stats, I'm sure this list or python-dev will be happy to discuss the matter. :)

ChrisA
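The "string indexing/slicing unexpectedly does the wrong thing" behaviour of a narrow build can be approximated on a modern Python by looking at the UTF-16 code units of a string. This is a simulation under that assumption, not output from an actual 3.2 narrow build, and `utf16_units` is a helper name invented for the example.

```python
import struct


def utf16_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


s = "a\U0001F600b"           # 'a', one astral character, 'b'
units = utf16_units(s)

print(len(s))                # number of code points (Python 3.3+ view)
print(len(units))            # number of 16-bit units (narrow-build view)
print(hex(units[1]))         # second unit is only half of a surrogate pair
print(s[1] == "\U0001F600")  # code-point indexing returns the whole character
```

In the 16-bit view the astral character occupies two slots, so the lengths disagree and an index can land in the middle of a character.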
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-03-14 00:52 +0000 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3280.1363222327.2939.python-list@python.org> |
| In reply to | #41184 |
On 13/03/2013 23:43, Chris Angelico wrote:

> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:
>
>> [snip]
>>
>> There is a fourth choice: narrow build that chooses to be partial over being buggy. ie when an astral character is encountered, an exception is thrown rather than trying to fudge it into a 16-bit representation.
>
> As a simple factual matter, narrow builds of Python 3.2 don't do that. So it doesn't factor into my original statement. But if you're talking about a proposal for 3.4, then sure, that's a theoretical possibility. It wouldn't be "buggy" in the sense of "string indexing/slicing unexpectedly does the wrong thing", but it would still be incomplete Unicode support, and I don't think people would appreciate it. Much better to have graceful degradation: if there are non-BMP characters in the string, then instead of throwing an exception, it just makes the string wider.
>
> [snip]

Do you mean that instead of switching between 1/2/4 bytes per codepoint it would switch between 2/4 bytes per codepoint?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-14 11:55 +1100 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3281.1363222550.2939.python-list@python.org> |
| In reply to | #41184 |
On Thu, Mar 14, 2013 at 11:52 AM, MRAB <python@mrabarnett.plus.com> wrote:

> On 13/03/2013 23:43, Chris Angelico wrote:
>
>> [snip]
>>
>> Much better to have graceful degradation: if there are non-BMP characters in the string, then instead of throwing an exception, it just makes the string wider.
>
> [snip]
> Do you mean that instead of switching between 1/2/4 bytes per codepoint it would switch between 2/4 bytes per codepoint?

That's my point. We already have the better version. :)

ChrisA
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-03-14 02:01 +0000 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3282.1363226492.2939.python-list@python.org> |
| In reply to | #41184 |
On 14/03/2013 00:55, Chris Angelico wrote:

> On Thu, Mar 14, 2013 at 11:52 AM, MRAB <python@mrabarnett.plus.com> wrote:
>
>> [snip]
>>
>> Do you mean that instead of switching between 1/2/4 bytes per codepoint it would switch between 2/4 bytes per codepoint?
>
> That's my point. We already have the better version. :)

If a later version of Python switched between 2/4 bytes per codepoint, how much difference would it make in terms of memory and speed compared to Python 3.2 (fixed width) and Python 3.3 (3 widths)?

The vast majority of the time, 2 bytes per codepoint is sufficient, but would that result in less switching between widths and therefore higher performance, or would the use of more memory (2 bytes when 1 byte would do) offset that?

(And I'm talking about significant differences here.)
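The memory half of this question is simple arithmetic and can be sketched without patching CPython. The model below is an assumption-laden simplification: it counts only the character payload (no object header), it picks the smallest allowed width that holds the string's largest code point, and the names `payload_bytes` and `SCHEMES` are invented here. It says nothing about speed.

```python
def payload_bytes(text, widths):
    """Bytes needed for the characters of text, ignoring object headers.

    Each string uses the smallest width (bytes per code point) from
    `widths` that can hold its largest code point.
    """
    largest = max(map(ord, text), default=0)
    for width in sorted(widths):
        if largest < 1 << (8 * width):
            return width * len(text)
    raise ValueError("no width in %r can hold U+%04X" % (widths, largest))


SCHEMES = {
    "fixed 4 (wide build)": (4,),
    "1/2/4 (Python 3.3)": (1, 2, 4),
    "2/4 (hypothetical)": (2, 4),
}

samples = ["hello", "\u20ac100", "a\U0001F600"]

for text in samples:
    sizes = {name: payload_bytes(text, widths) for name, widths in SCHEMES.items()}
    print(ascii(text), sizes)
```

Under this model the 2/4 scheme matches 1/2/4 for anything containing a non-Latin-1 character and costs twice as much for pure ASCII or Latin-1 text.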
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-03-14 04:05 +0000 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <51414c75$0$29965$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #41206 |
On Thu, 14 Mar 2013 02:01:35 +0000, MRAB wrote:

> On 14/03/2013 00:55, Chris Angelico wrote:
>> On Thu, Mar 14, 2013 at 11:52 AM, MRAB <python@mrabarnett.plus.com>
>> wrote:
>>> On 13/03/2013 23:43, Chris Angelico wrote:
>>>>
>>>> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:
>>>>>
>>>>> On Mar 13, 3:59 pm, Chris Angelico <ros...@gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm...@gmail.com>
>>>>>> wrote:
>>>>>> > Uhhh..
>>>>>> > Making the subject line useful for all readers
>>>>>>
>>>>>> I should have read this one before replying in the other thread.
>>>>>>
>>>>>> jmf, I'd like to see evidence that there has been a performance
>>>>>> regression compared against a wide build of Python 3.2. You still
>>>>>> have never answered this fundamental, that the narrow builds of
>>>>>> Python are *BUGGY* in the same way that JavaScript/ECMAScript is.
>>>>>> And believe you me, the utterly unnecessary hassles I have had to
>>>>>> deal with when permitting user-provided .js code to script my
>>>>>> engine have wasted rather more dev hours than you would believe -
>>>>>> there are rather a lot of stupid edge cases to deal with.
>>>>>
>>>>>
>>>>> This assumes that there are only three choices:
>>>>> - narrow build that is buggy (surrogate pairs for astral characters)
>>>>> - wide build that is 4-fold space inefficient for wide variety of
>>>>> common (ASCII) use-cases
>>>>> - flexible string engine that chooses a small tradeoff of space
>>>>> efficiency over time efficiency.
>>>>>
>>>>> There is a fourth choice: narrow build that chooses to be partial
>>>>> over being buggy. ie when an astral character is encountered, an
>>>>> exception is thrown rather than trying to fudge it into a 16-bit
>>>>> representation.
>>>>
>>>>
>>>> As a simple factual matter, narrow builds of Python 3.2 don't do
>>>> that. So it doesn't factor into my original statement. But if you're
>>>> talking about a proposal for 3.4, then sure, that's a theoretical
>>>> possibility. It wouldn't be "buggy" in the sense of "string
>>>> indexing/slicing unexpectedly does the wrong thing", but it would
>>>> still be incomplete Unicode support, and I don't think people would
>>>> appreciate it. Much better to have graceful degradation: if there are
>>>> non-BMP characters in the string, then instead of throwing an
>>>> exception, it just makes the string wider.
>>>>
>>> [snip]
>>> Do you mean that instead of switching between 1/2/4 bytes per
>>> codepoint it would switch between 2/4 bytes per codepoint?
>>
>> That's my point. We already have the better version. :)
>>
> If a later version of Python switched between 2/4 bytes per codepoint,
> how much difference would it make in terms of memory and speed compared
> to Python 3.2 (fixed width) and Python 3.3 (3 widths)?
>
> The vast majority of the time, 2 bytes per codepoint is sufficient, but
> would that result in less switching between widths and therefore higher
> performance, or would the use of more memory (2 bytes when 1 byte would
> do) offset that?
>
> (And I'm talking about significant differences here.)

That depends on how you use the strings. Because strings are immutable,
there isn't really anything like "switching between widths" -- the width
is set when the string is created, and then remains fixed.

It is true that when you create a string, Python sometimes has to do
some work to determine what width it needs, but that's effectively a
fixed cost per string. It's relatively trivial compared to the cost of
other string operations, but it is a real cost. If all you do is create
the strings then throw them away, as JMF tends to do in his benchmarks,
you repeatedly pay the cost without seeing the benefit.

On the other hand, Python is *full* of large numbers of ASCII strings,
and many users use lots of Latin1 strings. Both of these save
significant amounts of memory: almost 50% of what they would otherwise
use in a narrow build, and almost 75% in a wide build.

This memory saving has real consequences, performance-wise. Python's
memory management can be more efficient, since objects in the heap are
smaller. I'm not sure if objects ever move in the heap (I think Java's
memory manager does move objects around, hence Jython will do so, but
I'm not sure about CPython), but even if they don't, it's obviously
faster to allocate a certain sized block of memory the more free memory
you have, and you'll have more free memory if any pre-existing objects
in the heap are smaller.

I expect that traversing a block of memory byte-by-byte may be faster
than traversing it 2x or 4x bytes at a time. My testing suggests that
iterating over a 1-byte width string is about three times faster than
iterating over a 2-byte or 4-byte wide string. But that may depend on
your OS and hardware.

Finally, there may be CPU effects, to do with how quickly strings can be
passed through the CPU pipelines, whether data is found in the CPU cache
or not, etc. Obviously this too will depend on the size of the strings.
You can squeeze 1K of data through the CPU faster than 4K of data.

In practice, how much of an effect will this have? It's hard to say
without testing, but results from real-world applications indicate that
Python 3.3 not only saves significant memory over 3.2 narrow builds, but
for real-world code, it can often be a little faster as well.

--
Steven
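The per-character storage described above can be checked directly. The
following is a minimal sketch and not part of the original post: it
assumes CPython 3.3 or later (sys.getsizeof reports
implementation-specific sizes), and the helper name bytes_per_char is
made up for illustration. Taking the difference between two lengths
cancels the fixed object header, so only the per-character cost remains.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made only of `ch`.

    Hypothetical helper: subtracting two sizes cancels the fixed
    per-object header, leaving only the per-character cost.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) / n

samples = {
    "ASCII": "a",
    "Latin-1": "\u00e9",
    "BMP": "\u1234",
    "astral (SMP)": "\U00012345",
}

# The width is chosen from the widest character in each string.
widths = {label: bytes_per_char(ch) for label, ch in samples.items()}
for label, width in widths.items():
    print(label, width)
```

On a 3.2 build the same measurement would be expected to show one fixed
width for every row (2 bytes narrow, 4 bytes wide), which is where the
"almost 50%" and "almost 75%" figures come from.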
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-14 17:47 +1100 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3286.1363243635.2939.python-list@python.org> |
| In reply to | #41209 |
On Thu, Mar 14, 2013 at 3:05 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> That depends on how you use the strings. Because strings are immutable,
> there isn't really anything like "switching between widths" -- the width
> is set when the string is created, and then remains fixed.
The nearest thing to "switching" is where you repeatedly replace() or
append/slice to add/remove the one non-ASCII character that your
contrived test is using. Let's see...
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600
32 bit (Intel)] on win32
ASCII -> ASCII:
>>> timeit.timeit("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
0.14999895238081962
ASCII -> BMP:
>>> timeit.timeit("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
1.7513426985832012
BMP -> BMP:
>>> timeit.timeit("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
0.22562895563542895
ASCII -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
1.9037101084076369
BMP -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
1.9659967956821163
SMP -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
0.7214749360603037
So there *is* cost to "changing size". Trying them again in Python 2.6 Narrow:
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit
(Intel)] on win32
ASCII -> ASCII:
>>> timeit.timeit("s=s[:-1]+u'\u0034'","s=u'asdf'*10000",number=10000)
0.53506213778566547
ASCII -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'asdf'*10000",number=10000)
0.57752172412974268
BMP -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'\u1234sdf'*10000",number=10000)
0.53309121690045913
ASCII -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'asdf'*10000",number=10000)
0.55128347317885584
BMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\u1234sdf'*10000",number=10000)
0.55610140394938412
SMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\U00012345sdf'*10000",number=10000)
0.6599570615818493
Much more consistent. (Note that the SMP timings are quite probably a
bit off as the string will continue to grow - I'm taking off one
16-bit character and putting on two.)
I don't have a 2.6 wide build on the same hardware, so these times
don't truly compare to the above ones. This is slower hardware than
the above tests.
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
>>> timeit.timeit("s=s[:-1]+u'\u0034'","s=u'asdf'*10000",number=10000)
1.5774970054626465
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'asdf'*10000",number=10000)
1.5743560791015625
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'\u1234sdf'*10000",number=10000)
1.6072981357574463
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'asdf'*10000",number=10000)
1.6745591163635254
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\u1234sdf'*10000",number=10000)
1.6705770492553711
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\U00012345sdf'*10000",number=10000)
1.7078530788421631
Here's my reading of all these stats. Python 3.3's str is faster than
2.6's unicode when the copy can be done directly (i.e. when the size
isn't changing), but converting sizes costs a lot (which suggests that
memcpy is blazingly fast, no surprise there). Since MOST string
operations won't change the size, this is a benefit to most programs.
I expect that Python 3.2 will behave comparably to the 2.6 stats, but
I don't have 3.2s handy - can someone confirm please?
ChrisA
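For anyone who wants to repeat the comparison on their own builds, the
six cases above can be collected into one script. This is only a sketch
under stated assumptions: the CASES table and the run() helper are
illustrative names, not something from the posts, and it takes the
minimum of several timeit.repeat() runs to reduce noise. It targets
Python 3 string literals; a Python 2 run would need the u'' prefix as in
the 2.6 transcripts.

```python
import timeit

# (statement, setup) pairs mirroring the six cases timed in the post.
CASES = {
    "ASCII -> ASCII": ("s=s[:-1]+'\u0034'", "s='asdf'*10000"),
    "ASCII -> BMP": ("s=s[:-1]+'\u1234'", "s='asdf'*10000"),
    "BMP -> BMP": ("s=s[:-1]+'\u1234'", "s='\u1234sdf'*10000"),
    "ASCII -> SMP": ("s=s[:-1]+'\U00012345'", "s='asdf'*10000"),
    "BMP -> SMP": ("s=s[:-1]+'\U00012345'", "s='\u1234sdf'*10000"),
    "SMP -> SMP": ("s=s[:-1]+'\U00012345'", "s='\U00012345sdf'*10000"),
}

def run(number=10000, repeat=3):
    """Return the best (minimum) time in seconds for each case."""
    return {
        label: min(timeit.repeat(stmt, setup, number=number, repeat=repeat))
        for label, (stmt, setup) in CASES.items()
    }

if __name__ == "__main__":
    for label, seconds in run().items():
        print("%-15s %.4f" % (label, seconds))
```

Running the same file under each interpreter on the same machine avoids
the cross-hardware caveat mentioned above.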
[toc] | [prev] | [next] | [standalone]
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-03-14 03:48 -0700 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <7cd57e96-663b-4a3b-a2c8-2fdbf730fa9b@hd10g2000pbc.googlegroups.com> |
| In reply to | #41212 |
On Mar 14, 11:47 am, Chris Angelico <ros...@gmail.com> wrote:
<snipped>
> I expect that Python 3.2 will behave comparably to the 2.6 stats, but
> I don't have 3.2s handy - can someone confirm please?

I have 3.2 but not 3.3. Can run it later today if no one does.
But better if someone with both on the same machine does the comparison.

jmf will you please run Chris' examples on all your pythons?
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-03-14 19:14 -0400 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3325.1363302886.2939.python-list@python.org> |
| In reply to | #41225 |
On 3/14/2013 6:48 AM, rusi wrote:
> On Mar 14, 11:47 am, Chris Angelico <ros...@gmail.com> wrote:
> <snipped>
>> I expect that Python 3.2 will behave comparably to the 2.6 stats, but
>> I don't have 3.2s handy - can someone confirm please?
>
> I have 3.2 but not 3.3. Can run it later today if no one does.
> But better if someone with both on the same machine do the comparison.

The python devs use the microbenchmarks in
Tools/stringbench/stringbench.py, which covers all string operations, as
the basis for improving particular string functions. Overall, Unicode is
nearly as fast as bytes and 3.3 as fast as 3.2. Find/replace is the
notable exception in stringbench, so it is an anomaly. Other things are
faster in 3.3. In selecting the new implementation, the devs also
considered space and speed gains that do not show up in microbenchmarks.

--
Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-03-14 20:48 -0400 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3326.1363308513.2939.python-list@python.org> |
| In reply to | #41225 |
On 3/14/2013 7:14 PM, Terry Reedy wrote:
> On 3/14/2013 6:48 AM, rusi wrote:
>> On Mar 14, 11:47 am, Chris Angelico <ros...@gmail.com> wrote:
>> <snipped>
>>> I expect that Python 3.2 will behave comparably to the 2.6 stats, but
>>> I don't have 3.2s handy - can someone confirm please?
>>
>> I have 3.2 but not 3.3. Can run it later today if no one does.
>> But better if someone with both on the same machine do the comparison.
>
> The python devs use the microbenchmarks in
> Tools/stringbench/stringbench.py, which covers all string operations, as
> the basis for improving particular string functions. Overall, Unicode is
> nearly as fast as bytes and 3.3 as fast as 3.2. Find/replace is the
> notable exception in stringbench, so it is an anomaly. Other things are
> faster in 3.3. In selecting the new implementation, the devs also
> considered space and speed gains that do not show up in microbenchmarks.

Links to the readme and code for stringbench can be found here:
http://hg.python.org/cpython/file/c25bc2587c48/Tools/stringbench

--
Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-03-15 10:07 -0700 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <4eb54568-3135-4e81-9784-ff3ed989916b@mz7g2000pbb.googlegroups.com> |
| In reply to | #41225 |
3.2 and 2.7 results on my desktop using Chris's examples
(Hope I cut-pasted them correctly)
-----------------------------
Welcome to the Emacs shell
~ $ python3
Python 3.2.3 (default, Feb 20 2013, 17:02:41)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> timeit("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
0.2893378734588623
>>> timeit("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
0.2842249870300293
>>> timeit("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
0.28406381607055664
>>> timeit("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
0.28420209884643555
>>> timeit("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
0.2853250503540039
>>> timeit("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
0.283905029296875
>>>
~ $ python
Python 2.7.3 (default, Jan 2 2013, 16:53:07)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> timeit("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
0.20418286323547363
>>> timeit("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
0.20579099655151367
>>> timeit("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
0.5055279731750488
>>> timeit("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
0.28449511528015137
>>> timeit("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
0.6001529693603516
>>> timeit("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
0.8430721759796143
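One caveat when reading the 2.7 numbers above: the literals there have
no u prefix, and in a Python 2 byte string \u and \U are not escape
sequences, so those statements time str (bytes) operations on the
literal backslash text rather than unicode operations. The sketch below
is an illustration added here, not something from the post; it uses
Python 3 raw bytes literals to mimic what Python 2 sees, and the
variable names are made up.

```python
# Python 2 str literals keep \u and \U as literal characters.
# Raw bytes literals in Python 3 reproduce that behaviour.
py2_style_bmp = br"\u1234"       # what '\u1234' is in a Python 2 str
py2_style_smp = br"\U00012345"   # what '\U00012345' is in a Python 2 str

bmp_len = len(py2_style_bmp)     # backslash, 'u', and four hex digits
smp_len = len(py2_style_smp)     # backslash, 'U', and eight hex digits

# A real unicode literal holds a single code point (Python 3.3+).
real_bmp_len = len("\u1234")
real_smp_len = len("\U00012345")

# The setup strings therefore differ in size from the intended ones,
# e.g. '\u1234sdf'*10000 is 9 bytes per repetition, not 4 characters.
setup_len = len(br"\u1234sdf" * 10000)

print(bmp_len, smp_len, real_bmp_len, real_smp_len, setup_len)
```

The differing setup sizes would plausibly account for the spread in the
2.7 timings; adding the u prefix, as in the 2.6 transcripts earlier in
the thread, gives the intended comparison.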
[toc] | [prev] | [next] | [standalone]
| From | Andriy Kornatskyy <andriy.kornatskyy@live.com> |
|---|---|
| Date | 2013-03-15 21:04 +0300 |
| Subject | RE: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3349.1363370733.2939.python-list@python.org> |
| In reply to | #41282 |
$ python3.2
Python 3.2.3 (default, Jun 25 2012, 22:55:05)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import repeat
>>> repeat("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
[0.2566258907318115, 0.14485502243041992, 0.14464998245239258]
>>> repeat("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
[0.25584888458251953, 0.1340939998626709, 0.1338820457458496]
>>> repeat("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
[0.2571289539337158, 0.13403892517089844, 0.13388800621032715]
>>> repeat("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
[0.5022759437561035, 0.3970041275024414, 0.3764481544494629]
>>> repeat("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
[0.5213770866394043, 0.38585615158081055, 0.40251588821411133]
>>> repeat("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
[0.768744945526123, 0.5852570533752441, 0.6029140949249268]
$ python3.3
Python 3.3.0 (default, Sep 29 2012, 15:35:49)
[GCC 4.7.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import repeat
>>> repeat("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
[0.0985728640225716, 0.0984080360212829, 0.07457763599813916]
>>> repeat("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
[0.901988381985575, 0.7517840950167738, 0.7540924890199676]
>>> repeat("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
[0.3069786810083315, 0.17701858800137416, 0.1769046070112381]
>>> repeat("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
[1.081760977016529, 0.9099628589756321, 0.9926943230093457]
>>> repeat("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
[1.2101859120011795, 1.1039280130062252, 0.9306247030035593]
>>> repeat("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
[0.4759294819959905, 0.35435649199644104, 0.3540659479913302]
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-03-13 22:35 -0400 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3283.1363228557.2939.python-list@python.org> |
| In reply to | #41184 |
On 3/13/2013 7:43 PM, Chris Angelico wrote:
> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody@gmail.com> wrote:
>
>> This assumes that there are only three choices:
>> - narrow build that is buggy (surrogate pairs for astral characters)
>> - wide build that is 4-fold space inefficient for wide variety of
>> common (ASCII) use-cases
>> - flexible string engine that chooses a small tradeoff of space
>> efficiency over time efficiency.

Wrong. Python almost certainly runs faster with the new string
representation. This has been explained previously more than once.

>> There is a fourth choice: narrow build that chooses to be partial over
>> being buggy. ie when an astral character is encountered, an exception
>> is thrown rather than trying to fudge it into a 16-bit
>> representation.

This is what tcl/tk does, and it is a damned nuisance. Completely
unacceptable for Python's string type.

...

> It's complexity cost, though, and people would need to know when it
> would be worth giving Python that switch to change its string format.
> Plus, every C extension would need to cope with both formats. I
> personally doubt it'd be worth it, but if you want to knock together a
> patched CPython and get some timing stats, I'm sure this list or
> python-dev will be happy to discuss the matter. :)

I presume the smiley indicates that you know that python developers are
too busy with real problems to have any interest in bogus solutions to
bogus problems.

--
Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-14 17:21 +1100 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3285.1363242063.2939.python-list@python.org> |
| In reply to | #41184 |
On Thu, Mar 14, 2013 at 1:35 PM, Terry Reedy <tjreedy@udel.edu> wrote:
> On 3/13/2013 7:43 PM, Chris Angelico wrote:
>> It's complexity cost, though, and people would need to know when it
>> would be worth giving Python that switch to change its string format.
>> Plus, every C extension would need to cope with both formats. I
>> personally doubt it'd be worth it, but if you want to knock together a
>> patched CPython and get some timing stats, I'm sure this list or
>> python-dev will be happy to discuss the matter. :)
>
>
> I presume the smiley indicates that you know that python developers are too
> busy with real problems to have any interest in bogus solutions to bogus
> problems.

It indicates more that the list(s) would almost certainly open up with
quite a bit of discussion - especially this one. It's not hard to get
talk happening, as evidenced by the number of times we've already
discussed this very topic.

Frankly, I doubt there'll be anything to discuss - I expect that the
patched version will be consistently worse; but if I've learned one
thing about timings, it's that there are surprises *everywhere*, so I'm
not prepared to state categorically that it *cannot* be better. (I will,
however, state that I do not expect any such improvement to be worth the
trouble of writing it.)

ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Thomas 'PointedEars' Lahn <PointedEars@web.de> |
|---|---|
| Date | 2013-03-13 18:42 +0100 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <2992273.neLn1eVAPo@PointedEars.de> |
| In reply to | #41170 |
Chris Angelico wrote:

> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompmody@gmail.com> wrote:
>> Uhhh..
>> Making the subject line useful for all readers
>
> I should have read this one before replying in the other thread.
>
> jmf, I'd like to see evidence that there has been a performance
> regression compared against a wide build of Python 3.2. You still have
> never answered this fundamental, that the narrow builds of Python are
> *BUGGY* in the same way that JavaScript/ECMAScript is.

Interesting. From my work I was under the impression that I knew ECMAScript
and its implementations fairly well, yet I have never heard of this before.

What do you mean by “narrow build” and “wide build” and what exactly is the
bug “narrow builds” of Python 3.2 have in common with JavaScript/ECMAScript?
To which implementation of ECMAScript are you referring – or are you
referring to the Specification as such?

--
PointedEars

Twitter: @PointedEars2
Please do not Cc: me. / Bitte keine Kopien per E-Mail.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-03-14 11:19 +1100 |
| Subject | Re: String performance regression from python 3.2 to 3.3 |
| Message-ID | <mailman.3278.1363220353.2939.python-list@python.org> |
| In reply to | #41188 |
On Thu, Mar 14, 2013 at 4:42 AM, Thomas 'PointedEars' Lahn
<PointedEars@web.de> wrote:
> Chris Angelico wrote:
>
>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompmody@gmail.com> wrote:
>>> Uhhh..
>>> Making the subject line useful for all readers
>>
>> I should have read this one before replying in the other thread.
>>
>> jmf, I'd like to see evidence that there has been a performance
>> regression compared against a wide build of Python 3.2. You still have
>> never answered this fundamental, that the narrow builds of Python are
>> *BUGGY* in the same way that JavaScript/ECMAScript is.
>
> Interesting. From my work I was under the impression that I knew ECMAScript
> and its implementations fairly well, yet I have never heard of this before.
>
> What do you mean by “narrow build” and “wide build” and what exactly is the
> bug “narrow builds” of Python 3.2 have in common with JavaScript/ECMAScript?
> To which implementation of ECMAScript are you referring – or are you
> referring to the Specification as such?
The ECMAScript spec says that strings are stored and represented in
UTF-16. Python versions up to 3.2 came in two varieties: narrow, which
included (I believe) the Windows builds available on python.org, and
wide, which was (again, I think) the default Linux config. The problem
predates Python 3 and its default string being Unicode - the Py2
unicode type has the same issue:
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit
(Intel)] on win32
>>> u"\U00012345"
u'\U00012345'
>>> len(_)
2
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u"\U00012345"
u'\U00012345'
>>> len(_)
1
That's the Python msi installer, and the default system Python from an
Ubuntu 10.10. The exact same code does different things on different
platforms, and on the Windows (narrow-build), it's possible to split
surrogates:
>>> u"\U00012345"[0]
u'\ud808'
>>> u"\U00012345"[1]
u'\udf45'
You can see the same thing in Javascript too. Here's a little demo I
just knocked together:
<script>
function foo()
{
    var txt=document.getElementById("in").value;
    var msg="";
    // One output line per UTF-16 code unit: index, decimal value, hex value
    for (var i=0;i<txt.length;++i)
        msg+="["+i+"]: "+txt.charCodeAt(i)+" "+txt.charCodeAt(i).toString(16)+"\n";
    document.getElementById("out").value=msg;
}
</script>
<input id=in><input type=button onclick="foo()" value="Show"><br>
<textarea id=out rows=25 cols=80></textarea>
Give it an ASCII string and you'll see, as expected, one index (based
on string indexing or charCodeAt, same thing) for each character. Same
if it's all BMP. But put an astral character in and you'll see
00.00.d8.00/24 (oh wait, CIDR notation doesn't work in Unicode) come
up. I raised this issue on the Google V8 list and on the ECMAScript
list es-discuss@mozilla.org, and was basically told that since
JavaScript has been buggy for so long, there's no chance of ever
making it bug-free:
https://mail.mozilla.org/pipermail/es-discuss/2012-December/027384.html
Fortunately for Python, there are version numbers, and policies that
permit bugs to actually get fixed. (Which is why, for instance, Debian
Squeeze still ships Python 2.6 rather than upgrading to 2.7 - in case
some script is broken by that change. Can't do that with web
browsers.) As of Python 3.3, all Pythons function the same way: it's
semantically a "wide build" (UTF-32), but with a memory usage
optimization. That's how it needs to be.
ChrisA
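The narrow-build behaviour shown in the transcripts above can still be
reproduced on a modern interpreter by looking at the UTF-16 encoding of
the same character. This is a small sketch added for illustration,
assuming Python 3.3 or later; the variable names are arbitrary.

```python
import struct
import sys

ch = "\U00012345"

# On Python 3.3+ every build counts code points, so this is 1.
length = len(ch)

# Encoding to UTF-16 exposes the two 16-bit code units (a surrogate pair)
# that a narrow build or a JavaScript string would index separately.
high, low = struct.unpack(">2H", ch.encode("utf-16-be"))

print(length, hex(high), hex(low), hex(sys.maxunicode))
```

The two code units match the u'\ud808' and u'\udf45' halves that the
narrow 2.6 build returned for indexes 0 and 1.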
[toc] | [prev] | [next] | [standalone]