Groups > comp.lang.python > #35115 > unrolled thread
| Started by | wxjmfauth@gmail.com |
|---|---|
| First post | 2012-12-19 06:23 -0800 |
| Last post | 2012-12-20 17:34 -0700 |
| Articles | 20 on this page of 47 — 13 participants |
Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 06:23 -0800
Re: Py 3.3, unicode / upper() Thomas Bach <thbach@students.uni-mainz.de> - 2012-12-19 15:43 +0100
Re: Py 3.3, unicode / upper() Christian Heimes <christian@python.org> - 2012-12-19 15:52 +0100
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 12:55 -0800
Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-19 14:23 -0700
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:42 -0800
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:42 -0800
Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 13:01 +1100
Re: Py 3.3, unicode / upper() Westley Martínez <anikom15@gmail.com> - 2012-12-19 18:53 -0800
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 12:55 -0800
Re: Py 3.3, unicode / upper() Stefan Krah <stefan-usenet@bytereef.org> - 2012-12-19 16:01 +0100
Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 02:17 +1100
Re: Py 3.3, unicode / upper() Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-19 16:18 +0100
Re: Py 3.3, unicode / upper() Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-19 16:22 +0100
Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 02:40 +1100
Re: Py 3.3, unicode / upper() Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-20 15:57 +0100
Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-19 11:27 -0700
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 13:18 -0800
Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-19 14:31 -0700
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:40 -0800
Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:48 -0500
Re: Py 3.3, unicode / upper() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-20 22:51 +0000
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:40 -0800
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 13:18 -0800
Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-19 19:39 -0500
Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 13:03 +1100
Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-19 21:54 -0500
Re: Py 3.3, unicode / upper() Westley Martínez <anikom15@gmail.com> - 2012-12-19 19:12 -0800
Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 14:22 +1100
Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 00:32 -0500
Re: Py 3.3, unicode / upper() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-20 05:51 +0000
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:57 -0800
Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:30 -0500
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:57 -0800
Re: Py 3.3, unicode / upper() Serhiy Storchaka <storchaka@gmail.com> - 2012-12-27 21:00 +0200
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-27 11:36 -0800
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-27 11:36 -0800
Re: Py 3.3, unicode / upper() Christian Heimes <christian@python.org> - 2012-12-19 16:33 +0100
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-29 11:16 -0800
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-29 11:16 -0800
Re: Py 3.3, unicode / upper() Benjamin Peterson <benjamin@python.org> - 2012-12-19 20:25 +0000
Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:19 -0800
Re: Py 3.3, unicode / upper() MRAB <python@mrabarnett.plus.com> - 2012-12-20 20:20 +0000
Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-21 08:19 +1100
Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:12 -0500
Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:59 -0500
Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-20 17:34 -0700
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-12-20 17:48 -0500 |
| Message-ID | <mailman.1118.1356043711.29569.python-list@python.org> |
| In reply to | #35214 |
On 12/20/2012 2:40 PM, wxjmfauth@gmail.com wrote:

> What should a Python user think, if he sees his strings
> are consuming more memory just because he uses non ascii
> characters

What should a Python user think, if he (or she) sees his (or her) strings sometimes or often consuming less memory than they did previously? I think the person should be grateful that people volunteered to make the improvement, rather than ungratefully bitch about it.

> or he sees his strings are changing just because
> he "uppercases" them.

Uppercasing strings is supposed to change strings.

> Unicode is here to serve anybody.

This we agree on. Python 3.3 unicode serves everybody better than 3.2 does.

--
Terry Jan Reedy
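As an illustration of the uppercasing point: the posts on this page do not show the original poster's example, so the sample words below are assumptions picked for the sketch, and `describe_upper` is a hypothetical helper name. On Python 3.3 and later, `str.upper()` applies full Unicode case mappings, so a single character can uppercase to more than one character.

```python
# Sample words are assumptions for illustration; the thread's original
# example is not shown on this page.
samples = ["stra\u00dfe", "\ufb01sh"]  # U+00DF sharp s, U+FB01 fi ligature


def describe_upper(text):
    """Return the uppercased text and whether its length changed."""
    upper = text.upper()
    return upper, len(upper) != len(text)


for word in samples:
    upper, length_changed = describe_upper(word)
    print(repr(word), "->", repr(upper), "length changed:", length_changed)
```

A caseless comparison is usually better served by `str.casefold()`, which was also added in 3.3.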
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-12-20 22:51 +0000 |
| Message-ID | <50d3965a$0$29967$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #35214 |
On Thu, 20 Dec 2012 11:40:21 -0800, wxjmfauth wrote:

> I do not care
> about this optimization. I'm not an ascii user. As a non ascii user,
> this optimization is just irrelevant.

WRONG.

Every Python user is an ASCII user. Every Python program has hundreds or thousands of ASCII strings.

# === example ===
import random

There's already one ASCII string in your code: the module name "random" is ASCII. Let's look inside that module:

py> dir(random)
['BPF', 'LOG4', 'NV_MAGICCONST', 'RECIP_BPF', 'Random', 'SG_MAGICCONST', 'SystemRandom', 'TWOPI', '_BuiltinMethodType', '_MethodType', '_Sequence', '_Set', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__initializing__', '__loader__', '__name__', '__package__', '_acos', '_ceil', '_cos', '_e', '_exp', '_inst', '_log', '_pi', '_random', '_sha512', '_sin', '_sqrt', '_test', '_test_generator', '_urandom', '_warn', 'betavariate', 'choice', 'expovariate', 'gammavariate', 'gauss', 'getrandbits', 'getstate', 'lognormvariate', 'normalvariate', 'paretovariate', 'randint', 'random', 'randrange', 'sample', 'seed', 'setstate', 'shuffle', 'triangular', 'uniform', 'vonmisesvariate', 'weibullvariate']

That's another 58 ASCII strings. Let's pick one of those:

py> dir(random.Random)
['VERSION', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_randbelow', 'betavariate', 'choice', 'expovariate', 'gammavariate', 'gauss', 'getrandbits', 'getstate', 'lognormvariate', 'normalvariate', 'paretovariate', 'randint', 'random', 'randrange', 'sample', 'seed', 'setstate', 'shuffle', 'triangular', 'uniform', 'vonmisesvariate', 'weibullvariate']

That's another 51 ASCII strings. Let's pick one of them:

py> dir(random.Random.shuffle)
['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']

And another 34 ASCII strings. So to get access to just *one* method of *one* class of *one* module, we have already seen up to 144 ASCII strings. (Some of them will be duplicated.)

Even if every one of *your* classes, methods, functions, modules and variables is using a non-ASCII name, you will still use ASCII strings for built-in functions and standard library modules.

> What should a Python user think, if he sees his strings are consuming
> more memory just because he uses non ascii characters

WRONG!

His strings are consuming just as much memory as they need to. You cannot fit ten thousand different characters into a single byte. A single byte can represent only 2**8 = 256 characters. Two bytes can only represent 65536 characters at most. Four bytes can represent the entire range of every character ever represented in human history, and more, but it is terribly wasteful: most strings do not use a billion different characters, and so use of a four-byte character encoding uses up to four times as much memory as necessary.

You are imagining that non-ASCII users are being discriminated against, with their strings being unfairly bloated. But that is not the case. Their strings would be equally large in a Python wide-build, give or take whatever overhead of the string object changes from version to version.

If you are not comparing a wide-build of Python to Python 3.3, then your comparison is faulty. You are comparing "buggy Unicode, cannot handle the supplementary planes" with "fixed Unicode, can handle the supplementary planes". Python 3.2 narrow builds save memory by introducing bugs into Unicode strings. Python 3.3 fixes those bugs and still saves memory.

--
Steven
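The one-, two- and four-byte widths described above can be observed directly. The following is a minimal sketch that assumes CPython 3.3 or later (`sys.getsizeof` results are implementation-specific), and `bytes_per_char` is a hypothetical helper name. It subtracts the sizes of two strings of different lengths so that the fixed object overhead cancels out.

```python
import sys


def bytes_per_char(ch, n=100):
    """Estimate storage per character by differencing two string sizes.

    The subtraction cancels the fixed per-object header, which differs
    between string kinds and between Python versions.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


for label, ch in [
    ("ASCII", "a"),
    ("Latin-1", "\u00e9"),
    ("other BMP", "\u0100"),
    ("non-BMP", "\U0001d407"),
]:
    print(label, bytes_per_char(ch), "byte(s) per character")
```

The width is chosen per string from its widest character, so one non-BMP character makes the whole string use four bytes per character.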
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-12-20 11:40 -0800 |
| Message-ID | <mailman.1110.1356037281.29569.python-list@python.org> |
| In reply to | #35160 |
On Wednesday, 19 December 2012 22:31:42 UTC+1, Ian wrote:
> On Wed, Dec 19, 2012 at 2:18 PM, <wxjmfauth@gmail.com> wrote:
> > latin-1 (iso-8859-1) ? are you sure ?
>
> Yes.
>
> > >>> sys.getsizeof('a')
> > 26
> > >>> sys.getsizeof('ab')
> > 27
> > >>> sys.getsizeof('aé')
> > 39
>
> Compare to:
>
> >>> sys.getsizeof('a\u0100')
> 42
>
> The reason for the difference you posted is that pure ASCII strings
> have a further optimization, which I glossed over and which is purely
> a savings in overhead:
>
> >>> sys.getsizeof('abcde') - sys.getsizeof('a')
> 4
> >>> sys.getsizeof('ábçdê') - sys.getsizeof('á')
> 4
-----
I know all of this. And this is exactly what I explained.
I do not care about this optimization. I'm not an ascii user.
As a non ascii user, this optimization is just irrelevant.
What should a Python user think, if he sees his strings
are consuming more memory just because he uses non ascii
characters or he sees his strings are changing just because
he "uppercases" them.
Unicode is here to serve anybody.
jmf
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-12-19 13:18 -0800 |
| Message-ID | <mailman.1073.1355951888.29569.python-list@python.org> |
| In reply to | #35147 |
On Wednesday, 19 December 2012 19:27:38 UTC+1, Ian wrote:
> On Wed, Dec 19, 2012 at 8:40 AM, Chris Angelico <rosuav@gmail.com> wrote:
> > You may not be familiar with jmf. He's one of our resident trolls, and
> > he has a bee in his bonnet about PEP 393 strings, on the basis that
> > they take up more space in memory than a narrow build of Python 3.2
> > would, for a string with lots of BMP characters and one non-BMP. In
> > 3.2 narrow builds, strings were stored in UTF-16, with *surrogate
> > pairs* for non-BMP characters. This means that len() counts them
> > twice, as does string indexing/slicing. That's a major bug, especially
> > as your Python code will do different things on different platforms -
> > most Linux builds of 3.2 are "wide" builds, storing characters in four
> > bytes each.
>
> From what I've been able to discern, his actual complaint about PEP
> 393 stems from misguided moral concerns. With PEP-393, strings that
> can be fully represented in Latin-1 can be stored in half the space
> (ignoring fixed overhead) compared to strings containing at least one
> non-Latin-1 character. jmf thinks this optimization is unfair to
> non-English users and immoral; he wants Latin-1 strings to be treated
> exactly like non-Latin-1 strings (I don't think he actually cares
> about non-BMP strings at all; if narrow-build Unicode is good enough
> for him, then it must be good enough for everybody). Unfortunately
> for him, the Latin-1 optimization is rather trivial in the wider
> context of PEP-393, and simply removing that part alone clearly
> wouldn't be doing anybody any favors. So for him to get what he
> wants, the entire PEP has to go.
>
> It's rather like trying to solve the problem of wealth disparity by
> forcing everyone to dump their excess wealth into the ocean.
----
latin-1 (iso-8859-1) ? are you sure ?
>>> sys.getsizeof('a')
26
>>> sys.getsizeof('ab')
27
>>> sys.getsizeof('aé')
39
Time to go to bed. More complete answer tomorrow.
jmf
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-12-19 19:39 -0500 |
| Message-ID | <mailman.1081.1355963989.29569.python-list@python.org> |
| In reply to | #35130 |
On 12/19/2012 10:40 AM, Chris Angelico wrote:

> Interestingly, IDLE on my Windows box can't handle the bolded
> characters very well...
>
> >>> s="\U0001d407\U0001d41e\U0001d425\U0001d425\U0001d428, \U0001d430\U0001d428\U0001d42b\U0001d425\U0001d41d!"
> >>> print(s)
> Traceback (most recent call last):
>   File "<pyshell#2>", line 1, in <module>
>     print(s)
> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d407'
> in position 0: Non-BMP character not supported in Tk

On 3.3.0 on Win7, the expressions 's', 'repr(s)', and 'str(s)' (without the quotes) echo the input as entered (with \U escapes) while 'print(s)' gets the same traceback you did.

--
Terry Jan Reedy
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-12-20 13:03 +1100 |
| Message-ID | <mailman.1084.1355969013.29569.python-list@python.org> |
| In reply to | #35130 |
On Thu, Dec 20, 2012 at 5:27 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:

> From what I've been able to discern, [jmf's] actual complaint about PEP
> 393 stems from misguided moral concerns. With PEP-393, strings that
> can be fully represented in Latin-1 can be stored in half the space
> (ignoring fixed overhead) compared to strings containing at least one
> non-Latin-1 character. jmf thinks this optimization is unfair to
> non-English users and immoral; he wants Latin-1 strings to be treated
> exactly like non-Latin-1 strings (I don't think he actually cares
> about non-BMP strings at all; if narrow-build Unicode is good enough
> for him, then it must be good enough for everybody).

Not entirely; most of his complaints are based on performance (speed and/or memory) of 3.3 compared to a narrow build of 3.2, using silly edge cases to prove how much worse 3.3 is, while utterly ignoring the fact that, in those self-same edge cases, 3.2 is buggy.

ChrisA
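The narrow-build bug referred to here (surrogate pairs being counted twice by `len()`) can be approximated on a current interpreter by counting UTF-16 code units. This is only a sketch of the effect, not a reproduction of a 3.2 narrow build, and `utf16_code_units` is a hypothetical helper name.

```python
def utf16_code_units(text):
    """Count 16-bit code units in the UTF-16 encoding of text.

    A non-BMP character needs a surrogate pair, so it counts as two.
    """
    return len(text.encode("utf-16-le")) // 2


s = "ab\U0001d407"  # two BMP characters plus one non-BMP character

print(len(s))               # number of code points on Python 3.3 and later
print(utf16_code_units(s))  # number of UTF-16 code units for the same text
```

Whenever the two numbers differ, indexing and slicing by code unit can split a character in half, which is the class of wrong answers the thread describes.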
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-12-19 21:54 -0500 |
| Message-ID | <mailman.1088.1355972082.29569.python-list@python.org> |
| In reply to | #35130 |
On 12/19/2012 9:03 PM, Chris Angelico wrote:

> On Thu, Dec 20, 2012 at 5:27 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
>> From what I've been able to discern, [jmf's] actual complaint about PEP
>> 393 stems from misguided moral concerns. With PEP-393, strings that
>> can be fully represented in Latin-1 can be stored in half the space
>> (ignoring fixed overhead) compared to strings containing at least one
>> non-Latin-1 character. jmf thinks this optimization is unfair to
>> non-English users and immoral; he wants Latin-1 strings to be treated
>> exactly like non-Latin-1 strings (I don't think he actually cares
>> about non-BMP strings at all; if narrow-build Unicode is good enough
>> for him, then it must be good enough for everybody).
>
> Not entirely; most of his complaints are based on performance (speed
> and/or memory) of 3.3 compared to a narrow build of 3.2, using silly
> edge cases to prove how much worse 3.3 is, while utterly ignoring the
> fact that, in those self-same edge cases, 3.2 is buggy.

And the fact that stringbench.py is overall about as fast with 3.3 as with 3.2 *on the same Windows 7 machine* (which uses narrow build in 3.2), and that unicode operations are not far from bytes operations when the same thing can be done with both.

--
Terry Jan Reedy
| From | Westley Martínez <anikom15@gmail.com> |
|---|---|
| Date | 2012-12-19 19:12 -0800 |
| Message-ID | <mailman.1089.1355973157.29569.python-list@python.org> |
| In reply to | #35130 |
On Wed, Dec 19, 2012 at 09:54:20PM -0500, Terry Reedy wrote:

> On 12/19/2012 9:03 PM, Chris Angelico wrote:
> > On Thu, Dec 20, 2012 at 5:27 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> >> From what I've been able to discern, [jmf's] actual complaint about PEP
> >> 393 stems from misguided moral concerns. With PEP-393, strings that
> >> can be fully represented in Latin-1 can be stored in half the space
> >> (ignoring fixed overhead) compared to strings containing at least one
> >> non-Latin-1 character. jmf thinks this optimization is unfair to
> >> non-English users and immoral; he wants Latin-1 strings to be treated
> >> exactly like non-Latin-1 strings (I don't think he actually cares
> >> about non-BMP strings at all; if narrow-build Unicode is good enough
> >> for him, then it must be good enough for everybody).
> >
> > Not entirely; most of his complaints are based on performance (speed
> > and/or memory) of 3.3 compared to a narrow build of 3.2, using silly
> > edge cases to prove how much worse 3.3 is, while utterly ignoring the
> > fact that, in those self-same edge cases, 3.2 is buggy.
>
> And the fact that stringbench.py is overall about as fast with 3.3
> as with 3.2 *on the same Windows 7 machine* (which uses narrow build
> in 3.2), and that unicode operations are not far from bytes
> operations when the same thing can be done with both.
>
> --
> Terry Jan Reedy

Really, why should we be so obsessed with speed anyways? Isn't improving the language and fixing bugs far more important?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-12-20 14:22 +1100 |
| Message-ID | <mailman.1090.1355973763.29569.python-list@python.org> |
| In reply to | #35130 |
On Thu, Dec 20, 2012 at 2:12 PM, Westley Martínez <anikom15@gmail.com> wrote:

> Really, why should we be so obsessed with speed anyways? Isn't
> improving the language and fixing bugs far more important?

Because speed is very important in certain areas. Python can be used in many ways:

* Command-line calculator with awesome precision and variable handling
* Proglets, written once and run once, doing one simple job and then moving on
* Applications that do heaps of work and are run multiple times a day
* Internet services (eg web server), contacted many times a second
* Etcetera
* Etcetera
* And quite a few other ways too

For the first two, performance isn't very important. No matter how slow the language, it's still going to respond "3" instantly when you enter "1+2", and unless you're writing something hopelessly inefficient or brute-force, the time spent writing a proglet usually dwarfs its execution time.

But performance is very important for something like Mercurial, which is invoked many times and always with the user waiting for it. You want to get back to work, not sit there for X seconds while your source control engine fires up and does something. And with a web server, language performance translates fairly directly into latency AND potential requests per second on any given hardware.

To be sure, a lot of Python performance hits the level of "sufficient" and doesn't need to go further, but it's still worth considering.

ChrisA
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-12-20 00:32 -0500 |
| Message-ID | <mailman.1091.1355981588.29569.python-list@python.org> |
| In reply to | #35130 |
On 12/19/2012 10:12 PM, Westley Martínez wrote:

> On Wed, Dec 19, 2012 at 09:54:20PM -0500, Terry Reedy wrote:
>> On 12/19/2012 9:03 PM, Chris Angelico wrote:
>>> On Thu, Dec 20, 2012 at 5:27 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
>>>> From what I've been able to discern, [jmf's] actual complaint about PEP
>>>> 393 stems from misguided moral concerns. With PEP-393, strings that
>>>> can be fully represented in Latin-1 can be stored in half the space
>>>> (ignoring fixed overhead) compared to strings containing at least one
>>>> non-Latin-1 character. jmf thinks this optimization is unfair to
>>>> non-English users and immoral; he wants Latin-1 strings to be treated
>>>> exactly like non-Latin-1 strings (I don't think he actually cares
>>>> about non-BMP strings at all; if narrow-build Unicode is good enough
>>>> for him, then it must be good enough for everybody).
>>>
>>> Not entirely; most of his complaints are based on performance (speed
>>> and/or memory) of 3.3 compared to a narrow build of 3.2, using silly
>>> edge cases to prove how much worse 3.3 is, while utterly ignoring the
>>> fact that, in those self-same edge cases, 3.2 is buggy.
>>
>> And the fact that stringbench.py is overall about as fast with 3.3
>> as with 3.2 *on the same Windows 7 machine* (which uses narrow build
>> in 3.2), and that unicode operations are not far from bytes
>> operations when the same thing can be done with both.
>>
>> --
>> Terry Jan Reedy
>
> Really, why should we be so obsessed with speed anyways? Isn't
> improving the language and fixing bugs far more important?

Being conservative, there are probably at least 10 enhancement patches and 30 bug fix patches for every performance patch. Performance patches are considered enhancements and only go in new versions with enhancements, where they go through the extended alpha, beta, candidate test and evaluation process.

In the unicode case, Jim discovered that find was several times slower in 3.3 than 3.2 and claimed that that was a reason to not use 3.3. I ran the complete stringbench.py and discovered that find (and consequently find and replace) are the only operations with such a slowdown. I also discovered that another at least as common operation, encoding strings that only contain ascii characters to ascii bytes for transmission, is several times as fast in 3.3. So I reported that unless one is only finding substrings in long strings, there is no reason to not upgrade to 3.3.

--
Terry Jan Reedy
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-12-20 05:51 +0000 |
| Message-ID | <50d2a773$0$29863$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #35180 |
On Thu, 20 Dec 2012 00:32:42 -0500, Terry Reedy wrote:
> In the unicode case, Jim discovered that find was several times slower
> in 3.3 than 3.2 and claimed that that was a reason to not use 3.3. I ran
> the complete stringbench.py and discovered that find (and consequently
> find and replace) are the only operations with such a slowdown. I also
> discovered that another at least as common operation, encoding strings
> that only contain ascii characters to ascii bytes for transmission, is
> several times as fast in 3.3. So I reported that unless one is only
> finding substrings in long strings, there is no reason to not upgrade to
> 3.3.
Yes, and if you remember, Jim (jmf) based his complaints on very possibly
the worst edge-case for the new Unicode implementation:
- generate a large string of characters
- replace every character in that string with another character
From memory:
s = "a"*100000
s = s.replace("a", "b")
or equivalent. Hardly representative of normal string processing, and
likely to be the worst-performing operation on new Unicode strings. And
yet even so, many people reported either a mild slow down or, in a few
cases, a small speed up.
--
Steven
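For anyone who wants to check the replace case on their own machine, a minimal timing sketch with the standard `timeit` module follows. The helper name `time_replace`, the string length, and the non-ASCII sample character are assumptions for illustration; they are not taken from the original benchmark, and the timings printed depend entirely on the interpreter version and platform.

```python
import timeit


def time_replace(text, old, new, repeat=5, number=20):
    """Return the best observed time in seconds for one text.replace(old, new)."""
    timer = timeit.Timer(lambda: text.replace(old, new))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Hypothetical test cases: every character of a long string is replaced.
cases = {
    "ASCII": ("a" * 100000, "a", "b"),
    "non-ASCII (BMP)": ("\u0100" * 100000, "\u0100", "\u0101"),
}

for label, (text, old, new) in cases.items():
    print(label, time_replace(text, old, new))
```

Running the same script under two interpreter versions on the same machine gives a like-for-like comparison; a single operation like this says little about overall string performance.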
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-12-20 11:57 -0800 |
| Message-ID | <43140393-080f-4dad-98e3-c9f27acd9490@googlegroups.com> |
| In reply to | #35180 |
On Thursday, 20 December 2012 06:32:42 UTC+1, Terry Reedy wrote:
> On 12/19/2012 10:12 PM, Westley Martínez wrote:
> > On Wed, Dec 19, 2012 at 09:54:20PM -0500, Terry Reedy wrote:
> >> On 12/19/2012 9:03 PM, Chris Angelico wrote:
> >>> On Thu, Dec 20, 2012 at 5:27 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> >>>> From what I've been able to discern, [jmf's] actual complaint about PEP
> >>>> 393 stems from misguided moral concerns. With PEP-393, strings that
> >>>> can be fully represented in Latin-1 can be stored in half the space
> >>>> (ignoring fixed overhead) compared to strings containing at least one
> >>>> non-Latin-1 character. jmf thinks this optimization is unfair to
> >>>> non-English users and immoral; he wants Latin-1 strings to be treated
> >>>> exactly like non-Latin-1 strings (I don't think he actually cares
> >>>> about non-BMP strings at all; if narrow-build Unicode is good enough
> >>>> for him, then it must be good enough for everybody).
> >>>
> >>> Not entirely; most of his complaints are based on performance (speed
> >>> and/or memory) of 3.3 compared to a narrow build of 3.2, using silly
> >>> edge cases to prove how much worse 3.3 is, while utterly ignoring the
> >>> fact that, in those self-same edge cases, 3.2 is buggy.
> >>
> >> And the fact that stringbench.py is overall about as fast with 3.3
> >> as with 3.2 *on the same Windows 7 machine* (which uses narrow build
> >> in 3.2), and that unicode operations are not far from bytes
> >> operations when the same thing can be done with both.
> >>
> >> --
> >> Terry Jan Reedy
> >
> > Really, why should we be so obsessed with speed anyways? Isn't
> > improving the language and fixing bugs far more important?
>
> Being conservative, there are probably at least 10 enhancement patches
> and 30 bug fix patches for every performance patch. Performance patches
> are considered enhancements and only go in new versions with
> enhancements, where they go through the extended alpha, beta, candidate
> test and evaluation process.
>
> In the unicode case, Jim discovered that find was several times slower
> in 3.3 than 3.2 and claimed that that was a reason to not use 3.3. I ran
> the complete stringbench.py and discovered that find (and consequently
> find and replace) are the only operations with such a slowdown. I also
> discovered that another at least as common operation, encoding strings
> that only contain ascii characters to ascii bytes for transmission, is
> several times as fast in 3.3. So I reported that unless one is only
> finding substrings in long strings, there is no reason to not upgrade to
> 3.3.
>
> --
> Terry Jan Reedy
--------
I showed a case where the Py33 works 10 times slower than Py32,
"replace". You the devs spend your time to correct that case.
Now, if I'm putting on the table an example working 20 times
slower. Will you spend your time to optimize that?

I'm afraid, this is the FSR which is problematic, not the
corner cases.
jmf
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-12-20 17:30 -0500 |
| Message-ID | <mailman.1117.1356042659.29569.python-list@python.org> |
| In reply to | #35216 |
On 12/20/2012 2:57 PM, wxjmfauth@gmail.com wrote:

> I showed a case where the Py33 works 10 times slower than Py32,
> "replace". You the devs spend your time to correct that case.

I discovered that it is the 'find' part of find and replace that is slower. The comparison is worse on Windows than on *nix. There is an issue on the tracker so it may be improved someday. Most devs are not especially bothered and would rather fix errors as part of their volunteer work.

> Now, if I'm putting on the table an example working 20 times
> slower. Will you spend your time to optimize that?
>
> I'm afraid, this is the FSR which is problematic, not the
> corner cases.

I showed another case where 3.3 is a thousand, a million times faster than 3.2. Does that make the old way 'problematic'?

Don't you think that the bugs (wrong answers) in narrow builds are 'problematic'? Do you really think that getting wrong answers faster is better than getting right answers possibly slower?

The 'find' operation is just 1 of about 30 that are tested by stringbench.py. Run that on 3.3 and 3.2, as I did, before talking about FSR as 'problematic'.

--
Terry Jan Reedy
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2012-12-27 21:00 +0200 |
| Message-ID | <mailman.1354.1356634864.29569.python-list@python.org> |
| In reply to | #35130 |
On 19.12.12 17:40, Chris Angelico wrote:
> Interestingly, IDLE on my Windows box can't handle the bolded
> characters very well...
>
> >>> s="\U0001d407\U0001d41e\U0001d425\U0001d425\U0001d428, \U0001d430\U0001d428\U0001d42b\U0001d425\U0001d41d!"
> >>> print(s)
> Traceback (most recent call last):
>   File "<pyshell#2>", line 1, in <module>
>     print(s)
> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d407'
> in position 0: Non-BMP character not supported in Tk
>
> I think this is most likely a case of "yeah, Windows XP just sucks".
> But I have no reason or inclination to get myself a newer Windows to
> find out if it's any different.

No, this is a Tcl/Tk limitation (I don't know if this was fixed in 8.6).
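The error above comes down to code points beyond U+FFFF, which fall outside the Basic Multilingual Plane. A minimal sketch, assuming Python 3.3 or later, of how one might detect such characters before handing text to a BMP-only display layer; the helper names `non_bmp_chars` and `replace_non_bmp` are hypothetical and this is not what IDLE itself does.

```python
# The string from Chris's session: "Hello, world!" in mathematical bold.
s = ("\U0001d407\U0001d41e\U0001d425\U0001d425\U0001d428, "
     "\U0001d430\U0001d428\U0001d42b\U0001d425\U0001d41d!")

def non_bmp_chars(text):
    """Return the characters of `text` whose code point is above U+FFFF."""
    return [c for c in text if ord(c) > 0xFFFF]

def replace_non_bmp(text, placeholder="\ufffd"):
    """Swap every non-BMP character for a placeholder (assumed workaround)."""
    return "".join(c if ord(c) <= 0xFFFF else placeholder for c in text)

print(len(s))                            # 13 code points on Python 3.3+
print(len(non_bmp_chars(s)))             # 10 of them are outside the BMP
print(len(s.encode("utf-16-le")) // 2)   # 23 UTF-16 code units (surrogate pairs)
print(ascii(replace_non_bmp(s)))         # safe to pass to a BMP-only widget
```

Each non-BMP character needs two UTF-16 code units, which is why a UCS-2-only consumer cannot represent it.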
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-12-27 11:36 -0800 |
| Message-ID | <9e40c8de-d4cf-4a64-800d-97caa399bc0a@googlegroups.com> |
| In reply to | #35633 |
On Thursday, 27 December 2012 20:00:37 UTC+1, Serhiy Storchaka wrote:
> On 19.12.12 17:40, Chris Angelico wrote:
> > Interestingly, IDLE on my Windows box can't handle the bolded
> > characters very well...
> >
> > >>> s="\U0001d407\U0001d41e\U0001d425\U0001d425\U0001d428, \U0001d430\U0001d428\U0001d42b\U0001d425\U0001d41d!"
> > >>> print(s)
> > Traceback (most recent call last):
> >   File "<pyshell#2>", line 1, in <module>
> >     print(s)
> > UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d407'
> > in position 0: Non-BMP character not supported in Tk
> >
> > I think this is most likely a case of "yeah, Windows XP just sucks".
> > But I have no reason or inclination to get myself a newer Windows to
> > find out if it's any different.
>
> No, this is a Tcl/Tk limitation (I don't know if this was fixed in 8.6).

This is a strange error message. Remember: a coding scheme covers a *set of characters*. The guilty code point corresponds to a character which is not part of the UCS-2 character set!

jmf
| From | Christian Heimes <christian@python.org> |
|---|---|
| Date | 2012-12-19 16:33 +0100 |
| Message-ID | <mailman.1056.1355931232.29569.python-list@python.org> |
| In reply to | #35115 |
On 19.12.2012 16:01, Stefan Krah wrote:
> The uppercase ß isn't really needed, since ß does not occur at the beginning
> of a word. As far as I know, most Germans wouldn't even know that it has
> existed at some point or how to write it.

I think Python 3.3+ is using the full uppercase mapping (uc) instead of the simple uppercase mapping (suc).

Some background:

The old German Fraktur has three variants of the letter S:

capital s: S
long s: ſ
round s: s.

ß is a ligature of ſs. ſ is usually used at the beginning or middle of a syllable while s is used at the end of a syllable. Compare Wachſtube (Wach-Stube == guard room) to Wachstube (Wachs-Tube == tube of wax). :)

Christian
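The uc versus suc distinction can be checked directly. A small sketch, assuming Python 3.3 or later, where `str.upper()` applies the full Unicode case mapping; the variable names are chosen for this illustration only.

```python
import unicodedata

sharp_s = "\u00df"          # ß
long_s = "\u017f"           # ſ
capital_sharp_s = "\u1e9e"  # ẞ

print(unicodedata.name(sharp_s))          # LATIN SMALL LETTER SHARP S
print(unicodedata.name(capital_sharp_s))  # LATIN CAPITAL LETTER SHARP S

# Full mapping: one character becomes two, so the length changes.
print(sharp_s.upper())       # SS
print("straße".upper())      # STRASSE

# The long s uppercases to an ordinary capital S.
print(long_s.upper())        # S

# The round trip is lossy: "SS" lowercases to "ss", not back to "ß".
print("STRASSE".lower())     # strasse

# The capital sharp s does lowercase to ß.
print(capital_sharp_s.lower() == sharp_s)  # True
```

Because `upper()` can change the length of a string, code that assumes a one-to-one character mapping (for example when indexing into the result) may need adjusting.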
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-12-29 11:16 -0800 |
| Message-ID | <c3556bf7-994a-4050-aa2a-461fe362d53f@googlegroups.com> |
| In reply to | #35133 |
On Wednesday, 19 December 2012 16:33:50 UTC+1, Christian Heimes wrote:
> I think Python 3.3+ is using uppercase mapping (uc) instead of simple
> upper case (suc).

I think you are thinking correctly. This is a clever answer.

Note: I do not care about the uc / suc choice. As long as there is consistency, I'm fine with the choice.

Anyway, the only valid "programming technique" in that field is to create a dedicated lib for a given script (esp. French!)

jmf

> Some background:
>
> The old German Fraktur has three variants of the letter S:
>
> capital s: S
> long s: ſ
> round s: s.
>
> ß is a ligature of ſs. ſ is usually used at the beginning or middle of a
> syllable while s is used at the end of a syllable. Compare Wachſtube
> (Wach-Stube == guard room) to Wachstube (Wachs-Tube == tube of wax). :)
>
> Christian