| Started by | Charles Jensen <hopefullycharles@gmail.com> |
|---|---|
| First post | 2012-08-16 15:09 -0700 |
| Last post | 2012-08-20 17:20 -0400 |
| Articles | 20 on this page of 145 — 26 participants |
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-08-21 10:16 -0700 |
| Subject | Re: Abuse of subject, was Re: Abuse of Big Oh notation |
| Message-ID | <mailman.3609.1345569376.4697.python-list@python.org> |
| In reply to | #27556 |
On Tuesday, 21 August 2012 09:52:09 UTC+2, Peter Otten wrote:
> wxjmfauth@gmail.com wrote:
>
> > By chance and luckily, first attempt.
>
> > c:\python32\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
> > 1000000 loops, best of 3: 1.48 usec per loop
> > c:\python33\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
> > 100000 loops, best of 3: 7.62 usec per loop
>
> OK, that is roughly factor 5. Let's see what I get:
>
> $ python3.2 -m timeit '("€"*100+"€"*100).replace("€", "œ")'
> 100000 loops, best of 3: 1.8 usec per loop
> $ python3.3 -m timeit '("€"*100+"€"*100).replace("€", "œ")'
> 10000 loops, best of 3: 9.11 usec per loop
>
> That is factor 5, too. So I can replicate your measurement on an AMD64 Linux
> system with self-built 3.3 versus system 3.2.
>
> > Note
> > The used characters are not members of the latin-1 coding
> > scheme (btw an *unusable* coding).
> > They are however characters in cp1252 and mac-roman.
>
> You seem to imply that the slowdown is connected to the inability of latin-1
> to encode "œ" and "€" (to take the examples relevant to the above
> microbench). So let's repeat with latin-1 characters:
>
> $ python3.2 -m timeit '("ä"*100+"ä"*100).replace("ä", "ß")'
> 100000 loops, best of 3: 1.76 usec per loop
> $ python3.3 -m timeit '("ä"*100+"ä"*100).replace("ä", "ß")'
> 10000 loops, best of 3: 10.3 usec per loop
>
> Hm, the slowdown is even a tad bigger. So we can safely dismiss your theory
> that an unfortunate choice of the 8 bit encoding is causing it. Do you
> agree?
- I do not care too much about the numbers. It's
an attempt to show the principles.
- I consider latin-1 a bad coding because it is
simply unusable for some scripts / languages. That
mainly concerns the coding of source/text files.
This is not really the point here.
- Now, the technical aspect. This "coding" (latin-1)
may be considered as a pseudo-coding covering
the Unicode code point range 128..255. Unfortunately,
this "coding" is not optimal (or can be seen as such) when
you work with the full range of Unicode, but it is fine
when one works only in pure latin-1, with only 256
characters.
This range 128..255 is always the critical part
(all codings considered), and it probably contains
the most used characters.
I hope this was not too confusing.
I have no proof for my theory. From my experience in that
field, I strongly suspect this is the bottleneck.
Same OS as before.
Py 3.2.3
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[1.5384088242603358, 1.532421642233382, 1.5327445924545433]
>>> timeit.repeat("('ä'*100+'ä'*100).replace('ä', 'ß')")
[1.561762063667686, 1.5443503206462594, 1.5458670051605168]
3.3.0b2
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[7.701523104134512, 7.720358191179441, 7.614549852683501]
>>> timeit.repeat("('ä'*100+'ä'*100).replace('ä', 'ß')")
[4.887939423990709, 4.868787294350611, 4.865697999795991]
Quite mysterious!
In any case, it is a regression.
jmf
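
For anyone who wants to rerun both measurements in one go, here is a minimal sketch that assumes only the standard-library `timeit` module. The helper name `time_replace`, the `STATEMENTS` table and the reduced loop count are illustrative choices, not something taken from the posts above. Absolute numbers will differ between machines and Python versions, so run it under each interpreter you want to compare.

```python
import timeit

# The two statements timed in the posts above: one uses characters outside
# latin-1 (euro sign, oe ligature), the other uses characters inside it.
STATEMENTS = {
    "non-latin-1": "('€'*100+'€'*100).replace('€', 'œ')",
    "latin-1": "('ä'*100+'ä'*100).replace('ä', 'ß')",
}


def time_replace(stmt, number=100_000, repeat=3):
    """Return per-loop times in microseconds, one value per repeat."""
    totals = timeit.repeat(stmt, number=number, repeat=repeat)
    return [total / number * 1e6 for total in totals]


if __name__ == "__main__":
    for label, stmt in STATEMENTS.items():
        best = min(time_replace(stmt))
        print(f"{label:12s} best of 3: {best:.2f} usec per loop")
```

Taking the minimum of the repeats mirrors the "best of 3" figure that `python -m timeit` prints.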
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-08-20 11:42 -0700 |
| Subject | Re: Abuse of Big Oh notation |
| Message-ID | <mailman.3577.1345488171.4697.python-list@python.org> |
| In reply to | #27477 |
By chance and luckily, first attempt.
IDLE, Windows 7.0 Pro 32, Pentium Dual Core 2.6, 2 GB RAM
Py 3.2.3
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[1.6939567134893707, 1.672874290786993, 1.6761219212298073]
Py 3.3.0b2
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[7.924470733910917, 7.8554985620787345, 7.878623849091914]
Console
c:\python32\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
1000000 loops, best of 3: 1.48 usec per loop
c:\python33\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
100000 loops, best of 3: 7.62 usec per loop
Note
The characters used are not members of the latin-1 coding
scheme (btw an *unusable* coding).
They are, however, characters in cp1252 and mac-roman.
jmf
| From | Hans Mulder <hansmu@xs4all.nl> |
|---|---|
| Date | 2012-08-22 20:53 +0200 |
| Message-ID | <50352abe$0$6873$e4fe514c@news2.news.xs4all.nl> |
| In reply to | #27405 |
On 19/08/12 19:48:06, Paul Rubin wrote:
> Terry Reedy <tjreedy@udel.edu> writes:
>> py> s = chr(0xFFFF + 1)
>> py> a, b = s
> That looks like a 3.2- narrow build. Such builds treat unicode strings
> as sequences of code units rather than sequences of codepoints. Not an
> implementation bug, but compromise design that goes back about a
> decade to when unicode was added to Python.

Actually, this compromise design was new in 3.0. In 2.x, unicode
strings were sequences of code points. Narrow builds rejected any
code points > 0xFFFF:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = unichr(0xFFFF + 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

-- HansM
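
To see which behaviour a given interpreter has, a small check along these lines may help. It is only a sketch for Python 3: the variable name `utf16_units` is made up for illustration, and the comments describe what Python 3.3 and later do. A 3.2 narrow build would instead report a length of 2 and unpack the string into two surrogates.

```python
import sys

s = chr(0xFFFF + 1)  # U+10000, the first code point outside the BMP

# Number of 16-bit code units needed to represent the string in UTF-16.
utf16_units = len(s.encode("utf-16-le")) // 2

print(sys.version_info[:2], "len(s) =", len(s), "UTF-16 code units =", utf16_units)

# On Python 3.3 and later a str is a sequence of code points, so the
# string has length 1 and unpacking it into two names fails.
try:
    a, b = s
except ValueError as exc:
    print("cannot unpack into two names:", exc)
```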
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-08-20 08:42 +1000 |
| Message-ID | <mailman.3531.1345416176.4697.python-list@python.org> |
| In reply to | #27360 |
On Mon, Aug 20, 2012 at 3:34 AM, Terry Reedy <tjreedy@udel.edu> wrote:
> On 8/19/2012 4:04 AM, Paul Rubin wrote:
>> I realize the folks who designed and implemented PEP 393 are very smart
>> cookies and considered stuff carefully, while I'm just an internet user
>> posting an immediate impression of something I hadn't seen before (I
>> still use Python 2.6), but I still have to ask: if the 393 approach
>> makes sense, why don't other languages do it?
>
> Python has often copied or borrowed, with adjustments. This time it is the
> first. We will see how it goes, but it has been tested for nearly a year
> already.

Maybe it wasn't consciously borrowed, but whatever innovation is done, there's usually an obscure beardless language that did it earlier. :)

Pike has a single string type, which can use the full Unicode range. If all codepoints are <256, the string width is 8 (measured in bits); if <65536, width is 16; otherwise 32. Using the inbuilt count_memory function (similar to the Python function used somewhere earlier in this thread, but which I can't at present put my finger on), I find that for strings of 16 bytes or more, there's a fixed 20-byte header plus the string content, stored in the correct number of bytes. (Pike strings, like Python ones, are immutable and do not need expansion room.)

However, Python goes a bit further by making it VERY clear that this is a mere optimization, and that Unicode strings and bytes strings are completely different beasts. In Pike, it's possible to forget to encode something before (say) writing it to a socket. Everything works fine while you have only ASCII characters in the string, and then breaks when you have a >255 codepoint - or perhaps worse, when you have a 127<x<256, and the other end misinterprets it.

Really, the only viable alternative to PEP 393 is a fixed 32-bit representation - it's the only way that's guaranteed to provide equivalent semantics. The new storage format is guaranteed to take no more memory than that, and provide equivalent functionality.

ChrisA
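
The width rule described above can be observed from Python itself. The following is a rough sketch that assumes CPython 3.3 or later, where `sys.getsizeof` reflects the PEP 393 layout. `bytes_per_char` is a hypothetical helper name, and the result is an implementation detail rather than a language guarantee. It measures how much a string grows when one more character of the same kind is appended, and on such a build it should report 1, 1, 2 and 4 bytes for the four samples.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per code point by growing a string by one character."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


if __name__ == "__main__":
    samples = {
        "ASCII 'a'": "a",
        "latin-1 'ä'": "ä",
        "BMP '€'": "€",
        "astral U+10000": "\U00010000",
    }
    for label, ch in samples.items():
        print(f"{label:16s} {bytes_per_char(ch)} byte(s) per character")
```

Measuring the difference between two lengths sidesteps the fixed header, whose size varies between Python versions and platforms.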
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-08-19 19:24 -0400 |
| Message-ID | <roy-EAC738.19243019082012@news.panix.com> |
| In reply to | #27439 |
In article <mailman.3531.1345416176.4697.python-list@python.org>, Chris Angelico <rosuav@gmail.com> wrote:
> Really, the only viable alternative to PEP 393 is a fixed 32-bit
> representation - it's the only way that's guaranteed to provide
> equivalent semantics. The new storage format is guaranteed to take no
> more memory than that, and provide equivalent functionality.

In the primordial days of computing, using 8 bits to store a character was a profligate waste of memory. What on earth did people need with TWO cases of the alphabet (not to mention all sorts of weird punctuation)? Eventually, memory became cheap enough that the convenience of using one character per byte (not to mention 8-bit bytes) outweighed the costs. And crazy things like sixbit and rad-50 got swept into the dustbin of history.

So it may be with utf-8 someday.

Clearly, the world has moved to a 32-bit character set. Not all parts of the world know that yet, or are willing to admit it, but that doesn't negate the fact that it's true. Equally clearly, the concept of one character per byte is a big win. The obvious conclusion is that eventually, when memory gets cheap enough, we'll all be doing utf-32 and all this transcoding nonsense will look as antiquated as rad-50 does today.
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-08-20 04:21 +0000 |
| Message-ID | <5031bb2f$0$29972$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #27441 |
On Sun, 19 Aug 2012 19:24:30 -0400, Roy Smith wrote:
> In the primordial days of computing, using 8 bits to store a character
> was a profligate waste of memory. What on earth did people need with
> TWO cases of the alphabet

That's obvious, surely? We need two cases so that we can distinguish helping Jack off a horse from helping jack off a horse.

> (not to mention all sorts of weird
> punctuation)? Eventually, memory became cheap enough that the
> convenience of using one character per byte (not to mention 8-bit bytes)
> outweighed the costs. And crazy things like sixbit and rad-50 got swept
> into the dustbin of history.

8 bit bytes are much older than 8 bit characters. For a long time, ASCII characters used only 7 bits out of the 8.

> So it may be with utf-8 someday.

Only if you believe that people's ability to generate data will remain lower than people's ability to install more storage.

Every few years, new sizes for storage media come out. The first thing that happens is that people say "40 megabytes? I'll NEVER fill this hard drive up!". The second thing that happens is that they say "Dammit, my 40 MB hard drive is full, and a new one is too expensive, better delete some files." Followed shortly by "400 megabytes? I'll NEVER use that much space!" -- wash, rinse, repeat, through megabytes, gigabytes, terabytes, and it will happen for petabytes next.

So long as our ability to outstrip storage continues, compression and memory-efficient storage schemes will remain in demand.

-- Steven
[toc] | [prev] | [next] | [standalone]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-08-20 00:44 -0400 |
| Message-ID | <roy-A160F1.00442220082012@news.panix.com> |
| In reply to | #27458 |
In article <5031bb2f$0$29972$c3e8da3$5496439d@news.astraweb.com>,
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> > So it may be with utf-8 someday.
>
> Only if you believe that people's ability to generate data will remain
> lower than people's ability to install more storage.
We're not talking *data*, we're talking *text*. Most of those whatever-bytes people are generating are images, video, and music. Text is a pittance compared to those.
In any case, text on disk can easily be stored compressed. I would expect the UTF-8 and UTF-32 versions of a text file to compress to just about the same size.
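The expectation that UTF-8 and UTF-32 compress to similar sizes can be checked empirically. What follows is a minimal sketch, assuming zlib as the compressor and a made-up sample text (neither comes from the thread). The resulting sizes depend heavily on the input, so no particular ratio is claimed.

```python
import zlib

# Hypothetical sample: mostly ASCII with a few non-ASCII characters.
sample = ("The quick brown fox jumps over the lazy dog. "
          "Na\u00efve caf\u00e9 prices: 3\u20ac. ") * 200

utf8 = sample.encode("utf-8")
utf32 = sample.encode("utf-32-le")  # little-endian, no BOM: 4 bytes per code point

for name, raw in (("utf-8", utf8), ("utf-32", utf32)):
    packed = zlib.compress(raw, 9)  # level 9 = best compression zlib offers
    print(f"{name:7} raw={len(raw):7d}  compressed={len(packed):6d}")
```

Comparing the two "compressed" columns for a representative file is the quickest way to see how far the on-disk argument holds; it says nothing about the in-memory footprint once the text is decoded.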
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-08-20 05:56 +0000 |
| Message-ID | <5031d17a$0$1645$c3e8da3$76491128@news.astraweb.com> |
| In reply to | #27460 |
On Mon, 20 Aug 2012 00:44:22 -0400, Roy Smith wrote:
> In article <5031bb2f$0$29972$c3e8da3$5496439d@news.astraweb.com>,
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>
>> > So it may be with utf-8 someday.
>>
>> Only if you believe that people's ability to generate data will remain
>> lower than people's ability to install more storage.
>
> We're not talking *data*, we're talking *text*. Most of those
> whatever-bytes people are generating are images, video, and music. Text
> is a pittance compared to those.
Paul Rubin already told you about his experience using OCR to generate multiple terabytes of text, and how he would not be happy if that was stored in UCS-4.
HTML is text. XML is text. SVG is text. Source code is text. Email is text. (Well, it's actually bytes, but it looks like ASCII text.) Log files are text, and they can fill a hard drive pretty quickly. Lots of data is text.
Pittance or not, I do not believe that people will widely abandon compact storage formats like UTF-8 and Latin-1 for UCS-4 any time soon. Given that we're still trying to convince people to use UTF-8 over ASCII, I reckon it will be at least 40 years before there's even a slim chance of migrating from UTF-8 to UCS-4 in a widespread manner. In the IT world, that's close enough to "never" -- we might not even be using Unicode in 2052.
In any case, time will tell who is right.
--
Steven
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2012-08-19 23:24 -0700 |
| Message-ID | <7xzk5qjd1k.fsf@ruckus.brouhaha.com> |
| In reply to | #27461 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> Paul Rubin already told you about his experience using OCR to generate
> multiple terabytes of text, and how he would not be happy if that was
> stored in UCS-4.
That particular text was stored on disk as compressed XML that had UTF-8 in the data fields, but I think Roy is right that it would have compressed to around the same size in UCS-4. Converting it to UCS-4 on input would have bloated up the memory footprint and that was the issue of concern to me.
> Pittance or not, I do not believe that people will widely abandon compact
> storage formats like UTF-8 and Latin-1 for UCS-4 any time soon.
Looking at http://www.icu-project.org/ the C++ classes seem to use UTF-16, sort of like Python 3.2 :(. I'm not certain of this though.
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2012-08-20 12:58 -0400 |
| Message-ID | <mailman.3564.1345481890.4697.python-list@python.org> |
| In reply to | #27458 |
On 20 Aug 2012 04:21:04 GMT, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> declaimed the following in
gmane.comp.python.general:
>
> That's obvious, surely? We need two cases so that we can distinguish
> helping Jack off a horse from helping jack off a horse.
>
I'm sure the horse will appreciate either action... <G>
> Every few years, new sizes for storage media come out. The first thing
> that happens is that people say "40 megabytes? I'll NEVER fill this hard
> drive up!". The second thing that happens is that they say "Dammit, my 40
> MB hard drive is full, and a new one is too expensive, better delete some
> files." Followed shortly by "400 megabytes? I'll NEVER use that much
> space!" -- wash, rinse, repeat, through megabytes, gigabytes, terabytes,
> and it will happen for petabytes next.
>
Don't remind me... I pulled an introductory digital photography
guide out of storage last week...
The device it uses to transfer data from SmartMedia cards takes two
button-cell batteries; put the SM card into it, and slide it into a 3.5"
floppy drive.
100MB ZIP drives were suggested for sharing image files with
friends...
And if I ever take my Amiga out of storage, I'm not sure I can even
find external SCSI drives for it... It could use a few more 1GB drives
<G>
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-08-19 20:35 -0400 |
| Message-ID | <mailman.3532.1345422959.4697.python-list@python.org> |
| In reply to | #27360 |
On 8/19/2012 6:42 PM, Chris Angelico wrote:
> On Mon, Aug 20, 2012 at 3:34 AM, Terry Reedy <tjreedy@udel.edu> wrote:
>> Python has often copied or borrowed, with adjustments. This time it is the
>> first.
I should have added 'that I know of' ;-)
> Maybe it wasn't consciously borrowed, but whatever innovation is done,
> there's usually an obscure beardless language that did it earlier. :)
>
> Pike has a single string type, which can use the full Unicode range.
> If all codepoints are <256, the string width is 8 (measured in bits);
> if <65536, width is 16; otherwise 32. Using the inbuilt count_memory
> function (similar to the Python function used somewhere earlier in
> this thread, but which I can't at present put my finger to), I find
> that for strings of 16 bytes or more, there's a fixed 20-byte header
> plus the string content, stored in the correct number of bytes. (Pike
> strings, like Python ones, are immutable and do not need expansion
> room.)
It is even possible that someone involved was vaguely aware that there was an antecedent. The PEP makes no claim that I can see, but lays out the problem and goes right to details of a Python implementation.
> However, Python goes a bit further by making it VERY clear that this
> is a mere optimization, and that Unicode strings and bytes strings are
> completely different beasts. In Pike, it's possible to forget to
> encode something before (say) writing it to a socket. Everything works
> fine while you have only ASCII characters in the string, and then
> breaks when you have a >255 codepoint - or perhaps worse, when you
> have a 127<x<256, and the other end misinterprets it.
Python writes strings to file objects, including open sockets, without creating a bytes object -- IF the file is opened in text mode, which always has an associated encoding, even if the default 'ascii'. From what you say, this is what Pike is missing.
I am pretty sure that the obvious optimization has already been done. The internal bytes of all-ascii text can safely be sent to a file with ascii (or ascii-compatible) encoding without intermediate 'decoding'. I remember several patches of that sort. If a string is internally ucs2 and the file is declared ucs2 or utf-16 encoding, then again, pairs of bytes can go directly (possibly with a byte swap).
--
Terry Jan Reedy
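The width rule described for Pike (8, 16 or 32 bits per character, chosen by the largest code point) is the same idea PEP 393 brings to Python 3.3. The following is a rough sketch, assuming CPython 3.3 or later. sys.getsizeof reports implementation-specific totals, so the helper (a name invented here) only looks at how much the size grows per added character.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character from the size difference of two strings.

    Both strings are longer than one character, so cached one-character
    string objects do not skew the result.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) / n

for label, ch in (("ASCII", "a"), ("Latin-1", "\u00e9"),
                  ("BMP", "\u20ac"), ("astral", "\U00010000")):
    print(f"{label:8} {bytes_per_char(ch)} bytes per character")
```

On a 3.2 narrow or wide build the same measurement would show a constant 2 or 4 bytes regardless of content, which is the memory cost the flexible representation is meant to avoid.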
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-08-20 14:07 +1000 |
| Message-ID | <mailman.3536.1345435662.4697.python-list@python.org> |
| In reply to | #27360 |
On Mon, Aug 20, 2012 at 10:35 AM, Terry Reedy <tjreedy@udel.edu> wrote:
> On 8/19/2012 6:42 PM, Chris Angelico wrote:
>> However, Python goes a bit further by making it VERY clear that this
>> is a mere optimization, and that Unicode strings and bytes strings are
>> completely different beasts. In Pike, it's possible to forget to
>> encode something before (say) writing it to a socket. Everything works
>> fine while you have only ASCII characters in the string, and then
>> breaks when you have a >255 codepoint - or perhaps worse, when you
>> have a 127<x<256, and the other end misinterprets it.
>
> Python writes strings to file objects, including open sockets, without
> creating a bytes object -- IF the file is opened in text mode, which always
> has an associated encoding, even if the default 'ascii'. From what you say,
> this is what Pike is missing.
In text mode, the library does the encoding, but an encoding still happens.
> I am pretty sure that the obvious optimization has already been done. The
> internal bytes of all-ascii text can safely be sent to a file with ascii (or
> ascii-compatible) encoding without intermediate 'decoding'. I remember
> several patches of that sort. If a string is internally ucs2 and the file is
> declared ucs2 or utf-16 encoding, then again, pairs of bytes can go directly
> (possibly with a byte swap).
Maybe it doesn't take any memory change, but there is a data type change. A Unicode string cannot be sent over the network; an encoding is needed.
In Pike, I can take a string like "\x20AC" (or "\u20ac" or "\U000020ac", same thing) and manipulate it as a one-character string, but I cannot write it to a file or file-like object. I can, however, pass it through a codec (and there's string_to_utf8() for the convenience of the common case), and get back something like "\xe2\x82\xac", which is a three-byte string. The thing is, though, that this new string is of exactly the same data type as the original: 'string'. Which means that I could have a string containing Latin-1 but not ASCII characters, and Pike will happily write it to a socket without raising a compile-time or run-time error. Python, under the same circumstances, would either raise an error or quietly (and correctly) encode the data.
But this is a relatively trivial point, in the scheme of things. Python has an excellent model now for handling Unicode strings, and I would STRONGLY recommend that everyone upgrade to 3.3.
ChrisA
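The str/bytes distinction on the Python side can be shown in a few lines. This is a small sketch for Python 3, with io.BytesIO standing in for a socket or a file opened in binary mode (an assumption made only to keep the example self-contained).

```python
import io

euro = "\u20ac"                 # one code point, U+20AC
encoded = euro.encode("utf-8")  # the three bytes e2 82 ac
print(len(euro), len(encoded), encoded)

sink = io.BytesIO()             # stand-in for a socket or binary file
try:
    sink.write(euro)            # a str is not bytes, so this is rejected
except TypeError as exc:
    print("refused:", exc)

sink.write(encoded)             # explicit encoding succeeds
print(sink.getvalue())
```

Because the encoded result has a different type from the text it came from, forgetting to encode fails immediately rather than only when the first non-ASCII character turns up.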
| From | lipska the kat <lipskathekat@yahoo.co.uk> |
|---|---|
| Date | 2012-08-19 11:13 +0100 |
| Message-ID | <Foudnf2f142gIa3NnZ2dnUVZ8uCdnZ2d@bt.com> |
| In reply to | #27349 |
On 19/08/12 07:09, Steven D'Aprano wrote:
> This is a long post. If you don't feel like reading an essay, skip to the
> very bottom and read my last few paragraphs, starting with "To recap".
Thank you for this excellent post,
it has certainly cleared up a few things for me
[snip]
incidentally
> But in UTF-16, ...
[snip]
> py> s = chr(0xFFFF + 1)
> py> a, b = s
> py> a
> '\ud800'
> py> b
> '\udc00'
in IDLE
Python 3.2.3 (default, May 3 2012, 15:51:42)
[GCC 4.6.3] on linux2
Type "copyright", "credits" or "license()" for more information.
==== No Subprocess ====
>>> s = chr(0xFFFF + 1)
>>> a, b = s
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
a, b = s
ValueError: need more than 1 value to unpack
At a terminal prompt
[lipska@ubuntu ~]$ python3.2
Python 3.2.3 (default, Jul 17 2012, 14:23:10)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = chr(0xFFFF + 1)
>>> a, b = s
>>> a
'\ud800'
>>> b
'\udc00'
>>>
The date stamp is different but the Python version is the same
No idea why this is happening, I just thought it was interesting
lipska
--
Lipska the Kat©: Troll hunter, sandbox destroyer
and farscape dreamer of Aeryn Sun
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-08-19 20:19 +1000 |
| Message-ID | <mailman.3490.1345371554.4697.python-list@python.org> |
| In reply to | #27371 |
On Sun, Aug 19, 2012 at 8:13 PM, lipska the kat <lipskathekat@yahoo.co.uk> wrote:
> The date stamp is different but the Python version is the same
Check out what 'sys.maxunicode' is in each of those Pythons. It's possible that one is a wide build and the other narrow.
ChrisA
| From | lipska the kat <lipskathekat@yahoo.co.uk> |
|---|---|
| Date | 2012-08-19 11:49 +0100 |
| Message-ID | <oMSdnan41PQuWa3NnZ2dnUVZ8kqdnZ2d@bt.com> |
| In reply to | #27372 |
On 19/08/12 11:19, Chris Angelico wrote:
> On Sun, Aug 19, 2012 at 8:13 PM, lipska the kat
> <lipskathekat@yahoo.co.uk> wrote:
>> The date stamp is different but the Python version is the same
>
> Check out what 'sys.maxunicode' is in each of those Pythons. It's
> possible that one is a wide build and the other narrow.
Ah ...
I built my local version from source and no, I didn't read the makefile so I didn't configure for a wide build :-( not that I would have known the difference at that time.
[lipska@ubuntu ~]$ python3.2
Python 3.2.3 (default, Jul 17 2012, 14:23:10)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
65535
>>>
Later, I did an apt-get install idle3 which pulled down a precompiled IDLE from the Ubuntu repos
This was obviously compiled 'wide'
Python 3.2.3 (default, May 3 2012, 15:51:42)
[GCC 4.6.3] on linux2
Type "copyright", "credits" or "license()" for more information.
==== No Subprocess ====
>>> import sys
>>> sys.maxunicode
1114111
>>>
All very interesting and enlightening
Thanks
lipska
--
Lipska the Kat©: Troll hunter, sandbox destroyer
and farscape dreamer of Aeryn Sun
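The '\ud800' and '\udc00' seen on the narrow build are the UTF-16 surrogate pair for U+10000. Here is a short sketch of the arithmetic; the function name is made up for illustration, and the calculation follows the standard UTF-16 encoding rule.

```python
import sys

def utf16_surrogates(code_point):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20 significant bits
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

high, low = utf16_surrogates(0xFFFF + 1)
print(hex(high), hex(low))     # 0xd800 0xdc00
print(hex(sys.maxunicode))     # 0xffff on a narrow build, 0x10ffff on a wide one
```

On a narrow 3.2 build each half is stored as a separate string element, which is why `a, b = s` unpacks there and raises ValueError on the wide build, where the string has length 1.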
| From | "Blind Anagram" <noname@nowhere.com> |
|---|---|
| Date | 2012-08-19 18:03 +0100 |
| Message-ID | <XI2dncR-HMTzgazNnZ2dnUVZ8jCdnZ2d@brightview.co.uk> |
| In reply to | #27291 |
"Steven D'Aprano" wrote in message
news:502f8a2a$0$29978$c3e8da3$5496439d@news.astraweb.com...
On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote:
[...]
If you can consistently replicate a 100% to 1000% slowdown in string
handling, please report it as a performance bug:
http://bugs.python.org/
Don't forget to report your operating system.
====================================================
For interest, I ran your code snippets on my laptop (Intel core-i7 1.8GHz)
running Windows 7 x64.
Running Python from a Windows command prompt, I got the following on Python
3.2.3 and 3.3 beta 2:
python33\python" -m timeit "('abc' * 1000).replace('c', 'de')"
10000 loops, best of 3: 39.3 usec per loop
python33\python" -m timeit "('ab…' * 1000).replace('…', '……')"
10000 loops, best of 3: 51.8 usec per loop
python33\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
10000 loops, best of 3: 52 usec per loop
python33\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
10000 loops, best of 3: 50.3 usec per loop
python33\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
10000 loops, best of 3: 51.6 usec per loop
python33\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
10000 loops, best of 3: 38.3 usec per loop
python33\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
10000 loops, best of 3: 50.3 usec per loop
python32\python" -m timeit "('abc' * 1000).replace('c', 'de')"
10000 loops, best of 3: 24.5 usec per loop
python32\python" -m timeit "('ab…' * 1000).replace('…', '……')"
10000 loops, best of 3: 24.7 usec per loop
python32\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
10000 loops, best of 3: 24.8 usec per loop
python32\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
10000 loops, best of 3: 24 usec per loop
python32\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
10000 loops, best of 3: 24.1 usec per loop
python32\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
10000 loops, best of 3: 24.4 usec per loop
python32\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
10000 loops, best of 3: 24.3 usec per loop
This is an average slowdown by a factor of close to 2.3 on 3.3 when compared
with 3.2.
I am not posting this to perpetuate this thread but simply to ask whether,
as you suggest, I should report this as a possible problem with the beta?
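For anyone wanting to repeat the measurement without the command line, the timeit module can be driven from a script. This is a sketch only: it includes two of the statements above, and the number and repeat values are chosen to mirror the "10000 loops, best of 3" lines.

```python
import timeit

# Two of the statements from the timings above; the rest follow the same pattern.
statements = [
    "('abc' * 1000).replace('c', 'de')",
    "('ab…' * 1000).replace('…', '€…')",
]

for stmt in statements:
    # Each of the 3 repeats runs the statement 10000 times; keep the fastest.
    best = min(timeit.repeat(stmt, number=10000, repeat=3))
    print(f"{best / 10000 * 1e6:6.1f} usec per loop  {stmt}")
```

Running the same script under both interpreters on an otherwise idle machine gives directly comparable numbers; the absolute values will differ from those posted here.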
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-08-19 10:33 -0700 |
| Message-ID | <5dfd1779-9442-4858-9161-8f1a06d56829@googlegroups.com> |
| In reply to | #27401 |
On Sunday, 19 August 2012 19:03:34 UTC+2, Blind Anagram wrote:
> [snip]
> This is an average slowdown by a factor of close to 2.3 on 3.3 when compared
> with 3.2.
>
> I am not posting this to perpetuate this thread but simply to ask whether,
> as you suggest, I should report this as a possible problem with the beta?
I use Win7 Pro, 32-bit, on Intel.
Thanks for reporting these numbers.
To be clear: I'm not complaining, but the fact that
there is a slowdown is a clear indication (in my mind)
that there is a point somewhere.
jmf
| From | "Blind Anagram" <noname@nowhere.com> |
|---|---|
| Date | 2012-08-19 19:04 +0100 |
| Message-ID | <j6ydnQjPqNFZt6zNnZ2dnUVZ8kGdnZ2d@brightview.co.uk> |
| In reply to | #27404 |
wxjmfauth@gmail.com wrote in message
news:5dfd1779-9442-4858-9161-8f1a06d56829@googlegroups.com...
On Sunday, 19 August 2012 19:03:34 UTC+2, Blind Anagram wrote:
> [snip]
> This is an average slowdown by a factor of close to 2.3 on 3.3 when
> compared with 3.2.
>
> I am not posting this to perpetuate this thread but simply to ask whether,
> as you suggest, I should report this as a possible problem with the beta?
I use win7 pro 32bits in intel?
Thanks for reporting these numbers.
To be clear: I'm not complaining, but the fact that
there is a slow down is a clear indication (in my mind),
there is a point somewhere.
====================================
I may be reading your input wrongly, but it seems to me that you are not
only reporting a slowdown but you are also suggesting that this slowdown is
the result of bad design decisions by the Python development team.
I don't want to get involved in the latter part of your argument because I
am convinced that the Python team are doing their very best to find a good
compromise between the various design constraints that they face in meeting
these needs.
Nevertheless, the post that I responded to contained the suggestion that
slowdowns above 100% (which I took as a factor of 2) would be worth
reporting as a possible bug. So I thought that it was worth asking about
this as I may have misunderstood the level of slowdown that is worth
reporting. There is also a potential problem in timings on laptops with
turbo-boost (as I have), although the times look fairly consistent.
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-08-19 14:05 -0400 |
| Message-ID | <mailman.3519.1345399574.4697.python-list@python.org> |
| In reply to | #27401 |
On 08/19/2012 01:03 PM, Blind Anagram wrote:
> "Steven D'Aprano" wrote in message
> news:502f8a2a$0$29978$c3e8da3$5496439d@news.astraweb.com...
>
> On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote:
>
> [...]
> If you can consistently replicate a 100% to 1000% slowdown in string
> handling, please report it as a performance bug:
>
> http://bugs.python.org/
>
> Don't forget to report your operating system.
>
> ====================================================
> For interest, I ran your code snippets on my laptop (Intel core-i7
> 1.8GHz) running Windows 7 x64.
>
> Running Python from a Windows command prompt, I got the following on
> Python 3.2.3 and 3.3 beta 2:
>
> python33\python" -m timeit "('abc' * 1000).replace('c', 'de')"
> 10000 loops, best of 3: 39.3 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', '……')"
> 10000 loops, best of 3: 51.8 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
> 10000 loops, best of 3: 52 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
> 10000 loops, best of 3: 50.3 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
> 10000 loops, best of 3: 51.6 usec per loop
> python33\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
> 10000 loops, best of 3: 38.3 usec per loop
> python33\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
> 10000 loops, best of 3: 50.3 usec per loop
>
> python32\python" -m timeit "('abc' * 1000).replace('c', 'de')"
> 10000 loops, best of 3: 24.5 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', '……')"
> 10000 loops, best of 3: 24.7 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
> 10000 loops, best of 3: 24.8 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
> 10000 loops, best of 3: 24 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
> 10000 loops, best of 3: 24.1 usec per loop
> python32\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
> 10000 loops, best of 3: 24.4 usec per loop
> python32\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
> 10000 loops, best of 3: 24.3 usec per loop
>
> This is an average slowdown by a factor of close to 2.3 on 3.3 when
> compared with 3.2.
>
Using your measurement numbers, I get an average of 1.95, not 2.3
--
DaveA
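The 1.95 figure can be reproduced from the quoted timings. A quick check follows, with the numbers copied from the post; two common ways of averaging are shown, since the post does not say which was used.

```python
# usec per loop, copied from the quoted timings, in the order posted
py33 = [39.3, 51.8, 52.0, 50.3, 51.6, 38.3, 50.3]
py32 = [24.5, 24.7, 24.8, 24.0, 24.1, 24.4, 24.3]

ratios = [new / old for new, old in zip(py33, py32)]
mean_ratio = sum(ratios) / len(ratios)      # average of the per-test slowdowns
ratio_of_sums = sum(py33) / sum(py32)       # total 3.3 time over total 3.2 time

print(round(mean_ratio, 2), round(ratio_of_sums, 2))   # 1.95 1.95
```

Either way the slowdown on these tests comes out just under a factor of 2, not 2.3.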