comp.lang.python, thread #27204

How do I display unicode value stored in a string variable using ord()

Started by	Charles Jensen <hopefullycharles@gmail.com>
First post	2012-08-16 15:09 -0700
Last post	2012-08-20 17:20 -0400
Articles	145 (26 participants)

  How do I display unicode value stored in a string variable using ord() Charles Jensen <hopefullycharles@gmail.com> - 2012-08-16 15:09 -0700
    Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-17 08:20 +1000
    Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-16 18:47 -0400
    Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-16 19:59 -0400
      Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 10:49 -0700
        Re: How do I display unicode value stored in a string variable using ord() Jerry Hill <malaclypse2@gmail.com> - 2012-08-17 14:21 -0400
          Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 11:45 -0700
            Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-17 16:55 -0400
            Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-17 23:30 -0400
              Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 04:10 +0000
                Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-18 09:18 -0600
            Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 03:59 +0000
      Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 10:49 -0700
    Re: How do I display unicode value stored in a string variable using ord() Alister <alister.ware@ntlworld.com> - 2012-08-17 06:30 +0000
    Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 01:09 -0700
      Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 12:27 +0000
        Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 08:07 -0700
          Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 16:25 +0100
          Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 01:36 +1000
          Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-18 09:51 -0600
            Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 09:38 -0700
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 02:57 +1000
              Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 18:28 +0100
                Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 11:05 -0700
                  Re: How do I display unicode value stored in a string variable using ord() MRAB <python@mrabarnett.plus.com> - 2012-08-18 19:34 +0100
                    Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:35 +0000
                      New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord() Peter Otten <__peter__@web.de> - 2012-08-19 09:43 +0200
                        Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 08:56 +0000
                          Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 02:24 -0700
                          Re: New internal string format in 3.3 Peter Otten <__peter__@web.de> - 2012-08-19 11:37 +0200
                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 03:19 -0700
                              Re: New internal string format in 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 13:33 +0000
                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 03:19 -0700
                              Re: New internal string format in 3.3 Chris Angelico <rosuav@gmail.com> - 2012-08-19 20:26 +1000
                                Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:14 -0700
                                  Re: New internal string format in 3.3 Dave Angel <d@davea.name> - 2012-08-19 08:29 -0400
                                    Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:59 -0700
                                      Re: New internal string format in 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 14:46 +0100
                                        Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 07:09 -0700
                                          Re: New internal string format in 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 15:48 +0100
                                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 09:19 -0700
                                          Re: New internal string format in 3.3 Terry Reedy <tjreedy@udel.edu> - 2012-08-19 13:48 -0400
                                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 10:51 -0700
                                              Re: New internal string format in 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 19:09 +0100
                                              Re: New internal string format in 3.3 Chris Angelico <rosuav@gmail.com> - 2012-08-20 07:50 +1000
                                              Re: New internal string format in 3.3 Michael Torrie <torriem@gmail.com> - 2012-08-19 23:38 -0600
                                                Re: New internal string format in 3.3 Roy Smith <roy@panix.com> - 2012-08-20 09:17 -0400
                                                  Re: New internal string format in 3.3 Michael Torrie <torriem@gmail.com> - 2012-08-20 22:18 -0600
                                                    Re: New internal string format in 3.3 Roy Smith <roy@panix.com> - 2012-08-21 07:48 -0400
                                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 10:51 -0700
                                      Re: New internal string format in 3.3 Terry Reedy <tjreedy@udel.edu> - 2012-08-19 13:56 -0400
                                    Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:59 -0700
                                  Re: New internal string format in 3.3 Dave Angel <d@davea.name> - 2012-08-19 08:35 -0400
                                Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:14 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:30 +0000
                Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 11:05 -0700
              Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-18 16:09 -0400
              Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-18 23:12 -0400
            Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 09:38 -0700
            Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:33 +0000
              Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-19 11:50 -0600
                Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 11:20 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-19 12:31 -0600
                    Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 12:23 -0700
                Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 20:16 +0000
              Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-19 12:46 -0600
          Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 17:59 +0000
            Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 11:30 -0700
              Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 20:45 +0100
              Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:13 +0000
            Re: How do I display unicode value stored in a string variable using ord() rusi <rustompmody@gmail.com> - 2012-08-18 11:40 -0700
              Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 20:50 +0100
              Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 13:22 -0700
                Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 22:37 +0100
        Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 11:26 -0700
          Re: How do I display unicode value stored in a string variable using ord() MRAB <python@mrabarnett.plus.com> - 2012-08-18 19:59 +0100
            Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 07:17 +0000
          Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 10:46 +1000
            Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 19:11 -0700
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 12:19 +1000
                Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 19:35 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 13:01 +1000
                    Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 20:10 -0700
                      Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 13:31 +1000
                        Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 22:58 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 08:01 +0000
                    Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 01:11 -0700
                      Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 18:24 +1000
                        Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 01:44 -0700
                          Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 01:54 -0700
                            Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 11:46 +0100
                            Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 12:31 -0400
                      Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 10:51 +0000
                        Re: How do I display unicode value stored in a string variable using ord() Neil Hodgson <nhodgson@iinet.net.au> - 2012-08-21 17:03 +1000
          Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:09 +0000
            Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 01:04 -0700
              Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 13:25 +0000
                Re: How do I display unicode value stored in a string variable using ord() DJC <djc@news.invalid> - 2012-08-19 17:32 +0200
              Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 13:34 -0400
                Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 10:48 -0700
                  Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 11:11 -0700
                    Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 19:50 +0100
                    Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 17:59 -0400
                    Re: How do I display unicode value stored in a string variable using ord() rusi <rustompmody@gmail.com> - 2012-08-19 23:13 -0700
                  Abuse of Big Oh notation [was Re: How do I display unicode value stored in a string variable using ord()] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 20:15 +0000
                    Re: Abuse of Big Oh notation Paul Rubin <no.email@nospam.invalid> - 2012-08-19 16:42 -0700
                      Re: Abuse of Big Oh notation Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-08-20 09:24 +0100
                        Re: Abuse of Big Oh notation Paul Rubin <no.email@nospam.invalid> - 2012-08-20 09:01 -0700
                          Re: Abuse of Big Oh notation Chris Angelico <rosuav@gmail.com> - 2012-08-21 02:09 +1000
                          Re: Abuse of Big Oh notation Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-20 11:12 -0600
                            Re: Abuse of Big Oh notation Paul Rubin <no.email@nospam.invalid> - 2012-08-20 12:29 -0700
                              Re: Abuse of Big Oh notation 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-20 15:16 -0700
                              Re: Abuse of Big Oh notation 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-20 15:20 -0700
                            Re: Abuse of Big Oh notation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-21 09:53 +0000
                        Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-20 11:42 -0700
                          Re: Abuse of Big Oh notation Ned Deily <nad@acm.org> - 2012-08-20 18:19 -0700
                          Abuse of subject, was Re: Abuse of Big Oh notation Peter Otten <__peter__@web.de> - 2012-08-21 09:52 +0200
                            Re: Abuse of subject, was Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-21 10:16 -0700
                        Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-20 11:42 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Hans Mulder <hansmu@xs4all.nl> - 2012-08-22 20:53 +0200
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-20 08:42 +1000
                Re: How do I display unicode value stored in a string variable using ord() Roy Smith <roy@panix.com> - 2012-08-19 19:24 -0400
                  Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-20 04:21 +0000
                    Re: How do I display unicode value stored in a string variable using ord() Roy Smith <roy@panix.com> - 2012-08-20 00:44 -0400
                      Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-20 05:56 +0000
                        Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 23:24 -0700
                    Re: How do I display unicode value stored in a string variable using ord() Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-08-20 12:58 -0400
              Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 20:35 -0400
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-20 14:07 +1000
            Re: How do I display unicode value stored in a string variable using ord() lipska the kat <lipskathekat@yahoo.co.uk> - 2012-08-19 11:13 +0100
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 20:19 +1000
                Re: How do I display unicode value stored in a string variable using ord() lipska the kat <lipskathekat@yahoo.co.uk> - 2012-08-19 11:49 +0100
        Re: How do I display unicode value stored in a string variable using ord() "Blind Anagram" <noname@nowhere.com> - 2012-08-19 18:03 +0100
          Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 10:33 -0700
            Re: How do I display unicode value stored in a string variable using ord() "Blind Anagram" <noname@nowhere.com> - 2012-08-19 19:04 +0100
          Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-19 14:05 -0400
            Re: How do I display unicode value stored in a string variable using ord() "Blind Anagram" <noname@nowhere.com> - 2012-08-19 19:18 +0100
          Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 20:31 +0000
          Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 17:03 -0400
          Re: How do I display unicode value stored in a string variable using ord() 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-19 17:32 -0700
          Re: How do I display unicode value stored in a string variable using ord() Piet van Oostrum <piet@vanoostrum.org> - 2012-08-20 17:20 -0400

#27579 — Re: Abuse of subject, was Re: Abuse of Big Oh notation

From	wxjmfauth@gmail.com
Date	2012-08-21 10:16 -0700
Subject	Re: Abuse of subject, was Re: Abuse of Big Oh notation
Message-ID	<mailman.3609.1345569376.4697.python-list@python.org>
In reply to	#27556

On Tuesday, 21 August 2012 09:52:09 UTC+2, Peter Otten wrote:
> wxjmfauth@gmail.com wrote:
>
> > By chance and luckily, first attempt.
> >
> > c:\python32\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
> > 1000000 loops, best of 3: 1.48 usec per loop
> > c:\python33\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
> > 100000 loops, best of 3: 7.62 usec per loop
>
> OK, that is roughly factor 5. Let's see what I get:
>
> $ python3.2 -m timeit '("€"*100+"€"*100).replace("€", "œ")'
> 100000 loops, best of 3: 1.8 usec per loop
> $ python3.3 -m timeit '("€"*100+"€"*100).replace("€", "œ")'
> 10000 loops, best of 3: 9.11 usec per loop
>
> That is factor 5, too. So I can replicate your measurement on an AMD64 Linux
> system with self-built 3.3 versus system 3.2.
>
> > Note
> > The used characters are not members of the latin-1 coding
> > scheme (btw an *unusable* coding).
> > They are however characters in cp1252 and mac-roman.
>
> You seem to imply that the slowdown is connected to the inability of latin-1
> to encode "œ" and "€" (to take the examples relevant to the above
> microbench). So let's repeat with latin-1 characters:
>
> $ python3.2 -m timeit '("ä"*100+"ä"*100).replace("ä", "ß")'
> 100000 loops, best of 3: 1.76 usec per loop
> $ python3.3 -m timeit '("ä"*100+"ä"*100).replace("ä", "ß")'
> 10000 loops, best of 3: 10.3 usec per loop
>
> Hm, the slowdown is even a tad bigger. So we can safely dismiss your theory
> that an unfortunate choice of the 8 bit encoding is causing it. Do you
> agree?

- I do not care too much about the numbers. It's
an attempt to show the principles.

- The reason I consider latin-1 a bad coding is
simply that it is unusable for some scripts /
languages. That mainly concerns the coding of
source/text files, which is not really the point here.

- Now, the technical aspect. This "coding" (latin-1)
can be considered a kind of pseudo-coding covering
the Unicode code point range 128..255. Unfortunately,
this "coding" is not very optimal (or can be seen as such)
when you work with the full range of Unicode, but it is fine
when one works only in pure latin-1, with only 256
characters.
This range 128..255 is always the critical part
(all codings considered), and it probably contains
the most commonly used characters.
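
The latin-1 point is easy to check: Python's latin-1 codec maps byte values 0..255 one-to-one onto code points U+0000..U+00FF, so a character is "latin-1" exactly when its code point is below 256. A minimal sketch; `in_latin1` is a hypothetical helper, not part of any library:

```python
# Latin-1 (ISO-8859-1) maps byte values 0..255 one-to-one onto
# Unicode code points U+0000..U+00FF.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
assert decoded == "".join(chr(i) for i in range(256))


def in_latin1(ch):
    """Hypothetical helper: True if `ch` can be encoded as one latin-1 byte."""
    try:
        ch.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


for ch in "äß€œ":
    print(f"{ch!r}: U+{ord(ch):04X}, latin-1: {in_latin1(ch)}")
```

By this test the 'ä'/'ß' benchmark stays entirely inside latin-1, while '€' (U+20AC) and 'œ' (U+0153) fall outside it.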

I hope that was not too confusing.

I have no proof for my theory, but with my experience in that
field, I strongly suspect this is the bottleneck.

Same OS as before.

Py 3.2.3
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[1.5384088242603358, 1.532421642233382, 1.5327445924545433]
>>> timeit.repeat("('ä'*100+'ä'*100).replace('ä', 'ß')")
[1.561762063667686, 1.5443503206462594, 1.5458670051605168]


3.3.0b2
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[7.701523104134512, 7.720358191179441, 7.614549852683501]
>>> timeit.repeat("('ä'*100+'ä'*100).replace('ä', 'ß')")
[4.887939423990709, 4.868787294350611, 4.865697999795991]
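
A minimal sketch for re-running this comparison on a single interpreter, assuming only the standard `timeit` module; the two statements are the ones used above, and the labels and `best_time` helper are hypothetical. Taking the minimum of the repeats is the usual way to reduce noise; absolute numbers will differ by machine and Python version.

```python
import sys
import timeit

# The two micro-benchmarks from the thread: one outside latin-1, one inside it.
STATEMENTS = {
    "euro/oe (outside latin-1)": "('€'*100+'€'*100).replace('€', 'œ')",
    "a-umlaut/sharp s (latin-1)": "('ä'*100+'ä'*100).replace('ä', 'ß')",
}


def best_time(stmt, number=100_000, repeat=3):
    """Return the fastest of `repeat` runs, in microseconds per loop."""
    timings = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(timings) / number * 1e6


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for label, stmt in STATEMENTS.items():
        print(f"{label}: {best_time(stmt):.2f} usec per loop")
```

With its defaults, `timeit.repeat()` runs each statement 1,000,000 times per repeat, so the lists in the post are total seconds per million loops, which reads directly as microseconds per loop. (Python 3.7 and later repeat 5 times by default rather than 3.)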

Quite mysterious!

In any case, it is a regression.

jmf

#27525 — Re: Abuse of Big Oh notation

From	wxjmfauth@gmail.com
Date	2012-08-20 11:42 -0700
Subject	Re: Abuse of Big Oh notation
Message-ID	<mailman.3577.1345488171.4697.python-list@python.org>
In reply to	#27477

By chance, and luckily on the first attempt:

IDLE, Windows 7 Pro 32-bit, Pentium Dual Core 2.6, 2 GB RAM

Py 3.2.3
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[1.6939567134893707, 1.672874290786993, 1.6761219212298073]

Py 3.3.0b2
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[7.924470733910917, 7.8554985620787345, 7.878623849091914]


Console

c:\python32\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
1000000 loops, best of 3: 1.48 usec per loop
c:\python33\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
100000 loops, best of 3: 7.62 usec per loop

Note
The characters used are not part of the latin-1 coding
scheme (which is, by the way, an *unusable* coding).
They are, however, characters in cp1252 and mac-roman.
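
This note can be checked with Python's own codecs: neither character encodes in latin-1, while both occupy a single byte in the Windows cp1252 and classic Mac OS mac_roman code pages. A short sketch using only standard codec names:

```python
# Try each character in each 8-bit codec; None means "not representable".
results = {}
for codec in ("latin-1", "cp1252", "mac_roman"):
    for ch in "€œ":
        try:
            results[(codec, ch)] = ch.encode(codec)
        except UnicodeEncodeError:
            results[(codec, ch)] = None
        print(f"{codec:9} {ch}: {results[(codec, ch)]!r}")
```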

jmf

#27671

From	Hans Mulder <hansmu@xs4all.nl>
Date	2012-08-22 20:53 +0200
Message-ID	<50352abe$0$6873$e4fe514c@news2.news.xs4all.nl>
In reply to	#27405

On 19/08/12 19:48:06, Paul Rubin wrote:
> Terry Reedy <tjreedy@udel.edu> writes:
>>      py> s = chr(0xFFFF + 1)
>>      py> a, b = s
> That looks like a 3.2- narrow build. Such which treat unicode strings
> as sequences of code units rather than sequences of codepoints. Not an
> implementation bug, but compromise design that goes back about a
> decade to when unicode was added to Python.

Actually, this compromise design was new in 3.0.

In 2.x, unicode strings were sequences of code points.
Narrow builds rejected any code points > 0xFFFF:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = unichr(0xFFFF + 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
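
For context on Terry's `a, b = s` example: on a narrow 3.x build before 3.3, a code point above U+FFFF was held as two UTF-16 code units (a surrogate pair), so such a string had length 2. Python 3.3 and later store it as a single code point. A small sketch of the split; `to_surrogates` is a hypothetical helper:

```python
def to_surrogates(cp):
    """Split a supplementary code point (> U+FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000             # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


s = chr(0xFFFF + 1)
print(len(s))  # 1 on Python 3.3+
print([hex(unit) for unit in to_surrogates(ord(s))])
# The UTF-16 codec produces the same two code units:
print(s.encode("utf-16-be").hex())
```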


-- HansM

#27439

From	Chris Angelico <rosuav@gmail.com>
Date	2012-08-20 08:42 +1000
Message-ID	<mailman.3531.1345416176.4697.python-list@python.org>
In reply to	#27360

On Mon, Aug 20, 2012 at 3:34 AM, Terry Reedy <tjreedy@udel.edu> wrote:
> On 8/19/2012 4:04 AM, Paul Rubin wrote:
>> I realize the folks who designed and implemented PEP 393 are very smart
>> cookies and considered stuff carefully, while I'm just an internet user
>> posting an immediate impression of something I hadn't seen before (I
>> still use Python 2.6), but I still have to ask: if the 393 approach
>> makes sense, why don't other languages do it?
>
> Python has often copied or borrowed, with adjustments. This time it is the
> first. We will see how it goes, but it has been tested for nearly a year
> already.

Maybe it wasn't consciously borrowed, but whatever innovation is done,
there's usually an obscure beardless language that did it earlier. :)

Pike has a single string type, which can use the full Unicode range.
If all codepoints are <256, the string width is 8 (measured in bits);
if <65536, width is 16; otherwise 32. Using the inbuilt count_memory
function (similar to the Python function used somewhere earlier in
this thread, but which I can't at present put my finger on), I find
that for strings of 16 bytes or more, there's a fixed 20-byte header
plus the string content, stored in the correct number of bytes. (Pike
strings, like Python ones, are immutable and do not need expansion
room.)
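
The width rule described here for Pike can be written down directly. This is a Python sketch of that rule and of the measured size (a 20-byte header plus content, for strings of 16 content bytes or more); the function names are hypothetical, and the sizes reflect the measurement above, not a documented Pike guarantee. PEP 393 chooses among the same three widths, with pure ASCII flagged separately.

```python
def pike_width(s):
    """Bits per character under the rule described above (a sketch)."""
    highest = max(map(ord, s), default=0)
    if highest < 256:
        return 8
    if highest < 65536:
        return 16
    return 32


def estimated_pike_bytes(s):
    """Rough size from the measurement above: a 20-byte header plus the
    content, observed for strings of 16 content bytes or more."""
    return 20 + len(s) * pike_width(s) // 8


for text in ("hello world", "café", "€" * 10, "\U0001F600"):
    print(repr(text), pike_width(text), "bits/char")
```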

However, Python goes a bit further by making it VERY clear that this
is a mere optimization, and that Unicode strings and bytes strings are
completely different beasts. In Pike, it's possible to forget to
encode something before (say) writing it to a socket. Everything works
fine while you have only ASCII characters in the string, and then
breaks when you have a >255 codepoint - or perhaps worse, when you
have a 127<x<256, and the other end misinterprets it.
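The 127<x<256 case can be reproduced in Python by deliberately mismatching encodings between sender and receiver. This sketch only shows the symptom; it assumes the sender emitted Latin-1 (one byte per code point below 256) while the receiver expected UTF-8, or the reverse.

```python
text = "caf\u00e9"                      # 'café'; é is U+00E9, so 127 < 233 < 256

latin1_bytes = text.encode("latin-1")   # one byte per character: the é is the single byte 0xE9
utf8_bytes = text.encode("utf-8")       # é becomes the two bytes 0xC3 0xA9

# Receiver assumes UTF-8 but was sent Latin-1: the lone 0xE9 byte is invalid UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# Receiver assumes Latin-1 but was sent UTF-8: no error at all, just mojibake.
print(utf8_bytes.decode("latin-1"))     # cafÃ©
```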

Really, the only viable alternative to PEP 393 is a fixed 32-bit
representation - it's the only way that's guaranteed to provide
equivalent semantics. The new storage format is guaranteed to take no
more memory than that, and provide equivalent functionality.
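On CPython 3.3 or later, the PEP 393 layout (and the point that it never needs more than a fixed 32-bit representation) can be observed with `sys.getsizeof`: lengthening a string by one character grows it by 1, 2 or 4 bytes, depending on its widest code point. The absolute sizes include a header that varies between versions, so this sketch compares only the per-character increment. `bytes_per_char` is a hypothetical helper, and other Python implementations may store strings differently.

```python
import sys

def bytes_per_char(ch, n=100):
    """Estimate storage per character: size of n+1 copies of ch minus size of n copies."""
    # Both strings are freshly built, so neither carries a cached UTF-8 copy.
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    # Expected on CPython 3.3+: 1, 1, 2 and 4 bytes respectively.
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```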

ChrisA

#27441

From	Roy Smith <roy@panix.com>
Date	2012-08-19 19:24 -0400
Message-ID	<roy-EAC738.19243019082012@news.panix.com>
In reply to	#27439

In article <mailman.3531.1345416176.4697.python-list@python.org>,
 Chris Angelico <rosuav@gmail.com> wrote:

> Really, the only viable alternative to PEP 393 is a fixed 32-bit
> representation - it's the only way that's guaranteed to provide
> equivalent semantics. The new storage format is guaranteed to take no
> more memory than that, and provide equivalent functionality.

In the primordial days of computing, using 8 bits to store a character 
was a profligate waste of memory.  What on earth did people need with 
TWO cases of the alphabet (not to mention all sorts of weird 
punctuation)?  Eventually, memory became cheap enough that the 
convenience of using one character per byte (not to mention 8-bit bytes) 
outweighed the costs.  And crazy things like sixbit and rad-50 got swept 
into the dustbin of history.

So it may be with utf-8 someday.

Clearly, the world has moved to a 32-bit character set.  Not all parts 
of the world know that yet, or are willing to admit it, but that doesn't 
negate the fact that it's true.  Equally clearly, the concept of one 
character per byte is a big win.  The obvious conclusion is that 
eventually, when memory gets cheap enough, we'll all be doing utf-32 and 
all this transcoding nonsense will look as antiquated as rad-50 does 
today.
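To put numbers on the trade-off Roy describes: UTF-32 always spends four bytes per code point, while UTF-8 spends one to four. A quick comparison using Python's codecs, with the BOM-less little-endian UTF-32 codec so that the count is purely per character; the sample strings are arbitrary.

```python
samples = {
    "ascii": "hello, world",
    "french": "d\u00e9j\u00e0 vu",     # 'déjà vu'
    "euro": "\u20ac100",                # '€100'
    "emoji": "\U0001F600",              # one code point outside the BMP
}

for name, text in samples.items():
    utf8 = len(text.encode("utf-8"))
    # 'utf-32-le' omits the 4-byte BOM that plain 'utf-32' prepends.
    utf32 = len(text.encode("utf-32-le"))
    print(f"{name:7} {len(text):3} chars  utf-8: {utf8:3} bytes  utf-32: {utf32:3} bytes")
```

For pure ASCII text UTF-32 is four times larger; the gap narrows as more characters need multi-byte UTF-8 sequences.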

#27458

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-08-20 04:21 +0000
Message-ID	<5031bb2f$0$29972$c3e8da3$5496439d@news.astraweb.com>
In reply to	#27441

On Sun, 19 Aug 2012 19:24:30 -0400, Roy Smith wrote:

> In the primordial days of computing, using 8 bits to store a character
> was a profligate waste of memory.  What on earth did people need with
> TWO cases of the alphabet 

That's obvious, surely? We need two cases so that we can distinguish 
helping Jack off a horse from helping jack off a horse.

> (not to mention all sorts of weird
> punctuation)?  Eventually, memory became cheap enough that the
> convenience of using one character per byte (not to mention 8-bit bytes)
> outweighed the costs.  And crazy things like sixbit and rad-50 got swept
> into the dustbin of history.

8-bit bytes are much older than 8-bit characters. For a long time, ASCII 
characters used only 7 of the 8 bits.

> So it may be with utf-8 someday.

Only if you believe that people's ability to generate data will remain 
lower than people's ability to install more storage.

Every few years, new sizes for storage media come out. The first thing 
that happens is that people say "40 megabytes? I'll NEVER fill this hard 
drive up!". The second thing that happens is that they say "Dammit, my 40 
MB hard drive is full, and a new one is too expensive, better delete some 
files." Followed shortly by "400 megabytes? I'll NEVER use that much 
space!" -- wash, rinse, repeat, through megabytes, gigabytes, terabytes, 
and it will happen for petabytes next.

So long as our ability to generate data continues to outstrip our 
storage, compression and memory-efficient storage schemes will remain in 
demand.

-- 
Steven

#27460

From	Roy Smith <roy@panix.com>
Date	2012-08-20 00:44 -0400
Message-ID	<roy-A160F1.00442220082012@news.panix.com>
In reply to	#27458

In article <5031bb2f$0$29972$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> > So it may be with utf-8 someday.
> 
> Only if you believe that people's ability to generate data will remain 
> lower than people's ability to install more storage.

We're not talking *data*, we're talking *text*.  Most of those 
whatever-bytes people are generating are images, video, and music.  Text 
is a pittance compared to those.

In any case, text on disk can easily be stored compressed.  I would 
expect the UTF-8 and UTF-32 versions of a text file to compress to just 
about the same size.
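Roy's expectation is easy to test on any given corpus. A sketch using `zlib` as a stand-in compressor and a seeded, synthetic word salad as a hypothetical stand-in for a real text file; real results depend on both the data and the compressor, so treat the numbers as illustrative only.

```python
import random
import zlib

# Deterministic, mostly-ASCII sample text built from a small vocabulary.
rng = random.Random(0)
words = ["unicode", "string", "python", "memory", "storage", "encoding",
         "byte", "text", "compress", "caf\u00e9", "na\u00efve"]
text = " ".join(rng.choice(words) for _ in range(5000))

raw8 = text.encode("utf-8")
raw32 = text.encode("utf-32-le")        # no BOM, exactly 4 bytes per code point
z8 = zlib.compress(raw8, 9)
z32 = zlib.compress(raw32, 9)

print(f"raw:        utf-8 {len(raw8):6}  utf-32 {len(raw32):6}  ratio {len(raw32) / len(raw8):.2f}")
print(f"compressed: utf-8 {len(z8):6}  utf-32 {len(z32):6}  ratio {len(z32) / len(z8):.2f}")
```

The interesting figure is how much the second ratio shrinks compared with the first.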

#27461

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-08-20 05:56 +0000
Message-ID	<5031d17a$0$1645$c3e8da3$76491128@news.astraweb.com>
In reply to	#27460

On Mon, 20 Aug 2012 00:44:22 -0400, Roy Smith wrote:

> In article <5031bb2f$0$29972$c3e8da3$5496439d@news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> 
>> > So it may be with utf-8 someday.
>> 
>> Only if you believe that people's ability to generate data will remain
>> lower than people's ability to install more storage.
> 
> We're not talking *data*, we're talking *text*.  Most of those
> whatever-bytes people are generating are images, video, and music.  Text
> is a pittance compared to those.

Paul Rubin already told you about his experience using OCR to generate 
multiple terabytes of text, and how he would not be happy if that was 
stored in UCS-4.

HTML is text. XML is text. SVG is text. Source code is text. Email is 
text. (Well, it's actually bytes, but it looks like ASCII text.) Log 
files are text, and they can fill a hard drive pretty quickly. Lots of 
data is text.

Pittance or not, I do not believe that people will widely abandon compact 
storage formats like UTF-8 and Latin-1 for UCS-4 any time soon. Given 
that we're still trying to convince people to use UTF-8 over ASCII, I 
reckon it will be at least 40 years before there's even a slim chance of 
migrating from UTF-8 to UCS-4 in a widespread manner. In the IT world, 
that's close enough to "never" -- we might not even be using Unicode in 
2052.

In any case, time will tell who is right.

-- 
Steven

#27464

From	Paul Rubin <no.email@nospam.invalid>
Date	2012-08-19 23:24 -0700
Message-ID	<7xzk5qjd1k.fsf@ruckus.brouhaha.com>
In reply to	#27461

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> Paul Rubin already told you about his experience using OCR to generate 
> multiple terabytes of text, and how he would not be happy if that was 
> stored in UCS-4.

That particular text was stored on disk as compressed XML that had UTF-8
in the data fields, but I think Roy is right that it would have
compressed to around the same size in UCS-4.  Converting it to UCS-4 on
input would have bloated up the memory footprint and that was the issue
of concern to me.

> Pittance or not, I do not believe that people will widely abandon compact 
> storage formats like UTF-8 and Latin-1 for UCS-4 any time soon.

Looking at http://www.icu-project.org/, the C++ classes seem to use
UTF-16, sort of like Python 3.2 :(.  I'm not certain of this, though.
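The practical difference shows up in what "length" means. UTF-16 storage, as in a narrow Python 3.2 build (and, if Paul is right, ICU's C++ classes), needs two 16-bit code units for any character outside the Basic Multilingual Plane. A sketch that counts both on Python 3.3+; `utf16_units` is a hypothetical helper.

```python
def utf16_units(s):
    """Count the 16-bit code units s occupies in UTF-16 (the -le codec adds no BOM)."""
    return len(s.encode("utf-16-le")) // 2

for s in ["abc", "\u20ac", "\U0001F600", "a\U0001F600b"]:
    # len() counts code points on Python 3.3+; a narrow 3.2 build reported UTF-16 units.
    print(ascii(s), "code points:", len(s), "UTF-16 units:", utf16_units(s))
```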

#27508

From	Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date	2012-08-20 12:58 -0400
Message-ID	<mailman.3564.1345481890.4697.python-list@python.org>
In reply to	#27458

On 20 Aug 2012 04:21:04 GMT, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> declaimed the following in
gmane.comp.python.general:

> 
> That's obvious, surely? We need two cases so that we can distinguish 
> helping Jack off a horse from helping jack off a horse.
>
	I'm sure the horse will appreciate either action... <G>
 
> Every few years, new sizes for storage media come out. The first thing 
> that happens is that people say "40 megabytes? I'll NEVER fill this hard 
> drive up!". The second thing that happens is that they say "Dammit, my 40 
> MB hard drive is full, and a new one is too expensive, better delete some 
> files." Followed shortly by "400 megabytes? I'll NEVER use that much 
> space!" -- wash, rinse, repeat, through megabytes, gigabytes, terabytes, 
> and it will happen for petabytes next.
>
	Don't remind me... I pulled an introductory digital photography
guide out of storage last week...

	The device it uses to transfer data from SmartMedia cards takes two
button-cell batteries; put the SM card into it, and slide it into a 3.5"
floppy drive.

	100MB ZIP drives were suggested for sharing image files with
friends...

	And if I ever take my Amiga out of storage, I'm not sure I can even
find external SCSI drives for it... It could use a few more 1GB drives
<G>

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

#27445

From	Terry Reedy <tjreedy@udel.edu>
Date	2012-08-19 20:35 -0400
Message-ID	<mailman.3532.1345422959.4697.python-list@python.org>
In reply to	#27360

On 8/19/2012 6:42 PM, Chris Angelico wrote:
> On Mon, Aug 20, 2012 at 3:34 AM, Terry Reedy <tjreedy@udel.edu> wrote:

>> Python has often copied or borrowed, with adjustments. This time it is the
>> first.

I should have added 'that I know of' ;-)

> Maybe it wasn't consciously borrowed, but whatever innovation is done,
> there's usually an obscure beardless language that did it earlier. :)
>
> Pike has a single string type, which can use the full Unicode range.
> If all codepoints are <256, the string width is 8 (measured in bits);
> if <65536, width is 16; otherwise 32. Using the inbuilt count_memory
> function (similar to the Python function used somewhere earlier in
> this thread, but which I can't at present put my finger to), I find
> that for strings of 16 bytes or more, there's a fixed 20-byte header
> plus the string content, stored in the correct number of bytes. (Pike
> strings, like Python ones, are immutable and do not need expansion
> room.)

It is possible that someone involved was even vaguely aware that 
there was an antecedent. The PEP makes no such claim that I can see; it 
lays out the problem and goes right to the details of a Python implementation.

> However, Python goes a bit further by making it VERY clear that this
> is a mere optimization, and that Unicode strings and bytes strings are
> completely different beasts. In Pike, it's possible to forget to
> encode something before (say) writing it to a socket. Everything works
> fine while you have only ASCII characters in the string, and then
> breaks when you have a >255 codepoint - or perhaps worse, when you
> have a 127<x<256, and the other end misinterprets it.

Python writes strings to file objects, including open sockets, without 
creating a bytes object -- IF the file is opened in text mode, which 
always has an associated encoding, even if it is only a default such as 
'ascii'. From what you say, this is what Pike is missing.
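This can be illustrated with the `io` module: a text-mode stream owns an encoding and converts each `str` as it is written. In this sketch `io.BytesIO` stands in for the underlying file or socket (a real socket could be wrapped in text mode with `socket.makefile("w", encoding=...)`), and the encodings are chosen only for illustration.

```python
import io

raw = io.BytesIO()                               # stand-in for a binary file or socket
text_stream = io.TextIOWrapper(raw, encoding="utf-8")
text_stream.write("\u20ac")                      # write a str: the euro sign
text_stream.flush()                              # push the encoded bytes to the buffer
print(raw.getvalue())                            # b'\xe2\x82\xac'

# The same write through an ASCII text stream fails loudly instead of sending garbage.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_stream.write("\u20ac")
    ascii_stream.flush()
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```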

I am pretty sure that the obvious optimization has already been done. 
The internal bytes of all-ascii text can safely be sent to a file with 
ascii (or ascii-compatible) encoding without intermediate 'decoding'. I 
remember several patches of that sort. If a string is internally ucs2 
and the file is declared ucs2 or utf-16 encoding, then again, pairs of 
bytes can go directly (possibly with a byte swap).
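The facts that these two shortcuts rely on can be checked at the Python level. This only shows why a direct copy would produce the right bytes; it says nothing about whether CPython's I/O layer actually takes such a shortcut.

```python
ascii_text = "plain ASCII text"
# For pure-ASCII text, these encodings produce byte-for-byte identical output,
# so a 1-byte internal buffer could in principle be copied out unchanged.
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8") == ascii_text.encode("latin-1")

bmp_text = "\u20ac\u00e9"                  # both code points fit in 16 bits
le = bmp_text.encode("utf-16-le")          # bytes AC 20 E9 00
be = bmp_text.encode("utf-16-be")          # bytes 20 AC 00 E9
# Each 2-byte unit holds the same value; only the byte order within each pair differs.
swapped = bytes(b for i in range(0, len(le), 2) for b in (le[i + 1], le[i]))
assert swapped == be
```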

-- 
Terry Jan Reedy

#27457

From	Chris Angelico <rosuav@gmail.com>
Date	2012-08-20 14:07 +1000
Message-ID	<mailman.3536.1345435662.4697.python-list@python.org>
In reply to	#27360

On Mon, Aug 20, 2012 at 10:35 AM, Terry Reedy <tjreedy@udel.edu> wrote:
> On 8/19/2012 6:42 PM, Chris Angelico wrote:
>> However, Python goes a bit further by making it VERY clear that this
>> is a mere optimization, and that Unicode strings and bytes strings are
>> completely different beasts. In Pike, it's possible to forget to
>> encode something before (say) writing it to a socket. Everything works
>> fine while you have only ASCII characters in the string, and then
>> breaks when you have a >255 codepoint - or perhaps worse, when you
>> have a 127<x<256, and the other end misinterprets it.
>
> Python writes strings to file objects, including open sockets, without
> creating a bytes object -- IF the file is opened in text mode, which always
> has an associated encoding, even if the default 'ascii'. From what you say,
> this is what Pike is missing.

In text mode, the library does the encoding, but an encoding still happens.

> I am pretty sure that the obvious optimization has already been done. The
> internal bytes of all-ascii text can safely be sent to a file with ascii (or
> ascii-compatible) encoding without intermediate 'decoding'. I remember
> several patches of that sort. If a string is internally ucs2 and the file is
> declared ucs2 or utf-16 encoding, then again, pairs of bytes can go directly
> (possibly with a byte swap).

Maybe it doesn't involve any change in memory, but there is a change of
data type. A Unicode string cannot be sent over the network; it has to
be encoded first.

In Pike, I can take a string like "\x20AC" (or "\u20ac" or
"\U000020ac", same thing) and manipulate it as a one-character string,
but I cannot write it to a file or file-like object. I can, however,
pass it through a codec (and there's string_to_utf8() for the
convenience of the common case), and get back something like
"\xe2\x82\xac", which is a three-byte string. The thing is, though,
that this new string is of exactly the same data type as the original:
'string'. Which means that I could have a string containing Latin-1
but not ASCII characters, and Pike will happily write it to a socket
without raising a compile-time or run-time error. Python, under the
same circumstances, would either raise an error or quietly (and
correctly) encode the data.
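The contrast can be sketched in Python: encoding produces a value of a different type, and a binary stream refuses `str` outright instead of guessing. `io.BytesIO` stands in for a socket here, and `str.encode("utf-8")` plays the role of Pike's `string_to_utf8()`.

```python
import io

euro = "\u20ac"                      # one character, like Pike's "\x20AC"
encoded = euro.encode("utf-8")       # three bytes, like string_to_utf8()
print(len(euro), len(encoded))       # 1 3

# Unlike Pike, Python 3 keeps str and bytes as distinct types:
# a binary stream (or a socket's send()) refuses text outright.
binary_stream = io.BytesIO()
try:
    binary_stream.write(euro)
except TypeError as exc:
    print("rejected:", exc)
binary_stream.write(encoded)         # bytes are accepted
print(binary_stream.getvalue())      # b'\xe2\x82\xac'
```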

But this is a relatively trivial point, in the scheme of things.
Python has an excellent model now for handling Unicode strings, and I
would STRONGLY recommend that everyone upgrade to 3.3.

ChrisA

#27371

From	lipska the kat <lipskathekat@yahoo.co.uk>
Date	2012-08-19 11:13 +0100
Message-ID	<Foudnf2f142gIa3NnZ2dnUVZ8uCdnZ2d@bt.com>
In reply to	#27349

On 19/08/12 07:09, Steven D'Aprano wrote:
> This is a long post. If you don't feel like reading an essay, skip to the
> very bottom and read my last few paragraphs, starting with "To recap".

Thank you for this excellent post,
it has certainly cleared up a few things for me

[snip]

incidentally

 > But in UTF-16, ...

[snip]

 > py>  s = chr(0xFFFF + 1)
 > py>  a, b = s
 > py>  a
 > '\ud800'
 > py>  b
 > '\udc00'

in IDLE

Python 3.2.3 (default, May  3 2012, 15:51:42)
[GCC 4.6.3] on linux2
Type "copyright", "credits" or "license()" for more information.
==== No Subprocess ====
 >>> s = chr(0xFFFF + 1)
 >>> a, b = s
Traceback (most recent call last):
   File "<pyshell#1>", line 1, in <module>
     a, b = s
ValueError: need more than 1 value to unpack

At a terminal prompt

[lipska@ubuntu ~]$ python3.2
Python 3.2.3 (default, Jul 17 2012, 14:23:10)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> s = chr(0xFFFF + 1)
 >>> a, b = s
 >>> a
'\ud800'
 >>> b
'\udc00'
 >>>

The date stamp is different but the Python version is the same

No idea why this is happening, I just thought it was interesting

lipska

-- 
Lipska the Kat©: Troll hunter, sandbox destroyer
and farscape dreamer of Aeryn Sun

#27372

From	Chris Angelico <rosuav@gmail.com>
Date	2012-08-19 20:19 +1000
Message-ID	<mailman.3490.1345371554.4697.python-list@python.org>
In reply to	#27371

On Sun, Aug 19, 2012 at 8:13 PM, lipska the kat
<lipskathekat@yahoo.co.uk> wrote:
> The date stamp is different but the Python version is the same

Check out what 'sys.maxunicode' is in each of those Pythons. It's
possible that one is a wide build and the other narrow.
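Why the two builds differ: a narrow build stores a character above U+FFFF as a UTF-16 surrogate pair, so `chr(0xFFFF + 1)` has length 2 and unpacks into two values, whereas a wide build (and every CPython from 3.3 on) stores it as a single code point. A sketch of the surrogate arithmetic, using the constants from the UTF-16 definition; `to_surrogate_pair` is a hypothetical helper.

```python
import sys

# On Python 3.3+ this is always 1114111 (0x10FFFF); on 3.2 and earlier it was
# 65535 for a narrow build and 1114111 for a wide build.
print(sys.maxunicode)

def to_surrogate_pair(cp):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need surrogates")
    offset = cp - 0x10000                 # a 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0xFFFF + 1)
print(hex(high), hex(low))                # 0xd800 0xdc00
```

These are the '\ud800' and '\udc00' values the narrow build produced in lipska's session.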

ChrisA

#27378

From	lipska the kat <lipskathekat@yahoo.co.uk>
Date	2012-08-19 11:49 +0100
Message-ID	<oMSdnan41PQuWa3NnZ2dnUVZ8kqdnZ2d@bt.com>
In reply to	#27372

On 19/08/12 11:19, Chris Angelico wrote:
> On Sun, Aug 19, 2012 at 8:13 PM, lipska the kat
> <lipskathekat@yahoo.co.uk>  wrote:
>> The date stamp is different but the Python version is the same
>
> Check out what 'sys.maxunicode' is in each of those Pythons. It's
> possible that one is a wide build and the other narrow.

Ah ...

I built my local version from source
and, no, I didn't read the makefile, so I didn't configure for a wide
build :-( Not that I would have known the difference at that time.

[lipska@ubuntu ~]$ python3.2
Python 3.2.3 (default, Jul 17 2012, 14:23:10)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import sys
 >>> sys.maxunicode
65535
 >>>

Later, I did an apt-get install idle3, which pulled
down a precompiled IDLE from the Ubuntu repos.
This was obviously compiled 'wide':

Python 3.2.3 (default, May  3 2012, 15:51:42)
[GCC 4.6.3] on linux2
Type "copyright", "credits" or "license()" for more information.
==== No Subprocess ====
 >>> import sys
 >>> sys.maxunicode
1114111
 >>>

All very interesting and enlightening

Thanks

lipska

-- 
Lipska the Kat©: Troll hunter, sandbox destroyer
and farscape dreamer of Aeryn Sun

#27401

From	"Blind Anagram" <noname@nowhere.com>
Date	2012-08-19 18:03 +0100
Message-ID	<XI2dncR-HMTzgazNnZ2dnUVZ8jCdnZ2d@brightview.co.uk>
In reply to	#27291

"Steven D'Aprano"  wrote in message 
news:502f8a2a$0$29978$c3e8da3$5496439d@news.astraweb.com...

On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote:

[...]
If you can consistently replicate a 100% to 1000% slowdown in string
handling, please report it as a performance bug:

http://bugs.python.org/

Don't forget to report your operating system.

====================================================
For interest, I ran your code snippets on my laptop (Intel core-i7 1.8GHz) 
running Windows 7 x64.

Running Python from a Windows command prompt,  I got the following on Python 
3.2.3 and 3.3 beta 2:

python33\python" -m timeit "('abc' * 1000).replace('c', 'de')"
10000 loops, best of 3: 39.3 usec per loop
python33\python" -m timeit "('ab…' * 1000).replace('…', '……')"
10000 loops, best of 3: 51.8 usec per loop
python33\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
10000 loops, best of 3: 52 usec per loop
python33\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
10000 loops, best of 3: 50.3 usec per loop
python33\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
10000 loops, best of 3: 51.6 usec per loop
python33\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
10000 loops, best of 3: 38.3 usec per loop
python33\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
10000 loops, best of 3: 50.3 usec per loop

python32\python" -m timeit "('abc' * 1000).replace('c', 'de')"
10000 loops, best of 3: 24.5 usec per loop
python32\python" -m timeit "('ab…' * 1000).replace('…', '……')"
10000 loops, best of 3: 24.7 usec per loop
python32\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
10000 loops, best of 3: 24.8 usec per loop
python32\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
10000 loops, best of 3: 24 usec per loop
python32\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
10000 loops, best of 3: 24.1 usec per loop
python32\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
10000 loops, best of 3: 24.4 usec per loop
python32\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
10000 loops, best of 3: 24.3 usec per loop

This is an average slowdown by a factor of close to 2.3 on 3.3 when compared 
with 3.2.

I am not posting this to perpetuate this thread but simply to ask whether, 
as you suggest, I should report this as a possible problem with the beta?
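For anyone wanting to repeat such measurements from inside Python rather than from the command line, the `timeit` module offers the same "best of N" approach. A minimal sketch covering three of the statements above (an arbitrary selection); the loop count is reduced from the 10000 that `python -m timeit` chose, and results will vary with hardware, turbo boost and Python version.

```python
import timeit

LOOPS = 1000   # python -m timeit picked 10000 automatically; fewer keeps this quick
REPEATS = 3    # "best of 3"

# Three of the statements timed above; \u2026 is the ellipsis character.
statements = [
    "('abc' * 1000).replace('c', 'de')",
    "('ab\u2026' * 1000).replace('\u2026', '\u2026\u2026')",
    "('XYZ' * 1000).replace('X', '\u00e9\u00e7')",
]

for stmt in statements:
    # Time LOOPS executions REPEATS times and keep the fastest run.
    best = min(timeit.repeat(stmt, number=LOOPS, repeat=REPEATS))
    print(f"{best / LOOPS * 1e6:6.1f} usec per loop: {stmt!a}")
```

Taking the minimum rather than the mean follows the `timeit` documentation's advice: the fastest run is the one least disturbed by other activity on the machine.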

#27404

From	wxjmfauth@gmail.com
Date	2012-08-19 10:33 -0700
Message-ID	<5dfd1779-9442-4858-9161-8f1a06d56829@googlegroups.com>
In reply to	#27401

On Sunday, 19 August 2012 19:03:34 UTC+2, Blind Anagram wrote:
> [snip]
> Don't forget to report your operating system.
> [snip]

I use Win7 Pro 32-bit on Intel.

Thanks for reporting these numbers.
To be clear: I'm not complaining, but the fact that
there is a slowdown is, in my mind, a clear indication
that there is a problem somewhere.

jmf

#27412

From	"Blind Anagram" <noname@nowhere.com>
Date	2012-08-19 19:04 +0100
Message-ID	<j6ydnQjPqNFZt6zNnZ2dnUVZ8kGdnZ2d@brightview.co.uk>
In reply to	#27404

wrote in message 
news:5dfd1779-9442-4858-9161-8f1a06d56829@googlegroups.com...

On Sunday, 19 August 2012 19:03:34 UTC+2, Blind Anagram wrote:
> [snip]
> Don't forget to report your operating system.
> [snip]

I use Win7 Pro 32-bit on Intel.

Thanks for reporting these numbers.
To be clear: I'm not complaining, but the fact that
there is a slowdown is, in my mind, a clear indication
that there is a problem somewhere.

====================================
I may be reading your input wrongly, but it seems to me that you are not 
only reporting a slowdown but you are also suggesting that this slowdown is 
the result of bad design decisions by the Python development team.

I don't want to get involved in the latter part of your argument because I 
am convinced that the Python team are doing their very best to find a good 
compromise between the various design constraints that they face in meeting 
these needs.

Nevertheless, the post that I responded to contained the suggestion that 
slowdowns above 100% (which I took as a factor of 2) would be worth 
reporting as a possible bug.  So I thought that it was worth asking about 
this as I may have misunderstood the level of slowdown that is worth 
reporting.  There is also a potential problem in timings on laptops with 
turbo-boost (as I have), although the times look fairly consistent.

#27415

From	Dave Angel <d@davea.name>
Date	2012-08-19 14:05 -0400
Message-ID	<mailman.3519.1345399574.4697.python-list@python.org>
In reply to	#27401

On 08/19/2012 01:03 PM, Blind Anagram wrote:
> "Steven D'Aprano"  wrote in message
> news:502f8a2a$0$29978$c3e8da3$5496439d@news.astraweb.com...
>
> On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote:
>
> [...]
> If you can consistently replicate a 100% to 1000% slowdown in string
> handling, please report it as a performance bug:
>
> http://bugs.python.org/
>
> Don't forget to report your operating system.
>
> ====================================================
> For interest, I ran your code snippets on my laptop (Intel core-i7
> 1.8GHz) running Windows 7 x64.
>
> Running Python from a Windows command prompt,  I got the following on
> Python 3.2.3 and 3.3 beta 2:
>
> python33\python" -m timeit "('abc' * 1000).replace('c', 'de')"
> 10000 loops, best of 3: 39.3 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', '……')"
> 10000 loops, best of 3: 51.8 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
> 10000 loops, best of 3: 52 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
> 10000 loops, best of 3: 50.3 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
> 10000 loops, best of 3: 51.6 usec per loop
> python33\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
> 10000 loops, best of 3: 38.3 usec per loop
> python33\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
> 10000 loops, best of 3: 50.3 usec per loop
>
> python32\python" -m timeit "('abc' * 1000).replace('c', 'de')"
> 10000 loops, best of 3: 24.5 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', '……')"
> 10000 loops, best of 3: 24.7 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
> 10000 loops, best of 3: 24.8 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
> 10000 loops, best of 3: 24 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
> 10000 loops, best of 3: 24.1 usec per loop
> python32\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
> 10000 loops, best of 3: 24.4 usec per loop
> python32\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
> 10000 loops, best of 3: 24.3 usec per loop
>
> This is an average slowdown by a factor of close to 2.3 on 3.3 when
> compared with 3.2.
>

Using your measurement numbers, I get an average of 1.95, not 2.3
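Dave's figure can be checked directly from the quoted timings. Whether you take the ratio of the average times or the average of the per-test ratios, the seven pairs of measurements give roughly 1.95.

```python
# usec-per-loop figures from the runs quoted above, in the same order for both versions.
py33 = [39.3, 51.8, 52.0, 50.3, 51.6, 38.3, 50.3]
py32 = [24.5, 24.7, 24.8, 24.0, 24.1, 24.4, 24.3]

def mean(values):
    return sum(values) / len(values)

ratio_of_means = mean(py33) / mean(py32)                     # total 3.3 time / total 3.2 time
mean_of_ratios = mean([a / b for a, b in zip(py33, py32)])   # average per-test slowdown
print(f"ratio of means: {ratio_of_means:.2f}")   # 1.95
print(f"mean of ratios: {mean_of_ratios:.2f}")   # 1.95
```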



-- 

DaveA
