comp.lang.python: thread #27204 (unrolled view)
| Started by | Charles Jensen <hopefullycharles@gmail.com> |
|---|---|
| First post | 2012-08-16 15:09 -0700 |
| Last post | 2012-08-20 17:20 -0400 |
| Articles | 20 on this page of 145 — 26 participants |
How do I display unicode value stored in a string variable using ord() Charles Jensen <hopefullycharles@gmail.com> - 2012-08-16 15:09 -0700
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-17 08:20 +1000
Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-16 18:47 -0400
Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-16 19:59 -0400
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 10:49 -0700
Re: How do I display unicode value stored in a string variable using ord() Jerry Hill <malaclypse2@gmail.com> - 2012-08-17 14:21 -0400
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 11:45 -0700
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 11:45 -0700
Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-17 16:55 -0400
Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-17 23:30 -0400
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 04:10 +0000
Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-18 09:18 -0600
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 03:59 +0000
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 10:49 -0700
Re: How do I display unicode value stored in a string variable using ord() Alister <alister.ware@ntlworld.com> - 2012-08-17 06:30 +0000
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 01:09 -0700
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 12:27 +0000
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 08:07 -0700
Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 16:25 +0100
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 01:36 +1000
Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-18 09:51 -0600
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 09:38 -0700
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 02:57 +1000
Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 18:28 +0100
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 11:05 -0700
Re: How do I display unicode value stored in a string variable using ord() MRAB <python@mrabarnett.plus.com> - 2012-08-18 19:34 +0100
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:35 +0000
New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord() Peter Otten <__peter__@web.de> - 2012-08-19 09:43 +0200
Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 08:56 +0000
Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 02:24 -0700
Re: New internal string format in 3.3 Peter Otten <__peter__@web.de> - 2012-08-19 11:37 +0200
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 03:19 -0700
Re: New internal string format in 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 13:33 +0000
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 03:19 -0700
Re: New internal string format in 3.3 Chris Angelico <rosuav@gmail.com> - 2012-08-19 20:26 +1000
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:14 -0700
Re: New internal string format in 3.3 Dave Angel <d@davea.name> - 2012-08-19 08:29 -0400
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:59 -0700
Re: New internal string format in 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 14:46 +0100
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 07:09 -0700
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 07:09 -0700
Re: New internal string format in 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 15:48 +0100
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 09:19 -0700
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 09:19 -0700
Re: New internal string format in 3.3 Terry Reedy <tjreedy@udel.edu> - 2012-08-19 13:48 -0400
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 10:51 -0700
Re: New internal string format in 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 19:09 +0100
Re: New internal string format in 3.3 Chris Angelico <rosuav@gmail.com> - 2012-08-20 07:50 +1000
Re: New internal string format in 3.3 Michael Torrie <torriem@gmail.com> - 2012-08-19 23:38 -0600
Re: New internal string format in 3.3 Roy Smith <roy@panix.com> - 2012-08-20 09:17 -0400
Re: New internal string format in 3.3 Michael Torrie <torriem@gmail.com> - 2012-08-20 22:18 -0600
Re: New internal string format in 3.3 Roy Smith <roy@panix.com> - 2012-08-21 07:48 -0400
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 10:51 -0700
Re: New internal string format in 3.3 Terry Reedy <tjreedy@udel.edu> - 2012-08-19 13:56 -0400
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:59 -0700
Re: New internal string format in 3.3 Dave Angel <d@davea.name> - 2012-08-19 08:35 -0400
Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:14 -0700
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:30 +0000
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 11:05 -0700
Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-18 16:09 -0400
Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-18 23:12 -0400
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 09:38 -0700
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:33 +0000
Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-19 11:50 -0600
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 11:20 -0700
Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-19 12:31 -0600
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 12:23 -0700
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 20:16 +0000
Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-19 12:46 -0600
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 17:59 +0000
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 11:30 -0700
Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 20:45 +0100
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:13 +0000
Re: How do I display unicode value stored in a string variable using ord() rusi <rustompmody@gmail.com> - 2012-08-18 11:40 -0700
Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 20:50 +0100
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 13:22 -0700
Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 22:37 +0100
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 11:26 -0700
Re: How do I display unicode value stored in a string variable using ord() MRAB <python@mrabarnett.plus.com> - 2012-08-18 19:59 +0100
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 07:17 +0000
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 10:46 +1000
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 19:11 -0700
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 12:19 +1000
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 19:35 -0700
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 13:01 +1000
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 20:10 -0700
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 13:31 +1000
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 22:58 -0700
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 08:01 +0000
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 01:11 -0700
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 18:24 +1000
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 01:44 -0700
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 01:54 -0700
Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 11:46 +0100
Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 12:31 -0400
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 10:51 +0000
Re: How do I display unicode value stored in a string variable using ord() Neil Hodgson <nhodgson@iinet.net.au> - 2012-08-21 17:03 +1000
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:09 +0000
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 01:04 -0700
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 13:25 +0000
Re: How do I display unicode value stored in a string variable using ord() DJC <djc@news.invalid> - 2012-08-19 17:32 +0200
Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 13:34 -0400
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 10:48 -0700
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 11:11 -0700
Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 19:50 +0100
Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 17:59 -0400
Re: How do I display unicode value stored in a string variable using ord() rusi <rustompmody@gmail.com> - 2012-08-19 23:13 -0700
Abuse of Big Oh notation [was Re: How do I display unicode value stored in a string variable using ord()] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 20:15 +0000
Re: Abuse of Big Oh notation Paul Rubin <no.email@nospam.invalid> - 2012-08-19 16:42 -0700
Re: Abuse of Big Oh notation Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-08-20 09:24 +0100
Re: Abuse of Big Oh notation Paul Rubin <no.email@nospam.invalid> - 2012-08-20 09:01 -0700
Re: Abuse of Big Oh notation Chris Angelico <rosuav@gmail.com> - 2012-08-21 02:09 +1000
Re: Abuse of Big Oh notation Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-20 11:12 -0600
Re: Abuse of Big Oh notation Paul Rubin <no.email@nospam.invalid> - 2012-08-20 12:29 -0700
Re: Abuse of Big Oh notation 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-20 15:16 -0700
Re: Abuse of Big Oh notation 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-20 15:20 -0700
Re: Abuse of Big Oh notation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-21 09:53 +0000
Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-20 11:42 -0700
Re: Abuse of Big Oh notation Ned Deily <nad@acm.org> - 2012-08-20 18:19 -0700
Abuse of subject, was Re: Abuse of Big Oh notation Peter Otten <__peter__@web.de> - 2012-08-21 09:52 +0200
Re: Abuse of subject, was Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-21 10:16 -0700
Re: Abuse of subject, was Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-21 10:16 -0700
Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-20 11:42 -0700
Re: How do I display unicode value stored in a string variable using ord() Hans Mulder <hansmu@xs4all.nl> - 2012-08-22 20:53 +0200
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-20 08:42 +1000
Re: How do I display unicode value stored in a string variable using ord() Roy Smith <roy@panix.com> - 2012-08-19 19:24 -0400
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-20 04:21 +0000
Re: How do I display unicode value stored in a string variable using ord() Roy Smith <roy@panix.com> - 2012-08-20 00:44 -0400
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-20 05:56 +0000
Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 23:24 -0700
Re: How do I display unicode value stored in a string variable using ord() Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-08-20 12:58 -0400
Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 20:35 -0400
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-20 14:07 +1000
Re: How do I display unicode value stored in a string variable using ord() lipska the kat <lipskathekat@yahoo.co.uk> - 2012-08-19 11:13 +0100
Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 20:19 +1000
Re: How do I display unicode value stored in a string variable using ord() lipska the kat <lipskathekat@yahoo.co.uk> - 2012-08-19 11:49 +0100
Re: How do I display unicode value stored in a string variable using ord() "Blind Anagram" <noname@nowhere.com> - 2012-08-19 18:03 +0100
Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 10:33 -0700
Re: How do I display unicode value stored in a string variable using ord() "Blind Anagram" <noname@nowhere.com> - 2012-08-19 19:04 +0100
Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-19 14:05 -0400
Re: How do I display unicode value stored in a string variable usingord() "Blind Anagram" <noname@nowhere.com> - 2012-08-19 19:18 +0100
Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 20:31 +0000
Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 17:03 -0400
Re: How do I display unicode value stored in a string variable using ord() 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-19 17:32 -0700
Re: How do I display unicode value stored in a string variable using ord() Piet van Oostrum <piet@vanoostrum.org> - 2012-08-20 17:20 -0400
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-08-19 07:09 -0700 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <f6de81c6-2965-42dd-a789-0770a019c038@googlegroups.com> |
| In reply to | #27391 |
On Sunday, 19 August 2012 15:46:34 UTC+2, Mark Lawrence wrote:
> On 19/08/2012 13:59, wxjmfauth@gmail.com wrote:
> > On Sunday, 19 August 2012 14:29:17 UTC+2, Dave Angel wrote:
> >> On 08/19/2012 08:14 AM, wxjmfauth@gmail.com wrote:
> >>> On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
> >>>> On Sun, Aug 19, 2012 at 8:19 PM, <wxjmfauth@gmail.com> wrote:
> >>>>> This is precisely the weak point of this flexible
> >>>>> representation. It uses latin-1 and latin-1 is for
> >>>>> most users simply unusable.
> >>>>
> >>>> No, it uses Unicode, and as an optimization, attempts to store the
> >>>> codepoints in less than four bytes for most strings. The fact that a
> >>>> one-byte storage format happens to look like latin-1 is rather
> >>>> coincidental.
> >>>
> >>> And this is the common basic mistake. You do not push your
> >>> argumentation far enough. A character may "fall" accidentally into latin-1.
> >>> The problem lies in these European characters, which cannot fall into this
> >>> coding. This *is* the cause of the negative side effects.
> >>> If you are using a correct coding scheme, like cp1252, mac-roman or
> >>> iso-8859-15, you will never see such a negative side effect.
> >>> Again, the problem is not the result, the encoded character. The critical
> >>> part is the character which may cause this side effect.
> >>> You should think "character set" and not encoded "code point", considering
> >>> this kind of expression makes sense in an 8-bit coding scheme.
> >>>
> >>> jmf
> >>
> >> But that choice was made decades ago when Unicode picked its second 128
> >> characters. The internal form used in this PEP is simply the low-order
> >> byte of the Unicode code point. Trying to scan the string deciding if
> >> converting to cp1252 (for example) would be a much more expensive
> >> operation than seeing how many bytes it'd take for the largest code point.
> >
> > You are absolutely right. (I'm quite comfortable with Unicode.)
> > If Python wishes to perpetuate this, let's call it, design mistake
> > or annoyance, it will continue to live with problems.
>
> Please give a precise description of the design mistake and what you
> would do to correct it.
>
> > People (tools) who chose pure utf-16 or utf-32 are not suffering
> > from this issue.
> >
> > *My* final comment on this thread.
> >
> > In August 2012, after 20 years of development, Python is not
> > able to display a piece of text correctly on a Windows console
> > (eg cp65001).
>
> Examples please.
>
> > I downloaded the Go language, with zero experience, and I did not manage
> > to display a piece of text incorrectly. (This is, by the way, *the*
> > reason why I tested it.) Where the problems are coming from, I have
> > no idea.
> >
> > I find this situation quite comic. Python is able to
> > produce this:
> >
> > >>> (1.1).hex()
> > '0x1.199999999999ap+0'
> >
> > but it is not able to display a piece of text!
>
> So you keep saying, but when asked for examples or evidence nothing gets
> produced.
>
> > Try to convince end users that IEEE 754 is more important than the
> > ability to read/write a piece of text, which a 6-year-old has learned
> > at school :-)
> >
> > (I'm not suffering from this kind of effect; as a Windows user,
> > I'm always working via a GUI. Still, the problem exists.)
>
> Windows is a law unto itself. Its problems are hardly specific to Python.
>
> > Regards,
> > jmf
>
> Now two or three times you've said you're going but have come back. If
> you come again could you please provide examples and or evidence of what
> you're on about, because you still have me baffled.
>
> --
> Cheers.
>
> Mark Lawrence.

Yesterday, I went to bed.

More seriously.

I cannot give you more numbers than those I gave.
As an end user, I noticed and experienced that my random tests
are always slower in Py3.3 than in Py3.2 on my Windows platform.

It is up to you, the core developers, to give an explanation
about this behaviour.

As I understand a little about character encoding,
I pointed out that this is most probably due to this flexible
string representation (with arguments appearing randomly
in the misc. messages, mainly latin-1).

I cannot do more.

(I stupidly spoke about factors 0.1 to ..., you should
read, of course, 1.1 to ...)

jmf
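The quoted explanations from Chris Angelico and Dave Angel describe the flexible string representation (PEP 393, new in Python 3.3). Each string is stored at 1, 2 or 4 bytes per character, and the width is chosen only by its largest code point. The sketch below is a pure-Python model of that rule, not CPython's C implementation. The helper names `storage_width` and `one_byte_form` are hypothetical. The model shows three things:

- The 1-byte form equals latin-1 only because U+0000..U+00FF are the latin-1 characters.
- 'œ' and '€' have cp1252 codes but still force 2-byte storage.
- No per-encoding scan is involved.

```python
# A pure-Python model of the storage-width rule described in the quoted posts
# for the flexible string representation (PEP 393). It is an illustration,
# not CPython's C implementation; the helper names are hypothetical.


def storage_width(s: str) -> int:
    """Bytes per character needed for s, decided by its widest code point."""
    widest = max(map(ord, s), default=0)
    if widest < 0x100:       # U+0000..U+00FF: one byte per character
        return 1
    if widest < 0x10000:     # rest of the Basic Multilingual Plane: two bytes
        return 2
    return 4                 # supplementary planes: four bytes


def one_byte_form(s: str) -> bytes:
    """Low-order byte of each code point (only meaningful for 1-byte strings)."""
    if storage_width(s) != 1:
        raise ValueError("string needs more than one byte per character")
    return bytes(ord(c) for c in s)


if __name__ == "__main__":
    for text in ["abc", "café", "aœ€", "a\U0001F600"]:
        print(repr(text), storage_width(text))
    # The 1-byte form matches latin-1 because U+0000..U+00FF are exactly
    # the latin-1 characters; no codec is consulted to build it.
    print(one_byte_form("café") == "café".encode("latin-1"))
    # 'œ' and '€' have single-byte codes in cp1252 but lie above U+00FF,
    # so a string containing them is stored at two bytes per character.
    print("œ€".encode("cp1252"), storage_width("œ€"))
```

Whether a character would fit in one byte of cp1252 or mac-roman plays no part here. Only the largest code point matters, which is the cheap check Dave Angel contrasts with a per-encoding scan.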
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-08-19 15:48 +0100 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3504.1345387683.4697.python-list@python.org> |
| In reply to | #27393 |
On 19/08/2012 15:09, wxjmfauth@gmail.com wrote:
> I cannot give you more numbers than those I gave.
> As an end user, I noticed and experienced that my random tests
> are always slower in Py3.3 than in Py3.2 on my Windows platform.

Once again you refuse to supply anything to back up what you say.

> It is up to you, the core developers, to give an explanation
> about this behaviour.

Core developers cannot give an explanation for something that doesn't
exist, except in your imagination. Unless you can produce the evidence
that supports your claims, including details of OS, benchmarks used and
so on and so forth.

> As I understand a little about character encoding,
> I pointed out that this is most probably due to this flexible
> string representation (with arguments appearing randomly
> in the misc. messages, mainly latin-1).
>
> I cannot do more.
>
> (I stupidly spoke about factors 0.1 to ..., you should
> read, of course, 1.1 to ...)
>
> jmf

I suspect that I'll be dead and buried long before you can produce
anything concrete in the way of evidence. I've thrown down the gauntlet
several times, do you now have the courage to pick it up, or are you
going to resort to the FUD approach that you've been using throughout
this thread?

-- 
Cheers.

Mark Lawrence.
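Mark asks for "details of OS, benchmarks used". One way to supply them is to print the interpreter and platform next to any timing. The sketch below is a minimal example that uses only the standard `platform` and `sys` modules; `describe_environment` is a hypothetical helper name. It also reports `sys.maxunicode`, which tells narrow (UTF-16) builds apart from wide ones. The Windows builds of Python up to 3.2 were narrow, so this matters when comparing 3.2 with 3.3.

```python
import platform
import sys


def describe_environment() -> dict:
    """Interpreter and OS details to publish next to any benchmark numbers.

    describe_environment is a hypothetical helper name for this sketch.
    """
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        # 65535 on narrow (UTF-16) builds such as Windows Python <= 3.2,
        # 1114111 on wide builds and on every build from 3.3 onward.
        "max_unicode": sys.maxunicode,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```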
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-08-19 09:19 -0700 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3507.1345393173.4697.python-list@python.org> |
| In reply to | #27394 |
On Sunday, 19 August 2012 16:48:48 UTC+2, Mark Lawrence wrote:
> On 19/08/2012 15:09, wxjmfauth@gmail.com wrote:
>
> > I cannot give you more numbers than those I gave.
> > As an end user, I noticed and experienced that my random tests
> > are always slower in Py3.3 than in Py3.2 on my Windows platform.
>
> Once again you refuse to supply anything to back up what you say.
>
> > It is up to you, the core developers, to give an explanation
> > about this behaviour.
>
> Core developers cannot give an explanation for something that doesn't
> exist, except in your imagination. Unless you can produce the evidence
> that supports your claims, including details of OS, benchmarks used and
> so on and so forth.
>
> > As I understand a little about character encoding,
> > I pointed out that this is most probably due to this flexible
> > string representation (with arguments appearing randomly
> > in the misc. messages, mainly latin-1).
> >
> > I cannot do more.
> >
> > (I stupidly spoke about factors 0.1 to ..., you should
> > read, of course, 1.1 to ...)
> >
> > jmf
>
> I suspect that I'll be dead and buried long before you can produce
> anything concrete in the way of evidence. I've thrown down the gauntlet
> several times, do you now have the courage to pick it up, or are you
> going to resort to the FUD approach that you've been using throughout
> this thread?
>
> --
> Cheers.
>
> Mark Lawrence.
I do not remember the tests I did at the time of the first alpha
release. It was with an interactive interpreter. I specifically paid
attention to testing the chars you can find in the range 128..256
in all 8-bit coding schemes, chars I suspected to be problematic.
Here is a short test again, a random single test, the first
idea that came to my mind.
Py 3.2.3
>>> timeit.timeit("('aœ€'*100).replace('a', 'œ€é')")
4.99396356635981
Py 3.3b2
>>> timeit.timeit("('aœ€'*100).replace('a', 'œ€é')")
7.560455708007855
Maybe not so demonstrative. It shows at least that we
are far away from the "announced" 10-30%.
>>> 7.56 / 5
1.512
>>> 5 / (7.56 - 5) * 100
195.31250000000003
jmf
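The two `timeit` totals above can be turned into comparable figures. By default, `timeit.timeit()` runs its statement `number=1000000` times, so each total in seconds is also the average time per call in microseconds.

The sketch below computes the conventional relative slowdown, `(t_new - t_old) / t_old`, which comes to about 51% for these numbers. It also shows that the post's second expression, `t_old / (t_new - t_old) * 100`, is 100 divided by that fraction, so it does not measure the slowdown itself. The constant and function names are hypothetical.

```python
# Totals reported in the post above. timeit.timeit() ran its default
# number=1_000_000 loops, so the seconds figure equals microseconds per call.
T_PY32 = 4.99396356635981    # Python 3.2.3, seconds
T_PY33 = 7.560455708007855   # Python 3.3b2, seconds
LOOPS = 1_000_000


def per_call_usec(total_seconds: float, loops: int = LOOPS) -> float:
    """Average time of one call, in microseconds."""
    return total_seconds / loops * 1e6


def relative_slowdown_pct(baseline: float, candidate: float) -> float:
    """How much slower candidate is than baseline, as a percentage."""
    return (candidate - baseline) / baseline * 100


def post_expression(baseline: float, candidate: float) -> float:
    """The expression used in the post: baseline / (candidate - baseline) * 100."""
    return baseline / (candidate - baseline) * 100


if __name__ == "__main__":
    print(f"3.2.3: {per_call_usec(T_PY32):.2f} usec per call")
    print(f"3.3b2: {per_call_usec(T_PY33):.2f} usec per call")
    print(f"slowdown: {relative_slowdown_pct(T_PY32, T_PY33):.1f}%")
    # Equals 10000 / slowdown percentage, i.e. the inverse of the slowdown.
    print(f"post's figure: {post_expression(5, 7.56):.1f}")
```

This is still a single measurement of each interpreter, so it says nothing about run-to-run variation.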
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-08-19 13:48 -0400 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3512.1345398545.4697.python-list@python.org> |
| In reply to | #27393 |
On 8/19/2012 10:09 AM, wxjmfauth@gmail.com wrote:
> I cannot give you more numbers than those I gave.
> As an end user, I noticed and experienced that my random tests
> are always slower in Py3.3 than in Py3.2 on my Windows platform.

And I gave other examples where 3.3 is *faster* on my Windows, which
you have thus far not even acknowledged, let alone tried.

> It is up to you, the core developers, to give an explanation
> about this behaviour.

System variation, unimportance of sub-microsecond variations, and
attention to more important issues. Other developers say 3.3 is
generally faster on their systems (OSX 10.8, and unspecified).

To talk about speed sensibly, one must run the full stringbench.py
benchmark and real applications on multiple Windows, *nix, and Mac
systems. Python is not optimized for your particular current computer.

-- 
Terry Jan Reedy
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-08-19 10:51 -0700 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <5570714c-59e7-4149-b2bd-89d7628774e3@googlegroups.com> |
| In reply to | #27406 |
Just for the story.

Five minutes after I closed my interactive interpreter windows,
the day I tested this stuff, I thought:
"Too bad I did not note the extremely bad cases I found; I'm pretty
sure this problem will come up."

jmf
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2012-08-19 19:09 +0100 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3518.1345399568.4697.python-list@python.org> |
| In reply to | #27408 |
On 19/08/2012 18:51, wxjmfauth@gmail.com wrote:
> Just for the story.
>
> Five minutes after I closed my interactive interpreter windows,
> the day I tested this stuff, I thought:
> "Too bad I did not note the extremely bad cases I found; I'm pretty
> sure this problem will come up."
>
> jmf

How convenient.

-- 
Cheers.

Mark Lawrence.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-08-20 07:50 +1000 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3528.1345413053.4697.python-list@python.org> |
| In reply to | #27408 |
On Mon, Aug 20, 2012 at 4:09 AM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> On 19/08/2012 18:51, wxjmfauth@gmail.com wrote:
>>
>> Just for the record.
>>
>> Five minutes after I closed my interactive interpreter windows,
>> the day I tested this stuff, I thought:
>> "Too bad I did not note the extremely bad cases I found; I'm pretty
>> sure this problem will come to the table."
>
> How convenient.
Not really. Even if he HAD copied-and-pasted those worst cases, it'd prove nothing. Maybe his system just chose to glitch right then. It's always possible to find statistical outliers that take way, way longer than everything else.
Watch this. Python 3.2 on Windows is optimized for adding 1 to numbers.
C:\Documents and Settings\M>\python32\python -m timeit -r 1 "a=1+1"
10000000 loops, best of 1: 0.0654 usec per loop
C:\Documents and Settings\M>\python32\python -m timeit -r 1 "a=1+1"
10000000 loops, best of 1: 0.0654 usec per loop
C:\Documents and Settings\M>\python32\python -m timeit -r 1 "a=1+1"
10000000 loops, best of 1: 0.0654 usec per loop
C:\Documents and Settings\M>\python32\python -m timeit -r 1 "a=1+2"
10000000 loops, best of 1: 0.0711 usec per loop
Now, as long as I don't tell you that during the last test I had quite a few other processes running, including VLC playing a movie and two Python processes running "while True: pass", this will look like a significant performance difference. So now, I'm justified in complaining about how suboptimal Python is when adding 2 to a number, which I can assure you is a VERY common case.
ChrisA
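Chris's demonstration hinges on `-r 1`: with a single repetition there is no way to spot a disturbed run. A minimal sketch using the standard `timeit.repeat` function, reporting both the best (the usual "best of N" figure) and the worst per-loop time; the helper name `best_and_spread` is illustrative, and the statements are the ones from his example:

```python
import timeit


def best_and_spread(stmt, number=1_000_000, repeat=7):
    """Run `stmt` `repeat` times (each run executes it `number` times) and
    return (best, worst) per-loop times in microseconds.

    The minimum is least affected by other processes; a wide gap between
    best and worst suggests the machine was busy during some runs."""
    runs = timeit.repeat(stmt, number=number, repeat=repeat)
    per_loop = [t / number * 1e6 for t in runs]
    return min(per_loop), max(per_loop)


if __name__ == "__main__":
    for stmt in ("a = 1 + 1", "a = 1 + 2"):
        best, worst = best_and_spread(stmt)
        print(f"{stmt!r}: best {best:.4f} usec, worst {worst:.4f} usec per loop")
```

Both statements are constant-folded by CPython's compiler (`dis.dis("a=1+1")` shows a single constant being loaded), so they do identical work and any gap between them is noise.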
[toc] | [prev] | [next] | [standalone]
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2012-08-19 23:38 -0600 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3538.1345442498.4697.python-list@python.org> |
| In reply to | #27408 |
On 08/19/2012 11:51 AM, wxjmfauth@gmail.com wrote:
> Five minutes after I closed my interactive interpreter windows,
> the day I tested this stuff, I thought:
> "Too bad I did not note the extremely bad cases I found; I'm pretty
> sure this problem will come to the table."
Reading through this thread (which is entertaining), I am reminded of the old saying, "premature optimization is the root of all evil." This "problem" that you have discovered, if fixed the way you propose (always using a 4-byte UCS-4 representation internally), would be just such a premature optimization. It would come at a high cost with very little real-world impact.
As others have made abundantly clear, the overhead of changing internal string representations is a cost that's only incurred during the creation of the immutable string object. If your code is doing a lot of operations on immutable strings, each of which by definition creates a new immutable string object, then the real speed problem is in your algorithm. If you are working on a string as if it were a buffer, doing many searches, replaces, etc., then you need to work on an object designed for IO, such as io.StringIO. If implemented half correctly, I imagine that StringIO internally uses the widest possible character representation in the buffer. I could be wrong here.
As to your other problem, Python generally tries to follow unicode encoding rules to the letter. Thus if a piece of text cannot be represented in the character set of the terminal, then Python will properly err out. Other languages you have tried, likely fudge it somehow: they display what they can, or something similar. In general the Windows command window is an outdated thing that no serious programmer can rely on to display unicode text. Use a proper GUI API, or use a better terminal that can handle utf-8.
TL;DR: You're right that converting string representations internally incurs overhead, but if your program is slow because of this, you're doing it wrong. It's not symptomatic of some Python disease.
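Michael's advice about treating a string as a buffer can be sketched with two standard idioms: collect the pieces and `''.join()` them once, or write them into an `io.StringIO`. This illustrates the pattern only; it makes no claim about how `StringIO` stores its data internally (Michael flags that part as a guess). The function names are illustrative:

```python
import io


def build_with_join(pieces):
    """Create the final str once from all the pieces."""
    return "".join(pieces)


def build_with_stringio(pieces):
    """Write pieces into an in-memory text buffer, then read it back once."""
    buf = io.StringIO()
    for piece in pieces:
        buf.write(piece)
    return buf.getvalue()


def build_with_concat(pieces):
    """Repeated += conceptually creates a new immutable str each time; CPython
    can sometimes optimize this in place, but that is not guaranteed."""
    result = ""
    for piece in pieces:
        result += piece
    return result


if __name__ == "__main__":
    words = ["ab", "\u00e9", "\u20ac", "\u2026"] * 1000
    assert build_with_join(words) == build_with_stringio(words) == build_with_concat(words)
    print(len(build_with_join(words)))  # 5000
```

All three give the same result; the first two avoid building an intermediate string per step, which is the point Michael is making about algorithms.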
[toc] | [prev] | [next] | [standalone]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-08-20 09:17 -0400 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <roy-87956C.09170720082012@news.panix.com> |
| In reply to | #27462 |
In article <mailman.3538.1345442498.4697.python-list@python.org>,
Michael Torrie <torriem@gmail.com> wrote:
> Python generally tries to follow unicode
> encoding rules to the letter. Thus if a piece of text cannot be
> represented in the character set of the terminal, then Python will
> properly err out. Other languages you have tried, likely fudge it
> somehow.
And if you want the "fudge it somehow" behavior (which is often very useful!), there's always http://pypi.python.org/pypi/Unidecode/
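The "err out" versus "fudge it" distinction maps onto the standard error handlers of `str.encode`. A minimal sketch, assuming an ASCII target encoding chosen only for illustration (a console code page behaves the same way for characters it lacks); this is stdlib behaviour, not what Unidecode does:

```python
def encode_all_ways(text, encoding="ascii"):
    """Show how `text` fares under the default (strict) handler and a few
    of the standard "fudge it" error handlers."""
    results = {}
    try:
        results["strict"] = text.encode(encoding)  # raises on failure
    except UnicodeEncodeError as exc:
        results["strict"] = f"UnicodeEncodeError at position {exc.start}"
    for handler in ("replace", "backslashreplace", "xmlcharrefreplace", "ignore"):
        results[handler] = text.encode(encoding, errors=handler)
    return results


if __name__ == "__main__":
    for handler, outcome in encode_all_ways("caf\u00e9 \u20ac5 \u2026").items():
        print(f"{handler:>17}: {outcome!r}")
```

Unlike Unidecode, none of these handlers transliterates: "é" becomes "?", an escape, or nothing, never "e".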
[toc] | [prev] | [next] | [standalone]
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2012-08-20 22:18 -0600 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3587.1345522727.4697.python-list@python.org> |
| In reply to | #27488 |
On 08/20/2012 07:17 AM, Roy Smith wrote:
> In article <mailman.3538.1345442498.4697.python-list@python.org>,
> Michael Torrie <torriem@gmail.com> wrote:
>
>> Python generally tries to follow unicode
>> encoding rules to the letter. Thus if a piece of text cannot be
>> represented in the character set of the terminal, then Python will
>> properly err out. Other languages you have tried, likely fudge it
>> somehow.
>
> And if you want the "fudge it somehow" behavior (which is often very
> useful!), there's always http://pypi.python.org/pypi/Unidecode/
Sweet tip, thanks! I often want to process text that has smart quotes, emdashes, etc, and convert them to plain old ascii quotes, dashes, ticks, etc. This will do that for me without resorting to a bunch of regexes. Bravo.
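For the narrow job Michael describes (smart quotes and dashes to plain ASCII), a hand-written translation table is one stdlib-only alternative to Unidecode. A minimal sketch; the mapping below is an assumption that covers only a handful of common characters, whereas Unidecode covers far more:

```python
# Hypothetical, deliberately small mapping; extend as needed.
PUNCTUATION_TO_ASCII = str.maketrans({
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2014": "--",   # EM DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
})


def asciify_punctuation(text):
    """Replace typographic punctuation with ASCII look-alikes."""
    return text.translate(PUNCTUATION_TO_ASCII)


if __name__ == "__main__":
    print(asciify_punctuation("\u201cSweet tip\u201d \u2014 thanks\u2026"))
    # "Sweet tip" -- thanks...
```

A single `str.translate` pass does the whole job without regexes; characters not in the table pass through unchanged.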
[toc] | [prev] | [next] | [standalone]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-08-21 07:48 -0400 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <roy-EF0527.07485121082012@news.panix.com> |
| In reply to | #27546 |
In article <mailman.3587.1345522727.4697.python-list@python.org>,
Michael Torrie <torriem@gmail.com> wrote:
> > And if you want the "fudge it somehow" behavior (which is often very
> > useful!), there's always http://pypi.python.org/pypi/Unidecode/
>
> Sweet tip, thanks! I often want to process text that has smart quotes,
> emdashes, etc, and convert them to plain old ascii quotes, dashes,
> ticks, etc. This will do that for me without resorting to a bunch of
> regexes. Bravo.
Yup, that's one of the things it's good for. We mostly use it to help map search terms, e.g. if you search for "beyonce", you're probably expecting it to match "Beyoncé". We also special-case some weird stuff like "kesha" matching "ke$ha", but we have to hand-code those.
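Roy's search use case can be approximated with the standard `unicodedata` module instead of Unidecode: decompose with NFKD, drop combining marks, then casefold. A minimal sketch with illustrative function names; it handles accented Latin letters but does not transliterate other scripts, and cases like "ke$ha" still need hand-coding, as Roy says:

```python
import unicodedata


def fold_for_search(text):
    """Return a loose search key: accents stripped, case folded."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def matches(query, candidate):
    """True if the folded query occurs in the folded candidate."""
    return fold_for_search(query) in fold_for_search(candidate)


if __name__ == "__main__":
    print(matches("beyonce", "Beyonc\u00e9"))  # True
    print(matches("kesha", "Ke$ha"))           # False: needs a hand-coded alias
```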
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-08-19 13:56 -0400 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3515.1345399008.4697.python-list@python.org> |
| In reply to | #27387 |
On 8/19/2012 8:59 AM, wxjmfauth@gmail.com wrote:
> In August 2012, after 20 years of development, Python is not able to
> display a piece of text correctly on a Windows console (e.g. cp65001).
cp65001 is known to not work right. It has been very frustrating. Bug Microsoft about it, and indeed about their whole policy of still dividing the world into code page regions, even in their next version, instead of moving toward unicode and utf-8, at least as an option.
> I downloaded the go language, zero experience, and I did not manage to
> make it display a piece of text incorrectly. (This is by the way *the* reason
> why I tested it.) Where the problems are coming from, I have no
> idea.
If go can display all unicode chars on a Windows console, perhaps you can do some research and find out how they do so. Then we could consider copying it.
--
Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-08-19 05:59 -0700 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3500.1345381786.4697.python-list@python.org> |
| In reply to | #27384 |
On Sunday, 19 August 2012 14:29:17 UTC+2, Dave Angel wrote:
> On 08/19/2012 08:14 AM, wxjmfauth@gmail.com wrote:
> > On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
> >> On Sun, Aug 19, 2012 at 8:19 PM, <wxjmfauth@gmail.com> wrote:
> >>
> >>> This is precisely the weak point of this flexible
> >>> representation. It uses latin-1 and latin-1 is for
> >>> most users simply unusable.
> >>
> >> No, it uses Unicode, and as an optimization, attempts to store the
> >> codepoints in less than four bytes for most strings. The fact that a
> >> one-byte storage format happens to look like latin-1 is rather
> >> coincidental.
> >>
> > And this is the common basic mistake. You do not push your
> > argumentation far enough. A character may "fall" accidentally into latin-1.
> > The problem lies in those European characters which cannot fall into this
> > encoding. This *is* the cause of the negative side effects.
> > If you are using a correct coding scheme, like cp1252, mac-roman or
> > iso-8859-15, you will never see such a negative side effect.
> > Again, the problem is not the result, the encoded character. The critical
> > part is the character which may cause this side effect.
> > You should think "character set" and not encoded "code point", since
> > this kind of expression only makes sense in an 8-bit coding scheme.
> >
> > jmf
>
> But that choice was made decades ago when Unicode picked its second 128
> characters. The internal form used in this PEP is simply the low-order
> byte of the Unicode code point. Trying to scan the string deciding if
> converting to cp1252 (for example) would be a much more expensive
> operation than seeing how many bytes it'd take for the largest code point.
You are absolutely right. (I'm quite comfortable with Unicode.)
If Python wishes to perpetuate this, let's call it, design mistake or annoyance, it will continue to live with problems. People (tools) who chose pure utf-16 or utf-32 are not suffering from this issue.
*My* final comment on this thread.
In August 2012, after 20 years of development, Python is not able to display a piece of text correctly on a Windows console (e.g. cp65001).
I downloaded the go language, zero experience, and I did not manage to make it display a piece of text incorrectly. (This is by the way *the* reason why I tested it.) Where the problems are coming from, I have no idea.
I find this situation quite comic. Python is able to produce this:
>>> (1.1).hex()
'0x1.199999999999ap+0'
but it is not able to display a piece of text!
Try to convince end users that IEEE 754 is more important than the ability to read/write a piece of text, which a 6-year-old kid has learned at school :-)
(I'm not suffering from this kind of effect; as a Windows user, I always work via a GUI. Still, the problem exists.)
Regards,
jmf
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-08-19 08:35 -0400 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3498.1345379751.4697.python-list@python.org> |
| In reply to | #27382 |
(pardon the resend, but I accidentally omitted a couple of words)
On 08/19/2012 08:14 AM, wxjmfauth@gmail.com wrote:
> On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
>> <SNIP>
>>
>> No, it uses Unicode, and as an optimization, attempts to store the
>> codepoints in less than four bytes for most strings. The fact that a
>> one-byte storage format happens to look like latin-1 is rather
>> coincidental.
>>
> And this is the common basic mistake. You do not push your
> argumentation far enough. A character may "fall" accidentally into latin-1.
> The problem lies in those European characters which cannot fall into this
> encoding. This *is* the cause of the negative side effects.
> If you are using a correct coding scheme, like cp1252, mac-roman or
> iso-8859-15, you will never see such a negative side effect.
> Again, the problem is not the result, the encoded character. The critical
> part is the character which may cause this side effect.
> You should think "character set" and not encoded "code point", since
> this kind of expression only makes sense in an 8-bit coding scheme.
>
> jmf
But that choice was made decades ago when Unicode picked its second 128 characters. The internal form used in this PEP is simply the low-order byte of the Unicode code point. Trying to scan the string deciding if converting to cp1252 (for example) would work would be a much more expensive operation than seeing how many bytes it'd take for the largest code point.
The 8-bit form is used if all the code points are less than 256. That is a simple description, and simple code. As several people have said, the fact that this byte matches one of the DECODED forms is coincidence.
--
DaveA
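Dave's rule (the storage width is chosen from the largest code point, not from membership in any 8-bit code page) can be written out directly. A sketch that mirrors the PEP 393 thresholds as described in the thread; `pep393_char_width` is a hypothetical name, and it only computes the width rather than inspecting CPython's actual string object:

```python
def pep393_char_width(s):
    """Bytes per character the flexible representation would use for `s`,
    based only on its largest code point (empty string: 1)."""
    biggest = max(map(ord, s), default=0)
    if biggest < 0x100:      # all code points < 256: 1 byte (latin-1 range)
        return 1
    if biggest < 0x10000:    # within the Basic Multilingual Plane: 2 bytes
        return 2
    return 4                 # any astral code point: 4 bytes


if __name__ == "__main__":
    for sample in ("abc", "caf\u00e9", "5\u20ac", "\U0001F600"):
        print(repr(sample), pep393_char_width(sample))
    # U+20AC (euro sign) forces 2 bytes per char, even though cp1252 can
    # store it in one byte (0x80); latin-1 cannot encode it at all.
    print("\u20ac".encode("cp1252"))  # b'\x80'
```

Checking `max(map(ord, s))` is one linear pass with integer comparisons, which is the cheap test Dave contrasts with asking whether a string would fit in cp1252.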
[toc] | [prev] | [next] | [standalone]
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-08-19 05:14 -0700 |
| Subject | Re: New internal string format in 3.3 |
| Message-ID | <mailman.3496.1345378464.4697.python-list@python.org> |
| In reply to | #27375 |
On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
> On Sun, Aug 19, 2012 at 8:19 PM, <wxjmfauth@gmail.com> wrote:
> > This is precisely the weak point of this flexible
> > representation. It uses latin-1 and latin-1 is for
> > most users simply unusable.
>
> No, it uses Unicode, and as an optimization, attempts to store the
> codepoints in less than four bytes for most strings. The fact that a
> one-byte storage format happens to look like latin-1 is rather
> coincidental.
And this is the common basic mistake. You do not push your argumentation far enough. A character may "fall" accidentally into latin-1. The problem lies in those European characters which cannot fall into this encoding. This *is* the cause of the negative side effects. If you are using a correct coding scheme, like cp1252, mac-roman or iso-8859-15, you will never see such a negative side effect. Again, the problem is not the result, the encoded character. The critical part is the character which may cause this side effect. You should think "character set" and not encoded "code point", since this kind of expression only makes sense in an 8-bit coding scheme.
jmf
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-08-19 06:30 +0000 |
| Message-ID | <5030881d$0$29978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #27320 |
On Sat, 18 Aug 2012 11:05:07 -0700, wxjmfauth wrote:
> As I understand (I think) the underlying mechanism, I can only say, it is
> not a surprise that it happens.
>
> Imagine an editor, I type an "a", internally the text is saved as ascii,
> then I type an "é", the text can only be saved in at least latin-1. Then
> I enter an "€", the text becomes an internal ucs-4 "string". Then remove
> the "€" and so on.
Firstly, that is not what Python does. For starters, € is in the BMP, and
so is nearly every character you're ever going to use unless you are
Asian or a historian using some obscure ancient script. NONE of the
examples you have shown in your emails have included 4-byte characters,
they have all been ASCII or UCS-2.
You are suffering from a misunderstanding about what is going on and
misinterpreting what you have seen.
In *both* Python 3.2 and 3.3, both é and € are represented by two bytes.
That will not change. There is a tiny amount of fixed overhead for
strings, and that overhead is slightly different between the versions,
but you'll never notice the difference.
Secondly, how a text editor or word processor chooses to store the text
that you type is not the same as how Python does it. A text editor is not
going to be creating a new immutable string after every key press. That
will be slow slow SLOW. The usual way is to keep a buffer for each
paragraph, and add and subtract characters from the buffer.
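Steven's description of how an editor avoids building a new immutable string per keystroke can be sketched as a per-paragraph list of characters that only becomes a `str` when needed. `ParagraphBuffer` is a hypothetical class for illustration, not how any particular editor is implemented; it replays jmf's "a", "é", "€", delete scenario:

```python
class ParagraphBuffer:
    """Mutable per-paragraph text buffer (illustrative only)."""

    def __init__(self, text=""):
        self._chars = list(text)  # a list is mutable; a str is not

    def insert(self, index, text):
        """Insert `text` before position `index`."""
        self._chars[index:index] = text

    def delete(self, index, count=1):
        """Remove `count` characters starting at `index`."""
        del self._chars[index:index + count]

    def __str__(self):
        # A new str is built only when the text is actually needed.
        return "".join(self._chars)


if __name__ == "__main__":
    para = ParagraphBuffer("a")
    para.insert(1, "\u00e9")   # type "é"
    para.insert(2, "\u20ac")   # type "€"
    para.delete(2)             # remove the "€" again
    print(str(para))           # aé
```

Each keystroke mutates the list in place; the representation choice Steven discusses happens only once, when `str(para)` is called.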
> Intuitively I expect there is some kind of slowdown between all these
> "strings" conversions.
Your intuition is wrong. Strings are not converted from ASCII to UCS-2 to
UCS-4 on the fly, they are converted once, when the string is created.
The tests we ran earlier, e.g.:
('ab…' * 1000).replace('…', 'œ…')
show the *worst possible case* for the new string handling, because all
we do is create new strings. First we create a string 'ab…', then we
create another string 'ab…'*1000, then we create two new strings '…' and
'œ…', and finally we call replace and create yet another new string.
But in real applications, once you have created a string, you don't just
immediately create a new one and throw the old one away. You likely do
work with that string:
steve@runes:~$ python3.2 -m timeit "s = 'abcœ…'*1000; n = len(s); flag = s.startswith(('*', 'a'))"
100000 loops, best of 3: 2.41 usec per loop
steve@runes:~$ python3.3 -m timeit "s = 'abcœ…'*1000; n = len(s); flag = s.startswith(('*', 'a'))"
100000 loops, best of 3: 2.29 usec per loop
Once you start doing *real work* with the strings, the overhead of
deciding whether they should be stored using 1, 2 or 4 bytes begins to
fade into the noise.
> When I tested this flexible representation, a few months ago, at the
> first alpha release, this is precisely what I tested: string
> manipulations which are forcing this internal change, and I concluded the
> result is not brilliant. Really, a factor 0.n up to 10.
Like I said, if you really think that there is a significant, repeatable
slow-down on Windows, report it as a bug.
> Does anybody know a way to get the size of the internal "string" in
> bytes?
sys.getsizeof(some_string)
steve@runes:~$ python3.2 -c "from sys import getsizeof as size; print(size('abcœ…'*1000))"
10030
steve@runes:~$ python3.3 -c "from sys import getsizeof as size; print(size('abcœ…'*1000))"
10038
As I said, there is a *tiny* overhead difference. But identifiers will
generally be smaller:
steve@runes:~$ python3.2 -c "from sys import getsizeof as size; print(size(size.__name__))"
48
steve@runes:~$ python3.3 -c "from sys import getsizeof as size; print(size(size.__name__))"
34
You can check the object overhead by looking at the size of the empty
string.
--
Steven
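Building on Steven's `sys.getsizeof` suggestion, the per-character cost can be isolated by subtracting the size of a string one character shorter, which cancels the fixed object header. A sketch assuming CPython 3.3 or later, where the four samples should report 1, 1, 2 and 4 bytes per character; `getsizeof` figures are implementation details, so other interpreters may differ. `bytes_per_char` is an illustrative name:

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made of `ch` repeated.

    Subtracting the size of an (n-1)-length string cancels the fixed
    object header, leaving the cost of one extra character."""
    return sys.getsizeof(ch * n) - sys.getsizeof(ch * (n - 1))


if __name__ == "__main__":
    print("empty string overhead:", sys.getsizeof(""))
    for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per char")
```

This also answers jmf's question without the removed "unicode_internal" codec: the width follows the largest code point in the string.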
[toc] | [prev] | [next] | [standalone]
| From | wxjmfauth@gmail.com |
|---|---|
| Date | 2012-08-18 11:05 -0700 |
| Message-ID | <mailman.3467.1345313116.4697.python-list@python.org> |
| In reply to | #27314 |
On Saturday, 18 August 2012 19:28:26 UTC+2, Mark Lawrence wrote:
> Proof that is acceptable to everybody please, not just yourself.
I can't; I'm only facing the fact that it works slower on my Windows platform.
As I understand (I think) the underlying mechanism, I can only say, it is not a surprise that it happens.
Imagine an editor, I type an "a", internally the text is saved as ascii, then I type an "é", the text can only be saved in at least latin-1. Then I enter an "€", the text becomes an internal ucs-4 "string". Then remove the "€" and so on.
Intuitively I expect there is some kind of slowdown between all these "strings" conversions.
When I tested this flexible representation, a few months ago, at the first alpha release, this is precisely what I tested: string manipulations which are forcing this internal change, and I concluded the result is not brilliant. Really, a factor 0.n up to 10.
These are simply my conclusions.
Related question. Does anybody know a way to get the size of the internal "string" in bytes? In the narrow or wide build it is easy, I can encode with the "unicode_internal" codec. In Py 3.3, I attempted to toy with sizeof and struct, but without success.
jmf
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-08-18 16:09 -0400 |
| Message-ID | <mailman.3472.1345320581.4697.python-list@python.org> |
| In reply to | #27310 |
On 8/18/2012 12:38 PM, wxjmfauth@gmail.com wrote:
> Sorry guys, I'm not stupid (I think). I can open IDLE with
> Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is
> always slower. Period.
You have not tried enough tests ;-).
On my Win7-64 system:
from timeit import timeit
print(timeit(" 'a'*10000 "))
3.3.0b2: .5
3.2.3: .8
print(timeit("c in a", "c = '…'; a = 'a'*10000"))
3.3: .05 (independent of len(a)!)
3.2: 5.8 100 times slower! Increase len(a) and the ratio can be made as
high as one wants!
print(timeit("a.encode()", "a = 'a'*1000"))
3.2: 1.5
3.3: .26
Similar with encoding='utf-8' added to call.
Jim, please stop the ranting. It does not help improve Python. utf-32 is
not a panacea; it has problems of time, space, and system compatibility
(Windows and others). Victor Stinner, whatever he may have once thought
and said, put a *lot* of effort into making the new implementation both
correct and fast.
On your replace example
>>> timeit.timeit("('ab…' * 1000).replace('…', '……')")
> 61.919225272152346
>>> timeit.timeit("('ab…' * 10).replace('…', 'œ…')")
> 1.2918679017971044
I do not see the point of changing both length and replacement. For me,
the time is about the same for either replacement. I do see about the
same slowdown ratio for 3.3 versus 3.2. I also see it for pure search
without replacement.
print(timeit("c in a", "c = '…'; a = 'a'*1000+c"))
# .6 in 3.2.3, 1.2 in 3.3.0
This does not make sense to me and I will ask about it.
--
Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]