Groups > comp.lang.python > #27204 > unrolled thread

How do I display unicode value stored in a string variable using ord()

Started by	Charles Jensen <hopefullycharles@gmail.com>
First post	2012-08-16 15:09 -0700
Last post	2012-08-20 17:20 -0400
Articles	145 (26 participants)

  How do I display unicode value stored in a string variable using ord() Charles Jensen <hopefullycharles@gmail.com> - 2012-08-16 15:09 -0700
    Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-17 08:20 +1000
    Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-16 18:47 -0400
    Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-16 19:59 -0400
      Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 10:49 -0700
        Re: How do I display unicode value stored in a string variable using ord() Jerry Hill <malaclypse2@gmail.com> - 2012-08-17 14:21 -0400
          Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 11:45 -0700
            Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-17 16:55 -0400
            Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-17 23:30 -0400
              Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 04:10 +0000
                Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-18 09:18 -0600
            Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 03:59 +0000
      Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-17 10:49 -0700
    Re: How do I display unicode value stored in a string variable using ord() Alister <alister.ware@ntlworld.com> - 2012-08-17 06:30 +0000
    Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 01:09 -0700
      Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 12:27 +0000
        Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 08:07 -0700
          Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 16:25 +0100
          Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 01:36 +1000
          Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-18 09:51 -0600
            Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 09:38 -0700
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 02:57 +1000
              Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 18:28 +0100
                Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 11:05 -0700
                  Re: How do I display unicode value stored in a string variable using ord() MRAB <python@mrabarnett.plus.com> - 2012-08-18 19:34 +0100
                    Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:35 +0000
                      New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord() Peter Otten <__peter__@web.de> - 2012-08-19 09:43 +0200
                        Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 08:56 +0000
                          Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 02:24 -0700
                          Re: New internal string format in 3.3 Peter Otten <__peter__@web.de> - 2012-08-19 11:37 +0200
                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 03:19 -0700
                              Re: New internal string format in 3.3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 13:33 +0000
                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 03:19 -0700
                              Re: New internal string format in 3.3 Chris Angelico <rosuav@gmail.com> - 2012-08-19 20:26 +1000
                                Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:14 -0700
                                  Re: New internal string format in 3.3 Dave Angel <d@davea.name> - 2012-08-19 08:29 -0400
                                    Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:59 -0700
                                      Re: New internal string format in 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 14:46 +0100
                                        Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 07:09 -0700
                                          Re: New internal string format in 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 15:48 +0100
                                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 09:19 -0700
                                          Re: New internal string format in 3.3 Terry Reedy <tjreedy@udel.edu> - 2012-08-19 13:48 -0400
                                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 10:51 -0700
                                              Re: New internal string format in 3.3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 19:09 +0100
                                              Re: New internal string format in 3.3 Chris Angelico <rosuav@gmail.com> - 2012-08-20 07:50 +1000
                                              Re: New internal string format in 3.3 Michael Torrie <torriem@gmail.com> - 2012-08-19 23:38 -0600
                                                Re: New internal string format in 3.3 Roy Smith <roy@panix.com> - 2012-08-20 09:17 -0400
                                                  Re: New internal string format in 3.3 Michael Torrie <torriem@gmail.com> - 2012-08-20 22:18 -0600
                                                    Re: New internal string format in 3.3 Roy Smith <roy@panix.com> - 2012-08-21 07:48 -0400
                                            Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 10:51 -0700
                                      Re: New internal string format in 3.3 Terry Reedy <tjreedy@udel.edu> - 2012-08-19 13:56 -0400
                                    Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:59 -0700
                                  Re: New internal string format in 3.3 Dave Angel <d@davea.name> - 2012-08-19 08:35 -0400
                                Re: New internal string format in 3.3 wxjmfauth@gmail.com - 2012-08-19 05:14 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:30 +0000
                Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 11:05 -0700
              Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-18 16:09 -0400
              Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-18 23:12 -0400
            Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 09:38 -0700
            Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:33 +0000
              Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-19 11:50 -0600
                Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 11:20 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-19 12:31 -0600
                    Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 12:23 -0700
                Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 20:16 +0000
              Re: How do I display unicode value stored in a string variable using ord() Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-19 12:46 -0600
          Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-18 17:59 +0000
            Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 11:30 -0700
              Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 20:45 +0100
              Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:13 +0000
            Re: How do I display unicode value stored in a string variable using ord() rusi <rustompmody@gmail.com> - 2012-08-18 11:40 -0700
              Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 20:50 +0100
              Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-18 13:22 -0700
                Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-18 22:37 +0100
        Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 11:26 -0700
          Re: How do I display unicode value stored in a string variable using ord() MRAB <python@mrabarnett.plus.com> - 2012-08-18 19:59 +0100
            Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 07:17 +0000
          Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 10:46 +1000
            Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 19:11 -0700
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 12:19 +1000
                Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 19:35 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 13:01 +1000
                    Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 20:10 -0700
                      Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 13:31 +1000
                        Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-18 22:58 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 08:01 +0000
                    Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 01:11 -0700
                      Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 18:24 +1000
                        Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 01:44 -0700
                          Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 01:54 -0700
                            Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 11:46 +0100
                            Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 12:31 -0400
                      Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 10:51 +0000
                        Re: How do I display unicode value stored in a string variable using ord() Neil Hodgson <nhodgson@iinet.net.au> - 2012-08-21 17:03 +1000
          Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 06:09 +0000
            Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 01:04 -0700
              Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 13:25 +0000
                Re: How do I display unicode value stored in a string variable using ord() DJC <djc@news.invalid> - 2012-08-19 17:32 +0200
              Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 13:34 -0400
                Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 10:48 -0700
                  Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 11:11 -0700
                    Re: How do I display unicode value stored in a string variable using ord() Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-19 19:50 +0100
                    Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 17:59 -0400
                    Re: How do I display unicode value stored in a string variable using ord() rusi <rustompmody@gmail.com> - 2012-08-19 23:13 -0700
                  Abuse of Big Oh notation [was Re: How do I display unicode value stored in a string variable using ord()] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 20:15 +0000
                    Re: Abuse of Big Oh notation Paul Rubin <no.email@nospam.invalid> - 2012-08-19 16:42 -0700
                      Re: Abuse of Big Oh notation Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2012-08-20 09:24 +0100
                        Re: Abuse of Big Oh notation Paul Rubin <no.email@nospam.invalid> - 2012-08-20 09:01 -0700
                          Re: Abuse of Big Oh notation Chris Angelico <rosuav@gmail.com> - 2012-08-21 02:09 +1000
                          Re: Abuse of Big Oh notation Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-20 11:12 -0600
                            Re: Abuse of Big Oh notation Paul Rubin <no.email@nospam.invalid> - 2012-08-20 12:29 -0700
                              Re: Abuse of Big Oh notation 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-20 15:16 -0700
                              Re: Abuse of Big Oh notation 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-20 15:20 -0700
                            Re: Abuse of Big Oh notation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-21 09:53 +0000
                        Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-20 11:42 -0700
                          Re: Abuse of Big Oh notation Ned Deily <nad@acm.org> - 2012-08-20 18:19 -0700
                          Abuse of subject, was Re: Abuse of Big Oh notation Peter Otten <__peter__@web.de> - 2012-08-21 09:52 +0200
                            Re: Abuse of subject, was Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-21 10:16 -0700
                        Re: Abuse of Big Oh notation wxjmfauth@gmail.com - 2012-08-20 11:42 -0700
                  Re: How do I display unicode value stored in a string variable using ord() Hans Mulder <hansmu@xs4all.nl> - 2012-08-22 20:53 +0200
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-20 08:42 +1000
                Re: How do I display unicode value stored in a string variable using ord() Roy Smith <roy@panix.com> - 2012-08-19 19:24 -0400
                  Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-20 04:21 +0000
                    Re: How do I display unicode value stored in a string variable using ord() Roy Smith <roy@panix.com> - 2012-08-20 00:44 -0400
                      Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-20 05:56 +0000
                        Re: How do I display unicode value stored in a string variable using ord() Paul Rubin <no.email@nospam.invalid> - 2012-08-19 23:24 -0700
                    Re: How do I display unicode value stored in a string variable using ord() Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-08-20 12:58 -0400
              Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 20:35 -0400
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-20 14:07 +1000
            Re: How do I display unicode value stored in a string variable using ord() lipska the kat <lipskathekat@yahoo.co.uk> - 2012-08-19 11:13 +0100
              Re: How do I display unicode value stored in a string variable using ord() Chris Angelico <rosuav@gmail.com> - 2012-08-19 20:19 +1000
                Re: How do I display unicode value stored in a string variable using ord() lipska the kat <lipskathekat@yahoo.co.uk> - 2012-08-19 11:49 +0100
        Re: How do I display unicode value stored in a string variable using ord() "Blind Anagram" <noname@nowhere.com> - 2012-08-19 18:03 +0100
          Re: How do I display unicode value stored in a string variable using ord() wxjmfauth@gmail.com - 2012-08-19 10:33 -0700
            Re: How do I display unicode value stored in a string variable using ord() "Blind Anagram" <noname@nowhere.com> - 2012-08-19 19:04 +0100
          Re: How do I display unicode value stored in a string variable using ord() Dave Angel <d@davea.name> - 2012-08-19 14:05 -0400
            Re: How do I display unicode value stored in a string variable using ord() "Blind Anagram" <noname@nowhere.com> - 2012-08-19 19:18 +0100
          Re: How do I display unicode value stored in a string variable using ord() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-19 20:31 +0000
          Re: How do I display unicode value stored in a string variable using ord() Terry Reedy <tjreedy@udel.edu> - 2012-08-19 17:03 -0400
          Re: How do I display unicode value stored in a string variable using ord() 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-19 17:32 -0700
          Re: How do I display unicode value stored in a string variable using ord() Piet van Oostrum <piet@vanoostrum.org> - 2012-08-20 17:20 -0400

#27304

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2012-08-18 09:51 -0600
Message-ID	<mailman.3457.1345305136.4697.python-list@python.org>
In reply to	#27296

On Sat, Aug 18, 2012 at 9:07 AM,  <wxjmfauth@gmail.com> wrote:
> On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
>> [...]
>> The problem with UCS-4 is that every character requires four bytes.
>> [...]
>
> I'm aware of this (and all the blah blah blah you are
> explaining). It's always the same song. Memory.
>
> Let me ask. Is Python an "American" product for US users
> or is it a tool for everybody [*]?
> Is there any reason why non-ASCII users are somehow penalized
> compared to ASCII users?

The change does not just benefit ASCII users.  It primarily benefits
anybody using a wide unicode build with strings mostly containing only
BMP characters.  Even for narrow build users, there is the benefit
that with approximately the same amount of memory usage in most cases,
they no longer have to worry about non-BMP characters sneaking in and
breaking their code.

There is some additional benefit for Latin-1 users, but this has
nothing to do with Python.  If Python is going to have the option of a
1-byte representation (and as long as we have the flexible
representation, I can see no reason not to), then it is going to be
Latin-1 by definition, because that's what 1-byte Unicode (UCS-1, if
you will) is.  If you have an issue with that, take it up with the
designers of Unicode.
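A quick way to check the point that 1-byte Unicode is Latin-1 by definition: each of the first 256 code points encodes, under the standard latin-1 codec, to the single byte with the same value, and U+0100 is the first one that no longer fits. A minimal sketch; the function name is illustrative only.

```python
def latin1_is_first_256_code_points() -> bool:
    """Return True if every code point 0-255 encodes to the byte of the same value."""
    return all(chr(cp).encode("latin-1") == bytes([cp]) for cp in range(256))


print(latin1_is_first_256_code_points())  # True

# U+0100 is the first code point outside Latin-1; "replace" turns it into b'?'.
print(chr(0x100).encode("latin-1", errors="replace"))  # b'?'
```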

>
> This flexible string representation is a regression (ASCII users
> or not).
>
> I recognize that in practice the real impact is, for many users,
> close to zero (including me), but I have shown (I think) that
> this flexible representation is, by design, not as optimal
> as it is supposed to be. This is, in my mind, the relevant point.

You've shown nothing of the sort.  You've demonstrated only one out of
many possible benchmarks, and other users on this list can't even
reproduce that.
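To make a benchmark like this one reproducible by other people on the list, it helps to post it as a script built on the standard timeit module. A sketch under stated assumptions: the workload (replacing characters in a 1000-character string) and the helper best_time are hypothetical choices for illustration, and the numbers depend on machine, build and Python version, so no particular outcome is implied.

```python
import timeit


def best_time(stmt: str, setup: str, number: int = 2_000, repeat: int = 5) -> float:
    """Fastest total time (seconds) over `repeat` runs of `number` executions."""
    # The minimum is the least noisy estimate: slower runs mostly reflect
    # interference from other processes, not the code being measured.
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))


# Hypothetical workload: the same operation on strings of three storage widths.
setups = {
    "ASCII": "s = 'a' * 1000",
    "Latin-1": "s = 'é' * 1000",
    "BMP": "s = '€' * 1000",
}

for label, setup in setups.items():
    t = best_time("s.replace(s[0], 'x')", setup)
    print(f"{label:8} {t:.4f} s")
```

Running the identical script under two interpreter versions, and posting both outputs, gives others something concrete to compare against.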

#27310

From	wxjmfauth@gmail.com
Date	2012-08-18 09:38 -0700
Message-ID	<4c62a649-bc21-4e47-9c0f-acb1b1e70e36@googlegroups.com>
In reply to	#27304

Sorry guys, I'm not stupid (I think). I can open IDLE with
Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is
always slower. Period.

Now, the reason. I think it is due to the "flexible representation".

Deeper reason. The "boss" does not wish to hear about a (pure)
ucs-4/utf-32 "engine" (this has been discussed I do not know
how many times).

jmf

#27312

From	Chris Angelico <rosuav@gmail.com>
Date	2012-08-19 02:57 +1000
Message-ID	<mailman.3460.1345309057.4697.python-list@python.org>
In reply to	#27310

On Sun, Aug 19, 2012 at 2:38 AM,  <wxjmfauth@gmail.com> wrote:
> Sorry guys, I'm not stupid (I think). I can open IDLE with
> Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is
> always slower. Period.

Ah, but what about all those other operations that use strings under
the covers? As mentioned, namespace lookups do, among other things.
And how is performance in the (very real) case where a C routine wants
to return a value to Python as a string, where the data is currently
guaranteed to be ASCII (previously using PyUnicode_FromString, now
able to use PyUnicode_FromKindAndData)? Again, I'm sure this has been
gone into in great detail before the PEP was accepted (am I
negative-bikeshedding here? "atomic reactoring"???), and I'm sure that
the gains outweigh the costs.

ChrisA

#27314

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2012-08-18 18:28 +0100
Message-ID	<mailman.3462.1345310859.4697.python-list@python.org>
In reply to	#27310

On 18/08/2012 17:38, wxjmfauth@gmail.com wrote:
> Sorry guys, I'm not stupid (I think). I can open IDLE with
> Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is
> always slower. Period.

Proof that is acceptable to everybody please, not just yourself.

>
> Now, the reason. I think it is due to the "flexible representation".
>
> Deeper reason. The "boss" does not wish to hear about a (pure)
> ucs-4/utf-32 "engine" (this has been discussed I do not know
> how many times).
>
> jmf
>

-- 
Cheers.

Mark Lawrence.

#27320

From	wxjmfauth@gmail.com
Date	2012-08-18 11:05 -0700
Message-ID	<f9beca36-3a12-41f2-bdc2-95b159c162d1@googlegroups.com>
In reply to	#27314

On Saturday, 18 August 2012 19:28:26 UTC+2, Mark Lawrence wrote:
> 
> Proof that is acceptable to everybody please, not just yourself.
> 
> 
I can't; I'm only facing the fact that it works slower on my
Windows platform.

As I understand (I think) the underlying mechanism, I
can only say that it is not a surprise that it happens.

Imagine an editor. I type an "a": internally the text is
saved as ASCII. Then I type an "é": the text can only
be saved in at least Latin-1. Then I enter a "€": the text
becomes an internal UCS-4 "string". Then I remove the "€", and so
on.

Intuitively, I expect there is some kind of slowdown from
all these "string" conversions.

When I tested this flexible representation a few months
ago, at the first alpha release, this is precisely what
I tested: string manipulations that force this internal
change. I concluded that the result is not brilliant. Really,
a factor of 0.n up to 10.

These are simply my conclusions.

Related question.

Does anybody know a way to get the size of the internal
"string" in bytes? In the narrow or wide build it is easy:
I can encode with the "unicode_internal" codec. In Py 3.3,
I attempted to toy with sizeof and struct, but without
success.

jmf
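One possible answer, assuming CPython 3.3 or later (PEP 393): sys.getsizeof reports a string's fixed header plus its character data, so comparing two strings that differ by a known number of characters cancels the header and leaves the width per code point. This is a CPython implementation detail rather than a language guarantee, and bytes_per_code_point is a hypothetical helper. (The unicode_internal codec mentioned above was deprecated in 3.3 and removed in 3.8.)

```python
import sys


def bytes_per_code_point(ch: str) -> int:
    """Estimate CPython's storage width (1, 2 or 4 bytes) for the character `ch`.

    Assumes CPython 3.3+ (PEP 393): a string's reported size is a fixed header
    plus (length + 1) * width, so the header cancels in the difference below.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Two freshly built strings that differ by exactly 100 characters.
    return (sys.getsizeof(ch * 201) - sys.getsizeof(ch * 101)) // 100


for ch in ["a", "é", "€", chr(0x10000)]:
    print(f"U+{ord(ch):04X}: {bytes_per_code_point(ch)} byte(s) per code point")
```

On CPython 3.3+ this should report widths of 1, 1, 2 and 4, in line with the getsizeof measurements posted later in the thread.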

#27324

From	MRAB <python@mrabarnett.plus.com>
Date	2012-08-18 19:34 +0100
Message-ID	<mailman.3468.1345314897.4697.python-list@python.org>
In reply to	#27320

On 18/08/2012 19:05, wxjmfauth@gmail.com wrote:
> On Saturday, 18 August 2012 19:28:26 UTC+2, Mark Lawrence wrote:
>>
>> Proof that is acceptable to everybody please, not just yourself.
>>
>>
> I can't; I'm only facing the fact that it works slower on my
> Windows platform.
>
> As I understand (I think) the underlying mechanism, I
> can only say that it is not a surprise that it happens.
>
> Imagine an editor. I type an "a": internally the text is
> saved as ASCII. Then I type an "é": the text can only
> be saved in at least Latin-1. Then I enter a "€": the text
> becomes an internal UCS-4 "string". Then I remove the "€", and so
> on.
>
[snip]

"a" will be stored as 1 byte/codepoint.

Adding "é", it will still be stored as 1 byte/codepoint.

Adding "€", it will still be stored as 2 bytes/codepoint.

But then you wouldn't be adding them one at a time in Python, you'd be
building a list and then joining them together in one operation.
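A pure-Python model of the rule described above, assuming (as PEP 393 specifies) that the widest code point alone decides the storage width of the whole string. pep393_width is a hypothetical helper; it models the rule and does not inspect CPython's actual memory.

```python
def pep393_width(text: str) -> int:
    """Model of PEP 393: the widest code point decides bytes per code point.

    Pure Python; it does not inspect CPython's real memory layout.
    """
    widest = max(map(ord, text), default=0)
    if widest <= 0xFF:    # ASCII and Latin-1 range: 1 byte (ASCII-only strings
        return 1          # additionally get a smaller header in CPython)
    if widest <= 0xFFFF:  # rest of the Basic Multilingual Plane: 2 bytes
        return 2
    return 4              # supplementary planes: 4 bytes


keystrokes = ["a", "é", "€"]

# Storage width of each successive editor state in the example above.
for n in range(1, len(keystrokes) + 1):
    state = "".join(keystrokes[:n])
    widest = max(map(ord, state))
    print(f"{n} char(s), widest U+{widest:04X}: {pep393_width(state)} byte(s) each")

# Building the final text once from a list, rather than by repeated
# concatenation, creates only one new string object.
text = "".join(keystrokes)
print(len(text), pep393_width(text))
```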

#27354

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-08-19 06:35 +0000
Message-ID	<5030891f$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to	#27324

On Sat, 18 Aug 2012 19:34:50 +0100, MRAB wrote:

> "a" will be stored as 1 byte/codepoint.
> 
> Adding "é", it will still be stored as 1 byte/codepoint.

Wrong. It will be 2 bytes, just like it already is in Python 3.2.

I don't know where people are getting this myth that PEP 393 uses Latin-1 
internally, it does not. Read the PEP, it explicitly states that 1-byte 
formats are only used for ASCII strings.

> Adding "€", it will still be stored as 2 bytes/codepoint.

That is correct.

-- 
Steven

#27357 — New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

From	Peter Otten <__peter__@web.de>
Date	2012-08-19 09:43 +0200
Subject	New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()
Message-ID	<mailman.3485.1345362201.4697.python-list@python.org>
In reply to	#27354

Steven D'Aprano wrote:

> On Sat, 18 Aug 2012 19:34:50 +0100, MRAB wrote:
> 
>> "a" will be stored as 1 byte/codepoint.
>> 
>> Adding "é", it will still be stored as 1 byte/codepoint.
> 
> Wrong. It will be 2 bytes, just like it already is in Python 3.2.
> 
> I don't know where people are getting this myth that PEP 393 uses Latin-1
> internally, it does not. Read the PEP, it explicitly states that 1-byte
> formats are only used for ASCII strings.

From

Python 3.3.0a4+ (default:10a8ad665749, Jun  9 2012, 08:57:51) 
[GCC 4.6.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> [sys.getsizeof("é"*i) for i in range(10)]
[49, 74, 75, 76, 77, 78, 79, 80, 81, 82]
>>> [sys.getsizeof("e"*i) for i in range(10)]
[49, 50, 51, 52, 53, 54, 55, 56, 57, 58]
>>> sys.getsizeof("é"*101)-sys.getsizeof("é")
100
>>> sys.getsizeof("e"*101)-sys.getsizeof("e")
100
>>> sys.getsizeof("€"*101)-sys.getsizeof("€")
200

I infer that 

(1) both ASCII and Latin1 strings require one byte per character.
(2) Latin1 strings have a constant overhead of 24 bytes (on a 64bit system) 
over ASCII-only.
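The same inference can be re-run on any CPython 3.3+ interpreter with a sketch like the one below. The 24-byte header difference above is specific to that 64-bit 3.3 alpha build; other versions and builds may report a different (but still positive) number. The function names are illustrative.

```python
import sys


def growth_per_char(ch: str) -> int:
    """Bytes added per extra copy of `ch`, measured over 100 extra copies."""
    return (sys.getsizeof(ch * 200) - sys.getsizeof(ch * 100)) // 100


def header_overhead(ch: str, reference: str = "e") -> int:
    """Extra fixed bytes of a string built from `ch` over one built from `reference`.

    Both strings hold two 1-byte characters, so their data parts are the same
    size and only the headers differ (valid only when `ch` is a 1-byte character).
    """
    return sys.getsizeof(ch * 2) - sys.getsizeof(reference * 2)


print("ASCII grows by", growth_per_char("e"), "byte(s) per character")
print("Latin-1 grows by", growth_per_char("é"), "byte(s) per character")
print("Latin-1 header overhead:", header_overhead("é"), "bytes")
```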

#27366 — Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-08-19 08:56 +0000
Subject	Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()
Message-ID	<5030aa44$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to	#27357

On Sun, 19 Aug 2012 09:43:13 +0200, Peter Otten wrote:

> Steven D'Aprano wrote:

>> I don't know where people are getting this myth that PEP 393 uses
>> Latin-1 internally, it does not. Read the PEP, it explicitly states
>> that 1-byte formats are only used for ASCII strings.
> 
> From
> 
> Python 3.3.0a4+ (default:10a8ad665749, Jun  9 2012, 08:57:51) [GCC
> 4.6.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import sys
>>>> [sys.getsizeof("é"*i) for i in range(10)]
> [49, 74, 75, 76, 77, 78, 79, 80, 81, 82]

Interesting. Say, I don't suppose you're using a 64-bit build? Because 
that would explain why your sizes are so much larger than mine:

py> [sys.getsizeof("é"*i) for i in range(10)]
[25, 38, 39, 40, 41, 42, 43, 44, 45, 46]

py> [sys.getsizeof("€"*i) for i in range(10)]
[25, 40, 42, 44, 46, 48, 50, 52, 54, 56]

py> c = chr(0xFFFF + 1)
py> [sys.getsizeof(c*i) for i in range(10)]
[25, 44, 48, 52, 56, 60, 64, 68, 72, 76]

On re-reading the PEP more closely, it looks like I did misunderstand the 
internal implementation, and strings which fit exactly in Latin-1 will 
also use 1 byte per character. There are three structures used:

PyASCIIObject
PyCompactUnicodeObject
PyUnicodeObject

and the third one comes in three variant forms, for 1-byte, 2-byte and
4-byte data. So I stand corrected.
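The width rule can be modelled in pure Python. This is an illustration, not CPython's code. It assumes the rule as PEP 393 describes it: the largest code point in the string decides the width of every character. A compact pure-ASCII string uses the smaller `PyASCIIObject` header, and other compact strings use `PyCompactUnicodeObject` with a 1-, 2- or 4-byte kind. The function name `pep393_layout` and its labels are hypothetical.

```python
def pep393_layout(s):
    """Return (label, bytes_per_char) describing how PEP 393 stores s.

    Hypothetical helper: the widest code point decides the width used for
    every character in the string.
    """
    widest = max(map(ord, s), default=0)
    if widest < 0x80:
        return ("ascii", 1)      # PyASCIIObject header, 1 byte per char
    if widest < 0x100:
        return ("latin-1", 1)    # compact header, 1 byte per char
    if widest < 0x10000:
        return ("ucs-2", 2)      # compact header, 2 bytes per char
    return ("ucs-4", 4)          # compact header, 4 bytes per char

print(pep393_layout("abc"))           # ('ascii', 1)
print(pep393_layout("caf\u00e9"))     # ('latin-1', 1)
print(pep393_layout("5 \u20ac"))      # ('ucs-2', 2)
```

A single character above U+00FF is enough to widen the whole string, which is the behaviour the rest of this thread argues about.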

-- 
Steven


#27368 — Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

From	wxjmfauth@gmail.com
Date	2012-08-19 02:24 -0700
Subject	Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()
Message-ID	<729f511f-c6d7-46c8-adbd-94957d8f6ab6@googlegroups.com>
In reply to	#27366

On Sunday, 19 August 2012 10:56:36 UTC+2, Steven D'Aprano wrote:
> internal implementation, and strings which fit exactly in Latin-1 will

And this is the crucial point. Latin-1 is an obsolete and unusable
coding scheme (especially for European languages).

This brings us back to the point I mentioned above. Microsoft knows this,
as do Apple, TeX and the type foundries.
Even ISO has recognized its error and produced ISO-8859-15.

The question is: why is it still used?
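The difference between ISO-8859-1 and ISO-8859-15 can be listed with Python's own codecs. The sketch decodes every byte value with both codecs and keeps the positions where they disagree. ISO-8859-15 reassigns eight Latin-1 positions, mostly to add the euro sign and letters such as Œ, œ, Ÿ, Š, š, Ž and ž.

```python
# Compare ISO-8859-1 (Latin-1) with ISO-8859-15 for every byte value.
differences = []
for b in range(256):
    raw = bytes([b])
    latin1 = raw.decode("latin-1")
    latin9 = raw.decode("iso8859-15")
    if latin1 != latin9:
        differences.append((b, latin1, latin9))

for b, old, new in differences:
    # ascii() keeps the output readable on any console encoding
    print(hex(b), ascii(old), "->", ascii(new))
```

Byte 0xA4, for example, is the generic currency sign in Latin-1 and the euro sign in ISO-8859-15.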

jmf


#27369 — Re: New internal string format in 3.3

From	Peter Otten <__peter__@web.de>
Date	2012-08-19 11:37 +0200
Subject	Re: New internal string format in 3.3
Message-ID	<mailman.3489.1345369039.4697.python-list@python.org>
In reply to	#27366

Steven D'Aprano wrote:

> On Sun, 19 Aug 2012 09:43:13 +0200, Peter Otten wrote:
> 
>> Steven D'Aprano wrote:
> 
>>> I don't know where people are getting this myth that PEP 393 uses
>>> Latin-1 internally, it does not. Read the PEP, it explicitly states
>>> that 1-byte formats are only used for ASCII strings.
>> 
>> From
>> 
>> Python 3.3.0a4+ (default:10a8ad665749, Jun  9 2012, 08:57:51) [GCC
>> 4.6.1] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> import sys
>>>>> [sys.getsizeof("é"*i) for i in range(10)]
>> [49, 74, 75, 76, 77, 78, 79, 80, 81, 82]
> 
> Interesting. Say, I don't suppose you're using a 64-bit build? Because
> that would explain why your sizes are so much larger than mine:
> 
> py> [sys.getsizeof("é"*i) for i in range(10)]
> [25, 38, 39, 40, 41, 42, 43, 44, 45, 46]
> 
> 
> py> [sys.getsizeof("€"*i) for i in range(10)]
> [25, 40, 42, 44, 46, 48, 50, 52, 54, 56]

Yes, I am using a 64-bit build. I thought that

>> (2) Latin-1 strings have a constant overhead of 24 bytes (on a 64-bit
>> system) over ASCII-only strings.

would convey that. The corresponding data structure 

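typedef struct {
  PyASCIIObject _base;     /* header shared with ASCII-only strings */
  Py_ssize_t utf8_length;  /* bytes in the cached UTF-8 form, excluding the NUL */
  char *utf8;              /* cached UTF-8 representation, or NULL */
  Py_ssize_t wstr_length;  /* length of the cached wchar_t form */
} PyCompactUnicodeObject;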

makes for 12 extra bytes on 32 bit, and both Py_ssize_t and pointers double 
in size (from 4 to 8 bytes) on 64 bit. I'm sure you can do the maths for the 
embedded PyASCIIObject yourself.
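The size arithmetic for the three extra fields can be checked with `ctypes`. This sketch only models the fields of the 3.3 `PyCompactUnicodeObject` shown above (two `Py_ssize_t` and one pointer). It does not read CPython's real structs, and later CPython releases changed the layout, so a `getsizeof` comparison on a current interpreter may not give 24. The class name `CompactExtraFields` is hypothetical.

```python
import ctypes

class CompactExtraFields(ctypes.Structure):
    """Model of the fields PyCompactUnicodeObject (CPython 3.3) adds after
    its embedded PyASCIIObject. Used for size arithmetic only."""
    _fields_ = [
        ("utf8_length", ctypes.c_ssize_t),  # Py_ssize_t
        ("utf8", ctypes.c_char_p),          # char *
        ("wstr_length", ctypes.c_ssize_t),  # Py_ssize_t
    ]

extra_bytes = ctypes.sizeof(CompactExtraFields)
pointer_bits = 8 * ctypes.sizeof(ctypes.c_void_p)
# On a 64-bit build this prints 24; on a 32-bit build, 12.
print(f"{pointer_bits}-bit build: {extra_bytes} extra bytes")
```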


#27373 — Re: New internal string format in 3.3

From	wxjmfauth@gmail.com
Date	2012-08-19 03:19 -0700
Subject	Re: New internal string format in 3.3
Message-ID	<mailman.3491.1345371566.4697.python-list@python.org>
In reply to	#27369

On Sunday, 19 August 2012 11:37:09 UTC+2, Peter Otten wrote:


You know, the technical aspect is one thing. Understanding
the coding of the characters as a whole is something
else. The important point is not the coding per se; the
relevant point is the set of characters a coding may
represent.

You can build the most sophisticated mechanism you wish,
but if it does not take that point into account, it will
always fail or be suboptimal.

This is precisely the weak point of this flexible
representation. It uses latin-1 and latin-1 is for
most users simply unusable.

Fascinating, isn't it? Devs are developing sophisticated
tools based on a non-working basis.

jmf


#27390 — Re: New internal string format in 3.3

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-08-19 13:33 +0000
Subject	Re: New internal string format in 3.3
Message-ID	<5030eb10$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to	#27373

On Sun, 19 Aug 2012 03:19:23 -0700, wxjmfauth wrote:

> This is precisely the weak point of this flexible representation. It
> uses latin-1 and latin-1 is for most users simply unusable.

That's very funny.

Are you aware that your post is entirely Latin-1?
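This observation can be checked mechanically. A string fits in Latin-1 exactly when every code point is below 256, which is also when CPython 3.3+ stores it at one byte per character. The sample is a French attribution line like the ones Google Groups added to these posts. The helper `fits_latin1` and the second sample string are illustrative.

```python
def fits_latin1(text: str) -> bool:
    """True if every character of text has a code point below 256."""
    return all(ord(ch) < 256 for ch in text)

sample = "Le dimanche 19 ao\u00fbt 2012, Peter Otten a \u00e9crit :"
print(fits_latin1(sample))               # True
print(fits_latin1("5 \u20ac de plus"))  # False: the euro sign is U+20AC
```

An equivalent test is whether `text.encode("latin-1")` succeeds without raising `UnicodeEncodeError`.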

> Fascinating, isn't it? Devs are developing sophisticated tools based on a
> non-working basis.

At the end of the day, PEP 393 fixes some major design limitations of the 
Unicode implementation in the "narrow build" Python, while saving memory 
for people using the "wide build". Everybody wins here. Your objection 
appears to be based more on some sort of philosophical objection to Latin-1
than on any genuine problem.
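One of the narrow-build limitations that PEP 393 removed shows up with a character outside the Basic Multilingual Plane. On CPython 3.3 and later, such a character is a single item. A UTF-16 representation, roughly what a pre-3.3 narrow build held internally, needs a surrogate pair for it. The snippet can only show the 3.3+ behaviour and the UTF-16 encoding; it does not reproduce an old narrow build.

```python
smiley = "\U0001F600"                # outside the BMP (above U+FFFF)

print(len(smiley))                   # 1 code point in Python 3.3+
utf16 = smiley.encode("utf-16-le")   # what a UTF-16 based store holds
print(len(utf16) // 2)               # 2 code units: a surrogate pair
high, low = (int.from_bytes(utf16[i:i + 2], "little") for i in (0, 2))
print(hex(high), hex(low))           # 0xd83d 0xde00
```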

-- 
Steven


#27374 — Re: New internal string format in 3.3

From	wxjmfauth@gmail.com
Date	2012-08-19 03:19 -0700
Subject	Re: New internal string format in 3.3
Message-ID	<11931ec9-1858-4ae8-8a61-1d154d105229@googlegroups.com>
In reply to	#27369

On Sunday, 19 August 2012 11:37:09 UTC+2, Peter Otten wrote:


You know, the technical aspect is one thing. Understanding
the coding of the characters as a whole is something
else. The important point is not the coding per se; the
relevant point is the set of characters a coding may
represent.

You can build the most sophisticated mechanism you wish,
but if it does not take that point into account, it will
always fail or be suboptimal.

This is precisely the weak point of this flexible
representation. It uses latin-1 and latin-1 is for
most users simply unusable.

Fascinating, isn't it? Devs are developing sophisticated
tools based on a non-working basis.

jmf


#27375 — Re: New internal string format in 3.3

From	Chris Angelico <rosuav@gmail.com>
Date	2012-08-19 20:26 +1000
Subject	Re: New internal string format in 3.3
Message-ID	<mailman.3492.1345372006.4697.python-list@python.org>
In reply to	#27374

On Sun, Aug 19, 2012 at 8:19 PM,  <wxjmfauth@gmail.com> wrote:
> This is precisely the weak point of this flexible
> representation. It uses latin-1 and latin-1 is for
> most users simply unusable.

No, it uses Unicode, and as an optimization, attempts to store the
codepoints in less than four bytes for most strings. The fact that a
one-byte storage format happens to look like latin-1 is rather
coincidental.
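The reason the one-byte format "looks like" Latin-1 is that Unicode's first 256 code points were taken over unchanged from ISO-8859-1. Keeping just the low-order byte of a code point below 256 therefore yields exactly the Latin-1 byte. A quick check, with an arbitrary sample word:

```python
# Unicode's first 256 code points coincide with ISO-8859-1, so the Latin-1
# byte for each of them is simply the code point's value.
assert all(chr(cp).encode("latin-1") == bytes([cp]) for cp in range(256))

word = "d\u00e9j\u00e0 vu"
# Keeping the low-order byte of each code point (all are below 256 here)
# gives the same bytes as encoding the string as Latin-1.
low_bytes = bytes(ord(ch) & 0xFF for ch in word)
print(low_bytes == word.encode("latin-1"))   # True
print(low_bytes)                             # b'd\xe9j\xe0 vu'
```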

ChrisA


#27382 — Re: New internal string format in 3.3

From	wxjmfauth@gmail.com
Date	2012-08-19 05:14 -0700
Subject	Re: New internal string format in 3.3
Message-ID	<73c85f3b-a4a9-4812-bc41-132b5126874c@googlegroups.com>
In reply to	#27375

On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
> On Sun, Aug 19, 2012 at 8:19 PM,  <wxjmfauth@gmail.com> wrote:
> > This is precisely the weak point of this flexible
> > representation. It uses latin-1 and latin-1 is for
> > most users simply unusable.
>
> No, it uses Unicode, and as an optimization, attempts to store the
> codepoints in less than four bytes for most strings. The fact that a
> one-byte storage format happens to look like latin-1 is rather
> coincidental.
>

And this is the common basic mistake. You do not push your
argument far enough. A character may "fall" accidentally into latin-1.
The problem lies with those European characters which cannot fall into this
coding. This *is* the cause of the negative side effects.
If you are using a correct coding scheme, like cp1252, mac-roman or
iso-8859-15, you will never see such a negative side effect.
Again, the problem is not the result, the encoded character. The critical
part is the character which may cause this side effect.
You should think "character set" and not encoded "code point", considering
this kind of expression makes sense in an 8-bit coding scheme.
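The characters in question can be listed with Python's codecs. This sketch decodes cp1252's 0x80 to 0x9F range, where cp1252 assigns printable characters and Latin-1 has control codes. It skips the five byte values that cp1252 leaves undefined. Every character that comes out has a code point above U+00FF, so under PEP 393 any string containing one of them is stored at two bytes per character or more.

```python
# cp1252 assigns printable characters to most of 0x80-0x9F, where Latin-1
# has C1 control codes. errors="ignore" drops the five undefined bytes.
extras = bytes(range(0x80, 0xA0)).decode("cp1252", errors="ignore")

print(len(extras))                              # 27
print(all(ord(ch) > 0xFF for ch in extras))     # True
print(ascii(extras[:3]))                        # escaped, console-safe view
```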

jmf


#27384 — Re: New internal string format in 3.3

From	Dave Angel <d@davea.name>
Date	2012-08-19 08:29 -0400
Subject	Re: New internal string format in 3.3
Message-ID	<mailman.3497.1345379387.4697.python-list@python.org>
In reply to	#27382

On 08/19/2012 08:14 AM, wxjmfauth@gmail.com wrote:
> On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
>> On Sun, Aug 19, 2012 at 8:19 PM,  <wxjmfauth@gmail.com> wrote:
>>
>>> This is precisely the weak point of this flexible
>>> representation. It uses latin-1 and latin-1 is for
>>> most users simply unusable.
>>
>> No, it uses Unicode, and as an optimization, attempts to store the
>> codepoints in less than four bytes for most strings. The fact that a
>> one-byte storage format happens to look like latin-1 is rather
>> coincidental.
>>
> And this is the common basic mistake. You do not push your
> argument far enough. A character may "fall" accidentally into latin-1.
> The problem lies with those European characters which cannot fall into this
> coding. This *is* the cause of the negative side effects.
> If you are using a correct coding scheme, like cp1252, mac-roman or
> iso-8859-15, you will never see such a negative side effect.
> Again, the problem is not the result, the encoded character. The critical
> part is the character which may cause this side effect.
> You should think "character set" and not encoded "code point", considering
> this kind of expression makes sense in an 8-bit coding scheme.
>
> jmf

But that choice was made decades ago when Unicode picked its second 128
characters.  The internal form used in this PEP is simply the low-order
byte of the Unicode code point.  Trying to scan the string deciding if
converting to cp1252 (for example) would be a much more expensive
operation than seeing how many bytes it'd take for the largest code point.
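The cost argument can be sketched as follows. Choosing a PEP 393 width only needs one pass that tracks the largest code point. A cp1252-based internal form would need a table lookup for every character, done here through the codec machinery, and the stored byte would no longer equal the value `ord()` must return. The sample text is arbitrary.

```python
text = "c\u0153ur \u00e0 5 \u20ac"

# PEP 393 style: one pass that only tracks the largest code point.
widest = max(map(ord, text))
print(hex(widest))   # 0x20ac, so every character is stored in 2 bytes

# A cp1252-based store would need a table lookup for each character, and the
# stored byte would no longer equal the code point that ord() reports.
for ch in "\u0153\u20ac":
    stored = ch.encode("cp1252")[0]
    print(hex(ord(ch)), "stored as", hex(stored))
# 0x153 stored as 0x9c
# 0x20ac stored as 0x80
```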





-- 

DaveA


#27387 — Re: New internal string format in 3.3

From	wxjmfauth@gmail.com
Date	2012-08-19 05:59 -0700
Subject	Re: New internal string format in 3.3
Message-ID	<1f22cebc-71aa-4881-bac5-d97d72fe2633@googlegroups.com>
In reply to	#27384

On Sunday, 19 August 2012 14:29:17 UTC+2, Dave Angel wrote:
> On 08/19/2012 08:14 AM, wxjmfauth@gmail.com wrote:
> > On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
> >> On Sun, Aug 19, 2012 at 8:19 PM,  <wxjmfauth@gmail.com> wrote:
> >>
> >>> This is precisely the weak point of this flexible
> >>> representation. It uses latin-1 and latin-1 is for
> >>> most users simply unusable.
> >>
> >> No, it uses Unicode, and as an optimization, attempts to store the
> >> codepoints in less than four bytes for most strings. The fact that a
> >> one-byte storage format happens to look like latin-1 is rather
> >> coincidental.
> >>
> > And this is the common basic mistake. You do not push your
> > argument far enough. A character may "fall" accidentally into latin-1.
> > The problem lies with those European characters which cannot fall into this
> > coding. This *is* the cause of the negative side effects.
> > If you are using a correct coding scheme, like cp1252, mac-roman or
> > iso-8859-15, you will never see such a negative side effect.
> > Again, the problem is not the result, the encoded character. The critical
> > part is the character which may cause this side effect.
> > You should think "character set" and not encoded "code point", considering
> > this kind of expression makes sense in an 8-bit coding scheme.
> >
> > jmf
>
> But that choice was made decades ago when Unicode picked its second 128
> characters.  The internal form used in this PEP is simply the low-order
> byte of the Unicode code point.  Trying to scan the string deciding if
> converting to cp1252 (for example) would be a much more expensive
> operation than seeing how many bytes it'd take for the largest code point.

You are absolutely right. (I'm quite comfortable with Unicode.)
If Python wishes to perpetuate this, let's call it, design mistake
or annoyance, it will continue to live with problems.

People (tools) who chose pure utf-16 or utf-32 are not suffering
from this issue.

*My* final comment on this thread.

In August 2012, after 20 years of development, Python is not
able to display a piece of text correctly on a Windows console
(e.g. cp65001).

I downloaded the Go language and, with zero experience, I did not manage
to display a piece of text incorrectly. (This is, by the way, *the*
reason why I tested it.) Where the problems are coming from, I have
no idea.

I find this situation quite comic. Python is able to
produce this:

>>> (1.1).hex()
'0x1.199999999999ap+0'

but it is not able to display a piece of text!

Try to convince end users that IEEE 754 is more important than the
ability to read/write a piece of text, something a six-year-old learns
at school :-)

(I'm not suffering from this kind of effect; as a Windows user,
I always work via a GUI. Still, the problem exists.)

Regards,
jmf


#27391 — Re: New internal string format in 3.3

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2012-08-19 14:46 +0100
Subject	Re: New internal string format in 3.3
Message-ID	<mailman.3501.1345383872.4697.python-list@python.org>
In reply to	#27387

On 19/08/2012 13:59, wxjmfauth@gmail.com wrote:
> On Sunday, 19 August 2012 14:29:17 UTC+2, Dave Angel wrote:
>> On 08/19/2012 08:14 AM, wxjmfauth@gmail.com wrote:
>>> On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
>>>> On Sun, Aug 19, 2012 at 8:19 PM,  <wxjmfauth@gmail.com> wrote:
>>>>
>>>>> This is precisely the weak point of this flexible
>>>>> representation. It uses latin-1 and latin-1 is for
>>>>> most users simply unusable.
>>>>
>>>> No, it uses Unicode, and as an optimization, attempts to store the
>>>> codepoints in less than four bytes for most strings. The fact that a
>>>> one-byte storage format happens to look like latin-1 is rather
>>>> coincidental.
>>>>
>>> And this is the common basic mistake. You do not push your
>>> argument far enough. A character may "fall" accidentally into latin-1.
>>> The problem lies with those European characters which cannot fall into this
>>> coding. This *is* the cause of the negative side effects.
>>> If you are using a correct coding scheme, like cp1252, mac-roman or
>>> iso-8859-15, you will never see such a negative side effect.
>>> Again, the problem is not the result, the encoded character. The critical
>>> part is the character which may cause this side effect.
>>> You should think "character set" and not encoded "code point", considering
>>> this kind of expression makes sense in an 8-bit coding scheme.
>>>
>>> jmf
>>
>> But that choice was made decades ago when Unicode picked its second 128
>> characters.  The internal form used in this PEP is simply the low-order
>> byte of the Unicode code point.  Trying to scan the string deciding if
>> converting to cp1252 (for example) would be a much more expensive
>> operation than seeing how many bytes it'd take for the largest code point.
>
> You are absolutely right. (I'm quite comfortable with Unicode.)
> If Python wishes to perpetuate this, let's call it, design mistake
> or annoyance, it will continue to live with problems.

Please give a precise description of the design mistake and what you 
would do to correct it.

>
> People (tools) who chose pure utf-16 or utf-32 are not suffering
> from this issue.
>
> *My* final comment on this thread.
>
> In August 2012, after 20 years of development, Python is not
> able to display a piece of text correctly on a Windows console
> (e.g. cp65001).

Examples please.

>
> I downloaded the Go language and, with zero experience, I did not manage
> to display a piece of text incorrectly. (This is, by the way, *the*
> reason why I tested it.) Where the problems are coming from, I have
> no idea.
>
> I find this situation quite comic. Python is able to
> produce this:
>
>>>> (1.1).hex()
> '0x1.199999999999ap+0'
>
> but it is not able to display a piece of text!

So you keep saying, but when asked for examples or evidence nothing gets 
produced.

>
> Try to convince end users that IEEE 754 is more important than the
> ability to read/write a piece of text, something a six-year-old learns
> at school :-)
>
> (I'm not suffering from this kind of effect; as a Windows user,
> I always work via a GUI. Still, the problem exists.)

Windows is a law unto itself.  Its problems are hardly specific to Python.

>
> Regards,
> jmf
>

Now two or three times you've said you're going but have come back.  If 
you come again could you please provide examples and/or evidence of what 
you're on about, because you still have me baffled.

-- 
Cheers.

Mark Lawrence.


#27392 — Re: New internal string format in 3.3

From	wxjmfauth@gmail.com
Date	2012-08-19 07:09 -0700
Subject	Re: New internal string format in 3.3
Message-ID	<mailman.3502.1345385358.4697.python-list@python.org>
In reply to	#27391

On Sunday, 19 August 2012 15:46:34 UTC+2, Mark Lawrence wrote:
> On 19/08/2012 13:59, wxjmfauth@gmail.com wrote:
> > On Sunday, 19 August 2012 14:29:17 UTC+2, Dave Angel wrote:
> >> On 08/19/2012 08:14 AM, wxjmfauth@gmail.com wrote:
> >>> On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
> >>>> On Sun, Aug 19, 2012 at 8:19 PM,  <wxjmfauth@gmail.com> wrote:
> >>>>
> >>>>> This is precisely the weak point of this flexible
> >>>>> representation. It uses latin-1 and latin-1 is for
> >>>>> most users simply unusable.
> >>>>
> >>>> No, it uses Unicode, and as an optimization, attempts to store the
> >>>> codepoints in less than four bytes for most strings. The fact that a
> >>>> one-byte storage format happens to look like latin-1 is rather
> >>>> coincidental.
> >>>>
> >>> And this is the common basic mistake. You do not push your
> >>> argument far enough. A character may "fall" accidentally into latin-1.
> >>> The problem lies with those European characters which cannot fall into this
> >>> coding. This *is* the cause of the negative side effects.
> >>> If you are using a correct coding scheme, like cp1252, mac-roman or
> >>> iso-8859-15, you will never see such a negative side effect.
> >>> Again, the problem is not the result, the encoded character. The critical
> >>> part is the character which may cause this side effect.
> >>> You should think "character set" and not encoded "code point", considering
> >>> this kind of expression makes sense in an 8-bit coding scheme.
> >>>
> >>> jmf
> >>
> >> But that choice was made decades ago when Unicode picked its second 128
> >> characters.  The internal form used in this PEP is simply the low-order
> >> byte of the Unicode code point.  Trying to scan the string deciding if
> >> converting to cp1252 (for example) would be a much more expensive
> >> operation than seeing how many bytes it'd take for the largest code point.
> >
> > You are absolutely right. (I'm quite comfortable with Unicode.)
> > If Python wishes to perpetuate this, let's call it, design mistake
> > or annoyance, it will continue to live with problems.
>
> Please give a precise description of the design mistake and what you
> would do to correct it.
>
> >
> > People (tools) who chose pure utf-16 or utf-32 are not suffering
> > from this issue.
> >
> > *My* final comment on this thread.
> >
> > In August 2012, after 20 years of development, Python is not
> > able to display a piece of text correctly on a Windows console
> > (e.g. cp65001).
>
> Examples please.
>
> >
> > I downloaded the Go language and, with zero experience, I did not manage
> > to display a piece of text incorrectly. (This is, by the way, *the*
> > reason why I tested it.) Where the problems are coming from, I have
> > no idea.
> >
> > I find this situation quite comic. Python is able to
> > produce this:
> >
> >>>> (1.1).hex()
> > '0x1.199999999999ap+0'
> >
> > but it is not able to display a piece of text!
>
> So you keep saying, but when asked for examples or evidence nothing gets
> produced.
>
> >
> > Try to convince end users that IEEE 754 is more important than the
> > ability to read/write a piece of text, something a six-year-old learns
> > at school :-)
> >
> > (I'm not suffering from this kind of effect; as a Windows user,
> > I always work via a GUI. Still, the problem exists.)
>
> Windows is a law unto itself.  Its problems are hardly specific to Python.
>
> >
> > Regards,
> > jmf
> >
>
> Now two or three times you've said you're going but have come back.  If
> you come again could you please provide examples and/or evidence of what
> you're on about, because you still have me baffled.
>
> --
> Cheers.
>
> Mark Lawrence.

Yesterday, I went to bed.
More seriously.

I cannot give you more numbers than those I gave.
As an end user, I noticed, and confirmed by experiment, that my random tests
are always slower in Py3.3 than in Py3.2 on my Windows platform.

It is up to you, the core developers, to give an explanation
of this behaviour.

As I understand the coding of characters a little,
I pointed out that this is most probably due to this flexible
string representation (with arguments appearing randomly
in the various messages, mainly latin-1).
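The timings referred to here are not shown in the thread. What follows is only a sketch of how such a comparison could be set up with `timeit`. It times the same `replace` call on an ASCII string twice: once when the result stays at one byte per character, and once when the replacement introduces the euro sign, which forces a two-byte result under PEP 393. The string length and repetition count are arbitrary assumptions, and results will vary by machine and Python version.

```python
import timeit

base = "a" * 10_000 + "x"

# Replacement keeps the result at one byte per character.
narrow = timeit.timeit(lambda: base.replace("x", "y"), number=2_000)
# Replacement introduces U+20AC, so the result needs two bytes per character.
widened = timeit.timeit(lambda: base.replace("x", "\u20ac"), number=2_000)

print(f"1-byte result: {narrow:.4f} s")
print(f"2-byte result: {widened:.4f} s")
```

Running the same script under each interpreter being compared, several times, gives a fairer picture than a single run.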

I cannot do more.

(I stupidly spoke about factors of 0.1 to ...; you should
of course read 1.1 to ...)

jmf


