Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #51170

Re: RE Module Performance

From Terry Reedy <tjreedy@udel.edu>
Subject Re: RE Module Performance
Date 2013-07-24 18:09 -0400
References (7 earlier) <CA+vVgJVxAHEk_Z4uzZFNYtmEFXmyyoo0Jg_rBeqHSDBuF0E1Ug@mail.gmail.com> <CAPTjJmqBs0PfQ5ahfURtSyCimPnFU-1HpFndO8h6T+Vbkj10Bg@mail.gmail.com> <51EFEC17.90303@gmail.com> <ksp493$m5g$1@ger.gmane.org> <CAPTjJmoCsg0GX0ntGowbzm-gODCA8ttnf3QFirbsWa+mu0PCpw@mail.gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.5067.1374703769.3114.python-list@python.org> (permalink)

Show all headers | View raw


On 7/24/2013 2:15 PM, Chris Angelico wrote:
> On Thu, Jul 25, 2013 at 3:52 AM, Terry Reedy <tjreedy@udel.edu> wrote:

>> For my purpose, the mock Text works the same in 2.7 and 3.3+.
>
> Thanks for that report! And yes, it's going to behave exactly the same
> way, because its underlying structure is an ordered list of ordered
> lists of Unicode codepoints, ergo 3.3/PEP 393 is merely a question of
> performance. But if you put your code onto a narrow build, you'll have
> issues as seen below.

I carefully said 'For my purpose', which is to replace the tk Text 
widget. Up to 8.5, Tk's text is something like Python's narrow-build 
unicode.

If put astral chars into the toy editor, then yes, it would not work on 
narrow builds, but would on 3.3+.

  ...

 > If nobody had ever thought of doing a multi-format string
> representation, I could well imagine the Python core devs debating
> whether the cost of UTF-32 strings is worth the correctness and
> consistency improvements... and most likely concluding that narrow
> builds get abolished. And if any other language (eg ECMAScript)
> decides to move from UTF-16 to UTF-32, I would wholeheartedly support
> the move, even if it broke code to do so.

Making a UTF-16 implementation correct requires converting abstract 
'character' array indexes to concrete double byte array indexes. The 
simple O(n) method of scanning the string from the beginning for each 
index operation is too slow. When PEP393 was being discussed, I devised 
a much faster way to do the conversion.

The key idea is to add an auxiliary array of the abstract indexes of the 
astral chars in the abstract array. This is easily created when the 
string is created and can be done afterward with one linear scan (which 
is how I experimented with Python code). The length of that array is the 
number of surrogate pairs in the concrete 16-bit codepoint array. 
Subtracting that number from the length of the concrete array gives the 
length of the abstract array.

Given a target index of a character in the abstract array, use the 
auxiliary array to determine k, the number of astral characters that 
precede the target character. That can be done with either a O(k) linear 
scan or O(log k) binary search. Add 2 * k to the abstract index to get 
the corresponding index in the concrete array. When slicing a string 
with i0 and i1, slice the auxiliary array with k0 and k1 and adjusting 
the contained indexes downward to get the corresponding auxiliary array.

> To my mind, exposing UTF-16 surrogates to the application is a bug
 > to be fixed, not a feature to be maintained.

It is definitely not a feature, but a proper UTF-16 implementation would 
not expose them except to codecs, just as with the PEP 393 
implementation. (In both cases, I am excluding the sys size function as 
'exposing to the application'.)

 > But since we can get the best of both worlds with only
> a small amount of overhead, I really don't see why anyone should be
> objecting.

I presume you are referring to the PEP 393 1-2-4 byte implementation. 
Given how well it has been optimized, I think it was the right choice 
for Python. But a language that now uses USC2 or defective UTF-16 on all 
platforms might find the auxiliary array an easier fix.

-- 
Terry Jan Reedy

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-11 19:44 -0400
  Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-12 02:23 -0700
    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:27 +1000
    Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 10:39 +0100
    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:40 +1000
    Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 06:45 -0400
    Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 16:59 +0100
    Re: RE Module Performance Peter Otten <__peter__@web.de> - 2013-07-12 18:15 +0200
    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-13 02:21 +1000
    Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 13:58 -0400
      Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 05:37 +0000
        Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-14 11:17 -0700
          Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 06:06 -0400
            Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-15 12:36 +0000
              Dihedral Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 08:52 -0400
              Re: Dihedral Joel Goldstick <joel.goldstick@gmail.com> - 2013-07-15 09:03 -0400
              Re: Dihedral Wayne Werner <wayne@waynewerner.com> - 2013-07-15 17:43 -0500
              Re: Dihedral Fábio Santos <fabiosantosart@gmail.com> - 2013-07-15 23:54 +0100
              Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-16 08:59 +1000
              Re: Dihedral Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-16 16:06 +1000
              Re: Dihedral Stefan Behnel <stefan_ml@behnel.de> - 2013-07-24 20:08 +0200
              Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:23 +1000
              Re: Dihedral Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-24 20:15 -0400
    Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-13 08:16 +1000
    Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-12 17:13 -0600
      Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-24 06:40 -0700
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-24 23:48 +1000
        Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:17 -0400
        Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:19 -0400
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:34 +1000
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:02 +0000
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:39 +1000
        Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 08:47 -0600
          Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 02:27 -0700
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:14 +1000
              Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 12:07 -0700
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 05:18 +1000
                RE: RE Module Performance "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2013-07-25 19:30 +0000
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:06 -0600
        Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 09:00 -0600
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 05:56 +0000
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:56 +1000
        Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 13:52 -0400
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:15 +1000
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:15 +0000
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:58 +1000
              Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 09:22 +0000
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:07 +1000
        Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 18:09 -0400
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 08:19 +1000
        Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 16:59 -0600
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 09:24 +1000
        Re: RE Module Performance Serhiy Storchaka <storchaka@gmail.com> - 2013-07-25 08:49 +0300
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 15:58 +1000
        Re: RE Module Performance Jeremy Sanders <jeremy@jeremysanders.net> - 2013-07-25 14:36 +0100
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 15:26 +0000
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 01:36 +1000
              Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 17:18 +0000
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 03:27 +1000
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:45 -0500
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-26 02:48 +0000
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 21:20 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:36 -0700
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 08:46 -0700
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 06:28 +0000
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 03:37 +0000
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-26 22:12 -0600
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 05:04 +0000
                Re: RE Module Performance Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-27 12:13 -0400
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:19 -0700
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:09 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:21 -0700
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-26 20:05 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-27 11:21 -0700
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-27 21:53 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 11:13 -0700
                Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:04 +0100
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:30 -0700
                Re: RE Module Performance Lele Gaifax <lele@metapensiero.it> - 2013-07-28 22:45 +0200
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 22:01 +0200
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 07:01 -0700
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 16:38 +0200
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 15:45 +0100
                Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 17:13 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 18:39 +0200
                Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 18:14 +0100
                Re: RE Module Performance Neil Hodgson <nhodgson@iinet.net.au> - 2013-07-31 13:09 +1000
                Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-31 03:27 +1000
                Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-30 18:40 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 20:19 +0200
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 12:09 -0700
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 21:04 +0100
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:54 -0600
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-31 05:45 +0000
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 08:17 +0100
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 13:15 -0700
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 21:41 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:11 +0200
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 01:32 -0700
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:59 +0200
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:44 -0600
                Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-30 17:05 -0400
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:30 -0600
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 09:23 +0200
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:27 -0600
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 10:45 +0200
                FSR and unicode compliance - was Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-28 09:52 -0600
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:23 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:44 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 21:55 +0200
                Re: FSR and unicode compliance - was Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-28 20:52 +0000
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 04:43 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 12:57 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 05:56 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 07:20 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 15:49 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 09:31 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Heiko Wundram <modelnine@modelnine.org> - 2013-07-29 14:06 +0200
                Re: FSR and unicode compliance - was Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-29 08:43 -0400
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 18:03 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 13:36 -0400
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 06:36 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:03 +0100
                Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 19:19 +0100
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:29 +0100
                Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 15:06 -0400
                Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 23:14 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 20:51 +0200
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 00:07 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-26 22:38 +0200
        Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-25 09:44 -0400
        Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:53 -0500
    Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-13 00:16 +0100
    Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-14 05:34 +1000
    Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-16 06:30 -0400
      Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-18 13:17 -0700

csiph-web