


Re: RE Module Performance

Newsgroups comp.lang.python
Date 2013-07-28 12:30 -0700
References (12 earlier) <mailman.5160.1374890711.3114.python-list@python.org> <f4bb2528-930e-4c0a-820e-66f00ac2b5b6@googlegroups.com> <mailman.5188.1374983652.3114.python-list@python.org> <4117e08f-941a-42d5-87b6-09e66f8c7b60@googlegroups.com> <mailman.5200.1375038295.3114.python-list@python.org>
Message-ID <95b91473-b707-4288-860c-d02fda7af1ea@googlegroups.com> (permalink)
Subject Re: RE Module Performance
From wxjmfauth@gmail.com



On Sunday, 28 July 2013 21:04:56 UTC+2, MRAB wrote:
> On 28/07/2013 19:13, wxjmfauth@gmail.com wrote:
> > On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote:
> >> On Sat, Jul 27, 2013 at 12:21 PM,  <wxjmfauth@gmail.com> wrote:
> >> > Back to utf. utfs are not only elements of a unique set of encoded
> >> > code points. They have an interesting feature. Each "utf chunk"
> >> > holds intrinsically the character (in fact the code point) it is
> >> > supposed to represent. In utf-32, the obvious case, it is just
> >> > the code point. In utf-8, it's the first chunk that helps, and
> >> > utf-16 is a mixed case (utf-8 / utf-32). In other words, in an
> >> > implementation using bytes, for any pointer position it is always
> >> > possible to find the corresponding encoded code point and from this
> >> > the corresponding character without any "programmed" information. See
> >> > my editor example, how to find the char under the caret? In fact,
> >> > a silly example, how can the caret be positioned or moved, if
> >> > the underlying corresponding encoded code point can not be
> >> > discerned!
> >>
> >> Yes, given a pointer location into a utf-8 or utf-16 string, it is
> >> easy to determine the identity of the code point at that location.
> >> But this is not often a useful operation, save for resynchronization
> >> in the case that the string data is corrupted.  The caret of an editor
> >> does not conceptually correspond to a pointer location, but to a
> >> character index.  Given a particular character index (e.g. 127504), an
> >> editor must be able to determine the identity and/or the memory
> >> location of the character at that index, and for UTF-8 and UTF-16
> >> without an auxiliary data structure that is an O(n) operation.
> >>
> >> > 2) Take a look at this. Get rid of the overhead.
> >> >
> >> > >>> sys.getsizeof('b'*1000000 + 'c')
> >> > 1000026
> >> > >>> sys.getsizeof('b'*1000000 + '€')
> >> > 2000040
> >> >
> >> > What does it mean? It means that Python has to
> >> > reencode a str every time it is necessary because
> >> > it works with multiple codings.
> >>
> >> Large strings in practical usage do not need to be resized like this
> >> often.  Python 3.3 has been in production use for months now, and you
> >> still have yet to produce any real-world application code that
> >> demonstrates a performance regression.  If there is no real-world
> >> regression, then there is no problem.
> >>
> >> > 3) Unicode compliance. We know retrospectively, latin-1
> >> > was a bad choice. Unusable for 17 European languages.
> >> > Believe it or not. 20 years of Unicode incubation is not
> >> > long enough to learn it. When discussing once with a French
> >> > Python core dev, one with commit access, he did not know one
> >> > can not use latin-1 for the French language!
> >>
> >> Probably because for many French strings, one can.  As far as I am
> >> aware, the only characters that are missing from Latin-1 are the Euro
> >> sign (an unfortunate victim of history), the ligature œ (I have no
> >> doubt that many users just type oe anyway), and the rare capital Ÿ
> >> (the minuscule version is present in Latin-1).  All French strings
> >> that are fortunate enough to be absent these characters can be
> >> represented in Latin-1 and so will have a 1-byte width in the FSR.
> >
> > ------
> >
> > latin-1? that's not even true.
> >
> > >>> sys.getsizeof('a')
> > 26
> > >>> sys.getsizeof('ü')
> > 38
> > >>> sys.getsizeof('aa')
> > 27
> > >>> sys.getsizeof('aü')
> > 39
> >
>
> >>> sys.getsizeof('aa') - sys.getsizeof('a')
> 1
>
> One byte per codepoint.
>
> >>> sys.getsizeof('üü') - sys.getsizeof('ü')
> 1
>
> Also one byte per codepoint.
>
> >>> sys.getsizeof('ü') - sys.getsizeof('a')
> 12
>
> Clearly there's more going on here.
>
> FSR is an optimisation. You'll always be able to find some
> circumstances where an optimisation makes things worse, but what
> matters is the overall result.
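
The size differences computed in the quoted session can be reproduced with a small sketch. It assumes CPython 3.3 or later, where strings use the flexible string representation (PEP 393). The helper names `bytes_per_code_point` and `fixed_overhead` are illustrative only and not part of any library. Subtracting the sizes of two strings that differ by one character cancels the fixed part, leaving only the per-code-point cost.

```python
import sys


def bytes_per_code_point(ch, n=1000):
    """Estimate the storage used per code point in a string made only of `ch`.

    The two strings differ by exactly one character, so the fixed part
    of the object (header and terminator) cancels out in the subtraction.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


def fixed_overhead(ch):
    """Size of a one-character string minus the cost of its one code point.

    What remains is the fixed part: object header plus terminator.
    """
    return sys.getsizeof(ch) - bytes_per_code_point(ch)


if __name__ == "__main__":
    # ASCII, Latin-1, BMP and non-BMP samples (chosen for illustration).
    for ch in ("a", "\u00fc", "\u20ac", "\U0001F600"):
        print(ascii(ch), bytes_per_code_point(ch), fixed_overhead(ch))
```

On CPython 3.3+ the per-code-point cost should come out as 1 for 'a' and 'ü', 2 for '€' and 4 for a character outside the BMP, while the fixed part is larger for non-ASCII strings than for pure ASCII ones. That larger fixed part, not the character data, accounts for the jump between 'a' and 'ü'. Absolute sizes vary between Python versions and platforms.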

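Ian's distinction between a pointer location and a character index, quoted above, can be made concrete with a minimal sketch over raw UTF-8 bytes. The function names `utf8_resync` and `utf8_offset_of_index` are hypothetical and chosen for illustration. Finding the start of the code point under an arbitrary byte position needs only a short backward step, whereas finding the byte offset of the N-th code point requires scanning from the start when no auxiliary index exists.

```python
def utf8_resync(data: bytes, pos: int) -> int:
    """Return the offset of the lead byte of the code point covering `pos`.

    Continuation bytes have the bit pattern 10xxxxxx, so stepping back
    over them (at most three steps in valid UTF-8) finds the lead byte.
    """
    while pos > 0 and data[pos] & 0xC0 == 0x80:
        pos -= 1
    return pos


def utf8_offset_of_index(data: bytes, index: int) -> int:
    """Return the byte offset at which code point number `index` starts.

    Without an auxiliary structure this is a linear scan: every lead
    byte before the target has to be counted.
    """
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    seen = 0
    for offset, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # lead byte (or ASCII), not a continuation
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


if __name__ == "__main__":
    # Code points of 1, 3, 2, 4 and 1 bytes respectively.
    data = "a\u20ac\u00fc\U0001F600z".encode("utf-8")
    print(utf8_resync(data, 2))           # inside the euro sign
    print(utf8_offset_of_index(data, 3))  # start of the fourth code point
```

A fixed-width representation replaces the scan in `utf8_offset_of_index` with a single multiplication, which is the trade-off under discussion in this thread.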

----

Yes, I know my examples are always wrong, never
real examples.

I point to long strings; I should point to short strings.
I point to a short string (a char); it is not long enough.
Strings as dict keys? No, the problem is in the Python dict.
Performance? No, that's a memory issue.
Memory? No, it's a question of keeping performance.
I am using this char? No, you should not, it's not common.
The nabla operator in a TeX file? Who is so stupid as to use
that char?
Many times, I'm just mimicking 'BDFL' examples, just
by replacing "his" ASCII chars with non-ASCII chars ;-)
And so on.

To be short, it is *never* the FSR, always something
else.

Suggestion: start by solving all these "micro-benchmarks",
all the memory cases. It's a good start, no?


jmf



Thread

RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-11 19:44 -0400
  Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-12 02:23 -0700
    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:27 +1000
    Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 10:39 +0100
    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:40 +1000
    Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 06:45 -0400
    Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 16:59 +0100
    Re: RE Module Performance Peter Otten <__peter__@web.de> - 2013-07-12 18:15 +0200
    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-13 02:21 +1000
    Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 13:58 -0400
      Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 05:37 +0000
        Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-14 11:17 -0700
          Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 06:06 -0400
            Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-15 12:36 +0000
              Dihedral Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 08:52 -0400
              Re: Dihedral Joel Goldstick <joel.goldstick@gmail.com> - 2013-07-15 09:03 -0400
              Re: Dihedral Wayne Werner <wayne@waynewerner.com> - 2013-07-15 17:43 -0500
              Re: Dihedral Fábio Santos <fabiosantosart@gmail.com> - 2013-07-15 23:54 +0100
              Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-16 08:59 +1000
              Re: Dihedral Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-16 16:06 +1000
              Re: Dihedral Stefan Behnel <stefan_ml@behnel.de> - 2013-07-24 20:08 +0200
              Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:23 +1000
              Re: Dihedral Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-24 20:15 -0400
    Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-13 08:16 +1000
    Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-12 17:13 -0600
      Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-24 06:40 -0700
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-24 23:48 +1000
        Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:17 -0400
        Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:19 -0400
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:34 +1000
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:02 +0000
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:39 +1000
        Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 08:47 -0600
          Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 02:27 -0700
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:14 +1000
              Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 12:07 -0700
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 05:18 +1000
                RE: RE Module Performance "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2013-07-25 19:30 +0000
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:06 -0600
        Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 09:00 -0600
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 05:56 +0000
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:56 +1000
        Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 13:52 -0400
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:15 +1000
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:15 +0000
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:58 +1000
              Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 09:22 +0000
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:07 +1000
        Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 18:09 -0400
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 08:19 +1000
        Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 16:59 -0600
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 09:24 +1000
        Re: RE Module Performance Serhiy Storchaka <storchaka@gmail.com> - 2013-07-25 08:49 +0300
        Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 15:58 +1000
        Re: RE Module Performance Jeremy Sanders <jeremy@jeremysanders.net> - 2013-07-25 14:36 +0100
          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 15:26 +0000
            Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 01:36 +1000
              Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 17:18 +0000
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 03:27 +1000
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:45 -0500
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-26 02:48 +0000
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 21:20 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:36 -0700
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 08:46 -0700
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 06:28 +0000
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 03:37 +0000
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-26 22:12 -0600
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 05:04 +0000
                Re: RE Module Performance Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-27 12:13 -0400
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:19 -0700
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:09 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:21 -0700
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-26 20:05 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-27 11:21 -0700
                Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-27 21:53 -0600
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 11:13 -0700
                Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:04 +0100
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:30 -0700
                Re: RE Module Performance Lele Gaifax <lele@metapensiero.it> - 2013-07-28 22:45 +0200
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 22:01 +0200
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 07:01 -0700
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 16:38 +0200
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 15:45 +0100
                Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 17:13 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 18:39 +0200
                Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 18:14 +0100
                Re: RE Module Performance Neil Hodgson <nhodgson@iinet.net.au> - 2013-07-31 13:09 +1000
                Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-31 03:27 +1000
                Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-30 18:40 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 20:19 +0200
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 12:09 -0700
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 21:04 +0100
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:54 -0600
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-31 05:45 +0000
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 08:17 +0100
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 13:15 -0700
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 21:41 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:11 +0200
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 01:32 -0700
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:59 +0200
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:44 -0600
                Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-30 17:05 -0400
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:30 -0600
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 09:23 +0200
                Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:27 -0600
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 10:45 +0200
                FSR and unicode compliance - was Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-28 09:52 -0600
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:23 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:44 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 21:55 +0200
                Re: FSR and unicode compliance - was Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-28 20:52 +0000
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 04:43 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 12:57 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 05:56 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 07:20 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 15:49 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 09:31 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Heiko Wundram <modelnine@modelnine.org> - 2013-07-29 14:06 +0200
                Re: FSR and unicode compliance - was Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-29 08:43 -0400
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 18:03 +0100
                Re: FSR and unicode compliance - was Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 13:36 -0400
                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 06:36 -0700
                Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:03 +0100
                Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 19:19 +0100
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:29 +0100
                Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 15:06 -0400
                Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 23:14 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 20:51 +0200
                Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 00:07 +0100
                Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-26 22:38 +0200
        Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-25 09:44 -0400
        Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:53 -0500
    Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-13 00:16 +0100
    Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-14 05:34 +1000
    Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-16 06:30 -0400
      Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-18 13:17 -0700
