
RE Module Performance

Started by	Devyn Collier Johnson <devyncjohnson@gmail.com>
First post	2013-07-11 19:44 -0400
Last post	2013-07-18 13:17 -0700
Articles	16 on this page of 136 — 25 participants


  RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-11 19:44 -0400
    Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-12 02:23 -0700
      Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:27 +1000
      Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 10:39 +0100
      Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-12 19:40 +1000
      Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 06:45 -0400
      Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-12 16:59 +0100
      Re: RE Module Performance Peter Otten <__peter__@web.de> - 2013-07-12 18:15 +0200
      Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-13 02:21 +1000
      Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-12 13:58 -0400
        Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 05:37 +0000
          Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-14 11:17 -0700
            Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 06:06 -0400
              Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-15 12:36 +0000
                Dihedral Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-15 08:52 -0400
                Re: Dihedral Joel Goldstick <joel.goldstick@gmail.com> - 2013-07-15 09:03 -0400
                Re: Dihedral Wayne Werner <wayne@waynewerner.com> - 2013-07-15 17:43 -0500
                Re: Dihedral Fábio Santos <fabiosantosart@gmail.com> - 2013-07-15 23:54 +0100
                Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-16 08:59 +1000
                Re: Dihedral Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-16 16:06 +1000
                Re: Dihedral Stefan Behnel <stefan_ml@behnel.de> - 2013-07-24 20:08 +0200
                Re: Dihedral Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:23 +1000
                Re: Dihedral Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-24 20:15 -0400
      Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-13 08:16 +1000
      Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-12 17:13 -0600
        Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-24 06:40 -0700
          Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-24 23:48 +1000
          Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:17 -0400
          Re: RE Module Performance David Hutto <dwightdhutto@gmail.com> - 2013-07-24 10:19 -0400
          Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:34 +1000
            Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:02 +0000
              Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:39 +1000
          Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 08:47 -0600
            Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 02:27 -0700
              Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:14 +1000
                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-25 12:07 -0700
                  Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 05:18 +1000
                  RE: RE Module Performance "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2013-07-25 19:30 +0000
                  Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:06 -0600
          Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 09:00 -0600
            Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 05:56 +0000
          Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 00:56 +1000
          Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 13:52 -0400
          Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 04:15 +1000
            Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 07:15 +0000
              Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 17:58 +1000
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 09:22 +0000
                  Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 20:07 +1000
          Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-24 18:09 -0400
          Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 08:19 +1000
          Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-24 16:59 -0600
          Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 09:24 +1000
          Re: RE Module Performance Serhiy Storchaka <storchaka@gmail.com> - 2013-07-25 08:49 +0300
          Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-25 15:58 +1000
          Re: RE Module Performance Jeremy Sanders <jeremy@jeremysanders.net> - 2013-07-25 14:36 +0100
            Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 15:26 +0000
              Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 01:36 +1000
                Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-25 17:18 +0000
                  Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-26 03:27 +1000
                  Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:45 -0500
                    Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-26 02:48 +0000
                      Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 21:20 -0600
                        Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:36 -0700
                        Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 08:46 -0700
                          Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 06:28 +0000
                        Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 03:37 +0000
                          Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-26 22:12 -0600
                            Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-27 05:04 +0000
                          Re: RE Module Performance Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-07-27 12:13 -0400
                    Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:19 -0700
                  Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-25 21:09 -0600
                    Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-26 06:21 -0700
                      Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-26 20:05 -0600
                        Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-27 11:21 -0700
                          Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-27 21:53 -0600
                            Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 11:13 -0700
                              Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:04 +0100
                                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:30 -0700
                                  Re: RE Module Performance Lele Gaifax <lele@metapensiero.it> - 2013-07-28 22:45 +0200
                                  Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 22:01 +0200
                            Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 07:01 -0700
                              Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 16:38 +0200
                              Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 15:45 +0100
                              Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 17:13 +0100
                              Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 18:39 +0200
                              Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-30 18:14 +0100
                                Re: RE Module Performance Neil Hodgson <nhodgson@iinet.net.au> - 2013-07-31 13:09 +1000
                              Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-31 03:27 +1000
                              Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-30 18:40 +0100
                              Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-30 20:19 +0200
                                Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-30 12:09 -0700
                                  Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-30 21:04 +0100
                                  Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:54 -0600
                                  Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-31 05:45 +0000
                                    Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 08:17 +0100
                                    Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 13:15 -0700
                                      Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-31 21:41 +0100
                                  Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:11 +0200
                                    Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-31 01:32 -0700
                                      Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 10:59 +0200
                                      Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:44 -0600
                              Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-30 17:05 -0400
                              Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-30 21:30 -0600
                              Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-31 09:23 +0200
                              Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-31 08:27 -0600
                          Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 10:45 +0200
                          FSR and unicode compliance - was Re: RE Module Performance Michael Torrie <torriem@gmail.com> - 2013-07-28 09:52 -0600
                            Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-28 12:23 -0700
                              Re: FSR and unicode compliance - was Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-28 20:44 +0100
                              Re: FSR and unicode compliance - was Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 21:55 +0200
                              Re: FSR and unicode compliance - was Re: RE Module Performance Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-28 20:52 +0000
                                Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 04:43 -0700
                                  Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 12:57 +0100
                                    Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 05:56 -0700
                                    Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 07:20 -0700
                                      Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 15:49 +0100
                                        Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 09:31 -0700
                                  Re: FSR and unicode compliance - was Re: RE Module Performance Heiko Wundram <modelnine@modelnine.org> - 2013-07-29 14:06 +0200
                                  Re: FSR and unicode compliance - was Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-29 08:43 -0400
                          Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 18:03 +0100
                          Re: FSR and unicode compliance - was Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 13:36 -0400
                            Re: FSR and unicode compliance - was Re: RE Module Performance wxjmfauth@gmail.com - 2013-07-29 06:36 -0700
                          Re: FSR and unicode compliance - was Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:03 +0100
                          Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 19:19 +0100
                          Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-28 19:29 +0100
                          Re: RE Module Performance Terry Reedy <tjreedy@udel.edu> - 2013-07-28 15:06 -0400
                          Re: RE Module Performance Joshua Landau <joshua@landau.ws> - 2013-07-28 23:14 +0100
                          Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-28 20:51 +0200
                          Re: RE Module Performance Chris Angelico <rosuav@gmail.com> - 2013-07-29 00:07 +0100
                      Re: RE Module Performance Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2013-07-26 22:38 +0200
          Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-25 09:44 -0400
          Re: RE Module Performance Ian Kelly <ian.g.kelly@gmail.com> - 2013-07-25 15:53 -0500
      Re: RE Module Performance MRAB <python@mrabarnett.plus.com> - 2013-07-13 00:16 +0100
      Re: RE Module Performance Tim Delaney <timothy.c.delaney@gmail.com> - 2013-07-14 05:34 +1000
      Re: RE Module Performance Devyn Collier Johnson <devyncjohnson@gmail.com> - 2013-07-16 06:30 -0400
        Re: RE Module Performance 88888 Dihedral <dihedral88888@gmail.com> - 2013-07-18 13:17 -0700


#51389 — Re: FSR and unicode compliance - was Re: RE Module Performance

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-07-28 13:36 -0400
Subject	Re: FSR and unicode compliance - was Re: RE Module Performance
Message-ID	<mailman.5193.1375032989.3114.python-list@python.org>
In reply to	#51340

On 7/28/2013 11:52 AM, Michael Torrie wrote:
>
> 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
> slicing a string would be very very slow,

Not necessarily so. See below.

> and that's unacceptable for
> the use cases of python strings.  I'm assuming you understand big O
> notation, as you talk of experience in many languages over the years.
> FSR and UTF-32 both are O(1) for slicing and lookups.

Slicing is at least O(m) where m is the length of the slice.

> UTF-8, 16 and any variable-width encoding are always O(n).

I posted about a week ago, in response to Chris A., a method by which 
lookup for UTF-16 can be made O(log2 k), or perhaps more accurately, 
O(1+log2(k+1)), where k is the number of non-BMP chars in the string.

This uses an auxiliary array of k ints. An auxiliary array of n ints
would make UTF-16 lookup O(1), but then one is using more space than
with UTF-32. Similar comments apply to UTF-8.
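
For readers following along, here is a minimal sketch of the kind of auxiliary-array lookup described above. It is not the code from the earlier post; the class name IndexedUTF16 and its attributes are hypothetical, and it assumes the input contains no lone surrogates. The auxiliary list holds the character indexes of the k non-BMP characters, and a binary search over it gives the code-unit offset for any character index.

```python
import bisect
import struct


class IndexedUTF16:
    """Hypothetical wrapper: UTF-16 code units plus an auxiliary sorted
    list of the character indexes that hold non-BMP characters."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        # One 16-bit unit per BMP char, two (a surrogate pair) per non-BMP char.
        self.units = struct.unpack("<%dH" % (len(data) // 2), data)
        # The auxiliary array of k ints.
        self.astral = [i for i, ch in enumerate(text) if ord(ch) > 0xFFFF]
        self.length = len(text)

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Every non-BMP char before index i shifts the offset by one unit.
        # bisect_left is a binary search, so this step is O(log k).
        offset = i + bisect.bisect_left(self.astral, i)
        unit = self.units[offset]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: combine it with the low surrogate that follows.
            low = self.units[offset + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)


s = "a\U0001F600b\U00010000c"
w = IndexedUTF16(s)
print([w[i] == s[i] for i in range(len(s))])
```

With k == 0 the bisect call returns immediately, so a pure-BMP string pays almost nothing for the extra list; the cost grows only with the number of non-BMP characters.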

The Unicode Standard says that a single string should use exactly one
coding scheme. It does *not* say that all strings in an application must 
use the same scheme. I just rechecked a few days ago. It also does not 
say that an application cannot associate additional data with a string 
to make processing of the string easier.

-- 
Terry Jan Reedy


#51445 — Re: FSR and unicode compliance - was Re: RE Module Performance

From	wxjmfauth@gmail.com
Date	2013-07-29 06:36 -0700
Subject	Re: FSR and unicode compliance - was Re: RE Module Performance
Message-ID	<11a47ae7-27af-46dd-ad32-fa851ead45d5@googlegroups.com>
In reply to	#51389

On Sunday, 28 July 2013 19:36:00 UTC+2, Terry Reedy wrote:
> On 7/28/2013 11:52 AM, Michael Torrie wrote:
> >
> > 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
> > slicing a string would be very very slow,
>
> Not necessarily so. See below.
>
> > and that's unacceptable for
> > the use cases of python strings.  I'm assuming you understand big O
> > notation, as you talk of experience in many languages over the years.
> > FSR and UTF-32 both are O(1) for slicing and lookups.
>
> Slicing is at least O(m) where m is the length of the slice.
>
> > UTF-8, 16 and any variable-width encoding are always O(n).
>
> I posted about a week ago, in response to Chris A., a method by which
> lookup for UTF-16 can be made O(log2 k), or perhaps more accurately,
> O(1+log2(k+1)), where k is the number of non-BMP chars in the string.
>
> This uses an auxiliary array of k ints. An auxiliary array of n ints
> would make UTF-16 lookup O(1), but then one is using more space than
> with UTF-32. Similar comments apply to UTF-8.
>
> The Unicode Standard says that a single string should use exactly one
> coding scheme. It does *not* say that all strings in an application must
> use the same scheme. I just rechecked a few days ago. It also does not
> say that an application cannot associate additional data with a string
> to make processing of the string easier.
>
> --
> Terry Jan Reedy

To my knowledge, the Unicode documentation always speaks about
the various UTF-* coding schemes in an "exclusive or" way.

Having multiple encoded strings is one thing. Manipulating
multiple encoded strings is something else.

Maybe the mistake was not to emphasize the fact that
one has to work with a unique set of encoded code points
(utf-8 or utf-16 or utf-32), because it was considered
too obvious that one cannot work properly with multiple
coding schemes.

You are also right in saying that the doc does not say an
application cannot "associate additional data" with a string.
The doc does not specify it either. It is superfluous.


jmf


#51391 — Re: FSR and unicode compliance - was Re: RE Module Performance

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-28 19:03 +0100
Subject	Re: FSR and unicode compliance - was Re: RE Module Performance
Message-ID	<mailman.5195.1375034638.3114.python-list@python.org>
In reply to	#51340

On Sun, Jul 28, 2013 at 6:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:
> I posted about a week ago, in response to Chris A., a method by which lookup
> for UTF-16 can be made O(log2 k), or perhaps more accurately,
> O(1+log2(k+1)), where k is the number of non-BMP chars in the string.
>

Which is an optimization choice that favours strings containing very
few non-BMP characters. To justify the extra complexity of out-of-band
storage, you would need to be working with almost exclusively the BMP.
That would drastically improve jmf's microbenchmarks which do exactly
that, but it would penalize strings that are almost exclusively
higher-codepoint characters. Its quality, then, would be based on a
major survey of string usage: are there enough strings with
mostly-BMP-but-a-few-SMP? Bearing in mind that pure BMP is handled
better by PEP 393, so this is only of value when there are actually
those mixed strings.
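
To make the PEP 393 trade-off concrete, here is a rough sketch that measures per-character storage on CPython 3.3 or later. The helper name bytes_per_char is made up for illustration, sys.getsizeof reports implementation details rather than anything the language guarantees, and other Python implementations may behave differently.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Approximate storage cost per character for a string made only of ch."""
    # Subtracting the size of the 1-char string cancels the fixed header.
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) / n


for label, ch in [("ASCII", "a"), ("BMP", "\u20ac"), ("non-BMP", "\U0001F600")]:
    print(label, bytes_per_char(ch))

# A single non-BMP character widens every character in the string.
narrow = "a" * 1000
mixed = "a" * 999 + "\U0001F600"
print(sys.getsizeof(narrow), sys.getsizeof(mixed))
```

The last two numbers are the case under discussion: a mostly-narrow string with one SMP character pays the wide price for every character, which is where an out-of-band index could in principle help.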

ChrisA


#51393

From	Joshua Landau <joshua@landau.ws>
Date	2013-07-28 19:19 +0100
Message-ID	<mailman.5196.1375035603.3114.python-list@python.org>
In reply to	#51340


On 28 July 2013 09:45, Antoon Pardon <antoon.pardon@rece.vub.ac.be> wrote:

> On 27-07-13 20:21, wxjmfauth@gmail.com wrote:
>
>> utf-8 or any (utf) never need and never spend their time
>> in reencoding.
>>
>
> So? That python sometimes needs to do some kind of background
> processing is not a problem, whether it is garbage collection,
> allocating more memory, shufling around data blocks or reencoding a
> string, that doesn't matter. If you've got a real world example where
> one of those things noticeably slows your program down or makes the
> program behave faulty then you have something that is worthy of
> attention.


Somewhat off topic, but befitting of the triviality of this thread, do I
understand correctly that you are saying garbage collection never causes
any noticeable slowdown in real-world circumstances? That's not remotely
true.


#51394

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-28 19:29 +0100
Message-ID	<mailman.5197.1375036203.3114.python-list@python.org>
In reply to	#51340

On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau <joshua@landau.ws> wrote:
> On 28 July 2013 09:45, Antoon Pardon <antoon.pardon@rece.vub.ac.be> wrote:
>>
>> On 27-07-13 20:21, wxjmfauth@gmail.com wrote:
>>>
>>> utf-8 or any (utf) never need and never spend their time
>>> in reencoding.
>>
>>
>> So? That python sometimes needs to do some kind of background
>> processing is not a problem, whether it is garbage collection,
>> allocating more memory, shufling around data blocks or reencoding a
>> string, that doesn't matter. If you've got a real world example where
>> one of those things noticeably slows your program down or makes the
>> program behave faulty then you have something that is worthy of
>> attention.
>
>
> Somewhat off topic, but befitting of the triviality of this thread, do I
> understand correctly that you are saying garbage collection never causes any
> noticeable slowdown in real-world circumstances? That's not remotely true.

If it's done properly, garbage collection shouldn't hurt the *overall*
performance of the app; most of the issues with GC timing are when one
operation gets unexpectedly delayed for a GC run (making performance
measurement hard, and such). It should certainly never cause your
program to behave faultily, though I have seen cases where the GC run
appears to cause the program to crash - something like this:

some_string = buggy_call()
...
gc()
...
print(some_string)

The buggy call mucked up the reference count, so the gc run actually
wiped the string from memory - resulting in a segfault on next usage.
But the GC wasn't at fault, the original call was. (Which, btw, was
quite a debugging search, especially since the function in question
wasn't my code.)

ChrisA


#51398

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-07-28 15:06 -0400
Message-ID	<mailman.5201.1375038414.3114.python-list@python.org>
In reply to	#51340

On 7/28/2013 2:29 PM, Chris Angelico wrote:
> On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau <joshua@landau.ws> wrote:

>> Somewhat off topic, but befitting of the triviality of this thread, do I
>> understand correctly that you are saying garbage collection never causes any
>> noticeable slowdown in real-world circumstances? That's not remotely true.
>
> If it's done properly, garbage collection shouldn't hurt the *overall*
> performance of the app;

There are situations, some discussed on this list, where doing gc 
'right' means turning off the cycle garbage collector. As I remember, an 
example is creating a list of a million tuples, which otherwise triggers 
a lot of useless background bookkeeping. The cyclic gc is tuned for 
'normal' use patterns.
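
A small sketch of that pattern, for anyone who wants to try it. The function names build and timed_build are invented for this example, and whether disabling the collector helps at all depends on the Python version and the workload, so measure before relying on it.

```python
import gc
import time


def build(n):
    # A list of n small tuples of ints; nothing here can form a cycle.
    return [(i, i + 1) for i in range(n)]


def timed_build(n, use_cyclic_gc):
    """Build the list with the cyclic collector on or off; return (data, seconds)."""
    was_enabled = gc.isenabled()
    if use_cyclic_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        data = build(n)
        elapsed = time.perf_counter() - start
    finally:
        # Restore whatever state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return data, elapsed


_, with_gc = timed_build(1000000, use_cyclic_gc=True)
_, without_gc = timed_build(1000000, use_cyclic_gc=False)
print("cyclic gc on: %.3fs, off: %.3fs" % (with_gc, without_gc))
```

Reference counting still frees the tuples either way; gc.disable() only switches off the cycle detector.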

-- 
Terry Jan Reedy


#51410

From	Joshua Landau <joshua@landau.ws>
Date	2013-07-28 23:14 +0100
Message-ID	<mailman.5206.1375049741.3114.python-list@python.org>
In reply to	#51340


On 28 July 2013 19:29, Chris Angelico <rosuav@gmail.com> wrote:

> On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau <joshua@landau.ws> wrote:
> > On 28 July 2013 09:45, Antoon Pardon <antoon.pardon@rece.vub.ac.be>
> wrote:
> >>
> >> On 27-07-13 20:21, wxjmfauth@gmail.com wrote:
> >>>
> >>> utf-8 or any (utf) never need and never spend their time
> >>> in reencoding.
> >>
> >>
> >> So? That python sometimes needs to do some kind of background
> >> processing is not a problem, whether it is garbage collection,
> >> allocating more memory, shufling around data blocks or reencoding a
> >> string, that doesn't matter. If you've got a real world example where
> >> one of those things noticeably slows your program down or makes the
> >> program behave faulty then you have something that is worthy of
> >> attention.
> >
> >
> > Somewhat off topic, but befitting of the triviality of this thread, do I
> > understand correctly that you are saying garbage collection never causes
> any
> > noticeable slowdown in real-world circumstances? That's not remotely
> true.
>
> If it's done properly, garbage collection shouldn't hurt the *overall*
> performance of the app; most of the issues with GC timing are when one
> operation gets unexpectedly delayed for a GC run (making performance
> measurement hard, and such). It should certainly never cause your
> program to behave faultily, though I have seen cases where the GC run
> appears to cause the program to crash - something like this:
>
> some_string = buggy_call()
> ...
> gc()
> ...
> print(some_string)
>
> The buggy call mucked up the reference count, so the gc run actually
> wiped the string from memory - resulting in a segfault on next usage.
> But the GC wasn't at fault, the original call was. (Which, btw, was
> quite a debugging search, especially since the function in question
> wasn't my code.)
>

GC does sometimes have a severe impact in memory-constrained environments,
though. See http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/,
about half-way down, specifically
http://sealedabstract.com/wp-content/uploads/2013/05/Screen-Shot-2013-05-14-at-10.15.29-PM.png.

The best verification of these graphs I could find was
https://blog.mozilla.org/nnethercote/category/garbage-collection/, although
the effect is not immediately clear in Chrome's and Opera's case, mainly
because none of the benchmarks pushes memory usage significantly.

I also don't quite agree with the first post (sealedabstract) because I get
by *fine* on 2GB memory, so I don't see why you can't on a phone. Maybe iOS
is just really heavy. Nonetheless, the benchmarks aren't lying.


#51422

From	Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date	2013-07-28 20:51 +0200
Message-ID	<mailman.5215.1375082821.3114.python-list@python.org>
In reply to	#51340

On 28-07-13 20:19, Joshua Landau wrote:
> On 28 July 2013 09:45, Antoon Pardon <antoon.pardon@rece.vub.ac.be> wrote:
>
>     On 27-07-13 20:21, wxjmfauth@gmail.com wrote:
>
>         utf-8 or any (utf) never need and never spend their time
>         in reencoding.
>
>
>     So? That python sometimes needs to do some kind of background
>     processing is not a problem, whether it is garbage collection,
>     allocating more memory, shufling around data blocks or reencoding a
>     string, that doesn't matter. If you've got a real world example where
>     one of those things noticeably slows your program down or makes the
>     program behave faulty then you have something that is worthy of
>     attention.
>
>
> Somewhat off topic, but befitting of the triviality of this thread, do I
> understand correctly that you are saying garbage collection never causes
> any noticeable slowdown in real-world circumstances? That's not remotely
> true.

No, that is not what I am saying. But if jmf were complaining about
garbage collection in a way analogous to how he complains about the FSR,
he wouldn't be complaining about real-world circumstances but about
theoretical possibilities and microbenchmarks. In those circumstances
the "garbage collection problem" wouldn't be worthy of much attention.

-- 
Antoon Pardon


#51425

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-29 00:07 +0100
Message-ID	<mailman.5217.1375082822.3114.python-list@python.org>
In reply to	#51340

On Sun, Jul 28, 2013 at 11:14 PM, Joshua Landau <joshua@landau.ws> wrote:
> GC does have sometimes severe impact in memory-constrained environments,
> though. See http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/,
> about half-way down, specifically
> http://sealedabstract.com/wp-content/uploads/2013/05/Screen-Shot-2013-05-14-at-10.15.29-PM.png.
>
> The best verification of these graphs I could find was
> https://blog.mozilla.org/nnethercote/category/garbage-collection/, although
> it's not immediately clear in Chrome's and Opera's case mainly due to none
> of the benchmarks pushing memory usage significantly.
>
> I also don't quite agree with the first post (sealedabstract) because I get
> by *fine* on 2GB memory, so I don't see why you can't on a phone. Maybe IOS
> is just really heavy. Nonetheless, the benchmarks aren't lying.

The ultimate in non-managed memory (the opposite of a GC) would have
to be the assembly language programming I did in my earlier days,
firing up DEBUG.EXE and writing a .COM file that lived inside a single
64KB segment for everything (256-byte Program Segment Prefix, then
code, then initialized data, then uninitialized data and stack),
crashing the computer with remarkable ease. Everything "higher level"
than that (even malloc/free) has its conveniences and its costs,
usually memory wastage. If you malloc random-sized blocks, free them
at random, and ensure that your total allocated size stays below some
limit, you'll still eventually run yourself out of memory. This is
unsurprising. The only question is, how bad is the wastage and how
much time gets devoted to it?

ChrisA


#51424

From	Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date	2013-07-26 22:38 +0200
Message-ID	<mailman.5212.1375082820.3114.python-list@python.org>
In reply to	#51300

On 26-07-13 15:21, wxjmfauth@gmail.com wrote:
>
> Hint: To understand Unicode (and every coding scheme), you should
> understand "utf". The how and the *why*.

No you don't. You are mixing up the information with how the information
is coded. UTF is like base64, a way of coding the information that is
useful for storage or transfer. But once you have decoded the byte
stream, you no longer need any understanding of base64 to process your
information. Likewise, once you have decoded the byte stream into Unicode
information you don't need knowledge of UTF to process Unicode strings.
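
A quick illustration of that point in Python 3, as a sketch only: the sample text and variable names are arbitrary. The same text arrives over three different wire encodings, and after decoding the resulting str objects are indistinguishable, just as base64-decoded bytes no longer carry any trace of base64.

```python
import base64

text = "na\u00efve caf\u00e9 \U0001F600"

# Three different byte streams on the wire ...
wire = {codec: text.encode(codec) for codec in ("utf-8", "utf-16", "utf-32")}
print({codec: len(data) for codec, data in wire.items()})

# ... but one and the same string after decoding.
decoded = [data.decode(codec) for codec, data in wire.items()]
print(all(d == text for d in decoded))
print(len(text), text.upper())

# The base64 analogy: the payload does not remember how it travelled.
payload = b"\x00\x01binary payload"
print(base64.b64decode(base64.b64encode(payload)) == payload)
```

Processing such as len() or upper() works on code points, so the code that does it never needs to know which encoding the bytes used.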

-- 
Antoon Pardon


#51220

From	Devyn Collier Johnson <devyncjohnson@gmail.com>
Date	2013-07-25 09:44 -0400
Message-ID	<mailman.5097.1374759884.3114.python-list@python.org>
In reply to	#51131

On 07/25/2013 09:36 AM, Jeremy Sanders wrote:
> wxjmfauth@gmail.com wrote:
>
>> Short example. Writing an editor with something like the
>> FSR is simply impossible (properly).
> http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representations
>
> "To conserve memory, Emacs does not hold fixed-length 22-bit numbers that are
> codepoints of text characters within buffers and strings. Rather, Emacs uses a
> variable-length internal representation of characters, that stores each
> character as a sequence of 1 to 5 8-bit bytes, depending on the magnitude of
> its codepoint[1]. For example, any ASCII character takes up only 1 byte, a
> Latin-1 character takes up 2 bytes, etc. We call this representation of text
> multibyte.
>
> ...
>
> [1] This internal representation is based on one of the encodings defined by
> the Unicode Standard, called UTF-8, for representing any Unicode codepoint, but
> Emacs extends UTF-8 to represent the additional codepoints it uses for raw 8-
> bit bytes and characters not unified with Unicode.
>
> "
>
> Jeremy
>
>
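The byte counts in the quoted Emacs manual can be checked from Python. This is only a sketch using Python's standard UTF-8 codec, which stops at 4 bytes per character; it does not reproduce Emacs's extended internal variant (up to 5 bytes). The sample characters and the helper name `utf8_len` are chosen for illustration.

```python
# One sample character per range; the labels are informal.
samples = {
    "ASCII": "a",
    "Latin-1": "\u00e9",        # e with acute accent
    "BMP": "\u20ac",            # euro sign
    "astral": "\U0001F600",     # outside the Basic Multilingual Plane
}


def utf8_len(ch):
    """Number of bytes the standard UTF-8 encoding uses for one character."""
    return len(ch.encode("utf-8"))


for label, ch in samples.items():
    print(f"{label:8} U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")
```
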
Wow! The thread that I started has changed a lot and lived a long time. 
I look forward to its first birthday (^u^).

Devyn Collier Johnson


#51259

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-07-25 15:53 -0500
Message-ID	<mailman.5120.1374785634.3114.python-list@python.org>
In reply to	#51131

On Wed, Jul 24, 2013 at 9:34 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Thu, Jul 25, 2013 at 12:17 AM, David Hutto <dwightdhutto@gmail.com> wrote:
>> I've screwed up plenty of times in python, but can write code like a pro
>> when I'm feeling better(on SSI and medicaid). An editor can be built simply,
>> but it's preference that makes the difference. Some might have used tkinter,
>> gtk. wxpython or other methods for the task.
>>
>> I think the main issue in responding is your library preference, or widget
>> set preference. These can make you right with some in your response, or
>> wrong with others that have a preferable gui library that coincides with
>> one's personal cognitive structure that makes t
>
> jmf's point is more about writing the editor widget (Scintilla, as
> opposed to SciTE), which most people will never bother to do. I've
> written several text editors, always by embedding someone else's
> widget, and therefore not concerning myself with its internal string
> representation. Frankly, Python's strings are a *terrible* internal
> representation for an editor widget - not because of PEP 393, but
> simply because they are immutable, and every keypress would result in
> a rebuilding of the string. On the flip side, I could quite plausibly
> imagine using a list of strings; whenever text gets inserted, the
> string gets split at that point, and a new string created for the
> insert (which also means that an Undo operation simply removes one
> entire string). In this usage, the FSR is beneficial, as it's possible
> to have different strings at different widths.
>
> But mainly, I'm just wondering how many people here have any basis
> from which to argue the point he's trying to make. I doubt most of us
> have (a) implemented an editor widget, or (b) tested multiple
> different internal representations to learn the true pros and cons of
> each. And even if any of us had, that still wouldn't have any bearing
> on PEP 393, which is about applications, not editor widgets. As stated
> above, Python strings before AND after PEP 393 are poor choices for an
> editor, ergo arguing from that standpoint is pretty useless. Not that
> that bothers jmf...

I think you've just motivated me to finally get around to writing the
custom output widget for my MUD client.  Of course that will be
simpler than a standard rich text editor widget, since it will never
receive input from the user and modifications will (typically) always
come in the form of append operations.  I intend to write it in pure
Python (well, wxPython), however.
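
For reference, a rough sketch of the list-of-strings buffer described in the quoted text, which also covers the append-only case. The class name `PieceBuffer` and its methods are hypothetical, and undo is simplified to restoring a snapshot of the piece list rather than removing a single string.

```python
class PieceBuffer:
    """Text held as a list of immutable str pieces instead of one big str."""

    def __init__(self, text=""):
        self.pieces = [text] if text else []
        self._undo = []  # earlier piece lists (shallow copies, strings shared)

    def __len__(self):
        return sum(len(piece) for piece in self.pieces)

    def text(self):
        return "".join(self.pieces)

    def insert(self, pos, new_text):
        """Insert at character offset `pos` by splitting a single piece."""
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        if not new_text:
            return
        self._undo.append(list(self.pieces))
        offset = 0
        for i, piece in enumerate(self.pieces):
            if pos <= offset + len(piece):
                cut = pos - offset
                parts = [piece[:cut], new_text, piece[cut:]]
                # Replace the one affected piece; drop empty fragments.
                self.pieces[i:i + 1] = [p for p in parts if p]
                return
            offset += len(piece)
        self.pieces.append(new_text)  # only reached when the buffer is empty

    def append(self, new_text):
        """The common case for an output-only widget."""
        self.insert(len(self), new_text)

    def undo(self):
        """Restore the piece list from before the most recent insert."""
        if self._undo:
            self.pieces = self._undo.pop()


buf = PieceBuffer("hello world")
buf.insert(5, ",")
buf.append("!")
print(buf.pieces)   # ['hello', ',', ' world', '!']
buf.undo()
print(buf.text())   # hello, world
```

Because each piece is a separate str, under PEP 393 a piece containing wide characters does not force the other pieces to a wider representation.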


#50564

From	MRAB <python@mrabarnett.plus.com>
Date	2013-07-13 00:16 +0100
Message-ID	<mailman.4667.1373670938.3114.python-list@python.org>
In reply to	#50510

On 12/07/2013 23:16, Tim Delaney wrote:
> On 13 July 2013 03:58, Devyn Collier Johnson <devyncjohnson@gmail.com
> <mailto:devyncjohnson@gmail.com>> wrote:
>
>
>     Thanks for the thorough response. I learned a lot. You should write
>     articles on Python.
>     I plan to spend some time optimizing the re.py module for Unix
>     systems. I would love to amp up my programs that use that module.
>
>
> If you are finding that regular expressions are taking too much time,
> have a look at the https://pypi.python.org/pypi/re2/ and
> https://pypi.python.org/pypi/regex/2013-06-26 modules to see if they
> already give you enough of a speedup.
>
FYI, you're better off going to http://pypi.python.org/pypi/regex
because that will take you to the latest version.
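
Before switching modules it may be worth measuring. The following is a sketch only: the pattern, the test text, and the helper name `time_findall` are invented for illustration, and it assumes the third-party regex package, if installed, accepts the same `compile`/`findall` calls as the standard re module for this pattern. If regex is not installed, only re is timed.

```python
import re
import timeit

try:
    import regex  # third-party package from PyPI; optional here
except ImportError:
    regex = None

PATTERN = r"\b\w+@\w+\.\w+\b"
TEXT = "contact: someone@example.com " * 1000


def time_findall(module, number=20):
    """Seconds taken to run findall() `number` times with a compiled pattern."""
    compiled = module.compile(PATTERN)
    return timeit.timeit(lambda: compiled.findall(TEXT), number=number)


print("re   :", time_findall(re))
if regex is not None:
    print("regex:", time_findall(regex))
```

Timings vary by machine, pattern, and input, so results from a toy pattern like this say little about a real workload.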


#50613

From	Tim Delaney <timothy.c.delaney@gmail.com>
Date	2013-07-14 05:34 +1000
Message-ID	<mailman.4686.1373744069.3114.python-list@python.org>
In reply to	#50510


On 13 July 2013 09:16, MRAB <python@mrabarnett.plus.com> wrote:

> On 12/07/2013 23:16, Tim Delaney wrote:
>
>> On 13 July 2013 03:58, Devyn Collier Johnson <devyncjohnson@gmail.com
>> <mailto:devyncjohnson@gmail.com>> wrote:
>>
>>
>>     Thanks for the thorough response. I learned a lot. You should write
>>     articles on Python.
>>     I plan to spend some time optimizing the re.py module for Unix
>>     systems. I would love to amp up my programs that use that module.
>>
>>
>> If you are finding that regular expressions are taking too much time,
>> have a look at the https://pypi.python.org/pypi/re2/ and
>> https://pypi.python.org/pypi/regex/2013-06-26 modules to see if they
>> already give you enough of a speedup.
>>
> FYI, you're better off going to http://pypi.python.org/pypi/regex
> because that will take you to the latest version.


Absolutely - what was I thinking?

Tim Delaney


#50738

From	Devyn Collier Johnson <devyncjohnson@gmail.com>
Date	2013-07-16 06:30 -0400
Message-ID	<mailman.4767.1373970644.3114.python-list@python.org>
In reply to	#50510

On 07/12/2013 07:16 PM, MRAB wrote:
> On 12/07/2013 23:16, Tim Delaney wrote:
>> On 13 July 2013 03:58, Devyn Collier Johnson <devyncjohnson@gmail.com
>> <mailto:devyncjohnson@gmail.com>> wrote:
>>
>>
>>     Thanks for the thorough response. I learned a lot. You should write
>>     articles on Python.
>>     I plan to spend some time optimizing the re.py module for Unix
>>     systems. I would love to amp up my programs that use that module.
>>
>>
>> If you are finding that regular expressions are taking too much time,
>> have a look at the https://pypi.python.org/pypi/re2/ and
>> https://pypi.python.org/pypi/regex/2013-06-26 modules to see if they
>> already give you enough of a speedup.
>>
> FYI, you're better off going to http://pypi.python.org/pypi/regex
> because that will take you to the latest version.
Thank you everyone for the suggestions. I have not tried them yet.

Devyn Collier Johnson


#50865

From	88888 Dihedral <dihedral88888@gmail.com>
Date	2013-07-18 13:17 -0700
Message-ID	<aaa2e128-b8a8-4685-952d-ff724d7ab3d9@googlegroups.com>
In reply to	#50738

On Tuesday, July 16, 2013 at 6:30:33 PM UTC+8, Devyn Collier Johnson wrote:
> On 07/12/2013 07:16 PM, MRAB wrote:
> > On 12/07/2013 23:16, Tim Delaney wrote:
> >> On 13 July 2013 03:58, Devyn Collier Johnson <devyncjohnson@gmail.com
> >> <mailto:devyncjohnson@gmail.com>> wrote:
> >>
> >>
> >>     Thanks for the thorough response. I learned a lot. You should write
> >>     articles on Python.
> >>     I plan to spend some time optimizing the re.py module for Unix
> >>     systems. I would love to amp up my programs that use that module.
> >>
> >>
> >> If you are finding that regular expressions are taking too much time,
> >> have a look at the https://pypi.python.org/pypi/re2/ and
> >> https://pypi.python.org/pypi/regex/2013-06-26 modules to see if they
> >> already give you enough of a speedup.
> >>
> > FYI, you're better off going to http://pypi.python.org/pypi/regex
> > because that will take you to the latest version.
> Thank you everyone for the suggestions. I have not tried them yet.
>
> Devyn Collier Johnson

I was thinking to decompose RE patterns into string matching 
formats of various strings in some formats.

Anyway that involves some compiler techniques.
