


Re: Python Speed

Started by: Terry Reedy <tjreedy@udel.edu>
First post: 2013-02-27 21:11 -0500
Last post: 2013-02-28 08:55 +0100
Articles: 3 (3 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Python Speed Terry Reedy <tjreedy@udel.edu> - 2013-02-27 21:11 -0500
    Re: Python Speed Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-28 07:05 +0000
      Re: Python Speed Stefan Behnel <stefan_ml@behnel.de> - 2013-02-28 08:55 +0100

#40095 — Re: Python Speed

From: Terry Reedy <tjreedy@udel.edu>
Date: 2013-02-27 21:11 -0500
Subject: Re: Python Speed
Message-ID: <mailman.2633.1362017506.2939.python-list@python.org>

On 2/27/2013 7:15 PM, Ian Kelly wrote:
> On Wed, Feb 27, 2013 at 3:24 PM, Terry Reedy <tjreedy@udel.edu> wrote:
>>> Py33
>>>>>> timeit.repeat("{1:'abc需'}")
>>> [0.2573893570572636, 0.24261832285651508, 0.24259548003601594]
>>
>> On my win system, I get a lower time for this:
>> [0.16579443757208878, 0.1475787649924598, 0.14970205670637426]
>>
>>> Py323
>>> timeit.repeat("{1:'abc需'}")
>>> [0.11000708521282831, 0.0994753634273593, 0.09901023634051853]
>>
>> While I get the same time for 3.2.3.
>> [0.11759353304428544, 0.09482448029000068, 0.09532802044164157]
>>
>> It seems that something about Jim's machine does not like 3.3.
>> *nix will probably see even less of a difference. Times are in microseconds,
>> so few programs will ever notice the difference.
>
> Running the same tests in IDLE on my Windows XP laptop, I see similar
> results to what jmf reports.

Whereas I run Win 7 on a Core i7 desktop. For this, I suspect the 
processor difference more than the OS. To really investigate, one should 
separately time string creation from dict creation with a pre-built string.

from timeit import repeat

repeat('pass')  # .013 to .02 on both
repeat("'abc需'") # same, untimeable
repeat("'abc需'*10") # .12 versus .14 on 3.2 and 3.3
repeat("{1:s}", "s='abc需'")  # .10 versus .14

There is a problem with timer overhead for sub-microsecond operations. 
In interactive use, the code is compiled within a function that gets 
called. The string 'abc需' should be stored as a constant in the code 
object. To force repeated string operation, one should either time from 
command line or do an operation, as with the example above. I notice 
that the first of 3 times is almost always higher for some reason.
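
A minimal sketch of the separation suggested above, assuming a current CPython 3. The helper name `best` and the reduced loop count are illustrative choices, not part of the original measurements. The string is passed in through `setup` so that the multiplication runs on every pass; newer CPython versions may fold a literal expression such as `'abc需'*10` into a constant at compile time.

```python
import timeit

# The literal is compiled once and stored as a constant of the code object,
# so timing "'abc需'" alone measures little more than a constant load.
code = compile("{1: 'abc需'}", "<sketch>", "eval")
literal_is_constant = 'abc需' in code.co_consts


def best(stmt, setup="pass", number=100_000, repeat=3):
    """Hypothetical helper: best total time in seconds for `number` runs."""
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))


baseline = best("pass")                    # loop overhead only
string_op = best("s * 10", "s = 'abc需'")  # builds a new string on each pass
dict_only = best("{1: s}", "s = 'abc需'")  # dict creation with a pre-built string

print("literal stored as constant:", literal_is_constant)
for name, seconds in [("baseline", baseline),
                      ("string op", string_op),
                      ("dict only", dict_only)]:
    print(f"{name:10s} {seconds:.4f} s per 100000 loops")
```

Subtracting the baseline from the other two figures gives a rough idea of how much of each time is loop overhead rather than the operation itself.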

> But from what Christian posted, it
> sounds like this regression may have more to do with PEP 412 than PEP
> 393.

That change traded a space saving for a small initial time cost.
Christian also showed that initial cost has since been cut. There may be 
more internal dict tweaks before 3.4.

-- 
Terry Jan Reedy



#40117

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-02-28 07:05 +0000
Message-ID: <512f01af$0$30001$c3e8da3$5496439d@news.astraweb.com>
In reply to: #40095

On Wed, 27 Feb 2013 21:11:25 -0500, Terry Reedy wrote:

> There is a problem with timer overhead for sub-microsecond operations.
> In interactive use, the code is compiled within a function that gets
> called. The string 'abc需' should be stored as a constant in the code
> object. To force repeated string operation, one should either time from
> command line or do an operation, as with the example above. I notice
> that the first of 3 times is almost always higher for some reason.

I am not an expert on this, but I suspect the problem may have something 
to do with CPU pipelines and cache. The first time the timer runs, the 
cache is empty, and you get a slightly higher time. Subsequently there 
are not as many CPU cache misses, and the code runs more quickly.

Or, I could be talking out of my arse. Once upon a time CPUs were simple 
enough for me to understand what makes code faster or slower, but no 
more...


-- 
Steven



#40121

From: Stefan Behnel <stefan_ml@behnel.de>
Date: 2013-02-28 08:55 +0100
Message-ID: <mailman.2643.1362038393.2939.python-list@python.org>
In reply to: #40117

Steven D'Aprano, 28.02.2013 08:05:
> On Wed, 27 Feb 2013 21:11:25 -0500, Terry Reedy wrote:
> 
>> There is a problem with timer overhead for sub-microsecond operations.
>> In interactive use, the code is compiled within a function that gets
>> called. The string 'abc需' should be stored as a constant in the code
>> object. To force repeated string operation, one should either time from
>> command line or do an operation, as with the example above. I notice
>> that the first of 3 times is almost always higher for some reason.
> 
> I am not an expert on this, but I suspect the problem may have something 
> to do with CPU pipelines and cache. The first time the timer runs, the 
> cache is empty, and you get a slightly higher time. Subsequently there 
> are not as many CPU cache misses, and the code runs more quickly.

Well, the default loop iteration count is 1000000, so warming up any caches
might make a little difference at the very beginning but should rarely have
a major impact on the overall running time, as each iteration only changes
the final result by 1/1000000 of its runtime.
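
To put the 1/1000000 figure in concrete terms, here is a small worked example using the three 3.3 timings quoted earlier in the thread. It assumes the default loop count of 1000000 was used for them; the variable names are illustrative.

```python
# Totals in seconds, as quoted earlier in the thread for Python 3.3.
times = [0.16579443757208878, 0.1475787649924598, 0.14970205670637426]
number = 1_000_000  # assumed: timeit's default loop count

best = min(times)
per_loop_us = best / number * 1e6      # total seconds -> microseconds per loop
first_run_excess = times[0] - best     # extra total seconds in the first run
excess_per_loop_ns = first_run_excess / number * 1e9

print(f"best: {per_loop_us:.3f} usec per loop")
print(f"first run excess: {first_run_excess * 1e3:.1f} ms in total, "
      f"{excess_per_loop_ns:.1f} ns per loop")
```

Because each result is a total over all iterations, a cost paid only in the first few iterations is spread over a million loops before it shows up in the reported number.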

However, it's best to run timeit as a main program (python -m timeit),
because the way it works then is that it first runs the code a couple of
times to see how often it should repeat it in a loop to get meaningful
results. Only *then* does it start benchmarking. That initial testing phase
should usually be enough to warm up any caches, so that you'd get better
results. You can still get the results of all repeated runs if you pass -v.
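
The same calibrate-then-measure behaviour can be approximated from within Python. This is a sketch, assuming Python 3.6 or later, where `timeit.Timer.autorange()` is available; the statement and setup are taken from the earlier example in this thread.

```python
import timeit

timer = timeit.Timer("{1: s}", setup="s = 'abc需'")

# autorange() runs the statement with increasing loop counts until one
# batch takes at least 0.2 seconds, which also acts as a warm-up phase.
number, calibration_seconds = timer.autorange()

# Only now take the measurements that get reported.
results = timer.repeat(repeat=3, number=number)
best_per_loop_us = min(results) / number * 1e6

print(f"{number} loops, best of 3: {best_per_loop_us:.3f} usec per loop")
```

On the command line, a comparable run would be `python -m timeit -v -s "s='abc需'" "{1:s}"`, where `-s` supplies the setup statement and `-v` prints the individual results.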

Stefan
