
how to handle cpu cache in python ( or fastest way to call a function once)

Started by	Yuzhi Xu <yuzhixu.ruc@gmail.com>
First post	2015-08-22 23:10 -0700
Last post	2015-08-23 22:42 +1000
Articles	5 — 4 participants


  how to handle cpu cache in python ( or fastest way to call a function once) Yuzhi Xu <yuzhixu.ruc@gmail.com> - 2015-08-22 23:10 -0700
    Re: how to handle cpu cache in python ( or fastest way to call a function once) Stefan Behnel <stefan_ml@behnel.de> - 2015-08-23 11:54 +0200
    Re: how to handle cpu cache in python ( or fastest way to call a function once) Steven D'Aprano <steve@pearwood.info> - 2015-08-23 21:59 +1000
    Re: how to handle cpu cache in python ( or fastest way to call a function once) Vladimir Ignatov <kmisoft@gmail.com> - 2015-08-23 08:07 -0400
      Re: how to handle cpu cache in python ( or fastest way to call a function once) Steven D'Aprano <steve@pearwood.info> - 2015-08-23 22:42 +1000

#95580 — how to handle cpu cache in python ( or fastest way to call a function once)

From	Yuzhi Xu <yuzhixu.ruc@gmail.com>
Date	2015-08-22 23:10 -0700
Subject	how to handle cpu cache in python ( or fastest way to call a function once)
Message-ID	<2aa39ddd-bb07-4a09-a046-a011e215882a@googlegroups.com>

I have found that Python's VM seems to be very unfriendly to the CPU cache.


see:
http://stackoverflow.com/questions/32163585/how-to-handle-cpu-cache-in-python-or-fastest-way-to-call-a-function-once

http://stackoverflow.com/questions/32153178/python-functionor-a-code-block-runs-much-slower-with-a-time-interval-in-a-loop




for example:
*******************************************
import time
a = range(500)

sum(a)

for i in range(1000000):  # just to create a time interval; does this disturb the CPU cache?
    pass


st = time.time()
sum(a)
print (time.time() - st)*1e6

*********************************************
time:> 100us


another case:
*********************************************
import time
a = range(500)

for i in range(100000):
    st = time.time()
    sum(a)
    print (time.time() - st)*1e6

*********************************************
time:~ 20us


We can see that when the code runs frequently, it becomes much faster.

Is there a solution?


I feel this question is very difficult. One must have an in-depth understanding of the mechanisms of the Python virtual machine, C, and the CPU cache.

Do you have any suggestions about where to post this question for a possible answer?


#95584

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2015-08-23 11:54 +0200
Message-ID	<mailman.32.1440323670.17298.python-list@python.org>
In reply to	#95580

Yuzhi Xu wrote on 23.08.2015 at 08:10:
> I find out that python's VM seems to be very unfriendly with CPU-Cache.
> see:
> http://stackoverflow.com/questions/32163585/how-to-handle-cpu-cache-in-python-or-fastest-way-to-call-a-function-once
> http://stackoverflow.com/questions/32153178/python-functionor-a-code-block-runs-much-slower-with-a-time-interval-in-a-loop
> 
> for example:
> *******************************************
> import time
> a = range(500)
> 
> sum(a)
> 
> for i in range(1000000): #just to create a time interval, seems this disturb cpu cache?
>     pass
> 
> 
> st = time.time()
> sum(a)
> print (time.time() - st)*1e6
> 
> *********************************************
> time:> 100us
> 
> 
> another case:
> *********************************************
> import time
> a = range(500)
> 
> for i in range(100000):
>     st = time.time()
>     sum(a)
>     print (time.time() - st)*1e6
> 
> *********************************************
> time:~ 20us
> 
> 
> we can see when running frequently, the code becomes much faster.

That does not seem like a straightforward deduction. In particular, the
interpretation that the CPU caching behaviour is to blame here seems rather
far-fetched.

My guess is that it rather has to do with CPython's internal object caching
or something at that level. However, given the absolute timings above, I
wouldn't bother too much with finding out. It's unlikely to hurt real-world
code. (And in fact, the more interesting case, where things are happening
several times in a row rather than being a negligible constant one-time
effort, seems to be substantially faster in your timings. Congratulations!)
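
For measuring the repeated case more robustly than with a single pair of
time.time() calls, something like the following might serve as a rough
sketch. It is not from the thread: it assumes Python 3 (the code above is
Python 2.7), uses the standard timeit module, and the helper name measure()
and its defaults are made up for illustration.

```python
import timeit

# Assumption: Python 3, so the list is built explicitly to match what
# range(500) returned in Python 2.7.
data = list(range(500))


def measure(repeats=5, number=10000):
    """Hypothetical helper: per-call time of sum(data) in microseconds.

    timeit.repeat runs `number` calls per repeat and returns the total
    time for each repeat; dividing by `number` gives a per-call figure.
    """
    totals = timeit.repeat(lambda: sum(data), repeat=repeats, number=number)
    return [total / number * 1e6 for total in totals]


per_call_us = measure()
print("best: %.2f us, worst: %.2f us" % (min(per_call_us), max(per_call_us)))
```

Averaging over many calls hides the one-off "first call after a pause"
cost, so this only characterises the steady-state case.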


> is there a solution?

Is there a problem?

Stefan


#95585

From	Steven D'Aprano <steve@pearwood.info>
Date	2015-08-23 21:59 +1000
Message-ID	<55d9b59e$0$1652$c3e8da3$5496439d@news.astraweb.com>
In reply to	#95580

On Sun, 23 Aug 2015 04:10 pm, Yuzhi Xu wrote:

> I find out that python's VM seems to be very unfriendly with CPU-Cache.

Possibly. More comments below.


> for example:
> *******************************************
> import time
> a = range(500)
> 
> sum(a)
> 
> for i in range(1000000): #just to create a time interval, seems this
> disturb cpu cache?
>     pass
> 
> 
> st = time.time()
> sum(a)
> print (time.time() - st)*1e6
> 
> *********************************************
> time:> 100us

On my machine, I get about 20-25 μs for this:

(a.py contains your code above)

[steve@ando ~]$ python2.7 a.py
21.9345092773
[steve@ando ~]$ python2.7 a.py
21.9345092773
[steve@ando ~]$ python2.7 a.py
24.0802764893
[steve@ando ~]$ python2.7 a.py
23.8418579102



> another case:
> *********************************************
> import time
> a = range(500)
> 
> for i in range(100000):
>     st = time.time()
>     sum(a)
>     print (time.time() - st)*1e6
> 
> *********************************************
> time:~ 20us


Running this as b.py, I get times of around 15μs, a bit faster than the
first version, but not five times faster, as you get.

[steve@ando ~]$ python2.7 b.py
[...]
15.0203704834
15.0203704834
25.0339508057
16.9277191162
20.0271606445
94.8905944824
15.9740447998
15.0203704834
15.0203704834
15.0203704834
15.0203704834
14.066696167
13.8282775879
15.0203704834
Traceback (most recent call last):
  File "b.py", line 6, in <module>
    sum(a)
KeyboardInterrupt


Above, you say:

> for i in range(1000000): #just to create a time interval, seems this
> disturb cpu cache?
>     pass

But remember that range() is a function, and so, yes, it may disturb the CPU
cache. What did you expect?

But I'm not sure how the CPU cache will interact with code in a high-level
language like Python. I suspect that more likely, it simply has something
to do with range(1000000) building an enormous list of integers.

Here's another version:

[steve@ando ~]$ cat c.py
import time
a = range(500)
sum(a)
for i in range(1000000):
    pass
sum(a)
st = time.time()
sum(a)
print (time.time() - st)*1e6

[steve@ando ~]$ python2.7 c.py
15.9740447998



And one more:


[steve@ando ~]$ cat d.py
import time
a = range(500)
sum(a)
for i in xrange(1000000): # Use xrange instead of range
    pass
st = time.time()
sum(a)
print (time.time() - st)*1e6

[steve@ando ~]$ python2.7 d.py
22.1729278564
[steve@ando ~]$ python2.7 d.py
23.1266021729



So... on my machine, the choice between xrange and range makes no
difference: in both cases, calling sum() takes about 22μs.

But calling sum() twice speeds up the second call to about 16μs, or about
25% faster. (Not 80% faster, as you find.)
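
The comparison between "first call after a delay" and "immediately repeated
call" could be sketched roughly as below. This is an adaptation under
assumptions, not the code that produced the numbers above: it targets
Python 3, where range() is lazy like xrange in 2.7, it uses
time.perf_counter() instead of time.time(), and the function names are
invented for illustration.

```python
import time

# Assumption: Python 3; the list is built explicitly to mirror
# range(500) in Python 2.7.
data = list(range(500))


def timed_sum_us(values):
    """Time a single sum() call and return the duration in microseconds."""
    start = time.perf_counter()
    sum(values)
    return (time.perf_counter() - start) * 1e6


def cold_and_warm(delay_iterations=1000000):
    """Hypothetical helper: time sum() right after a busy loop, then again."""
    sum(data)  # initial call, as in the scripts above
    for _ in range(delay_iterations):  # lazy in Python 3, like xrange
        pass
    after_delay = timed_sum_us(data)  # first call after the delay
    repeated = timed_sum_us(data)     # call made immediately afterwards
    return after_delay, repeated


cold_us, warm_us = cold_and_warm()
print("after delay: %.2f us, repeated: %.2f us" % (cold_us, warm_us))
```

A single pair of readings is noisy, so the script would need to be run
several times before drawing any conclusion from the difference.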


One last test:


[steve@ando ~]$ cat e.py
import time
a = range(500)
# Without warm-up.
st = time.time()
sum(a)
print (time.time() - st)*1e6
# Second time, with warm-up.
st = time.time()
sum(a)
print (time.time() - st)*1e6
# Add a delay.
for i in xrange(1000):
    pass
st = time.time()
sum(a)
print (time.time() - st)*1e6
st = time.time()
sum(a)
print (time.time() - st)*1e6


[steve@ando ~]$ python2.7 e.py
15.0203704834
15.0203704834
10.9672546387
10.9672546387
[steve@ando ~]$ python2.7 e.py
15.9740447998
12.8746032715
12.1593475342
10.9672546387
[steve@ando ~]$ python2.7 e.py
15.9740447998
20.0271606445
15.0203704834
15.9740447998




-- 
Steven


#95586

From	Vladimir Ignatov <kmisoft@gmail.com>
Date	2015-08-23 08:07 -0400
Message-ID	<mailman.33.1440331655.17298.python-list@python.org>
In reply to	#95580

Hi,

>> for i in range(1000000): #just to create a time interval, seems this disturb cpu cache?
>>     pass

The Python interpreter uses memory quite extensively because
"everything is an object".  So constructions like:

range(1000000):

_take_ memory.  Additionally, it will trigger garbage-collection code
at deallocation time, so expect even more delay.
To get the most out of Python, all "number crunching" / "pixel pushing" /
"store gigabytes" code should go into low-level compiled binary
libraries.
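
To illustrate only the memory point, here is a small sketch. It assumes
Python 3, where list(range(n)) corresponds to what range(n) built in
Python 2 and a bare range(n) behaves like Python 2's xrange; the variable
names are illustrative. Note that sys.getsizeof() reports the container
alone, not the integer objects it refers to.

```python
import sys

n = 1000000

# Assumption: Python 3. list(range(n)) materialises every element, as
# range(n) did in Python 2; range(n) alone is lazy, like Python 2 xrange.
as_list = list(range(n))
lazy = range(n)

# Size of the container objects only; the int objects referenced by the
# list are not included in list_bytes.
list_bytes = sys.getsizeof(as_list)
lazy_bytes = sys.getsizeof(lazy)

print("list: %d bytes, lazy range: %d bytes" % (list_bytes, lazy_bytes))
```

This shows the difference in memory footprint, but it says nothing by
itself about whether that footprint explains the timings in question.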


Vladimir

https://itunes.apple.com/us/app/python-code-samples/id1025613117


#95587

From	Steven D'Aprano <steve@pearwood.info>
Date	2015-08-23 22:42 +1000
Message-ID	<55d9bfcf$0$1657$c3e8da3$5496439d@news.astraweb.com>
In reply to	#95586

On Sun, 23 Aug 2015 10:07 pm, Vladimir Ignatov wrote:

> Hi,
> 
>>> for i in range(1000000): #just to create a time interval, seems this
>>> disturb cpu cache?
>>>     pass
> 
> Python interpreter consumes memory quite extensively because
> "everything is object".  So constructions like:
> 
> range(1000000):
> 
> _take_ memory.  Additionally it will trigger garbage collecting code
> on deallocation time so expect even more delay.

Normally you would be correct, but as my timing results show, using xrange
instead of range does not make any difference.

Whatever is going on here, it isn't as simple as "range(1000000) builds a
giant list".

-- 
Steven

