| Started by | Yuzhi Xu <yuzhixu.ruc@gmail.com> |
|---|---|
| First post | 2015-08-22 23:10 -0700 |
| Last post | 2015-08-23 22:42 +1000 |
| Articles | 5 — 4 participants |
how to handle cpu cache in python ( or fastest way to call a function once) Yuzhi Xu <yuzhixu.ruc@gmail.com> - 2015-08-22 23:10 -0700
Re: how to handle cpu cache in python ( or fastest way to call a function once) Stefan Behnel <stefan_ml@behnel.de> - 2015-08-23 11:54 +0200
Re: how to handle cpu cache in python ( or fastest way to call a function once) Steven D'Aprano <steve@pearwood.info> - 2015-08-23 21:59 +1000
Re: how to handle cpu cache in python ( or fastest way to call a function once) Vladimir Ignatov <kmisoft@gmail.com> - 2015-08-23 08:07 -0400
Re: how to handle cpu cache in python ( or fastest way to call a function once) Steven D'Aprano <steve@pearwood.info> - 2015-08-23 22:42 +1000
| From | Yuzhi Xu <yuzhixu.ruc@gmail.com> |
|---|---|
| Date | 2015-08-22 23:10 -0700 |
| Subject | how to handle cpu cache in python ( or fastest way to call a function once) |
| Message-ID | <2aa39ddd-bb07-4a09-a046-a011e215882a@googlegroups.com> |
I found that Python's VM seems to be very unfriendly to the CPU cache.
See:
http://stackoverflow.com/questions/32163585/how-to-handle-cpu-cache-in-python-or-fastest-way-to-call-a-function-once
http://stackoverflow.com/questions/32153178/python-functionor-a-code-block-runs-much-slower-with-a-time-interval-in-a-loop
for example:
*******************************************
import time
a = range(500)
sum(a)
for i in range(1000000):  # just to create a time interval, seems this disturb cpu cache?
    pass
st = time.time()
sum(a)
print (time.time() - st)*1e6
*********************************************
time:> 100us
another case:
*********************************************
import time
a = range(500)
for i in range(100000):
    st = time.time()
    sum(a)
    print (time.time() - st)*1e6
*********************************************
time:~ 20us
We can see that when it is run frequently, the code becomes much faster.
Is there a solution?
I feel this question is very difficult. One must have an in-depth understanding of the mechanisms of the Python virtual machine, C, and the CPU cache.
Do you have any suggestions about where to post this question for a possible answer?
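For readers on Python 3, the two measurements above can be sketched roughly as follows. This is a hedged adaptation, not the original script: `time.perf_counter()` replaces `time.time()`, `list(range(500))` stands in for Python 2's list-returning `range(500)`, and the helper name `time_sum_us` is hypothetical. Absolute numbers will differ from machine to machine.

```python
import time

def time_sum_us(data):
    """Time a single sum(data) call; return elapsed microseconds."""
    start = time.perf_counter()
    sum(data)
    return (time.perf_counter() - start) * 1e6

a = list(range(500))  # Python 3 stand-in for Python 2's range(500)

# Case 1: one warm-up call, a busy loop, then a single timed call.
sum(a)
for _ in range(1000000):  # only creates a time interval
    pass
after_gap_us = time_sum_us(a)

# Case 2: timed calls issued back to back.
back_to_back_us = [time_sum_us(a) for _ in range(1000)]

print("after gap:                %.2f us" % after_gap_us)
print("back to back (last call): %.2f us" % back_to_back_us[-1])
```

Whether the gap between the two figures reproduces, and how large it is, depends on the interpreter version, the CPU, and its power management.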
| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Date | 2015-08-23 11:54 +0200 |
| Message-ID | <mailman.32.1440323670.17298.python-list@python.org> |
| In reply to | #95580 |
Yuzhi Xu wrote on 23.08.2015 at 08:10:
> I find out that python's VM seems to be very unfriendly with CPU-Cache.
> see:
> http://stackoverflow.com/questions/32163585/how-to-handle-cpu-cache-in-python-or-fastest-way-to-call-a-function-once
> http://stackoverflow.com/questions/32153178/python-functionor-a-code-block-runs-much-slower-with-a-time-interval-in-a-loop
>
> for example:
> *******************************************
> import time
> a = range(500)
>
> sum(a)
>
> for i in range(1000000): #just to create a time interval, seems this disturb cpu cache?
>     pass
>
>
> st = time.time()
> sum(a)
> print (time.time() - st)*1e6
>
> *********************************************
> time:> 100us
>
>
> another case:
> *********************************************
> import time
> a = range(500)
>
> for i in range(100000):
>     st = time.time()
>     sum(a)
>     print (time.time() - st)*1e6
>
> *********************************************
> time:~ 20us
>
>
> we can see when running frequently, the code becomes much faster.

That does not seem like a straightforward deduction. Especially the interpretation that the CPU caching behaviour is to blame here seems rather far-fetched. My guess is that it rather has to do with CPython's internal object caching or something at that level.

However, given the absolute timings above, I wouldn't bother too much finding it out. It's unlikely to hurt real-world code. (And in fact, the more interesting case where things are happening several times in a row rather than being a negligible constant one-time effort seems to be substantially faster in your timings. Congratulations!)

> is there a solution?

Is there a problem?

Stefan
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-08-23 21:59 +1000 |
| Message-ID | <55d9b59e$0$1652$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #95580 |
On Sun, 23 Aug 2015 04:10 pm, Yuzhi Xu wrote:
> I find out that python's VM seems to be very unfriendly with CPU-Cache.
Possibly. More comments below.
> for example:
> *******************************************
> import time
> a = range(500)
>
> sum(a)
>
> for i in range(1000000): #just to create a time interval, seems this disturb cpu cache?
>     pass
>
>
> st = time.time()
> sum(a)
> print (time.time() - st)*1e6
>
> *********************************************
> time:> 100us
On my machine, I get about 20-25 μs for this:
(a.py contains your code above)
[steve@ando ~]$ python2.7 a.py
21.9345092773
[steve@ando ~]$ python2.7 a.py
21.9345092773
[steve@ando ~]$ python2.7 a.py
24.0802764893
[steve@ando ~]$ python2.7 a.py
23.8418579102
> another case:
> *********************************************
> import time
> a = range(500)
>
> for i in range(100000):
>     st = time.time()
>     sum(a)
>     print (time.time() - st)*1e6
>
> *********************************************
> time:~ 20us
Running this as b.py, I get times of around 15μs, a bit faster than the
first version, but not five times faster, as you get.
[steve@ando ~]$ python2.7 b.py
[...]
15.0203704834
15.0203704834
25.0339508057
16.9277191162
20.0271606445
94.8905944824
15.9740447998
15.0203704834
15.0203704834
15.0203704834
15.0203704834
14.066696167
13.8282775879
15.0203704834
Traceback (most recent call last):
  File "b.py", line 6, in <module>
    sum(a)
KeyboardInterrupt
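The repeated values in this output (15.0203704834 appears many times) suggest that the results are quantized by the timer rather than being exact durations. A rough check, assuming `time.time()` returns an IEEE 754 double holding seconds since the epoch: near the thread's 2015 timestamps, adjacent representable values are 2^-22 s apart, about 0.24 μs, and the printed figures come out as whole multiples of that step. The helper name `float_spacing` and the timestamp constant are illustrative choices, not taken from the thread.

```python
import math

def float_spacing(x):
    """Gap between adjacent doubles at magnitude x (x positive, normal)."""
    _, exponent = math.frexp(x)  # x = m * 2**exponent with 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 53)

# Assumed epoch timestamp: roughly 2015-08-23, when the thread was posted.
timestamp = 1440323670.0
spacing_us = float_spacing(timestamp) * 1e6
print("spacing: %.4f us" % spacing_us)

# Timings copied from the output above, in microseconds.
for reported_us in (15.0203704834, 13.8282775879, 94.8905944824):
    steps = reported_us / spacing_us
    print("%.10f us = %.4f steps" % (reported_us, steps))
```

If that assumption holds, differences of a fraction of a microsecond between runs are below what `time.time()` could resolve here.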
Above, you say:
> for i in range(1000000): #just to create a time interval, seems this disturb cpu cache?
>     pass
But remember that range() is a function, and so, yes, it may disturb the CPU
cache. What did you expect?
But I'm not sure how the CPU cache will interact with code in a high-level
language like Python. I suspect that more likely, it simply has something
to do with range(1000000) building an enormous list of integers.
Here's another version:
[steve@ando ~]$ cat c.py
import time
a = range(500)
sum(a)
for i in range(1000000):
    pass
sum(a)
st = time.time()
sum(a)
print (time.time() - st)*1e6
[steve@ando ~]$ python2.7 c.py
15.9740447998
And one more:
[steve@ando ~]$ cat d.py
import time
a = range(500)
sum(a)
for i in xrange(1000000):  # Use xrange instead of range
    pass
st = time.time()
sum(a)
print (time.time() - st)*1e6
[steve@ando ~]$ python2.7 d.py
22.1729278564
[steve@ando ~]$ python2.7 d.py
23.1266021729
So... on my machine, switching between xrange and range makes no
difference: in both cases, calling sum() takes about 22μs.
But calling sum() twice speeds up the second call to about 16μs, or about
25% faster. (Not 80% faster, as you find.)
One last test:
[steve@ando ~]$ cat e.py
import time
a = range(500)
# Without warm-up.
st = time.time()
sum(a)
print (time.time() - st)*1e6
# Second time, with warm-up.
st = time.time()
sum(a)
print (time.time() - st)*1e6
# Add a delay.
for i in xrange(1000):
    pass
st = time.time()
sum(a)
print (time.time() - st)*1e6
st = time.time()
sum(a)
print (time.time() - st)*1e6
[steve@ando ~]$ python2.7 e.py
15.0203704834
15.0203704834
10.9672546387
10.9672546387
[steve@ando ~]$ python2.7 e.py
15.9740447998
12.8746032715
12.1593475342
10.9672546387
[steve@ando ~]$ python2.7 e.py
15.9740447998
20.0271606445
15.0203704834
15.9740447998
--
Steven
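Since the figures above shift from run to run (the third e.py run shows no speed-up at all), a single timed call is a noisy measurement. One common approach, sketched here for Python 3 under the assumption that `timeit.repeat` with `number=1` is an acceptable stand-in for the hand-rolled `time.time()` pairs, is to collect many samples and compare the minimum, median, and maximum. The variable names are illustrative.

```python
import statistics
import timeit

a = list(range(500))  # Python 3 stand-in for Python 2's range(500)

# 2000 samples, each timing exactly one sum(a) call; values are in seconds.
samples = timeit.repeat(lambda: sum(a), repeat=2000, number=1)
samples_us = [s * 1e6 for s in samples]

print("min    %.2f us" % min(samples_us))
print("median %.2f us" % statistics.median(samples_us))
print("max    %.2f us" % max(samples_us))
```

Note that this measures calls issued back to back, so it describes the warmed-up case rather than a single call after an idle period.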
| From | Vladimir Ignatov <kmisoft@gmail.com> |
|---|---|
| Date | 2015-08-23 08:07 -0400 |
| Message-ID | <mailman.33.1440331655.17298.python-list@python.org> |
| In reply to | #95580 |
Hi,

>> for i in range(1000000): #just to create a time interval, seems this disturb cpu cache?
>>     pass

The Python interpreter consumes memory quite extensively because "everything is an object". So constructions like:

range(1000000):

_take_ memory. Additionally, it will trigger garbage-collection code at deallocation time, so expect even more delay.

To get the most out of Python, all "number crunching" / "pixel pushing" / "store gigabytes" code should go into low-level compiled binary libraries.

Vladimir

https://itunes.apple.com/us/app/python-code-samples/id1025613117
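To put a rough number on the memory point, here is a sketch in Python 3, where `list(range(n))` plays the role of Python 2's `range(n)` and a bare `range(n)` behaves like `xrange(n)`. `sys.getsizeof` reports only the container itself, not the integer objects it references, so the list figure is a lower bound. The names `eager` and `lazy` are illustrative.

```python
import sys

n = 1000000

eager = list(range(n))  # like Python 2 range(n): all items exist at once
lazy = range(n)         # like Python 2 xrange(n): items produced on demand

print("list container: %d bytes" % sys.getsizeof(eager))
print("range object:   %d bytes" % sys.getsizeof(lazy))
```

This only shows the size difference; it does not by itself say how much time allocation and deallocation add to the loop.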
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-08-23 22:42 +1000 |
| Message-ID | <55d9bfcf$0$1657$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #95586 |
On Sun, 23 Aug 2015 10:07 pm, Vladimir Ignatov wrote:

> Hi,
>
>>> for i in range(1000000): #just to create a time interval, seems this
>>> disturb cpu cache?
>>> pass
>
> Python interpreter consumes memory quite extensively because
> "everything is object". So constructions like:
>
> range(1000000):
>
> _take_ memory. Additionally it will trigger garbage collecting code
> on deallocation time so expect even more delay.

Normally you would be correct, but as my timing results show, using xrange instead of range does not make any difference. Whatever is going on here, it isn't as simple as "range(1000000) builds a giant list".

--
Steven