Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #89770 > unrolled thread
| Started by | Cecil Westerhof <Cecil@decebal.nl> |
|---|---|
| First post | 2015-05-02 16:20 +0200 |
| Last post | 2015-05-06 10:24 -0700 |
| Articles | 15 on this page of 35 — 11 participants |
Back to article view | Back to comp.lang.python
Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-02 16:20 +0200
Re: Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-03 17:12 +0200
Re: Throw the cat among the pigeons Paul Moore <p.f.moore@gmail.com> - 2015-05-05 08:47 -0700
Re: Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-05 18:18 +0200
Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-05 14:45 -0400
Re: Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-05 21:42 +0200
Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 12:57 +1000
Re: Throw the cat among the pigeons Chris Angelico <rosuav@gmail.com> - 2015-05-06 14:03 +1000
Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-05 14:30 -0600
Re: Throw the cat among the pigeons Terry Reedy <tjreedy@udel.edu> - 2015-05-05 16:46 -0400
Re: Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-05 23:12 +0200
Re: Throw the cat among the pigeons Terry Reedy <tjreedy@udel.edu> - 2015-05-05 19:22 -0400
Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-05 17:00 -0400
Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-05 15:23 -0600
Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 11:27 +1000
Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-05 23:58 -0600
Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 17:08 +1000
Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-06 10:17 -0600
Re: Throw the cat among the pigeons Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-06 21:23 +0100
Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-05 15:39 -0600
Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-05 17:55 -0400
Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 12:19 +1000
Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 14:05 +1000
Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 16:26 +1000
Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-06 09:12 -0400
Re: Throw the cat among the pigeons Dan Sommers <dan@tombstonezero.net> - 2015-05-07 03:16 +0000
Re: Throw the cat among the pigeons Chris Angelico <rosuav@gmail.com> - 2015-05-06 23:55 +1000
Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-06 10:22 -0400
Re: Throw the cat among the pigeons Paul Rubin <no.email@nospam.invalid> - 2015-05-06 08:12 -0700
Re: Throw the cat among the pigeons Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> - 2015-05-06 17:36 +0200
Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-06 16:23 -0400
Re: Throw the cat among the pigeons Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> - 2015-05-07 11:14 +0200
Re: Throw the cat among the pigeons Chris Angelico <rosuav@gmail.com> - 2015-05-07 19:38 +1000
Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-06 09:47 -0600
Re: Throw the cat among the pigeons Paul Rubin <no.email@nospam.invalid> - 2015-05-06 10:24 -0700
Page 2 of 2 — ← Prev page 1 [2]
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-05-05 17:55 -0400 |
| Message-ID | <mailman.144.1430862924.12865.python-list@python.org> |
| In reply to | #89973 |
On 05/05/2015 05:39 PM, Ian Kelly wrote:
> On Tue, May 5, 2015 at 3:23 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
>> On Tue, May 5, 2015 at 3:00 PM, Dave Angel <davea@davea.name> wrote:
>>> def loop(func, funcname, arg):
>>> start = time.time()
>>> for i in range(repeats):
>>> func(arg, True)
>>> print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))
>>>
>>> start = time.time()
>>> for i in range(repeats):
>>> func(arg)
>>> print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))
>>
>> Note that you're explicitly passing True in one case but leaving the
>> default in the other. I don't know whether that might be responsible
>> for the difference you're seeing.
>
> I don't think that's the cause, but I do think that it has something
> to do with the way the timing is being run. When I run your loop
> function, I do observe the difference. If I reverse the order so that
> the False case is tested first, I observe the opposite. That is, the
> slower case is consistently the one that is timed *first* in the loop
> function, regardless of which case that is.
>
I created two functions and called them with Timeit(), and the
difference is now below 3%
And when I take your lead and double the loop() function so it runs each
test twice, I get steadily decreasing numbers.
I think all of this has been noise caused by the caching of objects
including function objects. I was surprised by this, as the loops are
small enough I'd figure the function object would be fully cached the
first time it was called.
--
DaveA
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-05-06 12:19 +1000 |
| Message-ID | <55497a28$0$12995$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #89973 |
On Wed, 6 May 2015 02:18 am, Cecil Westerhof wrote: > Well, I did not write many tail recursive functions. But what surprised > me was that for large values the ‘tail recursive’ version was more > efficient as the iterative version. And that was with myself > implementing the tail recursion. I expect the code to be more > efficient when the compiler implements the tail recursion. You cannot know that. Python is not C or Java, and your intuition as to what will be fast and what will be slow will not be accurate in Python. Python has its own fast paths, and slow paths, and they are not the same as C's or Java's. I have been using Python for over 15 years, and I would not want to guess whether a hypothetical compiler-generated tail-recursion-elimination function would be faster or slower than manually unrolling the recursion into a while loop. I am surprised that you find a manually unrolled version with a while loop is faster than an iterative version. I assume the iterative version uses a for-loop. In my experience, a for-loop is faster than a while loop. But perhaps that depends on the exact version and platform. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-05-06 14:05 +1000 |
| Message-ID | <55499304$0$12978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #89770 |
On Sun, 3 May 2015 12:20 am, Cecil Westerhof wrote:
[...]
> But to my surprise tail recursion could even be more efficient. I
> wrote two different versions of factorial with self implemented tail
> recursion. For bigger values both are more efficient. And I expect
> that if the tail recursion is done by the compiler instead of by hand,
> it will be a little faster still.
I decided to do some experimentation.
Firstly, when timing code, one should minimise the amount of "scaffolding"
used around the code you care about. Using a single function like this:
def factorial_iterative(x, flag):
if flag:
# first implementation
else:
# second implementation
is a bad idea, because you are not just timing the implementations, but also
the scaffolding that selects between them. Also, you increase the size of
the function, which increases the chance of cache misses and other
processor effects. So first step is to split the implementations into
separate functions. Here are my four implementations, the first two are
taken from your code:
def factorial_forloop1(n):
assert n >= 0
result = 1
for i in range(2, n + 1):
result *= i
return result
def factorial_forloop2(n):
# Version with a silly extra variable.
assert n >= 0
result = 1
j=2
for i in range(2, n + 1):
result *= j
j += 1
return result
from operator import mul
try:
reduce
except NameError:
from functools import reduce
def factorial_reduce(n):
assert n >= 0
return reduce(mul, range(2, n+1), 1)
def factorial_while(n):
assert n >= 0
result = 1
while n > 1:
result *= n
n -= 1
return result
A quick test to verify that they return the same results:
py> factorial_while(10)
3628800
py> factorial_reduce(10)
3628800
py> factorial_forloop1(10)
3628800
py> factorial_forloop2(10)
3628800
There's no point in optimising code that does the wrong thing!
Now, let's do some timing tests. It is best to use a well-tested timing
framework rather than invent your own dodgy one, so I use the timeit
module. Read the comments in the module to see why rolling your own is a
bad idea.
I'll start with some relative small input sizes. Remember, the inputs may be
small, but the outputs will be very large.
from timeit import Timer
code = "fact(10); fact(20); fact(30)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
I'm in a bit of a hurry, so I may be a bit more slap-dash than normal.
Normally, I would pick a larger number of trials, and a larger number of
iterations per trial, but here I'm going to use just best of three trials,
each of 10000 iterations each:
for t in (t1, t2, t3, t4):
print(min(t.repeat(repeat=3, number=10000)))
which prints:
0.22797810286283493 # while
0.17145151272416115 # reduce
0.16230526939034462 # for-loop
0.22819281555712223 # silly for-loop
(Comments added by me.) See my earlier post for why the minimum is the only
statistic worth looking at. These results show that:
- the version using while is significantly slower than the more
straightforward iterative versions using either reduce or a for loop.
- adding an extra, unnecessary, variable to the for-loop, and incrementing
it by hand, carries as much overhead as using a while loop;
- reduce is slightly slower than a pure Python for-loop (at least according
to this simple trial, on my computer -- your results may vary);
- the obvious winner is the straightforward iterative version with a
for-loop.
Now I'm going to test it with a larger input:
py> big = factorial_forloop1(50000)
py> sys.getsizeof(big) # number of bytes
94460
py> len(str(big)) # number of digits
213237
Recreate the timer objects, and (again, because I'm in something of a hurry)
do a best-of-three with just 2 iterations per trial.
code = "fact(50000)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
for t in (t1, t2, t3, t4):
print(min(t.repeat(repeat=3, number=2)))
which takes about two minutes on my computer, and prints:
8.604736926034093 # while loop
10.786483339965343 # reduce
10.91099695302546 # for loop
10.821452282369137 # silly version of the for loop
(Again, annotations are by me.)
These results are fascinating, and rather surprising. I think that they
demonstrate that, at this size of input argument, the time is dominated by
processing the BigNum int objects, not the iteration: whether we use
reduce, a straight-forward for-loop, or a for-loop with an extra variable
makes very little difference, of the order of 1%.
(I wouldn't read too much into the fact that the for-loop which does *more*
work, but manually adjusting a second variable, is slightly faster. Unless
that can be proven to be consistently the case, I expect that is just
random noise. I ran some quick trials, and did not replicate that result:
py> for t in (t3, t4, t3, t4):
... print(min(t.repeat(repeat=3, number=1)))
...
5.282072028145194 # for-loop
5.415546240285039 # silly for-loop
5.358346642926335 # for-loop
5.328046130016446 # silly for-loop
Note that, as expected, doing two iterations per trial takes about twice as
long as one iteration per trial: 5 seconds versus 10.
What is surprising is that for very large input like this, the while loop is
significantly faster than reduce or either of the for-loops. I cannot
explain that result.
(Just goes to show that timing code in Python can surprise you when you
least expect it.)
Oh, and for the record, I'm using Python 3.3 on Linux:
py> sys.version
'3.3.0rc3 (default, Sep 27 2012, 18:44:58) \n[GCC 4.1.2 20080704 (Red Hat
4.1.2-52)]'
Results may vary on other platforms and versions.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-05-06 16:26 +1000 |
| Message-ID | <5549b41f$0$12927$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #90025 |
On Wednesday 06 May 2015 14:05, Steven D'Aprano wrote:
[...]
Here are those anomalous timing results again:
> code = "fact(50000)"
> t1 = Timer(code, setup="from __main__ import factorial_while as fact")
> t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
> t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
> t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
> for t in (t1, t2, t3, t4):
> print(min(t.repeat(repeat=3, number=2)))
>
>
> which takes about two minutes on my computer, and prints:
>
> 8.604736926034093 # while loop
> 10.786483339965343 # reduce
> 10.91099695302546 # for loop
> 10.821452282369137 # silly version of the for loop
[...]
> What is surprising is that for very large input like this, the while loop
> is significantly faster than reduce or either of the for-loops. I cannot
> explain that result.
I re-ran the results on a different computer, and got similar results:
7.364120149984956
9.26512472704053
9.141491871327162
9.16900822892785
The while loop is surprisingly faster for calculating fact(50000). A thought
came to mind: the while loop is different from the other three versions, in
that it performs the multiplications from n down to 2, rather than 2 up to
n. Maybe that has something to do with it?
Introducing version five of the factorial:
def factorial_reduce2(n):
assert n >= 0
return reduce(mul, range(n, 1, -1), 1)
t5 = Timer(code, setup="from __main__ import factorial_reduce2 as fact")
for t in (t1, t2, t3, t4, t5):
print(min(t.repeat(repeat=3, number=2)))
And results:
7.36792928725481 # while loop
9.271950023248792 # reduce, counting up
9.14769447594881 # for loop
9.154150342568755 # silly for loop
7.430811045691371 # reduce, counting down
My interpretation of this is that the difference has something to do with
the cost of multiplications. Multiplying upwards seems to be more expensive
than multiplying downwards, a result I never would have predicted, but
that's what I'm seeing. I can only guess that it has something to do with
the way multiplication is implemented, or perhaps the memory management
involved, or something. Who the hell knows?
Just to be sure:
def factorial_forloop3(n):
assert n >= 0
result = 1
for i in range(n, 1, -1):
result *= i
return result
t6 = Timer(code, setup="from __main__ import factorial_forloop3 as fact")
print(min(t6.repeat(repeat=3, number=2)))
which prints 7.36256698705256.
Lesson to be learned: what you think is important may not be, and what you
think is irrelevant may be.
Historical fact: for a period of about 60 years, long after the trick to
curing and preventing scurvy was known, the British navy suffered greatly
from scurvy again (although not to the same degree as they had been in the
17th and 18th centuries). The problem was due to confusion between *lemons*
and *limes*. The navy started by using the term "lime" for both lemons and
sweet limes from the Mediterranean, as well as West Indian limes: three
different citrus fruit. To cut costs and encourage British commerce, the
navy gradually stopping buying lemons from Spain and swapped over almost
entirely to West Indian limes. Unfortunately limes, and especially West
Indian limes, have significantly less Vitamin C than lemons, and the
sailors' daily allowance was about 75% less effective on ships that used
limes than for those that used Spanish lemons. Scurvy, which had practically
be eradicated in the 1850s and 60s, returned, and was a significant factor
in World War One. (Especially for the French and Russian armies.)
This simple confusion wasn't sorted out until the 1920s, long after Vitamin
C was isolated and named. Apparently nobody noticed, or cared, that the two
different fruit tasted different and looked different, or concluded that
perhaps they had different medicinal properties.
But then, for many decades, vinegar and dilute sulphuric acid were
recommended as cures for scurvy *solely* on the basis that they were acidic
just like proven cures oranges, lemons and sauerkraut.
--
Steven
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-05-06 09:12 -0400 |
| Message-ID | <mailman.175.1430919653.12865.python-list@python.org> |
| In reply to | #90030 |
On 05/06/2015 02:26 AM, Steven D'Aprano wrote: > On Wednesday 06 May 2015 14:05, Steven D'Aprano wrote: > My interpretation of this is that the difference has something to do with > the cost of multiplications. Multiplying upwards seems to be more expensive > than multiplying downwards, a result I never would have predicted, but > that's what I'm seeing. I can only guess that it has something to do with > the way multiplication is implemented, or perhaps the memory management > involved, or something. Who the hell knows? > I had guessed that the order of multiplication would make a big difference, once the product started getting bigger than the machine word size. Reason I thought that is that if you multiply starting at the top value (and end with multiplying by 2) you're spending more of the time multiplying big-ints. That's why I made sure that both Cecil's and my implementations were counting up, so that wouldn't be a distinction. I'm still puzzled, as it seems your results imply that big-int*int is faster than int*int where the product is also int. That could use some more testing, though. I still say a cutoff of about 10% is where we should draw the line in an interpretive system. Below that, you're frequently measuring noise and coincidence. Remember the days when you knew how many cycles each assembly instruction took, and could simply add them up to compare algorithms? -- DaveA
[toc] | [prev] | [next] | [standalone]
| From | Dan Sommers <dan@tombstonezero.net> |
|---|---|
| Date | 2015-05-07 03:16 +0000 |
| Message-ID | <mielf4$omr$1@dont-email.me> |
| In reply to | #90056 |
On Wed, 06 May 2015 09:12:05 -0400, Dave Angel wrote: > Remember the days when you knew how many cycles each assembly > instruction took, and could simply add them up to compare algorithms? I do! I do! :-) And then the MC68020 came out, and instruction execution overlapped in weird (but predictable) ways, and interrupts would have different (unpredictable) latency depending on how much internal state of the CPU had to be saved and restored. Man, am I *old*. Dan
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-06 23:55 +1000 |
| Message-ID | <mailman.176.1430920559.12865.python-list@python.org> |
| In reply to | #90030 |
On Wed, May 6, 2015 at 11:12 PM, Dave Angel <davea@davea.name> wrote: > I had guessed that the order of multiplication would make a big difference, > once the product started getting bigger than the machine word size. > > Reason I thought that is that if you multiply starting at the top value (and > end with multiplying by 2) you're spending more of the time multiplying > big-ints. > > That's why I made sure that both Cecil's and my implementations were > counting up, so that wouldn't be a distinction. > > I'm still puzzled, as it seems your results imply that big-int*int is faster > than int*int where the product is also int. Are you using Python 2 or Python 3 for your testing? In Py3, there's no type difference, and barely no performance difference as you cross the word-size boundary. (Bigger numbers are a bit slower to work with, but not hugely.) ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-05-06 10:22 -0400 |
| Message-ID | <mailman.177.1430922190.12865.python-list@python.org> |
| In reply to | #90030 |
On 05/06/2015 09:55 AM, Chris Angelico wrote: > On Wed, May 6, 2015 at 11:12 PM, Dave Angel <davea@davea.name> wrote: >> I had guessed that the order of multiplication would make a big difference, >> once the product started getting bigger than the machine word size. >> >> Reason I thought that is that if you multiply starting at the top value (and >> end with multiplying by 2) you're spending more of the time multiplying >> big-ints. >> >> That's why I made sure that both Cecil's and my implementations were >> counting up, so that wouldn't be a distinction. >> >> I'm still puzzled, as it seems your results imply that big-int*int is faster >> than int*int where the product is also int. > > Are you using Python 2 or Python 3 for your testing? In Py3, there's > no type difference, and barely no performance difference as you cross > the word-size boundary. (Bigger numbers are a bit slower to work with, > but not hugely.) > Both Cecil and I are using 3.x I'm using 3.4 in particular. And I know int covers both big-int and int32. that's why I called it big-int, rather than long. I was, however, mistaken. it's not that threshold that we're crossing here, but another one, for MUCH larger numbers. factorial of 100000 and of 200000 have 456473 and 97350 digits, respectively. In binary, that would be about 190k bytes and 404k bytes, respectively. I was seeing factorial of 200000 taking about 4.5 times as long as factorial of 100000. All the other increments seemed fairly proportional. I'll bet the difference is something like the memory allocator using a different algorithm for blocks above 256k. Or the cache logic hitting a threshold. If it's caching, of course the threshold will differ wildly between machine architectures. If it's the memory allocator, that could easily vary between Python versions as well. -- DaveA
[toc] | [prev] | [next] | [standalone]
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2015-05-06 08:12 -0700 |
| Message-ID | <8761853fkd.fsf@jester.gateway.sonic.net> |
| In reply to | #90030 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> Multiplying upwards seems to be more expensive than multiplying
> downwards... I can only guess that it has something to do with the way
> multiplication is implemented, or perhaps the memory management
> involved, or something. Who the hell knows?
It seems pretty natural if multiplication uses the obvious
quadratic-time pencil and paper algorithm. The cost of multiplying m*n
is basically w(m)*w(n) where w(x) is the width of x in machine words.
So for factorial where m is the counter and n is the running product,
w(m) is always 1 while w(n) is basically log2(n!). From
from math import log
def xfac(seq):
cost = logfac = 0.0
for i in seq:
logfac += log(i,2)
cost += logfac
return cost
def upward(n): return xfac(xrange(1,n+1))
def downward(n): return xfac(xrange(n,1,-1))
print upward(40000),downward(40000)
I get: 10499542692.6 11652843833.5
A lower number for upward than downward. The difference isn't as large
as your timings, but I think it still gives some explanation of the
effect.
[toc] | [prev] | [next] | [standalone]
| From | Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> |
|---|---|
| Date | 2015-05-06 17:36 +0200 |
| Message-ID | <87pp6doh09.fsf@universite-de-strasbourg.fr.invalid> |
| In reply to | #90059 |
Paul Rubin <no.email@nospam.invalid> writes: > Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes: >> Multiplying upwards seems to be more expensive than multiplying >> downwards... I can only guess that it has something to do with the way >> multiplication is implemented, or perhaps the memory management >> involved, or something. Who the hell knows? > > It seems pretty natural if multiplication uses the obvious > quadratic-time pencil and paper algorithm. The cost of multiplying m*n > is basically w(m)*w(n) where w(x) is the width of x in machine words. > So for factorial where m is the counter and n is the running product, > w(m) is always 1 while w(n) is basically log2(n!). From > > from math import log > def xfac(seq): > cost = logfac = 0.0 > for i in seq: > logfac += log(i,2) > cost += logfac > return cost > > def upward(n): return xfac(xrange(1,n+1)) > def downward(n): return xfac(xrange(n,1,-1)) > > print upward(40000),downward(40000) > > I get: 10499542692.6 11652843833.5 > > A lower number for upward than downward. The difference isn't as large > as your timings, but I think it still gives some explanation of the > effect. Yes, plus the time for memory allocation. Since the code uses "r *= ...", space is reallocated when the result doesn't fit. The new size is probably proportional to the current (insufficient) size. This means that overall, you'll need fewer reallocations, because allocations are made in bigger chunks. -- Alain.
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2015-05-06 16:23 -0400 |
| Message-ID | <mailman.189.1430943800.12865.python-list@python.org> |
| In reply to | #90060 |
On 05/06/2015 11:36 AM, Alain Ketterlin wrote: > Yes, plus the time for memory allocation. Since the code uses "r *= > ...", space is reallocated when the result doesn't fit. The new size is > probably proportional to the current (insufficient) size. This means > that overall, you'll need fewer reallocations, because allocations are > made in bigger chunks. That sounds plausible, but a=5; a*=4 does not update in place. It calculates and creates a new object. Updating lists can work as you say, but an int is immutable. It's an optimization that might be applied if the code generator were a lot smarter, (and if the ref count is exactly 1), but it would then be confusing to anyone who used id(). -- DaveA
[toc] | [prev] | [next] | [standalone]
| From | Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> |
|---|---|
| Date | 2015-05-07 11:14 +0200 |
| Message-ID | <87lhh0oikp.fsf@universite-de-strasbourg.fr.invalid> |
| In reply to | #90071 |
Dave Angel <davea@davea.name> writes: > On 05/06/2015 11:36 AM, Alain Ketterlin wrote: >> Yes, plus the time for memory allocation. Since the code uses "r *= >> ...", space is reallocated when the result doesn't fit. The new size is >> probably proportional to the current (insufficient) size. This means >> that overall, you'll need fewer reallocations, because allocations are >> made in bigger chunks. > > That sounds plausible, but a=5; a*=4 does not update in place. It > calculates and creates a new object. Updating lists can work as you > say, but an int is immutable. Ah, okay. Even for big ints? If that is the case, my suggestion doesn't explain anything. Anyway, with so many allocations for so little arithmetic, the difference is probably due to the behavior of the allocator (which maybe always finds blocks big enough, since one was released after the previous multiplication, or something like that). The only way to know would be to profile the VM. > It's an optimization that might be applied if the code generator were > a lot smarter, (and if the ref count is exactly 1), but it would then > be confusing to anyone who used id(). "Abandon all hope, ye [optimizer] who enter here." Thanks for the clarification. -- Alain.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-05-07 19:38 +1000 |
| Message-ID | <mailman.202.1430991497.12865.python-list@python.org> |
| In reply to | #90091 |
On Thu, May 7, 2015 at 7:14 PM, Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> wrote: > Dave Angel <davea@davea.name> writes: > >> On 05/06/2015 11:36 AM, Alain Ketterlin wrote: >>> Yes, plus the time for memory allocation. Since the code uses "r *= >>> ...", space is reallocated when the result doesn't fit. The new size is >>> probably proportional to the current (insufficient) size. This means >>> that overall, you'll need fewer reallocations, because allocations are >>> made in bigger chunks. >> >> That sounds plausible, but a=5; a*=4 does not update in place. It >> calculates and creates a new object. Updating lists can work as you >> say, but an int is immutable. > > Ah, okay. Even for big ints? If that is the case, my suggestion doesn't > explain anything. Anyway, with so many allocations for so little > arithmetic, the difference is probably due to the behavior of the > allocator (which maybe always finds blocks big enough, since one was > released after the previous multiplication, or something like that). The > only way to know would be to profile the VM. Yes, all integers are immutable. This is true regardless of the size of the integer, because: x = some_big_long_calculation() y = x y += 1 should never change the value of x. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-05-06 09:47 -0600 |
| Message-ID | <mailman.179.1430927606.12865.python-list@python.org> |
| In reply to | #90059 |
On Wed, May 6, 2015 at 9:12 AM, Paul Rubin <no.email@nospam.invalid> wrote: > Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes: >> Multiplying upwards seems to be more expensive than multiplying >> downwards... I can only guess that it has something to do with the way >> multiplication is implemented, or perhaps the memory management >> involved, or something. Who the hell knows? > > It seems pretty natural if multiplication uses the obvious > quadratic-time pencil and paper algorithm. The cost of multiplying m*n > is basically w(m)*w(n) where w(x) is the width of x in machine words. > So for factorial where m is the counter and n is the running product, > w(m) is always 1 while w(n) is basically log2(n!). From > > from math import log > def xfac(seq): > cost = logfac = 0.0 > for i in seq: > logfac += log(i,2) > cost += logfac > return cost > > def upward(n): return xfac(xrange(1,n+1)) > def downward(n): return xfac(xrange(n,1,-1)) > > print upward(40000),downward(40000) > > I get: 10499542692.6 11652843833.5 > > A lower number for upward than downward. The difference isn't as large > as your timings, but I think it still gives some explanation of the > effect. That was my initial thought as well, but the problem is that this actually predicts the *opposite* of what is being reported: upward should be less expensive, not more.
[toc] | [prev] | [next] | [standalone]
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2015-05-06 10:24 -0700 |
| Message-ID | <871tit39gk.fsf@jester.gateway.sonic.net> |
| In reply to | #90061 |
Ian Kelly <ian.g.kelly@gmail.com> writes: > That was my initial thought as well, but the problem is that this > actually predicts the *opposite* of what is being reported: upward > should be less expensive, not more. Wait, what? Hmm, you're right. Needed coffee, will think about it more later.
[toc] | [prev] | [standalone]
Page 2 of 2 — ← Prev page 1 [2]
Back to top | Article view | comp.lang.python
csiph-web