Groups > comp.lang.python > #89770 > unrolled thread

Throw the cat among the pigeons

Started by	Cecil Westerhof <Cecil@decebal.nl>
First post	2015-05-02 16:20 +0200
Last post	2015-05-06 10:24 -0700
Articles	15 on this page of 35 — 11 participants

Back to article view | Back to comp.lang.python

  Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-02 16:20 +0200
    Re: Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-03 17:12 +0200
      Re: Throw the cat among the pigeons Paul  Moore <p.f.moore@gmail.com> - 2015-05-05 08:47 -0700
        Re: Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-05 18:18 +0200
          Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-05 14:45 -0400
            Re: Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-05 21:42 +0200
              Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 12:57 +1000
                Re: Throw the cat among the pigeons Chris Angelico <rosuav@gmail.com> - 2015-05-06 14:03 +1000
          Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-05 14:30 -0600
          Re: Throw the cat among the pigeons Terry Reedy <tjreedy@udel.edu> - 2015-05-05 16:46 -0400
            Re: Throw the cat among the pigeons Cecil Westerhof <Cecil@decebal.nl> - 2015-05-05 23:12 +0200
              Re: Throw the cat among the pigeons Terry Reedy <tjreedy@udel.edu> - 2015-05-05 19:22 -0400
          Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-05 17:00 -0400
          Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-05 15:23 -0600
            Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 11:27 +1000
              Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-05 23:58 -0600
                Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 17:08 +1000
                  Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-06 10:17 -0600
                  Re: Throw the cat among the pigeons Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-05-06 21:23 +0100
          Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-05 15:39 -0600
          Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-05 17:55 -0400
          Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 12:19 +1000
    Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 14:05 +1000
      Re: Throw the cat among the pigeons Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-05-06 16:26 +1000
        Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-06 09:12 -0400
          Re: Throw the cat among the pigeons Dan Sommers <dan@tombstonezero.net> - 2015-05-07 03:16 +0000
        Re: Throw the cat among the pigeons Chris Angelico <rosuav@gmail.com> - 2015-05-06 23:55 +1000
        Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-06 10:22 -0400
        Re: Throw the cat among the pigeons Paul Rubin <no.email@nospam.invalid> - 2015-05-06 08:12 -0700
          Re: Throw the cat among the pigeons Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> - 2015-05-06 17:36 +0200
            Re: Throw the cat among the pigeons Dave Angel <davea@davea.name> - 2015-05-06 16:23 -0400
              Re: Throw the cat among the pigeons Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> - 2015-05-07 11:14 +0200
                Re: Throw the cat among the pigeons Chris Angelico <rosuav@gmail.com> - 2015-05-07 19:38 +1000
          Re: Throw the cat among the pigeons Ian Kelly <ian.g.kelly@gmail.com> - 2015-05-06 09:47 -0600
            Re: Throw the cat among the pigeons Paul Rubin <no.email@nospam.invalid> - 2015-05-06 10:24 -0700

Page 2 of 2 — ← Prev page 1 [2]

#90005

From	Dave Angel <davea@davea.name>
Date	2015-05-05 17:55 -0400
Message-ID	<mailman.144.1430862924.12865.python-list@python.org>
In reply to	#89973

On 05/05/2015 05:39 PM, Ian Kelly wrote:
> On Tue, May 5, 2015 at 3:23 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
>> On Tue, May 5, 2015 at 3:00 PM, Dave Angel <davea@davea.name> wrote:
>>> def loop(func, funcname, arg):
>>>      start = time.time()
>>>      for i in range(repeats):
>>>          func(arg, True)
>>>      print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))
>>>
>>>      start = time.time()
>>>      for i in range(repeats):
>>>          func(arg)
>>>      print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))
>>
>> Note that you're explicitly passing True in one case but leaving the
>> default in the other. I don't know whether that might be responsible
>> for the difference you're seeing.
>
> I don't think that's the cause, but I do think that it has something
> to do with the way the timing is being run. When I run your loop
> function, I do observe the difference. If I reverse the order so that
> the False case is tested first, I observe the opposite. That is, the
> slower case is consistently the one that is timed *first* in the loop
> function, regardless of which case that is.
>

I created two functions and called them with Timeit(), and the 
difference is now below 3%

And when I take your lead and double the loop() function so it runs each 
test twice, I get steadily decreasing numbers.

I think all of this has been noise caused by the caching of objects 
including function objects.  I was surprised by this, as the loops are 
small enough I'd figure the function object would be fully cached the 
first time it was called.

-- 
DaveA

[toc] | [prev] | [next] | [standalone]

#90021

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-05-06 12:19 +1000
Message-ID	<55497a28$0$12995$c3e8da3$5496439d@news.astraweb.com>
In reply to	#89973

On Wed, 6 May 2015 02:18 am, Cecil Westerhof wrote:

> Well, I did not write many tail recursive functions. But what surprised
> me was that for large values the ‘tail recursive’ version was more
> efficient as the iterative version. And that was with myself
> implementing the tail recursion. I expect the code to be more
> efficient when the compiler implements the tail recursion.

You cannot know that. Python is not C or Java, and your intuition as to what
will be fast and what will be slow will not be accurate in Python. Python
has its own fast paths, and slow paths, and they are not the same as C's or
Java's.

I have been using Python for over 15 years, and I would not want to guess
whether a hypothetical compiler-generated tail-recursion-elimination
function would be faster or slower than manually unrolling the recursion
into a while loop.

I am surprised that you find a manually unrolled version with a while loop
is faster than an iterative version. I assume the iterative version uses a
for-loop. In my experience, a for-loop is faster than a while loop.

But perhaps that depends on the exact version and platform.

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#90025

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-05-06 14:05 +1000
Message-ID	<55499304$0$12978$c3e8da3$5496439d@news.astraweb.com>
In reply to	#89770

On Sun, 3 May 2015 12:20 am, Cecil Westerhof wrote:

[...]
> But to my surprise tail recursion could even be more efficient. I
> wrote two different versions of factorial with self implemented tail
> recursion. For bigger values both are more efficient. And I expect
> that if the tail recursion is done by the compiler instead of by hand,
> it will be a little faster still.

I decided to do some experimentation.

Firstly, when timing code, one should minimise the amount of "scaffolding"
used around the code you care about. Using a single function like this:

def factorial_iterative(x, flag):
    if flag:
        # first implementation
    else:
        # second implementation

is a bad idea, because you are not just timing the implementations, but also
the scaffolding that selects between them. Also, you increase the size of
the function, which increases the chance of cache misses and other
processor effects. So first step is to split the implementations into
separate functions. Here are my four implementations, the first two are
taken from your code:

def factorial_forloop1(n):
    assert n >= 0
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def factorial_forloop2(n):
    # Version with a silly extra variable.
    assert n >= 0
    result = 1
    j=2
    for i in range(2, n + 1):
        result *= j
        j += 1
    return result

from operator import mul
try:
    reduce
except NameError:
    from functools import reduce

def factorial_reduce(n):
    assert n >= 0
    return reduce(mul, range(2, n+1), 1)

def factorial_while(n):
    assert n >= 0
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result

A quick test to verify that they return the same results:

py> factorial_while(10)
3628800
py> factorial_reduce(10)
3628800
py> factorial_forloop1(10)
3628800
py> factorial_forloop2(10)
3628800

There's no point in optimising code that does the wrong thing!

Now, let's do some timing tests. It is best to use a well-tested timing
framework rather than invent your own dodgy one, so I use the timeit
module. Read the comments in the module to see why rolling your own is a
bad idea.

I'll start with some relative small input sizes. Remember, the inputs may be
small, but the outputs will be very large.

from timeit import Timer
code = "fact(10); fact(20); fact(30)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")

I'm in a bit of a hurry, so I may be a bit more slap-dash than normal.
Normally, I would pick a larger number of trials, and a larger number of
iterations per trial, but here I'm going to use just best of three trials,
each of 10000 iterations each:

for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=10000)))

which prints:

0.22797810286283493  # while
0.17145151272416115  # reduce
0.16230526939034462  # for-loop
0.22819281555712223  # silly for-loop

(Comments added by me.) See my earlier post for why the minimum is the only
statistic worth looking at. These results show that:

- the version using while is significantly slower than the more
straightforward iterative versions using either reduce or a for loop.

- adding an extra, unnecessary, variable to the for-loop, and incrementing
it by hand, carries as much overhead as using a while loop;

- reduce is slightly slower than a pure Python for-loop (at least according
to this simple trial, on my computer -- your results may vary);

- the obvious winner is the straightforward iterative version with a
for-loop.

Now I'm going to test it with a larger input:

py> big = factorial_forloop1(50000)
py> sys.getsizeof(big)  # number of bytes
94460
py> len(str(big))  # number of digits
213237

Recreate the timer objects, and (again, because I'm in something of a hurry)
do a best-of-three with just 2 iterations per trial.

code = "fact(50000)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=2)))

which takes about two minutes on my computer, and prints:

8.604736926034093  # while loop
10.786483339965343  # reduce
10.91099695302546  # for loop
10.821452282369137  # silly version of the for loop

(Again, annotations are by me.)

These results are fascinating, and rather surprising. I think that they
demonstrate that, at this size of input argument, the time is dominated by
processing the BigNum int objects, not the iteration: whether we use
reduce, a straight-forward for-loop, or a for-loop with an extra variable
makes very little difference, of the order of 1%.

(I wouldn't read too much into the fact that the for-loop which does *more*
work, but manually adjusting a second variable, is slightly faster. Unless
that can be proven to be consistently the case, I expect that is just
random noise. I ran some quick trials, and did not replicate that result:

py> for t in (t3, t4, t3, t4):
...     print(min(t.repeat(repeat=3, number=1)))
...
5.282072028145194  # for-loop
5.415546240285039  # silly for-loop
5.358346642926335  # for-loop
5.328046130016446  # silly for-loop

Note that, as expected, doing two iterations per trial takes about twice as
long as one iteration per trial: 5 seconds versus 10.

What is surprising is that for very large input like this, the while loop is
significantly faster than reduce or either of the for-loops. I cannot
explain that result.

(Just goes to show that timing code in Python can surprise you when you
least expect it.)

Oh, and for the record, I'm using Python 3.3 on Linux:

py> sys.version
'3.3.0rc3 (default, Sep 27 2012, 18:44:58) \n[GCC 4.1.2 20080704 (Red Hat
4.1.2-52)]'

Results may vary on other platforms and versions.

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#90030

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2015-05-06 16:26 +1000
Message-ID	<5549b41f$0$12927$c3e8da3$5496439d@news.astraweb.com>
In reply to	#90025

On Wednesday 06 May 2015 14:05, Steven D'Aprano wrote:

[...]

Here are those anomalous timing results again:

> code = "fact(50000)"
> t1 = Timer(code, setup="from __main__ import factorial_while as fact")
> t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
> t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
> t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
> for t in (t1, t2, t3, t4):
>     print(min(t.repeat(repeat=3, number=2)))
> 
> 
> which takes about two minutes on my computer, and prints:
> 
> 8.604736926034093  # while loop
> 10.786483339965343  # reduce
> 10.91099695302546  # for loop
> 10.821452282369137  # silly version of the for loop
[...]
> What is surprising is that for very large input like this, the while loop
> is significantly faster than reduce or either of the for-loops. I cannot
> explain that result.

I re-ran the results on a different computer, and got similar results:

7.364120149984956
9.26512472704053
9.141491871327162
9.16900822892785

The while loop is surprisingly faster for calculating fact(50000). A thought 
came to mind: the while loop is different from the other three versions, in 
that it performs the multiplications from n down to 2, rather than 2 up to 
n. Maybe that has something to do with it?

Introducing version five of the factorial:

def factorial_reduce2(n):
    assert n >= 0
    return reduce(mul, range(n, 1, -1), 1)

t5 = Timer(code, setup="from __main__ import factorial_reduce2 as fact")
for t in (t1, t2, t3, t4, t5):
    print(min(t.repeat(repeat=3, number=2)))

And results:

7.36792928725481  # while loop
9.271950023248792  # reduce, counting up
9.14769447594881  # for loop
9.154150342568755  # silly for loop
7.430811045691371  # reduce, counting down

My interpretation of this is that the difference has something to do with 
the cost of multiplications. Multiplying upwards seems to be more expensive 
than multiplying downwards, a result I never would have predicted, but 
that's what I'm seeing. I can only guess that it has something to do with 
the way multiplication is implemented, or perhaps the memory management 
involved, or something. Who the hell knows?

Just to be sure:

def factorial_forloop3(n):
    assert n >= 0
    result = 1
    for i in range(n, 1, -1):
        result *= i
    return result

t6 = Timer(code, setup="from __main__ import factorial_forloop3 as fact")
print(min(t6.repeat(repeat=3, number=2)))

which prints 7.36256698705256.

Lesson to be learned: what you think is important may not be, and what you 
think is irrelevant may be.

Historical fact: for a period of about 60 years, long after the trick to 
curing and preventing scurvy was known, the British navy suffered greatly 
from scurvy again (although not to the same degree as they had been in the 
17th and 18th centuries). The problem was due to confusion between *lemons* 
and *limes*. The navy started by using the term "lime" for both lemons and 
sweet limes from the Mediterranean, as well as West Indian limes: three 
different citrus fruit. To cut costs and encourage British commerce, the 
navy gradually stopping buying lemons from Spain and swapped over almost 
entirely to West Indian limes. Unfortunately limes, and especially West 
Indian limes, have significantly less Vitamin C than lemons, and the 
sailors' daily allowance was about 75% less effective on ships that used 
limes than for those that used Spanish lemons. Scurvy, which had practically 
be eradicated in the 1850s and 60s, returned, and was a significant factor 
in World War One. (Especially for the French and Russian armies.)

This simple confusion wasn't sorted out until the 1920s, long after Vitamin 
C was isolated and named. Apparently nobody noticed, or cared, that the two 
different fruit tasted different and looked different, or concluded that 
perhaps they had different medicinal properties.

But then, for many decades, vinegar and dilute sulphuric acid were 
recommended as cures for scurvy *solely* on the basis that they were acidic 
just like proven cures oranges, lemons and sauerkraut.

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#90056

From	Dave Angel <davea@davea.name>
Date	2015-05-06 09:12 -0400
Message-ID	<mailman.175.1430919653.12865.python-list@python.org>
In reply to	#90030

On 05/06/2015 02:26 AM, Steven D'Aprano wrote:
> On Wednesday 06 May 2015 14:05, Steven D'Aprano wrote:

> My interpretation of this is that the difference has something to do with
> the cost of multiplications. Multiplying upwards seems to be more expensive
> than multiplying downwards, a result I never would have predicted, but
> that's what I'm seeing. I can only guess that it has something to do with
> the way multiplication is implemented, or perhaps the memory management
> involved, or something. Who the hell knows?
>

I had guessed that the order of multiplication would make a big 
difference, once the product started getting bigger than the machine 
word size.

Reason I thought that is that if you multiply starting at the top value 
(and end with multiplying by 2) you're spending more of the time 
multiplying big-ints.

That's why I made sure that both Cecil's and my implementations were 
counting up, so that wouldn't be a distinction.

I'm still puzzled, as it seems your results imply that big-int*int is 
faster than int*int where the product is also int.

That could use some more testing, though.

I still say a cutoff of about 10% is where we should draw the line in an 
interpretive system.  Below that, you're frequently measuring noise and 
coincidence.

Remember the days when you knew how many cycles each assembly 
instruction took, and could simply add them up to compare algorithms?

-- 
DaveA

[toc] | [prev] | [next] | [standalone]

#90086

From	Dan Sommers <dan@tombstonezero.net>
Date	2015-05-07 03:16 +0000
Message-ID	<mielf4$omr$1@dont-email.me>
In reply to	#90056

On Wed, 06 May 2015 09:12:05 -0400, Dave Angel wrote:

> Remember the days when you knew how many cycles each assembly
> instruction took, and could simply add them up to compare algorithms?

I do!  I do!  :-)

And then the MC68020 came out, and instruction execution overlapped in
weird (but predictable) ways, and interrupts would have different
(unpredictable) latency depending on how much internal state of the CPU
had to be saved and restored.

Man, am I *old*.

Dan

[toc] | [prev] | [next] | [standalone]

#90057

From	Chris Angelico <rosuav@gmail.com>
Date	2015-05-06 23:55 +1000
Message-ID	<mailman.176.1430920559.12865.python-list@python.org>
In reply to	#90030

On Wed, May 6, 2015 at 11:12 PM, Dave Angel <davea@davea.name> wrote:
> I had guessed that the order of multiplication would make a big difference,
> once the product started getting bigger than the machine word size.
>
> Reason I thought that is that if you multiply starting at the top value (and
> end with multiplying by 2) you're spending more of the time multiplying
> big-ints.
>
> That's why I made sure that both Cecil's and my implementations were
> counting up, so that wouldn't be a distinction.
>
> I'm still puzzled, as it seems your results imply that big-int*int is faster
> than int*int where the product is also int.

Are you using Python 2 or Python 3 for your testing? In Py3, there's
no type difference, and barely no performance difference as you cross
the word-size boundary. (Bigger numbers are a bit slower to work with,
but not hugely.)

ChrisA

[toc] | [prev] | [next] | [standalone]

#90058

From	Dave Angel <davea@davea.name>
Date	2015-05-06 10:22 -0400
Message-ID	<mailman.177.1430922190.12865.python-list@python.org>
In reply to	#90030

On 05/06/2015 09:55 AM, Chris Angelico wrote:
> On Wed, May 6, 2015 at 11:12 PM, Dave Angel <davea@davea.name> wrote:
>> I had guessed that the order of multiplication would make a big difference,
>> once the product started getting bigger than the machine word size.
>>
>> Reason I thought that is that if you multiply starting at the top value (and
>> end with multiplying by 2) you're spending more of the time multiplying
>> big-ints.
>>
>> That's why I made sure that both Cecil's and my implementations were
>> counting up, so that wouldn't be a distinction.
>>
>> I'm still puzzled, as it seems your results imply that big-int*int is faster
>> than int*int where the product is also int.
>
> Are you using Python 2 or Python 3 for your testing? In Py3, there's
> no type difference, and barely no performance difference as you cross
> the word-size boundary. (Bigger numbers are a bit slower to work with,
> but not hugely.)
>

Both Cecil and I are using 3.x  I'm using 3.4 in particular.  And I know 
int covers both big-int and int32.  that's why I called it big-int, 
rather than long.

I was, however, mistaken.  it's not that threshold that we're crossing 
here, but another one, for MUCH larger numbers.  factorial of 100000 and 
of 200000 have 456473 and 97350 digits, respectively.  In binary, that 
would be about 190k bytes and 404k bytes, respectively.

I was seeing factorial of 200000 taking about 4.5 times as long as 
factorial of 100000.  All the other increments seemed fairly proportional.

I'll bet the difference is something like the memory allocator using a 
different algorithm for blocks above 256k.  Or the cache logic hitting a 
threshold.

If it's caching, of course the threshold will differ wildly between 
machine architectures.

If it's the memory allocator, that could easily vary between Python 
versions as well.

-- 
DaveA

[toc] | [prev] | [next] | [standalone]

#90059

From	Paul Rubin <no.email@nospam.invalid>
Date	2015-05-06 08:12 -0700
Message-ID	<8761853fkd.fsf@jester.gateway.sonic.net>
In reply to	#90030

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> Multiplying upwards seems to be more expensive than multiplying
> downwards... I can only guess that it has something to do with the way
> multiplication is implemented, or perhaps the memory management
> involved, or something. Who the hell knows?

It seems pretty natural if multiplication uses the obvious
quadratic-time pencil and paper algorithm.  The cost of multiplying m*n
is basically w(m)*w(n) where w(x) is the width of x in machine words.
So for factorial where m is the counter and n is the running product,
w(m) is always 1 while w(n) is basically log2(n!).  From

    from math import log
    def xfac(seq):
        cost = logfac = 0.0
        for i in seq:
            logfac += log(i,2)
            cost += logfac
        return cost

    def upward(n): return xfac(xrange(1,n+1))
    def downward(n): return xfac(xrange(n,1,-1))

    print upward(40000),downward(40000)

I get: 10499542692.6 11652843833.5

A lower number for upward than downward.  The difference isn't as large
as your timings, but I think it still gives some explanation of the
effect.

[toc] | [prev] | [next] | [standalone]

#90060

From	Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid>
Date	2015-05-06 17:36 +0200
Message-ID	<87pp6doh09.fsf@universite-de-strasbourg.fr.invalid>
In reply to	#90059

Paul Rubin <no.email@nospam.invalid> writes:

> Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
>> Multiplying upwards seems to be more expensive than multiplying
>> downwards... I can only guess that it has something to do with the way
>> multiplication is implemented, or perhaps the memory management
>> involved, or something. Who the hell knows?
>
> It seems pretty natural if multiplication uses the obvious
> quadratic-time pencil and paper algorithm.  The cost of multiplying m*n
> is basically w(m)*w(n) where w(x) is the width of x in machine words.
> So for factorial where m is the counter and n is the running product,
> w(m) is always 1 while w(n) is basically log2(n!).  From
>
>     from math import log
>     def xfac(seq):
>         cost = logfac = 0.0
>         for i in seq:
>             logfac += log(i,2)
>             cost += logfac
>         return cost
>
>     def upward(n): return xfac(xrange(1,n+1))
>     def downward(n): return xfac(xrange(n,1,-1))
>
>     print upward(40000),downward(40000)
>
> I get: 10499542692.6 11652843833.5
>
> A lower number for upward than downward.  The difference isn't as large
> as your timings, but I think it still gives some explanation of the
> effect.

Yes, plus the time for memory allocation. Since the code uses "r *=
...", space is reallocated when the result doesn't fit. The new size is
probably proportional to the current (insufficient) size. This means
that overall, you'll need fewer reallocations, because allocations are
made in bigger chunks.

-- Alain.

[toc] | [prev] | [next] | [standalone]

#90071

From	Dave Angel <davea@davea.name>
Date	2015-05-06 16:23 -0400
Message-ID	<mailman.189.1430943800.12865.python-list@python.org>
In reply to	#90060

On 05/06/2015 11:36 AM, Alain Ketterlin wrote:
> Yes, plus the time for memory allocation. Since the code uses "r *=
> ...", space is reallocated when the result doesn't fit. The new size is
> probably proportional to the current (insufficient) size. This means
> that overall, you'll need fewer reallocations, because allocations are
> made in bigger chunks.

That sounds plausible, but  a=5; a*=4  does not update in place. It 
calculates and creates a new object.  Updating lists can work as you 
say, but an int is immutable.

It's an optimization that might be applied if the code generator were a 
lot smarter, (and if the ref count is exactly 1), but it would then be 
confusing to anyone who used id().

-- 
DaveA

[toc] | [prev] | [next] | [standalone]

#90091

From	Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid>
Date	2015-05-07 11:14 +0200
Message-ID	<87lhh0oikp.fsf@universite-de-strasbourg.fr.invalid>
In reply to	#90071

Dave Angel <davea@davea.name> writes:

> On 05/06/2015 11:36 AM, Alain Ketterlin wrote:
>> Yes, plus the time for memory allocation. Since the code uses "r *=
>> ...", space is reallocated when the result doesn't fit. The new size is
>> probably proportional to the current (insufficient) size. This means
>> that overall, you'll need fewer reallocations, because allocations are
>> made in bigger chunks.
>
> That sounds plausible, but  a=5; a*=4  does not update in place. It
> calculates and creates a new object.  Updating lists can work as you
> say, but an int is immutable.

Ah, okay. Even for big ints? If that is the case, my suggestion doesn't
explain anything. Anyway, with so many allocations for so little
arithmetic, the difference is probably due to the behavior of the
allocator (which maybe always finds blocks big enough, since one was
released after the previous multiplication, or something like that). The
only way to know would be to profile the VM.

> It's an optimization that might be applied if the code generator were
> a lot smarter, (and if the ref count is exactly 1), but it would then
> be confusing to anyone who used id().

"Abandon all hope, ye [optimizer] who enter here."

Thanks for the clarification.

-- Alain.

[toc] | [prev] | [next] | [standalone]

#90093

From	Chris Angelico <rosuav@gmail.com>
Date	2015-05-07 19:38 +1000
Message-ID	<mailman.202.1430991497.12865.python-list@python.org>
In reply to	#90091

On Thu, May 7, 2015 at 7:14 PM, Alain Ketterlin
<alain@universite-de-strasbourg.fr.invalid> wrote:
> Dave Angel <davea@davea.name> writes:
>
>> On 05/06/2015 11:36 AM, Alain Ketterlin wrote:
>>> Yes, plus the time for memory allocation. Since the code uses "r *=
>>> ...", space is reallocated when the result doesn't fit. The new size is
>>> probably proportional to the current (insufficient) size. This means
>>> that overall, you'll need fewer reallocations, because allocations are
>>> made in bigger chunks.
>>
>> That sounds plausible, but  a=5; a*=4  does not update in place. It
>> calculates and creates a new object.  Updating lists can work as you
>> say, but an int is immutable.
>
> Ah, okay. Even for big ints? If that is the case, my suggestion doesn't
> explain anything. Anyway, with so many allocations for so little
> arithmetic, the difference is probably due to the behavior of the
> allocator (which maybe always finds blocks big enough, since one was
> released after the previous multiplication, or something like that). The
> only way to know would be to profile the VM.

Yes, all integers are immutable. This is true regardless of the size
of the integer, because:

x = some_big_long_calculation()
y = x
y += 1

should never change the value of x.

ChrisA

[toc] | [prev] | [next] | [standalone]

#90061

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2015-05-06 09:47 -0600
Message-ID	<mailman.179.1430927606.12865.python-list@python.org>
In reply to	#90059

On Wed, May 6, 2015 at 9:12 AM, Paul Rubin <no.email@nospam.invalid> wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
>> Multiplying upwards seems to be more expensive than multiplying
>> downwards... I can only guess that it has something to do with the way
>> multiplication is implemented, or perhaps the memory management
>> involved, or something. Who the hell knows?
>
> It seems pretty natural if multiplication uses the obvious
> quadratic-time pencil and paper algorithm.  The cost of multiplying m*n
> is basically w(m)*w(n) where w(x) is the width of x in machine words.
> So for factorial where m is the counter and n is the running product,
> w(m) is always 1 while w(n) is basically log2(n!).  From
>
>     from math import log
>     def xfac(seq):
>         cost = logfac = 0.0
>         for i in seq:
>             logfac += log(i,2)
>             cost += logfac
>         return cost
>
>     def upward(n): return xfac(xrange(1,n+1))
>     def downward(n): return xfac(xrange(n,1,-1))
>
>     print upward(40000),downward(40000)
>
> I get: 10499542692.6 11652843833.5
>
> A lower number for upward than downward.  The difference isn't as large
> as your timings, but I think it still gives some explanation of the
> effect.

That was my initial thought as well, but the problem is that this
actually predicts the *opposite* of what is being reported: upward
should be less expensive, not more.

[toc] | [prev] | [next] | [standalone]

#90065

From	Paul Rubin <no.email@nospam.invalid>
Date	2015-05-06 10:24 -0700
Message-ID	<871tit39gk.fsf@jester.gateway.sonic.net>
In reply to	#90061

Ian Kelly <ian.g.kelly@gmail.com> writes:
> That was my initial thought as well, but the problem is that this
> actually predicts the *opposite* of what is being reported: upward
> should be less expensive, not more.

Wait, what?  Hmm, you're right.  Needed coffee, will think about it more
later.

[toc] | [prev] | [standalone]

Page 2 of 2 — ← Prev page 1 [2]

csiph-web

Throw the cat among the pigeons

Contents

#90005

#90021

#90025

#90030

#90056

#90086

#90057

#90058

#90059

#90060

#90071

#90091

#90093

#90061

#90065