Groups > comp.lang.python > #102324 > unrolled thread

Heap Implementation

Started by	"Sven R. Kunze" <srkunze@mail.de>
First post	2016-01-30 23:47 +0100
Last post	2016-02-02 14:53 +0000
Articles	5 — 3 participants

Back to article view | Back to comp.lang.python

  Heap Implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-01-30 23:47 +0100
    Re: Heap Implementation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-01-31 12:48 +1100
      Re: Heap Implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-01 20:32 +0100
        Re: Heap Implementation Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-02-02 16:38 +1100
          Re: Heap Implementation Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-02-02 14:53 +0000

#102324 — Heap Implementation

From	"Sven R. Kunze" <srkunze@mail.de>
Date	2016-01-30 23:47 +0100
Subject	Heap Implementation
Message-ID	<mailman.148.1454194055.2338.python-list@python.org>

Hi again,

as the topic of the old thread actually was fully discussed, I dare to 
open a new one.

I finally managed to finish my heap implementation. You can find it at 
https://pypi.python.org/pypi/xheap + https://github.com/srkunze/xheap.

I described my motivations and design decisions at 
http://srkunze.blogspot.com/2016/01/fast-object-oriented-heap-implementation.html 


@Cem
You've been worried about a C implementation. I can assure you that I 
did not intend to rewrite the incredibly fast and well-tested heapq 
implementation. I just re-used it.

I would really be grateful for your feedback as you have first-hand 
experience with heaps.

@srinivas
You might want to have a look at the removal implementation. Do you 
think it would be wiser/faster to switch for the sweeping approach?

I plan to publish some benchmarks to compare heapq and xheap.

@all
What's the best/standardized tool in Python to perform benchmarking? 
Right now, I use a self-made combo of unittest.TestCase and time.time + 
proper formatting.

Best,
Sven


PS: fixing some weird typos and added missing part.

[toc] | [next] | [standalone]

#102326

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2016-01-31 12:48 +1100
Message-ID	<56ad6800$0$14486$c3e8da3@news.astraweb.com>
In reply to	#102324

On Sunday 31 January 2016 09:47, Sven R. Kunze wrote:

> @all
> What's the best/standardized tool in Python to perform benchmarking?

timeit



-- 
Steve

[toc] | [prev] | [next] | [standalone]

#102392

From	"Sven R. Kunze" <srkunze@mail.de>
Date	2016-02-01 20:32 +0100
Message-ID	<mailman.6.1454355132.3032.python-list@python.org>
In reply to	#102326

On 31.01.2016 02:48, Steven D'Aprano wrote:
> On Sunday 31 January 2016 09:47, Sven R. Kunze wrote:
>
>> @all
>> What's the best/standardized tool in Python to perform benchmarking?
> timeit
Thanks, Steven.

Maybe, I am doing it wrong but I get some weird results:

 >>> min(timeit.Timer('for _ in range(10000): heappop(h)', 'from heapq 
import heappop; h=list(range(10000000))').repeat(10, 1)), 
min(timeit.Timer('for _ in range(10000): h.pop()', 'from xheap import 
Heap; h=Heap(range(10000000))').repeat(10, 1))
(0.017267618000005314, 0.01615345600021101)

 >>> min(timeit.Timer('for _ in range(100000): heappop(h)', 'from heapq 
import heappop; h=list(range(10000000))').repeat(10, 1)), 
min(timeit.Timer('for _ in range(100000): h.pop()', 'from xheap import 
Heap; h=Heap(range(10000000))').repeat(10, 1))
(0.12321608699949138, 0.13042051299999002)

 >>> min(timeit.Timer('for _ in range(10000): heappop(h)', 'from heapq 
import heappop; h=list(range(1000000))').repeat(10, 1)), 
min(timeit.Timer('for _ in range(10000): h.pop()', 'from xheap import 
Heap; h=Heap(range(1000000))').repeat(10, 1))
(0.010081621999233903, 0.008791901999757101)

 >>> min(timeit.Timer('for _ in range(1000000): heappop(h)', 'from heapq 
import heappop; h=list(range(10000000))').repeat(10, 1)), 
min(timeit.Timer('for _ in range(1000000): h.pop()', 'from xheap import 
Heap; h=Heap(range(10000000))').repeat(10, 1))
(0.6218949679996513, 0.7172151949998806)


How can it be that my wrapper is sometimes faster and sometimes slower 
than heapq? I wouldn't mind slower, but faster*?


Best,
Sven


* that behavior is reproducible also for other combos and other machines.

[toc] | [prev] | [next] | [standalone]

#102398

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2016-02-02 16:38 +1100
Message-ID	<56b040c5$0$1590$c3e8da3$5496439d@news.astraweb.com>
In reply to	#102392

On Tuesday 02 February 2016 06:32, Sven R. Kunze wrote:

> On 31.01.2016 02:48, Steven D'Aprano wrote:
>> On Sunday 31 January 2016 09:47, Sven R. Kunze wrote:
>>
>>> @all
>>> What's the best/standardized tool in Python to perform benchmarking?
>> timeit
> Thanks, Steven.
> 
> Maybe, I am doing it wrong but I get some weird results:

You need to understand that in any modern desktop, server or laptop computer 
(embedded devices may be different) there is *constantly* a large amount of 
processing going on in the background. Your OS is constantly responding to 
network events, managing memory, skipping from one process to another, 
scheduling tasks; your anti-virus and other background programs are working; 
your window environment is watching the mouse, etc. So each time you run a 
test, it comes down to pure dumb luck how many other programs are busy at 
the same time. (You can, sometimes, influence that by quitting all other 
applications, unplugging the network cable, and keeping your hands off the 
keyboard and mouse while the test is running. But who does that?)

All that adds noise to the measured times. It's not unusual for timings to 
differ by a factor of ten from one run to the next. The speed will also 
differ depending on processor cache misses, and many other factors that are 
effectively impossible to predict with any certainty. This makes timing 
measurements on modern systems an inherently noisy process.

In effect, each measurement you take is made up of two components:

* the actual time that the code would take if it had exclusive 
  access to the machine with no other programs running, call it t;

* and the noise added by the system, call it Δt.

It is impossible to measure t alone, you always get t+Δt. The aim in timing 
is to reduce the effect of the noise as much as possible.

The way to do this with timeit is:

- don't run extraneous code as part of your test (just measure what 
  you care about, with as little scaffolding around it);

- run that code snippet as many times as you can bear in a loop, 
  *but* let timeit manage the loop; don't try doing it by hand;

- only care about "best of N", where N is as large a number as you 
  can stand;

- averages, standard deviation, and other statistics are pretty 
  much useless here, the only statistic that you care about is
  the minimum.

Out of those four guidelines, you missed three of them:

(1) Your test code includes scaffolding, the "for _ in range..." loop. 
You're timing how expensive it is to build a range object and loop over it.

(2) You picked a tiny number for the number of loops: ten. Timeit defaults 
to one million, which is good for *really* fast and tiny snippets.

I normally reduce that to 10000, but run a larger number of trials. If the 
timing from each trial is too small, I increase the number of loops, and if 
it takes too long (I am impatient) I decrease it, but never below 100.

(3) You then use the repeat() method to calculate the "best of one", which 
is pretty much pointless. There is a timeit() method for "best of one", if 
that's ever useful.

I normally run 5 or 7 trials, and report only the best (smallest).

Here's your code:

>  >>> min(timeit.Timer('for _ in range(10000): heappop(h)', 'from heapq
> import heappop; h=list(range(10000000))').repeat(10, 1)),
> min(timeit.Timer('for _ in range(10000): h.pop()', 'from xheap import
> Heap; h=Heap(range(10000000))').repeat(10, 1))
> (0.017267618000005314, 0.01615345600021101)

Your code would be easier to read and understand if it were split over 
multiple lines. This is not Perl and there's no advantage to one-liners.

Pulling your code apart and re-writing it in a way which I feel is more 
understandable:

t1 = Timer(
    'for _ in range(10000): heappop(h)',  # code being tested
    'from heapq import heappop; h=list(range(10000000))'  # setup
    )

t2 = Timer(
    'for _ in range(10000): h.pop()',
    'from xheap import Heap; h=Heap(range(10000000))'
    )

min(t1.repeat(10, 1))
min(t2.repeat(10, 1))

Something like this will probably be less noisy:

setup = """
from heapq import heappop
from xheap import Heap
a = list(range(10000000))
h = Heap(a)
"""

t1 = Timer("heappop(a)", setup)
t2 = Timer("h.pop()", setup)

# print best of 7 trials, where each trial runs the code snippet 10000 times
print(min(t1.repeat(10000, 7)))
print(min(t2.repeat(10000, 7)))

The times printed will be in seconds per 10000 runs; divide it by 10 to get 
the time in *milliseconds* per run.

Note that this will *not* eliminate all noise or variation from one timing 
test to the other, but it will (I hope!) reduce it. If you're still seeing a 
lot of variation, try turning the garbage collector off and quitting as many 
other applications as possible.

If anyone else can suggest any other techniques for reducing noise, I'd love 
to hear about them.

Also note that times generated by timeit with one version of Python may not 
be measuring the same thing as those using a different version. I believe 
that Python 3.4 and better will do a better job of measuring only the time 
the code snippet process is active, rather than wall-time. But that's also 
operating system dependent.

> How can it be that my wrapper is sometimes faster and sometimes slower
> than heapq? I wouldn't mind slower, but faster*?

Maybe your code genuinely is "about as fast" as heapq, so the differences 
you're seeing are just due to noise in the system.

-- 
Steve

[toc] | [prev] | [next] | [standalone]

#102407

From	Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date	2016-02-02 14:53 +0000
Message-ID	<mailman.16.1454424804.3032.python-list@python.org>
In reply to	#102398

On 2 February 2016 at 05:38, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>
> In effect, each measurement you take is made up of two components:
>
> * the actual time that the code would take if it had exclusive
>   access to the machine with no other programs running, call it t;
>
> * and the noise added by the system, call it Δt.
>
> It is impossible to measure t alone, you always get t+Δt. The aim in timing
> is to reduce the effect of the noise as much as possible.

That's a reasonable model in some cases but not all. In particular the
"actual time" part is itself not a constant but rather depends on all
sorts of things like RAM caching etc. On the CPython side there could
be all kinds of optimisations that would kick in at different times to
change the "actual time". Likewise there are optimisations at OS and
hardware levels that kick in at different times.

> The way to do this with timeit is:
>
> - don't run extraneous code as part of your test (just measure what
>   you care about, with as little scaffolding around it);
>
> - run that code snippet as many times as you can bear in a loop,
>   *but* let timeit manage the loop; don't try doing it by hand;

There are many situations where running something in a loop over and
over triggers all kinds of optimisations. PyPy's jit warmup is an
example but the effect can occur at the hardware level as well.
Performance of code run repeatedly in a tight loop can be different to
the same code called once or called sporadically in a long-running
process.

This is especially true of code that operates on a data structure. In
real usage the data structure might be accessed rarely but a
tight-profiling loop forces the entire structure into the CPU's
caches.

> - only care about "best of N", where N is as large a number as you
>   can stand;
>
> - averages, standard deviation, and other statistics are pretty
>   much useless here, the only statistic that you care about is
>   the minimum.

This argument makes sense if you adopt the t+Δt model described above
but that model it is just a model. The minimum from micro-benchmarking
in a tight-loop is not necessarily a more representative statistic
than mean, median or whatever else. Standard deviation is actually
useful for assessing the variability of time taken which can also be
important and gives additional information about performance: I would
like to minimise both the expected time taken and also the variability
in that simultaneously if possible.

You seem to be describing timeit's way of profiling which is
reasonable but it's not the only way.

--
Oscar

[toc] | [prev] | [standalone]

csiph-web

Heap Implementation

Contents

#102324 — Heap Implementation

#102326

#102392

#102398

#102407