Groups > comp.lang.python > #102506 > unrolled thread

_siftup and _siftdown implementation

Started by	srinivas devaki <mr.eightnoteight@gmail.com>
First post	2016-02-05 02:20 +0530
Last post	2016-02-06 20:16 +0530
Articles	13 — 4 participants

Back to article view | Back to comp.lang.python

  _siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-05 02:20 +0530
    Re: _siftup and _siftdown implementation Steven D'Aprano <steve@pearwood.info> - 2016-02-05 11:12 +1100
      Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 01:21 +0100
      Re: _siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-05 06:56 +0530
      Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 15:42 +0100
      Re: _siftup and _siftdown implementation Bernardo Sulzbach <mafagafogigante@gmail.com> - 2016-02-05 12:48 -0200
      Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 15:55 +0100
      Re: _siftup and _siftdown implementation Bernardo Sulzbach <mafagafogigante@gmail.com> - 2016-02-05 12:59 -0200
      Re: _siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-05 21:15 +0530
      Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 17:27 +0100
      Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 17:35 +0100
      Re: _siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-05 23:12 +0530
      Re: _siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-06 20:16 +0530

#102506 — _siftup and _siftdown implementation

From	srinivas devaki <mr.eightnoteight@gmail.com>
Date	2016-02-05 02:20 +0530
Subject	_siftup and _siftdown implementation
Message-ID	<mailman.72.1454619005.30993.python-list@python.org>

_siftdown function breaks out of the loop when the current pos has a valid
parent.

but _siftup function is not implemented in that fashion, if a valid subheap
is given to the _siftup, it will bring down the root of sub heap and then
again bring it up to its original place.

I was wondering why it is so, is it just to make the code look simple???

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight

[toc] | [next] | [standalone]

#102508

From	Steven D'Aprano <steve@pearwood.info>
Date	2016-02-05 11:12 +1100
Message-ID	<56b3e902$0$1613$c3e8da3$5496439d@news.astraweb.com>
In reply to	#102506

On Fri, 5 Feb 2016 07:50 am, srinivas devaki wrote:

> _siftdown function breaks out of the loop when the current pos has a valid
> parent.
> 
> but _siftup function is not implemented in that fashion, if a valid
> subheap is given to the _siftup, it will bring down the root of sub heap
> and then again bring it up to its original place.
> 
> I was wondering why it is so, is it just to make the code look simple???

Hi Srinivas,

I'm sure that your question is obvious to you, but it's not obvious to us.
Where are _siftup and _siftdown defined? Are they in your code? Somebody
else's code? A library? Which library? What do they do? Where are they
from?

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#102509

From	"Sven R. Kunze" <srkunze@mail.de>
Date	2016-02-05 01:21 +0100
Message-ID	<mailman.74.1454631720.30993.python-list@python.org>
In reply to	#102508

On 05.02.2016 01:12, Steven D'Aprano wrote:
> On Fri, 5 Feb 2016 07:50 am, srinivas devaki wrote:
>
>> _siftdown function breaks out of the loop when the current pos has a valid
>> parent.
>>
>> but _siftup function is not implemented in that fashion, if a valid
>> subheap is given to the _siftup, it will bring down the root of sub heap
>> and then again bring it up to its original place.
>>
>> I was wondering why it is so, is it just to make the code look simple???
> Hi Srinivas,
>
> I'm sure that your question is obvious to you, but it's not obvious to us.
> Where are _siftup and _siftdown defined? Are they in your code? Somebody
> else's code? A library? Which library? What do they do? Where are they
> from?


The question originated here: 
https://github.com/srkunze/xheap/pull/1#discussion_r51770210


(btw, Steven, your email client somehow breaks my threading view in 
thunderbird. This reply appeared unconnected to Srinivas' post.)

[toc] | [prev] | [next] | [standalone]

#102510

From	srinivas devaki <mr.eightnoteight@gmail.com>
Date	2016-02-05 06:56 +0530
Message-ID	<mailman.75.1454635582.30993.python-list@python.org>
In reply to	#102508

On Feb 5, 2016 5:45 AM, "Steven D'Aprano" <steve@pearwood.info> wrote:
>
> On Fri, 5 Feb 2016 07:50 am, srinivas devaki wrote:
>
> > _siftdown function breaks out of the loop when the current pos has a
valid
> > parent.
> >
> > but _siftup function is not implemented in that fashion, if a valid
> > subheap is given to the _siftup, it will bring down the root of sub heap
> > and then again bring it up to its original place.

as I come to think of it again, it is not subheap, it actually heap cut at
some level hope you get the idea from the usage of _siftup. so even though
the `pos` children are valid the _siftup brings down the new element (i.e
the element which is at first at `pos`) upto its leaf level and then again
it is brought up by using _siftdown. why do the redundant work when it can
simply breakout?

> >
> > I was wondering why it is so, is it just to make the code look simple???
>
> Hi Srinivas,
>
> I'm sure that your question is obvious to you, but it's not obvious to us.
> Where are _siftup and _siftdown defined? Are they in your code? Somebody
> else's code? A library? Which library? What do they do? Where are they
> from?

_siftup and _siftdown are functions from python standard heapq module.

PS: I do competitive programming, I use these modules every couple of days
when compared to other modules. so didn't give much thought when posting to
the mailing list. sorry for that.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight

[toc] | [prev] | [next] | [standalone]

#102522

From	"Sven R. Kunze" <srkunze@mail.de>
Date	2016-02-05 15:42 +0100
Message-ID	<mailman.7.1454683355.19407.python-list@python.org>
In reply to	#102508

On 05.02.2016 02:26, srinivas devaki wrote:
> as I come to think of it again, it is not subheap, it actually heap cut at
> some level hope you get the idea from the usage of _siftup. so even though
> the `pos` children are valid the _siftup brings down the new element (i.e
> the element which is at first at `pos`) upto its leaf level and then again
> it is brought up by using _siftdown. why do the redundant work when it can
> simply breakout?

The heapq module itself has a very extensive documentation inside. This 
is what it says for _siftup. I think this is sort of an optimization 
that works pretty well (cf. the numbers) for popping off the FIRST item:

"""

# The child indices of heap index pos are already heaps, and we want to 
make # a heap at index pos too. We do this by bubbling the smaller child 
of # pos up (and so on with that child's children, etc) until hitting a 
leaf, # then using _siftdown to move the oddball originally at index pos 
into place. # # We *could* break out of the loop as soon as we find a 
pos where newitem <= # both its children, but turns out that's not a 
good idea, and despite that # many books write the algorithm that way. 
During a heap pop, the last array # element is sifted in, and that tends 
to be large, so that comparing it # against values starting from the 
root usually doesn't pay (= usually doesn't # get us out of the loop 
early). See Knuth, Volume 3, where this is # explained and quantified in 
an exercise. # # Cutting the # of comparisons is important, since these 
routines have no # way to extract "the priority" from an array element, 
so that intelligence # is likely to be hiding in custom comparison 
methods, or in array elements # storing (priority, record) tuples. 
Comparisons are thus potentially # expensive. # # On random arrays of 
length 1000, making this change cut the number of # comparisons made by 
heapify() a little, and those made by exhaustive # heappop() a lot, in 
accord with theory. Here are typical results from 3 # runs (3 just to 
demonstrate how small the variance is): # # Compares needed by heapify 
Compares needed by 1000 heappops # -------------------------- 
-------------------------------- # 1837 cut to 1663 14996 cut to 8680 # 
1855 cut to 1659 14966 cut to 8678 # 1847 cut to 1660 15024 cut to 8703 
# # Building the heap by using heappush() 1000 times instead required # 
2198, 2148, and 2219 compares: heapify() is more efficient, when # you 
can use it. # # The total compares needed by list.sort() on the same 
lists were 8627, # 8627, and 8632 (this should be compared to the sum of 
heapify() and # heappop() compares): list.sort() is (unsurprisingly!) 
more efficient # for sorting.

"""

What do you think about our use-case?

> _siftup and _siftdown are functions from python standard heapq module.
>
> PS: I do competitive programming, I use these modules every couple of days
> when compared to other modules. so didn't give much thought when posting to
> the mailing list. sorry for that.

Competitive programming? That sounds interesting. :)

Best,
Sven

[toc] | [prev] | [next] | [standalone]

#102523

From	Bernardo Sulzbach <mafagafogigante@gmail.com>
Date	2016-02-05 12:48 -0200
Message-ID	<mailman.8.1454683705.19407.python-list@python.org>
In reply to	#102508

On 02/05/2016 12:42 PM, Sven R. Kunze wrote:
>>
>> PS: I do competitive programming, I use these modules every couple of
>> days
>> when compared to other modules. so didn't give much thought when
>> posting to
>> the mailing list. sorry for that.
>
> Competitive programming? That sounds interesting. :)
>

I wonder why you *can* use this amount of already done stuff in 
competitive programming. When I was into that you could use what the 
standard library of the language gave you and nothing else.

[toc] | [prev] | [next] | [standalone]

#102524

From	"Sven R. Kunze" <srkunze@mail.de>
Date	2016-02-05 15:55 +0100
Message-ID	<mailman.9.1454684121.19407.python-list@python.org>
In reply to	#102508

On 05.02.2016 15:48, Bernardo Sulzbach wrote:
> On 02/05/2016 12:42 PM, Sven R. Kunze wrote:
>>>
>>> PS: I do competitive programming, I use these modules every couple of
>>> days
>>> when compared to other modules. so didn't give much thought when
>>> posting to
>>> the mailing list. sorry for that.
>>
>> Competitive programming? That sounds interesting. :)
>>
>
> I wonder why you *can* use this amount of already done stuff in 
> competitive programming. When I was into that you could use what the 
> standard library of the language gave you and nothing else.

AFAICT, heapq is part of the standard lib. :)

Best,
Sven

[toc] | [prev] | [next] | [standalone]

#102525

From	Bernardo Sulzbach <mafagafogigante@gmail.com>
Date	2016-02-05 12:59 -0200
Message-ID	<mailman.10.1454684392.19407.python-list@python.org>
In reply to	#102508

On 02/05/2016 12:55 PM, Sven R. Kunze wrote:
> On 05.02.2016 15:48, Bernardo Sulzbach wrote:
>> On 02/05/2016 12:42 PM, Sven R. Kunze wrote:
>>>>
>>>> PS: I do competitive programming, I use these modules every couple of
>>>> days
>>>> when compared to other modules. so didn't give much thought when
>>>> posting to
>>>> the mailing list. sorry for that.
>>>
>>> Competitive programming? That sounds interesting. :)
>>>
>>
>> I wonder why you *can* use this amount of already done stuff in
>> competitive programming. When I was into that you could use what the
>> standard library of the language gave you and nothing else.
>
> AFAICT, heapq is part of the standard lib. :)
>

Yes. I thought he was talking about XHEAP. However, rereading makes it 
look like he meant heapq indeed.

[toc] | [prev] | [next] | [standalone]

#102527

From	srinivas devaki <mr.eightnoteight@gmail.com>
Date	2016-02-05 21:15 +0530
Message-ID	<mailman.12.1454687182.19407.python-list@python.org>
In reply to	#102508

On Fri, Feb 5, 2016 at 8:12 PM, Sven R. Kunze <srkunze@mail.de> wrote:
> On 05.02.2016 02:26, srinivas devaki wrote:
> What do you think about our use-case?
>
Oh, the logic is sound, every element that we have inserted has to be popped,
We are spending some *extra* time in rearranging the elements only to be
sure that we won't be spending more than this *extra* time when doing other
operations, and our use-case isn't much different either, If by rearranging the
elements in the heap(*subheap*) gets optimal for other operations like
popping the
root element(heap[0]) then obviously it is optimal for popping other elements
(children of heap[0]).


PS: @sven But don't yet merge the pull request, I could be wrong.
as the heapq module already says that the variance is very small,
let me write some tests(on more than 10**3 elements) and then get back here.
-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight

[toc] | [prev] | [next] | [standalone]

#102529

From	"Sven R. Kunze" <srkunze@mail.de>
Date	2016-02-05 17:27 +0100
Message-ID	<mailman.14.1454689669.19407.python-list@python.org>
In reply to	#102508

Hi srinivas,

I wrote this simple benchmark to measure comparisons:

import random

from xheapimport RemovalHeap


class X(object):
     c =0 def __init__(self, x):
         self.x = x
     def __lt__(self, other):
         X.c +=1 return self.x < other.x

n =100000 for jjin range(5):
     items = [X(i)for iin range(n)]
     random.shuffle(items)
     heap = RemovalHeap(items)

     random.shuffle(items)
     for i  in items:
         heap.remove(i)

     print(X.c)
     X.c =0


old version:
430457
430810
430657
429971
430583

your pull request version:
426414
426045
425437
425528
425522


Can we do better here?

Best,
Sven

[toc] | [prev] | [next] | [standalone]

#102530

From	"Sven R. Kunze" <srkunze@mail.de>
Date	2016-02-05 17:35 +0100
Message-ID	<mailman.15.1454690143.19407.python-list@python.org>
In reply to	#102508

again for the list:
###################


import random

from xheap import RemovalHeap


class X(object):
     c = 0
     def __init__(self, x):
         self.x = x
     def __lt__(self, other):
         X.c += 1
         return self.x < other.x

n = 100000

for jj in range(5):
     items = [X(i) for i in range(n)]
     random.shuffle(items)
     heap = RemovalHeap(items)

     random.shuffle(items)
     for i in items:
         heap.remove(i)

     print(X.c)
     X.c = 0


(note to myself: never copy PyCharm formatting strings to this list).

On 05.02.2016 17:27, Sven R. Kunze wrote:
> Hi srinivas,
>
> I wrote this simple benchmark to measure comparisons:
>
> import random
>
> from xheapimport RemovalHeap
>
>
> class X(object):
>     c =0 def __init__(self, x):
>         self.x = x
>     def __lt__(self, other):
>         X.c +=1 return self.x < other.x
>
> n =100000 for jjin range(5):
>     items = [X(i)for iin range(n)]
>     random.shuffle(items)
>     heap = RemovalHeap(items)
>
>     random.shuffle(items)
>     for i  in items:
>         heap.remove(i)
>
>     print(X.c)
>     X.c =0
>
>
> old version:
> 430457
> 430810
> 430657
> 429971
> 430583
>
> your pull request version:
> 426414
> 426045
> 425437
> 425528
> 425522
>
>
> Can we do better here?
>
> Best,
> Sven

[toc] | [prev] | [next] | [standalone]

#102540

From	srinivas devaki <mr.eightnoteight@gmail.com>
Date	2016-02-05 23:12 +0530
Message-ID	<mailman.7.1454694210.2317.python-list@python.org>
In reply to	#102508

wow, that's great.
you read a comment in the code, and you test it, to only find that it
is indeed true,
sounds ok, but feels great. :)

Just experimenting a bit, I swaped the lines _siftdown and _siftup and something
strange happened the number of comparisions in both the versions remained same.
I'm attaching the files.

do you have any idea why this happened?

On Fri, Feb 5, 2016 at 9:57 PM, Sven R. Kunze <srkunze@mail.de> wrote:
>
> Can we do better here?
>
I don't know, I have to read TAOP knuth article.

-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight

[toc] | [prev] | [next] | [standalone]

#102586

From	srinivas devaki <mr.eightnoteight@gmail.com>
Date	2016-02-06 20:16 +0530
Message-ID	<mailman.44.1454770038.2317.python-list@python.org>
In reply to	#102508

As the comments in the heapq module says, in most of the cases
(probability 0.837 [from knuth vol3]) the the new inserted element
whose initial location is end of the array, which means that it will
be larger than the `pos` children. So the old version and new version
code is same in these cases because in old version the comparison is
taking place at line 129, in the new version comparison is taking
place inside _siftdown.
SO the total number of comparisons are not decreased by those cases
the other type case is the swapped element is smaller than `pos`
parents, in this case the old version does a comparison and then goto
_sifdown but in the new version _siftdown will be executed and then it
comes to _siftup, here in 50% of the cases(len(heap last level) ==
len(heap) // 2) the pos doesn't have a child, so no comparison.

================================================================================================
`pos`             | probability | `old version comparisons`  | `new
version comparisions`
================================================================================================
last level        |  0.5        | 1                          | 0
last - 1 level    |  0.25       | 1                          | 1
lower levels      |  0.25       | 1                          | >=1
================================================================================================

as the new version doesn't do a comparison if it is in the last level,
the optimization is occurring in that place.
Which makes the reason behind the heapq module's choice of _siftup
code is not at all related to this cause.

PS:
please copy the table to some text editor, for better visualization.

On Fri, Feb 5, 2016 at 11:12 PM, srinivas devaki
<mr.eightnoteight@gmail.com> wrote:
> wow, that's great.
> you read a comment in the code, and you test it, to only find that it
> is indeed true,
> sounds ok, but feels great. :)
>
> Just experimenting a bit, I swaped the lines _siftdown and _siftup and something
> strange happened the number of comparisions in both the versions remained same.
> I'm attaching the files.
>
> do you have any idea why this happened?
>
> On Fri, Feb 5, 2016 at 9:57 PM, Sven R. Kunze <srkunze@mail.de> wrote:
>>
>> Can we do better here?
>>
> I don't know, I have to read TAOP knuth article.
>
> --
> Regards
> Srinivas Devaki
> Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
> Computer Science and Engineering Department
> ph: +91 9491 383 249
> telegram_id: @eightnoteight

-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight

[toc] | [prev] | [standalone]

csiph-web

_siftup and _siftdown implementation

Contents

#102506 — _siftup and _siftdown implementation

#102508

#102509

#102510

#102522

#102523

#102524

#102525

#102527

#102529

#102530

#102540

#102586