Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #102506 > unrolled thread
| Started by | srinivas devaki <mr.eightnoteight@gmail.com> |
|---|---|
| First post | 2016-02-05 02:20 +0530 |
| Last post | 2016-02-06 20:16 +0530 |
| Articles | 13 — 4 participants |
Back to article view | Back to comp.lang.python
_siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-05 02:20 +0530
Re: _siftup and _siftdown implementation Steven D'Aprano <steve@pearwood.info> - 2016-02-05 11:12 +1100
Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 01:21 +0100
Re: _siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-05 06:56 +0530
Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 15:42 +0100
Re: _siftup and _siftdown implementation Bernardo Sulzbach <mafagafogigante@gmail.com> - 2016-02-05 12:48 -0200
Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 15:55 +0100
Re: _siftup and _siftdown implementation Bernardo Sulzbach <mafagafogigante@gmail.com> - 2016-02-05 12:59 -0200
Re: _siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-05 21:15 +0530
Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 17:27 +0100
Re: _siftup and _siftdown implementation "Sven R. Kunze" <srkunze@mail.de> - 2016-02-05 17:35 +0100
Re: _siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-05 23:12 +0530
Re: _siftup and _siftdown implementation srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-06 20:16 +0530
| From | srinivas devaki <mr.eightnoteight@gmail.com> |
|---|---|
| Date | 2016-02-05 02:20 +0530 |
| Subject | _siftup and _siftdown implementation |
| Message-ID | <mailman.72.1454619005.30993.python-list@python.org> |
_siftdown function breaks out of the loop when the current pos has a valid parent. but _siftup function is not implemented in that fashion, if a valid subheap is given to the _siftup, it will bring down the root of sub heap and then again bring it up to its original place. I was wondering why it is so, is it just to make the code look simple??? Regards Srinivas Devaki Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad) Computer Science and Engineering Department ph: +91 9491 383 249 telegram_id: @eightnoteight
[toc] | [next] | [standalone]
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-02-05 11:12 +1100 |
| Message-ID | <56b3e902$0$1613$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #102506 |
On Fri, 5 Feb 2016 07:50 am, srinivas devaki wrote: > _siftdown function breaks out of the loop when the current pos has a valid > parent. > > but _siftup function is not implemented in that fashion, if a valid > subheap is given to the _siftup, it will bring down the root of sub heap > and then again bring it up to its original place. > > I was wondering why it is so, is it just to make the code look simple??? Hi Srinivas, I'm sure that your question is obvious to you, but it's not obvious to us. Where are _siftup and _siftdown defined? Are they in your code? Somebody else's code? A library? Which library? What do they do? Where are they from? -- Steven
[toc] | [prev] | [next] | [standalone]
| From | "Sven R. Kunze" <srkunze@mail.de> |
|---|---|
| Date | 2016-02-05 01:21 +0100 |
| Message-ID | <mailman.74.1454631720.30993.python-list@python.org> |
| In reply to | #102508 |
On 05.02.2016 01:12, Steven D'Aprano wrote: > On Fri, 5 Feb 2016 07:50 am, srinivas devaki wrote: > >> _siftdown function breaks out of the loop when the current pos has a valid >> parent. >> >> but _siftup function is not implemented in that fashion, if a valid >> subheap is given to the _siftup, it will bring down the root of sub heap >> and then again bring it up to its original place. >> >> I was wondering why it is so, is it just to make the code look simple??? > Hi Srinivas, > > I'm sure that your question is obvious to you, but it's not obvious to us. > Where are _siftup and _siftdown defined? Are they in your code? Somebody > else's code? A library? Which library? What do they do? Where are they > from? The question originated here: https://github.com/srkunze/xheap/pull/1#discussion_r51770210 (btw, Steven, your email client somehow breaks my threading view in thunderbird. This reply appeared unconnected to Srinivas' post.)
[toc] | [prev] | [next] | [standalone]
| From | srinivas devaki <mr.eightnoteight@gmail.com> |
|---|---|
| Date | 2016-02-05 06:56 +0530 |
| Message-ID | <mailman.75.1454635582.30993.python-list@python.org> |
| In reply to | #102508 |
On Feb 5, 2016 5:45 AM, "Steven D'Aprano" <steve@pearwood.info> wrote: > > On Fri, 5 Feb 2016 07:50 am, srinivas devaki wrote: > > > _siftdown function breaks out of the loop when the current pos has a valid > > parent. > > > > but _siftup function is not implemented in that fashion, if a valid > > subheap is given to the _siftup, it will bring down the root of sub heap > > and then again bring it up to its original place. as I come to think of it again, it is not subheap, it actually heap cut at some level hope you get the idea from the usage of _siftup. so even though the `pos` children are valid the _siftup brings down the new element (i.e the element which is at first at `pos`) upto its leaf level and then again it is brought up by using _siftdown. why do the redundant work when it can simply breakout? > > > > I was wondering why it is so, is it just to make the code look simple??? > > Hi Srinivas, > > I'm sure that your question is obvious to you, but it's not obvious to us. > Where are _siftup and _siftdown defined? Are they in your code? Somebody > else's code? A library? Which library? What do they do? Where are they > from? _siftup and _siftdown are functions from python standard heapq module. PS: I do competitive programming, I use these modules every couple of days when compared to other modules. so didn't give much thought when posting to the mailing list. sorry for that. Regards Srinivas Devaki Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad) Computer Science and Engineering Department ph: +91 9491 383 249 telegram_id: @eightnoteight
[toc] | [prev] | [next] | [standalone]
| From | "Sven R. Kunze" <srkunze@mail.de> |
|---|---|
| Date | 2016-02-05 15:42 +0100 |
| Message-ID | <mailman.7.1454683355.19407.python-list@python.org> |
| In reply to | #102508 |
On 05.02.2016 02:26, srinivas devaki wrote: > as I come to think of it again, it is not subheap, it actually heap cut at > some level hope you get the idea from the usage of _siftup. so even though > the `pos` children are valid the _siftup brings down the new element (i.e > the element which is at first at `pos`) upto its leaf level and then again > it is brought up by using _siftdown. why do the redundant work when it can > simply breakout? The heapq module itself has a very extensive documentation inside. This is what it says for _siftup. I think this is sort of an optimization that works pretty well (cf. the numbers) for popping off the FIRST item: """ # The child indices of heap index pos are already heaps, and we want to make # a heap at index pos too. We do this by bubbling the smaller child of # pos up (and so on with that child's children, etc) until hitting a leaf, # then using _siftdown to move the oddball originally at index pos into place. # # We *could* break out of the loop as soon as we find a pos where newitem <= # both its children, but turns out that's not a good idea, and despite that # many books write the algorithm that way. During a heap pop, the last array # element is sifted in, and that tends to be large, so that comparing it # against values starting from the root usually doesn't pay (= usually doesn't # get us out of the loop early). See Knuth, Volume 3, where this is # explained and quantified in an exercise. # # Cutting the # of comparisons is important, since these routines have no # way to extract "the priority" from an array element, so that intelligence # is likely to be hiding in custom comparison methods, or in array elements # storing (priority, record) tuples. Comparisons are thus potentially # expensive. # # On random arrays of length 1000, making this change cut the number of # comparisons made by heapify() a little, and those made by exhaustive # heappop() a lot, in accord with theory. Here are typical results from 3 # runs (3 just to demonstrate how small the variance is): # # Compares needed by heapify Compares needed by 1000 heappops # -------------------------- -------------------------------- # 1837 cut to 1663 14996 cut to 8680 # 1855 cut to 1659 14966 cut to 8678 # 1847 cut to 1660 15024 cut to 8703 # # Building the heap by using heappush() 1000 times instead required # 2198, 2148, and 2219 compares: heapify() is more efficient, when # you can use it. # # The total compares needed by list.sort() on the same lists were 8627, # 8627, and 8632 (this should be compared to the sum of heapify() and # heappop() compares): list.sort() is (unsurprisingly!) more efficient # for sorting. """ What do you think about our use-case? > _siftup and _siftdown are functions from python standard heapq module. > > PS: I do competitive programming, I use these modules every couple of days > when compared to other modules. so didn't give much thought when posting to > the mailing list. sorry for that. Competitive programming? That sounds interesting. :) Best, Sven
[toc] | [prev] | [next] | [standalone]
| From | Bernardo Sulzbach <mafagafogigante@gmail.com> |
|---|---|
| Date | 2016-02-05 12:48 -0200 |
| Message-ID | <mailman.8.1454683705.19407.python-list@python.org> |
| In reply to | #102508 |
On 02/05/2016 12:42 PM, Sven R. Kunze wrote: >> >> PS: I do competitive programming, I use these modules every couple of >> days >> when compared to other modules. so didn't give much thought when >> posting to >> the mailing list. sorry for that. > > Competitive programming? That sounds interesting. :) > I wonder why you *can* use this amount of already done stuff in competitive programming. When I was into that you could use what the standard library of the language gave you and nothing else.
[toc] | [prev] | [next] | [standalone]
| From | "Sven R. Kunze" <srkunze@mail.de> |
|---|---|
| Date | 2016-02-05 15:55 +0100 |
| Message-ID | <mailman.9.1454684121.19407.python-list@python.org> |
| In reply to | #102508 |
On 05.02.2016 15:48, Bernardo Sulzbach wrote: > On 02/05/2016 12:42 PM, Sven R. Kunze wrote: >>> >>> PS: I do competitive programming, I use these modules every couple of >>> days >>> when compared to other modules. so didn't give much thought when >>> posting to >>> the mailing list. sorry for that. >> >> Competitive programming? That sounds interesting. :) >> > > I wonder why you *can* use this amount of already done stuff in > competitive programming. When I was into that you could use what the > standard library of the language gave you and nothing else. AFAICT, heapq is part of the standard lib. :) Best, Sven
[toc] | [prev] | [next] | [standalone]
| From | Bernardo Sulzbach <mafagafogigante@gmail.com> |
|---|---|
| Date | 2016-02-05 12:59 -0200 |
| Message-ID | <mailman.10.1454684392.19407.python-list@python.org> |
| In reply to | #102508 |
On 02/05/2016 12:55 PM, Sven R. Kunze wrote: > On 05.02.2016 15:48, Bernardo Sulzbach wrote: >> On 02/05/2016 12:42 PM, Sven R. Kunze wrote: >>>> >>>> PS: I do competitive programming, I use these modules every couple of >>>> days >>>> when compared to other modules. so didn't give much thought when >>>> posting to >>>> the mailing list. sorry for that. >>> >>> Competitive programming? That sounds interesting. :) >>> >> >> I wonder why you *can* use this amount of already done stuff in >> competitive programming. When I was into that you could use what the >> standard library of the language gave you and nothing else. > > AFAICT, heapq is part of the standard lib. :) > Yes. I thought he was talking about XHEAP. However, rereading makes it look like he meant heapq indeed.
[toc] | [prev] | [next] | [standalone]
| From | srinivas devaki <mr.eightnoteight@gmail.com> |
|---|---|
| Date | 2016-02-05 21:15 +0530 |
| Message-ID | <mailman.12.1454687182.19407.python-list@python.org> |
| In reply to | #102508 |
On Fri, Feb 5, 2016 at 8:12 PM, Sven R. Kunze <srkunze@mail.de> wrote: > On 05.02.2016 02:26, srinivas devaki wrote: > What do you think about our use-case? > Oh, the logic is sound, every element that we have inserted has to be popped, We are spending some *extra* time in rearranging the elements only to be sure that we won't be spending more than this *extra* time when doing other operations, and our use-case isn't much different either, If by rearranging the elements in the heap(*subheap*) gets optimal for other operations like popping the root element(heap[0]) then obviously it is optimal for popping other elements (children of heap[0]). PS: @sven But don't yet merge the pull request, I could be wrong. as the heapq module already says that the variance is very small, let me write some tests(on more than 10**3 elements) and then get back here. -- Regards Srinivas Devaki Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad) Computer Science and Engineering Department ph: +91 9491 383 249 telegram_id: @eightnoteight
[toc] | [prev] | [next] | [standalone]
| From | "Sven R. Kunze" <srkunze@mail.de> |
|---|---|
| Date | 2016-02-05 17:27 +0100 |
| Message-ID | <mailman.14.1454689669.19407.python-list@python.org> |
| In reply to | #102508 |
Hi srinivas,
I wrote this simple benchmark to measure comparisons:
import random
from xheapimport RemovalHeap
class X(object):
c =0 def __init__(self, x):
self.x = x
def __lt__(self, other):
X.c +=1 return self.x < other.x
n =100000 for jjin range(5):
items = [X(i)for iin range(n)]
random.shuffle(items)
heap = RemovalHeap(items)
random.shuffle(items)
for i in items:
heap.remove(i)
print(X.c)
X.c =0
old version:
430457
430810
430657
429971
430583
your pull request version:
426414
426045
425437
425528
425522
Can we do better here?
Best,
Sven
[toc] | [prev] | [next] | [standalone]
| From | "Sven R. Kunze" <srkunze@mail.de> |
|---|---|
| Date | 2016-02-05 17:35 +0100 |
| Message-ID | <mailman.15.1454690143.19407.python-list@python.org> |
| In reply to | #102508 |
again for the list:
###################
import random
from xheap import RemovalHeap
class X(object):
c = 0
def __init__(self, x):
self.x = x
def __lt__(self, other):
X.c += 1
return self.x < other.x
n = 100000
for jj in range(5):
items = [X(i) for i in range(n)]
random.shuffle(items)
heap = RemovalHeap(items)
random.shuffle(items)
for i in items:
heap.remove(i)
print(X.c)
X.c = 0
(note to myself: never copy PyCharm formatting strings to this list).
On 05.02.2016 17:27, Sven R. Kunze wrote:
> Hi srinivas,
>
> I wrote this simple benchmark to measure comparisons:
>
> import random
>
> from xheapimport RemovalHeap
>
>
> class X(object):
> c =0 def __init__(self, x):
> self.x = x
> def __lt__(self, other):
> X.c +=1 return self.x < other.x
>
> n =100000 for jjin range(5):
> items = [X(i)for iin range(n)]
> random.shuffle(items)
> heap = RemovalHeap(items)
>
> random.shuffle(items)
> for i in items:
> heap.remove(i)
>
> print(X.c)
> X.c =0
>
>
> old version:
> 430457
> 430810
> 430657
> 429971
> 430583
>
> your pull request version:
> 426414
> 426045
> 425437
> 425528
> 425522
>
>
> Can we do better here?
>
> Best,
> Sven
[toc] | [prev] | [next] | [standalone]
| From | srinivas devaki <mr.eightnoteight@gmail.com> |
|---|---|
| Date | 2016-02-05 23:12 +0530 |
| Message-ID | <mailman.7.1454694210.2317.python-list@python.org> |
| In reply to | #102508 |
wow, that's great. you read a comment in the code, and you test it, to only find that it is indeed true, sounds ok, but feels great. :) Just experimenting a bit, I swaped the lines _siftdown and _siftup and something strange happened the number of comparisions in both the versions remained same. I'm attaching the files. do you have any idea why this happened? On Fri, Feb 5, 2016 at 9:57 PM, Sven R. Kunze <srkunze@mail.de> wrote: > > Can we do better here? > I don't know, I have to read TAOP knuth article. -- Regards Srinivas Devaki Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad) Computer Science and Engineering Department ph: +91 9491 383 249 telegram_id: @eightnoteight
[toc] | [prev] | [next] | [standalone]
| From | srinivas devaki <mr.eightnoteight@gmail.com> |
|---|---|
| Date | 2016-02-06 20:16 +0530 |
| Message-ID | <mailman.44.1454770038.2317.python-list@python.org> |
| In reply to | #102508 |
As the comments in the heapq module says, in most of the cases (probability 0.837 [from knuth vol3]) the the new inserted element whose initial location is end of the array, which means that it will be larger than the `pos` children. So the old version and new version code is same in these cases because in old version the comparison is taking place at line 129, in the new version comparison is taking place inside _siftdown. SO the total number of comparisons are not decreased by those cases the other type case is the swapped element is smaller than `pos` parents, in this case the old version does a comparison and then goto _sifdown but in the new version _siftdown will be executed and then it comes to _siftup, here in 50% of the cases(len(heap last level) == len(heap) // 2) the pos doesn't have a child, so no comparison. ================================================================================================ `pos` | probability | `old version comparisons` | `new version comparisions` ================================================================================================ last level | 0.5 | 1 | 0 last - 1 level | 0.25 | 1 | 1 lower levels | 0.25 | 1 | >=1 ================================================================================================ as the new version doesn't do a comparison if it is in the last level, the optimization is occurring in that place. Which makes the reason behind the heapq module's choice of _siftup code is not at all related to this cause. PS: please copy the table to some text editor, for better visualization. On Fri, Feb 5, 2016 at 11:12 PM, srinivas devaki <mr.eightnoteight@gmail.com> wrote: > wow, that's great. > you read a comment in the code, and you test it, to only find that it > is indeed true, > sounds ok, but feels great. :) > > Just experimenting a bit, I swaped the lines _siftdown and _siftup and something > strange happened the number of comparisions in both the versions remained same. > I'm attaching the files. > > do you have any idea why this happened? > > On Fri, Feb 5, 2016 at 9:57 PM, Sven R. Kunze <srkunze@mail.de> wrote: >> >> Can we do better here? >> > I don't know, I have to read TAOP knuth article. > > -- > Regards > Srinivas Devaki > Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad) > Computer Science and Engineering Department > ph: +91 9491 383 249 > telegram_id: @eightnoteight -- Regards Srinivas Devaki Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad) Computer Science and Engineering Department ph: +91 9491 383 249 telegram_id: @eightnoteight
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web