Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail From: "Sven R. Kunze" Newsgroups: comp.lang.python Subject: Re: _siftup and _siftdown implementation Date: Fri, 5 Feb 2016 15:42:29 +0100 Lines: 58 Message-ID: References: <56b3e902$0$1613$c3e8da3$5496439d@news.astraweb.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Trace: news.uni-berlin.de vOQm0X5ba5n45IF6p6Fh5wvtlDFqpd1ft1tOCramiO0A== Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.003 X-Spam-Evidence: '*H*': 0.99; '*S*': 0.00; 'root': 0.04; '"""': 0.05; 'important,': 0.07; 'indices': 0.07; 'hiding': 0.09; 'methods,': 0.09; 'modules.': 0.09; 'tends': 0.09; 'variance': 0.09; 'python': 0.10; 'index': 0.13; 'compares': 0.16; 'element,': 0.16; 'heap': 0.16; 'heapq': 0.16; 'item:': 0.16; 'newitem': 0.16; 'popping': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'sorting.': 0.16; 'storing': 0.16; 'wrote:': 0.16; 'comparing': 0.18; 'element': 0.18; 'typical': 0.18; 'runs': 0.18; 'algorithm': 0.20; 'arrays': 0.22; 'large,': 0.22; 'pos': 0.22; 'subject:skip:i 10': 0.22; 'demonstrate': 0.23; 'elements': 0.23; 'originally': 0.23; '(this': 0.24; 'thus': 0.24; 'header:In-Reply-To:1': 0.24; 'sort': 0.25; 'module': 0.25; 'header:User-Agent:1': 0.26; "doesn't": 0.26; 'module.': 0.27; 'turns': 0.27; 'idea': 0.28; 'values': 0.28; 'comparison': 0.29; 'idea,': 0.29; 'array': 0.29; 'random': 0.29; 'books': 0.30; 'that.': 0.30; 'too.': 0.30; 'compared': 0.30; 'skip:- 30': 0.32; '"the': 0.32; 'posting': 0.32; 'says': 0.32; 'usually': 0.33; 'extract': 0.33; 'programming,': 0.33; 'received:10.0': 0.34; 'lists': 0.34; 'best,': 0.35; 'level': 0.35; 'but': 0.36; 'should': 0.36; 'instead': 0.36; 'needed': 0.36; '(and': 0.36; 'child': 0.36; 'modules': 0.36; 'smaller': 0.36; 'volume': 0.36; 'to:addr:python-list': 0.36; 'subject:: ': 0.37; 'received:10': 0.37; 'thought': 0.37; 'list.': 0.37; 'starting': 0.37; 'itself': 0.38; 'building': 0.38; 'mailing': 0.38; 'why': 0.39; 'skip:- 20': 0.39; "didn't": 0.39; 'to:addr:python.org': 0.40; 'where': 0.40; 'received:de': 0.40; 'some': 0.40; 'competitive': 0.61; 'hope': 0.61; 'total': 0.62; 'charset:windows-1252': 0.62; 'making': 0.62; 'more': 0.63; 'times': 0.63; 'our': 0.64; 'soon': 0.65; 'here': 0.66; 'results': 0.66; 'cut': 0.67; 'potentially': 0.67; 'sum': 0.69; 'sounds': 0.76; 'exercise.': 0.84; 'routines': 0.84; 'cutting': 0.93; 'lot,': 0.95 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=mail.de; s=mail201212; t=1454683353; bh=aiJsCskZpW+QYEapkHpyQnsfVctWxrXKq4A9LuoYW0Y=; h=Subject:To:References:From:Date:In-Reply-To:From; b=tIasb9Eehv3JHL4qr20t6BPg6aiN6KrIUeAvwnp8OEeYd85rfFzrTG6N6RaWIa9da QFswEBckh/YCSp9kfHADMT7yrP6hJFOhI5x9xAdqpUYwPU8Ih20MO4WaZKfxDKPlXb np+Iuvi4OImc5OkW2UtZo6bTuEMI19jlb6JKsJc8= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 In-Reply-To: X-purgate: clean X-purgate: This mail is considered clean (visit http://www.eleven.de for further information) X-purgate-type: clean X-purgate-Ad: Categorized by eleven eXpurgate (R) http://www.eleven.de X-purgate: This mail is considered clean (visit http://www.eleven.de for further information) X-purgate: clean X-purgate-size: 9523 X-purgate-ID: 154282::1454683352-00003BBD-B7D0BDB0/0/0 X-Content-Filtered-By: Mailman/MimeDel 2.1.21rc2 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.21rc2 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Xref: csiph.com comp.lang.python:102522 On 05.02.2016 02:26, srinivas devaki wrote: > as I come to think of it again, it is not subheap, it actually heap cut at > some level hope you get the idea from the usage of _siftup. so even though > the `pos` children are valid the _siftup brings down the new element (i.e > the element which is at first at `pos`) upto its leaf level and then again > it is brought up by using _siftdown. why do the redundant work when it can > simply breakout? The heapq module itself has a very extensive documentation inside. This is what it says for _siftup. I think this is sort of an optimization that works pretty well (cf. the numbers) for popping off the FIRST item: """ # The child indices of heap index pos are already heaps, and we want to make # a heap at index pos too. We do this by bubbling the smaller child of # pos up (and so on with that child's children, etc) until hitting a leaf, # then using _siftdown to move the oddball originally at index pos into place. # # We *could* break out of the loop as soon as we find a pos where newitem <= # both its children, but turns out that's not a good idea, and despite that # many books write the algorithm that way. During a heap pop, the last array # element is sifted in, and that tends to be large, so that comparing it # against values starting from the root usually doesn't pay (= usually doesn't # get us out of the loop early). See Knuth, Volume 3, where this is # explained and quantified in an exercise. # # Cutting the # of comparisons is important, since these routines have no # way to extract "the priority" from an array element, so that intelligence # is likely to be hiding in custom comparison methods, or in array elements # storing (priority, record) tuples. Comparisons are thus potentially # expensive. # # On random arrays of length 1000, making this change cut the number of # comparisons made by heapify() a little, and those made by exhaustive # heappop() a lot, in accord with theory. Here are typical results from 3 # runs (3 just to demonstrate how small the variance is): # # Compares needed by heapify Compares needed by 1000 heappops # -------------------------- -------------------------------- # 1837 cut to 1663 14996 cut to 8680 # 1855 cut to 1659 14966 cut to 8678 # 1847 cut to 1660 15024 cut to 8703 # # Building the heap by using heappush() 1000 times instead required # 2198, 2148, and 2219 compares: heapify() is more efficient, when # you can use it. # # The total compares needed by list.sort() on the same lists were 8627, # 8627, and 8632 (this should be compared to the sum of heapify() and # heappop() compares): list.sort() is (unsurprisingly!) more efficient # for sorting. """ What do you think about our use-case? > _siftup and _siftdown are functions from python standard heapq module. > > PS: I do competitive programming, I use these modules every couple of days > when compared to other modules. so didn't give much thought when posting to > the mailing list. sorry for that. Competitive programming? That sounds interesting. :) Best, Sven