Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail From: Cem Karan Newsgroups: comp.lang.python Subject: Re: How to remove item from heap efficiently? Date: Thu, 14 Jan 2016 07:04:17 -0500 Lines: 87 Message-ID: References: <568EEC40.7070807@mail.de> <568FE797.6090808@mail.de> <5692A795.3070904@mail.de> <5695276A.2020101@mail.de> <7A7E26D9-01E5-4D77-92B4-3491D86E1EA8@gmail.com> <5696A0A5.8020401@mail.de> Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Trace: news.uni-berlin.de zuRUI9Xt5swXICz7Qj53hgHF/diGL5wzgYy0LAGMz/cw== Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'python,': 0.02; 'open- source': 0.04; 'pop': 0.05; 'practice,': 0.07; 'pypi': 0.07; 'so?': 0.07; 'wrapper': 0.07; 'cc:addr:python-list': 0.09; 'subject:How': 0.09; 'assumed': 0.09; 'debugger': 0.09; 'dict': 0.09; 'hooks': 0.09; 'simplified': 0.09; 'statements': 0.09; 'underlying': 0.09; 'python': 0.10; 'jan': 0.11; 'extensions': 0.13; 'missed': 0.15; '_heapq': 0.16; 'binaries': 0.16; 'code).': 0.16; 'codebase': 0.16; 'codebase,': 0.16; 'heap': 0.16; 'heapq': 0.16; 'jython,': 0.16; 'module:': 0.16; 'non-cpython': 0.16; 'peek': 0.16; 'printf()': 0.16; 'received:158': 0.16; 'received:192.168.1.4': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'scripts.': 0.16; 'somehow,': 0.16; 'subject:item': 0.16; 'subject:remove': 0.16; 'wrote:': 0.16; 'looked': 0.16; 'debugging': 0.18; 'module,': 0.18; 'platforms': 0.18; '>>>': 0.20; 'developer': 0.20; 'cc:addr:python.org': 0.20; 'extension': 0.20; 'issue.': 0.20; 'cc:2**1': 0.22; '(by': 0.22; 'cc:no real name:2**0': 0.22; 'am,': 0.23; 'code,': 0.23; 'code.': 0.23; "haven't": 0.24; 'thanks,': 0.24; 'cc:addr:gmail.com': 0.24; 'header:In-Reply-To:1': 0.24; 'module': 0.25; "i've": 0.25; '(which': 0.26; 'distribute': 0.27; 'figure': 0.27; 'points': 0.27; 'switch': 0.27; 'least': 0.27; 'finally,': 0.27; 'said,': 0.27; 'turns': 0.27; 'developing': 0.28; 'interface': 0.29; 'regular': 0.29; '"remove': 0.29; '13,': 0.29; 'clever': 0.29; 'concern': 0.29; 'time;': 0.29; "i'm": 0.30; 'code': 0.30; 'becomes': 0.30; 'everyone': 0.31; 'another': 0.32; 'compiled': 0.32; 'run': 0.33; 'source': 0.33; 'usually': 0.33; 'lets': 0.33; 'platforms.': 0.33; "i'll": 0.33; 'message-id:@gmail.com': 0.34; '(for': 0.34; 'this?': 0.34; 'add': 0.34; 'so,': 0.35; 'could': 0.35; 'replaced': 0.35; 'unknown': 0.35; 'something': 0.35; 'asking': 0.35; 'comment': 0.35; 'item': 0.35; 'but': 0.36; 'there': 0.36; 'subject:?': 0.36; 'pm,': 0.36; 'subject:: ': 0.37; 'really': 0.37; 'agree': 0.37; 'method': 0.37; 'thanks': 0.37; '12,': 0.37; 'missing': 0.37; 'thought': 0.37; 'associated': 0.38; 'delete': 0.38; 'means': 0.39; 'subject:from': 0.39; 'received:192': 0.39; 'still': 0.40; 'your': 0.60; 'determine': 0.61; 'skip:u 10': 0.61; 'header:Message-Id:1': 0.61; 'impact': 0.61; 'replying': 0.61; 'here.': 0.62; 'charset:windows-1252': 0.62; 'skip:n 10': 0.62; 'more': 0.63; 'different': 0.63; 'complete': 0.63; 'it!': 0.64; 'intent': 0.66; 'decided': 0.66; 'future.': 0.67; 'costs': 0.67; 'overall': 0.72; 'hand': 0.82; 'coupled': 0.84; 'experiences,': 0.84; 'faster.': 0.84; 'researching': 0.84; 'adopt': 0.91; 'approach.': 0.91; 'careful': 0.91; 'different.': 0.91; 'maybe,': 0.91; 'factors': 0.97 In-Reply-To: <5696A0A5.8020401@mail.de> X-Mailer: Apple Mail (2.1510) X-Proofpoint-Virus-Version: vendor=nai engine=5400 definitions=5800 signatures=585085 X-PP-Spam-Details: rule=add_spam_details policy=default score=0 spamscore=0 ipscore=0 suspectscore=0 phishscore=0 bulkscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=7.0.1-1111160001 definitions=main-1601140196 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.20+ Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Xref: csiph.com comp.lang.python:101685 On Jan 13, 2016, at 2:08 PM, Sven R. Kunze wrote: > On 13.01.2016 12:20, Cem Karan wrote: >> On Jan 12, 2016, at 11:18 AM, "Sven R. Kunze" = wrote: >>=20 >>> Thanks for replying here. I've come across these types of = wrappers/re-implementations of heapq as well when researching this = issue. :) >>>=20 >>> Unfortunately, they don't solve the underlying issue at hand which = is: "remove item from heap with unknown index" and be efficient at it = (by not using _heapq C implementation). >>>=20 >>>=20 >>> So, I thought I did another wrapper. ;) It at least uses _heapq (if = available otherwise heapq) and lets you remove items without violating = the invariant in O(log n). I am going to make that open-source on pypi = and see what people think of it. >> Is that so? I'll be honest, I never tested its asymptotic = performance, I just assumed that he had a dict coupled with a heap = somehow, but I never looked into the code. >=20 > My concern about that specific package is a missing C-implementation. = I feel that somewhat defeats the whole purpose of using a heap: = performance. I agree with you that performance is less than that of using a C = extension module, but there are other costs associated with developing a = C module: 1) As the developer of the module, you must be very careful to ensure = your code is portable. 2) Distribution becomes somewhat more difficult; you may need to = distribute both source and compiled binaries for various platforms. = This is somewhat more annoying than pure python scripts. 3) Debugging can become significantly more difficult. My current = codebase is python+cython+c, and when something crashes, it is usually = easier to use a bunch of printf() statements to figure out what is going = on than to use a debugger (others may have different experiences, this = is just mine). 4) Not everyone is familiar with C, so writing extensions may be more = difficult. 5) Will the extension module work on non-cpython platforms (iron python, = jython, etc.)? Finally, without profiling the complete package it may be difficult to = tell what impact your C module will have on overall performance. In my = code, HeapDict had less than a 2% performance impact on what I was = doing; even if I had replaced it with a pure C implementation, my code = would not have run much faster. So, while I agree in principle to what you're saying, in practice there = may be other factors to consider before rejecting the pure python = approach. > Asymptotic performance is still O(log n). So, if the intent is to pop events more often than to peek at them, then = in practice, HeapDict is about the same as some clever heap+dict method = (which it might be, as I said, I haven't looked at the code). >> That said, IMHO using a dict interface is the way to go for priority = queues; it really simplified my code using it! This is my not-so-subtle = way of asking you to adopt the MutableMapping interface for your wrapper = ;) >=20 > Could you elaborate on this? What simplified you code so much? >=20 > I have been using heaps for priority queues as well but haven't missed = the dict interface so far. Maybe, my use-case is different. I'm writing an event-based simulator, and as it turns out, it is much = easier to tentatively add events than it is to figure out precisely = which events will occur in the future. That means that on a regular = basis I need to delete events as I determine that they are garbage. = HeapDict did a good job of that for me (for completely unrelated reasons = I decided to switch to a pure-C codebase, with python hooks to twiddle = the simulator at a few, very rare, points in time; hence the = python+cython+c comment above). Thanks, Cem Karan=