Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #101445

Re: How to remove item from heap efficiently?

Path csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From "Sven R. Kunze" <srkunze@mail.de>
Newsgroups comp.lang.python
Subject Re: How to remove item from heap efficiently?
Date Sun, 10 Jan 2016 19:48:53 +0100
Lines 113
Message-ID <mailman.8.1452451745.3151.python-list@python.org> (permalink)
References <568EEC40.7070807@mail.de> <CACs7g=Bjccja67M3t4yL+07vO8iAcOB-dF5p2PgqqBwjFmbh=w@mail.gmail.com> <568FE797.6090808@mail.de> <CACs7g=CE7y-aXSdWbN7Fk0J4O4aRwAH5ad7UR1Jhdf9EpNFnug@mail.gmail.com>
Mime-Version 1.0
Content-Type text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding 7bit
X-Trace news.uni-berlin.de S4QhPLXzck3e/C+pTcYdEAp/Wr1gqwSfHNRLjEgWr0ig==
Return-Path <srkunze@mail.de>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.000
X-Spam-Evidence '*H*': 1.00; '*S*': 0.00; 'memory.': 0.05; 'pop': 0.05; 'convention.': 0.07; 'everybody,': 0.07; 'linear': 0.07; 'cc:addr:python-list': 0.09; 'subject:How': 0.09; 'deletion': 0.09; 'expired': 0.09; 'marking': 0.09; 'span': 0.09; 'tasks,': 0.09; 'thrown': 0.09; 'timestamp': 0.09; 'tuple': 0.09; 'underlying': 0.09; 'url:github': 0.09; 'python': 0.10; 'jan': 0.11; 'pushed': 0.13; '(say': 0.16; '10:15': 0.16; '2016': 0.16; 'amortized': 0.16; 'heap': 0.16; 'i.e': 0.16; 'operation.': 0.16; 'popping': 0.16; 'priority).': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'subject:item': 0.16; 'subject:remove': 0.16; 'suggestion.': 0.16; 'task.': 0.16; 'useless': 0.16; 'wrote:': 0.16; 'basically': 0.18; 'deleted.': 0.18; 'resolved': 0.18; 'skip:l 30': 0.18; '&gt;': 0.18; 'variable': 0.18; '>>>': 0.20; 'cc:2**0': 0.20; 'cc:addr:python.org': 0.20; 'delta': 0.22; 'priorities': 0.22; 'cc:no real name:2**0': 0.22; 'am,': 0.23; 'elements': 0.23; 'insert': 0.23; 'references': 0.23; '(this': 0.24; 'header:In-Reply-To:1': 0.24; 'feature': 0.24; 'header:User- Agent:1': 0.26; "doesn't": 0.26; 'error': 0.27; 'checking': 0.27; 'fri,': 0.27; 'actual': 0.28; 'coded': 0.29; 'crash': 0.29; "i'm": 0.30; 'that.': 0.30; 'url:mailman': 0.30; 'creating': 0.30; 'code': 0.30; 'task': 0.30; 'primary': 0.31; 'posting': 0.32; 'run': 0.33; 'problem': 0.33; 'url:python': 0.33; 'items.': 0.33; 'lets': 0.33; 'url:listinfo': 0.34; 'received:10.0': 0.34; 'running': 0.34; 'add': 0.34; 'server': 0.34; 'list': 0.34; 'best,': 0.35; 'ones': 0.35; 'next': 0.35; 'could': 0.35; 'tasks': 0.35; 'item': 0.35; 'problem.': 0.35; 'reply.': 0.35; 'but': 0.36; 'too': 0.36; 'there': 0.36; 'url:org': 0.36; 'subject:?': 0.36; 'pm,': 0.36; 'subject:: ': 0.37; 'received:10': 0.37; 'two': 0.37; 'being': 0.37; 'method': 0.37; 'say': 0.37; 'thanks': 0.37; 'version': 0.38; 'delete': 0.38; 'minimum': 0.38; 'subject:from': 0.39; 'url:mail': 0.40; 'where': 0.40; 'received:de': 0.40; 'space': 0.40; 'some': 0.40; 'future': 0.60; 'high': 0.60; 'your': 0.60; 'hope': 0.61; 'avoid': 0.61; 'here.': 0.62; 'more': 0.63; 'oldest': 0.66; 'here': 0.66; 'decided': 0.66; 'saving': 0.70; 'technology,': 0.71; 'increase': 0.73; 'links,': 0.76; 'obtained': 0.76; 'low': 0.83; 'complexity': 0.84; 'mention.': 0.84; 'stagnate': 0.84; 'expiration': 0.91; 'expiration.': 0.91; 'items,': 0.91; 'reducing': 0.93; 'approached': 0.95
DKIM-Signature v=1; a=rsa-sha256; c=relaxed/simple; d=mail.de; s=mail201212; t=1452451737; bh=YXMMcMc9mc68lxFkgPR/LpFmKNmpPhTZkjyrUje/o0g=; h=Subject:To:References:Cc:From:Date:In-Reply-To:From; b=rtpA0q+n47HFsI+HLZYEl5DibkfYcIQkU9wK0edfZfHfPyPNtZgLIA+mPkERhhFoc FM2otUgk25kcSM+4JKWZM95GErfB/tNeofM5u2bCBGmbS/L0dzOUrWTYTvte089C2U HkPG2ufA+jGTzhiDTRZfAoi3YS54qQ8TGxWVKSWg=
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0
In-Reply-To <CACs7g=CE7y-aXSdWbN7Fk0J4O4aRwAH5ad7UR1Jhdf9EpNFnug@mail.gmail.com>
X-purgate clean
X-purgate This mail is considered clean (visit http://www.eleven.de for further information)
X-purgate-type clean
X-purgate-Ad Categorized by eleven eXpurgate (R) http://www.eleven.de
X-purgate This mail is considered clean (visit http://www.eleven.de for further information)
X-purgate clean
X-purgate-size 11622
X-purgate-ID 154282::1452451736-000009F3-53A7E439/0/0
X-Content-Filtered-By Mailman/MimeDel 2.1.20+
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.20+
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Xref csiph.com comp.lang.python:101445

Show key headers only | View raw


Wow. That's an impressive reply.

On 08.01.2016 20:26, srinivas devaki wrote:
> So you need a task scheduler with expiration and priority on each task.
> Interesting Problem..
>
> as peter said about marking the task in heapB to be deleted. but this
> needs searching everytime you pop off an old item [complexity:
> O(len(heapB))]. you may as well use python del on it as complexity is
> same.
> But if you already know the where to look in the heapB then you can
> avoid searching and thereby reducing the complexity. you can do this
> by saving the references of heapB in heapA and viceversa
>
> and if you marking delete on a number of low priority tasks, then it
> can increase your heapB enormously because being low priority items
> they can stagnate. to resolve this error you have to perform linear
> checking iterations at every critical length (this critical length can
> be obtained mathmatically), so that your amortized complexity of push,
> pop remains log(number_of_valid_tasks_at_any_time);
> the number of valid tasks at any time are those tasks which are not
> expired at the time.
>
> My Algorithm:
> version_a: https://gist.github.com/9ce7a0e534c6e768239e
> this version has simple scheduler which has methods for popping high
> priority one and popping oldest task.
> But this version has a disadvantage, i.e If lets say you are pushed
> some 10**5 tasks with priority 2, and then pushed some 10**5 tasks
> with priority 1. and then you decided to pop the oldest 10**5 items.
> in this version that 10**5 elements will stagnate in priority heap if
> in future all priorities are less than 1.
> now this is not a big issue but if you are running a webserver and
> over a span of 5 days there could be 10**10 tasks, and even if half of
> those tasks stagnate your server is going to crash with out of memory.
>
> version_b: https://gist.github.com/99b4d590753ba234eeed
> this version resolved that stagnation. this one will run sweeps
> whenever there are more than half of useless items, thereby giving us
> an amortized complexity of O(log(n)) for push, pop, etc
>
> but this version doesn't have the feature of expiration.
>
> version_c: https://gist.github.com/9dfd0d291672c0ffa5c3
> in this one we simply keep a variable expiration, and relieve the
> expired tasks on any operation. i have coded it in such a way that
> expiration is specific time, you can change it to delta time if you
> want to.
>
> Time Complexity: O(log(n)) insertion, O(log(n)) deletion   [amortized]
> Space Complexity: O(n)  [amortized]
> here n is number of valid items i.e which are not expired.
>
> I hope this helps with your problem

Indeed. I already do the sweep method as you suggested. ;)

Additionally, you provided me with a reasonable condition when to do the 
sweep in order to achieve O(log n). Thanks much for that. I currently 
used a time-bases approached (sweep each 20 iterations).

PS: Could you add a note on how you got to the condition ( 
2*self.useless_b > len(self.heap_b))?

> PS:
> sorry for posting links, it's just that the code is large for email.
> I'm using minimum number has highest priority convention.

I like Web technology, so no problem here. :)

> On Fri, Jan 8, 2016 at 10:15 PM, Sven R. Kunze <srkunze@mail.de> wrote:
>> Thanks for your suggestion.
>>
>> On 08.01.2016 14:21, srinivas devaki wrote:
>>> You can create a single heap with primary key as timestamp and
>>> secondary key as priority, i.e by creating a tuple
>>> insert the elements into the heap as
>>> (timestamp, priority)
>> I think I cannot use that because I need the list sorted by both criteria.
>>> If there is any underlying meaning for creating 2 heaps. please mention.
>>
>> I use two heaps because I need to insert arbitrary items fast and remove the
>> ones fast which are too old (timestamp) or are next in order (priority).
>>
>> Basically a task scheduler where tasks can be thrown away once they are too
>> long in the queue.
>>
>>
>>> On Fri, Jan 8, 2016 at 4:22 AM, Sven R. Kunze <srkunze@mail.de> wrote:
>>>> Hi everybody,
>>>>
>>>> suppose, I need items sorted by two criteria (say timestamp and
>>>> priority).
>>>> For that purpose, I use two heaps (heapq module):
>>>>
>>>> heapA # items sorted by timestamp
>>>> heapB # items sorted by priority
>>>>
>>>> Now my actual problem. When popping an item of heapA (that's the oldest
>>>> item), I need to remove the very same item from heapB, regardlessly where
>>>> it
>>>> is in heapB. And vice versa.
>>>>
>>>> Is there a datastructure or a simple trick to achieve that in an
>>>> efficient
>>>> matter?
>>>>
>>>> Best,
>>>> Sven
>>>> --
>>>> https://mail.python.org/mailman/listinfo/python-list
>>

Back to comp.lang.python | Previous | Next | Find similar | Unroll thread


Thread

Re: How to remove item from heap efficiently? "Sven R. Kunze" <srkunze@mail.de> - 2016-01-10 19:48 +0100

csiph-web