Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #101445
| From | "Sven R. Kunze" <srkunze@mail.de> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: How to remove item from heap efficiently? |
| Date | 2016-01-10 19:48 +0100 |
| Message-ID | <mailman.8.1452451745.3151.python-list@python.org> (permalink) |
| References | <568EEC40.7070807@mail.de> <CACs7g=Bjccja67M3t4yL+07vO8iAcOB-dF5p2PgqqBwjFmbh=w@mail.gmail.com> <568FE797.6090808@mail.de> <CACs7g=CE7y-aXSdWbN7Fk0J4O4aRwAH5ad7UR1Jhdf9EpNFnug@mail.gmail.com> |
Wow. That's an impressive reply. On 08.01.2016 20:26, srinivas devaki wrote: > So you need a task scheduler with expiration and priority on each task. > Interesting Problem.. > > as peter said about marking the task in heapB to be deleted. but this > needs searching everytime you pop off an old item [complexity: > O(len(heapB))]. you may as well use python del on it as complexity is > same. > But if you already know the where to look in the heapB then you can > avoid searching and thereby reducing the complexity. you can do this > by saving the references of heapB in heapA and viceversa > > and if you marking delete on a number of low priority tasks, then it > can increase your heapB enormously because being low priority items > they can stagnate. to resolve this error you have to perform linear > checking iterations at every critical length (this critical length can > be obtained mathmatically), so that your amortized complexity of push, > pop remains log(number_of_valid_tasks_at_any_time); > the number of valid tasks at any time are those tasks which are not > expired at the time. > > My Algorithm: > version_a: https://gist.github.com/9ce7a0e534c6e768239e > this version has simple scheduler which has methods for popping high > priority one and popping oldest task. > But this version has a disadvantage, i.e If lets say you are pushed > some 10**5 tasks with priority 2, and then pushed some 10**5 tasks > with priority 1. and then you decided to pop the oldest 10**5 items. > in this version that 10**5 elements will stagnate in priority heap if > in future all priorities are less than 1. > now this is not a big issue but if you are running a webserver and > over a span of 5 days there could be 10**10 tasks, and even if half of > those tasks stagnate your server is going to crash with out of memory. > > version_b: https://gist.github.com/99b4d590753ba234eeed > this version resolved that stagnation. this one will run sweeps > whenever there are more than half of useless items, thereby giving us > an amortized complexity of O(log(n)) for push, pop, etc > > but this version doesn't have the feature of expiration. > > version_c: https://gist.github.com/9dfd0d291672c0ffa5c3 > in this one we simply keep a variable expiration, and relieve the > expired tasks on any operation. i have coded it in such a way that > expiration is specific time, you can change it to delta time if you > want to. > > Time Complexity: O(log(n)) insertion, O(log(n)) deletion [amortized] > Space Complexity: O(n) [amortized] > here n is number of valid items i.e which are not expired. > > I hope this helps with your problem Indeed. I already do the sweep method as you suggested. ;) Additionally, you provided me with a reasonable condition when to do the sweep in order to achieve O(log n). Thanks much for that. I currently used a time-bases approached (sweep each 20 iterations). PS: Could you add a note on how you got to the condition ( 2*self.useless_b > len(self.heap_b))? > PS: > sorry for posting links, it's just that the code is large for email. > I'm using minimum number has highest priority convention. I like Web technology, so no problem here. :) > On Fri, Jan 8, 2016 at 10:15 PM, Sven R. Kunze <srkunze@mail.de> wrote: >> Thanks for your suggestion. >> >> On 08.01.2016 14:21, srinivas devaki wrote: >>> You can create a single heap with primary key as timestamp and >>> secondary key as priority, i.e by creating a tuple >>> insert the elements into the heap as >>> (timestamp, priority) >> I think I cannot use that because I need the list sorted by both criteria. >>> If there is any underlying meaning for creating 2 heaps. please mention. >> >> I use two heaps because I need to insert arbitrary items fast and remove the >> ones fast which are too old (timestamp) or are next in order (priority). >> >> Basically a task scheduler where tasks can be thrown away once they are too >> long in the queue. >> >> >>> On Fri, Jan 8, 2016 at 4:22 AM, Sven R. Kunze <srkunze@mail.de> wrote: >>>> Hi everybody, >>>> >>>> suppose, I need items sorted by two criteria (say timestamp and >>>> priority). >>>> For that purpose, I use two heaps (heapq module): >>>> >>>> heapA # items sorted by timestamp >>>> heapB # items sorted by priority >>>> >>>> Now my actual problem. When popping an item of heapA (that's the oldest >>>> item), I need to remove the very same item from heapB, regardlessly where >>>> it >>>> is in heapB. And vice versa. >>>> >>>> Is there a datastructure or a simple trick to achieve that in an >>>> efficient >>>> matter? >>>> >>>> Best, >>>> Sven >>>> -- >>>> https://mail.python.org/mailman/listinfo/python-list >>
Back to comp.lang.python | Previous | Next | Find similar | Unroll thread
Re: How to remove item from heap efficiently? "Sven R. Kunze" <srkunze@mail.de> - 2016-01-10 19:48 +0100
csiph-web