Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #101445
| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | "Sven R. Kunze" <srkunze@mail.de> |
| Newsgroups | comp.lang.python |
| Subject | Re: How to remove item from heap efficiently? |
| Date | Sun, 10 Jan 2016 19:48:53 +0100 |
| Lines | 113 |
| Message-ID | <mailman.8.1452451745.3151.python-list@python.org> (permalink) |
| References | <568EEC40.7070807@mail.de> <CACs7g=Bjccja67M3t4yL+07vO8iAcOB-dF5p2PgqqBwjFmbh=w@mail.gmail.com> <568FE797.6090808@mail.de> <CACs7g=CE7y-aXSdWbN7Fk0J4O4aRwAH5ad7UR1Jhdf9EpNFnug@mail.gmail.com> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=utf-8; format=flowed |
| Content-Transfer-Encoding | 7bit |
| X-Trace | news.uni-berlin.de S4QhPLXzck3e/C+pTcYdEAp/Wr1gqwSfHNRLjEgWr0ig== |
| Return-Path | <srkunze@mail.de> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.000 |
| X-Spam-Evidence | '*H*': 1.00; '*S*': 0.00; 'memory.': 0.05; 'pop': 0.05; 'convention.': 0.07; 'everybody,': 0.07; 'linear': 0.07; 'cc:addr:python-list': 0.09; 'subject:How': 0.09; 'deletion': 0.09; 'expired': 0.09; 'marking': 0.09; 'span': 0.09; 'tasks,': 0.09; 'thrown': 0.09; 'timestamp': 0.09; 'tuple': 0.09; 'underlying': 0.09; 'url:github': 0.09; 'python': 0.10; 'jan': 0.11; 'pushed': 0.13; '(say': 0.16; '10:15': 0.16; '2016': 0.16; 'amortized': 0.16; 'heap': 0.16; 'i.e': 0.16; 'operation.': 0.16; 'popping': 0.16; 'priority).': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'subject:item': 0.16; 'subject:remove': 0.16; 'suggestion.': 0.16; 'task.': 0.16; 'useless': 0.16; 'wrote:': 0.16; 'basically': 0.18; 'deleted.': 0.18; 'resolved': 0.18; 'skip:l 30': 0.18; '>': 0.18; 'variable': 0.18; '>>>': 0.20; 'cc:2**0': 0.20; 'cc:addr:python.org': 0.20; 'delta': 0.22; 'priorities': 0.22; 'cc:no real name:2**0': 0.22; 'am,': 0.23; 'elements': 0.23; 'insert': 0.23; 'references': 0.23; '(this': 0.24; 'header:In-Reply-To:1': 0.24; 'feature': 0.24; 'header:User- Agent:1': 0.26; "doesn't": 0.26; 'error': 0.27; 'checking': 0.27; 'fri,': 0.27; 'actual': 0.28; 'coded': 0.29; 'crash': 0.29; "i'm": 0.30; 'that.': 0.30; 'url:mailman': 0.30; 'creating': 0.30; 'code': 0.30; 'task': 0.30; 'primary': 0.31; 'posting': 0.32; 'run': 0.33; 'problem': 0.33; 'url:python': 0.33; 'items.': 0.33; 'lets': 0.33; 'url:listinfo': 0.34; 'received:10.0': 0.34; 'running': 0.34; 'add': 0.34; 'server': 0.34; 'list': 0.34; 'best,': 0.35; 'ones': 0.35; 'next': 0.35; 'could': 0.35; 'tasks': 0.35; 'item': 0.35; 'problem.': 0.35; 'reply.': 0.35; 'but': 0.36; 'too': 0.36; 'there': 0.36; 'url:org': 0.36; 'subject:?': 0.36; 'pm,': 0.36; 'subject:: ': 0.37; 'received:10': 0.37; 'two': 0.37; 'being': 0.37; 'method': 0.37; 'say': 0.37; 'thanks': 0.37; 'version': 0.38; 'delete': 0.38; 'minimum': 0.38; 'subject:from': 0.39; 'url:mail': 0.40; 'where': 0.40; 'received:de': 0.40; 'space': 0.40; 'some': 0.40; 'future': 0.60; 'high': 0.60; 'your': 0.60; 'hope': 0.61; 'avoid': 0.61; 'here.': 0.62; 'more': 0.63; 'oldest': 0.66; 'here': 0.66; 'decided': 0.66; 'saving': 0.70; 'technology,': 0.71; 'increase': 0.73; 'links,': 0.76; 'obtained': 0.76; 'low': 0.83; 'complexity': 0.84; 'mention.': 0.84; 'stagnate': 0.84; 'expiration': 0.91; 'expiration.': 0.91; 'items,': 0.91; 'reducing': 0.93; 'approached': 0.95 |
| DKIM-Signature | v=1; a=rsa-sha256; c=relaxed/simple; d=mail.de; s=mail201212; t=1452451737; bh=YXMMcMc9mc68lxFkgPR/LpFmKNmpPhTZkjyrUje/o0g=; h=Subject:To:References:Cc:From:Date:In-Reply-To:From; b=rtpA0q+n47HFsI+HLZYEl5DibkfYcIQkU9wK0edfZfHfPyPNtZgLIA+mPkERhhFoc FM2otUgk25kcSM+4JKWZM95GErfB/tNeofM5u2bCBGmbS/L0dzOUrWTYTvte089C2U HkPG2ufA+jGTzhiDTRZfAoi3YS54qQ8TGxWVKSWg= |
| User-Agent | Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 |
| In-Reply-To | <CACs7g=CE7y-aXSdWbN7Fk0J4O4aRwAH5ad7UR1Jhdf9EpNFnug@mail.gmail.com> |
| X-purgate | clean |
| X-purgate | This mail is considered clean (visit http://www.eleven.de for further information) |
| X-purgate-type | clean |
| X-purgate-Ad | Categorized by eleven eXpurgate (R) http://www.eleven.de |
| X-purgate | This mail is considered clean (visit http://www.eleven.de for further information) |
| X-purgate | clean |
| X-purgate-size | 11622 |
| X-purgate-ID | 154282::1452451736-000009F3-53A7E439/0/0 |
| X-Content-Filtered-By | Mailman/MimeDel 2.1.20+ |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.20+ |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Xref | csiph.com comp.lang.python:101445 |
Show key headers only | View raw
Wow. That's an impressive reply. On 08.01.2016 20:26, srinivas devaki wrote: > So you need a task scheduler with expiration and priority on each task. > Interesting Problem.. > > as peter said about marking the task in heapB to be deleted. but this > needs searching everytime you pop off an old item [complexity: > O(len(heapB))]. you may as well use python del on it as complexity is > same. > But if you already know the where to look in the heapB then you can > avoid searching and thereby reducing the complexity. you can do this > by saving the references of heapB in heapA and viceversa > > and if you marking delete on a number of low priority tasks, then it > can increase your heapB enormously because being low priority items > they can stagnate. to resolve this error you have to perform linear > checking iterations at every critical length (this critical length can > be obtained mathmatically), so that your amortized complexity of push, > pop remains log(number_of_valid_tasks_at_any_time); > the number of valid tasks at any time are those tasks which are not > expired at the time. > > My Algorithm: > version_a: https://gist.github.com/9ce7a0e534c6e768239e > this version has simple scheduler which has methods for popping high > priority one and popping oldest task. > But this version has a disadvantage, i.e If lets say you are pushed > some 10**5 tasks with priority 2, and then pushed some 10**5 tasks > with priority 1. and then you decided to pop the oldest 10**5 items. > in this version that 10**5 elements will stagnate in priority heap if > in future all priorities are less than 1. > now this is not a big issue but if you are running a webserver and > over a span of 5 days there could be 10**10 tasks, and even if half of > those tasks stagnate your server is going to crash with out of memory. > > version_b: https://gist.github.com/99b4d590753ba234eeed > this version resolved that stagnation. this one will run sweeps > whenever there are more than half of useless items, thereby giving us > an amortized complexity of O(log(n)) for push, pop, etc > > but this version doesn't have the feature of expiration. > > version_c: https://gist.github.com/9dfd0d291672c0ffa5c3 > in this one we simply keep a variable expiration, and relieve the > expired tasks on any operation. i have coded it in such a way that > expiration is specific time, you can change it to delta time if you > want to. > > Time Complexity: O(log(n)) insertion, O(log(n)) deletion [amortized] > Space Complexity: O(n) [amortized] > here n is number of valid items i.e which are not expired. > > I hope this helps with your problem Indeed. I already do the sweep method as you suggested. ;) Additionally, you provided me with a reasonable condition when to do the sweep in order to achieve O(log n). Thanks much for that. I currently used a time-bases approached (sweep each 20 iterations). PS: Could you add a note on how you got to the condition ( 2*self.useless_b > len(self.heap_b))? > PS: > sorry for posting links, it's just that the code is large for email. > I'm using minimum number has highest priority convention. I like Web technology, so no problem here. :) > On Fri, Jan 8, 2016 at 10:15 PM, Sven R. Kunze <srkunze@mail.de> wrote: >> Thanks for your suggestion. >> >> On 08.01.2016 14:21, srinivas devaki wrote: >>> You can create a single heap with primary key as timestamp and >>> secondary key as priority, i.e by creating a tuple >>> insert the elements into the heap as >>> (timestamp, priority) >> I think I cannot use that because I need the list sorted by both criteria. >>> If there is any underlying meaning for creating 2 heaps. please mention. >> >> I use two heaps because I need to insert arbitrary items fast and remove the >> ones fast which are too old (timestamp) or are next in order (priority). >> >> Basically a task scheduler where tasks can be thrown away once they are too >> long in the queue. >> >> >>> On Fri, Jan 8, 2016 at 4:22 AM, Sven R. Kunze <srkunze@mail.de> wrote: >>>> Hi everybody, >>>> >>>> suppose, I need items sorted by two criteria (say timestamp and >>>> priority). >>>> For that purpose, I use two heaps (heapq module): >>>> >>>> heapA # items sorted by timestamp >>>> heapB # items sorted by priority >>>> >>>> Now my actual problem. When popping an item of heapA (that's the oldest >>>> item), I need to remove the very same item from heapB, regardlessly where >>>> it >>>> is in heapB. And vice versa. >>>> >>>> Is there a datastructure or a simple trick to achieve that in an >>>> efficient >>>> matter? >>>> >>>> Best, >>>> Sven >>>> -- >>>> https://mail.python.org/mailman/listinfo/python-list >>
Back to comp.lang.python | Previous | Next | Find similar | Unroll thread
Re: How to remove item from heap efficiently? "Sven R. Kunze" <srkunze@mail.de> - 2016-01-10 19:48 +0100
csiph-web