From: Joshua Landau
Date: Sat, 15 Mar 2014 01:13:13 +0000
Subject: Re: Balanced trees
Newsgroups: comp.lang.python

On 8 March 2014 20:37, Mark Lawrence wrote:
> I've found this link useful
> http://kmike.ru/python-data-structures/
>
> I also don't want all sorts of data structures added to the Python library.
> I believe that there are advantages to leaving specialist data structures on
> pypi or other sites, plus it means Python in a Nutshell can still fit in
> your pocket and not a 40 ton articulated lorry, unlike the Java equivalent.

The thing we really need is for the blist containers to become part of the stdlib (but not to replace the current list implementation).

The rejected PEP (http://legacy.python.org/dev/peps/pep-3128/) misses a few important points, largely about how the "log(n)" has a really large base: random.choice on a blist went from 1.2µs at n=1 to 1.6µs at n=10⁸, versus 1.2µs for a standard list.

Further, it's worth considering a few advantages:

* copy is O(1), so code can avoid mutating its input by simply copying it, which is good practice.

* FIFO is effectively O(1): the time just about doubles from n=1 to n=10⁸, so the tree never actually branches that much. There is still a speed benefit to collections.deque, but it's much, much less significant. This is very useful when using blist as a multi-purpose data structure, and it removes the demand for explicit linked lists (which have foolishly been reimplemented loads of times).

* It reduces the demand for trees:
    * There are efficient implementations of sortedlist, sortedset and sorteddict.
    * Slicing, slice assignment and slice deletion are really fast.
    * Addition of lists is sublinear. Instead of `list(itertools.chain(...))`, one can add in a loop and end up *faster*.

I think blist isn't very popular not because it isn't really good, but because it isn't a specialised structure. It is, however, close to good enough in almost every circumstance. That generality can help keep the standard library clean, especially of tree data structures.

Here's what we kill:

* Linked lists and doubly-linked lists, which are scarily popular for whatever reason.
  Sometimes people claim that collections.deque isn't powerful enough for whatever they want, and blist would almost certainly satisfy those cases.

* Balanced trees, with blist.sortedlist. This is actually needed right now.

* Poor performance in cases where a lot of list merging and pruning happens.

* Most uses of bisect.

* Some instances where two data structures are used in parallel in order to keep disparate operations fast (like `x in y` and `y[i]`).

Now, I understand there are downsides to blist. In particular, I've looked through the "benchmarks" and they seem untruthful. Further, we'd need a maintainer. Finally, nobody jumps at blists because they're rarely the obvious solution; rather, they attempt to be a different general solution. Hopefully, though, inclusion in the stdlib would make them a lot more obvious, and support in some current libraries could make them feel more at home.

I don't know whether this is a good idea, but I do feel that it is more promising and general than having a graph in the standard library.
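To put numbers on the "log(n) has a really large base" point: a B+tree whose nodes hold `fanout` children needs only about log_fanout(n) levels. The sketch below assumes a fanout of 128 purely for illustration (check blist's own source for its real node size) and counts the levels needed for n=10⁸ items, next to a binary tree for comparison. `tree_levels` is a hypothetical helper, not part of blist.

```python
def tree_levels(n: int, fanout: int) -> int:
    """Smallest number of levels L such that fanout**L >= n (at least 1).

    This is the depth of a full tree whose nodes each have `fanout`
    children, i.e. roughly ceil(log_fanout(n)), computed with integers
    to avoid floating-point rounding at exact powers.
    """
    if n < 1 or fanout < 2:
        raise ValueError("need n >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    while capacity < n:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    n = 10**8
    # 128 is an assumed fanout for illustration, not a quoted blist constant.
    print("fanout 128:", tree_levels(n, 128))  # 4 levels
    print("fanout 2:  ", tree_levels(n, 2))    # 27 levels
```

Four levels against twenty-seven is why a wide tree can behave close to constant time in practice, although real timings also depend on per-node overhead and caching.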
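The FIFO point is easiest to judge against the two stdlib baselines: `list.pop(0)`, which shifts every remaining element and so costs O(n) per call, and `collections.deque.popleft`, which is O(1). blist itself is left out because it's a third-party package; the function names are made up for this sketch, and the timings printed will vary with machine and Python version.

```python
from collections import deque
from timeit import timeit


def fifo_with_list(n: int) -> list:
    """Enqueue n items, then dequeue them oldest-first using a built-in list."""
    queue = list(range(n))
    out = []
    while queue:
        out.append(queue.pop(0))  # shifts every remaining element: O(n) each
    return out


def fifo_with_deque(n: int) -> list:
    """The same FIFO pattern using collections.deque."""
    queue = deque(range(n))
    out = []
    while queue:
        out.append(queue.popleft())  # O(1) at either end
    return out


if __name__ == "__main__":
    for n in (1_000, 10_000):
        t_list = timeit(lambda: fifo_with_list(n), number=5)
        t_deque = timeit(lambda: fifo_with_deque(n), number=5)
        print(f"n={n}: list {t_list:.4f}s, deque {t_deque:.4f}s")
```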
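For the list-addition point, here are the two idioms being compared, written against built-in lists. With built-in lists the addition loop is the slow one, because each `+` builds a new list and copies the accumulated result; the post's claim is that blist's sublinear addition reverses that ranking. Both function names are hypothetical.

```python
import itertools
from typing import Sequence


def concat_with_chain(parts: Sequence[list]) -> list:
    # One pass over every element: O(total length) for built-in lists.
    return list(itertools.chain(*parts))


def concat_with_addition(parts: Sequence[list]) -> list:
    # Each `result + part` copies everything accumulated so far, so with
    # built-in lists the total work grows roughly with
    # (total length) x (number of parts). `result += part` would extend
    # in place instead; `+` is used here to match the idiom in the post.
    result: list = []
    for part in parts:
        result = result + part
    return result
```

Neither function mutates its input lists, so the two can be compared directly on the same data.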
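"Most uses of bisect" typically look something like the following: a plain list kept sorted with the `bisect` module. Searches are O(log n), but every insertion or deletion shifts the tail of the list, which is O(n). A tree-backed sortedlist aims to make those updates logarithmic as well. `BisectSortedList` and its method names are hypothetical choices for this sketch; they are not meant to mirror blist's API.

```python
import bisect
from typing import Any, Iterable, Iterator, List


class BisectSortedList:
    """Hypothetical sorted container built on a plain list plus bisect.

    Membership and lookup use binary search (O(log n)), but add/remove
    are O(n) because the underlying list must shift later elements.
    """

    def __init__(self, items: Iterable[Any] = ()) -> None:
        self._items: List[Any] = sorted(items)

    def add(self, value: Any) -> None:
        bisect.insort(self._items, value)  # O(log n) search + O(n) shift

    def remove(self, value: Any) -> None:
        i = bisect.bisect_left(self._items, value)
        if i == len(self._items) or self._items[i] != value:
            raise ValueError(f"{value!r} not in list")
        del self._items[i]  # O(n) shift

    def __contains__(self, value: Any) -> bool:
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value

    def __getitem__(self, index: Any) -> Any:
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._items)
```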
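The last item in the list, two data structures kept in parallel so that both `x in y` and `y[i]` stay fast, usually looks like this: a list for positional access alongside a set for membership. The class below is a hypothetical illustration of that bookkeeping, not code from blist or the stdlib; the post's argument is that a single general-purpose container supporting both operations efficiently would make it unnecessary.

```python
from typing import Hashable, Iterable, List, Set


class IndexedSet:
    """Hypothetical container keeping a list and a set in parallel.

    The list gives O(1) positional access (s[i]); the set gives O(1)
    average membership tests (x in s). Every mutation must update both.
    """

    def __init__(self, items: Iterable[Hashable] = ()) -> None:
        self._order: List[Hashable] = []
        self._members: Set[Hashable] = set()
        for item in items:
            self.add(item)

    def add(self, item: Hashable) -> None:
        if item not in self._members:  # keep set semantics: no duplicates
            self._order.append(item)
            self._members.add(item)

    def remove(self, item: Hashable) -> None:
        self._members.remove(item)  # raises KeyError if absent
        self._order.remove(item)    # O(n): linear search plus shift

    def __contains__(self, item: object) -> bool:
        return item in self._members

    def __getitem__(self, index: int) -> Hashable:
        return self._order[index]

    def __len__(self) -> int:
        return len(self._order)
```

Note that `remove` is still O(n) because of the list, which is exactly the kind of trade-off that pushes people towards maintaining parallel structures in the first place.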