Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed3a.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.022 X-Spam-Evidence: '*H*': 0.96; '*S*': 0.00; 'argument': 0.05; 'tree': 0.05; '(b)': 0.07; 'indeed,': 0.09; 'indexes': 0.09; 'linear': 0.09; 'lookup': 0.09; 'cc:addr:python-list': 0.11; '12:59': 0.16; 'argument.': 0.16; 'begin.': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'roy': 0.16; 'index': 0.16; 'wrote:': 0.18; 'else,': 0.19; '(the': 0.22; 'memory': 0.22; 'cc:addr:python.org': 0.22; '(a)': 0.24; 'cc:2**0': 0.24; 'sort': 0.25; "i've": 0.25; 'extension': 0.26; 'somewhere': 0.26; 'header :In-Reply-To:1': 0.27; 'am,': 0.29; 'message-id:@mail.gmail.com': 0.30; "i'm": 0.30; 'node': 0.31; 'probably': 0.32; 'operations': 0.35; 'received:google.com': 0.35; 'really': 0.36; 'disk': 0.36; 'ram': 0.36; 'example,': 0.37; 'two': 0.37; 'minimum': 0.38; 'problems': 0.38; 'improving': 0.38; 'handle': 0.38; 'expensive': 0.39; 'sure': 0.39; 'read': 0.60; 'chain': 0.60; "you're": 0.61; 'advanced': 0.63; 'mar': 0.68; 'beat': 0.68; 'reads': 0.68; 'smith': 0.68; 'records,': 0.69; 'walk': 0.74; 'million': 0.74; 'collision': 0.84; 'pages),': 0.84; 'thing,': 0.91; 'to:none': 0.92; 'reducing': 0.93 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:cc :content-type; bh=jyj7HToTcYj65w0431xRFZebxkdQ36b1SstC1BHqPfo=; b=gztaWZfshD3h7hnDTm/bJWjpeMiePTCVlXYSVHDWLHy7f6AXO65y3w1syHTD+EgfHI 6fOZ7pHE62wFKS36jXcIx82Ib4tWZ/Yp6Vn2K/QyImvc/lzXn5duuMwL/OweGNiWelQP 2rRYkjTo/cl1K7xg+jr+f+WYRyCd43OjzdYucrcy3Ju2XVPUt3QKLV0Q8+ATFR9plQy8 8rzFl4fcEJsktSpZ5BBDk/SXjaAjwYRX6pihOo0y4awNHaYHnqzXbzXHKnLytYUKeO2f pEzkutLtN1QCEJ7oXmdwxiH/FaLdwe1iWKHV0ITabrASbb4HfR2qw2gPKite6J69D2FB i1EQ== MIME-Version: 1.0 X-Received: by 10.68.112.164 with SMTP id ir4mr5049507pbb.153.1394468405600; Mon, 10 Mar 2014 09:20:05 -0700 (PDT) In-Reply-To: References: <87eh2d3x8h.fsf_-_@elektro.pacujo.net> <87eh2ctmht.fsf@elektro.pacujo.net> <87zjkz3eo3.fsf@elektro.pacujo.net> <87a9czrrax.fsf@elektro.pacujo.net> <531d3058$0$29994$c3e8da3$5496439d@news.astraweb.com> <87eh2atw6s.fsf@elektro.pacujo.net> Date: Tue, 11 Mar 2014 03:20:05 +1100 Subject: Re: Balanced trees From: Chris Angelico Cc: "python-list@python.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 22 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1394468411 news.xs4all.nl 2850 [2001:888:2000:d::a6]:33188 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:68156 On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith wrote: > The hash vs. tree argument can get very complicated. For example, if > your tree is not completely resident in memory, the cost of paging in a > node will swamp everything else, and improving lookup speed will boil > down to reducing the number of I/O operations you need. An expensive > hash plus a linear walk through a collision chain which was resident in > a single memory block will beat traversing two nodes, each of which had > to be paged in separately. Indeed, which is broadly an extension of the "cache locality" argument. I've never actually yearned for any of the advanced operations a tree can give (over a hash table). Usually, by the time I'm looking for that sort of thing, I really want an on-disk database - that solves all the problems of paging (the DBMS will make sure it reads in a minimum of index and data pages), and a good SQL database can handle multiple indexes in a space-efficient way. Plus, what you said about log 1,000,000? By the time you're looking at a million records, you probably (a) need to have them on disk somewhere anyway, and (b) don't want to read them all into RAM before you begin. ChrisA