Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.005 X-Spam-Evidence: '*H*': 0.99; '*S*': 0.00; 'source,': 0.04; 'tree': 0.05; 'consistency': 0.09; 'falls': 0.09; 'function:': 0.09; 'worse': 0.09; 'cc:addr:python-list': 0.11; 'def': 0.12; "wouldn't": 0.14; '"user",': 0.16; '12:59': 0.16; 'fields.)': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'roy': 0.16; 'user-defined': 0.16; 'worse.': 0.16; 'worst': 0.16; 'wrote:': 0.18; 'saying': 0.22; 'cc:addr:python.org': 0.22; 'cc:2**0': 0.24; 'compare': 0.26; 'header:In-Reply-To:1': 0.27; 'am,': 0.29; "doesn't": 0.30; 'involving': 0.30; 'message- id:@mail.gmail.com': 0.30; 'bunch': 0.31; 'convenience': 0.31; 'node': 0.31; 'table': 0.34; 'skip:_ 10': 0.34; 'could': 0.34; 'case,': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'there': 0.35; "you're": 0.61; 'more': 0.64; 'here': 0.66; 'between': 0.67; 'mar': 0.68; 'smith': 0.68; 'records': 0.73; 'million': 0.74; '20x': 0.84; 'songs': 0.91; 'to:none': 0.92 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:cc :content-type; bh=qTg39UKCq3cyvxGL69DkLnfECGaxOrvQHsWoM7/astA=; b=mcJl1Iemb6SppCkGvFjycXNJaS41/d7v13Pw0qbFTVDADuUfESjiJeIQXMVniJBtHL y9FhkF3858tXsmkSjj+/GF6gmowwK5Fsh7ln3lDK/CSnmmCDAgHHxXarXBg31yb8iOF0 F7CeeBc/AlkOJp+PWtP2zTsY4zvyzlc1a8OgCJ8tPwN7+AG7Q/TtlN0Y9vwav1lvatzG tQhfy+uekNcHn/0i4abAK0bWLMXunokx6LHzeIE6q1z8em78HesYc+MTmdOAwYCeAdX3 rk0rLwj3JUJIcGk3uVKB25bma7OugPIcbxPVAwM5gmRBMjU+F8HL0Vpx/nxZU2wSjNZk KJYA== MIME-Version: 1.0 X-Received: by 10.66.118.71 with SMTP id kk7mr41338254pab.14.1394468671357; Mon, 10 Mar 2014 09:24:31 -0700 (PDT) In-Reply-To: References: <87eh2d3x8h.fsf_-_@elektro.pacujo.net> <87eh2ctmht.fsf@elektro.pacujo.net> <87zjkz3eo3.fsf@elektro.pacujo.net> <87a9czrrax.fsf@elektro.pacujo.net> <531d3058$0$29994$c3e8da3$5496439d@news.astraweb.com> <87eh2atw6s.fsf@elektro.pacujo.net> Date: Tue, 11 Mar 2014 03:24:31 +1100 Subject: Re: Balanced trees From: Chris Angelico Cc: "python-list@python.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 27 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1394468681 news.xs4all.nl 2897 [2001:888:2000:d::a6]:38366 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:68157 On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith wrote: > Looking at the Songza source, I see we have one user-defined hash > function: > > def __hash__(self): > return hash((self.song, > self.user, > self.add_time, > self.delete_time, > self.play_first)) > > where those things are 2 bson.ObjectId's, 2 datetimes, and a boolean. I > wouldn't be surprised if that falls into the 20x node traversal bucket. > In this case, convenience was more important than speed, so it doesn't > matter. The only difference between a tree and a hash here is that the tree might be able to short-cut the comparisons. But if there are a whole bunch of songs with the same "song" and "user", then the tree has to compare (song->song? same; user->user? same; add_time->add_time? left/right) multiple times. So what you're saying is that the hash table has consistency but the tree could do better or could do worse. (Imagine, worst case, all one million records have the same song/user/add_time and you need to do twenty comparisons involving four fields. That's gotta be worse than one hashing of five fields.) ChrisA