Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'feedback.': 0.04; 'third- party': 0.04; '(python': 0.07; 'cache': 0.07; 'explicit': 0.07; 'modified': 0.07; 'pypy': 0.07; 'versions.': 0.07; 'below).': 0.09; 'caching,': 0.09; 'executed': 0.09; 'inclusion': 0.09; 'keys,': 0.09; 'lst': 0.09; 'meaningful': 0.09; 'part,': 0.09; 'type,': 0.09; 'url:github': 0.09; 'cc:addr:python-list': 0.11; 'python': 0.11; 'bug': 0.12; 'mostly': 0.14; 'random': 0.14; '"create': 0.16; '"from': 0.16; '"create': 0.16; '__lt__': 0.16; 'badly.': 0.16; 'both,': 0.16; 'dwarfed': 0.16; 'from:addr:stutzbach': 0.16; 'from:name:daniel stutzbach': 0.16; 'index.': 0.16; 'keys.': 0.16; 'list",': 0.16; 'mangled': 0.16; 'once.': 0.16; 'ported': 0.16; 'sorting': 0.16; 'stutzbach': 0.16; 'unfair': 0.16; 'url:issues': 0.16; 'wow,': 0.16; '\xc2\xa0i': 0.16; '\xc2\xa0if': 0.16; 'so.': 0.16; 'fix': 0.17; 'wrote:': 0.18; 'thanks.': 0.20; '>>>': 0.22; 'import': 0.22; 'cc:addr:python.org': 0.22; '>>>': 0.24; 'issue,': 0.24; "shouldn't": 0.24; 'typical': 0.24; 'mon,': 0.24; 'cc:2**0': 0.24; 'sort': 0.25; '>': 0.26; '(see': 0.26; 'compare': 0.26; 'daniel': 0.26; 'extension': 0.26; 'first,': 0.26; 'header:In- Reply-To:1': 0.27; 'list:': 0.30; 'originally': 0.30; 'message- id:@mail.gmail.com': 0.30; "i'm": 0.30; 'gives': 0.31; 'explained': 0.31; 'filed': 0.31; 'overhead': 0.31; 'though.': 0.31; 'this.': 0.32; 'run': 0.32; 'limitation': 0.33; 'skip:& 30': 0.33; 'skip:b 30': 0.33; 'there,': 0.34; "i'd": 0.34; 'could': 0.34; 'agree': 0.35; 'classes': 0.35; 'created': 0.35; 'something': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'there': 0.35; 'really': 0.36; 'thanks': 0.36; 'similar': 0.36; 'should': 0.36; 'too': 0.37; 'two': 0.37; 'list': 0.37; 'list.': 0.37; 'step': 0.37; 'represent': 0.38; 'skip:& 10': 0.38; 'pm,': 0.38; 'rather': 0.38; 'expect': 0.39; 'skip:& 20': 0.39; 'aspects': 0.39; 'though,': 0.39; 'changed': 0.39; 'skip:x 10': 0.40; 'how': 0.40; 'even': 0.60; 'skip:\xc2 10': 0.60; 'lower': 0.61; "you're": 0.61; 'first': 0.61; "you'll": 0.62; 'kind': 0.63; 'myself': 0.63; 'pick': 0.64; 'more': 0.64; 'different': 0.65; 'love': 0.65; 'relatively': 0.65; 'between': 0.67; 'mar': 0.68; 'advantages': 0.68; 'improvement.': 0.68; 'results': 0.69; 'risk': 0.72; 'obvious': 0.74; 'ranges': 0.74; 'benchmark': 0.84; 'claim.': 0.84; 'difference.': 0.84; 'obvious.': 0.84 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=idIH7PG199AiEGbScbCAIE/2GDlOWqjjRDiahIOJW5Q=; b=ff+mU9fetBAoEQh6Lm9kNyMpEBwLa0KUsHMYsfesvql9600Yx0u3687r/MdL9Xz4cL SShCKKWRFvdLOgL5iCet7zqwq6vs+kuM43fS1o0vAetTxL/uJKKszX37rkEg1Nj2+SiM a0dwzm5kUmudDgfnn58MevOJrwo+xFD/tj8kJmAsENPFDF/pR9wBmai5J2NIEUqzC5yf DrX0LHlVGizSYyA6AByY4L5O3FbLuFHXFjk9QzRvBUcn56puPVktQXmIBhxw9qi/yy+K 0pwBtBmqI/u1UuBgPMgOGlJanrajWpdbY+ad3XkTYI77c2OPNvL0CQcvRdKGZPnK1atf fHrQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-type; bh=idIH7PG199AiEGbScbCAIE/2GDlOWqjjRDiahIOJW5Q=; b=g4+0g6q9judPDwmvJErqbB6iFnjbHxXvZ/7iV+587owagDgQ1C1aef7x2uxYhlec86 xoeA6ItNvv5ECEK/6YIYNqbivmxjYj2xMuoeglE10O7eeUAd2MortuYEo3oI5KbvM3dq ygLS02ggCJgUDT5eCFNDKxRb2vDR5UtOy0YqCjL5n1PPS61gpkJ3zDB2jgL+cg3LFR5z +hcis3keBKZLA7enP50ywbulhmUx/zLrR5cErrSXRthGp525t5VxtV0us+q7Z8thdBzg Q5/Mu8fdFFMnZHDvK4fkIItLzZ/XkM4uzY5O6GEl/rlQeHwvDna404E67BqXW26taRkx 2kZA== X-Gm-Message-State: ALoCoQm4hodImezaE6lKnX7qhp/jdp3A3BK8//7xKDtyfSzI+/IBBqdk9TkOazghxoxNSKHjXk/xCXR12ytIXK/NAhf0F8R1GC1CUuyWZgdAJZTuEaiwuNGJRdyAwjqoO1uqdXc0NPeeb/AW/sM/GMnflOonp20K5RofcdwOZR4nHkYB8/dtwdngP8TmllwdGwjOpNGQOZVk4KG2wr5KeB7L8zjHrme45A== X-Received: by 10.60.142.166 with SMTP id rx6mr13620oeb.57.1395104524997; Mon, 17 Mar 2014 18:02:04 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <87eh2d3x8h.fsf_-_@elektro.pacujo.net> From: Daniel Stutzbach Date: Mon, 17 Mar 2014 18:01:24 -0700 Subject: Re: Balanced trees To: Joshua Landau Content-Type: multipart/alternative; boundary=047d7b2e0b29a3b10004f4d715b1 Cc: python-list X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 226 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1395104535 news.xs4all.nl 2863 [2001:888:2000:d::a6]:35348 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:68474 --047d7b2e0b29a3b10004f4d715b1 Content-Type: text/plain; charset=UTF-8 On Mon, Mar 17, 2014 at 5:08 PM, Joshua Landau wrote: > Thanks. First, I want to state that there are two aspects to my > claim. The first is that these benchmarks to not represent typical > use-cases. I will not go too far into this, though, because it's > mostly obvious. > I would love to have include macro-benchmarks. I keep waiting for the PyPy benchmark suite to get ported to Python 3... > "Create from an iterator" gives me relatively different results when I > run it (Python 3). > The graphs were originally created to compare vanilla Python with a Python modified to use blist as the built-in list type. I think I used Python 3.1, but I'm not certain. As I recall, the built-in type has a few small advantages over any third-party extension type, so that might be what you're seeing. Alternately, something may have changed between Python versions. > "Delete a slice" is fudged from its inclusion of multiplication, which > is far faster on blists. I admit that it's not obvious how to fix > this. > I could move the initialization into the timed part, similar to what I did for sort (see below). That has downsides too, of course, but it might be an improvement. > "First in, first out (FIFO)" should be "x.append(0); x.pop(0)". > Wow, I mangled that one badly. > "Last in, first out (LIFO)" should use "pop()" over "pop(-1)", > although I admit it shouldn't make a meaningful difference. > I like pop(-1) because it's explicit rather than implicit. I agree it shouldn't make a meaningful difference. > "Sort *" are really unfair because they put initialisation in the > timed part That's a limitation of timeit. The setup step is only executed once. If I put the initialization there, every sort after the first one would be sorting a pre-sorted list. If you compare the "Create form an iterator" and "Sort a random list", you'll see that the initialization cost is dwarfed by the sorting cost for n > 15 or so. > and all have keys. If you use classes with __lt__ methods instead of keys, the cost is dominated by the calls to __lt__. You're right that I should include both, though. > >>> python -m timeit -s "from random import choice; import blist; lst = > blist.blist(range(10**0))" "choice(lst)" > 1000000 loops, best of 3: 1.18 usec per loop > > >>> python -m timeit -s "from random import choice; import blist; lst = > blist.blist(range(10**8))" "choice(lst)" > 1000000 loops, best of 3: 1.56 usec per loop > > Lower size ranges are hidden by the function-call overhead. > Perhaps this effect is to do with caching, in which case the limits of > the cache should be explained more readily. > That's definitely a cache issue, which is always a risk with micro-benchmarks. I see growth even for the built-in list: gnusto:~$ python -m timeit -s "from random import choice; lst = list(range(10**0))" "choice(lst)" 1000000 loops, best of 3: 0.349 usec per loop gnusto:~$ python -m timeit -s "from random import choice; lst = list(range(10**8))" "choice(lst)" 1000000 loops, best of 3: 0.634 usec per loop I agree it's more interesting to pick items randomly instead of always querying the same index. The overhead of choice() is kind of a problem, though. Since I'm only plotting up to 10**5, I'd expect these to look more or less flat. Thanks for all of the feedback. I filed a bug with myself to improve the metrics: https://github.com/DanielStutzbach/blist/issues/64 -- Daniel Stutzbach --047d7b2e0b29a3b10004f4d715b1 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
On M= on, Mar 17, 2014 at 5:08 PM, Joshua Landau <joshua@landau.ws>= wrote:
Thanks. =C2=A0First, I want to state that there are two aspects= to my
claim. The first is that these benchmarks to not represent typical
use-cases. I will not go too far into this, though, because it's
mostly obvious.

I would love to have in= clude macro-benchmarks. =C2=A0I keep waiting for the PyPy benchmark suite t= o get ported to Python 3...
=C2=A0
"Create from an iterator" gives me relatively different results w= hen I
run it (Python 3).

The graphs were orig= inally created to compare vanilla Python with a Python modified to use blis= t as the built-in list type. =C2=A0I think I used Python 3.1, but I'm n= ot certain. =C2=A0As I recall, the built-in type has a few small advantages= over any third-party extension type, so that might be what you're seei= ng. =C2=A0Alternately, something may have changed between Python versions.<= /div>
=C2=A0
"Delete a slice" is fudged from its inclusion of multiplication, = which
is far faster on blists. I admit that it's not obvious how to fix
this.

I could move the initialization i= nto the timed part, similar to what I did for sort (see below). =C2=A0That = has downsides too, of course, but it might be an improvement.
=C2= =A0
"First in, first out (FIFO)" should be "x.append(0); x.pop(0= )".

Wow, I mangled that one badly.=
=C2=A0
"Last in, first out (LIFO)" should use "pop()" over &qu= ot;pop(-1)",
although I admit it shouldn't make a meaningful difference.

I like pop(-1) because it's explicit rather th= an implicit. =C2=A0I agree it shouldn't make a meaningful difference.
=C2=A0
"Sort *" are really unfair because they put initialisation in the=
timed part

That's a limitation of timei= t. =C2=A0The setup step is only executed once. =C2=A0If I put the initializ= ation there, every sort after the first one would be sorting a pre-sorted l= ist. =C2=A0If you compare the "Create form an iterator" and "= ;Sort a random list", you'll see that the initialization cost is d= warfed by the sorting cost for n > 15 or so.
=C2=A0
and all have keys.

=
If you use classes with __lt__ methods instead of keys, the cost is do= minated by the calls to __lt__. =C2=A0You're right that I should includ= e both, though.
=C2=A0
>>> python -m timeit -s "from random import choice; import bl= ist; lst =3D blist.blist(range(10**0))" "choice(lst)"
1000000 loops, best of 3: 1.18 usec per loop

>>> python -m timeit -s "from random import choice; import bl= ist; lst =3D blist.blist(range(10**8))" "choice(lst)"
1000000 loops, best of 3: 1.56 usec per loop

Lower size ranges are hidden by the function-call overhead.
Perhaps this effect is to do with caching, in which case the limits of
the cache should be explained more readily.

=
That's definitely a cache issue, which is always a risk with micro= -benchmarks. =C2=A0I see growth even for the built-in list:

<= /div>
gnusto:~$ python -m timeit -s "from random import choic= e; lst =3D list(range(10**0))" "choice(lst)"
10000= 00 loops, best of 3: 0.349 usec per loop
gnusto:~$ python -= m timeit -s "from random import choice; lst =3D list(range(10**8))&quo= t; "choice(lst)"
1000000 loops, best of 3: 0.634 usec per loop

I agree it's more interesting to pick items randomly instead of = always querying the same index. =C2=A0The overhead of choice() is kind of a= problem, though. =C2=A0Since I'm only plotting up to 10**5, I'd ex= pect these to look more or less flat.

Thanks for all of the feedback. =C2=A0I filed a bug wit= h myself to improve the metrics:

--
Daniel Stutzbach
--047d7b2e0b29a3b10004f4d715b1--