Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed4a.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <87wqfrp3jk.fsf@elektro.pacujo.net>
References: <lennh4$kpm$1@cabale.usenet-fr.net> <CAPTjJmomgZSFj7TanBn8_qXP4ULhCm85abJ5=2PcikXJkrH6GQ@mail.gmail.com> <CAN1F8qU8AJ+Z-Rb4+wGsYWd_bS5aOA3bsjd5NjF46MUvnecSeQ@mail.gmail.com> <mailman.7473.1393598638.18130.python-list@python.org> <XnsA2E95FA1E1EB6duncanbooth@127.0.0.1> <bnvctpF5vanU1@mid.individual.net> <mailman.7920.1394252278.18130.python-list@python.org> <87eh2d3x8h.fsf_-_@elektro.pacujo.net> <CAGGBd_qU3Zp3A4pymnDQfWynWZwFVrdHJpG=U0WZTap4HiymdA@mail.gmail.com> <lffv32$mqo$1@ger.gmane.org> <mailman.8154.1394846042.18130.python-list@python.org> <87mwgoqy4k.fsf@elektro.pacujo.net> <mailman.8257.1395170778.18130.python-list@python.org> <8738ifqlaw.fsf@elektro.pacujo.net> <mailman.8263.1395179161.18130.python-list@python.org> <87wqfrp3jk.fsf@elektro.pacujo.net>
Date: Tue, 18 Mar 2014 15:21:28 -0700
Subject: Re: Balanced trees
From: Dan Stromberg <drsalists@gmail.com>
To: Marko Rauhamaa <marko@pacujo.net>
Content-Type: text/plain; charset=ISO-8859-1
Cc: Python List <python-list@python.org>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.8264.1395181297.18130.python-list@python.org>
Lines: 24
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:68523

On Tue, Mar 18, 2014 at 3:03 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Dan Stromberg <drsalists@gmail.com>:
> For a proper comparison, I'd like a fixed, identical dataset and set of
> operations run against each data structure.
>
> How about this test program:

I used to do essentially this, but it was time-prohibitive and
produced harder-to-read graphs - harder to read because the enormous
values of the bad trees were dwarfing the values of the good trees.

Imagine running 100,000,000-operation tests against the unbalanced
binary tree. For a series of random keys, it would do quite well
(probably second only to dict), but for a series of sequential keys it
would take longer than anyone would reasonably want to wait, because it
degenerates into what is basically a storage-inefficient linked list.
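
To make the linked-list point concrete, here is a minimal sketch of a
plain unbalanced binary search tree. It is not the code from my
benchmark; the names Node, insert, height and build are made up for
this illustration, and n is kept small so that it finishes quickly.

```python
import random


class Node:
    """One node of a plain, unbalanced binary search tree (illustrative)."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key iteratively (no rebalancing) and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # duplicate key: nothing to do


def height(root):
    """Return the tree height in nodes, using an explicit stack."""
    deepest = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if node.left is not None:
            stack.append((node.left, depth + 1))
        if node.right is not None:
            stack.append((node.right, depth + 1))
    return deepest


def build(keys):
    """Build a tree by inserting keys in the order given."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


n = 1000  # small on purpose; an assumption for this demo only

# Sequential keys: every insert walks the whole right spine.
sequential_height = height(build(range(n)))

# The same keys in shuffled order (fixed seed for repeatability).
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
random_height = height(build(shuffled))

print("sequential:", sequential_height, "random:", random_height)
```

With sequential keys the height equals n, so each insert costs O(n) and
the whole run costs O(n^2). With shuffled keys the height stays far
smaller, which is why the random-key workload does well.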

Rather than throw out the unbalanced binary tree altogether, it makes
more sense to run it until it gets "too slow".
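
A rough sketch of the "run it until it gets too slow" idea follows. It
is only an illustration under stated assumptions, not my actual
benchmark harness: run_until_too_slow, its max_seconds cutoff, and the
choice to keep the first over-limit measurement are all hypothetical.

```python
import time


def run_until_too_slow(workload, sizes, max_seconds,
                       clock=time.perf_counter):
    """Time workload(n) for each n in sizes, in order.

    Stops after the first run that takes longer than max_seconds, so a
    badly behaved data structure is dropped before it gets much worse.
    Returns a list of (n, elapsed_seconds) pairs, including the run
    that exceeded the limit (an assumption of this sketch).
    """
    results = []
    for n in sizes:
        start = clock()
        workload(n)
        elapsed = clock() - start
        results.append((n, elapsed))
        if elapsed > max_seconds:
            break  # too slow: skip all larger sizes
    return results


def dict_workload(n):
    """Example workload: insert n sequential keys into a dict."""
    d = {}
    for key in range(n):
        d[key] = None


# Every data structure gets the same sizes and the same cutoff.
sizes = [10 ** k for k in range(1, 5)]
for n, elapsed in run_until_too_slow(dict_workload, sizes, 1.0):
    print(n, elapsed)
```

Because every structure sees the same sizes and the same cutoff, the
comparison stays fair; the slow ones simply contribute fewer points.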

The workload+interpreter pairs are all tested the same way; it's just
that the ones that are doing badly are thrown out before they're able
to get a lot worse. Studying the graphs will likely help develop an
intuition for what's happening.