Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #45770 > unrolled thread

Ordered dictionaries compared

Started byDan Stromberg <drsalists@gmail.com>
First post2013-05-22 20:31 -0700
Last post2013-05-24 00:57 +0100
Articles 4 — 2 participants

Back to article view | Back to comp.lang.python


Contents

  Ordered dictionaries compared Dan Stromberg <drsalists@gmail.com> - 2013-05-22 20:31 -0700
    Re: Ordered dictionaries compared duncan smith <buzzard@invalid.invalid> - 2013-05-23 17:41 +0100
      Re: Ordered dictionaries compared Dan Stromberg <drsalists@gmail.com> - 2013-05-23 10:44 -0700
        Re: Ordered dictionaries compared duncan smith <buzzard@invalid.invalid> - 2013-05-24 00:57 +0100

#45770 — Ordered dictionaries compared

FromDan Stromberg <drsalists@gmail.com>
Date2013-05-22 20:31 -0700
SubjectOrdered dictionaries compared
Message-ID<mailman.1989.1369279892.3114.python-list@python.org>

[Multipart message — attachments visible in raw view] — view raw

What kind of ordered dictionaries?  Sorted by key.

I've redone the previous comparison, this time with a better red-black tree
implementation courtesy of Duncan G. Smith.

The comparison is at
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/just-trees/

The Red-Black tree gave a much better showing this time, but it gave just
one 2nd place on one workload-interpreter - still kinda lackluster.  It
took 1st place 0 times.

[toc] | [next] | [standalone]


#45822

Fromduncan smith <buzzard@invalid.invalid>
Date2013-05-23 17:41 +0100
Message-ID<519e46c0$0$26690$862e30e2@ngroups.net>
In reply to#45770
On 23/05/13 04:31, Dan Stromberg wrote:
>
> What kind of ordered dictionaries?  Sorted by key.
>
> I've redone the previous comparison, this time with a better red-black
> tree implementation courtesy of Duncan G. Smith.
>
> The comparison is at
> http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/just-trees/
>
> The Red-Black tree gave a much better showing this time, but it gave
> just one 2nd place on one workload-interpreter - still kinda
> lackluster.  It took 1st place 0 times.
>
>

A quick test of my Red Black Tree and Treap (Python 2.7).


 >>> def test_trees(data, randomize=True):
	cpy = data[:] # for deletion
	if randomize:
		random.shuffle(data)
		random.shuffle(cpy)
	t = binary_trees.RedBlackTree()
	start = time.time()
	for datum in data:
		t.insert(datum)
	print 'Red Black Tree insertion %s' % (time.time() - start)
	start = time.time()
	for datum in data:
		t.find(datum)
	print 'Red Black Tree find %s' % (time.time() - start)
	start = time.time()
	for datum in cpy:
		t.delete(datum)
	print 'Red Black Tree deletion %s' % (time.time() - start)
	t = binary_trees.Treap()
	start = time.time()
	for datum in data:
		t.insert(datum)
	print
	print 'Treap insertion %s' % (time.time() - start)
	start = time.time()
	for datum in data:
		t.find(datum)
	print 'Treap find %s' % (time.time() - start)
	start = time.time()
	for datum in cpy:
		t.delete(datum)
	print 'Treap deletion %s' % (time.time() - start)

	
 >>> test_trees(range(100000))
Red Black Tree insertion 5.42807197571
Red Black Tree find 1.58799219131
Red Black Tree deletion 3.87580800056

Treap insertion 6.79647684097
Treap find 2.11693120003
Treap deletion 4.61243915558
 >>>
 >>> test_trees(range(100000), False)
Red Black Tree insertion 6.29647898674
Red Black Tree find 1.157143116
Red Black Tree deletion 2.74785804749

Treap insertion 3.87288999557
Treap find 1.48776102066
Treap deletion 1.88962197304
 >>>


RBT is quicker than Treap for insertion with randomized data, but slower 
with ordered data. Randomized data will tend to minimize the number of 
tree rotations needed to keep the RBT balanced, whilst the Treap will be 
performing rotations to maintain the heap property in an already 
reasonably well balanced tree. With ordered data the RBT will have to 
work harder to keep the tree balanced, whilst the Treap will be able to 
maintain the heap property with fewer rotations.

No surprise that find() is generally quicker for RBTs, they tend to be 
better balanced.

Deletion is a bit more confusing. I suppose deletion from a better 
balanced tree will tend to be quicker, but deletion from a treap 
constructed from ordered data is (for some reason) quickest of all.

All these operations require a call to find(), and that is generally 
going to be quicker for RBTs. Treaps tend to require fewer subsequent 
rotations, but they have variable worth (in terms of rebalancing).

Looks like RBTs are better than treaps if they are being populated with 
randomly ordered data, but not if they are being populated with ordered 
data. RBTs are better for use cases that are heavy on finds.

Both types of tree appear to be better balanced (on the basis of the 
find results) if populated from ordered data. Treaps appear to perform 
better on insertion, find and deletion when populated from ordered data.

Duncan

[toc] | [prev] | [next] | [standalone]


#45826

FromDan Stromberg <drsalists@gmail.com>
Date2013-05-23 10:44 -0700
Message-ID<mailman.2024.1369331102.3114.python-list@python.org>
In reply to#45822

[Multipart message — attachments visible in raw view] — view raw

On Thu, May 23, 2013 at 9:41 AM, duncan smith <buzzard@invalid.invalid>wrote:

>
> RBT is quicker than Treap for insertion with randomized data, but slower
> with ordered data. Randomized data will tend to minimize the number of tree
> rotations needed to keep the RBT balanced, whilst the Treap will be
> performing rotations to maintain the heap property in an already reasonably
> well balanced tree. With ordered data the RBT will have to work harder to
> keep the tree balanced, whilst the Treap will be able to maintain the heap
> property with fewer rotations.
>
> No surprise that find() is generally quicker for RBTs, they tend to be
> better balanced.
>
> Deletion is a bit more confusing. I suppose deletion from a better
> balanced tree will tend to be quicker, but deletion from a treap
> constructed from ordered data is (for some reason) quickest of all.
>
> All these operations require a call to find(), and that is generally going
> to be quicker for RBTs. Treaps tend to require fewer subsequent rotations,
> but they have variable worth (in terms of rebalancing).
>
> Looks like RBTs are better than treaps if they are being populated with
> randomly ordered data, but not if they are being populated with ordered
> data. RBTs are better for use cases that are heavy on finds.
>
> Both types of tree appear to be better balanced (on the basis of the find
> results) if populated from ordered data. Treaps appear to perform better on
> insertion, find and deletion when populated from ordered data.
>

Strange.  I was comparing randomized data (95% get, 50-50 get and set, 95%
set) when I found that treaps were quite a bit faster than red black trees.

The code I used is here:
http://stromberg.dnsalias.org/svn/python-tree-and-heap-comparison/trunk/

See also
https://en.wikipedia.org/wiki/Binary_search_tree#Performance_comparisons ,
which found that treaps were faster on average the red black trees.

[toc] | [prev] | [next] | [standalone]


#45855

Fromduncan smith <buzzard@invalid.invalid>
Date2013-05-24 00:57 +0100
Message-ID<519ead02$0$60192$862e30e2@ngroups.net>
In reply to#45826
On 23/05/13 18:44, Dan Stromberg wrote:
>
> On Thu, May 23, 2013 at 9:41 AM, duncan smith <buzzard@invalid.invalid
> <mailto:buzzard@invalid.invalid>> wrote:
>
>
>     RBT is quicker than Treap for insertion with randomized data, but
>     slower with ordered data. Randomized data will tend to minimize the
>     number of tree rotations needed to keep the RBT balanced, whilst the
>     Treap will be performing rotations to maintain the heap property in
>     an already reasonably well balanced tree. With ordered data the RBT
>     will have to work harder to keep the tree balanced, whilst the Treap
>     will be able to maintain the heap property with fewer rotations.
>
>     No surprise that find() is generally quicker for RBTs, they tend to
>     be better balanced.
>
>     Deletion is a bit more confusing. I suppose deletion from a better
>     balanced tree will tend to be quicker, but deletion from a treap
>     constructed from ordered data is (for some reason) quickest of all.
>
>     All these operations require a call to find(), and that is generally
>     going to be quicker for RBTs. Treaps tend to require fewer
>     subsequent rotations, but they have variable worth (in terms of
>     rebalancing).
>
>     Looks like RBTs are better than treaps if they are being populated
>     with randomly ordered data, but not if they are being populated with
>     ordered data. RBTs are better for use cases that are heavy on finds.
>
>     Both types of tree appear to be better balanced (on the basis of the
>     find results) if populated from ordered data. Treaps appear to
>     perform better on insertion, find and deletion when populated from
>     ordered data.
>
> Strange.  I was comparing randomized data (95% get, 50-50 get and set,
> 95% set) when I found that treaps were quite a bit faster than red black
> trees.
>
> The code I used is here:
> http://stromberg.dnsalias.org/svn/python-tree-and-heap-comparison/trunk/
>
> See also
> https://en.wikipedia.org/wiki/Binary_search_tree#Performance_comparisons
> , which found that treaps were faster on average the red black trees.
>
>

Dan,
     Faster on average, but it depends what you're averaging over. As 
far as insertion and deletions are concerned my results agree with those 
in the paper, except they have treaps performing slightly faster than 
RBTs for insertion with randomly ordered data.

Deletion in your code is slightly different to that in mine. It might 
make a difference. Also, your code doesn't use sentinels (pros and 
cons). It could be down to implementation details.

Duncan

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web