Path: csiph.com!usenet.pasdenom.info!news.albasani.net!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail
From: Eric Sosman <esosman@comcast-dot-net.invalid>
Newsgroups: comp.lang.java.programmer
Subject: Re: SortedMap question?
Date: Sun, 25 Nov 2012 22:58:53 -0500
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <k8updu$f62$1@dont-email.me>
References: <k8rkud$i3o$1@dont-email.me> <q4qoyde15lpk$.13amx8nbfrj28.dlg@40tude.net> <k8t8dp$tv5$1@dont-email.me> <k8tljj$6jr$1@localhost.localdomain> <k8ucsh$o2v$1@dont-email.me> <50b2b971$0$294$14726298@news.sunsite.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 26 Nov 2012 03:58:54 +0000 (UTC)
Injection-Info: mx04.eternal-september.org; posting-host="ffb8f7085759b339c1002252b48331a4"; logging-data="15554"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/QVXmZG+vyqeZpiwAYisQe"
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Thunderbird/17.0
In-Reply-To: <50b2b971$0$294$14726298@news.sunsite.dk>
Cancel-Lock: sha1:YGUSK2AIBMfaZ5t9IaGE+m87sYY=
Xref: csiph.com comp.lang.java.programmer:19978

On 11/25/2012 7:36 PM, Arne Vajhøj wrote:
> On 11/25/2012 7:25 PM, Knute Johnson wrote:
>> On 11/25/2012 09:47, Martin Gregorie wrote:
>>> Or, if the key comparison is cheap, i.e. short strings or integer
>>> comparisons, and the key population is very large TreeMap can become
>>> attractive: with 8 million nodes you only need 23 comparisions to find a
>>> miss. Growing the tree isn't too expensive either: rebalancing a
>>> balanced
>>> red-black tree, which is necessary from time to time as it grows, is
>>> only
>>> a matter of adjusting pointers.
>>
>> That's what I thought, that the searching for a match would be very
>> quick once the data was in the TreeMap.  The test code I wrote may not
>> have been any good.  I created a map with Strings for the keys and
>> values.  I could get about 2 million entries before I started running
>> into virtual memory issues slowing things down.  I searched for
>> non-existent elements.  Using both a HashMap and a TreeMap, the TreeMap
>> was significantly slower than the HashMap.  I even tried a
>> ConcurrentHashMap and multiple threads to do the search to see if that
>> would speed things up.  While that technique was better than TreeMap it
>> was still slower than a plain HashMap.
>>
>> So for the actual case that I am working on, I have a collection of
>> about 5000 elements and am using an Integer as the key.  Data is rarely
>> changed but often accessed.  There should never be a case where the
>> searched for key will not exist.  What would you use, a HashMap or a
>> TreeMap?
>
> If it is important then you should measure!

     Aaay-men!

> But I would still expect HashMap to be faster than TreeMap.

     Yes.  A HashMap with 5000 keys and the default 0.75 load
factor will want 5000/0.75 ~= 6667 buckets, which it will round
up to 8192.  With normal luck the 5000 keys will land in ~3743
buckets, about two-thirds of which will hold just one key.  So
a typical search involves a few arithmetic operations to locate
the bucket, then about 5000/3743 ~= 1.3 Integer#equals() calls.

     Searching a TreeMap of 5000 keys, on the other hand, will
use about lg(5000) ~= 11.3 Integer#compareTo() calls.  Uses of
equals() and compareTo() have different costs (equals() needs to
work with Objects and check their classes before comparing, but
compareTo() needs to produce a three-way answer rather than a
simpler boolean ...), but any differences are almost certainly
swamped by the disparity in call counts.

     If the key values are not too widely scattered, other
possibilities exist.  A BitSet can answer "present or absent"
queries very quickly and without taking much space, or a simple
array can serve as a Map stand-in.  But follow Arne's advice
when making your choice: Measure It!

-- 
Eric Sosman
esosman@comcast-dot-net.invalid