Path: csiph.com!usenet.pasdenom.info!news.albasani.net!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail From: Eric Sosman Newsgroups: comp.lang.java.programmer Subject: Re: SortedMap question? Date: Sun, 25 Nov 2012 22:58:53 -0500 Organization: A noiseless patient Spider Lines: 59 Message-ID: References: <50b2b971$0$294$14726298@news.sunsite.dk> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Date: Mon, 26 Nov 2012 03:58:54 +0000 (UTC) Injection-Info: mx04.eternal-september.org; posting-host="ffb8f7085759b339c1002252b48331a4"; logging-data="15554"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/QVXmZG+vyqeZpiwAYisQe" User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Thunderbird/17.0 In-Reply-To: <50b2b971$0$294$14726298@news.sunsite.dk> Cancel-Lock: sha1:YGUSK2AIBMfaZ5t9IaGE+m87sYY= Xref: csiph.com comp.lang.java.programmer:19978 On 11/25/2012 7:36 PM, Arne Vajhøj wrote: > On 11/25/2012 7:25 PM, Knute Johnson wrote: >> On 11/25/2012 09:47, Martin Gregorie wrote: >>> Or, if the key comparison is cheap, i.e. short strings or integer >>> comparisons, and the key population is very large TreeMap can become >>> attractive: with 8 million nodes you only need 23 comparisions to find a >>> miss. Growing the tree isn't too expensive either: rebalancing a >>> balanced >>> red-black tree, which is necessary from time to time as it grows, is >>> only >>> a matter of adjusting pointers. >> >> That's what I thought, that the searching for a match would be very >> quick once the data was in the TreeMap. The test code I wrote may not >> have been any good. I created a map with Strings for the keys and >> values. I could get about 2 million entries before I started running >> into virtual memory issues slowing things down. I searched for >> non-existent elements. Using both a HashMap and a TreeMap, the TreeMap >> was significantly slower than the HashMap. I even tried a >> ConcurrentHashMap and multiple threads to do the search to see if that >> would speed things up. While that technique was better than TreeMap it >> was still slower than a plain HashMap. >> >> So for the actual case that I am working on, I have a collection of >> about 5000 elements and am using an Integer as the key. Data is rarely >> changed but often accessed. There should never be a case where the >> searched for key will not exist. What would you use, a HashMap or a >> TreeMap? > > If it is important then you should measure! Aaay-men! > But I would still expect HashMap to be faster than TreeMap. Yes. A HashMap with 5000 keys and the default 0.75 load factor will want 5000/0.75 ~= 6667 buckets, which it will round up to 8192. With normal luck the 5000 keys will land in ~3743 buckets, about two-thirds of which will hold just one key. So a typical search involves a few arithmetic operations to locate the bucket, then about 5000/3743 ~= 1.3 Integer#equals() calls. Searching a TreeMap of 5000 keys, on the other hand, will use about lg(5000) ~= 11.3 Integer#compareTo() calls. Uses of equals() and compareTo() have different costs (equals() needs to work with Objects and check their classes before comparing, but compareTo() needs to produce a three-way answer rather than a simpler boolean ...), but any differences are almost certainly swamped by the disparity in call counts. If the key values are not too widely scattered, other possibilities exist. A BitSet can answer "present or absent" queries very quickly and without taking much space, or a simple array can serve as a Map stand-in. But follow Arne's advice when making your choice: Measure It! -- Eric Sosman esosman@comcast-dot-net.invalid