Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail From: Eric Sosman Newsgroups: comp.lang.java.programmer Subject: Re: SortedMap question? Date: Sun, 25 Nov 2012 09:02:29 -0500 Organization: A noiseless patient Spider Lines: 54 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Injection-Date: Sun, 25 Nov 2012 14:02:33 +0000 (UTC) Injection-Info: mx04.eternal-september.org; posting-host="ffb8f7085759b339c1002252b48331a4"; logging-data="30693"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+PBuAC01XpxLXeRp15dvvw" User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Thunderbird/17.0 In-Reply-To: Cancel-Lock: sha1:Rrn/Tq8r6yVsu5skbQZ7wpEImuY= Xref: csiph.com comp.lang.java.programmer:19937 On 11/24/2012 6:57 PM, Joerg Meier wrote: > On Sat, 24 Nov 2012 15:23:55 -0800, Knute Johnson wrote: > >> I have been operating under a mis-perception that a SortedMap would be >> more quickly searched than an unsorted Map. More specifically that the >> cost would come when adding an element to the SortedMap. I wrote a >> little test program to measure the amount of time it took to look for >> non-existent data in a Map and a SortedMap and it took somewhere between >> twice and three times as long to look for the data in the SortedMap. > > Considering Map is an Interface, I question your claim that you measured > its get(x) performance :p > > Assuming you used HashMap, as far as I recall, its get(x) performance is > already at O(1) unless you mess up your hashcodes. Doesn't really go lower. "HashMap is O(1)" is sort of true, but sweeps a lot of dirt under the rug. Most obviously, there's the possibility of a bad or unlucky hash distribution -- by no means a remote possibility, as demonstrates. Even if the hash distribution is acceptably uniform, there's also the matter of building the map in the first place: As the insertions accumulate, HashMap may need to expand and re-insert everything it already contains. With N=n1+n2+...+nk mappings, there could be n1+(n1+n2)+...+(n1+n2+...+nk) = k*n1+(k-1)*n2+...+1*nk insertions altogether (a good up-front estimate of N can help here). Consider also the operation cost. Searching in a TreeMap of N mappings takes about lg(N) comparisons -- but with String keys the comparisons will be of varying difficulties. Comparisons near the root will probably examine only the first few characters to determine the result; only as the search approaches the leaves will the Strings' tail characters come into play. On a successful search HashMap must iterate over the key String's entire length twice: Once to compute the hashCode(), and again to verify that a match has in fact been found (comparisons against non-matching keys in the same bucket will be quick, like those near the root of a TreeMap). Finally, O() itself hides a lot of detail -- that is, after all, its purpose. Given the choice between O(N*N) Algorithm X and O(1) Algorithm Y, which would you choose? Might you change your mind if you learned that X takes N*N nanoseconds while Y takes a constant eighteen hours? Might you change your mind yet again if you learned that N was ten million? O() alone is not enough to decide a question of speed. Still and all: HashMap makes a good "default" choice, and is "likely" to outperform TreeMap over a "reasonable range" of inputs. Turn to TreeMap when the order of the keys is significant -- for example, when you need "closest match" for an absent key. -- Eric Sosman esosman@comcast-dot-net.invalid