Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.programmer > #19910 > unrolled thread
| Started by | Knute Johnson <nospam@rabbitbrush.frazmtn.com> |
|---|---|
| First post | 2012-11-24 15:23 -0800 |
| Last post | 2012-11-25 14:00 +0000 |
| Articles | 18 on this page of 38 — 13 participants |
Back to article view | Back to comp.lang.java.programmer
SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-24 15:23 -0800
Re: SortedMap question? Joerg Meier <joergmmeier@arcor.de> - 2012-11-25 00:57 +0100
Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:49 -0500
Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:59 -0500
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:02 +0000
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:24 -0500
Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 21:25 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 22:24 -0500
Re: SortedMap question? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-26 15:45 -0800
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-26 20:29 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 09:02 -0500
Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:47 +0000
Re: SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-25 16:25 -0800
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:36 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 22:58 -0500
Re: SortedMap question? Lew <lewbloch@gmail.com> - 2012-11-25 16:53 -0800
Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-26 02:20 +0000
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 14:44 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:18 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 17:35 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:58 -0500
Re: SortedMap question? Jim Janney <jjanney@shell.xmission.com> - 2012-11-26 12:45 -0700
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-26 14:26 -0600
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 14:16 -0600
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:21 -0500
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 17:07 -0600
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 20:05 -0500
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-26 01:34 +0000
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 20:36 -0600
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-12-15 12:27 +0000
Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 01:36 +0100
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 20:00 -0500
Re: SortedMap question? markspace <-@.> - 2012-11-24 17:34 -0800
Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 15:28 +0100
Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:49 +0000
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:00 +0000
Page 2 of 2 — ← Prev page 1 [2]
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-11-25 16:18 -0500 |
| Message-ID | <50b28b1a$0$289$14726298@news.sunsite.dk> |
| In reply to | #19953 |
On 11/25/2012 2:44 PM, Eric Sosman wrote: > On 11/25/2012 2:20 PM, Arne Vajhøj wrote: >> On 11/25/2012 9:02 AM, Eric Sosman wrote: >>> Finally, O() itself hides a lot of detail -- that is, after all, >>> its purpose. Given the choice between O(N*N) Algorithm X and O(1) >>> Algorithm Y, which would you choose? Might you change your mind >>> if you learned that X takes N*N nanoseconds while Y takes a constant >>> eighteen hours? Might you change your mind yet again if you learned >>> that N was ten million? O() alone is not enough to decide a question >>> of speed. >> >> Obviously true. >> >> But I would consider it very rare that a worse big-O >> algorithm/data-structure is faster in a case where it has >> a measurable impact on overall performance. > > That's why Quicksort implementations never use an O(N*N) > InsertionSort for small subfiles. > > Oh -- Hey, wait ... And it is not rate that this have a measurable impact on overall performance? Arne
[toc] | [prev] | [next] | [standalone]
| From | Eric Sosman <esosman@comcast-dot-net.invalid> |
|---|---|
| Date | 2012-11-25 17:35 -0500 |
| Message-ID | <k8u6fq$i0q$1@dont-email.me> |
| In reply to | #19956 |
On 11/25/2012 4:18 PM, Arne Vajhøj wrote:
> On 11/25/2012 2:44 PM, Eric Sosman wrote:
>> On 11/25/2012 2:20 PM, Arne Vajhøj wrote:
>>> On 11/25/2012 9:02 AM, Eric Sosman wrote:
>>>> Finally, O() itself hides a lot of detail -- that is, after all,
>>>> its purpose. Given the choice between O(N*N) Algorithm X and O(1)
>>>> Algorithm Y, which would you choose? Might you change your mind
>>>> if you learned that X takes N*N nanoseconds while Y takes a constant
>>>> eighteen hours? Might you change your mind yet again if you learned
>>>> that N was ten million? O() alone is not enough to decide a question
>>>> of speed.
>>>
>>> Obviously true.
>>>
>>> But I would consider it very rare that a worse big-O
>>> algorithm/data-structure is faster in a case where it has
>>> a measurable impact on overall performance.
>>
>> That's why Quicksort implementations never use an O(N*N)
>> InsertionSort for small subfiles.
>>
>> Oh -- Hey, wait ...
>
> And it is not rate that this have a measurable impact on
> overall performance?
(I guess that's "rare" and a typo.)
Every non-student Quicksort I've ever seen has used an O(N*N)
method for short subfiles. Every single one. That's the opposite
of "rare," I'd say.
Or look at String#indexOf(String). Some fast algorithms
like Boyer-Moore have already been mentioned in this thread, yet
Java's own implementation uses a naive method that is linear at
best, super-linear if unlucky (I think it can be O(M*N), where M
and N are the lengths of the two Strings). If the naive method
were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other
better-O() algorithm, would you expect performance to improve?
The ordinary Quickselect algorithm finds a median (or other
quantile) in O(N) expected time but O(N*N) worst-case time. The
Blum-Floyd-Pratt-Rivest-Tarjan variant is guaranteed O(N) in all
cases. Which would you choose? (Hint: Wikipedia says of BFPRT
"Although this approach optimizes quite well, it is typically
outperformed in practice by the expected linear algorithm with
random pivot choices.")
If Algorithm F takes f(N) time while G takes g(N), we might
inspect the two functions and observe that O(f(N)) < O(g(N)).
This allows us to conclude that F will be faster than G *if* N
is sufficiently large -- but the big-Oh statement says nothing
about how large N must be for F to be faster. Nothing. F may
be faster than G for all N, or only for N > 100, or only for
N > 1E100. That's why we see mixed-strategy implementations so
frequently: The programmer will test the value of N and select F
or G depending on which is (believed to be) faster *for that N*.
To put it even more simply: F's greater asymptotic speed
probably comes at the expense of more complication than G, and
the extra complication takes time. For large enough N the time
is well-spent, but for small N it's very likely wasted.
--
Eric Sosman
esosman@comcast-dot-net.invalid
[toc] | [prev] | [next] | [standalone]
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-11-25 19:58 -0500 |
| Message-ID | <50b2bebd$0$283$14726298@news.sunsite.dk> |
| In reply to | #19961 |
On 11/25/2012 5:35 PM, Eric Sosman wrote: > On 11/25/2012 4:18 PM, Arne Vajhøj wrote: >> On 11/25/2012 2:44 PM, Eric Sosman wrote: >>> On 11/25/2012 2:20 PM, Arne Vajhøj wrote: >>>> On 11/25/2012 9:02 AM, Eric Sosman wrote: >>>>> Finally, O() itself hides a lot of detail -- that is, after all, >>>>> its purpose. Given the choice between O(N*N) Algorithm X and O(1) >>>>> Algorithm Y, which would you choose? Might you change your mind >>>>> if you learned that X takes N*N nanoseconds while Y takes a constant >>>>> eighteen hours? Might you change your mind yet again if you learned >>>>> that N was ten million? O() alone is not enough to decide a question >>>>> of speed. >>>> >>>> Obviously true. >>>> >>>> But I would consider it very rare that a worse big-O >>>> algorithm/data-structure is faster in a case where it has >>>> a measurable impact on overall performance. >>> >>> That's why Quicksort implementations never use an O(N*N) >>> InsertionSort for small subfiles. >>> >>> Oh -- Hey, wait ... >> >> And it is not rate that this have a measurable impact on >> overall performance? > > (I guess that's "rare" and a typo.) Yes. > Every non-student Quicksort I've ever seen has used an O(N*N) > method for short subfiles. Every single one. That's the opposite > of "rare," I'd say. Yes. But "rare occurrence" and "rare measurable impact on overall performance" is not quite the same. > Or look at String#indexOf(String). Some fast algorithms > like Boyer-Moore have already been mentioned in this thread, yet > Java's own implementation uses a naive method that is linear at > best, super-linear if unlucky (I think it can be O(M*N), where M > and N are the lengths of the two Strings). If the naive method > were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other > better-O() algorithm, would you expect performance to improve? I would not expect it to become significant worse. Of course I could be wrong. Joshua already found the matrix multiplication example. > If Algorithm F takes f(N) time while G takes g(N), we might > inspect the two functions and observe that O(f(N)) < O(g(N)). > This allows us to conclude that F will be faster than G *if* N > is sufficiently large -- but the big-Oh statement says nothing > about how large N must be for F to be faster. Nothing. F may > be faster than G for all N, or only for N > 100, or only for > N > 1E100. That's why we see mixed-strategy implementations so > frequently: The programmer will test the value of N and select F > or G depending on which is (believed to be) faster *for that N*. > > To put it even more simply: F's greater asymptotic speed > probably comes at the expense of more complication than G, and > the extra complication takes time. For large enough N the time > is well-spent, but for small N it's very likely wasted. All true. And if one knows the algorithms full characteristics and one knows ones input data well, then one can pick the best algorithm with certainty. But if one really does not know, then I would go for the best big-O, because even if it is not the fastest then most likely it is not that much worse than the best (maybe +100%), if one goes for a worse big-O and it is not the fastest then you can easily be killed performance wise (x100). Arne
[toc] | [prev] | [next] | [standalone]
| From | Jim Janney <jjanney@shell.xmission.com> |
|---|---|
| Date | 2012-11-26 12:45 -0700 |
| Message-ID | <ydnvccs87wo.fsf@shell.xmission.com> |
| In reply to | #19961 |
Eric Sosman <esosman@comcast-dot-net.invalid> writes: > Or look at String#indexOf(String). Some fast algorithms > like Boyer-Moore have already been mentioned in this thread, yet > Java's own implementation uses a naive method that is linear at > best, super-linear if unlucky (I think it can be O(M*N), where M > and N are the lengths of the two Strings). If the naive method > were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other > better-O() algorithm, would you expect performance to improve? Boyer-Moore and its descendants invest more time up front building the jump table, in return for faster searching once the table is built. That pays off if you're searching for the same pattern across a large volume of text. But for the common case where strings are relatively short, I'd expect the brute-force (not naive) approach to be the better choice. -- Jim Janney
[toc] | [prev] | [next] | [standalone]
| From | Joshua Cranmer <Pidgeot18@verizon.invalid> |
|---|---|
| Date | 2012-11-26 14:26 -0600 |
| Message-ID | <k90ja3$bg0$1@dont-email.me> |
| In reply to | #19983 |
On 11/26/2012 1:45 PM, Jim Janney wrote: > Eric Sosman <esosman@comcast-dot-net.invalid> writes: > >> Or look at String#indexOf(String). Some fast algorithms >> like Boyer-Moore have already been mentioned in this thread, yet >> Java's own implementation uses a naive method that is linear at >> best, super-linear if unlucky (I think it can be O(M*N), where M >> and N are the lengths of the two Strings). If the naive method >> were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other >> better-O() algorithm, would you expect performance to improve? > > Boyer-Moore and its descendants invest more time up front building the > jump table, in return for faster searching once the table is built. > That pays off if you're searching for the same pattern across a large > volume of text. But for the common case where strings are relatively > short, I'd expect the brute-force (not naive) approach to be the better > choice. Well, it really pays off when the following conditions hold: 1. The string to search for is long. 2. The string to search for repeats a largeish prefix in itself later (for some of the later descendants) 3. You are likely to find *a lot* of partial matches of the string (so full backtracking measurably hurts). I suspect English words in English text don't satisfy these conditions very well, and I presume that (or similar scenarios) would explain a lot of String.indexOf searches. Where this would pay off would be something like trying to find DNA subsequences: you have a 1-in-4 chance of partially matching a given letter, your target string is likely to be lengthy and have lots of similarities within it, and the text being searched is extremely long. There are other penalties to advanced substring finding algorithms than just the build-the-lookup table penalty. The most important is that using the jump table has a minor microarchitectural penalty. Other penalties include increased amount of memory you need in the cache and increased surface for bugs. -- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
[toc] | [prev] | [next] | [standalone]
| From | Joshua Cranmer <Pidgeot18@verizon.invalid> |
|---|---|
| Date | 2012-11-25 14:16 -0600 |
| Message-ID | <k8tubg$4oe$1@dont-email.me> |
| In reply to | #19949 |
On 11/25/2012 1:20 PM, Arne Vajhøj wrote: > On 11/25/2012 9:02 AM, Eric Sosman wrote: >> Finally, O() itself hides a lot of detail -- that is, after all, >> its purpose. Given the choice between O(N*N) Algorithm X and O(1) >> Algorithm Y, which would you choose? Might you change your mind >> if you learned that X takes N*N nanoseconds while Y takes a constant >> eighteen hours? Might you change your mind yet again if you learned >> that N was ten million? O() alone is not enough to decide a question >> of speed. > > Obviously true. > > But I would consider it very rare that a worse big-O > algorithm/data-structure is faster in a case where it has > a measurable impact on overall performance. Well, we always do matrix multiplication using that O(n^2.3727) algorithm... oh wait. And we always use merge sort or heap sort instead of all the other sorts... oh wait. Maybe radix sort if we're doing integers... oh wait. I read a paper which said you could use log space to see if an undirected graph was connected. The hidden constant is 3^16, and it looks like you'd need a much larger constant to actually make it more usable. Hidden constants in big-O actually matters a lot. -- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
[toc] | [prev] | [next] | [standalone]
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-11-25 16:21 -0500 |
| Message-ID | <50b28bd8$0$289$14726298@news.sunsite.dk> |
| In reply to | #19954 |
On 11/25/2012 3:16 PM, Joshua Cranmer wrote: > On 11/25/2012 1:20 PM, Arne Vajhøj wrote: >> On 11/25/2012 9:02 AM, Eric Sosman wrote: >>> Finally, O() itself hides a lot of detail -- that is, after all, >>> its purpose. Given the choice between O(N*N) Algorithm X and O(1) >>> Algorithm Y, which would you choose? Might you change your mind >>> if you learned that X takes N*N nanoseconds while Y takes a constant >>> eighteen hours? Might you change your mind yet again if you learned >>> that N was ten million? O() alone is not enough to decide a question >>> of speed. >> >> Obviously true. >> >> But I would consider it very rare that a worse big-O >> algorithm/data-structure is faster in a case where it has >> a measurable impact on overall performance. > > Well, we always do matrix multiplication using that O(n^2.3727) > algorithm... oh wait. OK. That is a good counter example. > And we always use merge sort or heap sort instead of all the other > sorts... oh wait. I don't see the point here. Those have the same big O not worse than quick sort. > I read a paper which said you could use log space to see if an > undirected graph was connected. The hidden constant is 3^16, and it > looks like you'd need a much larger constant to actually make it more > usable. And that contradicts it being rare? Arne
[toc] | [prev] | [next] | [standalone]
| From | Joshua Cranmer <Pidgeot18@verizon.invalid> |
|---|---|
| Date | 2012-11-25 17:07 -0600 |
| Message-ID | <k8u8cj$s4d$1@dont-email.me> |
| In reply to | #19957 |
On 11/25/2012 3:21 PM, Arne Vajhøj wrote: > On 11/25/2012 3:16 PM, Joshua Cranmer wrote: >> And we always use merge sort or heap sort instead of all the other >> sorts... oh wait. > > I don't see the point here. Those have the same big O not > worse than quick sort. We tend to use quicksort as often as merge sort, if not more, and quicksort has O(N^2) worst-case time. As mentioned earlier, quicksorts--including Java's implementation--often use insertion sort for very small arrays (when you get down to 7-16 or so). Heapsort tends to be extremely rare, despite being an O(n lg n) in-place sort, since maintaining the heap tends to come with a relatively oppressive constant factor. Insertion sort, our favorite O(N^2) sort, is actually a very common sorting algorithm, especially if you realize that keeping an array sorted as you add elements to it tends to end up being an insertion sort. Radix sort, which is O(N) for fixed-size elements (like primitive integers), is quite rare in practice despite being asymptotically better than quicksort or mergesort. A bucket sort would actually be asymptotically superior in Java for bytes, shorts, and chars, too (256 or 64K integer arrays are not onerous in memory usage in modern computers). >> I read a paper which said you could use log space to see if an >> undirected graph was connected. The hidden constant is 3^16, and it >> looks like you'd need a much larger constant to actually make it more >> usable. > > And that contradicts it being rare? No, it's asymptotically better space-wise than BFS or DFS (which are both linear in space). But I highly doubt anyone has even bothered to write an implementation in any computer language (the paper itself didn't specify the whole algorithm concisely). -- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
[toc] | [prev] | [next] | [standalone]
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-11-25 20:05 -0500 |
| Message-ID | <50b2c04f$0$283$14726298@news.sunsite.dk> |
| In reply to | #19963 |
On 11/25/2012 6:07 PM, Joshua Cranmer wrote: > On 11/25/2012 3:21 PM, Arne Vajhøj wrote: >> On 11/25/2012 3:16 PM, Joshua Cranmer wrote: >>> And we always use merge sort or heap sort instead of all the other >>> sorts... oh wait. >> >> I don't see the point here. Those have the same big O not >> worse than quick sort. > > We tend to use quicksort as often as merge sort, if not more, and > quicksort has O(N^2) worst-case time. As mentioned earlier, > quicksorts--including Java's implementation--often use insertion sort > for very small arrays (when you get down to 7-16 or so). Heapsort tends > to be extremely rare, despite being an O(n lg n) in-place sort, since > maintaining the heap tends to come with a relatively oppressive constant > factor. I thought is was just x2. Which of course is important enough to prefer quicksort, but not a disaster. And still not what I was talking about. > Radix sort, which is O(N) for fixed-size elements (like primitive > integers), is quite rare in practice despite being asymptotically better > than quicksort or mergesort. A bucket sort would actually be > asymptotically superior in Java for bytes, shorts, and chars, too (256 > or 64K integer arrays are not onerous in memory usage in modern computers). But is it due to high constant that they are not used or because memory was more limited when Java was invented? NB: a quick look in some Java sources indicate that Java actually do use counting sort for small arrays of byte/char/short. Arne
[toc] | [prev] | [next] | [standalone]
| From | "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> |
|---|---|
| Date | 2012-11-26 01:34 +0000 |
| Message-ID | <lOCdnbs17-3bVS_NnZ2dnUVZ7t6dnZ2d@bt.com> |
| In reply to | #19954 |
Joshua Cranmer wrote:
> I read a paper which said you could use log space to see if an
> undirected graph was connected. The hidden constant is 3^16, and it
> looks like you'd need a much larger constant to actually make it more
> usable.
Fascinating. Reference /please/ ;-)
-- chris
[toc] | [prev] | [next] | [standalone]
| From | Joshua Cranmer <Pidgeot18@verizon.invalid> |
|---|---|
| Date | 2012-11-25 20:36 -0600 |
| Message-ID | <k8ukj6$qkl$1@dont-email.me> |
| In reply to | #19972 |
On 11/25/2012 7:34 PM, Chris Uppal wrote: > Joshua Cranmer wrote: > >> I read a paper which said you could use log space to see if an >> undirected graph was connected. The hidden constant is 3^16, and it >> looks like you'd need a much larger constant to actually make it more >> usable. > > Fascinating. Reference /please/ ;-) <http://research.microsoft.com/pubs/148550/sl.pdf> appears to be one of the papers. -- Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald E. Knuth
[toc] | [prev] | [next] | [standalone]
| From | "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> |
|---|---|
| Date | 2012-12-15 12:27 +0000 |
| Message-ID | <aZ-dnXosX90b8VHNnZ2dnUVZ8nqdnZ2d@bt.com> |
| In reply to | #19975 |
Joshua Cranmer wrote:
> On 11/25/2012 7:34 PM, Chris Uppal wrote:
> > Fascinating. Reference /please/ ;-)
>
> <http://research.microsoft.com/pubs/148550/sl.pdf> appears to be one of
> the papers.
(meant to reply before, sorry)
Thanks for that. It's an interesting paper in its own right too (if somewhat
beyond me).
-- chris
[toc] | [prev] | [next] | [standalone]
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-11-25 01:36 +0100 |
| Message-ID | <ahd7g4Fu72fU1@mid.individual.net> |
| In reply to | #19910 |
On 11/25/2012 12:23 AM, Knute Johnson wrote: > I have been operating under a mis-perception that a SortedMap would be > more quickly searched than an unsorted Map. More specifically that the > cost would come when adding an element to the SortedMap. I wrote a > little test program to measure the amount of time it took to look for > non-existent data in a Map and a SortedMap and it took somewhere between > twice and three times as long to look for the data in the SortedMap. The TreeMap is a tree implementation and you typically have O(log n) for insertion and lookup operations. For a HashMap it's O(1) - as has been stated already. So the expectation would be for the TreeMap to be slower - at least asymptotically. > So am I know correct in thinking that the only real advantage of a > SortedMap is if you wish to iterate over the keys of the map in some order? SortedMap also supports creation of various sub sets of the map based on the key order as well as access to the min and max keys. The JavaDoc is pretty comprehensive. Kind regards robert
[toc] | [prev] | [next] | [standalone]
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-11-24 20:00 -0500 |
| Message-ID | <50b16d99$0$281$14726298@news.sunsite.dk> |
| In reply to | #19910 |
On 11/24/2012 6:23 PM, Knute Johnson wrote: > I have been operating under a mis-perception that a SortedMap would be > more quickly searched than an unsorted Map. More specifically that the > cost would come when adding an element to the SortedMap. I wrote a > little test program to measure the amount of time it took to look for > non-existent data in a Map and a SortedMap and it took somewhere between > twice and three times as long to look for the data in the SortedMap. > > So am I know correct in thinking that the only real advantage of a > SortedMap is if you wish to iterate over the keys of the map in some order? Map and SortedMap are just interfaces. HashMap get is O(1) but does not maintain order. SortedMap get is O(logn) but does keep order. So keeping order cost in performance. Which makes sense to me. Arne
[toc] | [prev] | [next] | [standalone]
| From | markspace <-@.> |
|---|---|
| Date | 2012-11-24 17:34 -0800 |
| Message-ID | <k8rsid$sqq$1@dont-email.me> |
| In reply to | #19910 |
On 11/24/2012 3:23 PM, Knute Johnson wrote: > So am I know correct in thinking that the only real advantage of a > SortedMap is if you wish to iterate over the keys of the map in some order? What Robert said. Also HashMap might have to "grow" itself, resulting re-inserting every single element already in the Map. I think TreeMap has a more predictable O(ln n) insertion time for each insertion. I might be wrong about that: a balanced tree could resort itself pretty heavily too on any given insertion.
[toc] | [prev] | [next] | [standalone]
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-11-25 15:28 +0100 |
| Message-ID | <aheo7uF9ot2U1@mid.individual.net> |
| In reply to | #19916 |
On 25.11.2012 02:34, markspace wrote: > On 11/24/2012 3:23 PM, Knute Johnson wrote: >> So am I know correct in thinking that the only real advantage of a >> SortedMap is if you wish to iterate over the keys of the map in some >> order? > > > What Robert said. Also HashMap might have to "grow" itself, resulting > re-inserting every single element already in the Map. I think TreeMap > has a more predictable O(ln n) insertion time for each insertion. > > I might be wrong about that: a balanced tree could resort itself pretty > heavily too on any given insertion. TreeMap uses a red black tree: http://docs.oracle.com/javase/6/docs/api/java/util/TreeMap.html It has some characteristics which according to my memory avoid the extreme rebalancing overhead of other binary trees: http://en.wikipedia.org/wiki/Red_black_tree Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/
[toc] | [prev] | [next] | [standalone]
| From | Martin Gregorie <martin@address-in-sig.invalid> |
|---|---|
| Date | 2012-11-25 17:49 +0000 |
| Message-ID | <k8tlmr$6jr$2@localhost.localdomain> |
| In reply to | #19941 |
On Sun, 25 Nov 2012 15:28:11 +0100, Robert Klemme wrote: > TreeMap uses a red black tree: > http://docs.oracle.com/javase/6/docs/api/java/util/TreeMap.html > > It has some characteristics which according to my memory avoid the > extreme rebalancing overhead of other binary trees: > http://en.wikipedia.org/wiki/Red_black_tree > Yes. Spot on. See my note in another part of this thread. -- martin@ | Martin Gregorie gregorie. | Essex, UK org |
[toc] | [prev] | [next] | [standalone]
| From | "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> |
|---|---|
| Date | 2012-11-25 14:00 +0000 |
| Message-ID | <JsednWiiw6YTuS_NnZ2dnUVZ8k2dnZ2d@bt.com> |
| In reply to | #19910 |
Knute Johnson wrote:
> So am I know correct in thinking that the only real advantage of a
> SortedMap is if you wish to iterate over the keys of the map in some
> order?
Adding to the other responses: also hash-based algorithms are sensitive to the
effectiveness of the hash function for the data you actually feed it. In some
cases you might need a stronger bound on performance: preferring the guaranteed
O(log N) of the tree over the /probably/ O(1) but /just maybe/ O(N) of the
hash.
-- chris
[toc] | [prev] | [standalone]
Page 2 of 2 — ← Prev page 1 [2]
Back to top | Article view | comp.lang.java.programmer
csiph-web