Groups > comp.lang.java.programmer > #19910 > unrolled thread

SortedMap question?

Started by	Knute Johnson <nospam@rabbitbrush.frazmtn.com>
First post	2012-11-24 15:23 -0800
Last post	2012-11-25 14:00 +0000
Articles	18 on this page of 38 — 13 participants

Back to article view | Back to comp.lang.java.programmer

  SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-24 15:23 -0800
    Re: SortedMap question? Joerg Meier <joergmmeier@arcor.de> - 2012-11-25 00:57 +0100
      Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:49 -0500
        Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:59 -0500
          Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:02 +0000
            Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
          Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:24 -0500
            Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 21:25 -0500
              Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 22:24 -0500
              Re: SortedMap question? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-26 15:45 -0800
                Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-26 20:29 -0500
      Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 09:02 -0500
        Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:47 +0000
          Re: SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-25 16:25 -0800
            Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:36 -0500
              Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 22:58 -0500
            Re: SortedMap question? Lew <lewbloch@gmail.com> - 2012-11-25 16:53 -0800
            Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-26 02:20 +0000
        Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
          Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 14:44 -0500
            Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:18 -0500
              Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 17:35 -0500
                Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:58 -0500
                Re: SortedMap question? Jim Janney <jjanney@shell.xmission.com> - 2012-11-26 12:45 -0700
                  Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-26 14:26 -0600
          Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 14:16 -0600
            Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:21 -0500
              Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 17:07 -0600
                Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 20:05 -0500
            Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-26 01:34 +0000
              Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 20:36 -0600
                Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-12-15 12:27 +0000
    Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 01:36 +0100
    Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 20:00 -0500
    Re: SortedMap question? markspace <-@.> - 2012-11-24 17:34 -0800
      Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 15:28 +0100
        Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:49 +0000
    Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:00 +0000

Page 2 of 2 — ← Prev page 1 [2]

#19956

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-11-25 16:18 -0500
Message-ID	<50b28b1a$0$289$14726298@news.sunsite.dk>
In reply to	#19953

On 11/25/2012 2:44 PM, Eric Sosman wrote:
> On 11/25/2012 2:20 PM, Arne Vajhøj wrote:
>> On 11/25/2012 9:02 AM, Eric Sosman wrote:
>>>      Finally, O() itself hides a lot of detail -- that is, after all,
>>> its purpose.  Given the choice between O(N*N) Algorithm X and O(1)
>>> Algorithm Y, which would you choose?  Might you change your mind
>>> if you learned that X takes N*N nanoseconds while Y takes a constant
>>> eighteen hours?  Might you change your mind yet again if you learned
>>> that N was ten million?  O() alone is not enough to decide a question
>>> of speed.
>>
>> Obviously true.
>>
>> But I would consider it very rare that a worse big-O
>> algorithm/data-structure is faster in a case where it has
>> a measurable impact on overall performance.
>
>      That's why Quicksort implementations never use an O(N*N)
> InsertionSort for small subfiles.
>
>      Oh -- Hey, wait ...

And it is not rate that this have a measurable impact on
overall performance?

Arne

[toc] | [prev] | [next] | [standalone]

#19961

From	Eric Sosman <esosman@comcast-dot-net.invalid>
Date	2012-11-25 17:35 -0500
Message-ID	<k8u6fq$i0q$1@dont-email.me>
In reply to	#19956

On 11/25/2012 4:18 PM, Arne Vajhøj wrote:
> On 11/25/2012 2:44 PM, Eric Sosman wrote:
>> On 11/25/2012 2:20 PM, Arne Vajhøj wrote:
>>> On 11/25/2012 9:02 AM, Eric Sosman wrote:
>>>>      Finally, O() itself hides a lot of detail -- that is, after all,
>>>> its purpose.  Given the choice between O(N*N) Algorithm X and O(1)
>>>> Algorithm Y, which would you choose?  Might you change your mind
>>>> if you learned that X takes N*N nanoseconds while Y takes a constant
>>>> eighteen hours?  Might you change your mind yet again if you learned
>>>> that N was ten million?  O() alone is not enough to decide a question
>>>> of speed.
>>>
>>> Obviously true.
>>>
>>> But I would consider it very rare that a worse big-O
>>> algorithm/data-structure is faster in a case where it has
>>> a measurable impact on overall performance.
>>
>>      That's why Quicksort implementations never use an O(N*N)
>> InsertionSort for small subfiles.
>>
>>      Oh -- Hey, wait ...
>
> And it is not rate that this have a measurable impact on
> overall performance?

     (I guess that's "rare" and a typo.)

     Every non-student Quicksort I've ever seen has used an O(N*N)
method for short subfiles.  Every single one.  That's the opposite
of "rare," I'd say.

     Or look at String#indexOf(String).  Some fast algorithms
like Boyer-Moore have already been mentioned in this thread, yet
Java's own implementation uses a naive method that is linear at
best, super-linear if unlucky (I think it can be O(M*N), where M
and N are the lengths of the two Strings).  If the naive method
were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other
better-O() algorithm, would you expect performance to improve?

     The ordinary Quickselect algorithm finds a median (or other
quantile) in O(N) expected time but O(N*N) worst-case time.  The
Blum-Floyd-Pratt-Rivest-Tarjan variant is guaranteed O(N) in all
cases.  Which would you choose?  (Hint: Wikipedia says of BFPRT
"Although this approach optimizes quite well, it is typically
outperformed in practice by the expected linear algorithm with
random pivot choices.")

     If Algorithm F takes f(N) time while G takes g(N), we might
inspect the two functions and observe that O(f(N)) < O(g(N)).
This allows us to conclude that F will be faster than G *if* N
is sufficiently large -- but the big-Oh statement says nothing
about how large N must be for F to be faster.  Nothing.  F may
be faster than G for all N, or only for N > 100, or only for
N > 1E100.  That's why we see mixed-strategy implementations so
frequently: The programmer will test the value of N and select F
or G depending on which is (believed to be) faster *for that N*.

     To put it even more simply: F's greater asymptotic speed
probably comes at the expense of more complication than G, and
the extra complication takes time.  For large enough N the time
is well-spent, but for small N it's very likely wasted.

-- 
Eric Sosman
esosman@comcast-dot-net.invalid

[toc] | [prev] | [next] | [standalone]

#19968

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-11-25 19:58 -0500
Message-ID	<50b2bebd$0$283$14726298@news.sunsite.dk>
In reply to	#19961

On 11/25/2012 5:35 PM, Eric Sosman wrote:
> On 11/25/2012 4:18 PM, Arne Vajhøj wrote:
>> On 11/25/2012 2:44 PM, Eric Sosman wrote:
>>> On 11/25/2012 2:20 PM, Arne Vajhøj wrote:
>>>> On 11/25/2012 9:02 AM, Eric Sosman wrote:
>>>>>      Finally, O() itself hides a lot of detail -- that is, after all,
>>>>> its purpose.  Given the choice between O(N*N) Algorithm X and O(1)
>>>>> Algorithm Y, which would you choose?  Might you change your mind
>>>>> if you learned that X takes N*N nanoseconds while Y takes a constant
>>>>> eighteen hours?  Might you change your mind yet again if you learned
>>>>> that N was ten million?  O() alone is not enough to decide a question
>>>>> of speed.
>>>>
>>>> Obviously true.
>>>>
>>>> But I would consider it very rare that a worse big-O
>>>> algorithm/data-structure is faster in a case where it has
>>>> a measurable impact on overall performance.
>>>
>>>      That's why Quicksort implementations never use an O(N*N)
>>> InsertionSort for small subfiles.
>>>
>>>      Oh -- Hey, wait ...
>>
>> And it is not rate that this have a measurable impact on
>> overall performance?
>
>      (I guess that's "rare" and a typo.)

Yes.

>      Every non-student Quicksort I've ever seen has used an O(N*N)
> method for short subfiles.  Every single one.  That's the opposite
> of "rare," I'd say.

Yes.

But "rare occurrence" and "rare measurable impact on overall
performance" is not quite the same.

>      Or look at String#indexOf(String).  Some fast algorithms
> like Boyer-Moore have already been mentioned in this thread, yet
> Java's own implementation uses a naive method that is linear at
> best, super-linear if unlucky (I think it can be O(M*N), where M
> and N are the lengths of the two Strings).  If the naive method
> were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other
> better-O() algorithm, would you expect performance to improve?

I would not expect it to become significant worse.

Of course I could be wrong.

Joshua already found the matrix multiplication example.

>      If Algorithm F takes f(N) time while G takes g(N), we might
> inspect the two functions and observe that O(f(N)) < O(g(N)).
> This allows us to conclude that F will be faster than G *if* N
> is sufficiently large -- but the big-Oh statement says nothing
> about how large N must be for F to be faster.  Nothing.  F may
> be faster than G for all N, or only for N > 100, or only for
> N > 1E100.  That's why we see mixed-strategy implementations so
> frequently: The programmer will test the value of N and select F
> or G depending on which is (believed to be) faster *for that N*.
>
>      To put it even more simply: F's greater asymptotic speed
> probably comes at the expense of more complication than G, and
> the extra complication takes time.  For large enough N the time
> is well-spent, but for small N it's very likely wasted.

All true.

And if one knows the algorithms full characteristics and one
knows ones input data well, then one can pick the best algorithm
with certainty.

But if one really does not know, then I would go for the best
big-O, because even if it is not the fastest then most likely
it is not that much worse than the best (maybe +100%), if one
goes for a worse big-O and it is not the fastest then you
can easily be killed performance wise (x100).

Arne

[toc] | [prev] | [next] | [standalone]

#19983

From	Jim Janney <jjanney@shell.xmission.com>
Date	2012-11-26 12:45 -0700
Message-ID	<ydnvccs87wo.fsf@shell.xmission.com>
In reply to	#19961

Eric Sosman <esosman@comcast-dot-net.invalid> writes:

>     Or look at String#indexOf(String).  Some fast algorithms
> like Boyer-Moore have already been mentioned in this thread, yet
> Java's own implementation uses a naive method that is linear at
> best, super-linear if unlucky (I think it can be O(M*N), where M
> and N are the lengths of the two Strings).  If the naive method
> were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other
> better-O() algorithm, would you expect performance to improve?

Boyer-Moore and its descendants invest more time up front building the
jump table, in return for faster searching once the table is built.
That pays off if you're searching for the same pattern across a large
volume of text.  But for the common case where strings are relatively
short, I'd expect the brute-force (not naive) approach to be the better
choice.

-- 
Jim Janney

[toc] | [prev] | [next] | [standalone]

#19984

From	Joshua Cranmer <Pidgeot18@verizon.invalid>
Date	2012-11-26 14:26 -0600
Message-ID	<k90ja3$bg0$1@dont-email.me>
In reply to	#19983

On 11/26/2012 1:45 PM, Jim Janney wrote:
> Eric Sosman <esosman@comcast-dot-net.invalid> writes:
>
>>      Or look at String#indexOf(String).  Some fast algorithms
>> like Boyer-Moore have already been mentioned in this thread, yet
>> Java's own implementation uses a naive method that is linear at
>> best, super-linear if unlucky (I think it can be O(M*N), where M
>> and N are the lengths of the two Strings).  If the naive method
>> were replaced by Boyer-Moore or Knuth-Morris-Pratt or some other
>> better-O() algorithm, would you expect performance to improve?
>
> Boyer-Moore and its descendants invest more time up front building the
> jump table, in return for faster searching once the table is built.
> That pays off if you're searching for the same pattern across a large
> volume of text.  But for the common case where strings are relatively
> short, I'd expect the brute-force (not naive) approach to be the better
> choice.

Well, it really pays off when the following conditions hold:
1. The string to search for is long.
2. The string to search for repeats a largeish prefix in itself later 
(for some of the later descendants)
3. You are likely to find *a lot* of partial matches of the string (so 
full backtracking measurably hurts).

I suspect English words in English text don't satisfy these conditions 
very well, and I presume that (or similar scenarios) would explain a lot 
of String.indexOf searches.

Where this would pay off would be something like trying to find DNA 
subsequences: you have a 1-in-4 chance of partially matching a given 
letter, your target string is likely to be lengthy and have lots of 
similarities within it, and the text being searched is extremely long.

There are other penalties to advanced substring finding algorithms than 
just the build-the-lookup table penalty. The most important is that 
using the jump table has a minor microarchitectural penalty. Other 
penalties include increased amount of memory you need in the cache and 
increased surface for bugs.

-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth

[toc] | [prev] | [next] | [standalone]

#19954

From	Joshua Cranmer <Pidgeot18@verizon.invalid>
Date	2012-11-25 14:16 -0600
Message-ID	<k8tubg$4oe$1@dont-email.me>
In reply to	#19949

On 11/25/2012 1:20 PM, Arne Vajhøj wrote:
> On 11/25/2012 9:02 AM, Eric Sosman wrote:
>>      Finally, O() itself hides a lot of detail -- that is, after all,
>> its purpose.  Given the choice between O(N*N) Algorithm X and O(1)
>> Algorithm Y, which would you choose?  Might you change your mind
>> if you learned that X takes N*N nanoseconds while Y takes a constant
>> eighteen hours?  Might you change your mind yet again if you learned
>> that N was ten million?  O() alone is not enough to decide a question
>> of speed.
>
> Obviously true.
>
> But I would consider it very rare that a worse big-O
> algorithm/data-structure is faster in a case where it has
> a measurable impact on overall performance.

Well, we always do matrix multiplication using that O(n^2.3727) 
algorithm... oh wait.

And we always use merge sort or heap sort instead of all the other 
sorts... oh wait. Maybe radix sort if we're doing integers... oh wait.

I read a paper which said you could use log space to see if an 
undirected graph was connected. The hidden constant is 3^16, and it 
looks like you'd need a much larger constant to actually make it more 
usable.

Hidden constants in big-O actually matters a lot.

-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth

[toc] | [prev] | [next] | [standalone]

#19957

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-11-25 16:21 -0500
Message-ID	<50b28bd8$0$289$14726298@news.sunsite.dk>
In reply to	#19954

On 11/25/2012 3:16 PM, Joshua Cranmer wrote:
> On 11/25/2012 1:20 PM, Arne Vajhøj wrote:
>> On 11/25/2012 9:02 AM, Eric Sosman wrote:
>>>      Finally, O() itself hides a lot of detail -- that is, after all,
>>> its purpose.  Given the choice between O(N*N) Algorithm X and O(1)
>>> Algorithm Y, which would you choose?  Might you change your mind
>>> if you learned that X takes N*N nanoseconds while Y takes a constant
>>> eighteen hours?  Might you change your mind yet again if you learned
>>> that N was ten million?  O() alone is not enough to decide a question
>>> of speed.
>>
>> Obviously true.
>>
>> But I would consider it very rare that a worse big-O
>> algorithm/data-structure is faster in a case where it has
>> a measurable impact on overall performance.
>
> Well, we always do matrix multiplication using that O(n^2.3727)
> algorithm... oh wait.

OK. That is a good counter example.

> And we always use merge sort or heap sort instead of all the other
> sorts... oh wait.

I don't see the point here. Those have the same big O not
worse than quick sort.

> I read a paper which said you could use log space to see if an
> undirected graph was connected. The hidden constant is 3^16, and it
> looks like you'd need a much larger constant to actually make it more
> usable.

And that contradicts it being rare?

Arne

[toc] | [prev] | [next] | [standalone]

#19963

From	Joshua Cranmer <Pidgeot18@verizon.invalid>
Date	2012-11-25 17:07 -0600
Message-ID	<k8u8cj$s4d$1@dont-email.me>
In reply to	#19957

On 11/25/2012 3:21 PM, Arne Vajhøj wrote:
> On 11/25/2012 3:16 PM, Joshua Cranmer wrote:
>> And we always use merge sort or heap sort instead of all the other
>> sorts... oh wait.
>
> I don't see the point here. Those have the same big O not
> worse than quick sort.

We tend to use quicksort as often as merge sort, if not more, and 
quicksort has O(N^2) worst-case time. As mentioned earlier, 
quicksorts--including Java's implementation--often use insertion sort 
for very small arrays (when you get down to 7-16 or so). Heapsort tends 
to be extremely rare, despite being an O(n lg n) in-place sort, since 
maintaining the heap tends to come with a relatively oppressive constant 
factor.

Insertion sort, our favorite O(N^2) sort, is actually a very common 
sorting algorithm, especially if you realize that keeping an array 
sorted as you add elements to it tends to end up being an insertion sort.

Radix sort, which is O(N) for fixed-size elements (like primitive 
integers), is quite rare in practice despite being asymptotically better 
than quicksort or mergesort. A bucket sort would actually be 
asymptotically superior in Java for bytes, shorts, and chars, too (256 
or 64K integer arrays are not onerous in memory usage in modern computers).

>> I read a paper which said you could use log space to see if an
>> undirected graph was connected. The hidden constant is 3^16, and it
>> looks like you'd need a much larger constant to actually make it more
>> usable.
>
> And that contradicts it being rare?

No, it's asymptotically better space-wise than BFS or DFS (which are 
both linear in space). But I highly doubt anyone has even bothered to 
write an implementation in any computer language (the paper itself 
didn't specify the whole algorithm concisely).

-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth

[toc] | [prev] | [next] | [standalone]

#19970

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-11-25 20:05 -0500
Message-ID	<50b2c04f$0$283$14726298@news.sunsite.dk>
In reply to	#19963

On 11/25/2012 6:07 PM, Joshua Cranmer wrote:
> On 11/25/2012 3:21 PM, Arne Vajhøj wrote:
>> On 11/25/2012 3:16 PM, Joshua Cranmer wrote:
>>> And we always use merge sort or heap sort instead of all the other
>>> sorts... oh wait.
>>
>> I don't see the point here. Those have the same big O not
>> worse than quick sort.
>
> We tend to use quicksort as often as merge sort, if not more, and
> quicksort has O(N^2) worst-case time. As mentioned earlier,
> quicksorts--including Java's implementation--often use insertion sort
> for very small arrays (when you get down to 7-16 or so). Heapsort tends
> to be extremely rare, despite being an O(n lg n) in-place sort, since
> maintaining the heap tends to come with a relatively oppressive constant
> factor.

I thought is was just x2.

Which of course is important enough to prefer quicksort, but
not a disaster.

And still not what I was talking about.

> Radix sort, which is O(N) for fixed-size elements (like primitive
> integers), is quite rare in practice despite being asymptotically better
> than quicksort or mergesort. A bucket sort would actually be
> asymptotically superior in Java for bytes, shorts, and chars, too (256
> or 64K integer arrays are not onerous in memory usage in modern computers).

But is it due to high constant that they are not used or because
memory was more limited when Java was invented?

NB: a quick look in some Java sources indicate that Java actually do
use counting sort for small arrays of byte/char/short.

Arne

[toc] | [prev] | [next] | [standalone]

#19972

From	"Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org>
Date	2012-11-26 01:34 +0000
Message-ID	<lOCdnbs17-3bVS_NnZ2dnUVZ7t6dnZ2d@bt.com>
In reply to	#19954

Joshua Cranmer wrote:

> I read a paper which said you could use log space to see if an
> undirected graph was connected. The hidden constant is 3^16, and it
> looks like you'd need a much larger constant to actually make it more
> usable.

Fascinating.  Reference /please/ ;-)

    -- chris

[toc] | [prev] | [next] | [standalone]

#19975

From	Joshua Cranmer <Pidgeot18@verizon.invalid>
Date	2012-11-25 20:36 -0600
Message-ID	<k8ukj6$qkl$1@dont-email.me>
In reply to	#19972

On 11/25/2012 7:34 PM, Chris Uppal wrote:
> Joshua Cranmer wrote:
>
>> I read a paper which said you could use log space to see if an
>> undirected graph was connected. The hidden constant is 3^16, and it
>> looks like you'd need a much larger constant to actually make it more
>> usable.
>
> Fascinating.  Reference /please/ ;-)

<http://research.microsoft.com/pubs/148550/sl.pdf> appears to be one of 
the papers.


-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth

[toc] | [prev] | [next] | [standalone]

#20358

From	"Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org>
Date	2012-12-15 12:27 +0000
Message-ID	<aZ-dnXosX90b8VHNnZ2dnUVZ8nqdnZ2d@bt.com>
In reply to	#19975

Joshua Cranmer wrote:
> On 11/25/2012 7:34 PM, Chris Uppal wrote:
> > Fascinating.  Reference /please/ ;-)
>
> <http://research.microsoft.com/pubs/148550/sl.pdf> appears to be one of
> the papers.

(meant to reply before, sorry)

Thanks for that.  It's an interesting paper in its own right too (if somewhat 
beyond me).

    -- chris

[toc] | [prev] | [next] | [standalone]

#19912

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-11-25 01:36 +0100
Message-ID	<ahd7g4Fu72fU1@mid.individual.net>
In reply to	#19910

On 11/25/2012 12:23 AM, Knute Johnson wrote:
> I have been operating under a mis-perception that a SortedMap would be
> more quickly searched than an unsorted Map.  More specifically that the
> cost would come when adding an element to the SortedMap.  I wrote a
> little test program to measure the amount of time it took to look for
> non-existent data in a Map and a SortedMap and it took somewhere between
> twice and three times as long to look for the data in the SortedMap.

The TreeMap is a tree implementation and you typically have O(log n) for 
insertion and lookup operations.  For a HashMap it's O(1) - as has been 
stated already.  So the expectation would be for the TreeMap to be 
slower - at least asymptotically.

> So am I know correct in thinking that the only real advantage of a
> SortedMap is if you wish to iterate over the keys of the map in some order?

SortedMap also supports creation of various sub sets of the map based on 
the key order as well as access to the min and max keys.  The JavaDoc is 
pretty comprehensive.

Kind regards

	robert

[toc] | [prev] | [next] | [standalone]

#19913

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-11-24 20:00 -0500
Message-ID	<50b16d99$0$281$14726298@news.sunsite.dk>
In reply to	#19910

On 11/24/2012 6:23 PM, Knute Johnson wrote:
> I have been operating under a mis-perception that a SortedMap would be
> more quickly searched than an unsorted Map.  More specifically that the
> cost would come when adding an element to the SortedMap.  I wrote a
> little test program to measure the amount of time it took to look for
> non-existent data in a Map and a SortedMap and it took somewhere between
> twice and three times as long to look for the data in the SortedMap.
>
> So am I know correct in thinking that the only real advantage of a
> SortedMap is if you wish to iterate over the keys of the map in some order?

Map and SortedMap are just interfaces.

HashMap get is O(1) but does not maintain order.

SortedMap get is O(logn) but does keep order.

So keeping order cost in performance.

Which makes sense to me.

Arne

[toc] | [prev] | [next] | [standalone]

#19916

From	markspace <-@.>
Date	2012-11-24 17:34 -0800
Message-ID	<k8rsid$sqq$1@dont-email.me>
In reply to	#19910

On 11/24/2012 3:23 PM, Knute Johnson wrote:
> So am I know correct in thinking that the only real advantage of a
> SortedMap is if you wish to iterate over the keys of the map in some order?

What Robert said.  Also HashMap might have to "grow" itself, resulting 
re-inserting every single element already in the Map.  I think TreeMap 
has a more predictable O(ln n) insertion time for each insertion.

I might be wrong about that: a balanced tree could resort itself pretty 
heavily too on any given insertion.

[toc] | [prev] | [next] | [standalone]

#19941

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-11-25 15:28 +0100
Message-ID	<aheo7uF9ot2U1@mid.individual.net>
In reply to	#19916

On 25.11.2012 02:34, markspace wrote:
> On 11/24/2012 3:23 PM, Knute Johnson wrote:
>> So am I know correct in thinking that the only real advantage of a
>> SortedMap is if you wish to iterate over the keys of the map in some
>> order?
>
>
> What Robert said.  Also HashMap might have to "grow" itself, resulting
> re-inserting every single element already in the Map.  I think TreeMap
> has a more predictable O(ln n) insertion time for each insertion.
>
> I might be wrong about that: a balanced tree could resort itself pretty
> heavily too on any given insertion.

TreeMap uses a red black tree:
http://docs.oracle.com/javase/6/docs/api/java/util/TreeMap.html

It has some characteristics which according to my memory avoid the 
extreme rebalancing overhead of other binary trees:
http://en.wikipedia.org/wiki/Red_black_tree

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#19945

From	Martin Gregorie <martin@address-in-sig.invalid>
Date	2012-11-25 17:49 +0000
Message-ID	<k8tlmr$6jr$2@localhost.localdomain>
In reply to	#19941

On Sun, 25 Nov 2012 15:28:11 +0100, Robert Klemme wrote:

> TreeMap uses a red black tree:
> http://docs.oracle.com/javase/6/docs/api/java/util/TreeMap.html
> 
> It has some characteristics which according to my memory avoid the
> extreme rebalancing overhead of other binary trees:
> http://en.wikipedia.org/wiki/Red_black_tree
>
Yes. Spot on. See my note in another part of this thread.


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

[toc] | [prev] | [next] | [standalone]

#19936

From	"Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org>
Date	2012-11-25 14:00 +0000
Message-ID	<JsednWiiw6YTuS_NnZ2dnUVZ8k2dnZ2d@bt.com>
In reply to	#19910

Knute Johnson wrote:

> So am I know correct in thinking that the only real advantage of a
> SortedMap is if you wish to iterate over the keys of the map in some
> order?

Adding to the other responses: also hash-based algorithms are sensitive to the 
effectiveness of the hash function for the data you actually feed it.  In some 
cases you might need a stronger bound on performance: preferring the guaranteed 
O(log N) of the tree over the /probably/ O(1) but /just maybe/ O(N) of the 
hash.

    -- chris

[toc] | [prev] | [standalone]

Page 2 of 2 — ← Prev page 1 [2]

csiph-web

SortedMap question?

Contents

#19956

#19961

#19968

#19983

#19984

#19954

#19957

#19963

#19970

#19972

#19975

#20358

#19912

#19913

#19916

#19941

#19945

#19936