Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #10365 > unrolled thread

Verifying a list is alphabetized

Started bylaredotornado <laredotornado@zipmail.com>
First post2011-11-30 07:52 -0800
Last post2011-12-02 19:52 -0500
Articles 20 on this page of 32 — 13 participants

Back to article view | Back to comp.lang.java.programmer


Contents

  Verifying a list is alphabetized laredotornado <laredotornado@zipmail.com> - 2011-11-30 07:52 -0800
    Re: Verifying a list is alphabetized Knute Johnson <nospam@knutejohnson.com> - 2011-11-30 08:05 -0800
      Re: Verifying a list is alphabetized Lew <lewbloch@gmail.com> - 2011-11-30 11:18 -0800
      Re: Verifying a list is alphabetized Arne Vajhøj <arne@vajhoej.dk> - 2011-12-02 19:54 -0500
        Re: Verifying a list is alphabetized Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-12-03 07:44 -0500
          Re: Verifying a list is alphabetized Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-12-03 09:02 -0500
          Re: Verifying a list is alphabetized Roedy Green <see_website@mindprod.com.invalid> - 2011-12-03 20:05 -0800
            Re: Verifying a list is alphabetized Roedy Green <see_website@mindprod.com.invalid> - 2011-12-03 22:13 -0800
              Re: Verifying a list is alphabetized Tom Anderson <twic@urchin.earth.li> - 2011-12-04 21:47 +0000
            Re: Verifying a list is alphabetized Martin Gregorie <martin@address-in-sig.invalid> - 2011-12-04 12:42 +0000
              Re: Verifying a list is alphabetized Patricia Shanahan <pats@acm.org> - 2011-12-04 12:18 -0800
                Re: Verifying a list is alphabetized Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2011-12-04 13:17 -0800
                  Re: Verifying a list is alphabetized Patricia Shanahan <pats@acm.org> - 2011-12-04 15:32 -0800
                    Re: Verifying a list is alphabetized markspace <-@.> - 2011-12-04 15:59 -0800
                      Re: Verifying a list is alphabetized Patricia Shanahan <pats@acm.org> - 2011-12-04 16:14 -0800
                        Re: Verifying a list is alphabetized markspace <-@.> - 2011-12-04 17:12 -0800
                        Re: Verifying a list is alphabetized Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-12-04 21:17 -0500
              Re: Verifying a list is alphabetized Tom Anderson <twic@urchin.earth.li> - 2011-12-04 22:11 +0000
                Re: Verifying a list is alphabetized Martin Gregorie <martin@address-in-sig.invalid> - 2011-12-04 23:41 +0000
                Re: Verifying a list is alphabetized Roedy Green <see_website@mindprod.com.invalid> - 2011-12-12 01:39 -0800
            Re: Verifying a list is alphabetized Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-12-04 08:47 -0500
            Re: Verifying a list is alphabetized Gene Wirchenko <genew@ocis.net> - 2011-12-04 21:34 -0800
    Re: Verifying a list is alphabetized Patricia Shanahan <pats@acm.org> - 2011-11-30 12:52 -0800
      Re: Verifying a list is alphabetized laredotornado <laredotornado@zipmail.com> - 2011-12-01 06:43 -0800
    Re: Verifying a list is alphabetized Joshua Maurice <joshuamaurice@gmail.com> - 2011-11-30 13:16 -0800
    Re: Verifying a list is alphabetized Roedy Green <see_website@mindprod.com.invalid> - 2011-12-02 01:43 -0800
      Re: Verifying a list is alphabetized Patricia Shanahan <pats@acm.org> - 2011-12-02 05:58 -0800
        Re: Verifying a list is alphabetized Arne Vajhøj <arne@vajhoej.dk> - 2011-12-02 19:55 -0500
        Re: Verifying a list is alphabetized Roedy Green <see_website@mindprod.com.invalid> - 2011-12-03 02:54 -0800
          Re: Verifying a list is alphabetized Patricia Shanahan <pats@acm.org> - 2011-12-03 06:17 -0800
      Re: Verifying a list is alphabetized Roedy Green <see_website@mindprod.com.invalid> - 2011-12-03 03:10 -0800
    Re: Verifying a list is alphabetized Arne Vajhøj <arne@vajhoej.dk> - 2011-12-02 19:52 -0500

Page 1 of 2  [1] 2  Next page →


#10365 — Verifying a list is alphabetized

Fromlaredotornado <laredotornado@zipmail.com>
Date2011-11-30 07:52 -0800
SubjectVerifying a list is alphabetized
Message-ID<722147ec-ab43-4d7c-8b41-32f8705ee7db@i8g2000vbh.googlegroups.com>
Hi,

I'm using Java 1.6.  Given a java.util.List of Strings, what is the
quickest way to verify that the list is sorted in ascending order?

Thanks, - Dave

[toc] | [next] | [standalone]


#10366

FromKnute Johnson <nospam@knutejohnson.com>
Date2011-11-30 08:05 -0800
Message-ID<jb5k7n$g4u$1@dont-email.me>
In reply to#10365
On 11/30/2011 7:52 AM, laredotornado wrote:
> Hi,
>
> I'm using Java 1.6.  Given a java.util.List of Strings, what is the
> quickest way to verify that the list is sorted in ascending order?
>
> Thanks, - Dave

If it could possibly unsorted, I'd just sort it.  How big is this list 
anyway?

-- 

Knute Johnson

[toc] | [prev] | [next] | [standalone]


#10371

FromLew <lewbloch@gmail.com>
Date2011-11-30 11:18 -0800
Message-ID<20791905.1171.1322680736966.JavaMail.geo-discussion-forums@pruu23>
In reply to#10366
Knute Johnson wrote:
> laredotornado wrote:
>> I'm using Java 1.6.  Given a java.util.List of Strings, what is the
>> quickest way to verify that the list is sorted in ascending order?
> 
> If it could possibly unsorted, I'd just sort it.  How big is this list 
> anyway?

<http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#sort(java.util.List)>

Know thine API!

No reason not to be familiar with the core java.util.** types.  None.

-- 
Lew

[toc] | [prev] | [next] | [standalone]


#10448

FromArne Vajhøj <arne@vajhoej.dk>
Date2011-12-02 19:54 -0500
Message-ID<4ed9732b$0$294$14726298@news.sunsite.dk>
In reply to#10366
On 11/30/2011 11:05 AM, Knute Johnson wrote:
> On 11/30/2011 7:52 AM, laredotornado wrote:
>> I'm using Java 1.6. Given a java.util.List of Strings, what is the
>> quickest way to verify that the list is sorted in ascending order?
>>
>> Thanks, - Dave
>
> If it could possibly unsorted, I'd just sort it. How big is this list
> anyway?

If it is small it does not matter.

But I would not recommend an O(nlogn) solution for a O(n)
problem unless there is a note in 72 pt red blinking
on top of it.

Arne

[toc] | [prev] | [next] | [standalone]


#10465

FromEric Sosman <esosman@ieee-dot-org.invalid>
Date2011-12-03 07:44 -0500
Message-ID<jbd5k5$86c$1@dont-email.me>
In reply to#10448
On 12/2/2011 7:54 PM, Arne Vajhøj wrote:
> On 11/30/2011 11:05 AM, Knute Johnson wrote:
>> On 11/30/2011 7:52 AM, laredotornado wrote:
>>> I'm using Java 1.6. Given a java.util.List of Strings, what is the
>>> quickest way to verify that the list is sorted in ascending order?
>>>
>>> Thanks, - Dave
>>
>> If it could possibly unsorted, I'd just sort it. How big is this list
>> anyway?
>
> If it is small it does not matter.
>
> But I would not recommend an O(nlogn) solution for a O(n)
> problem unless there is a note in 72 pt red blinking
> on top of it.

     What is the big-Oh complexity of Collections.sort(List<String>)
on an already-sorted list?

     (Note that Oracle -- unwisely, IMHO -- specifies the sorting
algorithm as part of the method's contract, although "a modified
mergesort..." leaves a lot of wiggle room.  But anyhow: at least
one interpretation of "a modified mergesort" would be a natural
merge, which would use N-1 comparisons to discover that the input
consisted of just one run and would then leave that run alone.)

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]


#10466

FromEric Sosman <esosman@ieee-dot-org.invalid>
Date2011-12-03 09:02 -0500
Message-ID<jbda6g$nh$1@dont-email.me>
In reply to#10465
On 12/3/2011 7:44 AM, Eric Sosman wrote:
> On 12/2/2011 7:54 PM, Arne Vajhøj wrote:
>> On 11/30/2011 11:05 AM, Knute Johnson wrote:
>>> On 11/30/2011 7:52 AM, laredotornado wrote:
>>>> I'm using Java 1.6. Given a java.util.List of Strings, what is the
>>>> quickest way to verify that the list is sorted in ascending order?
>>>>
>>>> Thanks, - Dave
>>>
>>> If it could possibly unsorted, I'd just sort it. How big is this list
>>> anyway?
>>
>> If it is small it does not matter.
>>
>> But I would not recommend an O(nlogn) solution for a O(n)
>> problem unless there is a note in 72 pt red blinking
>> on top of it.
>
> What is the big-Oh complexity of Collections.sort(List<String>)
> on an already-sorted list?
>[...]

     Just for fun, I tried it out.  For a twenty-element ArrayList,
Vector, Stack, or LinkedList, Collections.sort() uses

     19 comparisons if the List is in strictly increasing order,

     19 comparisons if the List is in non-decreasing order but
        with equal elements,

     19 comparisons if the List is in strictly decreasing order,

     70 comparisons (!) if the List is in non-increasing order but
       with equal elements.  (Likely data-dependent.)

     Conclusion: I think using the color red for the 72-point
blinking note is overkill.  Try white, on a white background. :)

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]


#10479

FromRoedy Green <see_website@mindprod.com.invalid>
Date2011-12-03 20:05 -0800
Message-ID<b5sld7ltif6oi2tke3q2fkr5vnr2cc6nkc@4ax.com>
In reply to#10465
On Sat, 03 Dec 2011 07:44:48 -0500, Eric Sosman
<esosman@ieee-dot-org.invalid> wrote, quoted or indirectly quoted
someone who said :

>     What is the big-Oh complexity of Collections.sort(List<String>)
>on an already-sorted list?

IIRC QuickSort goes a bit bananas if the file is already sorted.  ISTR
some sort doing some random swaps at the start to make sure that does
not happen.  It does do a check first if things are in order.  

Most of the time when I sort things, they are already sorted. Nothing
has changed since the last sort, or when I manually edit files I
naturally keep them in order.

You also check inorder in an assert, not as a prelude to sorting. If
it is not in order, something went wrong.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.
 

[toc] | [prev] | [next] | [standalone]


#10481

FromRoedy Green <see_website@mindprod.com.invalid>
Date2011-12-03 22:13 -0800
Message-ID<fr3md71662vdtc9e5nuecvbk3fpe4meiuu@4ax.com>
In reply to#10479
On Sat, 03 Dec 2011 20:05:24 -0800, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

> It does do a check first if things are in order.  
 No. Sun's sort does NOT check first.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.
 

[toc] | [prev] | [next] | [standalone]


#10498

FromTom Anderson <twic@urchin.earth.li>
Date2011-12-04 21:47 +0000
Message-ID<alpine.DEB.2.00.1112042134470.3416@urchin.earth.li>
In reply to#10481
On Sat, 3 Dec 2011, Roedy Green wrote:

> On Sat, 03 Dec 2011 20:05:24 -0800, Roedy Green
> <see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
> someone who said :
>
>> It does do a check first if things are in order.
>
> No. Sun's sort does NOT check first.

Yes, it does. The scripture:

http://docs.oracle.com/javase/6/docs/api/java/util/Arrays.html#sort%28java.lang.Object[]%29

States:

  The sorting algorithm is a modified mergesort (in which the merge is
  omitted if the highest element in the low sublist is less than the lowest
  element in the high sublist).

That bit about the omission of the merge means that if the input is 
sorted, then there will be no actual attempt at sorting; the merges will 
all be skipped. That behaviour is not only the way things are, it's 
guaranteed by the docs.

If you take a look at the actual implementation, there is a method called 
mergeSort on line 1146 of Arrays.java which does the dirty work:

http://hg.openjdk.java.net/jdk6/jdk6-gate/jdk/file/tip/src/share/classes/java/util/Arrays.java

It is a classical recursive mergesort. For sublists smaller than seven 
elements, it does an insertion sort; insertion sort is itself adaptive, in 
that on a sorted input, it will make a single pass through the list, 
noticing that the elements are in order, and not moving any of them. O(n) 
with a minimal constant - or, if you like, O(7), with O(n/7) calls, which 
is O(n) over the whole list. For larger sublists, there is the 
aforementioned already-ordered check; that's O(1) per check, and there 
are, i think O(log n) checks made during the 'merge'. So, O(n) overall, 
with most of the time spent in checks in the insertion sort.

tom

-- 
So the moon is approximately 24 toasters from Scunthorpe.

[toc] | [prev] | [next] | [standalone]


#10486

FromMartin Gregorie <martin@address-in-sig.invalid>
Date2011-12-04 12:42 +0000
Message-ID<jbfpru$ksi$2@localhost.localdomain>
In reply to#10479
On Sat, 03 Dec 2011 20:05:24 -0800, Roedy Green wrote:

> On Sat, 03 Dec 2011 07:44:48 -0500, Eric Sosman
> <esosman@ieee-dot-org.invalid> wrote, quoted or indirectly quoted
> someone who said :
> 
>>     What is the big-Oh complexity of Collections.sort(List<String>)
>>on an already-sorted list?
> 
> IIRC QuickSort goes a bit bananas if the file is already sorted.  ISTR
> some sort doing some random swaps at the start to make sure that does
> not happen.  It does do a check first if things are in order.
>
I'm a little wary of quicksort's claim of superior speed in every 
situation.

Some time back (and writing PL/9 on a 6809 - PL/9 is a close relative of 
PL/M) I decided to educate myself by implementing a number of sorts to 
compare code complexity and speed. These were the Bubble, Selection, 
Shell, and Quicksort algorithms. As I was writing in PL/9, I simply threw 
all the records into a chunk of memory and stored pointers to them in an 
array. Swapping records was very fast because the records themselves 
never moved: I just swapped pointers in the array and, when the sort 
finished, wrote out the records by iterating along the pointer array. 

With a single text key the various sorts performed as expected: slowest 
to fastest was the order that I listed earlier.

However, then I got cute and replaced the simple comparison function, 
which I used in all the algorithms, with a more complex one, capable of 
multi-key sorting as well as natural, lexicographic, caseless, and 
numeric ordering for each key. Much to my surprise, I discovered that 
using this comparator, the quicksort was slower than the shell sort, 
giving a performance ordering (slow to fast) of Bubble, Selection, 
Quicksort and Shell.  

Evidently there's an unstated assumption in Quicksort that record swaps 
will always be slower and/or less costly than comparisons, so it was 
optimised to minimise swaps and not to worry about the cost of 
comparisons. This feature was emphasised in my test series by the 
extremely low cost of record swaps. 

It strikes me that Java implementation may also show this behavior, 
especially if the objects to be sorted were stored in an Object array
rather than a more complex structure such as a Collection, and the 
comparator required by this object type is non-trivial.


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

[toc] | [prev] | [next] | [standalone]


#10493

FromPatricia Shanahan <pats@acm.org>
Date2011-12-04 12:18 -0800
Message-ID<WPidnXX0kcoMSEbTnZ2dnUVZ_q6dnZ2d@earthlink.com>
In reply to#10486
On 12/4/2011 4:42 AM, Martin Gregorie wrote:
> On Sat, 03 Dec 2011 20:05:24 -0800, Roedy Green wrote:
>
>> On Sat, 03 Dec 2011 07:44:48 -0500, Eric Sosman
>> <esosman@ieee-dot-org.invalid>  wrote, quoted or indirectly quoted
>> someone who said :
>>
>>>      What is the big-Oh complexity of Collections.sort(List<String>)
>>> on an already-sorted list?
>>
>> IIRC QuickSort goes a bit bananas if the file is already sorted.  ISTR
>> some sort doing some random swaps at the start to make sure that does
>> not happen.  It does do a check first if things are in order.
>>
> I'm a little wary of quicksort's claim of superior speed in every
> situation.

I would be a lot wary of that claim if I ever saw it made. As pointed
out at the start of its Wikiedia article, it "on average, makes O(nlog
n) (big O notation) comparisons to sort n items. In the worst case, it
makes O(n2) comparisons, though this behavior is rare. Quicksort is
often faster in practice than other O(nlog n) algorithms"

The key words here are "on average" and "often faster". To test those
claims, it would be necessary to do many runs, in many different
situations, with different inputs.

However, I think the java.util developers made the right call in
deciding not to use it. Given that there are very effective algorithms
that are truly O(n log(n)), not just on average, why not use one of them?

Patricia

[toc] | [prev] | [next] | [standalone]


#10496

FromDaniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date2011-12-04 13:17 -0800
Message-ID<ZrRCq.5933$cN1.371@newsfe12.iad>
In reply to#10493
On 12/4/11 12:18 PM, Patricia Shanahan wrote:
> On 12/4/2011 4:42 AM, Martin Gregorie wrote:
>> On Sat, 03 Dec 2011 20:05:24 -0800, Roedy Green wrote:
>>
>>> On Sat, 03 Dec 2011 07:44:48 -0500, Eric Sosman
>>> <esosman@ieee-dot-org.invalid> wrote, quoted or indirectly quoted
>>> someone who said :
>>>
>>>> What is the big-Oh complexity of Collections.sort(List<String>)
>>>> on an already-sorted list?
>>>
>>> IIRC QuickSort goes a bit bananas if the file is already sorted. ISTR
>>> some sort doing some random swaps at the start to make sure that does
>>> not happen. It does do a check first if things are in order.
>>>
>> I'm a little wary of quicksort's claim of superior speed in every
>> situation.
>
> I would be a lot wary of that claim if I ever saw it made. As pointed
> out at the start of its Wikiedia article, it "on average, makes O(nlog
> n) (big O notation) comparisons to sort n items. In the worst case, it
> makes O(n2) comparisons, though this behavior is rare. Quicksort is
> often faster in practice than other O(nlog n) algorithms"
>
> The key words here are "on average" and "often faster". To test those
> claims, it would be necessary to do many runs, in many different
> situations, with different inputs.
>
> However, I think the java.util developers made the right call in
> deciding not to use it. Given that there are very effective algorithms
> that are truly O(n log(n)), not just on average, why not use one of them?
>
> Patricia
I remember reading a paper online a while back which showed that it was 
possible to create a comparison specifically designed to thwart Quick 
Sort.  Not sure why anyone would want to do that in a real life 
situation, but the paper showed that it could cause common QS 
implementations to always approach n^2.

[toc] | [prev] | [next] | [standalone]


#10502

FromPatricia Shanahan <pats@acm.org>
Date2011-12-04 15:32 -0800
Message-ID<jY6dnVhvedWQnkHTnZ2dnUVZ_r2dnZ2d@earthlink.com>
In reply to#10496
On 12/4/2011 1:17 PM, Daniel Pitts wrote:
> On 12/4/11 12:18 PM, Patricia Shanahan wrote:
...
> I remember reading a paper online a while back which showed that it was
> possible to create a comparison specifically designed to thwart Quick
> Sort. Not sure why anyone would want to do that in a real life
> situation, but the paper showed that it could cause common QS
> implementations to always approach n^2.

One practical use of such a data set would be to bring home to algorithm
design students and programmers choosing algorithms the difference
between "average O(n log(n) time" and "O(n log(n)) time", which by
default means a bound on the maximum time across all input sets.

Patricia

[toc] | [prev] | [next] | [standalone]


#10504

Frommarkspace <-@.>
Date2011-12-04 15:59 -0800
Message-ID<jbh1hf$m6j$1@dont-email.me>
In reply to#10502
On 12/4/2011 3:32 PM, Patricia Shanahan wrote:

> One practical use of such a data set would be to bring home to algorithm
> design students and programmers choosing algorithms the difference
> between "average O(n log(n) time" and "O(n log(n)) time", which by
> default means a bound on the maximum time across all input sets.


I think you're missing a parenthesis in the first formula there.

So let me take a stab at that.  "Average O" time means, given a bunch of 
different data, we think that some sort of "average" of them will have 
an upper bound of O, but some might not.  And "average" means "average 
of the data we measured," not "the data you're using."

Whereas "O" means the upper bound is O, period, regardless of the data.

[toc] | [prev] | [next] | [standalone]


#10505

FromPatricia Shanahan <pats@acm.org>
Date2011-12-04 16:14 -0800
Message-ID<JOGdnZ4NAKmfkEHTnZ2dnUVZ_rednZ2d@earthlink.com>
In reply to#10504
On 12/4/2011 3:59 PM, markspace wrote:
> On 12/4/2011 3:32 PM, Patricia Shanahan wrote:
>
>> One practical use of such a data set would be to bring home to algorithm
>> design students and programmers choosing algorithms the difference
>> between "average O(n log(n) time" and "O(n log(n)) time", which by
>> default means a bound on the maximum time across all input sets.
>
>
> I think you're missing a parenthesis in the first formula there.

Correct. Sorry about that.

>
> So let me take a stab at that. "Average O" time means, given a bunch of
> different data, we think that some sort of "average" of them will have
> an upper bound of O, but some might not. And "average" means "average of
> the data we measured," not "the data you're using."

Usually, when it is quoted in Big-O notation it should mean "calculated
average over all data sets". Big-O is not really subject to
experimentation, because it is about limits as size tends to infinity.

>
> Whereas "O" means the upper bound is O, period, regardless of the data.
>

Yup.

Patricia

[toc] | [prev] | [next] | [standalone]


#10508

Frommarkspace <-@.>
Date2011-12-04 17:12 -0800
Message-ID<jbh5qd$faj$1@dont-email.me>
In reply to#10505
On 12/4/2011 4:14 PM, Patricia Shanahan wrote:

> Usually, when it is quoted in Big-O notation it should mean "calculated
> average over all data sets". Big-O is not really subject to
> experimentation, because it is about limits as size tends to infinity.


Really.  I'll have to take another look at the math then (not sure where 
my math books are right now).  I didn't think this could actually be 
worked out, with any rigor or precision.  As you say the data sets 
"tends to infinity."  It'll be interesting to review those calculations.

[toc] | [prev] | [next] | [standalone]


#10509

FromEric Sosman <esosman@ieee-dot-org.invalid>
Date2011-12-04 21:17 -0500
Message-ID<jbh9j9$2eo$1@dont-email.me>
In reply to#10505
On 12/4/2011 7:14 PM, Patricia Shanahan wrote:
> [...]
> Usually, when it is quoted in Big-O notation it should mean "calculated
> average over all data sets".

     I must disagree. Big-Oh is a statement about some quantity, and
carries no implication about the nature of that quantity: average,
best case, worst case, or whatever.  We can say of Quicksort that the
average running time is O(N log N) and the worst is O(N*N), and the
big-Ohs in the two assertions are independent: They are statements
about two different quantities.

     In fact, Big-Oh is often used in an entirely different context,
describing a tiny quantity rather than a large one.  For example, we
can say that the sum 1/1 + 1/2 + ... + 1/N is ln(N) + O(1).  There is
no implication of an average of any kind in this statement, nor even
of a data set or a universe of data sets; there is only a statement
about the rate of growth of the discrepancy between the exact sum
and the approximation.

     If notions like "average over all data sets" appear, they do not
arise from the Big-Oh but from the nature of the quantity that Big-Oh
describes.  If we're talking about "average," we must describe the
genesis of this average: Over the entire universe of cases or over a
designated subset, weighted or unweighted, time-decayed or static,
whatever.  It's not Big-Oh that contributes such things, but the
definition of the quantity under consideration.

> Big-O is not really subject to
> experimentation, because it is about limits as size tends to infinity.

     Or, sometimes, as something tends to zero.  You'll find this sense
used all over the place in any textbook on numerical analysis: Integrate
by the trapezoidal rule with step size h and the error is O(h^2); use
Simpson's rule and you can get O(h^4).  These Big-Oh's still describe
upper bounds, but the quantities become smaller rather than larger in
the direction of interest.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]


#10499

FromTom Anderson <twic@urchin.earth.li>
Date2011-12-04 22:11 +0000
Message-ID<alpine.DEB.2.00.1112042147290.3416@urchin.earth.li>
In reply to#10486
On Sun, 4 Dec 2011, Martin Gregorie wrote:

> On Sat, 03 Dec 2011 20:05:24 -0800, Roedy Green wrote:
>
>> On Sat, 03 Dec 2011 07:44:48 -0500, Eric Sosman
>> <esosman@ieee-dot-org.invalid> wrote, quoted or indirectly quoted
>> someone who said :
>>
>>>     What is the big-Oh complexity of Collections.sort(List<String>)
>>> on an already-sorted list?
>>
>> IIRC QuickSort goes a bit bananas if the file is already sorted.  ISTR
>> some sort doing some random swaps at the start to make sure that does
>> not happen.  It does do a check first if things are in order.
>
> I'm a little wary of quicksort's claim of superior speed in every 
> situation.

What claims? Quicksort, being an algorithm, makes no claims on its behalf.

Tony Hoare, being a scholar and a gentleman, made a strong but measured 
claim about it (from the conclusion of [1]):

  The number of cycles of the innermost comparison loop is close to the
  theoretical minimum, and the loop may be made very fast. The amount of
  data movement within the store is kept within very reasonable bounds.
  Quicksort is therefore likely to recommend itself as the standard sorting
  method on most computers with a large enough random-access store to make
  internal sorting worth while.

Although it seems he may at one point have made rather more extravagant 
claims which he came to regret - a footnote in the above paper tries to 
cover them up:

  The claim to a negative sorting time in the reference is, of course, due
  to a misprint.

I very much doubt that anyone worth listening to has ever made a claim 
that quicksort is the fastest in every situation. Pathological cases have 
been known for such a long time that to do so would be to court honorary 
membership of the Flat Earth Society.

> However, then I got cute and replaced the simple comparison function, 
> which I used in all the algorithms, with a more complex one, capable of 
> multi-key sorting as well as natural, lexicographic, caseless, and 
> numeric ordering for each key. Much to my surprise, I discovered that 
> using this comparator, the quicksort was slower than the shell sort, 
> giving a performance ordering (slow to fast) of Bubble, Selection, 
> Quicksort and Shell.
>
> Evidently there's an unstated assumption in Quicksort that record swaps 
> will always be slower and/or less costly than comparisons, so it was 
> optimised to minimise swaps and not to worry about the cost of 
> comparisons. This feature was emphasised in my test series by the 
> extremely low cost of record swaps.

It's possible that quicksort does fewer swaps than Shell sort. But i don't 
think it was focused on that: Hoare's point in the quote above is that (my 
emphasis) "the number of cycles of the innermost *comparison* loop is 
close to the theoretical minimum". Sortologists have understood the 
importance of minimising the number of comparisons for a very long time.

On the contrary, the number of comparisons made by Shell sort is, 
according to the so-called experts at wikipedia, not terribly rigorously 
known, but is generally something like O(n^k), with k > 1:

http://en.wikipedia.org/wiki/Shellsort

Which is worse than the O(n log n) you get with other sorts. There will be 
particular cases where it beats quicksort, or some other sort, and perhaps 
even particular applications covering many cases, but that is not a 
general conclusion you can draw.

tom

[1] http://comjnl.oxfordjournals.org/content/5/1/10.full.pdf+html

tom

-- 
So the moon is approximately 24 toasters from Scunthorpe.

[toc] | [prev] | [next] | [standalone]


#10503

FromMartin Gregorie <martin@address-in-sig.invalid>
Date2011-12-04 23:41 +0000
Message-ID<jbh0ft$a0$1@localhost.localdomain>
In reply to#10499
On Sun, 04 Dec 2011 22:11:32 +0000, Tom Anderson wrote:

> Tony Hoare, being a scholar and a gentleman, made a strong but measured
> claim about it (from the conclusion of [1]):
>
This was in "Software Tools in Pascal" (Kernighan & Plauger).
 
> I very much doubt that anyone worth listening to has ever made a claim
> that quicksort is the fastest in every situation. Pathological cases
> have been known for such a long time that to do so would be to court
> honorary membership of the Flat Earth Society.
>
K&P describe shellsort as approximating N**1.5 and say that it averages 
NlogN but can degrade to n**2 but that worse than NlogN is 'very rare', 
addinf that to sort an array "just enter 'quick(v, 1, n)' and stand well 
back". I probably ascribed too much weight to that remark, but OTOH the 
book says almost nothing about what can degrade its performance.

It then goes on to talk about sort-merges which I really related to since 
I was using a box with 48K RAM at the time and wasn't long away from 
mainframes with little RAM (multi-user operation with 32 Kwords of RAM: 
96 KB equivalent), so all disk sort utilities were of the sort-merge 
variety - tape sorts were pure multi-way merges preceeded by a collation 
phase).


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

[toc] | [prev] | [next] | [standalone]


#10660

FromRoedy Green <see_website@mindprod.com.invalid>
Date2011-12-12 01:39 -0800
Message-ID<5gibe750e8e13jp53cg7mips2vcv44b2o7@4ax.com>
In reply to#10499
On Sun, 4 Dec 2011 22:11:32 +0000, Tom Anderson <twic@urchin.earth.li>
wrote, quoted or indirectly quoted someone who said :

>http://en.wikipedia.org/wiki/Shellsort

I  coded a bunch of sorts with test drivers if you are interested in
comparing them under various circumstances:
http://mindprod.com/products2.html#HEAPSORT HeapSort
http://mindprod.com/products2.html#QUICKSORT QuickSort
http://mindprod.com/products2.html#RADIXSORT RadixSort
http://mindprod.com/products2.html#SHELLSORT ShellSort
http://mindprod.com/jgloss/sort.html#BADSORT  BubbleSort
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.
 

[toc] | [prev] | [next] | [standalone]


Page 1 of 2  [1] 2  Next page →

Back to top | Article view | comp.lang.java.programmer


csiph-web