Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!news.albasani.net!.POSTED!not-for-mail From: Lew Newsgroups: comp.lang.java.programmer Subject: Re: Did the sort do anything? Date: Wed, 18 May 2011 13:02:55 -0400 Organization: albasani.net Lines: 45 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Trace: news.albasani.net 3AfNZ6JdwT+k+rcCUCseDjIa0bUjtK9wIsIae1bkVQfGeGMKNrvx1OrEXuf3bZVMFTYeeYx39hI8RHiI7Scv/TBRElDWMdaBszQQrNNu7AoKxmhVp0HCMMCWO/dluC94 NNTP-Posting-Date: Wed, 18 May 2011 17:02:34 +0000 (UTC) Injection-Info: news.albasani.net; logging-data="Uy5UeXXxXBnl2s45ff6wRXGS7C7iwgtYqJM/CUXhYdpog9blUvGzQVu7bFLvoxuAa2EjjLj9TqqHELhya3MYzrx5oeHamUMmGISDFzdv/0sgcNwR7pi0nxdDW69IWZZz"; mail-complaints-to="abuse@albasani.net" User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10 In-Reply-To: Cancel-Lock: sha1:44E0jEu3hOOyHdnwOvn7jFw6yiA= Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:4262 Michael Wojcik wrote: > Tom Anderson wrote: >> What would shut me up would be a performance comparison of two >> comparable-quality implementations of the algorithms, showing that >> Timsort wipes the floor with Smoothsort. Is anyone aware of one? > > The usual problem with heapsort-based algorithms, besides having O(n > lg n) as a lower bound, is that they have lousy locality of reference. > In traditional heapsort, once the heap is built, you proceed by taking > the leftmost element in the array (the root of the heap) and swapping > it with the last element of the heap part of the array. If the heap > size is not trivial, you're doing a load from one cache line and a > store into another. > > Smoothsort appears to fix this problem, because it keeps the largest > elements at the right end of the forest of heaps, so the move from the > forest of heaps to the sorted part of the array is local. It also > avoids having to re-heapify large heaps, since it always operates on > the smallest heap left in the forest, and it splits large heaps as it > dequeues from them. In fact, while I've never tried implementing > smoothsort, much less analyzed its behavior, it looks like it has > quite good locality of reference. > > But Timsort has great locality of reference, because it deals in runs > (including reversed-order runs), and when it doesn't have a decent run > at the current position, it sorts a small fragment of the array to > create one. > > On the other hand, when sorts are used in most modern applications, > they're comparing keys that are located elsewhere in memory and > probably invoking a user-supplied comparison function, so locality of > reference may not have much effect on performance. Proving once again that theory can lead you to town, but it can't find you the concierge at the hotel. For that you need measurement under realistic conditions. Evidence will tell you if the difference between Timsort, Quicksort, Smoothsort, Bubble sort or Stupidsort even matters, much less which one wins. -- Lew Stupidsort: scan the array looking for the largest element. Copy it to one end of temporary array. Repeat with next element, placing in next position, repeat until whole array is copied. Quicksort the result. Copy it back to the original array. Cache the results.