Path: csiph.com!eternal-september.org!feeder.eternal-september.org!nntp.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: Isn't that beauty ? (no it's not)
Date: Tue, 07 Apr 2026 02:12:49 -0700
Organization: A noiseless patient Spider
Lines: 378
Message-ID: <86fr57xhf2.fsf@linuxsc.com>
References: <10otm7r$1ntrg$1@raubtier-asyl.eternal-september.org>
 <10ov73a$2bugo$3@dont-email.me>
 <10ovlm7$2ijcp$1@raubtier-asyl.eternal-september.org>
 <10p1r1q$3hfeq$1@dont-email.me>
 <10p2q41$3umkh$1@raubtier-asyl.eternal-september.org>
 <10p2ssl$3ven8$1@dont-email.me>
 <10p30j2$13nd$1@raubtier-asyl.eternal-september.org>
 <10p30kr$13nd$2@raubtier-asyl.eternal-september.org>
 <10p3jgn$80c7$1@dont-email.me>
 <10p9q0h$2a77o$1@dont-email.me>
 <10pa02q$2d5us$1@dont-email.me>
 <10pa4eg$2dtli$5@dont-email.me>
 <10pa78l$2fjaa$1@dont-email.me>
 <10palrv$2jpeu$2@dont-email.me>
 <10pbf0q$2rtq3$1@dont-email.me>
 <10pbg8f$2spms$2@dont-email.me>
 <10pbhud$2tbk0$2@dont-email.me>
 <10pcvr2$3ednq$2@dont-email.me>
 <20260318112151.000021dc@yahoo.com>
 <10pdu8k$3o60c$1@dont-email.me>
 <20260319111959.0000447c@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Tue, 07 Apr 2026 09:12:52 +0000 (UTC)
Injection-Info: dont-email.me; posting-host="578c4ea6691f08c6710ee5a21a69b6e8"; logging-data="2865053"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19M+66Zux7vRAyygu7nV6OmSWwdz6fe1V8="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:0t5B0gzkBLvb4gtRzxDmwtx+hR0= sha1:DVmneS808Xs7MgEQGYz73/pve7I=
Xref: csiph.com comp.lang.c:397408

Michael S writes:

[..lots left out..]

> On Wed, 18 Mar 2026 11:20:03 +0100
> David Brown wrote:

[...]
>> I don't think pure bubblesort has much serious use outside
>> educational purposes, but it is easy to understand, easy to
>> implement correctly in pretty much any language, and can do a
>> perfectly good job if you don't need efficient sorting of large
>> datasets (for small enough datasets, a hand-written bubblesort
>> in C will be faster than calling qsort).
>
> IMHO, Bubble Sort is harder to understand then Straight Select
> Sort.

After reading through the several discussions regarding sorting, I
have a variety of comments to make about sorting algorithms, not
just about the remarks above but about the discussions in general.
The posting above seems a reasonable point of departure for those
comments, although much of what I have to say is prompted by
comments in other postings.

For starters, I contend that a completely simple bubble sort is the
easiest sorting algorithm to show to novice programmers.  By novice
here I mean people who barely even know what is meant by the word
algorithm.  To be specific, what I mean by a simple bubble sort is
this:

    void
    simple_minded_bubble_sort( unsigned n, Element a[n] ){
        unsigned  i, j;
        for( i = 0;  i < n;  i++ ){
            for( j = 1;  j < n;  j++ ){
                if( follows( j-1, j, a ) )  swap( j-1, j, a );
            }
        }
    }

Incidentally, all the algorithms I will show here are written in
terms of the two functions follows() and swap(), whose meanings are
(I hope) very obvious.  The array to be sorted has values of type
Element, which doesn't play any direct role in the discussion.

I think it goes without saying that this is a crummy algorithm.  But
it's a very simple algorithm, which makes it a good first choice for
novice programmers.  We can reasonably ask how bad this algorithm
is, and ask in a way that novices can understand.  It seems obvious
that a good sorting algorithm should compare any two particular
elements at most once.
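The posting never shows follows() and swap() themselves.  Here is
one plausible way to write them, assuming Element is int and adding
the compare/swap counters that the measurements below imply; both
the int choice and the counter names are my assumptions, not part of
the original code:

```c
typedef int Element;    /* assumption: the post never pins down Element */

/* hypothetical counters matching the compare and swap columns
 * in the metric tables */
static unsigned long n_compares, n_swaps;

/* does a[i] follow a[j] in sorted order, i.e., is the pair
 * out of order as it stands? */
static int
follows( unsigned i, unsigned j, Element a[] ){
    n_compares++;
    return a[i] > a[j];
}

static void
swap( unsigned i, unsigned j, Element a[] ){
    Element t = a[i];
    n_swaps++;
    a[i] = a[j],  a[j] = t;
}
```

With definitions along these lines, any of the sorts below can be
run on random data and the two counters read off afterward to
produce rows like the ones in the tables.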
If we try this algorithm on an array of randomly chosen input
values, we might get statistics like this:

    simple bubble       151277700    37391291    200.000

Here the columns are name, number of compares, number of swaps, and
number of compares shown as a percentage of the "maximal" number of
compares.  The crumminess is evident: this algorithm compares each
pair of elements not once but twice.  Awful!

There is a second and more subtle problem, which is that even though
the algorithm is simple it isn't obvious that it works, or why.
However it isn't hard to see that the inner loop carries the largest
value to the upper end of the array, and similarly the second
iteration of the inner loop will carry the second largest value to
the last place before that, and so on.  This observation leads to a
second algorithm that is both faster and has a motivating argument
for why it works:

    void
    bubble_sort( unsigned n, Element a[n] ){
        unsigned  i, j;
        for( i = n;  i > 1;  i-- ){
            for( j = 1;  j < i;  j++ ){
                if( follows( j-1, j, a ) )  swap( j-1, j, a );
            }
        }
    }

Now the inner loop works over smaller and smaller subsections of the
whole array, each time carrying the largest remaining value to its
appropriate location in the to-be-sorted array.  Using the same test
data as before, we get these measurements:

    simple bubble       151277700    37391291    200.000
    bubble               75638850    37391291    100.000

Excellent!  Besides being able to see why it works, we do better on
the number of compares by a factor of two.  The number of swaps is
the same (as indeed it cannot be improved on by any algorithm that
swaps only adjacent elements), but cutting down on the number of
compares is clearly a step in the right direction.

Now if we are inspired we can notice something.  During one
iteration of the inner loop, if we remember the last place a swap
occurred, subsequent iterations of the inner loop don't need to go
any farther than that.
The reason is that we know all the values after the last swap must
be larger than the element being carried along, so only smaller
values were passed over, and so we don't need to look past the
location of the rightmost swap.  It's kind of a subtle argument, but
we can program it thusly:

    void
    knuth_bubble_sort( unsigned n, Element a[n] ){
        unsigned  i, j, t;
        for( i = n;  i > 1;  i = t ){
            for( t = 0, j = 1;  j < i;  j++ ){
                if( follows( j-1, j, a ) ){
                    swap( j-1, j, a ),  t = j;
                }
            }
        }
    }

The variable 't' records the location past which no more compares
need be done in the next iteration.  By using 'i = t' in the outer
loop control, we get to limit the range of what is looked at in the
next iteration of the inner loop.  Here is the data for the first
three algorithms, again all run on the same input:

    simple bubble       151277700    37391291    200.000
    bubble               75638850    37391291    100.000
    knuth bubble         75611850    37391291     99.964

The number of compares done has indeed gone down, but not by much.

There are a couple of dirty secrets about the "improved" version of
bubble sort.  One is that it doesn't help very much.  The other is
about the idea that it terminates early when the array values are
already almost sorted.  That is mostly a myth.  It can happen that
when the array values are already almost sorted knuth_bubble_sort()
will terminate early, but it almost never does.  Even when there is
only one array value out of place, it is relatively rare for this
algorithm to get a big speedup.

It's time to look at another algorithm.  Here is one that wasn't
around when I was first learning about sorting algorithms, called
gnome sort:

    void
    gnome_sort( unsigned n, Element a[n] ){
        unsigned  i = 0;
        while( ++i < n ){
            if( follows( i-1, i, a ) ){
                swap( i-1, i, a );
                i = i > 1 ? i-2 : 1;
            }
        }
    }

The idea is simple.  Imagine a gnome wandering among the array
elements, starting at the left end.
If the two elements being looked at are in order, the gnome moves
right; if they aren't, the gnome swaps them and moves left (of
course never moving past the left end).  So the gnome just wanders
back and forth, swapping elements that are out of order and going
either left or right, depending on whether it just did a swap or
not.  How does it do, measurement-wise?  Here is the data:

    simple bubble       151277700    37391291    200.000
    bubble               75638850    37391291    100.000
    knuth bubble         75611850    37391291     99.964
    gnome                74794859    37391291     98.884

We see the relatively simple gnome sort does better than the more
complicated knuth bubble sort.  Perhaps not a lot better, but still
better.

Even so, these numbers are not very inspiring.  We can do better if
we take the basic idea of (knuth) bubble sort and make it
boustrophedonic, or what might be called a ping pong sort:

    void
    ping_pong_sort( unsigned n, Element a[n] ){
        unsigned  i = 1, j = -1, k = n, newj, newk;
        while( j+1 < k-1 ){
            for( i = j+2, newk = i;  i < k;  i++ ){
                if( follows( i-1, i, a ) ){
                    swap( i-1, i, a ),  newk = i;
                }
            }
            k = newk;
            for( i = k-1, newj = i;  i > j+1;  i-- ){
                if( follows( i-1, i, a ) ){
                    swap( i-1, i, a ),  newj = i-1;
                }
            }
            j = newj;
        }
    }

Gosh, what a mess.  But let's look at the metrics.

    simple bubble       151277700    37391291    200.000
    bubble               75638850    37391291    100.000
    knuth bubble         75611850    37391291     99.964
    gnome                74794859    37391291     98.884
    ping pong            49983149    37391291     66.081

That is nice to see - a healthy performance improvement.
But of course what everyone is waiting for are the well-known simple
standard sorts:

    void
    selection_sort( unsigned n, Element a[n] ){
        unsigned  i, j;
        for( i = 0;  i < n;  i++ ){
            unsigned  k = i;
            for( j = i+1;  j < n;  j++ ){
                if( follows( k, j, a ) )  k = j;
            }
            swap( i, k, a );
        }
    }

    void
    insertion_sort( unsigned n, Element a[n] ){
        unsigned  i, j;
        for( i = 1;  i < n;  i++ ){
            for( j = i;  j > 0;  j-- ){
                if( !follows( j-1, j, a ) )  break;
                swap( j-1, j, a );
            }
        }
    }

    simple bubble       151277700    37391291    200.000
    bubble               75638850    37391291    100.000
    knuth bubble         75611850    37391291     99.964
    gnome                74794859    37391291     98.884
    ping pong            49983149    37391291     66.081
    selection            75638850       12289    100.000
    insertion            37403579    37391291     49.450

Clearly selection sort should be used only when the amount of
movement is the overriding factor and compares don't matter.
Insertion sort is the flip side of that - the number of compares is
much lower, but the amount of movement is the same as for all the
others.

For my own amusement I wrote a new sorting algorithm, which I
whimsically call titanic sort:

    void
    titanic_sort( unsigned n, Element a[n] ){
        unsigned  i, j;
        for( i = n;  i > 0;  i-- ){
            for( j = i-1;  j < n-1;  j++ ){
                if( follows( j, j+1, a ) )  swap( j, j+1, a );
            }
        }
    }

Titanic sort is just like an (ordinary) bubble sort, except it does
ever larger iterations in the inner loop, rather than ever smaller
iterations like an ordinary bubble sort.  Here are the metrics:

    simple bubble       151277700    37391291    200.000
    bubble               75638850    37391291    100.000
    knuth bubble         75611850    37391291     99.964
    gnome                74794859    37391291     98.884
    ping pong            49983149    37391291     66.081
    selection            75638850       12289    100.000
    insertion            37403579    37391291     49.450
    titanic              75638850    37391291    100.000

After writing the titanic sort, I had an idea.  Because we are
working on ever growing subintervals, we can stop the inner loop
early whenever we find we don't need to swap.
For this sort I once again chose a whimsical name, namely seven up
sort:

    void
    sevenup_sort( unsigned n, Element a[n] ){
        unsigned  i, j;
        for( i = n;  i > 0;  i-- ){
            for( j = i-1;  j < n-1;  j++ ){
                if( !follows( j, j+1, a ) )  break;
                swap( j, j+1, a );
            }
        }
    }

    simple bubble       151277700    37391291    200.000
    bubble               75638850    37391291    100.000
    knuth bubble         75611850    37391291     99.964
    gnome                74794859    37391291     98.884
    ping pong            49983149    37391291     66.081
    selection            75638850       12289    100.000
    insertion            37403579    37391291     49.450
    titanic              75638850    37391291    100.000
    seven up             37403579    37391291     49.450

Oh of course; a seven up sort is just like an insertion sort, but
inserting at the far end rather than the near end.  Duh!

After doing all these I wrote and measured three other sorts, for
which I will give the metrics but the code for only one.  Here are
the metrics:

    simple bubble       151277700    37391291    200.000
    bubble               75638850    37391291    100.000
    knuth bubble         75611850    37391291     99.964
    gnome                74794859    37391291     98.884
    ping pong            49983149    37391291     66.081
    selection            75638850       12289    100.000
    insertion            37403579    37391291     49.450
    titanic              75638850    37391291    100.000
    seven up             37403579    37391291     49.450
    heapsort               309459      168889      0.409
    quicksort              192033      117831      0.254
    binary insertion       150102    37391291      0.198

The code for heapsort and for quicksort was not as clean as I would
like, so I'm not posting that.  Here is the code for binary
insertion sort:

    void
    binary_insertion_sort( unsigned n, Element a[n] ){
        unsigned  i, j, k, m;
        for( i = 1;  i < n;  i++ ){
            j = -1,  k = i;
            while( j+1 < k ){
                m = j + (k-j)/2;
                if( follows( m, i, a ) )  k = m;
                else  j = m;
            }
            for( j = i;  j > k;  j-- )  swap( j-1, j, a );
        }
    }

A comment about heapsort, especially as compared with quicksort.
Using this test input, heapsort did a lot more compares than
quicksort did.  That result is not an accident of the data.  Of
course sometimes quicksort misbehaves, but usually it doesn't, and
in those cases heapsort is going to run slower than quicksort
because it does more compares.
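The heapsort measured above isn't posted.  For reference, a
textbook sift-down heapsort written in the same follows()/swap()
style might look like the sketch below.  This is not the author's
code; the Element type and the follows()/swap() bodies are filled in
only to make the sketch self-contained:

```c
/* Textbook sift-down heapsort.  Element, follows(), and swap() are
 * assumptions added so the sketch compiles on its own. */
typedef int Element;

static int  follows( unsigned i, unsigned j, Element a[] ){ return a[i] > a[j]; }
static void swap( unsigned i, unsigned j, Element a[] ){
    Element t = a[i];  a[i] = a[j],  a[j] = t;
}

/* restore the max-heap property below node 'root' within a[0..n-1] */
static void
sift_down( unsigned root, unsigned n, Element a[] ){
    unsigned  c;
    while( (c = 2*root + 1) < n ){
        if( c+1 < n && follows( c+1, c, a ) )  c++;   /* pick larger child */
        if( !follows( c, root, a ) )  break;          /* heap order holds */
        swap( root, c, a ),  root = c;
    }
}

void
heap_sort( unsigned n, Element a[] ){
    unsigned  i;
    for( i = n/2;  i > 0;  i-- )  sift_down( i-1, n, a );   /* build heap */
    for( i = n;  i > 1;  i-- ){
        swap( 0, i-1, a );          /* move largest to its final place */
        sift_down( 0, i-1, a );
    }
}
```

Each of the n extractions does roughly 2*log2(n) compares in
sift_down(), one per level to pick the larger child and one to test
the heap property, which is one way to see where heapsort's larger
compare count relative to quicksort's average case comes from.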
That result is inherent in the heapsort algorithm.  It's true that
the worst case for heapsort is O( N * log N ), but the constant
factor for heapsort is bigger than the (average case) constant for
quicksort.

Also it's worth noting that binary insertion sort does the best of
all in terms of the number of compares done.

My conclusions...

The right place to start on sorting algorithms is with bubble sort,
not because it's a good algorithm, but because it's the easiest one
to show, and, what is more important, it's a good way to teach how
algorithms evolve.

There is no reason to use the knuth version of bubble sort.
Compared to ordinary bubble sort the upside is small, and not worth
the downside of needing more intricate code.

If there is a good chance that the input starts off being close to
sorted, ping pong sort can be considered.

In almost all cases prefer binary insertion sort to plain insertion
sort, not because the average case is better but because the worst
case is better.  Moreover the extra code needed is small and easily
understood.

Writing a full-scale sorting algorithm, such as a replacement for
qsort, is a non-trivial undertaking.

Other considerations apply if the sort needs to be stable, but
surely this posting is long enough already.  :)