Path: csiph.com!eternal-september.org!feeder.eternal-september.org!nntp.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch Newsgroups: comp.lang.c Subject: Re: sorting Was: Isn't that beauty ? (no it's not) Date: Mon, 06 Apr 2026 18:37:47 -0700 Organization: A noiseless patient Spider Lines: 37 Message-ID: <86v7e3y2hg.fsf@linuxsc.com> References: <10otm7r$1ntrg$1@raubtier-asyl.eternal-september.org> <10p1r1q$3hfeq$1@dont-email.me> <10p2q41$3umkh$1@raubtier-asyl.eternal-september.org> <10p2ssl$3ven8$1@dont-email.me> <10p30j2$13nd$1@raubtier-asyl.eternal-september.org> <10p30kr$13nd$2@raubtier-asyl.eternal-september.org> <10p3jgn$80c7$1@dont-email.me> <10p9q0h$2a77o$1@dont-email.me> <10pa02q$2d5us$1@dont-email.me> <10pa4eg$2dtli$5@dont-email.me> <10pa78l$2fjaa$1@dont-email.me> <10palrv$2jpeu$2@dont-email.me> <10pbf0q$2rtq3$1@dont-email.me> <10pbg8f$2spms$2@dont-email.me> <10pbhud$2tbk0$2@dont-email.me> <10pcvr2$3ednq$2@dont-email.me> <20260318112151.000021dc@yahoo.com> <10pdu8k$3o60c$1@dont-email.me> <20260319111959.0000447c@yahoo.com> <10ph2db$pn58$1@dont-email.me> <20260319172955.000062ce@yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Date: Tue, 07 Apr 2026 01:37:48 +0000 (UTC) Injection-Info: dont-email.me; posting-host="578c4ea6691f08c6710ee5a21a69b6e8"; logging-data="2645957"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18y1CYE1FkWeVtgR+Wcsr5V3e7ofLh0GlQ=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:dVPYFvClLuf3PKwYhhYrvIiHMG4= sha1:bWMT5gXTgo4OL9aH1AO3xl3FV2k= Xref: csiph.com comp.lang.c:397403 Michael S writes: > On Thu, 19 Mar 2026 14:49:16 +0000 > Bart wrote: > >> On 19/03/2026 09:19, Michael S wrote: >> >>> On Wed, 18 Mar 2026 11:20:03 +0100 >>> David Brown wrote: >> >> Normally however (and again in scripting code) I'd use my built-in >> sort() based on quicksort, which is nearly 1000 times faster than >> bubble-sort for my test (sort 20K random strings), and some 300x >> faster than your routines. It's not O(n-squared) either. > > For lexicographic sort of 20K random strings, plain quicksort is > probably quite sub-optimal. > If performance is important, I'd consider combined method: first > pre-sort by 3-char or 4-char prefix with Radix sort ('by LS character > first' variation of algorithm), then use quicksprt to sort sections > with the same prefix. For string taken from the real world it will not > work as well as for artificial random strings, but should still > significantly outperform plain quicksort. Put the strings in a hash table. If a string has been seen before nothing more needs to be done. Any string not seen before to be inserted into a balanced tree (with no further tests for equality needed). The balanced tree can be used as a basis to associate an integer with each string, where the associated integers have the same sort order as the strings. The only string comparisons needed are for building the hash table (which for the most part will have no misses), and for building the balanced tree, which is good because most of the comparisons are with values that are far away, lexicographically, from the string being inserted, and so not many character comparisons are needed before a decision is made about which way to go in the tree.