From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: sorting Was: Isn't that beauty ? (no it's not)
Date: Sun, 12 Apr 2026 06:17:12 -0700
Organization: A noiseless patient Spider
Message-ID: <86wlyc8giv.fsf@linuxsc.com>
References: <10otm7r$1ntrg$1@raubtier-asyl.eternal-september.org> <86mrzfxvv5.fsf@linuxsc.com> <10r35bp$29m9s$1@paganini.bofh.team> <863416xid5.fsf@linuxsc.com> <10r94t2$or8$1@reader1.panix.com> <20260410013139.000059d4@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)

Michael S writes:
> On Thu, 9 Apr 2026 21:15:14 -0000 (UTC)
> cross@spitfire.i.gajendra.net (Dan Cross) wrote:
[...]
>> Indeed. As Rob Pike once put it, "Fancy algorithms are slow
>> when $n$ is small, and $n$ is usually small. Fancy algorithms
>> have big constants. Until you know that $n$ is frequently going
>> to get big, don't get fancy."
>
> One case where these considerations are not at all theoretical, and
> where a simple textbook quicksort performs very slowly precisely
> because each lexicographic comparison takes more and more time as
> the sort progresses, is the sorting at the core of the
> Burrows-Wheeler transform, which in turn is at the core of various
> compression schemes, including bzip2. The problem hits hardest
> when the data set compresses well.

I'm curious to know whether you can quantify this to some degree,
and if so by how much. I don't have any experience with the
Burrows-Wheeler transform or (any internals of) bzip2.
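
For concreteness, here is how I picture the naive approach, going
only by your description (a sketch, not bzip2's actual code): sort
the N rotations of the block with qsort, comparing rotations byte
by byte. On highly repetitive input the rotations share long
common prefixes, so a single comparison can scan a large fraction
of the block.

    #include <stdlib.h>

    static const unsigned char *block;   /* the input block  */
    static size_t blen;                  /* its length       */

    /* Compare two rotations of block[], identified by their
       starting offsets.  Worst case scans all blen bytes.   */
    static int
    rot_cmp (const void *pa, const void *pb)
    {
        size_t a = *(const size_t *) pa;
        size_t b = *(const size_t *) pb;
        size_t i;

        for (i = 0;  i < blen;  i++) {
            unsigned char ca = block[(a + i) % blen];
            unsigned char cb = block[(b + i) % blen];
            if (ca != cb)  return ca < cb ? -1 : 1;
        }
        return 0;    /* identical rotations */
    }

    /* usage sketch: fill idx[i] = i for 0 <= i < blen, then
       qsort (idx, blen, sizeof idx[0], rot_cmp);            */

With qsort doing O(N log N) comparisons and each comparison
potentially costing O(N) byte operations, the sort can approach
O(N^2 log N) work on data that compresses well. Is that the
effect you have in mind?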

> In the specific case of bzip2, they limited the block size to
> 900 KB, which is quite low, and preprocessed the input data in a
> way that often seriously hurts compression quality. I can't say
> for sure, but it seems to me that the reason was exactly that -
> avoiding prohibitively slow sorting. Had they had the time and
> desire to use "fancy algorithms" - combinations of bucket and
> non-bucket variants of radix sort, or a quicksort that memorizes
> common prefixes, or even a combination of all three - they would
> not have needed preprocessing and small blocks, and would have
> ended up with better compression ratios.
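
If by a quicksort that memorizes common prefixes you mean
something along the lines of Bentley and Sedgewick's multikey
quicksort, I can at least see the mechanism. A minimal sketch for
ordinary NUL-terminated strings (my own code and names, nothing
taken from bzip2): partitioning looks at only the single
character at depth d, and only the "equal" partition advances to
d+1, so bytes already known to match are never compared again.

    #include <stddef.h>

    static void
    swap_ptrs (const char **a, const char **b)
    {
        const char *t = *a;  *a = *b;  *b = t;
    }

    /* Multikey quicksort: sort s[0..n-1] starting at character
       position d;  all strings are assumed to agree on their
       first d bytes.                                          */
    static void
    mkqsort (const char **s, size_t n, size_t d)
    {
        size_t lt, gt, i;
        int pivot;

        if (n < 2)  return;
        pivot = (unsigned char) s[n/2][d];
        lt = 0;  gt = n;  i = 0;
        while (i < gt) {                      /* 3-way partition on s[*][d] */
            int c = (unsigned char) s[i][d];
            if      (c < pivot)  swap_ptrs (&s[lt++], &s[i++]);
            else if (c > pivot)  swap_ptrs (&s[i], &s[--gt]);
            else                 i++;
        }
        mkqsort (s, lt, d);                   /* s[*][d] <  pivot          */
        if (pivot != 0)                       /* stop at the terminator    */
            mkqsort (s + lt, gt - lt, d + 1); /* equal: skip shared prefix */
        mkqsort (s + gt, n - gt, d);          /* s[*][d] >  pivot          */
    }

For the BWT one would index into the block modulo its length
rather than rely on NUL terminators, but the prefix-skipping idea
is the same. Is that roughly the kind of algorithm you mean?
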
There could be other reasons for wanting to limit the block size.
Have you done any web searches, or tried to investigate in some
other way?