Path: csiph.com!eternal-september.org!feeder.eternal-september.org!nntp.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: sorting Was: Isn't that beauty ? (no it's not)
Date: Sat, 11 Apr 2026 21:32:21 -0700
Organization: A noiseless patient Spider
Lines: 83
Message-ID: <861pgkaje2.fsf@linuxsc.com>
References: <10otm7r$1ntrg$1@raubtier-asyl.eternal-september.org> <86mrzfxvv5.fsf@linuxsc.com> <10r35bp$29m9s$1@paganini.bofh.team> <863416xid5.fsf@linuxsc.com> <10r94t2$or8$1@reader1.panix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Sun, 12 Apr 2026 04:32:22 +0000 (UTC)
Injection-Info: dont-email.me; posting-host="425f3de481b107cac4c4e410f9a2c2b8"; logging-data="2409914"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/iR2lmitga4IP1/DiAbm+cbYTUaT0l1us="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:vr0MOQilp6zZzXsvBjAZU6SyvUU= sha1:mO1tZA1wfxSnPVsX2V7KtYwTBDE=
Xref: csiph.com comp.lang.c:397497

cross@spitfire.i.gajendra.net (Dan Cross) writes:
> In article <863416xid5.fsf@linuxsc.com>,
> Tim Rentsch wrote:
>
>> antispam@fricas.org (Waldek Hebisch) writes:
>>
>>> Tim Rentsch wrote:
>>>
>>>> Michael S writes:
>>>>
>>>>> On Mon, 06 Apr 2026 15:13:32 -0700
>>>>> Tim Rentsch wrote:
>>>>>
>>>>>> Obviously what words (or lines) appear can affect the character
>>>>>> counts, but that still doesn't change BigO. By the way you don't
>>>>>> say whether you are sorting words or lines.
>>>>>
>>>>> This sub-thread is about sorting lines with an average length of
>>>>> a few dozen characters, i.e., many times longer than log2(N).
>>>>> That was stated in one of the earlier posts.
>>>>
>>>> That has nothing to do with BigO, which is about asymptotic
>>>> behavior as N goes to infinity.
>>>
>>> An honest Big-O analysis varies the length of the key with N. In
>>> the practical range the key length may be constant, but fixing the
>>> length gives an unrealistic problem for Big-O analysis: without
>>> varying the key length there are only finitely many keys, and
>>> sorting is equivalent to counting how many times each key appears
>>> in the input.
>>
>> There's an important clarification to make here. There are two
>> independent parameters: N, the number of records to be sorted (a
>> record being a character string that is either a word or a line),
>> and the (maximum) length of any record, which in the discussion is
>> bounded above by a constant.
>>
>> What is being asked about is the behavior as a function of N as N
>> increases without bound. Of course, theoretically, as the number of
>> records increases without bound, eventually the character strings
>> being sorted will have to have duplicates. But long before that
>> happens the index variable N will run out of bits. This property is
>> well understood in theoretical computer science, not just in terms
>> of how much time is used but how much storage is needed. In theory
>> log N bits are needed just to hold the index pointers. It is
>> customary though, outside of purely theoretical discussions, to
>> ignore that and treat the size of an index or pointer variable as
>> constant. In purely theoretical terms no sorting algorithm is
>> O(N*log(N)), because just incrementing a pointer takes more than
>> O(1) operations. Surely the discussions in Knuth's books take such
>> things into consideration.
>
> If by "Knuth's books" you're referring to TAOCP, then he does
> not seem to give it too much attention. [...]

In most of the chapter on Sorting, TAOCP uses the number of
comparisons as the measure of cost.  But not everywhere in
the chapter.

My statement was not meant to be limited to the discussion of
Sorting.
>> On the practical side, which almost
>> always covers discussions that take place in usenet newsgroups,
>> these minor theoretical issues are ignored. Any actual computer in
>> the physical universe will never have occasion to process more than
>> 2**512 records, due to the limitation of the number of elementary
>> particles in the universe, so a 512-bit address (or index value)
>> always suffices.
>>
>> So yes, in theory, the considerations around processing an enormous
>> number of values are relevant. In the practical context of the
>> discussion underway here, they aren't.
>
> Indeed. As Rob Pike once put it, "Fancy algorithms are slow
> when $n$ is small, and $n$ is usually small. Fancy algorithms
> have big constants. Until you know that $n$ is frequently going
> to get big, don't get fancy."

Whether the Rob Pike advisory is applicable or not is beside the
point.  My comment was about fancy mathematics, not fancy
algorithms.  My statement is just as applicable to Timsort (one
of the fancier sorting algorithms) as it is to Bubble Sort.