Path: csiph.com!eternal-september.org!feeder.eternal-september.org!nntp.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: sorting Was: Isn't that beauty ? (no it's not)
Date: Sun, 12 Apr 2026 07:13:41 -0700
Organization: A noiseless patient Spider
Lines: 77
Message-ID: <86fr508dwq.fsf@linuxsc.com>
References: <10otm7r$1ntrg$1@raubtier-asyl.eternal-september.org> <10pi29u$15114$1@dont-email.me> <10pih35$17mc2$5@dont-email.me> <20260320125824.00002759@yahoo.com> <10pjdll$1h0eh$6@dont-email.me> <10pjedq$1if62$1@dont-email.me> <20260320150840.000045db@yahoo.com> <10pjj56$1k50i$1@dont-email.me> <20260322020323.00006a16@yahoo.com> <864ilnzqib.fsf@linuxsc.com> <20260407022213.00001ae7@yahoo.com> <86mrzfxvv5.fsf@linuxsc.com> <10r35bp$29m9s$1@paganini.bofh.team> <863416xid5.fsf@linuxsc.com> <10r9d06$32585$1@paganini.bofh.team>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Sun, 12 Apr 2026 14:13:42 +0000 (UTC)
Injection-Info: dont-email.me; posting-host="425f3de481b107cac4c4e410f9a2c2b8"; logging-data="2689433"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19pbIpyY2YzcuSNuiciECG2EABNapktDx0="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:XXYuinahkfh8+GxrJJlD4hc2uIQ= sha1:VlGV+3dSA6H525BK6IGMkiW3FZY=
Xref: csiph.com comp.lang.c:397505
antispam@fricas.org (Waldek Hebisch) writes:
> Tim Rentsch wrote:
>
>> antispam@fricas.org (Waldek Hebisch) writes:
>>
>>> Tim Rentsch wrote:
>>>
>>>> Michael S writes:
>>>>
>>>>> On Mon, 06 Apr 2026 15:13:32 -0700
>>>>> Tim Rentsch wrote:
>>>>>
>>>>>> Obviously what words (or lines) appear can affect the character
>>>>>> counts, but that still doesn't change BigO. By the way you don't
>>>>>> say whether you are sorting words or lines.
>>>>>
>>>>> This sub-thread is about sorting lines with average length of few
>>>>> dozens characters, i.e. many times longer than log2(N). That was
>>>>> stated at one of earlier posts.
>>>>
>>>> That has nothing to do with BigO, which is about asymptotic
>>>> behavior as N goes to infinity.
>>>
>>> Honest Big(O) varies length of the key with N. In practical range
>>> key length may be constant, but fixing length gives unrealistic
>>> problem for Big(O) analysis: without varying key length there are
>>> finitely many keys and sorting is equivalent to counting how many
>>> times each key appears in the input.
>>
>> There's an important clarification to make here. There are two
>> independent parameters: N, the number of records to be sorted (a
>> record being a character string that is either a word or a line),
>> and the (maximum) length of any record, which in the discussion is
>> bounded above by a constant.
>
> There is one choice, made often to simplify problem. But there
> is quite universal choice: N is total size of input data.
No, N is usually the number of records to be sorted, not the size of
the input. Also, in the discussion I was responding to, there were
clearly two distinct axes relevant to the discussion.
>> What is being asked about is the behavior as a function of N as N
>> increases without bound. Of course, theoretically, as the number of
>> records increases without bound, eventually the character strings
>> being sorted will have to have duplicates. But long before that
>> happens the index variable N will run out of bits. This property is
>> well understood in theoretical computer science, not just in terms
>> of how much time is used but how much storage is needed. In theory
>> log N bits are needed just to hold the index pointers. It is
>> customary though, outside of purely theoretical discussions, to
>> ignore that and treat the size of an index or pointer variable as
>> constant. In purely theoretical terms no sorting algorithm is
>> O(N*log(N)), because just incrementing a pointer takes more than
>> O(1) operations. Surely the discussions in Knuth's books take such
>> things into consideration. On the practical side, which almost
>> always covers discussions that take place in usenet newsgroups,
>> these minor theoretical issues are ignored. Any actul computer in
>> the physical universe will never have occasion to process more than
>> 2**512 records, due to the limitation of the number of elementary
>> particles in the universe, so a 512-bit address (or index value)
>> always suffices.
>>
>> So yes, in theory, the considerations around processing an enormous
>> number of values are relevant. In the practical context of the
>> discussion underway here, they aren't.
>
> It was you who insisted on asymptitic complexity, rejecting
> practical amendments...
This statement isn't right. BigO was already part of the discussion
when I joined in. Also, it is customary in discussion of sorting
algorithms to use the metric of number of comparisons done, without
regard to the size of the variables needed to hold the indices of
the records being sorted. See Knuth chapter 5, on Sorting, in
volume 3 of TAOCP.