Path: csiph.com!eternal-september.org!feeder.eternal-september.org!nntp.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: sorting Was: Isn't that beauty ? (no it's not)
Date: Sat, 25 Apr 2026 15:47:28 -0700
Organization: A noiseless patient Spider
Lines: 107
Message-ID: <86fr4i3ben.fsf@linuxsc.com>
References: <10otm7r$1ntrg$1@raubtier-asyl.eternal-september.org> <863416xid5.fsf@linuxsc.com> <10r9d06$32585$1@paganini.bofh.team> <86fr508dwq.fsf@linuxsc.com> <10rjkin$otp$1@reader1.panix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Sat, 25 Apr 2026 22:47:29 +0000 (UTC)
Injection-Info: dont-email.me; posting-host="842c0705c388246dee2d3602366a27fb"; logging-data="1270410"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX197stWdhscsiKRVOhOm3vktVrZ6jCn4xo8="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:NAmP2m1fJxXbqwRywYXt+GtrjPo= sha1:k1O9qu+6rV1Vw4gN19UnjWfQAQ0=
Xref: csiph.com comp.lang.c:397953

cross@spitfire.i.gajendra.net (Dan Cross) writes:

> In article <86fr508dwq.fsf@linuxsc.com>,
> Tim Rentsch wrote:
>
>> antispam@fricas.org (Waldek Hebisch) writes:
>>
>>> Tim Rentsch wrote:
>>>
>>>> antispam@fricas.org (Waldek Hebisch) writes:
>>>>
>>>>> Tim Rentsch wrote:
>>>>>
>>>>>> Michael S writes:
>>>>>>
>>>>>>> On Mon, 06 Apr 2026 15:13:32 -0700
>>>>>>> Tim Rentsch wrote:
>>>>>>>
>>>>>>>> Obviously what words (or lines) appear can affect the character
>>>>>>>> counts, but that still doesn't change BigO. By the way you don't
>>>>>>>> say whether you are sorting words or lines.
>>>>>>>
>>>>>>> This sub-thread is about sorting lines with average length of few
>>>>>>> dozens characters, i.e. many times longer than log2(N). That was
>>>>>>> stated at one of earlier posts.
>>>>>>
>>>>>> That has nothing to do with BigO, which is about asymptotic
>>>>>> behavior as N goes to infinity.
>>>>>
>>>>> Honest Big(O) varies length of the key with N. In practical range
>>>>> key length may be constant, but fixing length gives unrealistic
>>>>> problem for Big(O) analysis: without varying key length there are
>>>>> finitely many keys and sorting is equivalent to counting how many
>>>>> times each key appears in the input.
>>>>
>>>> There's an important clarification to make here. There are two
>>>> independent parameters: N, the number of records to be sorted (a
>>>> record being a character string that is either a word or a line),
>>>> and the (maximum) length of any record, which in the discussion is
>>>> bounded above by a constant.
>>>
>>> There is one choice, made often to simplify problem. But there
>>> is quite universal choice: N is total size of input data.
>>
>> No, N is usually the number of records to be sorted, not the size of
>> the input. Also, in the discussion I was responding to, there were
>> clearly two distinct axes relevant to the discussion.
>>
>>>> What is being asked about is the behavior as a function of N as N
>>>> increases without bound. Of course, theoretically, as the number of
>>>> records increases without bound, eventually the character strings
>>>> being sorted will have to have duplicates. But long before that
>>>> happens the index variable N will run out of bits. This property is
>>>> well understood in theoretical computer science, not just in terms
>>>> of how much time is used but how much storage is needed. In theory
>>>> log N bits are needed just to hold the index pointers. It is
>>>> customary though, outside of purely theoretical discussions, to
>>>> ignore that and treat the size of an index or pointer variable as
>>>> constant. In purely theoretical terms no sorting algorithm is
>>>> O(N*log(N)), because just incrementing a pointer takes more than
>>>> O(1) operations. Surely the discussions in Knuth's books take such
>>>> things into consideration. On the practical side, which almost
>>>> always covers discussions that take place in usenet newsgroups,
>>>> these minor theoretical issues are ignored. Any actual computer in
>>>> the physical universe will never have occasion to process more than
>>>> 2**512 records, due to the limitation of the number of elementary
>>>> particles in the universe, so a 512-bit address (or index value)
>>>> always suffices.
>>>>
>>>> So yes, in theory, the considerations around processing an enormous
>>>> number of values are relevant. In the practical context of the
>>>> discussion underway here, they aren't.
>>>
>>> It was you who insisted on asymptotic complexity, rejecting
>>> practical amendments...
>>
>> This statement isn't right. BigO was already part of the discussion
>> when I joined in. Also, it is customary in discussion of sorting
>> algorithms to use the metric of number of comparisons done, without
>> regard to the size of the variables needed to hold the indices of
>> the records being sorted.
>
> Hmm, this response echoes almost exactly what I wrote in
> <10ranaa$ihf$1@reader1.panix.com> on April 10th, also in
> response to Waldek. Did you see it?

Probably, but I don't remember it specifically.

>> See Knuth chapter 5, on Sorting, in volume 3 of TAOCP.
>
> As for TAOCP, as much as I respect and admire Knuth, I don't
> think "The Art of Computer Programming" is a good reference for
> working programmers. At 391 pages long, Chapter Five occupies
> more than half of an ~750 page book, the examples are all in
> assembly language for his notional MIX computer, and the
> analysis he presents presumes a mathematics background that,
> frankly, most programmers neither have nor need.

I am not recommending TAOCP as a textbook. It's a well-known work,
respected for its theoretical treatments. Also it happens to be what
I consulted before posting, mainly because it was handy.

> "See chapter 5" is thus not useful as a reference, and rather
> comes across as an admonishment.

No admonishment was intended. I mentioned it only as a supporting
work for my previous statement.