Path: csiph.com!news.mixmin.net!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch Newsgroups: comp.programming Subject: Re: Another little puzzle Date: Sun, 25 Dec 2022 01:05:06 -0800 Organization: A noiseless patient Spider Lines: 144 Message-ID: <86mt7byhtp.fsf@linuxsc.com> References: <86pmcczcak.fsf@linuxsc.com> <87wn6krsc5.fsf@bsb.me.uk> <868riwzxu3.fsf@linuxsc.com> <87y1qwpb5p.fsf@bsb.me.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: reader01.eternal-september.org; posting-host="bacb00f09a0305cdb2c72c0e332a04cb"; logging-data="2756965"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18djjpYAnJxjlAvVq3a7YNL9mym0R9HbUc=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:5QvaTxhIfPI0IdxUZHxv0k55fTI= sha1:5QLJS6aHQ1v+r2DmcBoPwqBSdGE= Xref: csiph.com comp.programming:16152 Ben Bacarisse writes: > Tim Rentsch writes: > >> Ben Bacarisse writes: >> >>> Tim Rentsch writes: >>> >>>> ram@zedat.fu-berlin.de (Stefan Ram) writes: >>>> >>>>> ram@zedat.fu-berlin.de (Stefan Ram) writes: >>>>> >>>>>> Given n times of the 24-hour day, print their average. >>>>>> For example, the average of "eight o'clock" and >>>>>> "ten o'clock" (n=2) would be "nine o'clock". >>>>>> (You can choose any representation, for example "HH:MM" >>>>>> or "seconds since midnight".) >>>>> >>>>> Thanks for all replies! >>>>> >>>>> I waited a few days before answering to allow >>>>> sufficient time to think about the problem. >>>>> >>>>> There were not enough tests written and run. As a result, >>>>> the puzzle has not yet been solved (unless I have overlooked >>>>> a contribution or misworded expectations). >>>>> >>>>> So, here are two possible test cases. >>>>> >>>>> average( 23.5, 1.5 )== 0.5 >>>>> average( 11.5, 13.5 )== 12.5 >>>>> >>>>> (I use hours as units, so "0.5" means, "half past midnight".) >>>>> >>>>> I hope that these test cases encode sensible expectations >>>>> for an average of two times on a 24-hour clock in the spirit >>>>> of the example given in the OP, which was, "the average of >>>>> eight o'clock and ten o'clock would be nine o'clock", since >>>>> these test cases just have rotated that example by 3.5 and >>>>> 15.5 hours. >>>>> >>>>> I believe that I have not seen an algorithm so far in this >>>>> thread that would pass these tests. >>>> >>>> As before, the problem is underspecified. >>> >>> Some remarks not specifically in reply to you, Tim... >> >> I didn't really say very much. Your comments are a lot more >> interesting. >> >>> The input is a collection, t(n), of n > 1 numbers in [0, 24). The >>> average should be a number, A, in [0, 24) that minimises >>> >>> Sum_{i=1,n} distance(A, t(i)) >>> >>> (or Sum_{i=1,n} difference(A, t(i))^2 if you prefer to think in terms >>> of variance). So far, this is just what an average is. The key point >>> is what is the distance (or difference) whose sum (or sum of squares) >>> we want to minimise? For times, I would say it is the length of the >>> shorter arc round an imaginary 24-hour clock face. >> >> Minimizing the square of the distance (aka length of the shorter arc) >> is one metric, and I think a reasonable one. (Minimizing the absolute >> value of the distance gives a result more like the median than the >> mean, IIANM.) >> >>> The problem has a natural interpretation in terms of angles. Whatever >>> the circular quantity is, convert the values to unit vectors round a >>> circle. For times of day, just scale [0, 24) to [0, 2*pi). The >>> average is then just the direction of the average vector, converted >>> back to a time of day. >> >> Averaging the "time unit vectors" is another plausible metric. >> >>> Sometimes that vector has zero length, and the average is undefined, >>> but otherwise the length of the vector gives an indication of the >>> variability of the data. >>> >>> Why do I consider this a reasonable interpretation of the problem? >>> Well, given a list of times of day when a train is observed to pass >>> some station, the circular 24-hour-time average should be our best >>> estimate of the scheduled time. >>> >>> Obviously there are other possible readings of the problem, but I was >>> not able to justify any of them as useful for any real-world >>> applications. This is a case where I hope I am wrong and there /are/ >>> other circular averages with practical interpretations. >> >> A nice analysis. > > A generous appraisal, thanks. I was trying to be vague enough not to > put my foot in it, by was too specific with the formula. > >> Some observations (all of which I believe are correct, but no >> guarantees are offered). >> >> The "vector average" metric is easier to calculate and gives an >> explicit indication when the data produces a degenerate result. >> >> Considering only cases that do not produce a degenerate result: >> >> (1) the "conventional average" metric and the "vector average" >> metric can produce different results when there are more than two >> data points. (For one or two data points I think they are always >> the same.) >> >> (2) the "vector average" metric can be calculated in O(n). The >> "conventional average" metric can be calculated in O(n*n). > > By which I presume you mean and /not/ in O(n). (Big O is just an > upper bound.) What I mean is I have an algorithm that runs in time proportional to n*n. I suspect that there is an algorithm with better big-O behavior (probably O( n * log n )), but I don't actually have one to show, so I simply gave the best big-O result that I knew. >> (3) when the times are all reasonably close together, the >> "conventional average" metric seems more likely to produce a result >> corresponding to what most people expect. Another way to say this >> is that the "vector average" metric will sometimes surprise people. > > I'm sorry to be obtuse, but what is the "conventional average"? The > name makes it sound trivial, but the quadratic time makes me certain > that is isn't. Sorry, I meant to refer to your formulation of average A that minimizes { Sum_{i=1,n} difference(A, t(i))^2 } where 'difference' means the shorter arc length. This formula matches the result for 'mean' on real numbers. > My "conventional average" algorithm (which is not well thought out) was > to (a) rotate the data set to avoid the 23/0 boundary (not always > possible), (b) take the arithmetic mean, and then (c) rotate the result > back. E.g. [23,0,1] -> [0,1,2] by adding one, and the average is > mean[0,1,2] - 1 = 0. Yes, if you know where to split the cycle then the answer can be found in O(n) time. But how can we figure out where to split the cycle? (Incidentally I got that part wrong the first way I tried to do it.)