Path: csiph.com!news.mixmin.net!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch Newsgroups: comp.programming Subject: Re: Another little puzzle Date: Sat, 24 Dec 2022 06:21:40 -0800 Organization: A noiseless patient Spider Lines: 112 Message-ID: <868riwzxu3.fsf@linuxsc.com> References: <86pmcczcak.fsf@linuxsc.com> <87wn6krsc5.fsf@bsb.me.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: reader01.eternal-september.org; posting-host="4fac2999906db85e6e2faf120b1d87d4"; logging-data="2421808"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19DN8yZtSnUCcUP5S7efl4dsN3EgmViqlI=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:CgFytGo5aZI0OfhFxjx1rhT0NkY= sha1:eh+jLu61AwtljfsO/cXw1T80gKs= Xref: csiph.com comp.programming:16134 Ben Bacarisse writes: > Tim Rentsch writes: > >> ram@zedat.fu-berlin.de (Stefan Ram) writes: >> >>> ram@zedat.fu-berlin.de (Stefan Ram) writes: >>> >>>> Given n times of the 24-hour day, print their average. >>>> For example, the average of "eight o'clock" and >>>> "ten o'clock" (n=2) would be "nine o'clock". >>>> (You can choose any representation, for example "HH:MM" >>>> or "seconds since midnight".) >>> >>> Thanks for all replies! >>> >>> I waited a few days before answering to allow >>> sufficient time to think about the problem. >>> >>> There were not enough tests written and run. As a result, >>> the puzzle has not yet been solved (unless I have overlooked >>> a contribution or misworded expectations). >>> >>> So, here are two possible test cases. >>> >>> average( 23.5, 1.5 )== 0.5 >>> average( 11.5, 13.5 )== 12.5 >>> >>> (I use hours as units, so "0.5" means, "half past midnight".) >>> >>> I hope that these test cases encode sensible expectations >>> for an average of two times on a 24-hour clock in the spirit >>> of the example given in the OP, which was, "the average of >>> eight o'clock and ten o'clock would be nine o'clock", since >>> these test cases just have rotated that example by 3.5 and >>> 15.5 hours. >>> >>> I believe that I have not seen an algorithm so far in this >>> thread that would pass these tests. >> >> As before, the problem is underspecified. > > Some remarks not specifically in reply to you, Tim... I didn't really say very much. Your comments are a lot more interesting. > The input is a collection, t(n), of n > 1 numbers in [0, 24). The > average should be a number, A, in [0, 24) that minimises > > Sum_{i=1,n} distance(A, t(i)) > > (or Sum_{i=1,n} difference(A, t(i))^2 if you prefer to think in terms > of variance). So far, this is just what an average is. The key point > is what is the distance (or difference) whose sum (or sum of squares) > we want to minimise? For times, I would say it is the length of the > shorter arc round an imaginary 24-hour clock face. Minimizing the square of the distance (aka length of the shorter arc) is one metric, and I think a reasonable one. (Minimizing the absolute value of the distance gives a result more like the median than the mean, IIANM.) > The problem has a natural interpretation in terms of angles. Whatever > the circular quantity is, convert the values to unit vectors round a > circle. For times of day, just scale [0, 24) to [0, 2*pi). The > average is then just the direction of the average vector, converted > back to a time of day. Averaging the "time unit vectors" is another plausible metric. > Sometimes that vector has zero length, and the average is undefined, > but otherwise the length of the vector gives an indication of the > variability of the data. > > Why do I consider this a reasonable interpretation of the problem? > Well, given a list of times of day when a train is observed to pass > some station, the circular 24-hour-time average should be our best > estimate of the scheduled time. > > Obviously there are other possible readings of the problem, but I was > not able to justify any of them as useful for any real-world > applications. This is a case where I hope I am wrong and there /are/ > other circular averages with practical interpretations. A nice analysis. Some observations (all of which I believe are correct, but no guarantees are offered). The "vector average" metric is easier to calculate and gives an explicit indication when the data produces a degenerate result. Considering only cases that do not produce a degenerate result: (1) the "conventional average" metric and the "vector average" metric can produce different results when there are more than two data points. (For one or two data points I think they are always the same.) (2) the "vector average" metric can be calculated in O(n). The "conventional average" metric can be calculated in O(n*n). (3) when the times are all reasonably close together, the "conventional average" metric seems more likely to produce a result corresponding to what most people expect. Another way to say this is that the "vector average" metric will sometimes surprise people. Incidentally, I had implemented a method that is isomorphic to the "vector average" metric (at least I think it is), and I was surprised to discover that it gave different results than the "conventional average" metric in some cases.