Path: csiph.com!news.mixmin.net!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.programming
Subject: Re: Another little puzzle
Date: Sat, 24 Dec 2022 06:21:40 -0800
Organization: A noiseless patient Spider
Lines: 112
Message-ID: <868riwzxu3.fsf@linuxsc.com>
References: <86pmcczcak.fsf@linuxsc.com> <87wn6krsc5.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: reader01.eternal-september.org; posting-host="4fac2999906db85e6e2faf120b1d87d4"; logging-data="2421808"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19DN8yZtSnUCcUP5S7efl4dsN3EgmViqlI="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:CgFytGo5aZI0OfhFxjx1rhT0NkY= sha1:eh+jLu61AwtljfsO/cXw1T80gKs=
Xref: csiph.com comp.programming:16134
Ben Bacarisse writes:
> Tim Rentsch writes:
>
>> ram@zedat.fu-berlin.de (Stefan Ram) writes:
>>
>>> ram@zedat.fu-berlin.de (Stefan Ram) writes:
>>>
>>>> Given n times of the 24-hour day, print their average.
>>>> For example, the average of "eight o'clock" and
>>>> "ten o'clock" (n=2) would be "nine o'clock".
>>>> (You can choose any representation, for example "HH:MM"
>>>> or "seconds since midnight".)
>>>
>>> Thanks for all replies!
>>>
>>> I waited a few days before answering to allow
>>> sufficient time to think about the problem.
>>>
>>> There were not enough tests written and run. As a result,
>>> the puzzle has not yet been solved (unless I have overlooked
>>> a contribution or misworded expectations).
>>>
>>> So, here are two possible test cases.
>>>
>>> average( 23.5, 1.5 )== 0.5
>>> average( 11.5, 13.5 )== 12.5
>>>
>>> (I use hours as units, so "0.5" means, "half past midnight".)
>>>
>>> I hope that these test cases encode sensible expectations
>>> for an average of two times on a 24-hour clock in the spirit
>>> of the example given in the OP, which was, "the average of
>>> eight o'clock and ten o'clock would be nine o'clock", since
>>> these test cases just have rotated that example by 3.5 and
>>> 15.5 hours.
>>>
>>> I believe that I have not seen an algorithm so far in this
>>> thread that would pass these tests.
>>
>> As before, the problem is underspecified.
>
> Some remarks not specifically in reply to you, Tim...
I didn't really say very much. Your comments are a lot more
interesting.
> The input is a collection, t(n), of n > 1 numbers in [0, 24). The
> average should be a number, A, in [0, 24) that minimises
>
> Sum_{i=1,n} distance(A, t(i))
>
> (or Sum_{i=1,n} difference(A, t(i))^2 if you prefer to think in terms
> of variance). So far, this is just what an average is. The key point
> is what is the distance (or difference) whose sum (or sum of squares)
> we want to minimise? For times, I would say it is the length of the
> shorter arc round an imaginary 24-hour clock face.
Minimizing the square of the distance (aka length of the shorter arc)
is one metric, and I think a reasonable one. (Minimizing the absolute
value of the distance gives a result more like the median than the
mean, IIANM.)
> The problem has a natural interpretation in terms of angles. Whatever
> the circular quantity is, convert the values to unit vectors round a
> circle. For times of day, just scale [0, 24) to [0, 2*pi). The
> average is then just the direction of the average vector, converted
> back to a time of day.
Averaging the "time unit vectors" is another plausible metric.
> Sometimes that vector has zero length, and the average is undefined,
> but otherwise the length of the vector gives an indication of the
> variability of the data.
>
> Why do I consider this a reasonable interpretation of the problem?
> Well, given a list of times of day when a train is observed to pass
> some station, the circular 24-hour-time average should be our best
> estimate of the scheduled time.
>
> Obviously there are other possible readings of the problem, but I was
> not able to justify any of them as useful for any real-world
> applications. This is a case where I hope I am wrong and there /are/
> other circular averages with practical interpretations.
A nice analysis.
Some observations (all of which I believe are correct, but no
guarantees are offered).
The "vector average" metric is easier to calculate and gives an
explicit indication when the data produces a degenerate result.
Considering only cases that do not produce a degenerate result:
(1) the "conventional average" metric and the "vector average"
metric can produce different results when there are more than two
data points. (For one or two data points I think they are always
the same.)
(2) the "vector average" metric can be calculated in O(n). The
"conventional average" metric can be calculated in O(n*n).
(3) when the times are all reasonably close together, the
"conventional average" metric seems more likely to produce a result
corresponding to what most people expect. Another way to say this
is that the "vector average" metric will sometimes surprise people.
Incidentally, I had implemented a method that is isomorphic to
the "vector average" metric (at least I think it is), and I was
surprised to discover that it gave different results than the
"conventional average" metric in some cases.