Path: csiph.com!news.mixmin.net!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail From: Ben Bacarisse Newsgroups: comp.programming Subject: Re: Another little puzzle Date: Sat, 24 Dec 2022 22:30:55 +0000 Organization: A noiseless patient Spider Lines: 151 Message-ID: <875ye0qvs0.fsf@bsb.me.uk> References: <86pmcczcak.fsf@linuxsc.com> <87wn6krsc5.fsf@bsb.me.uk> <868riwzxu3.fsf@linuxsc.com> MIME-Version: 1.0 Content-Type: text/plain Injection-Info: reader01.eternal-september.org; posting-host="e644e9458384d7404bf4d80462c60f1f"; logging-data="2516341"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18N72u2wLRwjyXabVm3gm7PWM0e9GV9Hkg=" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) Cancel-Lock: sha1:U/7X3Gd7Z5OjthLVBo5og4WNwnA= sha1:jMPkufLBxzM3pTv9A1+RxE77moM= X-BSB-Auth: 1.beed6468d71b93d5f1c6.20221224223055GMT.875ye0qvs0.fsf@bsb.me.uk Xref: csiph.com comp.programming:16144 Mike Terry writes: > On 24/12/2022 14:21, Tim Rentsch wrote: >> Ben Bacarisse writes: >> >>> Tim Rentsch writes: >>> >>>> ram@zedat.fu-berlin.de (Stefan Ram) writes: >>>> >>>>> ram@zedat.fu-berlin.de (Stefan Ram) writes: >>>>> >>>>>> Given n times of the 24-hour day, print their average. >>>>>> For example, the average of "eight o'clock" and >>>>>> "ten o'clock" (n=2) would be "nine o'clock". >>>>>> (You can choose any representation, for example "HH:MM" >>>>>> or "seconds since midnight".) >>>>> >>>>> Thanks for all replies! >>>>> >>>>> I waited a few days before answering to allow >>>>> sufficient time to think about the problem. >>>>> >>>>> There were not enough tests written and run. As a result, >>>>> the puzzle has not yet been solved (unless I have overlooked >>>>> a contribution or misworded expectations). >>>>> >>>>> So, here are two possible test cases. >>>>> >>>>> average( 23.5, 1.5 )== 0.5 >>>>> average( 11.5, 13.5 )== 12.5 >>>>> >>>>> (I use hours as units, so "0.5" means, "half past midnight".) >>>>> >>>>> I hope that these test cases encode sensible expectations >>>>> for an average of two times on a 24-hour clock in the spirit >>>>> of the example given in the OP, which was, "the average of >>>>> eight o'clock and ten o'clock would be nine o'clock", since >>>>> these test cases just have rotated that example by 3.5 and >>>>> 15.5 hours. >>>>> >>>>> I believe that I have not seen an algorithm so far in this >>>>> thread that would pass these tests. >>>> >>>> As before, the problem is underspecified. >>> >>> Some remarks not specifically in reply to you, Tim... >> I didn't really say very much. Your comments are a lot more >> interesting. >> >>> The input is a collection, t(n), of n > 1 numbers in [0, 24). The >>> average should be a number, A, in [0, 24) that minimises >>> >>> Sum_{i=1,n} distance(A, t(i)) >>> >>> (or Sum_{i=1,n} difference(A, t(i))^2 if you prefer to think in terms >>> of variance). So far, this is just what an average is. The key point >>> is what is the distance (or difference) whose sum (or sum of squares) >>> we want to minimise? For times, I would say it is the length of the >>> shorter arc round an imaginary 24-hour clock face. >> Minimizing the square of the distance (aka length of the shorter arc) >> is one metric, and I think a reasonable one. (Minimizing the absolute >> value of the distance gives a result more like the median than the >> mean, IIANM.) > > Yes - perhaps Ben was thinking of aking > > Sum_{i=1,n} signed_distance(A, t(i)) = 0 > > rather than minimising the (absolute) distance. That would match up > with minimising the sum of squares of the distance (mean). Or he just > meant the median as you say, which is different from the mean. Both > have plausible "average" properties, Well, I was trying to be deliberately vague (hence my talking about "an average" rather than mean, median or whatever) but I was no vague enough with the formula. I wasn't thinking of a zero net distance because some metrics (though probably not applicable in this situation) don't always allow for a zero sum. I should have referred to minimising some kind distance sum, be it the absolute value of a signed sum or, as in the solution I was proposing, the sum of directional differences (i.e. full vectors). > and meet my "minimal" criteria > for what I think anyone would require of an "average" applying to a > modular arithmetic: > > - if all x_i are within 5 minutes of each other [ALLOWING FOR WRAPPING > AROUND THE END OF THE MODULO LOOP] then the "average" will be within > those same 5 minutes. NOT e.g. 12 hours or 4 hours or whatever away. And when the x_i are widely spaced the average might be considered nearly meaningless. If you have {0, 8, 16} hours then what is the "average"? Are there cases where every set of circular values has a meaningful "average"? If not, you really need more than a single number answer. (These are not questions to you, specifically, I'm interested to know of interpretations that go beyond the way I see things.) > - if all x_i are translated by a fixed offset like 1 hour, the average > will translate by the same offset, i.e. it maintains the same "modulo" > relation with the translated x_i. > > hmm, thinking now I'll add the similar: > > - if all x_i are "reflected" about a chosen time, then the new average > will be the reflected time of the original average. > > (those aren't intended to be a definition of a "modular average", just > to exclude suggestions that blatently lack the right symmetry for such > an average.) > >>> The problem has a natural interpretation in terms of angles. Whatever >>> the circular quantity is, convert the values to unit vectors round a >>> circle. For times of day, just scale [0, 24) to [0, 2*pi). The >>> average is then just the direction of the average vector, converted >>> back to a time of day. >> Averaging the "time unit vectors" is another plausible metric. > > Yes I think this one would work best in most situations, given that > the OP didn't have a precise mathematical requirement for his solution > - probably he wants something that just works reasonably as expected. > (It's my favourite answer.) > > Another way of thinking of this approach is "balancing" the 24-hour > modular circle, where we've put unit weights at each of the times x_i. > E.g. we look for a balancing line through the centre of the circle. Is that the same? For {0, 8, 16} any of the three lines through 0, 8 or 16 balances, but the vector sum interpretation is undefined. And for {0, 6, 12, 18} there are two balance lines, neither of which seems to be a natural answer for the average. (Also, every balance line gives two answers, so how do you pick?) > [Note times on the circle go from 1-24, not 1-12 like a normal clock.] > Also, thinking like this, it's equivalent to (conventional) averaging > of the sin of the angular differences from the average (balancing) > point, since the sin will give the perpendicular distance of the > weight from the balancing line. So it still fits with the general > scheme of minimising distance(x_i, v)^2, but with a slightly different > metric (distance function) in use. > > As sin(x) is close to x when x is small, the vector average solution > would match closely with the mean (by arc length) solution when the > times are close together, but diverge more noticeably when the times > are spread out... -- Ben.