Path: csiph.com!news.mixmin.net!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail From: Ben Bacarisse Newsgroups: comp.programming Subject: Re: Another little puzzle Date: Sun, 01 Jan 2023 01:10:57 +0000 Organization: A noiseless patient Spider Lines: 111 Message-ID: <87zgb3giu6.fsf@bsb.me.uk> References: <87tu1diu2s.fsf@bsb.me.uk> <864jtdtkt5.fsf@linuxsc.com> <87o7rlhtsv.fsf@bsb.me.uk> <878rioifnh.fsf@bsb.me.uk> <868rinskhk.fsf@linuxsc.com> MIME-Version: 1.0 Content-Type: text/plain Injection-Info: reader01.eternal-september.org; posting-host="f378404b147b12236a02e89857bb1e08"; logging-data="1238596"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19PwnRFEtm3txKsOUhscZCy4xcMPcdswbc=" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) Cancel-Lock: sha1:gGihyavq2wincnXU7lLqEZmQeV0= sha1:jgXMQneFYKU/JNV2TIdjwZAmML0= X-BSB-Auth: 1.7df55a1eb81ee03f2564.20230101011057GMT.87zgb3giu6.fsf@bsb.me.uk Xref: csiph.com comp.programming:16204 Tim Rentsch writes: > Ben Bacarisse writes: > >> Ben Bacarisse writes: >> >>> Tim Rentsch writes: >> >> A couple of further thoughts only one of which is directed at you >> Tim (at the end)... >> >> Given the general conception of a mean (rather than any other kind of >> summary statistic) as minimising the sum of squares of some "distance" >> metric: >> >> Sum_{i=1,n} difference(A, t(i))^2 >> >> we can characterise the two contenders by what distance is being >> minimised. For the less well discussed "conventional average" (to use >> your terminology) we are minimising the sum of squares of the shorter >> arc lengths between A and the t(i). >> >> For the "vector average", we convert the t(i) to unit vectors u(i) and >> we calculate the mean if the u(i) to get a vector m. The "average", A, >> is just the direction of this vector -- another point on the unit >> circle. In this case we are minimising the sum of squares of the >> /chord/ lengths between A and the t(i). > > I think of this approach differently. I take the time values > t(i) as being unit masses on the unit circle, and calculate the > center of mass. As long as the center of mass is not the origin > we can project it from the origin to find a corresponding time > value on the unit circle (which in my case is done implicitly by > using atan2()). It's the calculation though. >> The mean vector itself (which may not lie on the unit circle) minimises >> the sum of the squares of the length of the vector differences m-u(i): >> >> Sum_{i=1,n} |m - u(i)|^2 > > That's true but I thought of it just as the average of the time > value "masses" on the unit circle. > >> and any other vector along the same line as m (i.e. c*m for real c) >> minimises >> >> Sum_{i=1,n} |c*m - u(i)|^2 > > I'm not sure what you're saying here. Only the one point (m in > your terminology) minimizes the sum of squares of distances. How > do other points on the same line qualify? I over-simplified by omitting "compared to any other point on that circle". The mean vector is the absolute minimum, but at any radius |c*m| the sum above is at a minmum at c*m. >> This includes, of course, m projected out to the unit circle. >> >> This distinction between arc lengths and chord lengths helps to >> visualise where these averages differ, and why the conventional >> average may seem more intuitive. > > Interesting perspective. I wouldn't call them chord lengths > because I think of a chord as being between two points both on > the same circle, m (the vector) is the centre of mass, but for some c, c*m lies on the circle. All the lines from the data points to c*m are actual chords. The average "time", A, is the point on the circle that minimises the sum of squares of the cords between A the t(i). The "other" average, minimises the sum of the of the angular distances. > and the center of mass is never on the unit > circle (not counting the case when all the time values are the > same). Even so it's an interesting way to view the distinction. It's more interesting because it really is about chords! >> Incidentally, I found another book on statistics on spheres, and that >> gets the average over in a few paragraphs. It states, without >> considering any alternatives, that the average is the vector average. I >> can't find anyone using or citing the arc-length minimising average, >> despite it being natural on a sphere to find the mid-point of, say, a >> set of points on the Earth's surface. > > Do you mind my asking, which book was that? http://library.sadjad.ac.ir/opac//temp/19108.pdf > Now that I think about it, finding the point that minimizes the > great circle distances squared would be at least computationally > unpleasant. But it could be relevant in some contexts (thinking > in particular of astronautics, or, dare I say it, rocket science). > >> Tim, my best shot at calculating this other average sorts the points to >> find the widest gap. I suspect your algorithm is similar since you say >> it is O(n log n). > > Your instinct that I sorted the time values is right. If you > think a little more I think you will see that the widest gap need > not have any particular relationship to where the cut point is. > No further elaboration for now so as not to be a spoiler (and I > know you like to try figuring things out yourself). You must be confusing me with someone else! -- Ben.