From: Ingo Thies <ingo.thies@gmx.de>
Newsgroups: comp.graphics.apps.gnuplot
Subject: Re: Fitting: How does gnuplot calculate the covariance matrix?
Date: Fri, 08 Apr 2011 17:18:17 +0200
Message-ID: <908n9pF2jaU1@mid.individual.net>
References: <9088euFi3iU1@mid.individual.net>

On 08.04.2011 13:05, I wrote:

> The background: I still mistrust the error ellipses one gets from
> the eigenvalues. As already mentioned last year (around May or so), I
> suspect that the resulting error contours are underestimated since the
> best-fit chi^2 is assumed to be zero there, but it isn't.

Another point seems even more problematic: the use of the "asymptotic
standard error".

For example, one can use the 1-sigma error contour as a min-max
estimator for the error of a,b in case of a two-parameter fit. I have
done this for e.g. a linear fit, f(x)=a+b*x, and compared the results
from gnuplot with those from an independent approach. While the best-fit
values as well as the chi^2 and the a,b correlation values are the same
within the visible digits (gnuplot does not print very many, though),
the error estimate from the 1-sigma contour is typically far larger than
the asymptotic standard error, by a factor of almost 5 in the example
below.

One could argue that the outline of the error contour overestimates the
error (error ellipses are often highly elongated and narrow along their
minor axis, especially for |cor_ab| \approx 1), but in the test case
this would reduce the a,b errors only by a factor of about 1/2.

The gnuplot.pdf states that the asymptotic error underestimates the true
error, but it wasn't clear to me up to now that it could do this by such
a vast amount.

Here are sample data I have been using for the test:

#x              y             dy
0.000000        1.078039      0.100000
0.200000        1.012137      0.100000
0.400000        0.994650      0.100000
0.600000        1.210933      0.100000
0.800000        1.228788      0.100000
1.000000        1.279018      0.100000
1.200000        1.510823      0.100000
1.400000        1.466704      0.100000
1.600000        1.523061      0.100000
1.800000        1.768180      0.100000
2.000000        1.894179      0.100000
2.200000        2.006980      0.100000
2.400000        1.994247      0.100000
2.600000        2.212952      0.100000
2.800000        2.123002      0.100000
3.000000        2.341036      0.100000
3.200000        2.321149      0.100000
3.400000        2.469557      0.100000
3.600000        2.310980      0.100000
3.800000        2.451646      0.100000
4.000000        2.652370      0.100000

Gnuplot finds:

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

a               = 0.948209         +/- 0.04083      (4.306%)
b               = 0.427096         +/- 0.01746      (4.089%)


correlation matrix of the fit parameters:

               a      b
a               1.000
b              -0.855  1.000


while an independent approach finds:

a = 9.482095E-01 -/+-1.947044E-01, 1.947043E-01
b = 4.270960E-01 -/+-8.327496E-02, 8.327495E-02

chi^2_red = 0.939417
cor_ab    =-0.855398

i.e. the a error is +/- 0.195 instead of +/- 0.04, while the b error is
+/- 0.083 instead of +/- 0.017
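
For anyone who wants to check the gnuplot side of this comparison, here is a minimal sketch of the textbook weighted linear fit, applied to the data above. It rests on an assumption that I have not verified in the gnuplot source: that the "asymptotic standard error" is the square root of the diagonal of the inverse curvature matrix, multiplied by sqrt(chi^2_red). The function name fit_line and all variable names are my own.

```python
import math

# Data from the table above: x = 0.0, 0.2, ..., 4.0 and dy = 0.1 throughout.
xs = [0.2 * i for i in range(21)]
ys = [1.078039, 1.012137, 0.994650, 1.210933, 1.228788, 1.279018,
      1.510823, 1.466704, 1.523061, 1.768180, 1.894179, 2.006980,
      1.994247, 2.212952, 2.123002, 2.341036, 2.321149, 2.469557,
      2.310980, 2.451646, 2.652370]
dys = [0.1] * len(xs)


def fit_line(xs, ys, dys):
    """Weighted least-squares fit of f(x) = a + b*x via the normal equations."""
    w = [1.0 / d ** 2 for d in dys]
    s = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = s * sxx - sx ** 2
    a = (sxx * sy - sx * sxy) / det
    b = (s * sxy - sx * sy) / det
    # Covariance matrix = inverse of the curvature matrix [[s, sx], [sx, sxx]].
    var_a, var_b, cov_ab = sxx / det, s / det, -sx / det
    chi2 = sum(wi * (y - a - b * x) ** 2 for wi, x, y in zip(w, xs, ys))
    return a, b, var_a, var_b, cov_ab, chi2


a, b, var_a, var_b, cov_ab, chi2 = fit_line(xs, ys, dys)
chi2_red = chi2 / (len(xs) - 2)          # 21 points, 2 parameters
rho = cov_ab / math.sqrt(var_a * var_b)  # a,b correlation coefficient

# Assumption: reported errors = unscaled errors times sqrt(chi2_red).
err_a = math.sqrt(var_a * chi2_red)
err_b = math.sqrt(var_b * chi2_red)

print("a =", a, "+/-", err_a)
print("b =", b, "+/-", err_b)
print("chi2_red =", chi2_red, " cor_ab =", rho)
```

Under that assumption the printed errors should land close to gnuplot's 0.04083 and 0.01746, so the factor of almost 5 would come from how the contour level is chosen in the other approach, not from a different covariance matrix.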

If the error of each parameter is evaluated only with the other
parameter held at its best-fit value, the error bars are about +/- 0.1
and +/- 0.05, respectively.
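
The "factor of about 1/2" can be made explicit with a small sketch, assuming the contour is an exact ellipse (i.e. chi^2 is quadratic in a and b). In that case the half-width of the ellipse along one parameter, with the other held at its best-fit value, is the full projected half-width times sqrt(1 - cor_ab^2). The function name conditional_factor is my own.

```python
import math


def conditional_factor(rho):
    """Ratio of the half-width at fixed partner parameter to the full
    projected half-width, for an elliptical contour with correlation rho."""
    return math.sqrt(1.0 - rho ** 2)


rho = -0.855398                  # cor_ab from the fit above
factor = conditional_factor(rho)

# Full contour extents quoted above for a and b.
full = {"a": 0.1947, "b": 0.08327}
reduced = {name: err * factor for name, err in full.items()}

print("factor =", factor)
for name in ("a", "b"):
    print(name, "full:", full[name], "other parameter fixed:", reduced[name])
```

With cor_ab = -0.855 the factor comes out slightly above 1/2, which gives roughly 0.10 for a and 0.04 for b, in line with the rough numbers quoted above.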

I did not do a large study of fitting functions, but the given results
suggest, in my opinion, that the asymptotic standard error should not be
used for an error discussion at all.

Instead, I would recommend calculating the error ellipse from the
(eigenvalues of the) covariance matrix, corrected for non-zero minimum
chi^2, and using it as an (additional) error estimate.
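
As a rough sketch of what I mean by the ellipse calculation: for two parameters the eigenvalues of the 2x2 covariance matrix can be written in closed form, and the semi-axes of the contour follow from them. The function name ellipse_from_covariance and the parameter delta_chi2 are my own. Which contour level to use is exactly the open question here, so delta_chi2 is left as an input; 1.0 reproduces the single-parameter errors as projections, and 2.30 is the usual value for a joint 68.3% region in two parameters.

```python
import math


def ellipse_from_covariance(var_a, var_b, cov_ab, delta_chi2=1.0):
    """Semi-axes and orientation of the contour chi^2 = chi^2_min + delta_chi2
    for a two-parameter fit with the given covariance matrix."""
    mean = 0.5 * (var_a + var_b)
    root = math.hypot(0.5 * (var_a - var_b), cov_ab)
    lam_major = mean + root
    lam_minor = max(mean - root, 0.0)   # guard against rounding below zero
    # Angle of the major axis, measured from the a-axis.
    angle = 0.5 * math.atan2(2.0 * cov_ab, var_a - var_b)
    scale = math.sqrt(delta_chi2)
    return scale * math.sqrt(lam_major), scale * math.sqrt(lam_minor), angle


# Inputs taken from the gnuplot output above.
sig_a, sig_b, rho = 0.04083, 0.01746, -0.855
major, minor, angle = ellipse_from_covariance(
    sig_a ** 2, sig_b ** 2, rho * sig_a * sig_b)

# Projections of the ellipse onto the a- and b-axes (its bounding box).
extent_a = math.hypot(major * math.cos(angle), minor * math.sin(angle))
extent_b = math.hypot(major * math.sin(angle), minor * math.cos(angle))

print("semi-axes:", major, minor, " angle [rad]:", angle)
print("extent in a:", extent_a, " extent in b:", extent_b)
```

For delta_chi2 = 1 the two extents are just sig_a and sig_b again; a larger contour level scales the whole ellipse by sqrt(delta_chi2).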

Any opinions?

-- 
Regards,
      Ingo