Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.graphics.apps.gnuplot > #223 > unrolled thread
| Started by | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| First post | 2011-04-08 13:05 +0200 |
| Last post | 2011-04-11 22:13 +0200 |
| Articles | 14 — 3 participants |
Back to article view | Back to comp.graphics.apps.gnuplot
Fitting: How does gnuplot calculate the covariance matrix? Ingo Thies <ingo.thies@gmx.de> - 2011-04-08 13:05 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Ingo Thies <ingo.thies@gmx.de> - 2011-04-08 17:18 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Hans-Bernhard Bröker <HBBroeker@t-online.de> - 2011-04-10 19:31 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Charles Allen <ca137tmp@earthlink.net> - 2011-04-10 20:14 -0500
Re: Fitting: How does gnuplot calculate the covariance matrix? Ingo Thies <ingo.thies@gmx.de> - 2011-04-14 15:41 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Ingo Thies <ingo.thies@gmx.de> - 2011-04-11 21:42 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Hans-Bernhard Bröker <HBBroeker@t-online.de> - 2011-04-11 23:27 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Ingo Thies <ingo.thies@gmx.de> - 2011-04-12 11:28 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Hans-Bernhard Bröker <HBBroeker@t-online.de> - 2011-04-16 00:03 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Ingo Thies <ingo.thies@gmx.de> - 2011-04-16 18:06 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Ingo Thies <ingo.thies@gmx.de> - 2011-04-14 15:07 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Ingo Thies <ingo.thies@gmx.de> - 2011-04-15 13:47 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Hans-Bernhard Bröker <HBBroeker@t-online.de> - 2011-04-10 19:04 +0200
Re: Fitting: How does gnuplot calculate the covariance matrix? Ingo Thies <ingo.thies@gmx.de> - 2011-04-11 22:13 +0200
| From | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| Date | 2011-04-08 13:05 +0200 |
| Subject | Fitting: How does gnuplot calculate the covariance matrix? |
| Message-ID | <9088euFi3iU1@mid.individual.net> |
Hi all,
I am trying to find out how exactly gnuplot calculates the covariance
matrix for a fitted function.
The background: I am still mistrusting the error ellipses one gets from
the eigenvalues. As already mentioned last year (aroung May or so), I
suspect that the resulting error contours are underestimated since the
best-fit chi^2 is assumed to be zero there, but it isn't.
I have compared the error ellipses with the true error contours from an
independent chi^2-2D-mapping routine, and found that the latter may be
considerably larger than the ones one would get from gnuplot's cov matrix.
However, basic information about covariance (e.g. from Wikipedia) are
misleading here, since they only deal with the covariance of measured
data themselves, not the errors. Also I didn't manage to get a clear
image from the sourcecode yet.
For the long term, it would be helpful to include the basic formulas and
the most important caveats and pitfalls (e.g. how may I trust the
results or not) into the gnuplot.pdf.
In our institute, there are people working with gnuplot and its fitting
algorihm, but lack of important information about how they are working
in detail.
Many thanks for any helpful reply!
--
Gruß,
Ingo
[toc] | [next] | [standalone]
| From | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| Date | 2011-04-08 17:18 +0200 |
| Message-ID | <908n9pF2jaU1@mid.individual.net> |
| In reply to | #223 |
On 08.04.2011 13:05, I wrote:
> The background: I am still mistrusting the error ellipses one gets from
> the eigenvalues. As already mentioned last year (aroung May or so), I
> suspect that the resulting error contours are underestimated since the
> best-fit chi^2 is assumed to be zero there, but it isn't.
Another point seems to be even more problematic, that is the usage of
the "asymptotic standard error".
For example, one can use the 1-sigma error contour as a min-max
estimator for the error of a,b in case of a two-parameter fit. I have
done this for e.g. a linear fit, f(x)=a+b*x, and compared the results
from gnuplot with those from an independent approach. While the best-fit
values as well as the chi^2 and the a,b correlation values are the same
within the visibile digits (these aren't very much in gnuplot, though),
the error estimate from the 1-sig contour is typically by an order of
magnitude larger than the asymptotic standard error.
One could argue, the outline of the error contour overestimates the
error (since error ellipses are often highly elongated but slim along
their minor axis; especially for |cor_ab| \approx 1), but in the test
case, this would only reduce the a,b erros only by a factor of about 1/2.
The gnuplot.pdf states that the asymptotic error underestimates the true
error, but it wasn't clear to me up to now that it could do this by such
a vast amount.
Here are sample data I have been using for the test:
#x y dy
0.000000 1.078039 0.100000
0.200000 1.012137 0.100000
0.400000 0.994650 0.100000
0.600000 1.210933 0.100000
0.800000 1.228788 0.100000
1.000000 1.279018 0.100000
1.200000 1.510823 0.100000
1.400000 1.466704 0.100000
1.600000 1.523061 0.100000
1.800000 1.768180 0.100000
2.000000 1.894179 0.100000
2.200000 2.006980 0.100000
2.400000 1.994247 0.100000
2.600000 2.212952 0.100000
2.800000 2.123002 0.100000
3.000000 2.341036 0.100000
3.200000 2.321149 0.100000
3.400000 2.469557 0.100000
3.600000 2.310980 0.100000
3.800000 2.451646 0.100000
4.000000 2.652370 0.100000
Gnuplot finds:
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 0.948209 +/- 0.04083 (4.306%)
b = 0.427096 +/- 0.01746 (4.089%)
correlation matrix of the fit parameters:
a b
a 1.000
b -0.855 1.000
while an independent approach finds
a = 9.482095E-01 -/+-1.947044E-01, 1.947043E-01
b = 4.270960E-01 -/+-8.327496E-02, 8.327495E-02
chi^2_red = 0.939417
cor_ab =-0.855398
i.e. the a error is +/- 0.195 instead of +/- 0.04, while the b error is
+/- 0.083 instead of +/- 0.017
Using the errors of each only for the other parameter at its best-fit
value, the errorbars are about +/- 0.1 and +/-0.05, respectively.
I did not do a large study of fitting functions, but the given results
suggest, in my opinion, that the asymptotic standard error should not be
used for an error discussion at all.
Instead, I would recommend to calculate the error ellipse from the
(eigenvalues of the) covariance matrix, corrected for non-zero minimum
chi^2, and use it as an (additional) error estimate.
Any opionions?
--
Gruß,
Ingo
[toc] | [prev] | [next] | [standalone]
| From | Hans-Bernhard Bröker <HBBroeker@t-online.de> |
|---|---|
| Date | 2011-04-10 19:31 +0200 |
| Message-ID | <90e7s7F9faU1@mid.dfncis.de> |
| In reply to | #224 |
On 08.04.2011 17:18, Ingo Thies wrote: > Another point seems to be even more problematic, that is the usage of > the "asymptotic standard error". That's actually not another point, it's exactly the same. > Gnuplot finds: > > Final set of parameters Asymptotic Standard Error > ======================= ========================== > > a = 0.948209 +/- 0.04083 (4.306%) > b = 0.427096 +/- 0.01746 (4.089%) > > > correlation matrix of the fit parameters: > > a b > a 1.000 > b -0.855 1.000 > > > while an independent approach finds > > a = 9.482095E-01 -/+-1.947044E-01, 1.947043E-01 > b = 4.270960E-01 -/+-8.327496E-02, 8.327495E-02 Those errors are _way_ too big. Do yourself a favour and plot your data along with the model, using parameters modified by those errors: [assuming 'set fit errorvar' active, and fit done:] gnuplot> p 'fit.dat' u 1:2:3 w err, a*x+b w l gnuplot> rep (a+a_err)*x+b+b_err, (a-a_err)*x+b-b_err gnuplot> a_thies=0.195 gnuplot> b_thies=.083 gnuplot> rep (a+a_thies)*x+b+b_thies, (a-a_thies)*x+b-b_thies You'll see that gnuplot's a_err and b_err yield a corridor somewhat tightly containing almost all data points and their errors, just like it should be. Your independent approach defines a corridor that is way larger than it needs to be. Particularly the error on 'a' leads to a massive overestimation of the data errors at large 'x'. E.g. at x==4, the model with those parameter errors yields an interval of 2.6 +/- 0.8, while your data claimed 2.6 +/- 0.1
[toc] | [prev] | [next] | [standalone]
| From | Charles Allen <ca137tmp@earthlink.net> |
|---|---|
| Date | 2011-04-10 20:14 -0500 |
| Message-ID | <slrniq4lf3.4pm.ca137tmp@mobile.local> |
| In reply to | #227 |
> On 08.04.2011 17:18, Ingo Thies wrote: >> Another point seems to be even more problematic, that is the usage of >> the "asymptotic standard error". Hans-Bernhard Bröker: > Those errors are _way_ too big. Do yourself a favour and plot your data > along with the model, using parameters modified by those errors: Perhaps there is confusion about "standard error" (error on the mean) versus standard deviation. The factor of 1/sqrt(21) (or however many data points there were) seems about right. Gnuplot is reporting the error on the mean, the other program may be reporting the standard deviation. -- Charles Allen
[toc] | [prev] | [next] | [standalone]
| From | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| Date | 2011-04-14 15:41 +0200 |
| Message-ID | <90obscFk42U1@mid.individual.net> |
| In reply to | #228 |
On 11.04.2011 03:14, Charles Allen wrote:
> Perhaps there is confusion about "standard error" (error on the mean)
> versus standard deviation. The factor of 1/sqrt(21) (or however many
> data points there were) seems about right. Gnuplot is reporting the
> error on the mean, the other program may be reporting the standard
> deviation.
Hmm, I have made some tests. If a,b are strongly correlated (happens
e.g. when the x values are far off the x0 in a formula like
f(x)=a+b*(x-x0), then the error contour of (a,b) is a strongly elongated
"banana" (an ellipse in approximation). In the given case, x0=0. Then,
if you vary a with fixed b=b_best and also vary b with fixed a=a_best,
the resulting errors are about the same as in gnuplot. If, however,
x=xmean (here =2), so that the a-b-correlation vanishes, these errors
increase to the full errors you get from the full extent of the error
ellipse. These are much larger then the errors from gnuplot.
But, tell me, if I have non-correlated a,b, thus an error circle, what
else then the upper and lower rim in both a and b direction of the
1-sigma-circle should I take as the standard deviation of a and b?
--
Gruß,
Ingo
[toc] | [prev] | [next] | [standalone]
| From | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| Date | 2011-04-11 21:42 +0200 |
| Message-ID | <90h3ttF39uU1@mid.individual.net> |
| In reply to | #227 |
Am 2011-04-10 19:31, schrieb Hans-Bernhard Bröker:
>> a = 9.482095E-01 -/+-1.947044E-01, 1.947043E-01
>> b = 4.270960E-01 -/+-8.327496E-02, 8.327495E-02
>
> Those errors are _way_ too big. Do yourself a favour and plot your data
> along with the model, using parameters modified by those errors:
Well, I plotted the enveloping contours for all (a,b) sets on the
1-sigma contour, and found that the resulting two curves nicely embrace
the majority of data points, and the "error pipe" is about the same size
as the errorbars. And along this 1-sigma contour a and b walk through
the error intervals above.
In particular, the upper right figure in this poster has been created
this way:
<http://www.iac.es/congreso/constellation10/media/posters/Thies_poster.pdf>
Please note that the error pipe is based on the 1 and 2 sigma level
corresponding to chi^2-chi_min^2 rather than chi^2 itself.
>
> [assuming 'set fit errorvar' active, and fit done:]
>
> gnuplot> p 'fit.dat' u 1:2:3 w err, a*x+b w l
You mixed up a and b here; it is actually f(x)=a+b*x
> gnuplot> rep (a+a_err)*x+b+b_err, (a-a_err)*x+b-b_err
> gnuplot> a_thies=0.195
> gnuplot> b_thies=.083
> gnuplot> rep (a+a_thies)*x+b+b_thies, (a-a_thies)*x+b-b_thies
>
> You'll see that gnuplot's a_err and b_err yield a corridor somewhat
> tightly containing almost all data points and their errors, just like it
> should be.
The problem hier is that a and b are anti-correlated (the correlation
coefficient in the lower-left of the matrix is negative), i.e. the
larges b corresponds to the smallest a and vice-versa. The only viable
way to plot the error pipe (or corridor) is to use the actual (a,b)
tuples from the 1 (or 2, 3 etc.) sigma contour. Or, at least, combine
the largest a with the smallest b:
[...]
gnuplot> rep a+a_thies+(b-b_thies)*x, a-a_thies+(b+b_thies)*x
--
Gruß,
Ingo
[toc] | [prev] | [next] | [standalone]
| From | Hans-Bernhard Bröker <HBBroeker@t-online.de> |
|---|---|
| Date | 2011-04-11 23:27 +0200 |
| Message-ID | <4DA3725D.3040306@t-online.de> |
| In reply to | #234 |
On 11.04.2011 21:42, Ingo Thies wrote: > Am 2011-04-10 19:31, schrieb Hans-Bernhard Bröker: > >>> a = 9.482095E-01 -/+-1.947044E-01, 1.947043E-01 >>> b = 4.270960E-01 -/+-8.327496E-02, 8.327495E-02 >> Those errors are _way_ too big. Do yourself a favour and plot your data >> along with the model, using parameters modified by those errors: > > Well, I plotted the enveloping contours for all (a,b) sets on the > 1-sigma contour, So, was a comparison of the model function's values for parameter pairs (a,b), (a+sigma_a, b+sigma_b) and (a-sigma_a, b-sigma_b) at x=4.0 to the data value of y=2.65 +/- 0.1 part of that study? Did you notice that you get a model corridor [after correcting my a<-->b mixup] from 2.13 to 3.18, more than 5 times wider than the error in the data? Or, to keep it short: did you even look at the plot I described? What's actually so wrong with the error corridor you get from gnuplot alone? > And along this 1-sigma contour a and b walk through > the error intervals above. So which parameters of the error ellipse did you give above: horizontal and vertical section at the center, or overall size of the enclosed, axis-aligned bounding box? >> [assuming 'set fit errorvar' active, and fit done:] >> >> gnuplot> p 'fit.dat' u 1:2:3 w err, a*x+b w l > > You mixed up a and b here; it is actually f(x)=a+b*x Yep. Sorry about that. But even with that mixup repaired, the message is still the same. a_thies and b_thies are similar enough to each other for that... As to using opposite-sign combinations of deviations on the parameters --- that's just as bad, but in a different way, since it completely underestimates the data errors in the middle of the region. The errors you get from gnuplot achieve a much more sensible error band with a whole lot less hassle.
[toc] | [prev] | [next] | [standalone]
| From | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| Date | 2011-04-12 11:28 +0200 |
| Message-ID | <90ik9hF7p0U1@mid.individual.net> |
| In reply to | #236 |
On 11.04.2011 23:27, Hans-Bernhard Bröker wrote:
> So, was a comparison of the model function's values for parameter pairs
> (a,b), (a+sigma_a, b+sigma_b) and (a-sigma_a, b-sigma_b) at x=4.0 to the
> data value of y=2.65 +/- 0.1 part of that study? Did you notice that
> you get a model corridor [after correcting my a<-->b mixup] from 2.13 to
> 3.18, more than 5 times wider than the error in the data?
Well, look at
http://www.astro.uni-bonn.de/~ithies/gnuplot/temp/l2envelope.eps
and the (a,b) contours (for 1,2 and 3 sigma)
http://www.astro.uni-bonn.de/~ithies/gnuplot/temp/l2contour.eps
A colleague actually used gnuplot's errorbars for a linear fit to a set
of about 100 data points. The resulting error contours were *tiny*, but
were OK with the independent approach.
> Or, to keep it short: did you even look at the plot I described? What's
> actually so wrong with the error corridor you get from gnuplot alone?
My concern here ist that you set the y-offsets of the error lines rather
arbitrary. This approach assumes that the errors are smallest at the
lower end of the xrange, and largest at the upper end, since the error
corridor always opens for larger x. This may be true, if e.g. the errors
are Poisson errors, but it certainly does not have to.
The only honest way, I think, is to calculate a large sample of (a,b)
along the 1-sigma ellipse (or, more precisely, the 1-sigma contour
calculated in a full chi^2 test), and calculate the enveloping curves
for all these resulting functions.
BTW Did you even look at the poster figure I have linked to, where I
have done exactly this (for a small sample, however)?
>> And along this 1-sigma contour a and b walk through
>> the error intervals above.
>
> So which parameters of the error ellipse did you give above: horizontal
> and vertical section at the center, or overall size of the enclosed,
> axis-aligned bounding box?
All (a,b) at small steps around the ellipse.
> As to using opposite-sign combinations of deviations on the parameters
> --- that's just as bad, but in a different way, since it completely
> underestimates the data errors in the middle of the region.
A first approach then is, just to connect the endpoints of each of these
opposite-sign lines, and use the resulting trapezium as error corridor.
However, this slightly overestimates the errors in the middle.
> The errors you get from gnuplot achieve a much more sensible error band
> with a whole lot less hassle.
This may be true for a moderate number of samples. However, for large
number of samples, as my colleague did, the error corridor is way too
small. By nearly an order of magnitude or so.
--
Gruß,
Ingo
[toc] | [prev] | [next] | [standalone]
| From | Hans-Bernhard Bröker <HBBroeker@t-online.de> |
|---|---|
| Date | 2011-04-16 00:03 +0200 |
| Message-ID | <90rtluF8muU1@mid.dfncis.de> |
| In reply to | #240 |
On 12.04.2011 11:28, Ingo Thies wrote: > My concern here ist that you set the y-offsets of the error lines rather > arbitrary. Where do you think I did anything arbitrary? The y offset of the fittet line is a result of the fit, including an error. So constructing the error corridor has to include varying the fitted parameter "y offset" across it's fitted error range. > This approach assumes that the errors are smallest at the > lower end of the xrange, and largest at the upper end, since the error > corridor always opens for larger x. Yes, that's what this particular model function does. That was your choice, not mine. A different model, e.g. a*(x-x0), might yield a different behaviour. > BTW Did you even look at the poster figure I have linked to, where I > have done exactly this (for a small sample, however)? For the context of this discussion, that poster figure is really quite meaningless. It has neither the same data, nor the same model function as what we've been discussing. >>> And along this 1-sigma contour a and b walk through >>> the error intervals above. >> So which parameters of the error ellipse did you give above: horizontal >> and vertical section at the center, or overall size of the enclosed, >> axis-aligned bounding box? > > All (a,b) at small steps around the ellipse. A misunderstanding. I was asking: where in that ellipse did those numbers your quote as errors of a and b come from? > This may be true for a moderate number of samples. However, for large > number of samples, as my colleague did, the error corridor is way too > small. By nearly an order of magnitude or so. Let's see some actual test case to demonstrate that.
[toc] | [prev] | [next] | [standalone]
| From | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| Date | 2011-04-16 18:06 +0200 |
| Message-ID | <90tt4sFj9bU1@mid.individual.net> |
| In reply to | #269 |
Am 2011-04-16 00:03, schrieb Hans-Bernhard Bröker:
> Where do you think I did anything arbitrary?
The assumption that the largest a (i.e. a+a_err) corresponds to the
largest b (b+b_err) is arbitrary and actually wrong if the a,b are
anticorrelated. Rather, if b_max=b+b_err is given, the corresponding a'
is minimizing chi^2 for this condition (see Numerical Recipes in
C/Fortran, Chapter 15.6). This can be done by following the
68.3%-probability ("1 sigma") contour until b becomes maximal. If
anticorrelated, a' will be closer to a-a_err rather than a or even a+a_err.
My own fault just was to pick the wrong degree of freedom (dfree). While
the goodness-of-fit estimation requires dfree=ndata-npar (=21-2=19 in
our example), the confidence ellipse for combined (a,b) needs
dfree=npar=2, and for separate a and b errors, dfree=1. Then the
ellipses (or, more generally, contours, since they are not necessarily
perfect ellipses) are smaller than the (a,b) confidence levels, but give
you the position of an extreme parameter and its corresponding
counterpart geometrically. If you have a copy of the Numerical Recipes
in your library, Figure 15.6.4. gives a good illustration for what's
going on.
After correcting the degrees of freedom and redoing the fit for
chi^2=ndata-npar, the errors are now nearly identical to those of
gnuplot, at least, if the error contours are near-elliptical. If they're
not (i.e. errors are not Gaussian), the errors are larger, but probably
less meaningful.
> Yes, that's what this particular model function does. That was your
> choice, not mine. A different model, e.g. a*(x-x0), might yield a
> different behaviour.
If the error corridors are calculated by following the confidence
ellipses, they are indeed independent from the choide of x0, although
the best-fit a and its errors change.
>> All (a,b) at small steps around the ellipse.
>
> A misunderstanding. I was asking: where in that ellipse did those
> numbers your quote as errors of a and b come from?
The extremal a-values minus a_bestfit are taken as a-errors, and the
same with b.
--
Gruß,
Ingo
[toc] | [prev] | [next] | [standalone]
| From | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| Date | 2011-04-14 15:07 +0200 |
| Message-ID | <90o9s9F5aiU1@mid.individual.net> |
| In reply to | #227 |
On 10.04.2011 19:31, Hans-Bernhard Bröker wrote:
> gnuplot> p 'fit.dat' u 1:2:3 w err, a*x+b w l
> gnuplot> rep (a+a_err)*x+b+b_err, (a-a_err)*x+b-b_err
> gnuplot> a_thies=0.195
> gnuplot> b_thies=.083
> gnuplot> rep (a+a_thies)*x+b+b_thies, (a-a_thies)*x+b-b_thies
I have modified the model function to minimize the correlation of a and
b. This has been done by substituting x by x-x0, where x0 is the mean of
all x (2 in this case). Here is the full script, to be applied to the
same data table ('fit.tab'). To enhance visibility, I have given
different colors for the different error types (blue for gnuplot's, red
for mine). Please look again at the errors. Are they still _way_ too
large? Or aren't gnuplot's errors now _way_ too small?
set fit errorvar
#x0=0.; a_thies=0.195; b_thies=0.083 # old fit
x0=2.; a_thies=0.101; b_thies=0.083 # new fit
f(x)=a+b*(x-x0)
set xrange [0:4]
set yrange [0.5:3]
fit [0:4] f(x) xfile u 1:2:3 via a,b
set key left
p a+b*(x-x0) w l lc rgb '#0000ff',\
a-a_err+(b-b_err)*(x-x0) lc rgb '#007fff',\
a+a_err+(b+b_err)*(x-x0) lc rgb '#007fff',\
a-a_thies+(b-b_thies)*(x-x0) lc rgb '#ff0000',\
a+a_thies+(b+b_thies)*(x-x0) lc rgb '#ff0000',\
'fit.dat' u 1:2:3 w err lc rgb '#000000'
I would find it highly confusing if the error corridors depend on a
simple substitution. If one plotts the a,b error contours, and uses them
for the error estimate (i.e. walks around the 1-sigma contour in the a-b
space and creates the enveloping curve for all possible resulting
functions), then the "way-too-large" error for the slope b remains
constant. Only the error for the offset a changes, as expected.
--
Gruß,
Ingo
[toc] | [prev] | [next] | [standalone]
| From | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| Date | 2011-04-15 13:47 +0200 |
| Message-ID | <90qpj4FrbmU1@mid.individual.net> |
| In reply to | #227 |
On 10.04.2011 19:31, Hans-Bernhard Bröker wrote:
>> a = 9.482095E-01 -/+-1.947044E-01, 1.947043E-01
>> b = 4.270960E-01 -/+-8.327496E-02, 8.327495E-02
>
> Those errors are _way_ too big. Do yourself a favour and plot your data
> along with the model, using parameters modified by those errors:
I have now found the reason for the large errorbars. I simply had used
the wron degree of freedom for calculating the 68.3% probability ("1
sigma") contours in the (a,b) space. It should be 2 (2 params) for the
contours, and even only 1 for a single contouf (cf Numerical Recipes
chapter 15.6), but I used ndata-2, which is only valid for the
goodness-of-fit estimate. Now the errors are about the same as gnuplot's
given a good fit (i.e. \chi^2 \approx ndata-2).
Gnuplot seems to assume a good fit always, so the a-b errors do not vary
if all sigma(y_i) are scaled by the same factor. Maybe this is a
reasonable assumption since otherwise the whole fit is in question.
I hope I could repair a bit of the confusion I caused with this thread ;-)
--
Gruß,
Ingo
[toc] | [prev] | [next] | [standalone]
| From | Hans-Bernhard Bröker <HBBroeker@t-online.de> |
|---|---|
| Date | 2011-04-10 19:04 +0200 |
| Message-ID | <4DA1E330.20903@t-online.de> |
| In reply to | #223 |
On 08.04.2011 13:05, Ingo Thies wrote: > I am trying to find out how exactly gnuplot calculates the covariance > matrix for a fitted function. It does it basically the same way any simple fitting program does it: directly from the covariance matrix. What you're looking at are the roots of the diagonal elements of the covariance matrix. The only major difference is that gnuplot doesn't treat the data errors as more believable than the actual distance between model and data. This is done by normalizing the covariances by reduced chisquare, effectively reducing the meaning of the data errors to that of _weights_ for individual data points. So you get the same parameter errors (and correlations) from multiple fits to the same data, but with re-scaled errors: fit f(x) 'foo.dat' u 1:2:($3) via ... fit f(x) 'foo.dat' u 1:2:($3*100) via ... fit f(x) 'foo.dat' u 1:2:($3/100) via ... all yield the same fit. > The background: I am still mistrusting the error ellipses one gets from > the eigenvalues. As already mentioned last year (aroung May or so), I > suspect that the resulting error contours are underestimated since the > best-fit chi^2 is assumed to be zero there,but it isn't. Nobody expects chi^2 to be zero. The expectation is for the reduced chisquare (chisq/ndf) to be about 1.0. > I have compared the error ellipses with the true error contours from an > independent chi^2-2D-mapping routine, and found that the latter may be > considerably larger than the ones one would get from gnuplot's cov matrix. Just in case: please note gnuplot's output is a correlation matrix, not a covariance matrix.
[toc] | [prev] | [next] | [standalone]
| From | Ingo Thies <ingo.thies@gmx.de> |
|---|---|
| Date | 2011-04-11 22:13 +0200 |
| Message-ID | <90h5nvFh1hU1@mid.individual.net> |
| In reply to | #226 |
Am 2011-04-10 19:04, schrieb Hans-Bernhard Bröker:
> Nobody expects chi^2 to be zero. The expectation is for the reduced
> chisquare (chisq/ndf) to be about 1.0.
You might have misunderstood me. If you want to calculate the 1-sigma
ellipse from the actual chi^2 in the (a,b) map (this cannot be done with
gnuplot yet, but with a simple independent program), one has to
transform the chi^2 into the exclusion probability (via the incomplete
gamma function and the degree of freedom), and then transform this into
sigma via the (inverse) error function.
But before you throw the chi^2 into this algorithm, you have to
substract the chi_min^2, the chi^2 of the best-fit (a,b). I.e. "0 sigma"
corresponds to chi^2=chi_min^2, not 0. Otherwise you get too small error
contours, and for bad fits, the 1-sigma contour might not even exist.
I have to admit that I had indeed neglected to do this correction in
Thies & Kroupa (2007), Figure 9. Plotted there are the 95% and 99% (i.e.
2 and 2.6 sigma) contours for the fitted parameters of four initial mass
functions (IMF). The fit to the Taurus IMF was relatively poor, with the
probability being >95%, making the 2-sigma contour disappear without the
correction mentioned above (however, the main result is not affected by
this lapsus). This shows the consequence of missing this correction.
Here the paper on arxiv: <http://arxiv.org/abs/0708.1764>
What has all this to do with gnuplot?
Well, if the errors of the data are near-Gaussian, you can approximate
the error contour in the (a,b) space by an ellipse. The axis vectors of
this ellipse are, as far as I remember, the eigenvectors of the
covariance matrix which is closely related to the correlation matrix by
multiplying/dividing with/by the product of the standard errors of a and b.
However, the error corridor you get from walking around the error
ellipse and picking (a,b) and use it in f(x), is too narrow. As far as I
remember, it is similar to the 1-sigma (a,b) error contour if chi_min^2
is not subtracted. This, the other way round, corresponds to too small
a,b standard errors.
--
Gruß,
Ingo
[toc] | [prev] | [standalone]
Back to top | Article view | comp.graphics.apps.gnuplot
csiph-web