From: Alex van der Spek Subject: MSE of a linear regression line + prediction intervals Newsgroups: comp.graphics.apps.gnuplot MIME-Version: 1.0 User-Agent: Pan/0.149 (Bellevue; 4c157ba) Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Message-ID: Organization: KPN B.V. Date: Tue, 29 Aug 2023 14:54:18 +0200 Path: csiph.com!tncsrv06.tnetconsulting.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!peer03.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!feed.abavia.com!abe005.abavia.com!abp002.abavia.com!news.kpn.nl!not-for-mail Lines: 129 Injection-Date: Tue, 29 Aug 2023 14:54:18 +0200 Injection-Info: news.kpn.nl; mail-complaints-to="abuse@kpn.com" X-Received-Bytes: 4493 Xref: csiph.com comp.graphics.apps.gnuplot:4539 I worked out a way to compute and plot the wide and narrow prediction intervals of a linear regression line. A self contained working plot script goes below. I have two questions. 1. It seems as if the MSE could have been computed by the stats command. It don't think it is so I put in a work around which is rather convoluted. Am I overlooking something? 2. The shading between the prediction intervals doesn't clip correctly at the plot boundaries. I believe this is a known issue for which no work around exists? It would be a nice to have only, currently I just plot the lines instead of a fill between curves (which is in the example code below). Thanks in advance, Alex +++++++++++++++++++++ #!/usr/bin/env gnuplot #Make a Weibull plot of DC motor data taken from Philips databook C18 (1986) #Clean slate unset multiplot unset out reset set term qt size 640,480 set encoding utf8 #Inline data N=20 $MDAT << EOF #Benard Hours Exact 0.034 1750 0.0341 0.083 2000 0.0825 0.132 2525 0.1315 0.181 2600 0.1805 0.230 2815 0.2297 EOF #Linear fit stats $MDAT using (log10(column(2))):(log(-log(1-column(1)))) n = STATS_records x_mean = STATS_mean_x; y_mean = STATS_mean_y x_std = STATS_stddev_x; y_std = STATS_stddev_y a = STATS_slope; b = STATS_intercept a_err = STATS_slope_err; b_err = STATS_intercept_err #Mean Square Error set table $MSED plot $MDAT using (($1)-(1-exp(-exp(a*log10($2)+b))))**2 unset table stats $MSED using 2 MSE = STATS_sum / (n-2) #Wide prediction intervals set table $WPINT set xrange [200:20000] set samples 1000 t = 3.18 plot '+' using ($1):(1-exp(-exp(a*log10($1)+b))) + t * \ sqrt(MSE * (1+1/n+(log10($1)-x_mean)**2/(n*x_std**2))): \ (1-exp(-exp(a*log10($1)+b))) - t * \ sqrt(MSE * (1+1/n+(log10($1)-x_mean)**2/(n*x_std**2))) \ with filledcurves unset table #Narrow prediction intervals set table $NPINT set xrange [200:20000] set samples 1000 t = 3.18 plot '+' using ($1):(1-exp(-exp(a*log10($1)+b))) + t * \ sqrt(MSE * (1/n+(log10($1)-x_mean)**2/(n*x_std**2))):\ (1-exp(-exp(a*log10($1)+b))) - t * \ sqrt(MSE * (1/n+(log10($1)-x_mean)**2/(n*x_std**2))) \ with filledcurves unset table #Compute B10, MTBF and FR via shape (k) and scale (l) parameters log10exp1 = log10(exp(1)) k = a * log10(exp(1)) l = exp(-b / k) #Quantile function for Weibull distribution Q(p, k, l) = l * (-log(1-p))**(1.0/k) #B10 hours, 90% lives longer B10 = Q(0.1, k, l) #Characteristic time and failure rate CT = Q(1-exp(-1), k, l) FR = 1 / CT #Mean Time Between Failures MTBF = l * gamma(1 + 1.0/k) #Plot specifics set samples 100 set grid x y mx set tics out set xlabel 'Operating hours' set ylabel 'Median rank' set key top left opaque set style fill transparent solid 0.25 #Non linear, Weibull distributed y axis, x axis log set yrange [0.01:0.99] set xrange [1000:7000] set nonlinear y via log(-log(1-y)) inverse 1-exp(-exp(y)) set logscale x set ytics (0.01, 0.02, 0.05, 0.1, 0.2, 0.5, \ "1-e^{-1}" 0.632, 0.9, 0.99) set xtics (1000, '' 1500, 2000, 3000, 4000, 5000, 6000, 7000) set title sprintf("B_{10}=%.fh; MTBF=%.fh; FR = %.2f/1000h", B10, MTBF, FR*1000) plot $WPINT using 1:2:3 with filledcurves lc 2 notitle, \ $NPINT using 1:2:3 with filledcurves lc 4 notitle, \ $MDAT using 2:1 with points pt 7 ps 2 lc 8 title 'Data', \ $MDAT using 2:3 with points pt 6 ps 2 lc 8 title 'Data', \ 1-exp(-1) with lines dt 2 title '{/Symbol t}', \ 0.1 with lines dt 3 title 'B_{10}', \ (1-exp(-exp(a*log10(x)+b))) with lines lc 7 lw 2 title 'Fit Weibull', \ (1-exp(-exp(log(x/l)))) with lines lc 7 dt 3 title 'Fit Exponential'