Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.graphics.apps.gnuplot > #1496

linear fit in GNU Awk

From Ivan Shmakov <oneingray@gmail.com>
Newsgroups alt.sources, comp.graphics.apps.gnuplot, comp.lang.awk
Subject linear fit in GNU Awk
Followup-To alt.sources.d, comp.lang.awk
Date 2012-11-23 12:30 +0700
Organization Aioe.org NNTP Server
Message-ID <86a9u87uoh.fsf@gray.siamics.net> (permalink)

Cross-posted to 3 groups.

Followups directed to: alt.sources.d, comp.lang.awk

Show all headers | View raw


Archive-name: awk-linear-fit-is
Submitted-by: oneingray@gmail.com
Last-modified: 2012-11-23 +00:00
Copyright-Notice: Both the README and the code are under the
    CC0 Public Domain Dedication.

	Once, I had to "verify" the Gnuplot (iterative) non-linear least
	squares algorithm against a linear least squares for a linear
	function.  Thus, I've written the code below.

	The code calculates the fit parameters (offset, coefficient) for
	each data "block" in the file, with blocks being separated with
	at least two empty lines.  By default, first input column is
	used as the independent variable, and the second as dependent,
	subject to change with -vxcol=, -vycol= GNU Awk options.

	All data points are considered of equal weight.


    Examples

	Using flat-distribution random data:

$ (for i in a b c d e f ; do
       for ((x = 0; x < 128; x++)) ; do
           printf %s\\n "${x} ${RANDOM}"
       done
       printf \\n\\n
   done) | gawk -f linear-fit.awk 
18001.6 -11.2342
18399.9 -50.597
16128.5 10.2013
16592.5 -8.59861
17240.7 -10.3474
16284.8 -4.78138
$ 

	Superimposing linear dependency on the random data:

$ (for i in 1 3 3 7 ; do
       for ((x = 0; x < 128; x++)) ; do
           printf %s\\n "${x} $((10000 * ${i} * ${x} + ${RANDOM}))"
       done
       printf \\n\\n
   done) | gawk -f linear-fit.awk 
17379 9975.24
14182 30027.4
15867.6 29993.9
15284 70019.7
$ 

	Increasing the number of the random data points, so that the
	dependency becomes apparent:

$ (for i in 1 3 3 7 ; do
       for ((x = 0; x < 1024; x++)) ; do
           printf %s\\n "${x} $((10000 * ${i} * ${x} + ${RANDOM} - 16384))"
       done
       printf \\n\\n
   done) | gawk -f linear-fit.awk 
-48.5561 9999.55
-81.2604 30000.2
-696.383 30000
-332.214 70000.7
$ 

### linear-fit.awk --- Linear fit  -*- Awk -*-

### Ivan Shmakov, 2012
## To the extent possible under law, the author(s) have dedicated all
## copyright and related and neighboring rights to this software to the
## public domain worldwide.  This software is distributed without any
## warranty.

## You should have received a copy of the CC0 Public Domain Dedication
## along with this software.  If not, see
## <http://creativecommons.org/publicdomain/zero/1.0/>.

### Code:

BEGIN {
    ## the independent and dependent variables' columns
    if (xcol == "") { xcol = 1; }
    if (ycol == "") { ycol = 2; }
    ## we assume that the state variables are "", thus 0
}

## skip over the comments
/^#/ { print $0; next; }

## count empty lines, so that we can split on blocks
! /./ { el++; next; }

## NB: ignore y == NaN
$ycol == 0 + $ycol {
    x = $xcol;  y = $ycol;
    if (el >= 2 && n != "") {
        q = 1 / (n * s2 - s1 ^2);
        ## print the result for the previous block
        print q * (s2 * u1 - s1 * u2), q * (n * u2 - s1 * u1);
        ## initialize the state
        n = 1;
        s1  = x;    s2  = x ^2;
        u1  = y;    u2  = x * y;
    } else {
        n++;
        s1 += x;    s2 += x ^2;
        u1 += y;    u2 += x * y;
    }
    el = 0;
}

END {
    if (n != "") {
        q = 1 / (n * s2 - s1 ^2);
        ## print the result for the last block
        print q * (s2 * u1 - s1 * u2), q * (n * u2 - s1 * u1);
    }
}

### linear-fit.awk ends here

-- 
FSF associate member #7257

Back to comp.graphics.apps.gnuplot | Previous | Next | Find similar


Thread

linear fit in GNU Awk Ivan Shmakov <oneingray@gmail.com> - 2012-11-23 12:30 +0700

csiph-web