Path: csiph.com!goblin3!goblin.stu.neva.ru!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: "Rick C. Hodgin" <rick.c.hodgin@gmail.com>
Newsgroups: comp.compilers
Subject: Algorithm Optimization
Date: Sun, 13 Sep 2020 13:00:43 -0400
Organization: Liberty Software Foundation
Lines: 213
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <20-09-032@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="19838"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, comment
Posted-Date: 14 Sep 2020 22:53:54 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-US
Xref: csiph.com comp.compilers:2603

1.

I've been pursuing the idea of what I call algorithm optimization.  It's
the idea that algorithms coded by individuals may not be optimal, and
may require refactoring / re-engineering to be made optimal based on
what's trying to be achieved.

Here's a simple example:

-----[ Begin ]-----
00:    struct SData
01:    {
02:        SData*  next;    // Link list
03:        int     x;       // Some data payload
04:    };
05:    SData* first = NULL;
06:
07:    SData* iAlloc_new_sdata(void)
08:    {
09:        SData* d = (SData*)calloc(1, sizeof(SData));
10:        SData* p;
11:
12:        if (!first)
13:        {
14:            first = d;
15:
16:        } else {
17:            for (p = first; p->next; )
18:                p = p->next;
19:            p->next = d;
20:        }
21:        return d;
22:    }
23:
24:    SData* store_x(int x)
25:    {
26:        SData* d = iAlloc_new_sdata();
27:        if (d)
28:            d->x = x;
29:
30:        return d;
31:    }
-----[ End ]-----

In the above example, a linked list structure is allocated and some data
is stored into it.  In this example a single x variable, but in a
real-world case there may be many variables.

The function called by user code is store_x(), but store_x() must call
into iAlloc_new_sdata() to retrieve a new pointer to store its data.

If this were only slightly re-written, a notable improvement could be
achieved eliminating the need for store_x(), with only a slight
extension added to iAlloc_new_sdata().

Example:
-----[ Begin ]-----
// First 6 lines are the same
07:    SData* iAlloc_new_sdata(int x)
08:    {
09:        SData* d;
10:        SData* p;
11:
12:        if ((d = (SData*)calloc(1, sizeof(SData))))
13:        {
14:            d->x = x;
15:            if (!first)
16:            {
17:                first = d;
18:
19:           } else {
20:                for (p = first; p->next; )
21:                    p = p->next;
22:                p->next = d;
23:            }
24:        }
25:        return d;
26:    }
27:
28:    SData* store_x(int x)
29:    {
30:        return iAlloc_new_sdata(x);
31:    }
-----[ End ]-----

By moving the assignment of x into ialloc_new_sdata(), it greatly
simplifies store_x().

And since store_x() is now effectively a pass-thru function, the
compiler can optimize away all calls to store_x() and make them direct
calls to iAlloc_new_sdata().

-----
2.

We see in the above examples this loop to reach the last element:

20:                for (p = first; p->next; )
21:                    p = p->next;

This iterative component can be a great slowdown if the link list grows
beyond a few trivial elements.  But regardless, the compiler could flag
it as an iterative loop where the logic test is based upon an iterative
access to the portion being tested.

If the compiler could identify and understand the purpose of this block
of code, it could extrapolate a need to change the algorithm from
something iterative, into something more direct.  And by rearranging it
slightly, it would greatly improve performance on growing lists:

-----[ Begin ]-----
00:    struct SData
01:    {
02:        SData*  next;    // Link list
03:        int     x;       // Some data payload
04:    };
05:    SData* first = NULL;
06:    SData* last  = NULL;
07:
08:    SData* iAlloc_new_sdata(int x)
09:    {
10:        SData* d;
11:        SData* p;
12:
13:        if ((d = (SData*)calloc(1, sizeof(SData))))
14:        {
15:            d->x = x;
16:            if (!first)
17:            {
18:                first = d;
19:                last  = d;
20:
21:            } else {
22:                last->next = d;
23:                last       = d;
24:            }
25:        }
26:        return d;
27:    }
28:
29: // SData* store_x(int x)
30: // {
31: //     return iAlloc_new_sdata(x);
32: // }
-----[ End ]-----

The last variable there would've been injected by the compiler, and the
iterative loop would've been replaced with a direct assignment plus the
update of the last variable.

Other functions accessing SData (like an iAlloc_delete_sdata()) would
also have to be modified by the compiler to handle first/last processing
as well, and there may be constraints there on what's possible making
this optimization not possible due to those other constraints, even
though it is possible here.

But, in this way, by examining the intent of the algorithm and
recognizing what it's trying to do, alternate routes to achieve the same
type of computing could be determined, and the optimum of them could be
chosen for the application at hand.

It would be nice to have the compiler break things down, analyze them,
and reconstruct the revised algorithms back into source code as in the
above example, the reality is some re-arranging optimizations that could
be injected to address the constraints of logic would be made not just
based on what the language itself is capable of handling syntax-wise,
but would also include that which the target ISA's underlying machine
code would be capable of processing as well.

Things would have to be broken down to their component parts at the
targeted ISA level, examined, and then reconstructed back up from that,
being refactored where possible to improve things.

I can easily see a way to do this with patterns and use cases seen in
taking gobs of existing source code in popular products and re-compiling
it through these compilers to identify common logic flow examples, and
then manually analyzing the code and creating alternatives which allow
the optimizing compiler to pattern-match use cases and replace them with
manmade alternatives ... but I'd like it to be its own thing, where it
could know what's happening data/flow-analysis-wise, and then revise the
input to achieve the same processing/compute, but with something that is
greatly simplified and optimal for the target.

I view this in my CAlive compiler as three levels:

     3)  CAlive, a C/C++ like language with similar syntax.
     2)  BAlive, a language like C with only fundamental types.
     1)  AAlive, an intermediate assembly language which addresses the
         needs of a virtual processor, which is also then constrained
         by the limitations of the target ISA.
     0)  Emitted opcodes for the target ISA.

The framework takes CAlive source code, compiles it into BAlive source
code, which compiles it into AAlive source code, which emits opcodes for
the target ISA.  Optimization takes place at the BAlive level, and
keyhole optimization takes place at the AAlive level and (more limited)
at the opcode level.

Are these all standard optimization techniques which exist, or is this
something else I'm pursuing with the big push to have optimization take
place at the BAlive level to revamp algorithms based on fundamental data
types and data/flow analyses of the intent of the algorithms?

Note:  All of this is my original thinking this all through.  I have not
read books or articles or papers from others on how to do things.  I
look at the code and I think things.  I used to discuss them with Walter
Banks, but he passed in late 2019.

--
Rick C. Hodgin
[I think the usual way to do this is to provide a way to express higher level
algorithms in your programming language so the compiler doesn't have to try
to reverse engineer them. -John]