Path: csiph.com!goblin3!goblin.stu.neva.ru!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: "Rick C. Hodgin" Newsgroups: comp.compilers Subject: Algorithm Optimization Date: Sun, 13 Sep 2020 13:00:43 -0400 Organization: Liberty Software Foundation Lines: 213 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <20-09-032@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="19838"; mail-complaints-to="abuse@iecc.com" Keywords: optimize, comment Posted-Date: 14 Sep 2020 22:53:54 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Content-Language: en-US Xref: csiph.com comp.compilers:2603 1. I've been pursuing the idea of what I call algorithm optimization. It's the idea that algorithms coded by individuals may not be optimal, and may require refactoring / re-engineering to be made optimal based on what's trying to be achieved. Here's a simple example: -----[ Begin ]----- 00: struct SData 01: { 02: SData* next; // Link list 03: int x; // Some data payload 04: }; 05: SData* first = NULL; 06: 07: SData* iAlloc_new_sdata(void) 08: { 09: SData* d = (SData*)calloc(1, sizeof(SData)); 10: SData* p; 11: 12: if (!first) 13: { 14: first = d; 15: 16: } else { 17: for (p = first; p->next; ) 18: p = p->next; 19: p->next = d; 20: } 21: return d; 22: } 23: 24: SData* store_x(int x) 25: { 26: SData* d = iAlloc_new_sdata(); 27: if (d) 28: d->x = x; 29: 30: return d; 31: } -----[ End ]----- In the above example, a linked list structure is allocated and some data is stored into it. In this example a single x variable, but in a real-world case there may be many variables. The function called by user code is store_x(), but store_x() must call into iAlloc_new_sdata() to retrieve a new pointer to store its data. If this were only slightly re-written, a notable improvement could be achieved eliminating the need for store_x(), with only a slight extension added to iAlloc_new_sdata(). Example: -----[ Begin ]----- // First 6 lines are the same 07: SData* iAlloc_new_sdata(int x) 08: { 09: SData* d; 10: SData* p; 11: 12: if ((d = (SData*)calloc(1, sizeof(SData)))) 13: { 14: d->x = x; 15: if (!first) 16: { 17: first = d; 18: 19: } else { 20: for (p = first; p->next; ) 21: p = p->next; 22: p->next = d; 23: } 24: } 25: return d; 26: } 27: 28: SData* store_x(int x) 29: { 30: return iAlloc_new_sdata(x); 31: } -----[ End ]----- By moving the assignment of x into ialloc_new_sdata(), it greatly simplifies store_x(). And since store_x() is now effectively a pass-thru function, the compiler can optimize away all calls to store_x() and make them direct calls to iAlloc_new_sdata(). ----- 2. We see in the above examples this loop to reach the last element: 20: for (p = first; p->next; ) 21: p = p->next; This iterative component can be a great slowdown if the link list grows beyond a few trivial elements. But regardless, the compiler could flag it as an iterative loop where the logic test is based upon an iterative access to the portion being tested. If the compiler could identify and understand the purpose of this block of code, it could extrapolate a need to change the algorithm from something iterative, into something more direct. And by rearranging it slightly, it would greatly improve performance on growing lists: -----[ Begin ]----- 00: struct SData 01: { 02: SData* next; // Link list 03: int x; // Some data payload 04: }; 05: SData* first = NULL; 06: SData* last = NULL; 07: 08: SData* iAlloc_new_sdata(int x) 09: { 10: SData* d; 11: SData* p; 12: 13: if ((d = (SData*)calloc(1, sizeof(SData)))) 14: { 15: d->x = x; 16: if (!first) 17: { 18: first = d; 19: last = d; 20: 21: } else { 22: last->next = d; 23: last = d; 24: } 25: } 26: return d; 27: } 28: 29: // SData* store_x(int x) 30: // { 31: // return iAlloc_new_sdata(x); 32: // } -----[ End ]----- The last variable there would've been injected by the compiler, and the iterative loop would've been replaced with a direct assignment plus the update of the last variable. Other functions accessing SData (like an iAlloc_delete_sdata()) would also have to be modified by the compiler to handle first/last processing as well, and there may be constraints there on what's possible making this optimization not possible due to those other constraints, even though it is possible here. But, in this way, by examining the intent of the algorithm and recognizing what it's trying to do, alternate routes to achieve the same type of computing could be determined, and the optimum of them could be chosen for the application at hand. It would be nice to have the compiler break things down, analyze them, and reconstruct the revised algorithms back into source code as in the above example, the reality is some re-arranging optimizations that could be injected to address the constraints of logic would be made not just based on what the language itself is capable of handling syntax-wise, but would also include that which the target ISA's underlying machine code would be capable of processing as well. Things would have to be broken down to their component parts at the targeted ISA level, examined, and then reconstructed back up from that, being refactored where possible to improve things. I can easily see a way to do this with patterns and use cases seen in taking gobs of existing source code in popular products and re-compiling it through these compilers to identify common logic flow examples, and then manually analyzing the code and creating alternatives which allow the optimizing compiler to pattern-match use cases and replace them with manmade alternatives ... but I'd like it to be its own thing, where it could know what's happening data/flow-analysis-wise, and then revise the input to achieve the same processing/compute, but with something that is greatly simplified and optimal for the target. I view this in my CAlive compiler as three levels: 3) CAlive, a C/C++ like language with similar syntax. 2) BAlive, a language like C with only fundamental types. 1) AAlive, an intermediate assembly language which addresses the needs of a virtual processor, which is also then constrained by the limitations of the target ISA. 0) Emitted opcodes for the target ISA. The framework takes CAlive source code, compiles it into BAlive source code, which compiles it into AAlive source code, which emits opcodes for the target ISA. Optimization takes place at the BAlive level, and keyhole optimization takes place at the AAlive level and (more limited) at the opcode level. Are these all standard optimization techniques which exist, or is this something else I'm pursuing with the big push to have optimization take place at the BAlive level to revamp algorithms based on fundamental data types and data/flow analyses of the intent of the algorithms? Note: All of this is my original thinking this all through. I have not read books or articles or papers from others on how to do things. I look at the code and I think things. I used to discuss them with Walter Banks, but he passed in late 2019. -- Rick C. Hodgin [I think the usual way to do this is to provide a way to express higher level algorithms in your programming language so the compiler doesn't have to try to reverse engineer them. -John]