Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!news.misty.com!news.iecc.com!.POSTED!nerds-end From: grom358@gmail.com Newsgroups: comp.compilers Subject: Compiling Lexical Closures to C Date: Tue, 25 Sep 2012 16:57:01 -0700 (PDT) Organization: Compilers Central Lines: 121 Sender: johnl@iecc.com Approved: comp.compilers@iecc.com Message-ID: <12-09-020@comp.compilers> NNTP-Posting-Host: news.iecc.com Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Trace: leila.iecc.com 1348672778 95469 64.57.183.58 (26 Sep 2012 15:19:38 GMT) X-Complaints-To: abuse@iecc.com NNTP-Posting-Date: Wed, 26 Sep 2012 15:19:38 +0000 (UTC) Keywords: code, question Posted-Date: 26 Sep 2012 11:19:38 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:756 Working on a basic compiler for a language that supports nested functions and full closures. I have read a few articles like http://matt.might.net/articles/compiling-scheme-to-c/ where they using a structure for the closure, struct closure { Lambda lam; void *env; }; So say I have the following: var createSpinner = function(initVal) { var x = initVal; return [ function() { return ++x; }, function() { return --x; } ]; }; The return functions can share the environment so they both modifying the same x variable. So can hoist the x variable into a structure on the heap and pass pointer to it to both the increment and decrement functions returned in the array. // Example use in language var spinner = createSpinner(10); spinner[0](); // 11 spinner[0](); // 12 spinner[1](); // 11 // Pseudo compiled C Array spinner = createSpinner(10); spinner[0]->lam(spinner[0]->env); spinner[0]->lam(spinner[0]->env); spinner[1]->lam(spinner[1]->env); Now looking at more complex example: var createLines = function() { var m = 2; var c = 3; return { 'line': function(x) { return m * x + c; }, 'lineSquare': function(x) { return x * x + c; } }; }; Now how to pass the free variables? I could put all free variables into an environment. Which means memory can be wasted, because as long as a closure is still referenced then all the free variables by the top level function are kept in memory. Additionally how to deal with deeper nested functions, for example: var run = function() { var message = ''; var setMessage = function(msg) { message = msg; }; var sayHello = function() { var getName = function() { println(message ~ ', what is your name?'); return readLine(); }; var name = getName(); println(message ~ ' ' ~ name); }; setMessage('hello'); sayHello(); }; So the getName function needs reference to the message variable declared in the run function. One solution I was thinking would be to have chained environment for each level of nesting. Eg. getName would reference the message variable via env->parentEnv->message. That is for each level of nesting need to have another pointer dereference to access the free variable. Alternatively the other solution I was thinking was free variables are indirectly related to the environment, and each closure has its own environment. Free variables are shared between environments by them having reference to the same free variable. Eg. struct env_getName { char *message[]; }; So each free variable is allocated separately on the heap and each free variable is reference counted (or garbage collected) separately. Therefore disadvantage here is have more allocations of memory. One per free variable then two for each closure (one for the closure, and one for its environment; though could collapse the environment into the closure structure). Could also use hybrid approach. The compiler using different solutions based on how much sharing there is of free variables. But this means the compiler is getting more complex. For example, in the simple case of: var makeCounter = function(x) { return function() { return ++x; }; }; var counter = makeCounter(10); counter(); // returns 11 Could compile into: int func_a1(int *x) { return ++(*x); } struct closure_makeCounter { int (*func_a1)(); int x; }; struct closure_makeCounter counter = /* allocate closure */; counter.func_a1(&counter.x); Then only have the one allocation in this simple case, and only have to maintain the reference counts on the closure. What other techniques/methods are there for compiling lexical closures into C?