Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!news.misty.com!news.iecc.com!.POSTED!nerds-end
From: BGB <cr88192@hotmail.com>
Newsgroups: comp.compilers
Subject: Re: Compiling Lexical Closures to C
Date: Thu, 27 Sep 2012 12:21:01 -0500
Organization: albasani.net
Lines: 197
Sender: johnl@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <12-09-021@comp.compilers>
References: <12-09-020@comp.compilers>
NNTP-Posting-Host: news.iecc.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: leila.iecc.com 1348977375 2369 64.57.183.58 (30 Sep 2012 03:56:15 GMT)
X-Complaints-To: abuse@iecc.com
NNTP-Posting-Date: Sun, 30 Sep 2012 03:56:15 +0000 (UTC)
Keywords: code, design
Posted-Date: 29 Sep 2012 23:56:15 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:757

On 9/25/2012 6:57 PM, grom358@gmail.com wrote:
> Working on a basic compiler for a language that supports nested functions and
> full closures.
>
> I have read a few articles like
> http://matt.might.net/articles/compiling-scheme-to-c/ where they using a
> structure for the closure,

this strategy actually works fairly well...


skim article:
I don't like tagged unions.

it doesn't take long to realize they are a rather poor way to deal with
this.


personally, I am left with more of a mash-up between doing some magic
with the pointers in the MM/GC (the MM/GC keeps track of object types,
and uses special address-ranges for some other types), and also using a
"signature string" system vaguely more related to the JVM (partly
combined with the IA-64 C++ ABI name-mangling as well, and some
"custom"-ness. the basic notation and 'concept' resembles that of the
JVM signature strings though).

normally (but it does happen), the signatures are not directly used in
the main execution path, but usually more for "static" types.

so:
var x;
( typically dynamically typed (when not type-inferred), uses a raw
pointer internally, with some types encoded via address ranges ).

var x:int;
(may use a raw machine integer, may encode type via an "i" signature
string).


> var createLines = function() {
>    var m = 2;
>    var c = 3;
>    return {
>      'line': function(x) { return m * x + c; },
>      'lineSquare': function(x) { return x * x + c; }
>    };
> };

I recognize this language... or at least the basic syntax.

(my own language derives from a similar origin, so has a similar
base-syntax as well...).


> Now how to pass the free variables?

mark which variables are captured, then there can also be case for a
(potentially more efficient) closure which doesn't capture anything
(basically, a raw code-block).

this way, captured variables go in the structure, but non-captured
variables can remain in locals.

typically, all functions will see the same set of captured variables,
but this isn't usually a big issue.


> Additionally how to deal with deeper nested functions, for example:

usual strategy is to daisy-chain.
this tends to be fairly straightforward and can get reasonable performance.

the additional indirections needed for deep nesting don't really add
much to the cost, if anything, because deep nesting tends to be fairly
uncommon anyways.


> What other techniques/methods are there for compiling lexical closures
> into C?

don't have much significant.

but, if you are clever, closures can be made to look like normal C
function pointers.


the downside is that there is no good way to handle this generally (and
portably) in plain C, so this is one of those cases that requires
assembler to work well.

a simple concept for a plain C implementation though is that a person
can have an array of "reserve" functions (to be pointers), which call
the closure via loading it from a global array.


say:
MyVM_Closure *closure_array[1024];

int funptr0_i_i(int x)
	{ return closure_array[0]->func(closure_array[0], x); }
int funptr1_i_i(int x)
	{ return closure_array[1]->func(closure_array[1], x); }
...
int (*funptr_i_i_array[256])(int x)={
funptr0_i_i, funptr1_i_i, funptr2_i_i, ... };

this can work, but has the downside that such a plain C implementation
can waste lots of memory.


if a person has ASM/machine-code and write/execute memory, a better
solution may be to make an executable structure, say:
struct MyVM_FunPtrWrap_s
{
byte jump[64];
MyVM_Closure *close;
};


with 'jump' potentially containing essentially a "trampoline" of sorts,
say (32-bit x86 using cdecl):
call L0			;get structure address
L0:			;label
pop eax			;pop address
sub eax, 5		;adjust for call (size of 'call L0')
;non-GCC case:
pop ecx			;return EIP
push esp		;get pointer to arguments
push eax		;structure address
push ecx		;return EIP
jmp MyVM_ClosureMagicT	;wrap over call
;GCC case (try to keep stack aligned):
lea edx, [esp+4]	;try to preserve stack alignment
push edx		;pad 1
push edx		;arguments list
push eax		;structure address
call MyVM_ClosureMagicT	;closure magic
add esp, 16
ret


the downside of such a strategy is portability:
the logic may have to be specialized for every target CPU and ABI (the
x86 case is simple, but for SysV/AMD64 may require a bit more nasty glob
of code here, as it is essentially necessary to save off all of the
registers which may be used to pass arguments).

'ClosureMagicT' above would also be a bit machine specific, and mostly
just serve to work the arguments into the form the closure expects and
pass control into the closure ('T' can be a stand-in for the return
type, note: the logic will be different for some return types, ...).

the advantage though is that there is no strict limit to the number of
closures which can be created this way, and it doesn't waste a lot of
memory with piles of "reserves".

getting slightly more clever, the above step can be skipped, instead
using logic to transfer directly to the wrapped closure body (this may
involve general-case logic to spit out machine code specific to each
argument list, making it potentially non-trivial as a general-purpose
solution, as well as considerably more complicated).


side note:
personally, my current language, even though it usually runs in a
threaded-code interpreter, tries to at least externally mimic using the
C ABI, which makes cross-language interfacing a good deal easier
(internally the ABI is a bit different, and technically internal to the
interpreter).

in my case, being able to easily share code and data between C-land and
script code is a major use-case for my language.


I try for the goal of making it more like the C <-> C++ interface,
namely, it exists but for the most part it isn't too terrible (in
contrast to the Java <-> C interface, which has the general awfulness
that is JNI and JNA, where cross-language interfacing tends to require a
small mountain of boilerplate...).


currently, my interpreter is "mostly plain C", but still has a few
optional ASM parts/augments (the mostly-plain-C route is mostly so that
I am not entirely tied to x86...).

I had considered potentially going and rewriting a big chunk of the
threaded-code interpreter core into ASM to potentially allow it to be
faster on certain targets (namely x86), but never got around to it (more
important stuff exists at the moment, and it is harder to justify the
effort and added maintenance hassle for this...).

this leaves most of the ASM magic mostly devoted to handling calls into
and out-of C land, and ironically, not so much about making the core
interpreter fast...

yeah, I have had JITs before, but most have not been maintained well
thus-far.