Path: csiph.com!weretis.net!feeder6.news.weretis.net!feeder.usenetexpress.com!feeder-in1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Kaz Kylheku <157-073-9834@kylheku.com>
Newsgroups: comp.compilers
Subject: Re: Add nested-function support in a language the based on a stack-machine
Date: Sun, 11 Mar 2018 03:19:57 +0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 120
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <18-03-037@comp.compilers>
References: <18-02-009@comp.compilers> <18-03-002@comp.compilers> <18-03-012@comp.compilers> <18-03-020@comp.compilers> <18-03-024@comp.compilers> <18-03-028@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="20540"; mail-complaints-to="abuse@iecc.com"
Keywords: code
Posted-Date: 11 Mar 2018 05:43:17 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:1987

On 2018-03-06, Kaz Kylheku <217-679-0842@kylheku.com> wrote:
> On 2018-03-06, John Levine <johnl@taugh.com> wrote:
>> The other is the same array, but with a copy of the current display in
>> each routine's stack frame.  This is slower at call time but probably
>> faster to use on machines like S/360 without direct addressing
>> since the display on the stack is addressable from the frame pointer,
>> but static data needs to load a pointer from a constant pool.  A
>> longjmp doesn't need anything special since stacked displays go away
>> when the stack frames go away.
>
> This is precisely what I happen to be working on at the moment: a
> virtual machine for a Lisp dialect in which I plan to have displays
> copied into the local stack frame.  I expect it to be a performance
> advantage that there is nothing special to do when the stack frame goes
> away.

And ... bam, here is a working demo in less than a week of hacking!

Funny we should have had this whole debate around this time.  I've been
massaging the display-based VM design in my head for some months. I
write some design docs regarding the instruction set and fired up Dia to
make some bubble and arrow diagrams.

In the end, it is turning out to be far simpler than I made it out to
be. I have a working implementation of a VM with 32 instructions so far,
and an assembler/disassembler for all of them, with backpatch label
support. There are some pseudo-ops: a single mov instruction for
instance generates three different real instructions based on how the operands fit.

Without any more lengthy introduction, here is vm-clos.tl.

It implements the simple counting closure in the VM assembly
language (there is no compiler yet):

(load "asm")

(in-package :sys)

(let ((a (new assembler))
      (d #(succ format t "inner closure, counter: ~s\n")))
  a.(asm '((close t1 16 :l1 1 1 nil v0000)
           (close t1 0 :l1 0 0 nil)
           (call t1 d1 d2 d3 v0000) ;; (format t ...)
           (mov t1 v0000)         ;; t1 <- v-0-0
           (call v0000 d0 v0000)  ;; v-0-0 <- (succ v-0-0)
           :l1
           (end t1)))             ;; (return t1)
  a.(dis-listing)
  (let* ((vd (vm-make-desc 4 32 a.buf d)) ;; a.buf is code
         (mk-counter (vm-interpret-toplevel vd))
         (counter [mk-counter 10]))
    (prinl [counter])
    (prinl [counter])
    (prinl [counter])))

Run:

$ ./txr vm-clos.tl
    0: 7C00000D close t01 16 13 1 1 nil v0000
    1: 00100001
    2: 00010001
    3: 00000200
    4: 7C00000D close t01 0 13 0 0 nil
    5: 00000001
    6: 00000000
    7: 14030001 call t01 d01 d02 d03 v0000
    8: 01020101
    9: 02000103
   10: 20010200 movsr t01 v0000
   11: 14010200 call v0000 d00 v0000
   12: 02000100
   13: 0C000001 end t01
inner closure, counter: 10
10
inner closure, counter: 11
11
inner closure, counter: 12
12

The high level overview is that we have an assembly job which produces
some code in a.buf. We use this to make a VM description: we tell the VM
description that it has 4 levels of display nesting (we only use 3), and
that it should allocate 32 registers. We give it the code from a.buf and
the data vector d.

The vm-interpret-toplevel instantiates the VM state and runs this
description. No arguments can be passed in the top-level entry. But it
produces a return value. That return value is the outer closure: a
one-arg function which instantiates a counter that counts from that
argument.

When we call the closure, it instantiates a vm state, but with an
entry into the indicated closure code, and with argument passing.

The close instruction walks the display (on the stack) and clones it
into a heaped object. Any levels that are moved from stack to heap
are also installed in the original display, since they are shared.
There is a way for the machine code to create levels that are not
subject to capture.

Everything is in the display. Each level has up to 256 entries.
Level 0 is the registers t0 through tFF.  t0 is always NIL, inspired
by MIPS.

Level 1 is the data vector, d0 through dFF.

The levels after that are vXXYY. For instance v0312 is the fifth level,
register 18 (12 hex). I might re-adjust this: say, have only up to 64
levels that are up to 1024 cells wide.

I not only have closures working, but unwind-protect and exception
handling, and the handling of the dynamic environment for binding
special variables.

Basically, it appears to be reasonably complete; I can't think of
anything missing from being able to support the semantics of the Lisp
dialect. Maybe some unexpected strange interactions with the delimited
continuation support will crop up. Ah, debugging support. Ha!

:)