Groups > comp.compilers > #1950 > unrolled thread

Add nested-function support in a language the based on a stack-machine

Started by	dror.openu@gmail.com
First post	2018-02-12 11:25 -0800
Last post	2018-02-14 12:27 -0800
Articles	9 on this page of 29 — 13 participants

Back to article view | Back to comp.compilers

  Add nested-function support in a language the based on a stack-machine dror.openu@gmail.com - 2018-02-12 11:25 -0800
    Re: Add nested-function support in a language the based on a stack-machine dror.openu@gmail.com - 2018-02-12 14:16 -0800
      Re: Add nested-function support in a language the based on a stack-machine Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2018-02-13 12:10 +0100
    Re: Add nested-function support in a language the based on a stack-machine Louis Krupp <lkrupp@nospam.pssw.com.invalid> - 2018-02-13 00:42 -0700
      Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-02-14 00:59 +0000
        Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-13 22:04 -0500
          Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-02-14 18:06 +0000
            Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-15 11:41 -0500
              Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-02-17 16:13 +0000
                Re: Add nested-function support in a language the based on a stack-machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2018-02-17 16:39 +0000
                  Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-01 13:48 -0500
                    Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-03-01 19:26 +0000
                      Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-03-02 00:38 +0000
                      Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-02 02:48 -0500
                      Re: Add nested-function support in a language the based on a stack-machine rpw3@rpw3.org (Rob Warnock) - 2018-03-03 16:00 +0000
                    Re: Add nested-function support in a language the based on a stack-machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2018-03-02 09:30 +0000
                      Re: Add nested-function support in a language the based on a stack-machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2018-03-05 15:01 +0000
                        Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-05 15:51 -0500
                      Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-05 14:39 -0500
                        Re: Add nested-function support in a language the based on a stack-machine John Levine <johnl@taugh.com> - 2018-03-06 04:31 +0000
                          Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-03-06 18:17 +0000
                            Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <157-073-9834@kylheku.com> - 2018-03-11 03:19 +0000
                        Re: Add nested-function support in a language the based on a stack-machine Tomasz Kowaltowski <tk@ic.unicamp.br> - 2018-03-06 12:10 -0300
                        Re: Add nested-function support in a language the based on a stack-machine bartc <bc@freeuk.com> - 2018-03-06 19:02 +0000
                        Re: Add nested-function support in a language the based on a stack-machine antispam@math.uni.wroc.pl - 2018-04-29 16:29 +0000
                Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-01 13:49 -0500
      Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-13 22:37 -0500
    Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-13 23:27 -0500
    Re: Add nested-function support in a language the based on a stack-machine Shoefoot <shoefoot@gmail.com> - 2018-02-14 12:27 -0800

Page 2 of 2 — ← Prev page 1 [2]

#1980

From	Kaz Kylheku <217-679-0842@kylheku.com>
Date	2018-03-06 18:17 +0000
Message-ID	<18-03-028@comp.compilers>
In reply to	#1978

On 2018-03-06, John Levine <johnl@taugh.com> wrote:
>>>>>This is because displays were found to be more costly for Algol-like
>>>>>languages ...
>>>
>>>>> IIRC the additional cost is in updating the display on calls
>>>>>and returns.
>
> I get the impression that displays describe two different things.
>
> One is a static array that has one entry for each level of lexical
> scoping.  In a routine declared N levels (zero based) down, its prolog
> saves the Nth entry in the display and replaces it with a pointer to
> the current stack frame, and the epilog restores it.  It can assume
> that higher level routines have correctly set entries 0:N-1.  That
> seems pretty efficient.  Assuming the saved pointer is at a known
> location in each stack frame you can do a longjmp and unwind stacks
> without too much pain.
>
> The other is the same array, but with a copy of the current display in
> each routine's stack frame.  This is slower at call time but probably
> faster to use on machines like S/360 without direct addressing
> since the display on the stack is addressable from the frame pointer,
> but static data needs to load a pointer from a constant pool.  A
> longjmp doesn't need anything special since stacked displays go away
> when the stack frames go away.

This is precisely what I happen to be working on at the moment: a
virtual machine for a Lisp dialect in which I plan to have displays
copied into the local stack frame.  I expect it to be a performance
advantage that there is nothing special to do when the stack frame goes
away.

If we had to take some unconditional action, we would have to set up
a catch frame. That means doing a setjmp-like operation that saves
a whole vector of stuff similar to a jmp_buf. Not just machine
registers, but the signal mask (or a virtualized representation of it,
for efficiency), and other stuff. This vector is larger than a display
display containing a significant number of environment levels!

If the scope binds dynamically scoped variables, those have to be
undone.  However, that requires no special action.  The dynamic
environment is a chain rooted at a global pointer called dyn_env.  Long
ago, I made the simplifying design decision to include this dyn_env
pointer in the jmp_buf-like context that is saved at every catch frame,
and so there is no need for a scope to set up a handler for the sake of
restoring dynamically scoped bindings. The normal return case has to
restore dyn_env, which is simple.

[toc] | [prev] | [next] | [standalone]

#1987

From	Kaz Kylheku <157-073-9834@kylheku.com>
Date	2018-03-11 03:19 +0000
Message-ID	<18-03-037@comp.compilers>
In reply to	#1980

On 2018-03-06, Kaz Kylheku <217-679-0842@kylheku.com> wrote:
> On 2018-03-06, John Levine <johnl@taugh.com> wrote:
>> The other is the same array, but with a copy of the current display in
>> each routine's stack frame.  This is slower at call time but probably
>> faster to use on machines like S/360 without direct addressing
>> since the display on the stack is addressable from the frame pointer,
>> but static data needs to load a pointer from a constant pool.  A
>> longjmp doesn't need anything special since stacked displays go away
>> when the stack frames go away.
>
> This is precisely what I happen to be working on at the moment: a
> virtual machine for a Lisp dialect in which I plan to have displays
> copied into the local stack frame.  I expect it to be a performance
> advantage that there is nothing special to do when the stack frame goes
> away.

And ... bam, here is a working demo in less than a week of hacking!

Funny we should have had this whole debate around this time.  I've been
massaging the display-based VM design in my head for some months. I
write some design docs regarding the instruction set and fired up Dia to
make some bubble and arrow diagrams.

In the end, it is turning out to be far simpler than I made it out to
be. I have a working implementation of a VM with 32 instructions so far,
and an assembler/disassembler for all of them, with backpatch label
support. There are some pseudo-ops: a single mov instruction for
instance generates three different real instructions based on how the operands fit.

Without any more lengthy introduction, here is vm-clos.tl.

It implements the simple counting closure in the VM assembly
language (there is no compiler yet):

(load "asm")

(in-package :sys)

(let ((a (new assembler))
      (d #(succ format t "inner closure, counter: ~s\n")))
  a.(asm '((close t1 16 :l1 1 1 nil v0000)
           (close t1 0 :l1 0 0 nil)
           (call t1 d1 d2 d3 v0000) ;; (format t ...)
           (mov t1 v0000)         ;; t1 <- v-0-0
           (call v0000 d0 v0000)  ;; v-0-0 <- (succ v-0-0)
           :l1
           (end t1)))             ;; (return t1)
  a.(dis-listing)
  (let* ((vd (vm-make-desc 4 32 a.buf d)) ;; a.buf is code
         (mk-counter (vm-interpret-toplevel vd))
         (counter [mk-counter 10]))
    (prinl [counter])
    (prinl [counter])
    (prinl [counter])))

Run:

$ ./txr vm-clos.tl
    0: 7C00000D close t01 16 13 1 1 nil v0000
    1: 00100001
    2: 00010001
    3: 00000200
    4: 7C00000D close t01 0 13 0 0 nil
    5: 00000001
    6: 00000000
    7: 14030001 call t01 d01 d02 d03 v0000
    8: 01020101
    9: 02000103
   10: 20010200 movsr t01 v0000
   11: 14010200 call v0000 d00 v0000
   12: 02000100
   13: 0C000001 end t01
inner closure, counter: 10
10
inner closure, counter: 11
11
inner closure, counter: 12
12

The high level overview is that we have an assembly job which produces
some code in a.buf. We use this to make a VM description: we tell the VM
description that it has 4 levels of display nesting (we only use 3), and
that it should allocate 32 registers. We give it the code from a.buf and
the data vector d.

The vm-interpret-toplevel instantiates the VM state and runs this
description. No arguments can be passed in the top-level entry. But it
produces a return value. That return value is the outer closure: a
one-arg function which instantiates a counter that counts from that
argument.

When we call the closure, it instantiates a vm state, but with an
entry into the indicated closure code, and with argument passing.

The close instruction walks the display (on the stack) and clones it
into a heaped object. Any levels that are moved from stack to heap
are also installed in the original display, since they are shared.
There is a way for the machine code to create levels that are not
subject to capture.

Everything is in the display. Each level has up to 256 entries.
Level 0 is the registers t0 through tFF.  t0 is always NIL, inspired
by MIPS.

Level 1 is the data vector, d0 through dFF.

The levels after that are vXXYY. For instance v0312 is the fifth level,
register 18 (12 hex). I might re-adjust this: say, have only up to 64
levels that are up to 1024 cells wide.

I not only have closures working, but unwind-protect and exception
handling, and the handling of the dynamic environment for binding
special variables.

Basically, it appears to be reasonably complete; I can't think of
anything missing from being able to support the semantics of the Lisp
dialect. Maybe some unexpected strange interactions with the delimited
continuation support will crop up. Ah, debugging support. Ha!

:)

[toc] | [prev] | [next] | [standalone]

#1979

From	Tomasz Kowaltowski <tk@ic.unicamp.br>
Date	2018-03-06 12:10 -0300
Message-ID	<18-03-026@comp.compilers>
In reply to	#1976

I am not sure whether it will contribute anything new
 in this discussion, but you might want to check an old
 paper of mine:

     Parameter Passing Mechanisms and Run Time Data Structures
     Software - Practice and Experience
     Vol. 11, Number 7, July 1981
     pg. 757-765

tk
[Any way to get a copy of it?  Wiley wants $38 for a PDF. -John]

[toc] | [prev] | [next] | [standalone]

#1981

From	bartc <bc@freeuk.com>
Date	2018-03-06 19:02 +0000
Message-ID	<18-03-030@comp.compilers>
In reply to	#1976

On 05/03/2018 19:39, George Neuner wrote:
> On Fri, 02 Mar 2018 09:30:42 GMT, anton@mips.complang.tuwien.ac.at
> (Anton Ertl) wrote:

>> Wrong.  Consider the following nesting of functions:
>>
>> A
>> B
>>   C
>> D
>>
>> With a display, in C you have built up a display pointing to the
>> frames of C, B, and A.  When performing a call from C to D, you throw
>> that away and replace it with a display pointing to just the frame of
>> D.  When returning from D, you have to restore the old display.
>
> No.  D and A are at the same lexical level.  On entry, D will replace
> A's entry in the display, and will restore A's entry on exit.
>
> The scope rules say D can't see or call B or C.

What scope rules? As I don't think this is for a specific language.

Even in C, some versions of which have nested functions, it is possible
to set up a function pointer to C(), which can later be called from D()
without first calling A().

And in my own language, it is possible for code in D() to directly call
A.B.C() without entering A.

Although in this one, C() can't access any stack-allocated data of B or
A; partly for this reason (data may not be undefined if called from
outside), and also because it would be fiddly to do anyway, even if
called from B() and A(), as the thread demonstrates.

(C() can access static data of A() and B(), also types, named constants,
enums, and other local functions, so it can still be worthwhile having
local functions with such a restriction.)

--
bartc
[The discussion so far has mostly been about Algol/Pascal style
languages where you can't peek into blocks and contexts go away when a
routine returns so there are no closures.  I agree that if you relax
those rules, life gets more complicated.

Hypothetical question: Algol60 call by name was a mistake.  They
intended an elegant definition of call by reference and didn't realize
until Jensen's device what they'd done.  If they'd done what they
intended and we didn't have to invent thunks, how would programming
languages be different? -John]

[toc] | [prev] | [next] | [standalone]

#2088

From	antispam@math.uni.wroc.pl
Date	2018-04-29 16:29 +0000
Message-ID	<18-04-079@comp.compilers>
In reply to	#1976

George Neuner <gneuner2@comcast.net> wrote:
> On Fri, 02 Mar 2018 09:30:42 GMT, anton@mips.complang.tuwien.ac.at
> (Anton Ertl) wrote:
>
> >George Neuner <gneuner2@comcast.net> writes:
> >>On Sat, 17 Feb 2018 16:39:04 GMT, anton@mips.complang.tuwien.ac.at
> >>(Anton Ertl) wrote:
> >>
> >>>Kaz Kylheku <217-679-0842@kylheku.com> writes:
> >>>>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
> >>>>> No worries.  IME, displays don't get much respect from modern
> >>>>> textbooks - they are mentioned mostly in passing.
> >>>>
> >>>>This is probably because of
> >>>
> >>>This is because displays were found to be more costly for Algol-like
> >>>languages
> >>
> >>More costly than what?
> >
> >Static link chains.
> >
> >>> IIRC the additional cost is in updating the display on calls
> >>>and returns.
> >>
> >>Which is no different from the overhead to maintain static links
> >
> >Wrong.  Consider the following nesting of functions:
> >
> >A
> > B
> >  C
> >D
> >
> >With a display, in C you have built up a display pointing to the
> >frames of C, B, and A.  When performing a call from C to D, you throw
> >that away and replace it with a display pointing to just the frame of
> >D.  When returning from D, you have to restore the old display.
>
> No.  D and A are at the same lexical level.  On entry, D will replace
> A's entry in the display, and will restore A's entry on exit.
>
> The scope rules say D can't see or call B or C.  If it calls A, then A
> replaces it at the same lexical level.  When A returns, D's link is
> restored.

You assume no procedure parameters (boring case).  With procedure
parameters C can pass itself (or B or other procedure at the
same lexical level) to D and D can call it.  So dynamic chain
can be:

A B C D C

and innermost C needs to see A variables in its display.
IIUC resolving this required copies that make display
management O(n) (n being depth of static chain).

> No problem.  And no need to preserve the whole display.
>
> >If you don't maintain a static link chain, you have to save the
> >complete display of C on the call and restore it on return.
>
> No you don't.
>
>
> >Note that D may itself call functions that need more display, so you don't get
> >away with just saving and restoring the first slot of the display.
> >IIRC this was the variant looked at in the paper that concluded that
> >displays are more costly.
>
> That is simply wrong unless you change the whole semantics.
>
> As long as call sites are restricted to being within the definition
> scope of the called function [as is true in Pascal],

Pascal without procedural parameters.

--
                              Waldek Hebisch

[toc] | [prev] | [next] | [standalone]

#1967

From	George Neuner <gneuner2@comcast.net>
Date	2018-03-01 13:49 -0500
Message-ID	<18-03-003@comp.compilers>
In reply to	#1963

On Sat, 17 Feb 2018 16:13:10 +0000 (UTC), Kaz Kylheku
<217-679-0842@kylheku.com> wrote:

>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
>> On Wed, 14 Feb 2018 18:06:40 +0000 (UTC), Kaz Kylheku
>><217-679-0842@kylheku.com> wrote:
>>
>>>On 2018-02-14, George Neuner <gneuner2@comcast.net> wrote:

>> ... you're correct that the display needs to be counted
>> as part of the thread switch context.  But each thread also would be
>> using from a separate stack, so having the entire display saved in
>> each frame is overkill when all that's needed is a single pointer.
>
>The address of a local function can be taken. When that function is
>called from any context, possibly another thread, it has access to the
>locals that were lexically apparent there.

You Lisp is showing.  IOW, that's language dependent.

There are perfectly useable languages that do not allow taking the
address of a function ... never mind passing such pointers between
threads.

Upcalls / callbacks / completion functions, etc.  do require the
ability to specify the target function, but there isn't any reason for
the language to expose that to the programmer.  And in such cases the
stack environment required by the called function exists at the point
where the function is called.

>Stack *access* is shared among threads. E.g. a thread can register
>a stack object in a shared queue (such as a synchronization wait queue).
>As long as it is dequeued before that frame terminates, everything is
>cool. Commonly done.

Again, language dependent.

Your hypothetical stack resident object need not be any kind of
"active" object.  And if the [shared entity] is provided a callback
function [by whatever means the language permits], there is no problem
as long as the lexical environment of that function still exists.

Have you never used multi-threaded Pascal?

>If a downward-only lexical closure is implemented as a pointer into some
>threads's stack memory (plus a function) that object can be passed to
>another thread and used (as long as the context which created that
>closure doesn't terminate).

Yes it can.  And the result is something akin to co-routine.  But
there's no reason any particular language should allow it.

Many of your arguments are based on the notion of a general purpose
language needing to provide full closures and objects ... but in a
more restricted domain specific setting there often are perfectly good
arguments for NOT providing general purpose features.

We really don't know what the OP is trying to do.  It is good to point
out limitations of an idea and to bring up alternatives where viable,
but let's not go too crazy here until we know more.

YMMV,
George

[toc] | [prev] | [next] | [standalone]

#1956

From	George Neuner <gneuner2@comcast.net>
Date	2018-02-13 22:37 -0500
Message-ID	<18-02-020@comp.compilers>
In reply to	#1952

On Tue, 13 Feb 2018 00:42:00 -0700, Louis Krupp
<lkrupp@nospam.pssw.com.invalid> wrote:

>On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote:
>
>>Suppose I have a simple C-like programming language: ...
>>Like you can see, it supports nested functions.
>
>gcc supports nested functions as an extension to C. Compiling this
>program with -O0 -fdump-tree-all and looking at the generated files
>might give you an idea of one way to do it: ...

>The alternative, a display vector, seems like it would be easier to
>implement unless you're doing this for a real machine with a real (and
>therefore limited) set of registers.

The display doesn't need to be in registers [though the pointer to the
display itself should be].  Since every non-leaf function will touch
some part of the display, it will tend to stay in cache close to the
CPU.

And a display doesn't need to be large to be effective.  Lexical
definition hierarchies tend to be quite shallow.  Studies of nested
functions done back in the days of Pascal showed that 16 entries
sufficed for all but a tiny percentage of code.

Also remember that OO is an orthogonal dimension: the depth of an
object hierarchy will not be limited by using a display.
However if you mix objects and display nested functions, I *think* it
would be necessary to maintain a separate display instance per object
type.

George

[toc] | [prev] | [next] | [standalone]

#1957

From	George Neuner <gneuner2@comcast.net>
Date	2018-02-13 23:27 -0500
Message-ID	<18-02-022@comp.compilers>
In reply to	#1950

On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote:

>Suppose I have a simple C-like programming language:
>
>    int foo() {
>        int x = 10;
>        int bar(y int) {
>            return y * 2
>        }
>        return bar() + x
>    }
>
>Like you can see, it supports nested functions.
>
>I already implemented the lexing-parsing phases, and now I'm working
>on the code generation and the stack-machine. most of the operations,
>and the simple flow-control already implemented but I need some
>help/ideas how to solve the nested-function task.
>
>The stack-machine has two registers, accumulator, and a temporary register.
>
>Each frame contains: `[arguments, local variables, return address,
>caller frame and stack pointer, temporaries and data operations]`.  I
>read about two ways to implement the nested-function. one, using an
>activation-link (also called static-link), and display. Is there any
>better idea to handle this?  If I'll choose one of these idea, is it
>compile time calculation or do I need to store something in runtime?
>if it's a runtime thing, what's the time complexity for it? is it O(1)
>operation, or something like O(k) where k is the depth of the
>function.

Using activation pointers in the stack frames, you do end up with an
O(n) list.  To get something M lexical levels away, you need to follow
M pointers.

A display is O(1) to use: to get from, e.g., level 4 to level 2, the
level 4 function just looks at the pointer in display[2].

Both are O(1) to maintain at runtime - just a pointer save/restore.

Whether they work for you depends on the language.  These methods work
for languages like Pascal, but not for languages like Lisp where
nested functions can be called after their parent has termninated and
the stack environment that created them has disappeared.

And as I mentioned in another post, there is an issue mixing displays
with object oriented languages.

If you prefer to go with more advanced compiler techniques, there is
either closure conversion or lambda lifting.  They can get pretty
involved [again depending on the language], so rather than us trying
to explain them here, it is better that you read about them and ask
questions if you don't understand something.

>[This at least used to be a standard topic in compiler texts.  You should
>be able to do either in O(1). -John]

I have never actually seen lambda lifting (as such) covered in any
compiler text ... I learned of it perusing research papers.

But certainly the other methods should be in any book since ~1990.

Not that lambda lifting is that hard to grasp:  once you understand
closure conversion, it's obvious that lambda lifting is simply a (more
complicated) optimization for situations that don't require closure
persistence.  Of course, in such situations, you could just choose to
stack allocate the closure rather than heap allocating it.  Once
you're committed to the code rewriting necessary to do conversion, the
extra needed to do lambda lifting may be less of a concern.

YMMV,
George

[toc] | [prev] | [next] | [standalone]

#1959

From	Shoefoot <shoefoot@gmail.com>
Date	2018-02-14 12:27 -0800
Message-ID	<18-02-026@comp.compilers>
In reply to	#1950

On Monday, February 12, 2018 at 3:51:19 PM UTC-5, dror....@gmail.com wrote:
> one, using an activation-link (also called static-link), and display.

I'm a big fan of Display stacks.  They are easy to implement and fast at runtime.
However they do break down with setjmp/longjmp.  The Display stack becomes
part of the state and must be pushed with the setjmp.  There is also some
considerations for the debugger.  The debugger has to walk the stack rather than
use the Display.

[toc] | [prev] | [standalone]

Page 2 of 2 — ← Prev page 1 [2]

csiph-web

Add nested-function support in a language the based on a stack-machine

Contents

#1980

#1987

#1979

#1981

#2088

#1967

#1956

#1957

#1959