Groups > comp.compilers > #1950 > unrolled thread

Add nested-function support in a language the based on a stack-machine

Started by	dror.openu@gmail.com
First post	2018-02-12 11:25 -0800
Last post	2018-02-14 12:27 -0800
Articles	20 on this page of 29 — 13 participants

Back to article view | Back to comp.compilers

  Add nested-function support in a language the based on a stack-machine dror.openu@gmail.com - 2018-02-12 11:25 -0800
    Re: Add nested-function support in a language the based on a stack-machine dror.openu@gmail.com - 2018-02-12 14:16 -0800
      Re: Add nested-function support in a language the based on a stack-machine Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2018-02-13 12:10 +0100
    Re: Add nested-function support in a language the based on a stack-machine Louis Krupp <lkrupp@nospam.pssw.com.invalid> - 2018-02-13 00:42 -0700
      Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-02-14 00:59 +0000
        Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-13 22:04 -0500
          Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-02-14 18:06 +0000
            Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-15 11:41 -0500
              Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-02-17 16:13 +0000
                Re: Add nested-function support in a language the based on a stack-machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2018-02-17 16:39 +0000
                  Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-01 13:48 -0500
                    Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-03-01 19:26 +0000
                      Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-03-02 00:38 +0000
                      Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-02 02:48 -0500
                      Re: Add nested-function support in a language the based on a stack-machine rpw3@rpw3.org (Rob Warnock) - 2018-03-03 16:00 +0000
                    Re: Add nested-function support in a language the based on a stack-machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2018-03-02 09:30 +0000
                      Re: Add nested-function support in a language the based on a stack-machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2018-03-05 15:01 +0000
                        Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-05 15:51 -0500
                      Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-05 14:39 -0500
                        Re: Add nested-function support in a language the based on a stack-machine John Levine <johnl@taugh.com> - 2018-03-06 04:31 +0000
                          Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-03-06 18:17 +0000
                            Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <157-073-9834@kylheku.com> - 2018-03-11 03:19 +0000
                        Re: Add nested-function support in a language the based on a stack-machine Tomasz Kowaltowski <tk@ic.unicamp.br> - 2018-03-06 12:10 -0300
                        Re: Add nested-function support in a language the based on a stack-machine bartc <bc@freeuk.com> - 2018-03-06 19:02 +0000
                        Re: Add nested-function support in a language the based on a stack-machine antispam@math.uni.wroc.pl - 2018-04-29 16:29 +0000
                Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-01 13:49 -0500
      Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-13 22:37 -0500
    Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-13 23:27 -0500
    Re: Add nested-function support in a language the based on a stack-machine Shoefoot <shoefoot@gmail.com> - 2018-02-14 12:27 -0800

Page 1 of 2 [1] 2 Next page →

#1950 — Add nested-function support in a language the based on a stack-machine

From	dror.openu@gmail.com
Date	2018-02-12 11:25 -0800
Subject	Add nested-function support in a language the based on a stack-machine
Message-ID	<18-02-009@comp.compilers>

Suppose I have a simple C-like programming language:

    int foo() {
        int x = 10;
        int bar(y int) {
            return y * 2
        }
        return bar() + x
    }

Like you can see, it supports nested functions.

I already implemented the lexing-parsing phases, and now I'm working
on the code generation and the stack-machine. most of the operations,
and the simple flow-control already implemented but I need some
help/ideas how to solve the nested-function task.

The stack-machine has two registers, accumulator, and a temporary register.

Each frame contains: `[arguments, local variables, return address,
caller frame and stack pointer, temporaries and data operations]`.  I
read about two ways to implement the nested-function. one, using an
activation-link (also called static-link), and display. Is there any
better idea to handle this?  If I'll choose one of these idea, is it
compile time calculation or do I need to store something in runtime?
if it's a runtime thing, what's the time complexity for it? is it O(1)
operation, or something like O(k) where k is the depth of the
function.
[This at least used to be a standard topic in compiler texts.  You should
be able to do either in O(1). -John]

[toc] | [next] | [standalone]

#1951

From	dror.openu@gmail.com
Date	2018-02-12 14:16 -0800
Message-ID	<18-02-010@comp.compilers>
In reply to	#1950

> [This at least used to be a standard topic in compiler texts.  You should
> be able to do either in O(1). -John]

I'm not sure how to calculate it in compile time, because the call frame
doesn't have a fixed size, because I'm using the same stack for storing the
call-frames and for the operations evaluation (iadd, isub, etc..). Maybe I
should split it into two stacks, one for the call frames, and the second for
the operation evaluations. I just sharing my thoughts.

[I would think that your compiler should always know how many partial
results are on the stack above the call frame so it should be able to
use a fixed offset to get to the chain pointers at the base of call
frame. -John]

[toc] | [prev] | [next] | [standalone]

#1953

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2018-02-13 12:10 +0100
Message-ID	<18-02-013@comp.compilers>
In reply to	#1951

Am 12.02.2018 um 23:16 schrieb dror.openu@gmail.com:
>> [This at least used to be a standard topic in compiler texts.  You should
>> be able to do either in O(1). -John]
>
> I'm not sure how to calculate it in compile time, because the call frame
> doesn't have a fixed size, because I'm using the same stack for storing the
> call-frames and for the operations evaluation (iadd, isub, etc..). Maybe I
> should split it into two stacks, one for the call frames, and the second for
> the operation evaluations. I just sharing my thoughts.
>
> [I would think that your compiler should always know how many partial
> results are on the stack above the call frame so it should be able to
> use a fixed offset to get to the chain pointers at the base of call
> frame. -John]

ACK

Simple compilers use a dedicated base pointer register (x86: eBP) for
their stack frame, but clever compilers will track the current stack
pointer offset to that base address.

The calling frames can have a variable size, when the local function is
called from various places in the code. I.e. the address of a local
variable in an outer scope can vary, fixed is only the variable's offset
to its own frame start. That's why the chain of all intermediate frame
pointers must be followed, until the frame of the variable of interest
is found.

Things can become worse if one of the calling functions is called
recursively, so that a variable number of frames has to be traversed
until the outmost frame is found.

DoDi

[toc] | [prev] | [next] | [standalone]

#1952

From	Louis Krupp <lkrupp@nospam.pssw.com.invalid>
Date	2018-02-13 00:42 -0700
Message-ID	<18-02-012@comp.compilers>
In reply to	#1950

On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote:

>Suppose I have a simple C-like programming language: ...
>Like you can see, it supports nested functions.

gcc supports nested functions as an extension to C. Compiling this
program with -O0 -fdump-tree-all and looking at the generated files
might give you an idea of one way to do it:

#include <stdio.h>

int main(void)
{
	int k1 = 1;
	int p1(void)
	{
		int k2 = 2;
		int p2(void)
		{
			int k3 = 3;
			int p3(void)
			{
				return k1 + k2 + k3;
			}
			return p3();
		}
		return p2();
	}
	int n;

	n = p1();
	printf("%d\n", n);

	return 0;
}

Apparently, gcc chains stack frames, and accessing uplevel variables,
as when p3 uses k1 and k2, seems like it would take O(n) time, where n
is the nesting level.

The alternative, a display vector, seems like it would be easier to
implement unless you're doing this for a real machine with a real (and
therefore limited) set of registers.

Louis

[toc] | [prev] | [next] | [standalone]

#1954

From	Kaz Kylheku <217-679-0842@kylheku.com>
Date	2018-02-14 00:59 +0000
Message-ID	<18-02-016@comp.compilers>
In reply to	#1952

On 2018-02-13, Louis Krupp <lkrupp@nospam.pssw.com.invalid> wrote:
> On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote:
>
>>Suppose I have a simple C-like programming language: ...
>>Like you can see, it supports nested functions.
>
> gcc supports nested functions as an extension to C. Compiling this
> program with -O0 -fdump-tree-all and looking at the generated files
> might give you an idea of one way to do it:
>
> #include <stdio.h>
>
> int main(void)
> {
> 	int k1 = 1;
> 	int p1(void)
> 	{
> 		int k2 = 2;
> 		int p2(void)
> 		{
> 			int k3 = 3;
> 			int p3(void)
> 			{
> 				return k1 + k2 + k3;
> 			}
> 			return p3();
> 		}
> 		return p2();
> 	}
> 	int n;
>
> 	n = p1();
> 	printf("%d\n", n);
>
> 	return 0;
> }
>
> Apparently, gcc chains stack frames, and accessing uplevel variables,
> as when p3 uses k1 and k2, seems like it would take O(n) time, where n
> is the nesting level.
>
> The alternative, a display vector, seems like it would be easier to

The problem is that these functions can recurse, including mutually.
So that is to say, you can end up with a situation in which you have
a single activation of main with a single k1 variable, but multiple
activations of p2, with multiple k2's all being live at the
same time.

To complicate things, the multiple active p2 functions can be lifted
into function pointers, all of which are simultaneously valid.

Each of those p2 pointers could be called back by some external function.
Those calls have to establish an environment which resolves to the
correct k2, and at the same time to the one and only k1.

That doesn't rule out display vectors, but it seems like each function
activation has to have its own complete display vector.

For instance the p2 level functions need a two-level vector. Under
the same main invocation, they share the vec[0] entry pointing to
the main frame; but each one has a different vec[1] referencing its
own frame.

When a p3 executes it has its own 3 element vec[].  The vec[0] and
vec[1] are copied from the parent p2. If a pointer to a p3 is captured,
then its vec[1] correctly references the particular dynamic p2 instance
in which it is scoped; vec[2] references its own environment.

Under this scheme, we basically we need o(n^2) storage for the vectors,
for n nesting levels in the absence of recursion.

The compiler can determine which functions aren't subject to indirection
and avoid allocating the trampoline space for that.

If functions are downward funargs only (turn to garbage when they go out
of scope), the trampoline space needed to capture a given function
invocation can be part of the activation record of the block in which
that function is declared. When that block terminates, the function is
garbage, and so is its trampoline.

> implement unless you're doing this for a real machine with a real (and
> therefore limited) set of registers.

Hmm. The vectors can be treated like any array, or set of local
variables (v0, v1, ..). We generate abstract AST code to load these and
use them; then that code gets subject to register allocation like
anything else.
[As I recall, you do need a display with a pointer for each lexical scope,
but so long as you don't expect closures to work, you can do it in
constant time with a fixed location where each scope stores its
current frame pointer.  If you want closures, then you're into tree
shaped stacks and garbage collected stack frames. -John]

[toc] | [prev] | [next] | [standalone]

#1955

From	George Neuner <gneuner2@comcast.net>
Date	2018-02-13 22:04 -0500
Message-ID	<18-02-018@comp.compilers>
In reply to	#1954

On Wed, 14 Feb 2018 00:59:34 +0000 (UTC), Kaz Kylheku <217-679-0842@kylheku.com> wrote:
>On 2018-02-13, Louis Krupp <lkrupp@nospam.pssw.com.invalid> wrote:
>> On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote:
>>
>>>Suppose I have a simple C-like programming language: ...
>>>Like you can see, it supports nested functions.
>>
>> gcc supports nested functions as an extension to C. Compiling this
>> program with -O0 -fdump-tree-all and looking at the generated files
>> might give you an idea of one way to do it: ...

>> Apparently, gcc chains stack frames, and accessing uplevel variables,
>> as when p3 uses k1 and k2, seems like it would take O(n) time, where n
>> is the nesting level.
>>
>> The alternative, a display vector, seems like it would be easier to
>
>The problem is that these functions can recurse, including mutually.
>So that is to say, you can end up with a situation in which you have
>a single activation of main with a single k1 variable, but multiple
>activations of p2, with multiple k2's all being live at the
>same time.
>
>To complicate things, the multiple active p2 functions can be lifted
>into function pointers, all of which are simultaneously valid.
>
>Each of those p2 pointers could be called back by some external function.
>Those calls have to establish an environment which resolves to the
>correct k2, and at the same time to the one and only k1.
>
>That doesn't rule out display vectors, but it seems like each function
>activation has to have its own complete display vector.

As long as each function creates a new stack frame when called, then
recursion is not relevant and a single display is sufficient to handle
the nesting.

The display tracks lexical parentage [who defined  who], not call
chain parentage.  Each display entry points to the stack frame of the
last function invoked that was defined at the corresponding lexical
level.

The preamble of a function at, e.g., level 3, preserves display[3] in
its own stack frame, and sets display[3] to point to its own frame.
The postscript of the function restores the original display[3] value
when the function ends.
[Obviously, the compiler must keep track of lexical levels in order
that each function update the proper display entry.]

When a function at level 4 needs something at level 2, it just looks
to display[2] to find the proper stack frame.  Regardless of
intervening recursions, the lexical environment is maintained:
display[2] always will point to the latest invocation of the level 2
function which contains the current (level 4) function.

>For instance the p2 level functions need a two-level vector. Under
>the same main invocation, they share the vec[0] entry pointing to
>the main frame; but each one has a different vec[1] referencing its
>own frame.
>
>When a p3 executes it has its own 3 element vec[].  The vec[0] and
>vec[1] are copied from the parent p2. If a pointer to a p3 is captured,
>then its vec[1] correctly references the particular dynamic p2 instance
>in which it is scoped; vec[2] references its own environment.
>
>Under this scheme, we basically we need o(n^2) storage for the vectors,
>for n nesting levels in the absence of recursion.

No, you're missing something.  I just haven't figured out what.

>The compiler can determine which functions aren't subject to indirection
>and avoid allocating the trampoline space for that.

With the classic implementation, every non-leaf function has to do its
part to maintain the lexical environment.  But the allocation overhead
is just one saved pointer per stack frame.

>[As I recall, you do need a display with a pointer for each lexical scope,
>but so long as you don't expect closures to work, you can do it in
>constant time with a fixed location where each scope stores its
>current frame pointer.  If you want closures, then you're into tree
>shaped stacks and garbage collected stack frames. -John]

You still do have closures of sorts ... they just are distributed
through the stack frames of the current function call chain.

A display cannot handle persistent closures, such as in Lisp, where a
nested function may be invoked after its parent terminates, outside of
the runtime environment that defined it.

George

[toc] | [prev] | [next] | [standalone]

#1958

From	Kaz Kylheku <217-679-0842@kylheku.com>
Date	2018-02-14 18:06 +0000
Message-ID	<18-02-023@comp.compilers>
In reply to	#1955

On 2018-02-14, George Neuner <gneuner2@comcast.net> wrote:
> On Wed, 14 Feb 2018 00:59:34 +0000 (UTC), Kaz Kylheku <217-679-0842@kylheku.com> wrote:
>>On 2018-02-13, Louis Krupp <lkrupp@nospam.pssw.com.invalid> wrote:
>>> On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote:
>>>
>>>>Suppose I have a simple C-like programming language: ...
>>>>Like you can see, it supports nested functions.
>>>
>>> gcc supports nested functions as an extension to C. Compiling this
>>> program with -O0 -fdump-tree-all and looking at the generated files
>>> might give you an idea of one way to do it: ...
>
>>> Apparently, gcc chains stack frames, and accessing uplevel variables,
>>> as when p3 uses k1 and k2, seems like it would take O(n) time, where n
>>> is the nesting level.
>>>
>>> The alternative, a display vector, seems like it would be easier to
>>
>>The problem is that these functions can recurse, including mutually.
>>So that is to say, you can end up with a situation in which you have
>>a single activation of main with a single k1 variable, but multiple
>>activations of p2, with multiple k2's all being live at the
>>same time.
>>
>>To complicate things, the multiple active p2 functions can be lifted
>>into function pointers, all of which are simultaneously valid.
>>
>>Each of those p2 pointers could be called back by some external function.
>>Those calls have to establish an environment which resolves to the
>>correct k2, and at the same time to the one and only k1.
>>
>>That doesn't rule out display vectors, but it seems like each function
>>activation has to have its own complete display vector.
>
> As long as each function creates a new stack frame when called, then
> recursion is not relevant and a single display is sufficient to handle
> the nesting.
>
> The display tracks lexical parentage [who defined  who], not call
> chain parentage.  Each display entry points to the stack frame of the
> last function invoked that was defined at the corresponding lexical
> level.
>
> The preamble of a function at, e.g., level 3, preserves display[3] in
> its own stack frame, and sets display[3] to point to its own frame.
> The postscript of the function restores the original display[3] value
> when the function ends.
> [Obviously, the compiler must keep track of lexical levels in order
> that each function update the proper display entry.]

Yes; it is clear that there can be a single display vector which is
associated with the entire lexical scope as a whole, and then the
local scopes just mutate the entries in the vector, taking care
to save and restore them (which requires some local storage, by the
way, but less than in the scheme I describe).

The problem with this saving and restoring is that it's not thread safe.
If there are multiple invocations of the p2 scope, and they can be
entered by independent threads, we're in trouble if there is a single
pointer to that scope in a single display.  (Excluding that from being a
requirement is fairly reasonable.)

The model I described simply replicates the display, allowing immutable
local copies of it in each frame. When a scope is entered via
indirection, it just installs a context pointer referencing the display.
(Of course, that context pointer has to be saved and restored.
It could just be the regular frame pointer, with the display being
at some well known offset in the frame.)

> When a function at level 4 needs something at level 2, it just looks
> to display[2] to find the proper stack frame.  Regardless of
> intervening recursions, the lexical environment is maintained:
> display[2] always will point to the latest invocation of the level 2
> function which contains the current (level 4) function.

(Under what I described, that display[2] is in that function's own
display[] array rather than a shared one.  That display[2] is
initialized and never mutated.)

> No, you're missing something.  I just haven't figured out what.

I'm missing the space optimization that a single display can be
(serially) shared, in spite of multiple activations of scopes, as long
as the entries are saved and restored.

[toc] | [prev] | [next] | [standalone]

#1961

From	George Neuner <gneuner2@comcast.net>
Date	2018-02-15 11:41 -0500
Message-ID	<18-02-029@comp.compilers>
In reply to	#1958

On Wed, 14 Feb 2018 18:06:40 +0000 (UTC), Kaz Kylheku
<217-679-0842@kylheku.com> wrote:

>On 2018-02-14, George Neuner <gneuner2@comcast.net> wrote:
>
>> As long as each function creates a new stack frame when called, then
>> recursion is not relevant and a single display is sufficient to handle
>> the nesting.
>>
>> The display tracks lexical parentage [who defined  who], not call
>> chain parentage.  Each display entry points to the stack frame of the
>> last function invoked that was defined at the corresponding lexical
>> level.
>>
>> The preamble of a function at, e.g., level 3, preserves display[3] in
>> its own stack frame, and sets display[3] to point to its own frame.
>> The postscript of the function restores the original display[3] value
>> when the function ends.
>> [Obviously, the compiler must keep track of lexical levels in order
>> that each function update the proper display entry.]
>
>Yes; it is clear that there can be a single display vector which is
>associated with the entire lexical scope as a whole, and then the
>local scopes just mutate the entries in the vector, taking care
>to save and restore them (which requires some local storage, by the
>way, but less than in the scheme I describe).
>
>The problem with this saving and restoring is that it's not thread safe.
>If there are multiple invocations of the p2 scope, and they can be
>entered by independent threads, we're in trouble if there is a single
>pointer to that scope in a single display.  (Excluding that from being a
>requirement is fairly reasonable.)
>
>The model I described simply replicates the display, allowing immutable
>local copies of it in each frame. When a scope is entered via
>indirection, it just installs a context pointer referencing the display.
>(Of course, that context pointer has to be saved and restored.
>It could just be the regular frame pointer, with the display being
>at some well known offset in the frame.)

I understand, and you're correct that the display needs to be counted
as part of the thread switch context.  But each thread also would be
using from a separate stack, so having the entire display saved in
each frame is overkill when all that's needed is a single pointer.

If you're suggesting that the stacks be shared among threads, then
you're no longer talking about "threads" per se, but rather about
"scheduler activations", which are subtly different from threads.
I'm not aware of any non-research system that is based on activations.


>> When a function at level 4 needs something at level 2, it just looks
>> to display[2] to find the proper stack frame.  Regardless of
>> intervening recursions, the lexical environment is maintained:
>> display[2] always will point to the latest invocation of the level 2
>> function which contains the current (level 4) function.
>
>(Under what I described, that display[2] is in that function's own
>display[] array rather than a shared one.  That display[2] is
>initialized and never mutated.)
>
>> No, you're missing something.  I just haven't figured out what.
>
>I'm missing the space optimization that a single display can be
>(serially) shared, in spite of multiple activations of scopes, as long
>as the entries are saved and restored.

No worries.  IME, displays don't get much respect from modern
textbooks - they are mentioned mostly in passing.  The basic problem
is that, while they are quite useful for an Algol-like language, they
can be problematic for an OO or functional language.

George

[toc] | [prev] | [next] | [standalone]

#1963

From	Kaz Kylheku <217-679-0842@kylheku.com>
Date	2018-02-17 16:13 +0000
Message-ID	<18-02-032@comp.compilers>
In reply to	#1961

On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
> On Wed, 14 Feb 2018 18:06:40 +0000 (UTC), Kaz Kylheku
><217-679-0842@kylheku.com> wrote:
>
>>On 2018-02-14, George Neuner <gneuner2@comcast.net> wrote:
>>
>>> As long as each function creates a new stack frame when called, then
>>> recursion is not relevant and a single display is sufficient to handle
>>> the nesting.
>>>
>>> The display tracks lexical parentage [who defined  who], not call
>>> chain parentage.  Each display entry points to the stack frame of the
>>> last function invoked that was defined at the corresponding lexical
>>> level.
>>>
>>> The preamble of a function at, e.g., level 3, preserves display[3] in
>>> its own stack frame, and sets display[3] to point to its own frame.
>>> The postscript of the function restores the original display[3] value
>>> when the function ends.
>>> [Obviously, the compiler must keep track of lexical levels in order
>>> that each function update the proper display entry.]
>>
>>Yes; it is clear that there can be a single display vector which is
>>associated with the entire lexical scope as a whole, and then the
>>local scopes just mutate the entries in the vector, taking care
>>to save and restore them (which requires some local storage, by the
>>way, but less than in the scheme I describe).
>>
>>The problem with this saving and restoring is that it's not thread safe.
>>If there are multiple invocations of the p2 scope, and they can be
>>entered by independent threads, we're in trouble if there is a single
>>pointer to that scope in a single display.  (Excluding that from being a
>>requirement is fairly reasonable.)
>>
>>The model I described simply replicates the display, allowing immutable
>>local copies of it in each frame. When a scope is entered via
>>indirection, it just installs a context pointer referencing the display.
>>(Of course, that context pointer has to be saved and restored.
>>It could just be the regular frame pointer, with the display being
>>at some well known offset in the frame.)
>
> I understand, and you're correct that the display needs to be counted
> as part of the thread switch context.  But each thread also would be
> using from a separate stack, so having the entire display saved in
> each frame is overkill when all that's needed is a single pointer.

The address of a local function can be taken. When that function is
called from any context, possibly another thread, it has access to the
locals that were lexically apparent there.

> If you're suggesting that the stacks be shared among threads, then
> you're no longer talking about "threads" per se, but rather about

Stack *access* is shared among threads. E.g. a thread can register
a stack object in a shared queue (such as a synchronization wait queue).
As long as it is dequeued before that frame terminates, everything is
cool. Commonly done.

If a downward-only lexical closure is implemented as a pointer into some
threads's stack memory (plus a function) that object can be passed to
another thread and used (as long as the context which created that
closure doesn't terminate).

When that sort of thing is done with threads, we cannot have a single
display array per lexical scope being mutated.

If the stack closures maintain local displays which are immutable,
then it can work.

The function which is called from the other thread is using that
thread's stack, of course, for the parameter passing and return and
all. Also for its freshly instantiated locals. However, the material
in the captured lexical scope is in the original stack.  So is the
display vector which accesses the multiple levels of it.

The display is only used for acessing the lexical things outside of the
local function's body; its own locals and parameters are on the local
stack.

Ahd here is the kicker: *that* local environment can be captured by that
thread, together with that other environment. No problem; that thread
has a copy of the display in its stack frame and extends it with a pointer
to its own stuff. So now there is a display which references two
different stacks.

> No worries.  IME, displays don't get much respect from modern
> textbooks - they are mentioned mostly in passing.

This is probably because of

1) a few decades of dominance of C, C++ and Java whose standard dialects
don't have scope capturing local functions. So no interest in the
mainstream software dev in local functions as such.
OOP techniques somewhat make up for the lack of closures. You use
an object with a method; the captured stuff is in the object.
Languages with "class scope" make it easy for the code in a class
method to access the object with simple, unqualified name references
like "foo", so that at least the body of a method is comparably
convenient to that of a closure.

2) real languages have proper lexical closures with indefinite
lifetimes. The display concept is still relevant but probably turn into
something else under a different name.  (We are seeing much more use
of and interest in languages that have closures; but that interest is
largely in new ones that are poorly optimized.)

> The basic problem
> is that, while they are quite useful for an Algol-like language, they
> can be problematic for an OO or functional language.

Displays, or something equivalent, is necessary to work out the nesting
of captured lexical scopes. Different contours of the scope have dynamic
activations that differ in lifetimes and have to be separate objects.
Those objects somehow have to be centrally referenced from places that
*somehow* have simultaneous access to them.

In a purely functional language, a simplifying aspect is that bindings
are immutable. We can capture a lexical scope by copying it; the user
cannot tell the difference between that and the original scope.
If variables are mutable, you know that you have a copy because a
variable assignment in the original scope isn't reflected in the
captured one.

[toc] | [prev] | [next] | [standalone]

#1964

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2018-02-17 16:39 +0000
Message-ID	<18-02-034@comp.compilers>
In reply to	#1963

Kaz Kylheku <217-679-0842@kylheku.com> writes:
>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
>> No worries.  IME, displays don't get much respect from modern
>> textbooks - they are mentioned mostly in passing.
>
>This is probably because of

This is because displays were found to be more costly for Algol-like
languages (there was an influential paper that convinced everyone of
that).  IIRC the additional cost is in updating the display on calls
and returns.

Unfortunately, I don't have a reference to the paper above.  You can
probably find a reference in old compiler books.

Of course, given that you are using a different language that probably
uses nested functions quite differently from what was used in that
paper, you may want to reevaluate the balance of display vs. static
chains.

>> The basic problem
>> is that, while they are quite useful for an Algol-like language, they
>> can be problematic for an OO or functional language.

The funny thing is that, while displays were found to be suboptimal
for Algol-like languages, they can be used for type inclusion tests,
used in many OO languages [Cohen:acm:toplas:1991].  There are a bunch
of other solutions to this problem, so I don't know if displays are
used for this purpose in current systems, though.

@Article{Cohen:acm:toplas:1991,
  author =	"Norman H. Cohen",
  title =	"Type-Extension Type Tests Can Be Performed In Constant
		 Time",
  journal =	"ACM Transactions on Programming Languages and
		 Systems",
  volume =	"13",
  number =	"4",
  pages =	"626--629",
  month =	oct,
  year = 	"1991",
  refs = 	"2",
  checked =	"19940624",
  source =	"Dept. Library",
  keywords =	"class, descriptor, display, extensible data type,
		 inheritance, membership test, object-oriented
		 programming, type extension, type test",
  note = 	"Technical Correspondence",
  abstract =	"Wirth's proposal for type extensions includes an
		 algorithm for determining whether a give value belongs
		 to an extension of a given type. In the worst case,
		 this algorithm takes time proportional to the depth of
		 the type-extension hierarchy. Wirth describes the loop
		 in this algorithm as ``unavoidable,'' but in fact, the
		 test can be performed in constant time by associating a
		 ``display'' of base types with each type descriptor.",
  xref = 	"Wirth:acm:toplas:1988",
  reffrom =	"Corney:Gough:plasa:1994",
}

I dimly remember a letter to the editor by Wirth where he appreciated
that finally a good use for the display had been found.

>Displays, or something equivalent, is necessary to work out the nesting
>of captured lexical scopes. Different contours of the scope have dynamic
>activations that differ in lifetimes and have to be separate objects.
>Those objects somehow have to be centrally referenced from places that
>*somehow* have simultaneous access to them.

Static link chains work fine.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#1966

From	George Neuner <gneuner2@comcast.net>
Date	2018-03-01 13:48 -0500
Message-ID	<18-03-002@comp.compilers>
In reply to	#1964

On Sat, 17 Feb 2018 16:39:04 GMT, anton@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:

>Kaz Kylheku <217-679-0842@kylheku.com> writes:
>>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
>>> No worries.  IME, displays don't get much respect from modern
>>> textbooks - they are mentioned mostly in passing.
>>
>>This is probably because of
>
>This is because displays were found to be more costly for Algol-like
>languages

More costly than what?

>(there was an influential paper that convinced everyone of
>that).  IIRC the additional cost is in updating the display on calls
>and returns.

Which is no different from the overhead to maintain static links, and
faster to use when you need non-locals more than one lexical level
distant.

>Unfortunately, I don't have a reference to the paper above.  You can
>probably find a reference in old compiler books.

I'd really like to see that paper to find out what they are comparing
to display [and using what language].

It certainly is true that closeure conversion is a better solution
overall because it flattens all non-local accesses to use a single
indirection ... but it also significantly complicates the compiler,
and creates a need for handling wide pointers (or equivalent).  Many
useful languages simply don't need that extra complexity.


>Of course, given that you are using a different language that probably
>uses nested functions quite differently from what was used in that
>paper, you may want to reevaluate the balance of display vs. static
>chains.

I would never use static chains ... if display wasn't viable I would
jump straight to closure conversion.


>>> The basic problem
>>> is that, while they are quite useful for an Algol-like language, they
>>> can be problematic for an OO or functional language.
>
>The funny thing is that, while displays were found to be suboptimal
>for Algol-like languages, they can be used for type inclusion tests,
>used in many OO languages [Cohen:acm:toplas:1991].  There are a bunch
>of other solutions to this problem, so I don't know if displays are
>used for this purpose in current systems, though.
>
>@Article{Cohen:acm:toplas:1991,
>  author =	"Norman H. Cohen",
>  title =	"Type-Extension Type Tests Can Be Performed In Constant
>		 Time",
>  journal =	"ACM Transactions on Programming Languages and
>		 Systems",
>  volume =	"13",
>  number =	"4",
>  pages =	"626--629",
>  month =	oct,
>  year = 	"1991",
>  refs = 	"2",
>  checked =	"19940624",
>  source =	"Dept. Library",
>  keywords =	"class, descriptor, display, extensible data type,
>		 inheritance, membership test, object-oriented
>		 programming, type extension, type test",
>  note = 	"Technical Correspondence",
>  abstract =	"Wirth's proposal for type extensions includes an
>		 algorithm for determining whether a give value belongs
>		 to an extension of a given type. In the worst case,
>		 this algorithm takes time proportional to the depth of
>		 the type-extension hierarchy. Wirth describes the loop
>		 in this algorithm as ``unavoidable,'' but in fact, the
>		 test can be performed in constant time by associating a
>		 ``display'' of base types with each type descriptor.",
>  xref = 	"Wirth:acm:toplas:1988",
>  reffrom =	"Corney:Gough:plasa:1994",
>}
>
>I dimly remember a letter to the editor by Wirth where he appreciated
>that finally a good use for the display had been found.

Thanks for that pointer ... going to have to read that.


>>Displays, or something equivalent, is necessary to work out the nesting
>>of captured lexical scopes. Different contours of the scope have dynamic
>>activations that differ in lifetimes and have to be separate objects.
>>Those objects somehow have to be centrally referenced from places that
>>*somehow* have simultaneous access to them.
>
>Static link chains work fine.

Yes, they do work - but they are the lowest performance option for
non-local accesses.


>- anton
George

[toc] | [prev] | [next] | [standalone]

#1968

From	Kaz Kylheku <217-679-0842@kylheku.com>
Date	2018-03-01 19:26 +0000
Message-ID	<18-03-006@comp.compilers>
In reply to	#1966

On 2018-03-01, George Neuner <gneuner2@comcast.net> wrote:
> On Sat, 17 Feb 2018 16:39:04 GMT, anton@mips.complang.tuwien.ac.at
> (Anton Ertl) wrote:
>
>>Kaz Kylheku <217-679-0842@kylheku.com> writes:
>>>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
>>>> No worries.  IME, displays don't get much respect from modern
>>>> textbooks - they are mentioned mostly in passing.
>>>
>>>This is probably because of
>>
>>This is because displays were found to be more costly for Algol-like
>>languages
>
> More costly than what?
>
>>(there was an influential paper that convinced everyone of
>>that).  IIRC the additional cost is in updating the display on calls
>>and returns.
>
> Which is no different from the overhead to maintain static links, and
> faster to use when you need non-locals more than one lexical level
> distant.
>
>>Unfortunately, I don't have a reference to the paper above.  You can
>>probably find a reference in old compiler books.
>
> I'd really like to see that paper to find out what they are comparing
> to display [and using what language].
>
> It certainly is true that closeure conversion is a better solution
> overall because it flattens all non-local accesses to use a single
> indirection ... but it also significantly complicates the compiler,
> and creates a need for handling wide pointers (or equivalent).  Many
> useful languages simply don't need that extra complexity.

How do flat representations handle the fact that different levels
of the lexical environment have different lifetimes?

E.g.

   (let ((a 1) (b 2))
     (loop ...
       (let ((c 3) (d 4))
         (lambda ()))))

The (c d) environment is freshly instantiated each time the loop
re-enters the inner let, but the (a b) environment is the same.

Each (lambda ()) invocation must capture a different environment,
with a different (c d) but shared (a b).

If there is just a flat vector, it must still contain an extra
level of indirection, no?

If the language is purely functional, then the bindings are immutable,
and so the (lambda ()) can copy the whole damn thing.

If bindings are mutable, then when one lambda modifies a or b,
the other lambdas (with their different environment) must see the
modification. Yet they are shielded from each other's modification
of c or d.

So the upshot is that there has to be an extra level of indirection.

It cannot be the case that "all non-local accesses use a single
indirection".

The flat environment has to contain entries which are not the storage
bindings themselves, but pointers to bindings.

For instance (let's keep it simple and concrete) it could be a vector of
conses:

  #((a . 1) (b . 2) (c . 3) (d . 4))

This is not direct access: we have to dereference a pointer to index
into the vector.  Then we retrieve a cons cell which is a pointer,
we have to dereference this to get to the cdr field to get to the value.

So then whenever the loop enters the (c d) environment, the the (c ...)
and (e ...) conses are destructively replaced with fresh conses.

What the lambda operator then has to do to fetch the environment
for the closure is to take a shallow-copy snapshot of the vector:
allocate a new vector of the same length which takes the same conses.

So if instead of the above, we have a display, we can do this:

   #(#(1 2) #(3 4)) ;; env vec: effectively, a two-level display

Instead of dereferencing a vector and then a cons cell, we
dereference a vector and then an inner vector.

In this display case (pun intended) when the inner let is entered,
the #(3 4) is replaced with #(nil nil), destructively,
analogous to the conses being replaced in the "flat" case.
Note that this is likely more efficient than allocating two conses.

The lambda operator shallow-copies the whole vector, just like
in the flat environment case.

> I would never use static chains ... if display wasn't viable I would
> jump straight to closure conversion.

What is the meaning of "display", anyway?

It seems like it just means "array" instead of "linked list" (just
specifically in the context of activation frame environments).

The C main function's argv[] is a "display" of strings, whereas
the Lisp list '("bash" "-c" "ls") is a "chain".

Except that the context isn't environment access and so we don't use
the word "display".
[A display is an array of pointers to the frames of the enclosing
scopes.  You can access any variable in an enclosing frame by
indirecting through the frame's entry in the current display and then
indexing.  This is for languges like Algol and Pascal, not Lisp.  I
don't ever recall anyone talking about displays and threads, although
I think they should work OK so long as you keep your displays on the
local stack. -John]

[toc] | [prev] | [next] | [standalone]

#1969

From	Kaz Kylheku <217-679-0842@kylheku.com>
Date	2018-03-02 00:38 +0000
Message-ID	<18-03-008@comp.compilers>
In reply to	#1968

On 2018-03-01, Kaz Kylheku <217-679-0842@kylheku.com> wrote:
> John:
> [A display is an array of pointers to the frames of the enclosing
> scopes.  You can access any variable in an enclosing frame by
> indirecting through the frame's entry in the current display and then
> indexing.  This is for languges like Algol and Pascal, not Lisp.  I

This is absolutely relevant to Lisp dialects with lexical scoping,
like Common Lisp and Scheme (or anything worth its salt, really).
There are no first-class functions without lexical scoping.

Structure and Interpretation of Computer Programs (SICP) discusses
something similar to displays in "5.5.6 Lexical Addressing":

https://mitpress.mit.edu/sicp/full-text/book/book-Z-H-35.html#%_sec_5.5.6

"We can exploit this fact by inventing a new kind of variable-lookup
operation, lexical-address-lookup, that takes as arguments an
environment and a lexical address that consists of two numbers: a frame
number, which specifies how many frames to pass over, and a displacement
number, which specifies how many variables to pass over in that frame.
Lexical-address-lookup will produce the value of the variable stored at
that lexical address relative to the current environment."

The focus of that section is eliminating the run-time search which
actually finds a name by symbol matching, with numeric indexing.
However, if we can "pass over" the frames by indexing through a vector
of them, then we have a display. We can also directly index into the
frame by also representing it as a vector, so we don't have to
"pass over" n-1 variables to get to the n-th one.

[I was thinking of closures when I said not Lisp.  I suppose you could
use displays and garbage collect them but I haven't really thought
about the implications. -John]

[toc] | [prev] | [next] | [standalone]

#1970

From	George Neuner <gneuner2@comcast.net>
Date	2018-03-02 02:48 -0500
Message-ID	<18-03-010@comp.compilers>
In reply to	#1968

On Thu, 1 Mar 2018 19:26:06 +0000 (UTC), Kaz Kylheku
<217-679-0842@kylheku.com> wrote:

>On 2018-03-01, George Neuner <gneuner2@comcast.net> wrote:
>
>> It certainly is true that closeure conversion is a better solution
>> overall because it flattens all non-local accesses to use a single
>> indirection ... but it also significantly complicates the compiler,
>> and creates a need for handling wide pointers (or equivalent).  Many
>> useful languages simply don't need that extra complexity.
>
>How do flat representations handle the fact that different levels
>of the lexical environment have different lifetimes?

In fact they don't ... all the "levels" of the lexical environment
must persist IN SOME FASHION for as long as the function lives.  That
is the whole point of the "closure" - to persist the environment.

>E.g.
>
>   (let ((a 1) (b 2))
>     (loop ...
>       (let ((c 3) (d 4))
>         (lambda ()))))

This particular example actually is better suited to lambda lifting
than to closure conversion, but ...

>The (c d) environment is freshly instantiated each time the loop
>re-enters the inner let, but the (a b) environment is the same.
>
>Each (lambda ()) invocation must capture a different environment,
>with a different (c d) but shared (a b).
>
>If there is just a flat vector, it must still contain an extra
>level of indirection, no?

As you discuss more below, the answer depends on whether or not any
binding are mutated and shared.  Note, it does not depend on whether
they are "mutable" per the language, but whether they actually are
mutated.

The real point of closure conversion is to flatten the function's
environment as much as is possible to minimize the runtime effort of
accessing the variable.  How far you can take it depends on the
complexity of your compiler.

If no bindings are mutated, the values simply can be copied into a
single structure which is passed to the function.

   [ a  b  c  d ]

Note, that this also is the case for shared bindings which are NOT
mutated. (more below)

If any of the bindings are mutated, then you have to assume that they
also are shared (unless you can prove otherwise), and so you need to
provide a more complex environment.

What this looks like depends on how much you want to optimize the
non-shared bindings:

The simplest and most direct translation: you allocate a separate
structure for each scope, and a vector of pointers to them.  You then
pass the pointer vector to the function.

   [ +  + ]
     |  |
     |  +--> [ c d ]
     |
     +-----> [ a b ]

More complex: you allocate separately those bindings which are
mutated, and create a flat structure that contains values for the
static bindings and pointers to the mutable ones.

   [ a  +  c  d ]
        |
        +--> [ b ]

More complex still: figure out whether mutated bindings actually are
shared.  A mutated variable, e.g., used as a private counter, need not
be allocated separately.

This can get quite complicated to figure out if there are multiple
functions mutating different sets of shared bindings.

Also note that if the closures are not passed upward out of the
definition environment, the environment "structures" may be stack
allocated [if the runtime uses mark:release style stack(s)].

>If the language is purely functional, then the bindings are immutable,
>and so the (lambda ()) can copy the whole damn thing.
>
>If bindings are mutable, then when one lambda modifies a or b,
>the other lambdas (with their different environment) must see the
>modification. Yet they are shielded from each other's modification
>of c or d.
>
>So the upshot is that there has to be an extra level of indirection.
>
>It cannot be the case that "all non-local accesses use a single
>indirection".

An extra indirection is required only for those bindings which are
BOTH mutated AND shared.  Sharing alone or mutation alone does not
require a separate allocation.

The challenge for the compiler is to figure whether a binding actually
is mutated only, shared only, or both mutated and shared.

>> I would never use static chains ... if display wasn't viable I would
>> jump straight to closure conversion.
>
>What is the meaning of "display", anyway?
>
>It seems like it just means "array" instead of "linked list" (just
>specifically in the context of activation frame environments).

More or less.  I don't know who named the construct originally, or why
the particular name "display" was chosen.

The object, however, is simple.  With nested functions and recursion,
every call frame logically is a member of 2 lists: "who called who",
and "who defined who".

"who called who" pretty much has to be maintained as a list chained
through the call frames themselves because the call stack can become
arbitrarily deep, and because (modulo GC, debugger, etc) there never
is any runtime need to follow more than 1 pointer.

However, "who defined who" hierarchies typically are fairly shallow:
some large scale studies done on Pascal showed that 16 lexical scopes
sufficed for 99+% of the code they examined.

A display optimizes non-local access (more than 1 lexical scope away)
by maintaining "who defined who" as a vector of frame pointers rather
than as yet another list chained through the frames themselves.  It
has the same O(1) maintenance overhead as the chained list, but its
use is O(1) rather than O(n).

>The C main function's argv[] is a "display" of strings, whereas
>the Lisp list '("bash" "-c" "ls") is a "chain".

Yes.

>Except that the context isn't environment access and so we don't use
>the word "display".

Yes, the term "display" - referring to a vector of pointers - is
rather unique to compiler construction.  I haven't seen it used as
such elsewhere.

>[A display is an array of pointers to the frames of the enclosing
>scopes.  You can access any variable in an enclosing frame by
>indirecting through the frame's entry in the current display and then
>indexing.  This is for languges like Algol and Pascal, not Lisp.  I
>don't ever recall anyone talking about displays and threads, although
>I think they should work OK so long as you keep your displays on the
>local stack. -John]

Yes - the display has to be considered aa part of the thread switch
context - like CPU registers.

The main point with a display is that the function's execution
environment is (Lisp eq) identical to its definition environment.
There is no separate allocation(s) created such that the function can
be called after its definition environment is gone.

George

[toc] | [prev] | [next] | [standalone]

#1972

From	rpw3@rpw3.org (Rob Warnock)
Date	2018-03-03 16:00 +0000
Message-ID	<18-03-013@comp.compilers>
In reply to	#1968

Kaz Kylheku  <217-679-0842@kylheku.com> wrote:
+---------------
| George Neuner <gneuner2@comcast.net> wrote:
| > It certainly is true that closeure conversion is a better solution
| > overall because it flattens all non-local accesses to use a single
| > indirection ... but it also significantly complicates the compiler,
| > and creates a need for handling wide pointers (or equivalent).  Many
| > useful languages simply don't need that extra complexity.
|
| How do flat representations handle the fact that different levels
| of the lexical environment have different lifetimes?
+---------------

I'm not sure if this counts as "closure conversion" or not, but some
Common Lisp implementations [such as CMUCL] use flat environments just
fine (even in the interpreter).  The trick is to do some minimal
pre-analysis of the code, to see which closed-over variables are *not*
ever mutated or are *not* used in closures which "escape" the original
lexical context. All such variables may be copied into a single "flat"
vector at closure creation time. All other closed-over variables are
converted into "indirect" variables -- such variables are individually
heap-allocated, pointed to by "ref" pointers which are also copied
into a the same flat environment vector at closure creation time.
Then when the closure is called, all of the up-level variables are
either *in* the (flat) environment or available via *one* indirection.
Because all of the "ref" pointers for a given environment variable in
all the (potential) copies point to the same heap location, closures
which mutate a shared variable work just fine. And all the rest are
either read-only [and thus all the copies are the same] or else
mutated and *not* accessed in any interior closure which escapes.

This is normally performs better than linked environments or even
displays, unless there are a *large* number of nested closures. In the
latter case, when each level of closure is created, it will have to
copy all of its free variables from its parent's flat environment into
its own.

[I apologize if I haven't explained this very well.]

This approach is not unique to Lisp. Appel discusses it for ML in his
"Compiling With Continuations".

-Rob

-----
Rob Warnock		<rpw3@rpw3.org>
627 26th Avenue		<http://rpw3.org/>
San Mateo, CA 94403

[toc] | [prev] | [next] | [standalone]

#1971

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2018-03-02 09:30 +0000
Message-ID	<18-03-012@comp.compilers>
In reply to	#1966

George Neuner <gneuner2@comcast.net> writes:
>On Sat, 17 Feb 2018 16:39:04 GMT, anton@mips.complang.tuwien.ac.at
>(Anton Ertl) wrote:
>
>>Kaz Kylheku <217-679-0842@kylheku.com> writes:
>>>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
>>>> No worries.  IME, displays don't get much respect from modern
>>>> textbooks - they are mentioned mostly in passing.
>>>
>>>This is probably because of
>>
>>This is because displays were found to be more costly for Algol-like
>>languages
>
>More costly than what?

Static link chains.

>> IIRC the additional cost is in updating the display on calls
>>and returns.
>
>Which is no different from the overhead to maintain static links

Wrong.  Consider the following nesting of functions:

A
 B
  C
D

With a display, in C you have built up a display pointing to the
frames of C, B, and A.  When performing a call from C to D, you throw
that away and replace it with a display pointing to just the frame of
D.  When returning from D, you have to restore the old display.

If you don't maintain a static link chain, you have to save the
complete display of C on the call and restore it on return.  Note that
D may itself call functions that need more display, so you don't get
away with just saving and restoring the first slot of the display.
IIRC this was the variant looked at in the paper that concluded that
displays are more costly.

If you maintain a static link chain, you can restore the display of C
from the static link chain, but you now have the cost of maintaining
the static link chain and in addition the cost of maintaining the
display, which is obviously more costly than just the static chain
alone.  If you go that way, you can actually produce an intermediate
scheme, where you cache static link accesses that you need, and the
cached accesses are similar to (the needed part of) a display.

>I'd really like to see that paper to find out what they are comparing
>to display [and using what language].

The language was probably Algol or one of it's direct offspring.

Anyway, if I get around to it, I will search for a reference to the
paper in our library, but if someone else can supply the reference,
that would be cool.

>>Static link chains work fine.
>
>Yes, they do work - but they are the lowest performance option for
>non-local accesses.

The first non-local level has the same performance as a display.  Are
deeper accesses frequent enough to pay for the increased cost of
maintaining the display?  The paper's answer was no, but that may vary
from language to language (and with usage patterns).  Relative
hardware costs have also shifted somewhat.  Bottom line: Better
reevaluate the options yourself.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#1975

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2018-03-05 15:01 +0000
Message-ID	<18-03-018@comp.compilers>
In reply to	#1971

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>George Neuner <gneuner2@comcast.net> writes:
>>I'd really like to see that paper to find out what they are comparing
>>to display [and using what language].
>
>The language was probably Algol or one of it's direct offspring.
>
>Anyway, if I get around to it, I will search for a reference to the
>paper in our library, but if someone else can supply the reference,
>that would be cool.

I looked in some older compiler books, but did not find a reference to
such a paper.  What I found:

* Aho and Ullman (1977) only mention displays as ways to implement
  static scoping.

* Aho, Sethi, Ullman (1986) also mention access links (static links).

* Fischer and Leblanc (1988) discuss both, and say the following:

  |For example, a study of reference patterns in a wide variety of
  |Simula 67 programs (Magnusson 1982) found that fully 80% if all
  |references were to variables at the outermost level or to local
  |variables.  Another 17% of the references were to the block
  |immediately surrounding the local one. [...] Thus few references
  |will actually require following the static chain, and most of those
  |will require only one level of such indirection.

  |Magnussen, K. 1982. "Identifier references in Simula 67 programs."
  |Simula Newsletter 10(2):

  They do not write about performance disadvantages of displays,
  though.

I also looked if I could find Wirth's response to Cohen's paper, and I
found it:

<https://dl.acm.org/citation.cfm?id=214521> (probably behind a paywall
for many).

And there he gave the reason for not using a display for accessing
non-local variables:

 |Upon closer analysis, we decided to discard the use of a display
 |that was already in our first Pascal compiler (1969).  The reason
 |was that (1) the display requires updating for every procedure call
 |and return (!) that causes a change of context, and (2) variables at
 |intermediate levels (neither local nor global) are referenced quite
 |rarely.  As a result, the maintenance of a display turned out to be
 |more costly than the inefficient access of nonlocal variables via a
 |static link.

So apparently there was no influential paper in the 60s or 70s that
damned the display.  Wirth and his collaborators discovered the
performance disadvantages of displays by original research, but
apparently did not publish this piece of knowledge before 1991.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#1977

From	George Neuner <gneuner2@comcast.net>
Date	2018-03-05 15:51 -0500
Message-ID	<18-03-022@comp.compilers>
In reply to	#1975

On Mon, 05 Mar 2018 15:01:33 GMT, anton@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:

>I also looked if I could find Wirth's response to Cohen's paper, and I
>found it:
>
><https://dl.acm.org/citation.cfm?id=214521> (probably behind a paywall
>for many).
>
>And there he gave the reason for not using a display for accessing
>non-local variables:
>
> |Upon closer analysis, we decided to discard the use of a display
> |that was already in our first Pascal compiler (1969).  The reason
> |was that (1) the display requires updating for every procedure call
> |and return (!) that causes a change of context, and (2) variables at
> |intermediate levels (neither local nor global) are referenced quite
> |rarely.  As a result, the maintenance of a display turned out to be
> |more costly than the inefficient access of nonlocal variables via a
> |static link.
>
>So apparently there was no influential paper in the 60s or 70s that
>damned the display.  Wirth and his collaborators discovered the
>performance disadvantages of displays by original research, but
>apparently did not publish this piece of knowledge before 1991.
>
>- anton

I may be mistaken on the timeline, but it seems to me that in 1990
Wirth already was working on Oberon, which, like Modula-2, had
visibility rules that were problematic for a display implementation.

I think he also had seen how people had used Modula-2, and before that
the "unit" [modular] extended Pascals.  Programmers in these languages
tended to prefer the modules to control visibility and to write
flatter code that made less use of nested functions.

OTOOH, there were the Lisp programmers.

Even in Lisp, deep scope accesses are not that common.  But they
aren't exactly rare either ... sometimes you just have to step-wise
build up the context for a function.

I think the real issue was that, by 1990, everybody in the language
research community knew that closure conversion was a better choice
than either static chains or displays: not just for performance, but
because it enables more freedom and functionality in the language.

YMMV,
George

[toc] | [prev] | [next] | [standalone]

#1976

From	George Neuner <gneuner2@comcast.net>
Date	2018-03-05 14:39 -0500
Message-ID	<18-03-020@comp.compilers>
In reply to	#1971

On Fri, 02 Mar 2018 09:30:42 GMT, anton@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:

>George Neuner <gneuner2@comcast.net> writes:
>>On Sat, 17 Feb 2018 16:39:04 GMT, anton@mips.complang.tuwien.ac.at
>>(Anton Ertl) wrote:
>>
>>>Kaz Kylheku <217-679-0842@kylheku.com> writes:
>>>>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
>>>>> No worries.  IME, displays don't get much respect from modern
>>>>> textbooks - they are mentioned mostly in passing.
>>>>
>>>>This is probably because of
>>>
>>>This is because displays were found to be more costly for Algol-like
>>>languages
>>
>>More costly than what?
>
>Static link chains.
>
>>> IIRC the additional cost is in updating the display on calls
>>>and returns.
>>
>>Which is no different from the overhead to maintain static links
>
>Wrong.  Consider the following nesting of functions:
>
>A
> B
>  C
>D
>
>With a display, in C you have built up a display pointing to the
>frames of C, B, and A.  When performing a call from C to D, you throw
>that away and replace it with a display pointing to just the frame of
>D.  When returning from D, you have to restore the old display.

No.  D and A are at the same lexical level.  On entry, D will replace
A's entry in the display, and will restore A's entry on exit.

The scope rules say D can't see or call B or C.  If it calls A, then A
replaces it at the same lexical level.  When A returns, D's link is
restored.

No problem.  And no need to preserve the whole display.

>If you don't maintain a static link chain, you have to save the
>complete display of C on the call and restore it on return.

No you don't.

>Note that
>D may itself call functions that need more display, so you don't get
>away with just saving and restoring the first slot of the display.
>IIRC this was the variant looked at in the paper that concluded that
>displays are more costly.

That is simply wrong unless you change the whole semantics.

As long as call sites are restricted to being within the definition
scope of the called function [as is true in Pascal], and every
function does its little part to maintain the display, then there is
NO POSSIBLE function call sequence that requires saving/restoring the
whole display.

Each function needs only save/restore the single display entry for its
own lexical level.  Scope visibility rules dictate that no function
can see lower into the display than its own level - so what those
entries point to is irrelevant.

>If you maintain a static link chain, you can restore the display of C
>from the static link chain, but you now have the cost of maintaining
>the static link chain and in addition the cost of maintaining the
>display, which is obviously more costly than just the static chain
>alone.  If you go that way, you can actually produce an intermediate
>scheme, where you cache static link accesses that you need, and the
>cached accesses are similar to (the needed part of) a display.

First, your example does not show any need for a static chain.

Second: the cost to maintain a display is just a single extra pointer
field in the call frame, one pointer copy and 2 pointer stores.

FYI, the amortized cost of a display MAY be less than to maintain the
static chain.

For a static chain:

 - a call sideways in the same scope needs the same link as the
   caller. That can be fetched from the caller's frame.
but
 - a call upwards to a higher scope (such as C->D in your example)
   needs a link pointer which it doesn't have available locally.
   The preamble code must go up the caller's chain to find the correct
   scope link to put in the new frame.

OTOH, the steps to maintain the display are the same regardless of
what function is called:
 - save the display entry for my lexical level.
 - replace the display entry with a pointer to my frame.
 - restore the display entry on exit.

>>I'd really like to see that paper to find out what they are comparing
>>to display [and using what language].
>
>The language was probably Algol or one of it's direct offspring.
>
>Anyway, if I get around to it, I will search for a reference to the
>paper in our library, but if someone else can supply the reference,
>that would be cool.

I really would like to see it.  Algol doesn't do anything that would
make a display more work than static chaining.  But it would matter
what code base they studied and how those programs were written.  As
you mention below, single level accesses are very cheap with static
links [deeper accesses are not].

I can't agree or disagree with a paper I haven't read.  And certainly
I can't refute it's conclusions.

>>>Static link chains work fine.
>>
>>Yes, they do work - but they are the lowest performance option for
>>non-local accesses.
>
>The first non-local level has the same performance as a display.  Are
>deeper accesses frequent enough to pay for the increased cost of
>maintaining the display?  The paper's answer was no, but that may vary
>from language to language (and with usage patterns).

That's true.  If the majority of non-local accesses are a single level
away, then the display has no advantage for access.  But it still may
have an advantage for maintenance.

It only takes a few upward function calls or deep non-local accesses
to F_ performance of a static chain.

>hardware costs have also shifted somewhat.  Bottom line: Better
>reevaluate the options yourself.

Good advice !!!

There still are plenty of uses for simple Pascal-like languages.  Many
DSL applications simply don't need multi-threading, objects, or other
stuff that a general purpose language might need.

Displays are just a tool in your toolkit ... you need to read the
directions and use them appropriately.

>- anton
George

[toc] | [prev] | [next] | [standalone]

#1978

From	John Levine <johnl@taugh.com>
Date	2018-03-06 04:31 +0000
Message-ID	<18-03-024@comp.compilers>
In reply to	#1976

>>>>This is because displays were found to be more costly for Algol-like
>>>>languages ...
>>
>>>> IIRC the additional cost is in updating the display on calls
>>>>and returns.

I get the impression that displays describe two different things.

One is a static array that has one entry for each level of lexical
scoping.  In a routine declared N levels (zero based) down, its prolog
saves the Nth entry in the display and replaces it with a pointer to
the current stack frame, and the epilog restores it.  It can assume
that higher level routines have correctly set entries 0:N-1.  That
seems pretty efficient.  Assuming the saved pointer is at a known
location in each stack frame you can do a longjmp and unwind stacks
without too much pain.

The other is the same array, but with a copy of the current display in
each routine's stack frame.  This is slower at call time but probably
faster to use on machines like S/360 without direct addressing
since the display on the stack is addressable from the frame pointer,
but static data needs to load a pointer from a constant pool.  A
longjmp doesn't need anything special since stacked displays go away
when the stack frames go away.

R's,
John

[toc] | [prev] | [next] | [standalone]

Page 1 of 2 [1] 2 Next page →

csiph-web

Add nested-function support in a language the based on a stack-machine

Contents

#1950 — Add nested-function support in a language the based on a stack-machine

#1951

#1953

#1952

#1954

#1955

#1958

#1961

#1963

#1964

#1966

#1968

#1969

#1970

#1972

#1971

#1975

#1977

#1976

#1978