Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #1950 > unrolled thread
| Started by | dror.openu@gmail.com |
|---|---|
| First post | 2018-02-12 11:25 -0800 |
| Last post | 2018-02-14 12:27 -0800 |
| Articles | 20 on this page of 29 — 13 participants |
Back to article view | Back to comp.compilers
Add nested-function support in a language the based on a stack-machine dror.openu@gmail.com - 2018-02-12 11:25 -0800
Re: Add nested-function support in a language the based on a stack-machine dror.openu@gmail.com - 2018-02-12 14:16 -0800
Re: Add nested-function support in a language the based on a stack-machine Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2018-02-13 12:10 +0100
Re: Add nested-function support in a language the based on a stack-machine Louis Krupp <lkrupp@nospam.pssw.com.invalid> - 2018-02-13 00:42 -0700
Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-02-14 00:59 +0000
Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-13 22:04 -0500
Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-02-14 18:06 +0000
Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-15 11:41 -0500
Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-02-17 16:13 +0000
Re: Add nested-function support in a language the based on a stack-machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2018-02-17 16:39 +0000
Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-01 13:48 -0500
Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-03-01 19:26 +0000
Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-03-02 00:38 +0000
Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-02 02:48 -0500
Re: Add nested-function support in a language the based on a stack-machine rpw3@rpw3.org (Rob Warnock) - 2018-03-03 16:00 +0000
Re: Add nested-function support in a language the based on a stack-machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2018-03-02 09:30 +0000
Re: Add nested-function support in a language the based on a stack-machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2018-03-05 15:01 +0000
Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-05 15:51 -0500
Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-05 14:39 -0500
Re: Add nested-function support in a language the based on a stack-machine John Levine <johnl@taugh.com> - 2018-03-06 04:31 +0000
Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <217-679-0842@kylheku.com> - 2018-03-06 18:17 +0000
Re: Add nested-function support in a language the based on a stack-machine Kaz Kylheku <157-073-9834@kylheku.com> - 2018-03-11 03:19 +0000
Re: Add nested-function support in a language the based on a stack-machine Tomasz Kowaltowski <tk@ic.unicamp.br> - 2018-03-06 12:10 -0300
Re: Add nested-function support in a language the based on a stack-machine bartc <bc@freeuk.com> - 2018-03-06 19:02 +0000
Re: Add nested-function support in a language the based on a stack-machine antispam@math.uni.wroc.pl - 2018-04-29 16:29 +0000
Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-03-01 13:49 -0500
Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-13 22:37 -0500
Re: Add nested-function support in a language the based on a stack-machine George Neuner <gneuner2@comcast.net> - 2018-02-13 23:27 -0500
Re: Add nested-function support in a language the based on a stack-machine Shoefoot <shoefoot@gmail.com> - 2018-02-14 12:27 -0800
Page 1 of 2 [1] 2 Next page →
| From | dror.openu@gmail.com |
|---|---|
| Date | 2018-02-12 11:25 -0800 |
| Subject | Add nested-function support in a language the based on a stack-machine |
| Message-ID | <18-02-009@comp.compilers> |
Suppose I have a simple C-like programming language:
int foo() {
int x = 10;
int bar(y int) {
return y * 2
}
return bar() + x
}
Like you can see, it supports nested functions.
I already implemented the lexing-parsing phases, and now I'm working
on the code generation and the stack-machine. most of the operations,
and the simple flow-control already implemented but I need some
help/ideas how to solve the nested-function task.
The stack-machine has two registers, accumulator, and a temporary register.
Each frame contains: `[arguments, local variables, return address,
caller frame and stack pointer, temporaries and data operations]`. I
read about two ways to implement the nested-function. one, using an
activation-link (also called static-link), and display. Is there any
better idea to handle this? If I'll choose one of these idea, is it
compile time calculation or do I need to store something in runtime?
if it's a runtime thing, what's the time complexity for it? is it O(1)
operation, or something like O(k) where k is the depth of the
function.
[This at least used to be a standard topic in compiler texts. You should
be able to do either in O(1). -John]
[toc] | [next] | [standalone]
| From | dror.openu@gmail.com |
|---|---|
| Date | 2018-02-12 14:16 -0800 |
| Message-ID | <18-02-010@comp.compilers> |
| In reply to | #1950 |
> [This at least used to be a standard topic in compiler texts. You should > be able to do either in O(1). -John] I'm not sure how to calculate it in compile time, because the call frame doesn't have a fixed size, because I'm using the same stack for storing the call-frames and for the operations evaluation (iadd, isub, etc..). Maybe I should split it into two stacks, one for the call frames, and the second for the operation evaluations. I just sharing my thoughts. [I would think that your compiler should always know how many partial results are on the stack above the call frame so it should be able to use a fixed offset to get to the chain pointers at the base of call frame. -John]
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2018-02-13 12:10 +0100 |
| Message-ID | <18-02-013@comp.compilers> |
| In reply to | #1951 |
Am 12.02.2018 um 23:16 schrieb dror.openu@gmail.com: >> [This at least used to be a standard topic in compiler texts. You should >> be able to do either in O(1). -John] > > I'm not sure how to calculate it in compile time, because the call frame > doesn't have a fixed size, because I'm using the same stack for storing the > call-frames and for the operations evaluation (iadd, isub, etc..). Maybe I > should split it into two stacks, one for the call frames, and the second for > the operation evaluations. I just sharing my thoughts. > > [I would think that your compiler should always know how many partial > results are on the stack above the call frame so it should be able to > use a fixed offset to get to the chain pointers at the base of call > frame. -John] ACK Simple compilers use a dedicated base pointer register (x86: eBP) for their stack frame, but clever compilers will track the current stack pointer offset to that base address. The calling frames can have a variable size, when the local function is called from various places in the code. I.e. the address of a local variable in an outer scope can vary, fixed is only the variable's offset to its own frame start. That's why the chain of all intermediate frame pointers must be followed, until the frame of the variable of interest is found. Things can become worse if one of the calling functions is called recursively, so that a variable number of frames has to be traversed until the outmost frame is found. DoDi
[toc] | [prev] | [next] | [standalone]
| From | Louis Krupp <lkrupp@nospam.pssw.com.invalid> |
|---|---|
| Date | 2018-02-13 00:42 -0700 |
| Message-ID | <18-02-012@comp.compilers> |
| In reply to | #1950 |
On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote:
>Suppose I have a simple C-like programming language: ...
>Like you can see, it supports nested functions.
gcc supports nested functions as an extension to C. Compiling this
program with -O0 -fdump-tree-all and looking at the generated files
might give you an idea of one way to do it:
#include <stdio.h>
int main(void)
{
int k1 = 1;
int p1(void)
{
int k2 = 2;
int p2(void)
{
int k3 = 3;
int p3(void)
{
return k1 + k2 + k3;
}
return p3();
}
return p2();
}
int n;
n = p1();
printf("%d\n", n);
return 0;
}
Apparently, gcc chains stack frames, and accessing uplevel variables,
as when p3 uses k1 and k2, seems like it would take O(n) time, where n
is the nesting level.
The alternative, a display vector, seems like it would be easier to
implement unless you're doing this for a real machine with a real (and
therefore limited) set of registers.
Louis
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <217-679-0842@kylheku.com> |
|---|---|
| Date | 2018-02-14 00:59 +0000 |
| Message-ID | <18-02-016@comp.compilers> |
| In reply to | #1952 |
On 2018-02-13, Louis Krupp <lkrupp@nospam.pssw.com.invalid> wrote:
> On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote:
>
>>Suppose I have a simple C-like programming language: ...
>>Like you can see, it supports nested functions.
>
> gcc supports nested functions as an extension to C. Compiling this
> program with -O0 -fdump-tree-all and looking at the generated files
> might give you an idea of one way to do it:
>
> #include <stdio.h>
>
> int main(void)
> {
> int k1 = 1;
> int p1(void)
> {
> int k2 = 2;
> int p2(void)
> {
> int k3 = 3;
> int p3(void)
> {
> return k1 + k2 + k3;
> }
> return p3();
> }
> return p2();
> }
> int n;
>
> n = p1();
> printf("%d\n", n);
>
> return 0;
> }
>
> Apparently, gcc chains stack frames, and accessing uplevel variables,
> as when p3 uses k1 and k2, seems like it would take O(n) time, where n
> is the nesting level.
>
> The alternative, a display vector, seems like it would be easier to
The problem is that these functions can recurse, including mutually.
So that is to say, you can end up with a situation in which you have
a single activation of main with a single k1 variable, but multiple
activations of p2, with multiple k2's all being live at the
same time.
To complicate things, the multiple active p2 functions can be lifted
into function pointers, all of which are simultaneously valid.
Each of those p2 pointers could be called back by some external function.
Those calls have to establish an environment which resolves to the
correct k2, and at the same time to the one and only k1.
That doesn't rule out display vectors, but it seems like each function
activation has to have its own complete display vector.
For instance the p2 level functions need a two-level vector. Under
the same main invocation, they share the vec[0] entry pointing to
the main frame; but each one has a different vec[1] referencing its
own frame.
When a p3 executes it has its own 3 element vec[]. The vec[0] and
vec[1] are copied from the parent p2. If a pointer to a p3 is captured,
then its vec[1] correctly references the particular dynamic p2 instance
in which it is scoped; vec[2] references its own environment.
Under this scheme, we basically we need o(n^2) storage for the vectors,
for n nesting levels in the absence of recursion.
The compiler can determine which functions aren't subject to indirection
and avoid allocating the trampoline space for that.
If functions are downward funargs only (turn to garbage when they go out
of scope), the trampoline space needed to capture a given function
invocation can be part of the activation record of the block in which
that function is declared. When that block terminates, the function is
garbage, and so is its trampoline.
> implement unless you're doing this for a real machine with a real (and
> therefore limited) set of registers.
Hmm. The vectors can be treated like any array, or set of local
variables (v0, v1, ..). We generate abstract AST code to load these and
use them; then that code gets subject to register allocation like
anything else.
[As I recall, you do need a display with a pointer for each lexical scope,
but so long as you don't expect closures to work, you can do it in
constant time with a fixed location where each scope stores its
current frame pointer. If you want closures, then you're into tree
shaped stacks and garbage collected stack frames. -John]
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2018-02-13 22:04 -0500 |
| Message-ID | <18-02-018@comp.compilers> |
| In reply to | #1954 |
On Wed, 14 Feb 2018 00:59:34 +0000 (UTC), Kaz Kylheku <217-679-0842@kylheku.com> wrote: >On 2018-02-13, Louis Krupp <lkrupp@nospam.pssw.com.invalid> wrote: >> On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote: >> >>>Suppose I have a simple C-like programming language: ... >>>Like you can see, it supports nested functions. >> >> gcc supports nested functions as an extension to C. Compiling this >> program with -O0 -fdump-tree-all and looking at the generated files >> might give you an idea of one way to do it: ... >> Apparently, gcc chains stack frames, and accessing uplevel variables, >> as when p3 uses k1 and k2, seems like it would take O(n) time, where n >> is the nesting level. >> >> The alternative, a display vector, seems like it would be easier to > >The problem is that these functions can recurse, including mutually. >So that is to say, you can end up with a situation in which you have >a single activation of main with a single k1 variable, but multiple >activations of p2, with multiple k2's all being live at the >same time. > >To complicate things, the multiple active p2 functions can be lifted >into function pointers, all of which are simultaneously valid. > >Each of those p2 pointers could be called back by some external function. >Those calls have to establish an environment which resolves to the >correct k2, and at the same time to the one and only k1. > >That doesn't rule out display vectors, but it seems like each function >activation has to have its own complete display vector. As long as each function creates a new stack frame when called, then recursion is not relevant and a single display is sufficient to handle the nesting. The display tracks lexical parentage [who defined who], not call chain parentage. Each display entry points to the stack frame of the last function invoked that was defined at the corresponding lexical level. The preamble of a function at, e.g., level 3, preserves display[3] in its own stack frame, and sets display[3] to point to its own frame. The postscript of the function restores the original display[3] value when the function ends. [Obviously, the compiler must keep track of lexical levels in order that each function update the proper display entry.] When a function at level 4 needs something at level 2, it just looks to display[2] to find the proper stack frame. Regardless of intervening recursions, the lexical environment is maintained: display[2] always will point to the latest invocation of the level 2 function which contains the current (level 4) function. >For instance the p2 level functions need a two-level vector. Under >the same main invocation, they share the vec[0] entry pointing to >the main frame; but each one has a different vec[1] referencing its >own frame. > >When a p3 executes it has its own 3 element vec[]. The vec[0] and >vec[1] are copied from the parent p2. If a pointer to a p3 is captured, >then its vec[1] correctly references the particular dynamic p2 instance >in which it is scoped; vec[2] references its own environment. > >Under this scheme, we basically we need o(n^2) storage for the vectors, >for n nesting levels in the absence of recursion. No, you're missing something. I just haven't figured out what. >The compiler can determine which functions aren't subject to indirection >and avoid allocating the trampoline space for that. With the classic implementation, every non-leaf function has to do its part to maintain the lexical environment. But the allocation overhead is just one saved pointer per stack frame. >[As I recall, you do need a display with a pointer for each lexical scope, >but so long as you don't expect closures to work, you can do it in >constant time with a fixed location where each scope stores its >current frame pointer. If you want closures, then you're into tree >shaped stacks and garbage collected stack frames. -John] You still do have closures of sorts ... they just are distributed through the stack frames of the current function call chain. A display cannot handle persistent closures, such as in Lisp, where a nested function may be invoked after its parent terminates, outside of the runtime environment that defined it. George
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <217-679-0842@kylheku.com> |
|---|---|
| Date | 2018-02-14 18:06 +0000 |
| Message-ID | <18-02-023@comp.compilers> |
| In reply to | #1955 |
On 2018-02-14, George Neuner <gneuner2@comcast.net> wrote: > On Wed, 14 Feb 2018 00:59:34 +0000 (UTC), Kaz Kylheku <217-679-0842@kylheku.com> wrote: >>On 2018-02-13, Louis Krupp <lkrupp@nospam.pssw.com.invalid> wrote: >>> On Mon, 12 Feb 2018 11:25:36 -0800 (PST), dror.openu@gmail.com wrote: >>> >>>>Suppose I have a simple C-like programming language: ... >>>>Like you can see, it supports nested functions. >>> >>> gcc supports nested functions as an extension to C. Compiling this >>> program with -O0 -fdump-tree-all and looking at the generated files >>> might give you an idea of one way to do it: ... > >>> Apparently, gcc chains stack frames, and accessing uplevel variables, >>> as when p3 uses k1 and k2, seems like it would take O(n) time, where n >>> is the nesting level. >>> >>> The alternative, a display vector, seems like it would be easier to >> >>The problem is that these functions can recurse, including mutually. >>So that is to say, you can end up with a situation in which you have >>a single activation of main with a single k1 variable, but multiple >>activations of p2, with multiple k2's all being live at the >>same time. >> >>To complicate things, the multiple active p2 functions can be lifted >>into function pointers, all of which are simultaneously valid. >> >>Each of those p2 pointers could be called back by some external function. >>Those calls have to establish an environment which resolves to the >>correct k2, and at the same time to the one and only k1. >> >>That doesn't rule out display vectors, but it seems like each function >>activation has to have its own complete display vector. > > As long as each function creates a new stack frame when called, then > recursion is not relevant and a single display is sufficient to handle > the nesting. > > The display tracks lexical parentage [who defined who], not call > chain parentage. Each display entry points to the stack frame of the > last function invoked that was defined at the corresponding lexical > level. > > The preamble of a function at, e.g., level 3, preserves display[3] in > its own stack frame, and sets display[3] to point to its own frame. > The postscript of the function restores the original display[3] value > when the function ends. > [Obviously, the compiler must keep track of lexical levels in order > that each function update the proper display entry.] Yes; it is clear that there can be a single display vector which is associated with the entire lexical scope as a whole, and then the local scopes just mutate the entries in the vector, taking care to save and restore them (which requires some local storage, by the way, but less than in the scheme I describe). The problem with this saving and restoring is that it's not thread safe. If there are multiple invocations of the p2 scope, and they can be entered by independent threads, we're in trouble if there is a single pointer to that scope in a single display. (Excluding that from being a requirement is fairly reasonable.) The model I described simply replicates the display, allowing immutable local copies of it in each frame. When a scope is entered via indirection, it just installs a context pointer referencing the display. (Of course, that context pointer has to be saved and restored. It could just be the regular frame pointer, with the display being at some well known offset in the frame.) > When a function at level 4 needs something at level 2, it just looks > to display[2] to find the proper stack frame. Regardless of > intervening recursions, the lexical environment is maintained: > display[2] always will point to the latest invocation of the level 2 > function which contains the current (level 4) function. (Under what I described, that display[2] is in that function's own display[] array rather than a shared one. That display[2] is initialized and never mutated.) > No, you're missing something. I just haven't figured out what. I'm missing the space optimization that a single display can be (serially) shared, in spite of multiple activations of scopes, as long as the entries are saved and restored.
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2018-02-15 11:41 -0500 |
| Message-ID | <18-02-029@comp.compilers> |
| In reply to | #1958 |
On Wed, 14 Feb 2018 18:06:40 +0000 (UTC), Kaz Kylheku <217-679-0842@kylheku.com> wrote: >On 2018-02-14, George Neuner <gneuner2@comcast.net> wrote: > >> As long as each function creates a new stack frame when called, then >> recursion is not relevant and a single display is sufficient to handle >> the nesting. >> >> The display tracks lexical parentage [who defined who], not call >> chain parentage. Each display entry points to the stack frame of the >> last function invoked that was defined at the corresponding lexical >> level. >> >> The preamble of a function at, e.g., level 3, preserves display[3] in >> its own stack frame, and sets display[3] to point to its own frame. >> The postscript of the function restores the original display[3] value >> when the function ends. >> [Obviously, the compiler must keep track of lexical levels in order >> that each function update the proper display entry.] > >Yes; it is clear that there can be a single display vector which is >associated with the entire lexical scope as a whole, and then the >local scopes just mutate the entries in the vector, taking care >to save and restore them (which requires some local storage, by the >way, but less than in the scheme I describe). > >The problem with this saving and restoring is that it's not thread safe. >If there are multiple invocations of the p2 scope, and they can be >entered by independent threads, we're in trouble if there is a single >pointer to that scope in a single display. (Excluding that from being a >requirement is fairly reasonable.) > >The model I described simply replicates the display, allowing immutable >local copies of it in each frame. When a scope is entered via >indirection, it just installs a context pointer referencing the display. >(Of course, that context pointer has to be saved and restored. >It could just be the regular frame pointer, with the display being >at some well known offset in the frame.) I understand, and you're correct that the display needs to be counted as part of the thread switch context. But each thread also would be using from a separate stack, so having the entire display saved in each frame is overkill when all that's needed is a single pointer. If you're suggesting that the stacks be shared among threads, then you're no longer talking about "threads" per se, but rather about "scheduler activations", which are subtly different from threads. I'm not aware of any non-research system that is based on activations. >> When a function at level 4 needs something at level 2, it just looks >> to display[2] to find the proper stack frame. Regardless of >> intervening recursions, the lexical environment is maintained: >> display[2] always will point to the latest invocation of the level 2 >> function which contains the current (level 4) function. > >(Under what I described, that display[2] is in that function's own >display[] array rather than a shared one. That display[2] is >initialized and never mutated.) > >> No, you're missing something. I just haven't figured out what. > >I'm missing the space optimization that a single display can be >(serially) shared, in spite of multiple activations of scopes, as long >as the entries are saved and restored. No worries. IME, displays don't get much respect from modern textbooks - they are mentioned mostly in passing. The basic problem is that, while they are quite useful for an Algol-like language, they can be problematic for an OO or functional language. George
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <217-679-0842@kylheku.com> |
|---|---|
| Date | 2018-02-17 16:13 +0000 |
| Message-ID | <18-02-032@comp.compilers> |
| In reply to | #1961 |
On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote: > On Wed, 14 Feb 2018 18:06:40 +0000 (UTC), Kaz Kylheku ><217-679-0842@kylheku.com> wrote: > >>On 2018-02-14, George Neuner <gneuner2@comcast.net> wrote: >> >>> As long as each function creates a new stack frame when called, then >>> recursion is not relevant and a single display is sufficient to handle >>> the nesting. >>> >>> The display tracks lexical parentage [who defined who], not call >>> chain parentage. Each display entry points to the stack frame of the >>> last function invoked that was defined at the corresponding lexical >>> level. >>> >>> The preamble of a function at, e.g., level 3, preserves display[3] in >>> its own stack frame, and sets display[3] to point to its own frame. >>> The postscript of the function restores the original display[3] value >>> when the function ends. >>> [Obviously, the compiler must keep track of lexical levels in order >>> that each function update the proper display entry.] >> >>Yes; it is clear that there can be a single display vector which is >>associated with the entire lexical scope as a whole, and then the >>local scopes just mutate the entries in the vector, taking care >>to save and restore them (which requires some local storage, by the >>way, but less than in the scheme I describe). >> >>The problem with this saving and restoring is that it's not thread safe. >>If there are multiple invocations of the p2 scope, and they can be >>entered by independent threads, we're in trouble if there is a single >>pointer to that scope in a single display. (Excluding that from being a >>requirement is fairly reasonable.) >> >>The model I described simply replicates the display, allowing immutable >>local copies of it in each frame. When a scope is entered via >>indirection, it just installs a context pointer referencing the display. >>(Of course, that context pointer has to be saved and restored. >>It could just be the regular frame pointer, with the display being >>at some well known offset in the frame.) > > I understand, and you're correct that the display needs to be counted > as part of the thread switch context. But each thread also would be > using from a separate stack, so having the entire display saved in > each frame is overkill when all that's needed is a single pointer. The address of a local function can be taken. When that function is called from any context, possibly another thread, it has access to the locals that were lexically apparent there. > If you're suggesting that the stacks be shared among threads, then > you're no longer talking about "threads" per se, but rather about Stack *access* is shared among threads. E.g. a thread can register a stack object in a shared queue (such as a synchronization wait queue). As long as it is dequeued before that frame terminates, everything is cool. Commonly done. If a downward-only lexical closure is implemented as a pointer into some threads's stack memory (plus a function) that object can be passed to another thread and used (as long as the context which created that closure doesn't terminate). When that sort of thing is done with threads, we cannot have a single display array per lexical scope being mutated. If the stack closures maintain local displays which are immutable, then it can work. The function which is called from the other thread is using that thread's stack, of course, for the parameter passing and return and all. Also for its freshly instantiated locals. However, the material in the captured lexical scope is in the original stack. So is the display vector which accesses the multiple levels of it. The display is only used for acessing the lexical things outside of the local function's body; its own locals and parameters are on the local stack. Ahd here is the kicker: *that* local environment can be captured by that thread, together with that other environment. No problem; that thread has a copy of the display in its stack frame and extends it with a pointer to its own stuff. So now there is a display which references two different stacks. > No worries. IME, displays don't get much respect from modern > textbooks - they are mentioned mostly in passing. This is probably because of 1) a few decades of dominance of C, C++ and Java whose standard dialects don't have scope capturing local functions. So no interest in the mainstream software dev in local functions as such. OOP techniques somewhat make up for the lack of closures. You use an object with a method; the captured stuff is in the object. Languages with "class scope" make it easy for the code in a class method to access the object with simple, unqualified name references like "foo", so that at least the body of a method is comparably convenient to that of a closure. 2) real languages have proper lexical closures with indefinite lifetimes. The display concept is still relevant but probably turn into something else under a different name. (We are seeing much more use of and interest in languages that have closures; but that interest is largely in new ones that are poorly optimized.) > The basic problem > is that, while they are quite useful for an Algol-like language, they > can be problematic for an OO or functional language. Displays, or something equivalent, is necessary to work out the nesting of captured lexical scopes. Different contours of the scope have dynamic activations that differ in lifetimes and have to be separate objects. Those objects somehow have to be centrally referenced from places that *somehow* have simultaneous access to them. In a purely functional language, a simplifying aspect is that bindings are immutable. We can capture a lexical scope by copying it; the user cannot tell the difference between that and the original scope. If variables are mutable, you know that you have a copy because a variable assignment in the original scope isn't reflected in the captured one.
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2018-02-17 16:39 +0000 |
| Message-ID | <18-02-034@comp.compilers> |
| In reply to | #1963 |
Kaz Kylheku <217-679-0842@kylheku.com> writes:
>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
>> No worries. IME, displays don't get much respect from modern
>> textbooks - they are mentioned mostly in passing.
>
>This is probably because of
This is because displays were found to be more costly for Algol-like
languages (there was an influential paper that convinced everyone of
that). IIRC the additional cost is in updating the display on calls
and returns.
Unfortunately, I don't have a reference to the paper above. You can
probably find a reference in old compiler books.
Of course, given that you are using a different language that probably
uses nested functions quite differently from what was used in that
paper, you may want to reevaluate the balance of display vs. static
chains.
>> The basic problem
>> is that, while they are quite useful for an Algol-like language, they
>> can be problematic for an OO or functional language.
The funny thing is that, while displays were found to be suboptimal
for Algol-like languages, they can be used for type inclusion tests,
used in many OO languages [Cohen:acm:toplas:1991]. There are a bunch
of other solutions to this problem, so I don't know if displays are
used for this purpose in current systems, though.
@Article{Cohen:acm:toplas:1991,
author = "Norman H. Cohen",
title = "Type-Extension Type Tests Can Be Performed In Constant
Time",
journal = "ACM Transactions on Programming Languages and
Systems",
volume = "13",
number = "4",
pages = "626--629",
month = oct,
year = "1991",
refs = "2",
checked = "19940624",
source = "Dept. Library",
keywords = "class, descriptor, display, extensible data type,
inheritance, membership test, object-oriented
programming, type extension, type test",
note = "Technical Correspondence",
abstract = "Wirth's proposal for type extensions includes an
algorithm for determining whether a give value belongs
to an extension of a given type. In the worst case,
this algorithm takes time proportional to the depth of
the type-extension hierarchy. Wirth describes the loop
in this algorithm as ``unavoidable,'' but in fact, the
test can be performed in constant time by associating a
``display'' of base types with each type descriptor.",
xref = "Wirth:acm:toplas:1988",
reffrom = "Corney:Gough:plasa:1994",
}
I dimly remember a letter to the editor by Wirth where he appreciated
that finally a good use for the display had been found.
>Displays, or something equivalent, is necessary to work out the nesting
>of captured lexical scopes. Different contours of the scope have dynamic
>activations that differ in lifetimes and have to be separate objects.
>Those objects somehow have to be centrally referenced from places that
>*somehow* have simultaneous access to them.
Static link chains work fine.
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2018-03-01 13:48 -0500 |
| Message-ID | <18-03-002@comp.compilers> |
| In reply to | #1964 |
On Sat, 17 Feb 2018 16:39:04 GMT, anton@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:
>Kaz Kylheku <217-679-0842@kylheku.com> writes:
>>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
>>> No worries. IME, displays don't get much respect from modern
>>> textbooks - they are mentioned mostly in passing.
>>
>>This is probably because of
>
>This is because displays were found to be more costly for Algol-like
>languages
More costly than what?
>(there was an influential paper that convinced everyone of
>that). IIRC the additional cost is in updating the display on calls
>and returns.
Which is no different from the overhead to maintain static links, and
faster to use when you need non-locals more than one lexical level
distant.
>Unfortunately, I don't have a reference to the paper above. You can
>probably find a reference in old compiler books.
I'd really like to see that paper to find out what they are comparing
to display [and using what language].
It certainly is true that closeure conversion is a better solution
overall because it flattens all non-local accesses to use a single
indirection ... but it also significantly complicates the compiler,
and creates a need for handling wide pointers (or equivalent). Many
useful languages simply don't need that extra complexity.
>Of course, given that you are using a different language that probably
>uses nested functions quite differently from what was used in that
>paper, you may want to reevaluate the balance of display vs. static
>chains.
I would never use static chains ... if display wasn't viable I would
jump straight to closure conversion.
>>> The basic problem
>>> is that, while they are quite useful for an Algol-like language, they
>>> can be problematic for an OO or functional language.
>
>The funny thing is that, while displays were found to be suboptimal
>for Algol-like languages, they can be used for type inclusion tests,
>used in many OO languages [Cohen:acm:toplas:1991]. There are a bunch
>of other solutions to this problem, so I don't know if displays are
>used for this purpose in current systems, though.
>
>@Article{Cohen:acm:toplas:1991,
> author = "Norman H. Cohen",
> title = "Type-Extension Type Tests Can Be Performed In Constant
> Time",
> journal = "ACM Transactions on Programming Languages and
> Systems",
> volume = "13",
> number = "4",
> pages = "626--629",
> month = oct,
> year = "1991",
> refs = "2",
> checked = "19940624",
> source = "Dept. Library",
> keywords = "class, descriptor, display, extensible data type,
> inheritance, membership test, object-oriented
> programming, type extension, type test",
> note = "Technical Correspondence",
> abstract = "Wirth's proposal for type extensions includes an
> algorithm for determining whether a give value belongs
> to an extension of a given type. In the worst case,
> this algorithm takes time proportional to the depth of
> the type-extension hierarchy. Wirth describes the loop
> in this algorithm as ``unavoidable,'' but in fact, the
> test can be performed in constant time by associating a
> ``display'' of base types with each type descriptor.",
> xref = "Wirth:acm:toplas:1988",
> reffrom = "Corney:Gough:plasa:1994",
>}
>
>I dimly remember a letter to the editor by Wirth where he appreciated
>that finally a good use for the display had been found.
Thanks for that pointer ... going to have to read that.
>>Displays, or something equivalent, is necessary to work out the nesting
>>of captured lexical scopes. Different contours of the scope have dynamic
>>activations that differ in lifetimes and have to be separate objects.
>>Those objects somehow have to be centrally referenced from places that
>>*somehow* have simultaneous access to them.
>
>Static link chains work fine.
Yes, they do work - but they are the lowest performance option for
non-local accesses.
>- anton
George
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <217-679-0842@kylheku.com> |
|---|---|
| Date | 2018-03-01 19:26 +0000 |
| Message-ID | <18-03-006@comp.compilers> |
| In reply to | #1966 |
On 2018-03-01, George Neuner <gneuner2@comcast.net> wrote:
> On Sat, 17 Feb 2018 16:39:04 GMT, anton@mips.complang.tuwien.ac.at
> (Anton Ertl) wrote:
>
>>Kaz Kylheku <217-679-0842@kylheku.com> writes:
>>>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote:
>>>> No worries. IME, displays don't get much respect from modern
>>>> textbooks - they are mentioned mostly in passing.
>>>
>>>This is probably because of
>>
>>This is because displays were found to be more costly for Algol-like
>>languages
>
> More costly than what?
>
>>(there was an influential paper that convinced everyone of
>>that). IIRC the additional cost is in updating the display on calls
>>and returns.
>
> Which is no different from the overhead to maintain static links, and
> faster to use when you need non-locals more than one lexical level
> distant.
>
>>Unfortunately, I don't have a reference to the paper above. You can
>>probably find a reference in old compiler books.
>
> I'd really like to see that paper to find out what they are comparing
> to display [and using what language].
>
> It certainly is true that closeure conversion is a better solution
> overall because it flattens all non-local accesses to use a single
> indirection ... but it also significantly complicates the compiler,
> and creates a need for handling wide pointers (or equivalent). Many
> useful languages simply don't need that extra complexity.
How do flat representations handle the fact that different levels
of the lexical environment have different lifetimes?
E.g.
(let ((a 1) (b 2))
(loop ...
(let ((c 3) (d 4))
(lambda ()))))
The (c d) environment is freshly instantiated each time the loop
re-enters the inner let, but the (a b) environment is the same.
Each (lambda ()) invocation must capture a different environment,
with a different (c d) but shared (a b).
If there is just a flat vector, it must still contain an extra
level of indirection, no?
If the language is purely functional, then the bindings are immutable,
and so the (lambda ()) can copy the whole damn thing.
If bindings are mutable, then when one lambda modifies a or b,
the other lambdas (with their different environment) must see the
modification. Yet they are shielded from each other's modification
of c or d.
So the upshot is that there has to be an extra level of indirection.
It cannot be the case that "all non-local accesses use a single
indirection".
The flat environment has to contain entries which are not the storage
bindings themselves, but pointers to bindings.
For instance (let's keep it simple and concrete) it could be a vector of
conses:
#((a . 1) (b . 2) (c . 3) (d . 4))
This is not direct access: we have to dereference a pointer to index
into the vector. Then we retrieve a cons cell which is a pointer,
we have to dereference this to get to the cdr field to get to the value.
So then whenever the loop enters the (c d) environment, the the (c ...)
and (e ...) conses are destructively replaced with fresh conses.
What the lambda operator then has to do to fetch the environment
for the closure is to take a shallow-copy snapshot of the vector:
allocate a new vector of the same length which takes the same conses.
So if instead of the above, we have a display, we can do this:
#(#(1 2) #(3 4)) ;; env vec: effectively, a two-level display
Instead of dereferencing a vector and then a cons cell, we
dereference a vector and then an inner vector.
In this display case (pun intended) when the inner let is entered,
the #(3 4) is replaced with #(nil nil), destructively,
analogous to the conses being replaced in the "flat" case.
Note that this is likely more efficient than allocating two conses.
The lambda operator shallow-copies the whole vector, just like
in the flat environment case.
> I would never use static chains ... if display wasn't viable I would
> jump straight to closure conversion.
What is the meaning of "display", anyway?
It seems like it just means "array" instead of "linked list" (just
specifically in the context of activation frame environments).
The C main function's argv[] is a "display" of strings, whereas
the Lisp list '("bash" "-c" "ls") is a "chain".
Except that the context isn't environment access and so we don't use
the word "display".
[A display is an array of pointers to the frames of the enclosing
scopes. You can access any variable in an enclosing frame by
indirecting through the frame's entry in the current display and then
indexing. This is for languges like Algol and Pascal, not Lisp. I
don't ever recall anyone talking about displays and threads, although
I think they should work OK so long as you keep your displays on the
local stack. -John]
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <217-679-0842@kylheku.com> |
|---|---|
| Date | 2018-03-02 00:38 +0000 |
| Message-ID | <18-03-008@comp.compilers> |
| In reply to | #1968 |
On 2018-03-01, Kaz Kylheku <217-679-0842@kylheku.com> wrote: > John: > [A display is an array of pointers to the frames of the enclosing > scopes. You can access any variable in an enclosing frame by > indirecting through the frame's entry in the current display and then > indexing. This is for languges like Algol and Pascal, not Lisp. I This is absolutely relevant to Lisp dialects with lexical scoping, like Common Lisp and Scheme (or anything worth its salt, really). There are no first-class functions without lexical scoping. Structure and Interpretation of Computer Programs (SICP) discusses something similar to displays in "5.5.6 Lexical Addressing": https://mitpress.mit.edu/sicp/full-text/book/book-Z-H-35.html#%_sec_5.5.6 "We can exploit this fact by inventing a new kind of variable-lookup operation, lexical-address-lookup, that takes as arguments an environment and a lexical address that consists of two numbers: a frame number, which specifies how many frames to pass over, and a displacement number, which specifies how many variables to pass over in that frame. Lexical-address-lookup will produce the value of the variable stored at that lexical address relative to the current environment." The focus of that section is eliminating the run-time search which actually finds a name by symbol matching, with numeric indexing. However, if we can "pass over" the frames by indexing through a vector of them, then we have a display. We can also directly index into the frame by also representing it as a vector, so we don't have to "pass over" n-1 variables to get to the n-th one. [I was thinking of closures when I said not Lisp. I suppose you could use displays and garbage collect them but I haven't really thought about the implications. -John]
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2018-03-02 02:48 -0500 |
| Message-ID | <18-03-010@comp.compilers> |
| In reply to | #1968 |
On Thu, 1 Mar 2018 19:26:06 +0000 (UTC), Kaz Kylheku
<217-679-0842@kylheku.com> wrote:
>On 2018-03-01, George Neuner <gneuner2@comcast.net> wrote:
>
>> It certainly is true that closeure conversion is a better solution
>> overall because it flattens all non-local accesses to use a single
>> indirection ... but it also significantly complicates the compiler,
>> and creates a need for handling wide pointers (or equivalent). Many
>> useful languages simply don't need that extra complexity.
>
>How do flat representations handle the fact that different levels
>of the lexical environment have different lifetimes?
In fact they don't ... all the "levels" of the lexical environment
must persist IN SOME FASHION for as long as the function lives. That
is the whole point of the "closure" - to persist the environment.
>E.g.
>
> (let ((a 1) (b 2))
> (loop ...
> (let ((c 3) (d 4))
> (lambda ()))))
This particular example actually is better suited to lambda lifting
than to closure conversion, but ...
>The (c d) environment is freshly instantiated each time the loop
>re-enters the inner let, but the (a b) environment is the same.
>
>Each (lambda ()) invocation must capture a different environment,
>with a different (c d) but shared (a b).
>
>If there is just a flat vector, it must still contain an extra
>level of indirection, no?
As you discuss more below, the answer depends on whether or not any
binding are mutated and shared. Note, it does not depend on whether
they are "mutable" per the language, but whether they actually are
mutated.
The real point of closure conversion is to flatten the function's
environment as much as is possible to minimize the runtime effort of
accessing the variable. How far you can take it depends on the
complexity of your compiler.
If no bindings are mutated, the values simply can be copied into a
single structure which is passed to the function.
[ a b c d ]
Note, that this also is the case for shared bindings which are NOT
mutated. (more below)
If any of the bindings are mutated, then you have to assume that they
also are shared (unless you can prove otherwise), and so you need to
provide a more complex environment.
What this looks like depends on how much you want to optimize the
non-shared bindings:
The simplest and most direct translation: you allocate a separate
structure for each scope, and a vector of pointers to them. You then
pass the pointer vector to the function.
[ + + ]
| |
| +--> [ c d ]
|
+-----> [ a b ]
More complex: you allocate separately those bindings which are
mutated, and create a flat structure that contains values for the
static bindings and pointers to the mutable ones.
[ a + c d ]
|
+--> [ b ]
More complex still: figure out whether mutated bindings actually are
shared. A mutated variable, e.g., used as a private counter, need not
be allocated separately.
This can get quite complicated to figure out if there are multiple
functions mutating different sets of shared bindings.
Also note that if the closures are not passed upward out of the
definition environment, the environment "structures" may be stack
allocated [if the runtime uses mark:release style stack(s)].
>If the language is purely functional, then the bindings are immutable,
>and so the (lambda ()) can copy the whole damn thing.
>
>If bindings are mutable, then when one lambda modifies a or b,
>the other lambdas (with their different environment) must see the
>modification. Yet they are shielded from each other's modification
>of c or d.
>
>So the upshot is that there has to be an extra level of indirection.
>
>It cannot be the case that "all non-local accesses use a single
>indirection".
An extra indirection is required only for those bindings which are
BOTH mutated AND shared. Sharing alone or mutation alone does not
require a separate allocation.
The challenge for the compiler is to figure whether a binding actually
is mutated only, shared only, or both mutated and shared.
>> I would never use static chains ... if display wasn't viable I would
>> jump straight to closure conversion.
>
>What is the meaning of "display", anyway?
>
>It seems like it just means "array" instead of "linked list" (just
>specifically in the context of activation frame environments).
More or less. I don't know who named the construct originally, or why
the particular name "display" was chosen.
The object, however, is simple. With nested functions and recursion,
every call frame logically is a member of 2 lists: "who called who",
and "who defined who".
"who called who" pretty much has to be maintained as a list chained
through the call frames themselves because the call stack can become
arbitrarily deep, and because (modulo GC, debugger, etc) there never
is any runtime need to follow more than 1 pointer.
However, "who defined who" hierarchies typically are fairly shallow:
some large scale studies done on Pascal showed that 16 lexical scopes
sufficed for 99+% of the code they examined.
A display optimizes non-local access (more than 1 lexical scope away)
by maintaining "who defined who" as a vector of frame pointers rather
than as yet another list chained through the frames themselves. It
has the same O(1) maintenance overhead as the chained list, but its
use is O(1) rather than O(n).
>The C main function's argv[] is a "display" of strings, whereas
>the Lisp list '("bash" "-c" "ls") is a "chain".
Yes.
>Except that the context isn't environment access and so we don't use
>the word "display".
Yes, the term "display" - referring to a vector of pointers - is
rather unique to compiler construction. I haven't seen it used as
such elsewhere.
>[A display is an array of pointers to the frames of the enclosing
>scopes. You can access any variable in an enclosing frame by
>indirecting through the frame's entry in the current display and then
>indexing. This is for languges like Algol and Pascal, not Lisp. I
>don't ever recall anyone talking about displays and threads, although
>I think they should work OK so long as you keep your displays on the
>local stack. -John]
Yes - the display has to be considered aa part of the thread switch
context - like CPU registers.
The main point with a display is that the function's execution
environment is (Lisp eq) identical to its definition environment.
There is no separate allocation(s) created such that the function can
be called after its definition environment is gone.
George
[toc] | [prev] | [next] | [standalone]
| From | rpw3@rpw3.org (Rob Warnock) |
|---|---|
| Date | 2018-03-03 16:00 +0000 |
| Message-ID | <18-03-013@comp.compilers> |
| In reply to | #1968 |
Kaz Kylheku <217-679-0842@kylheku.com> wrote: +--------------- | George Neuner <gneuner2@comcast.net> wrote: | > It certainly is true that closeure conversion is a better solution | > overall because it flattens all non-local accesses to use a single | > indirection ... but it also significantly complicates the compiler, | > and creates a need for handling wide pointers (or equivalent). Many | > useful languages simply don't need that extra complexity. | | How do flat representations handle the fact that different levels | of the lexical environment have different lifetimes? +--------------- I'm not sure if this counts as "closure conversion" or not, but some Common Lisp implementations [such as CMUCL] use flat environments just fine (even in the interpreter). The trick is to do some minimal pre-analysis of the code, to see which closed-over variables are *not* ever mutated or are *not* used in closures which "escape" the original lexical context. All such variables may be copied into a single "flat" vector at closure creation time. All other closed-over variables are converted into "indirect" variables -- such variables are individually heap-allocated, pointed to by "ref" pointers which are also copied into a the same flat environment vector at closure creation time. Then when the closure is called, all of the up-level variables are either *in* the (flat) environment or available via *one* indirection. Because all of the "ref" pointers for a given environment variable in all the (potential) copies point to the same heap location, closures which mutate a shared variable work just fine. And all the rest are either read-only [and thus all the copies are the same] or else mutated and *not* accessed in any interior closure which escapes. This is normally performs better than linked environments or even displays, unless there are a *large* number of nested closures. In the latter case, when each level of closure is created, it will have to copy all of its free variables from its parent's flat environment into its own. [I apologize if I haven't explained this very well.] This approach is not unique to Lisp. Appel discusses it for ML in his "Compiling With Continuations". -Rob ----- Rob Warnock <rpw3@rpw3.org> 627 26th Avenue <http://rpw3.org/> San Mateo, CA 94403
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2018-03-02 09:30 +0000 |
| Message-ID | <18-03-012@comp.compilers> |
| In reply to | #1966 |
George Neuner <gneuner2@comcast.net> writes: >On Sat, 17 Feb 2018 16:39:04 GMT, anton@mips.complang.tuwien.ac.at >(Anton Ertl) wrote: > >>Kaz Kylheku <217-679-0842@kylheku.com> writes: >>>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote: >>>> No worries. IME, displays don't get much respect from modern >>>> textbooks - they are mentioned mostly in passing. >>> >>>This is probably because of >> >>This is because displays were found to be more costly for Algol-like >>languages > >More costly than what? Static link chains. >> IIRC the additional cost is in updating the display on calls >>and returns. > >Which is no different from the overhead to maintain static links Wrong. Consider the following nesting of functions: A B C D With a display, in C you have built up a display pointing to the frames of C, B, and A. When performing a call from C to D, you throw that away and replace it with a display pointing to just the frame of D. When returning from D, you have to restore the old display. If you don't maintain a static link chain, you have to save the complete display of C on the call and restore it on return. Note that D may itself call functions that need more display, so you don't get away with just saving and restoring the first slot of the display. IIRC this was the variant looked at in the paper that concluded that displays are more costly. If you maintain a static link chain, you can restore the display of C from the static link chain, but you now have the cost of maintaining the static link chain and in addition the cost of maintaining the display, which is obviously more costly than just the static chain alone. If you go that way, you can actually produce an intermediate scheme, where you cache static link accesses that you need, and the cached accesses are similar to (the needed part of) a display. >I'd really like to see that paper to find out what they are comparing >to display [and using what language]. The language was probably Algol or one of it's direct offspring. Anyway, if I get around to it, I will search for a reference to the paper in our library, but if someone else can supply the reference, that would be cool. >>Static link chains work fine. > >Yes, they do work - but they are the lowest performance option for >non-local accesses. The first non-local level has the same performance as a display. Are deeper accesses frequent enough to pay for the increased cost of maintaining the display? The paper's answer was no, but that may vary from language to language (and with usage patterns). Relative hardware costs have also shifted somewhat. Bottom line: Better reevaluate the options yourself. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2018-03-05 15:01 +0000 |
| Message-ID | <18-03-018@comp.compilers> |
| In reply to | #1971 |
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes: >George Neuner <gneuner2@comcast.net> writes: >>I'd really like to see that paper to find out what they are comparing >>to display [and using what language]. > >The language was probably Algol or one of it's direct offspring. > >Anyway, if I get around to it, I will search for a reference to the >paper in our library, but if someone else can supply the reference, >that would be cool. I looked in some older compiler books, but did not find a reference to such a paper. What I found: * Aho and Ullman (1977) only mention displays as ways to implement static scoping. * Aho, Sethi, Ullman (1986) also mention access links (static links). * Fischer and Leblanc (1988) discuss both, and say the following: |For example, a study of reference patterns in a wide variety of |Simula 67 programs (Magnusson 1982) found that fully 80% if all |references were to variables at the outermost level or to local |variables. Another 17% of the references were to the block |immediately surrounding the local one. [...] Thus few references |will actually require following the static chain, and most of those |will require only one level of such indirection. |Magnussen, K. 1982. "Identifier references in Simula 67 programs." |Simula Newsletter 10(2): They do not write about performance disadvantages of displays, though. I also looked if I could find Wirth's response to Cohen's paper, and I found it: <https://dl.acm.org/citation.cfm?id=214521> (probably behind a paywall for many). And there he gave the reason for not using a display for accessing non-local variables: |Upon closer analysis, we decided to discard the use of a display |that was already in our first Pascal compiler (1969). The reason |was that (1) the display requires updating for every procedure call |and return (!) that causes a change of context, and (2) variables at |intermediate levels (neither local nor global) are referenced quite |rarely. As a result, the maintenance of a display turned out to be |more costly than the inefficient access of nonlocal variables via a |static link. So apparently there was no influential paper in the 60s or 70s that damned the display. Wirth and his collaborators discovered the performance disadvantages of displays by original research, but apparently did not publish this piece of knowledge before 1991. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2018-03-05 15:51 -0500 |
| Message-ID | <18-03-022@comp.compilers> |
| In reply to | #1975 |
On Mon, 05 Mar 2018 15:01:33 GMT, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote: >I also looked if I could find Wirth's response to Cohen's paper, and I >found it: > ><https://dl.acm.org/citation.cfm?id=214521> (probably behind a paywall >for many). > >And there he gave the reason for not using a display for accessing >non-local variables: > > |Upon closer analysis, we decided to discard the use of a display > |that was already in our first Pascal compiler (1969). The reason > |was that (1) the display requires updating for every procedure call > |and return (!) that causes a change of context, and (2) variables at > |intermediate levels (neither local nor global) are referenced quite > |rarely. As a result, the maintenance of a display turned out to be > |more costly than the inefficient access of nonlocal variables via a > |static link. > >So apparently there was no influential paper in the 60s or 70s that >damned the display. Wirth and his collaborators discovered the >performance disadvantages of displays by original research, but >apparently did not publish this piece of knowledge before 1991. > >- anton I may be mistaken on the timeline, but it seems to me that in 1990 Wirth already was working on Oberon, which, like Modula-2, had visibility rules that were problematic for a display implementation. I think he also had seen how people had used Modula-2, and before that the "unit" [modular] extended Pascals. Programmers in these languages tended to prefer the modules to control visibility and to write flatter code that made less use of nested functions. OTOOH, there were the Lisp programmers. Even in Lisp, deep scope accesses are not that common. But they aren't exactly rare either ... sometimes you just have to step-wise build up the context for a function. I think the real issue was that, by 1990, everybody in the language research community knew that closure conversion was a better choice than either static chains or displays: not just for performance, but because it enables more freedom and functionality in the language. YMMV, George
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2018-03-05 14:39 -0500 |
| Message-ID | <18-03-020@comp.compilers> |
| In reply to | #1971 |
On Fri, 02 Mar 2018 09:30:42 GMT, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote: >George Neuner <gneuner2@comcast.net> writes: >>On Sat, 17 Feb 2018 16:39:04 GMT, anton@mips.complang.tuwien.ac.at >>(Anton Ertl) wrote: >> >>>Kaz Kylheku <217-679-0842@kylheku.com> writes: >>>>On 2018-02-15, George Neuner <gneuner2@comcast.net> wrote: >>>>> No worries. IME, displays don't get much respect from modern >>>>> textbooks - they are mentioned mostly in passing. >>>> >>>>This is probably because of >>> >>>This is because displays were found to be more costly for Algol-like >>>languages >> >>More costly than what? > >Static link chains. > >>> IIRC the additional cost is in updating the display on calls >>>and returns. >> >>Which is no different from the overhead to maintain static links > >Wrong. Consider the following nesting of functions: > >A > B > C >D > >With a display, in C you have built up a display pointing to the >frames of C, B, and A. When performing a call from C to D, you throw >that away and replace it with a display pointing to just the frame of >D. When returning from D, you have to restore the old display. No. D and A are at the same lexical level. On entry, D will replace A's entry in the display, and will restore A's entry on exit. The scope rules say D can't see or call B or C. If it calls A, then A replaces it at the same lexical level. When A returns, D's link is restored. No problem. And no need to preserve the whole display. >If you don't maintain a static link chain, you have to save the >complete display of C on the call and restore it on return. No you don't. >Note that >D may itself call functions that need more display, so you don't get >away with just saving and restoring the first slot of the display. >IIRC this was the variant looked at in the paper that concluded that >displays are more costly. That is simply wrong unless you change the whole semantics. As long as call sites are restricted to being within the definition scope of the called function [as is true in Pascal], and every function does its little part to maintain the display, then there is NO POSSIBLE function call sequence that requires saving/restoring the whole display. Each function needs only save/restore the single display entry for its own lexical level. Scope visibility rules dictate that no function can see lower into the display than its own level - so what those entries point to is irrelevant. >If you maintain a static link chain, you can restore the display of C >from the static link chain, but you now have the cost of maintaining >the static link chain and in addition the cost of maintaining the >display, which is obviously more costly than just the static chain >alone. If you go that way, you can actually produce an intermediate >scheme, where you cache static link accesses that you need, and the >cached accesses are similar to (the needed part of) a display. First, your example does not show any need for a static chain. Second: the cost to maintain a display is just a single extra pointer field in the call frame, one pointer copy and 2 pointer stores. FYI, the amortized cost of a display MAY be less than to maintain the static chain. For a static chain: - a call sideways in the same scope needs the same link as the caller. That can be fetched from the caller's frame. but - a call upwards to a higher scope (such as C->D in your example) needs a link pointer which it doesn't have available locally. The preamble code must go up the caller's chain to find the correct scope link to put in the new frame. OTOH, the steps to maintain the display are the same regardless of what function is called: - save the display entry for my lexical level. - replace the display entry with a pointer to my frame. - restore the display entry on exit. >>I'd really like to see that paper to find out what they are comparing >>to display [and using what language]. > >The language was probably Algol or one of it's direct offspring. > >Anyway, if I get around to it, I will search for a reference to the >paper in our library, but if someone else can supply the reference, >that would be cool. I really would like to see it. Algol doesn't do anything that would make a display more work than static chaining. But it would matter what code base they studied and how those programs were written. As you mention below, single level accesses are very cheap with static links [deeper accesses are not]. I can't agree or disagree with a paper I haven't read. And certainly I can't refute it's conclusions. >>>Static link chains work fine. >> >>Yes, they do work - but they are the lowest performance option for >>non-local accesses. > >The first non-local level has the same performance as a display. Are >deeper accesses frequent enough to pay for the increased cost of >maintaining the display? The paper's answer was no, but that may vary >from language to language (and with usage patterns). That's true. If the majority of non-local accesses are a single level away, then the display has no advantage for access. But it still may have an advantage for maintenance. It only takes a few upward function calls or deep non-local accesses to F_ performance of a static chain. >hardware costs have also shifted somewhat. Bottom line: Better >reevaluate the options yourself. Good advice !!! There still are plenty of uses for simple Pascal-like languages. Many DSL applications simply don't need multi-threading, objects, or other stuff that a general purpose language might need. Displays are just a tool in your toolkit ... you need to read the directions and use them appropriately. >- anton George
[toc] | [prev] | [next] | [standalone]
| From | John Levine <johnl@taugh.com> |
|---|---|
| Date | 2018-03-06 04:31 +0000 |
| Message-ID | <18-03-024@comp.compilers> |
| In reply to | #1976 |
>>>>This is because displays were found to be more costly for Algol-like >>>>languages ... >> >>>> IIRC the additional cost is in updating the display on calls >>>>and returns. I get the impression that displays describe two different things. One is a static array that has one entry for each level of lexical scoping. In a routine declared N levels (zero based) down, its prolog saves the Nth entry in the display and replaces it with a pointer to the current stack frame, and the epilog restores it. It can assume that higher level routines have correctly set entries 0:N-1. That seems pretty efficient. Assuming the saved pointer is at a known location in each stack frame you can do a longjmp and unwind stacks without too much pain. The other is the same array, but with a copy of the current display in each routine's stack frame. This is slower at call time but probably faster to use on machines like S/360 without direct addressing since the display on the stack is addressable from the frame pointer, but static data needs to load a pointer from a constant pool. A longjmp doesn't need anything special since stacked displays go away when the stack frames go away. R's, John
[toc] | [prev] | [next] | [standalone]
Page 1 of 2 [1] 2 Next page →
Back to top | Article view | comp.compilers
csiph-web