Groups > comp.compilers > #163 > unrolled thread

How to handle qualified identifiers such as x.y in a Pascal-like language

Started by	noitalmost <noitalmost@cox.net>
First post	2011-06-20 15:43 -0400
Last post	2011-06-29 15:51 -0700
Articles	20 on this page of 21 — 9 participants

Back to article view | Back to comp.compilers

  How to handle qualified identifiers such as x.y in a Pascal-like language noitalmost <noitalmost@cox.net> - 2011-06-20 15:43 -0400
    Re: How to handle qualified identifiers such as x.y in a Pascal-like language torbenm@diku.dk (Torben Ægidius Mogensen) - 2011-06-22 10:57 +0200
    Re: How to handle qualified identifiers such as x.y in a Pascal-like language Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2011-06-22 11:47 +0100
      Re: How to handle qualified identifiers such as x.y in a Pascal-like language George Neuner <gneuner2@comcast.net> - 2011-06-24 18:13 -0400
        Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-06-29 12:31 -0700
          Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-07-01 12:46 -0700
          Re: How to handle qualified identifiers such as x.y in a Pascal-like language anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2011-07-02 17:13 +0000
            Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-07-03 13:14 -0700
              Re: How to handle qualified identifiers such as x.y in a Pascal-like language torbenm@diku.dk (Torben Ægidius Mogensen) - 2011-07-07 10:27 +0200
              Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-07-07 04:14 -0700
          Re: How to handle qualified identifiers such as x.y in a Pascal-like language George Neuner <gneuner2@comcast.net> - 2011-07-02 16:58 -0400
    Re: How to handle qualified identifiers such as x.y in a Pascal-like language Gene <gene.ressler@gmail.com> - 2011-06-22 19:21 -0700
      Re: How to handle qualified identifiers such as x.y in a Pascal-like language Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2011-06-24 07:56 +0100
        Re: How to handle qualified identifiers such as x.y in a Pascal-like language Gene <gene.ressler@gmail.com> - 2011-06-24 19:19 -0700
          Re: How to handle qualified identifiers such as x.y in a Pascal-like language Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2011-06-25 11:55 +0100
            Re: How to handle qualified identifiers such as x.y in a Pascal-like language noitalmost <noitalmost@cox.net> - 2011-06-29 13:13 -0400
      Re: How to handle qualified identifiers such as x.y in a Pascal-like language "[Linux Magazine]" <uu3kw29sb7@snkmail.com> - 2011-06-24 13:58 +0200
        Re: How to handle qualified identifiers such as x.y in a Pascal-like language George Neuner <gneuner2@comcast.net> - 2011-06-25 17:11 -0400
    Re: How to handle qualified identifiers such as x.y in a Pascal-like language noitalmost <noitalmost@cox.net> - 2011-06-23 12:43 -0400
      Re: How to handle qualified identifiers such as x.y in a Pascal-like language Tony Finch <dot@dotat.at> - 2011-06-29 18:55 +0100
      Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-06-29 15:51 -0700

Page 1 of 2 [1] 2 Next page →

#163 — How to handle qualified identifiers such as x.y in a Pascal-like language

From	noitalmost <noitalmost@cox.net>
Date	2011-06-20 15:43 -0400
Subject	How to handle qualified identifiers such as x.y in a Pascal-like language
Message-ID	<11-06-037@comp.compilers>

Here's a silly example program in my language syntax.
program P:
	var x : int;

	procedure s() :
		var x : int;

		procedure t() :
			var x : int;
			x := 3;
			s.x := x + 10;
		end t;

		x := 2;
		t();
		P.x := x + 10;
	end s;

	x := 1;
	s();
	print x;	# should output 23
end P.


I know how to handle the notation if the qualified identifier is a
record, since there's no change in the stack frame. It's just an
offset in the current frame.

What I don't quite understand is how to parse access to a variable in
an outer stack frame. And I think this is similar to the problem of
nested procedures (which I also don't quite know how to handle).

My parser is recursive descent and produces an AST. The AST consists
of nodes such as Program, ProcDef, VarDef, WhileStmt, AssignStmt,
Ident, Num, and ProcCall. The nodes ProcDef and VarDef are the nodes
in the symbol tables, while Ident represents an identifier as used in
a statement. Thus, an Ident node contains a pointer to a definition in
a symbol table. That definition has a field which gives an offset
relative to the current stack frame. A Program or a ProcDef contains a
symbol table and a list of code statements. The interpreter walks the
AST. When it encounters a node such as ProcDef, it creates a new stack
frame, uses the procedure's symbol table to allocate local variables,
and then walks its list of code statements.

So when I encounter
P.x := x + 10;
in the above program, I get an AST fragment something like
Assign( P.x, Add( x, 10) )
where x is an Ident and 10 is a Num. But what do I do with P.x? Should the
parser do all the work and make P.x simply an Ident (whose symTabEntry points
to the outermost x? Then Ident would need a frameCnt field so the interpreter
knows how many stack frames to back out of. In the above example, frameCnt
would be 1, while reference to a local variable would have frameCnt of 0.

Is this a reasonable way to approach the problem?
[Pretty much.  The standard way to handle references to an enclosing
scope is with a display, the calling procedure passes the enclosing
stack frame addresses as hidden parameters.  If your language doesn't
allow variable sized declarations or recursion, P.x will be at a fixed
distance below the current stack frame so you can address it directly,
but in a more general case, you need the display.  See any 1970s
compiler text for details. -John]

[toc] | [next] | [standalone]

#164

From	torbenm@diku.dk (Torben Ægidius Mogensen)
Date	2011-06-22 10:57 +0200
Message-ID	<11-06-038@comp.compilers>
In reply to	#163

noitalmost <noitalmost@cox.net> writes:

> What I don't quite understand is how to parse access to a variable in
> an outer stack frame. And I think this is similar to the problem of
> nested procedures (which I also don't quite know how to handle).

Parsing qualified names is easy enough, so I think you mean how to get
access to a variable in an outer frame.

John mentioned displays, but I think it is easier to use static links.
Displays can be slightly more efficient, though.

Basically, the stack frame for a procedure g has a pointer (the static
link) to the frame of the procedure f in which g is declared.  Note that
this is not necessarily the previous frame, as g can be called from a
procedure declared locally in f or g.

In the simplest case, all variable for f are stored in f's frame, so g
can access these by following the static link to get to f's frame and
then using an offset from this to find the variable.  If you need to
access a variable from a scope even further out than f, you first follow
the static link to f's frame and then in f's frame find the static link
to its outer scope and so on, until you find the right frame.

Your compile-time symbol table should contain information about the
scope depth and offset of any variable, so you (by comparing to the
current scope depth) can see how many static links you must follow.

If you register-allocate variables, you must make sure that variables
declared in outer scopes are stored at known offsets in the frames of
their procedure at the time the are accessed in inner scopes.  This can
be done by using a caller-saves calling convention: At a call, store all
live variables in known offsets in the frame.  Note that variables
accessed in inner scopes should be considered live at the call site.  If
you use a mixed caller-saves/callee-saves convention, allocate all
variables that are accessed in inner scope in caller-saves registers and
make sure these are stored at fixed offsets in the frame.  If using a
pure callee-saves strategy, spill variables that are called from inner
scopes to the frame, so they have a known location.

An alternative is lambda lifting.  This transforms the program so all
procedures are global.  Any variables from (previously) outer scopes
that are acccessed inside a lifted procedure are now passed as extra
reference parameters to the procedure.  So if g is lifted out of f and
accesses f's local variable x, g is given the address of x as an extra
parameter.  You need to handle reference parameters like Pascal's VAR
parameters or the address-of (&) operator in C.  Basically, any
variable that gets its address taken in either way should be stored in
the frame of its procedure.  This can be done by spilling it.
Variables that can not be modified after initialisation can be passed
by value rather than reference.

	Torben

[toc] | [prev] | [next] | [standalone]

#165

From	Hans-Peter Diettrich <DrDiettrich1@aol.com>
Date	2011-06-22 11:47 +0100
Message-ID	<11-06-039@comp.compilers>
In reply to	#163

noitalmost schrieb:
> What I don't quite understand is how to parse access to a variable in
> an outer stack frame. And I think this is similar to the problem of
> nested procedures (which I also don't quite know how to handle).

First of all: local procedures using local variables in an *outer*
stackframe are problematic. Such a language feature deserves much
work, in detail when you also want to allow to use local procedures as
callbacks (procedure pointers). I'd drop that from my language.

Otherwise John already explained how to pass a reference to outer
stackspace(s) to the nested procedure, as an hidden parameter. But
this added parameter will disallow to give away references to the
local subroutine, because the caller then doesn't know about that
parameter.

DoDi

[toc] | [prev] | [next] | [standalone]

#171

From	George Neuner <gneuner2@comcast.net>
Date	2011-06-24 18:13 -0400
Message-ID	<11-06-045@comp.compilers>
In reply to	#165

On Wed, 22 Jun 2011 11:47:19 +0100, Hans-Peter Diettrich
<DrDiettrich1@aol.com> wrote:

>noitalmost schrieb:
>> What I don't quite understand is how to parse access to a variable in
>> an outer stack frame. And I think this is similar to the problem of
>> nested procedures (which I also don't quite know how to handle).
>
>First of all: local procedures using local variables in an *outer*
>stackframe are problematic. Such a language feature deserves much
>work, in detail when you also want to allow to use local procedures as
>callbacks (procedure pointers). I'd drop that from my language.

But doing so sacrifices quite a bit of power.

In a Pascal-like language having strict stack semantics, the best way
to handle non-local variables is to identify and move them all into a
defined structure in the stack frame of the outermost enclosing
function, then add a hidden pointer argument to all the inner
functions and pass the address of the structure down the call chain.

The same analysis allows for creating persistent closures by
allocating the non-local variable structure on the heap instead of on
the stack.
[Of course, then you need other mechanisms to invoke the closures and
clean up after them when they are no longer needed ... mechanisms
which are beyond the scope of this discussion.]

George

[toc] | [prev] | [next] | [standalone]

#180

From	BGB <cr88192@hotmail.com>
Date	2011-06-29 12:31 -0700
Message-ID	<11-07-004@comp.compilers>
In reply to	#171

On 6/24/2011 3:13 PM, George Neuner wrote:
> On Wed, 22 Jun 2011 11:47:19 +0100, Hans-Peter Diettrich
> <DrDiettrich1@aol.com>  wrote:
>
>> noitalmost schrieb:
>>> What I don't quite understand is how to parse access to a variable in
>>> an outer stack frame. And I think this is similar to the problem of
>>> nested procedures (which I also don't quite know how to handle).
>>
>> First of all: local procedures using local variables in an *outer*
>> stackframe are problematic. Such a language feature deserves much
>> work, in detail when you also want to allow to use local procedures as
>> callbacks (procedure pointers). I'd drop that from my language.
>
> But doing so sacrifices quite a bit of power.
>
> In a Pascal-like language having strict stack semantics, the best way
> to handle non-local variables is to identify and move them all into a
> defined structure in the stack frame of the outermost enclosing
> function, then add a hidden pointer argument to all the inner
> functions and pass the address of the structure down the call chain.
>
> The same analysis allows for creating persistent closures by
> allocating the non-local variable structure on the heap instead of on
> the stack.
> [Of course, then you need other mechanisms to invoke the closures and
> clean up after them when they are no longer needed ... mechanisms
> which are beyond the scope of this discussion.]

well, I may elaborate, pardon if I am going outside the allowed scope
here...

one nifty trick I came up with in the context of my own stuff is:
allow using GC (Garbage Collection) for machine-code thunks;
create the actual function as accepting an extra argument to the
closed-over binding frames (chained heap-allocated structs, also GC'ed);
create a "closure object", which holds the captured frame, and re-calls
the actual function with the frame reference added to the argument list.

this process is fairly simple on x86 (with cdecl), but sadly gets very
nasty with the SysV/AMD64 calling convention (among other things, making
many such operations a good deal more expensive and error-prone than
their 32-bit counterparts).

even then, there are obvious deficiencies in my support of SysV/AMD64,
for example, a lot of my code doesn't handle variable argument lists
correctly, and technically the way literal structs are passed in my
stuff is just wrong (struct passing rules are like in Win64, involving
passing pointers to-be-copied memory, rather than decomposing them into
registers). but, grr...

I am still left to much wish Linux/... had adopted the Win64 ABI
instead, or at least something less complicated to work with. I am left
to consider later re-adopting an alternative calling convention in this
case, using SysV/AMD64 mostly only for external interfacing (possible:
AMD64 callee/caller-save status kept, but using Win64 argument passing,
and using name-decoration or name-mangling).

I technically supported closures in my C compiler (using the same syntax
as GCC's nested functions), but this feature wasn't really used, since I
mostly ended up using only "standard C friendly" features, or ones which
could be emulated via macros. later, my C compiler has mostly fallen
victim to bit-rot and my inability to effectively debug it (leaving me
to mostly use native compilers for C).

however, closures in the above form are still used, in a slightly less
nice-looking form, in plain C.

also they are used much more in my own BGBScript language, which is
mostly in the same language family as JavaScript and ActionScript (and
mostly conforms with ECMA-262). this language has closures as a default
feature.

the above mechanism (creating executable thunks) is mostly only used for
external interfacing though, as most internal calls are via
dynamically-typed "function objects", which are typically optimized some
by caching an "apply handler" (such as in class vtables/...), which are
called with the function-object and a pointer to the argument list or
similar (calling into a native ABI, such as SysV/AMD64, generally
involves "re-packing" the argument list at the last moment).

note: it is technically possible to call pretty much anything which has
an apply handler (note: most handlers are registered with type-specific
vtables, and there are many other types of handlers as well for various
operations).

how much of this would make sense in others' projects, I don't really
know...
[This strikes me as all stuff the Lisp community figured out in about 1980.
-John]

[toc] | [prev] | [next] | [standalone]

#182

From	BGB <cr88192@hotmail.com>
Date	2011-07-01 12:46 -0700
Message-ID	<11-07-006@comp.compilers>
In reply to	#180

On 6/29/2011 12:31 PM, BGB wrote:

> how much of this would make sense in others' projects, I don't really
> know...
> [This strikes me as all stuff the Lisp community figured out in about 1980.
> -John]

well, sadly, I don't have any good sources of information related to
VM implementation and design, as most of what I can find is
fragmentary information on the internet (much of my VM technology is
built up from trivia I had read about other VMs, or figured out myself
as part of an ongoing process).

my original exposure to how such a VM would work was gained mostly by
looking over the source for GNU Guile, and throwing together a sort of
makeshift clone (I think at the time I also looked at SIOD and PLT and
others for ideas).

most other information is unusably high-level (mostly focusing on
high-level language features or "pie in the sky"
language/computing/...  concepts), rather than how to implement them
effectively, or deal with more practical concerns (such as dealing
calls to/from C or C++ code and with matters of sharing data and
structures).

actually, many people refer to a lot of these concerns derisively as
"implementation details", which personally I find a bit frustrating.

most information I can find on compiler technology often doesn't match
well with my architecture (often focuses on statically-typed languages
using an AST-driven or SSA-based processes).

in my case I use a type-agnostic stack-machine, and handle types much
later in the process, generally mixed with register allocation and
emitting ASM.

most other information I have found has been of commercial origin
(mostly technical documentation for various products, ...).

(decided to leave out a big description of the overall compilation
process...).

so, it is maybe fair enough that I don't know everything that others
have done.

too bad there is not a big wiki or similar dedicated to compiler and VM
design and implementation topics, without being dedicated to a
particular project or technology.

say, if it would be sort of like Wikipedia, but much more
technically-oriented and topic-based (rather than a random mis-mash of
comments and "stream of consciousness" type stuff like on c2wiki...).

[toc] | [prev] | [next] | [standalone]

#184

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2011-07-02 17:13 +0000
Message-ID	<11-07-008@comp.compilers>
In reply to	#180

BGB <cr88192@hotmail.com> writes:
>I am still left to much wish Linux/... had adopted the Win64 ABI
>instead

That would have been quite an achievement, because the Windows x64 ABI
came out quite some time after Linux and others adopted the SysV ABI,
so it would have required looking into the future.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#187

From	BGB <cr88192@hotmail.com>
Date	2011-07-03 13:14 -0700
Message-ID	<11-07-011@comp.compilers>
In reply to	#184

On 7/2/2011 10:13 AM, Anton Ertl wrote:
> BGB<cr88192@hotmail.com>  writes:
>> I am still left to much wish Linux/... had adopted the Win64 ABI
>> instead
>
> That would have been quite an achievement, because the Windows x64 ABI
> came out quite some time after Linux and others adopted the SysV ABI,
> so it would have required looking into the future.
>

an idle wish doesn't need to be within the realm of "could have actually
happened...".

I am half still considering of using a custom calling-convention for
internal calls. I decided to leave out a more detailed description of
the considered register assignments, argument passing, ... and if/when I
would do something like this is uncertain.

I guess it is mostly just the annoyance that SysV is sufficiently
complex that none of my reflective-call stuff interacting with it can do
so particularly efficiently or correctly (lots of nasty cruft here).

also, it seems to interact badly with my C coding practices in general.

I suspect to work well, SysV/AMD64 expects functions accepting lots of
arguments and with most of the execution/computation being in leaf
functions.

coding practices where most of the computation is in non-leaf functions
and leaf-functions are generally short/trivial (such as performing a
simple operation or returning a status value), ... seem to not be the
ideal use-case.

all this seems to lead to my code generally performing somewhat better
on 32-bit x86 systems and on Win64.

for example, my assembler seems to run about 2x slower on 64-bit Linux
than on 64-bit Windows in my tests, ...

I don't imagine my coding practices are all that novel though.

however, other people have benchmarked other programs, and it seems many
types of programs (apparently especially data-compression and encryption
programs) seem to perform better on Linux x86-64 than on Win64 or on
32-bit systems.

it generally seems to be the reverse for larger programs though (web
browsers, games, office-type apps, ...) which seem to generally perform
better on 32-bit systems and on Windows.

granted, I can't claim any objectivity here, given my personal dislike
(in general) of the SysV/AMD64 ABI...
[It is my impression that a lot of languages that allow closures and the
like end up with their own calling sequences. -John]

[toc] | [prev] | [next] | [standalone]

#190

From	torbenm@diku.dk (Torben Ægidius Mogensen)
Date	2011-07-07 10:27 +0200
Message-ID	<11-07-014@comp.compilers>
In reply to	#187

> [It is my impression that a lot of languages that allow closures and the
> like end up with their own calling sequences. -John]

Not surprising, since most vendor procedure call standards are
targeted for C-like languages: Flat scope, single result, no
exceptions, no tail call optimisation, and so on.  Once you go beyond
that you will need to either extend the calling standard or do
something quite different.

For example, most modern call standards are almost exclusively
callee-saves: While some registers are designated as caller-saves,
these are often only used for variables that are not live across any
call, so they are never saved.  Variables live across calls are all
stored in callee-saves registers (or spilled).

This makes both tail call optimisation and exceptions more costly.  The
first because you have to restore the callee-saves registers before the
tail-call jump (unless it is to the same procedure, which is why many
compilers only support tail recursion and not general tail calls) and
the latter because the registers you have to restore at an exception
will be stored in many different frames, so you will have to unwind the
stack to restore them, where a pure caller-saves strategy will allow you
to go directly to the frame of the exception handler and take the saved
live registers from this.

Sure, at leaf calls a callee-saves strategies might avoid saving
registers at all, but inlining will get you the same (and other)
benefits, though it will increase code size.

	Torben

[toc] | [prev] | [next] | [standalone]

#193

From	BGB <cr88192@hotmail.com>
Date	2011-07-07 04:14 -0700
Message-ID	<11-07-017@comp.compilers>
In reply to	#187

On 7/3/2011 1:14 PM, BGB wrote:
<snip>
> granted, I can't claim any objectivity here, given my personal dislike
> (in general) of the SysV/AMD64 ABI...
> [It is my impression that a lot of languages that allow closures and the
> like end up with their own calling sequences. -John]

fair enough, however, most calling conventions are not nearly so
antisocial towards external interfacing. glueing code together with
cdecl (AKA: SysV/i386 or similar) or Win64 is usually a relatively
straightforward process (albeit there is the complexity in the Win64
case that simply stuffing a single argument onto the stack would lead
to a misaligned stack, and one still has to generally know the number
and types of arguments to adjust for this).

SysV/AMD64 is relatively unique in requiring one to essentially rebuild
the arguments list to make minor adjustments (making these tasks
potentially a bit more expensive).

but, yeah, there are merits potentially to devising ones own ABI. for
example, there are many tasks that could be potentially done more
efficiently, ...

also, if the new ABI can be built on top of things that can be done in
C, then it may sidestep having to do a bunch of nasty ASM-based cruft to
make it work (a current downside of some of my present stuff).

but, my reason for trying to stick so much to the C ABIs, is that it
makes it generally much easier to interface things with C.

meanwhile, my assembler (previously x86/x86-64 only) now also targets
ARM. however, in general, ARM systems are giving me lots of trouble,
making the prospect of actually testing the thing on/with ARM a bit more
problematic at the moment.

[toc] | [prev] | [next] | [standalone]

#185

From	George Neuner <gneuner2@comcast.net>
Date	2011-07-02 16:58 -0400
Message-ID	<11-07-009@comp.compilers>
In reply to	#180

On Wed, 29 Jun 2011 12:31:40 -0700, BGB <cr88192@hotmail.com> wrote:

<some interesting stuff about his own implementation of closure
invocation and GC>

>[This strikes me as all stuff the Lisp community figured out in about
>1980.  -John]

Most of it much earlier - the only thing yet missing is any kind of
standard call convention - apart from foreign calls (e.g., to C
libraries) nearly every Lisp handles its internal function calling
differently.

Closure invocation was implemented in the very first Lisp
implementation (1965?), as was reference counting GC.  By 1970 tracing
GC was rising and reference counting was falling out of favor.
Dijkstra and Lamport's incremental software GC  and the Lisp-2
incremental GC which used VMM page protection to implement the
tri-color abstraction, both debuted in 1967.  Cheney described his
semi-space copying collector in 1970.

Unfortunately, I started this by mentioning closures in my post.  I
didn't intend to clutter the thread with concepts that are too
advanced for the newbie OP, but rather just to indicate that the
method I was advocating to solve his problem had utility beyond his
current needs.

George

[toc] | [prev] | [next] | [standalone]

#166

From	Gene <gene.ressler@gmail.com>
Date	2011-06-22 19:21 -0700
Message-ID	<11-06-040@comp.compilers>
In reply to	#163

On Jun 20, 3:43 pm, noitalmost <noitalm...@cox.net> wrote:
> What I don't quite understand is how to parse access to a variable in
> an outer stack frame. And I think this is similar to the problem of
> nested procedures (which I also don't quite know how to handle).

P.x and x have different scopes, which in this case are obviously
nested.  There are two separate issues: 1) How to represent different,
potentially nested scopes in a symbol table and 2) what code to
generate.  As John says, any compiler text will deal with both.

For 1) perhaps the simplest to implement is a chain of dictionaries.
Every time a scope opens, create a new dictionary, chain it onto the
current one, and start using the new.  When a scope closes, throw away
the current dictionary and resume using the next in the chain.

In your sample program's case, there will be a dictionary for
variables of program (P) scope and then one for each function.  Always
the P dictionary will be the last in the chain.  Nested functions will
each add to this chain, one per nesting level.

Insertions occur for each declaration, always in the current
dictionary.  Lookups of plain identifiers check each dictionary in the
chain until the id is found. The number of chain links followed before
finding the id is important. Stay tuned.

When the language allows (as does yours) explicit scope paths, each
dictionary D should also be paired with its identifier id(D) for the
scope that caused it to be created, here function or program name. The
lookup of a qualified id reference then proceeds right-to-left.  In
your example to process s.x, work along the dictionary chain looking
for the x.  When it's found (in this case immediately) in a given
dictionary D, match id(D) with s.  If the path is longer, say P.s.x,
you'd continue along the chain matching dictionary id's with path
elements.  If the whole path matches, you've found the right x.
Otherwise keep looking for x's further down the chain.

When generating code, each function call creates an "activation
record" on the stack to hold parameters and local variables.  The
simplest way to implement static scopes is with static links.  A SL is
just a pointer at the same position in each activation record that
points to the previous activation of the statically enclosing
function.  (It's the norm to have SL's point to the same location in
the AR as the frame pointer would.)

It turns out that when you compile a function call, the code that
computes the new AR's static link must dereference the link of the
current AR N times where N>=0 is the number of symbol table links
needed to find the name of the called function during parsing. (N=0
corresponds to the current frame pointer. N=1 is the current AR's SL,
N=2 is contents of current AR's SL, etc.)

Cool or what?  In this manner, static links always form a linked list
of ARs with one AR per scope in the static nesting of the currently
running function.  So when you generate code for an identifier
reference x, it will again need to dereference the current static link
N times where N>=0 was the number of links needed to find x in the
symbol table.

If you read this rather confusing recipe a few times, it will at some
point click and you'll see the beauty. If you need a picture, consult
the aforementioned compiler or programming languages textbook.

The "display" method John is talking about merely replaces the static
links between ARs with what is effectively an array of pointers to
ARs. Displays can be more efficient than static links, especially when
functions are deeply nested and/or when the compiler is free to put
oft-used display elements in registers, but implementing them
correctly may be trickier, especially if your language allows function
pointers and/or threads.

[toc] | [prev] | [next] | [standalone]

#169

From	Hans-Peter Diettrich <DrDiettrich1@aol.com>
Date	2011-06-24 07:56 +0100
Message-ID	<11-06-043@comp.compilers>
In reply to	#166

Gene schrieb:

> When the language allows (as does yours) explicit scope paths, each
> dictionary D should also be paired with its identifier id(D) for the
> scope that caused it to be created, here function or program name. The
> lookup of a qualified id reference then proceeds right-to-left.

This IMO is the wrong way. Structured data types require left-to-right
evaluation of the qualified name, i.e. for P.x an identifier P would
be searched first, followed by a search for x in P's *local*
dictionary.

This is problematic with the intended approach, to refer to the
*current* module by its name (P), because the dictionary of P will
already be in the scope-stack, and this one does *not normally*
include the identifier P itself. Instead another (synthetic) outer
scope would be required, that only contains the P identifier, which
refers back to P's scope. Then the search for an global identifier x
would end up immediately in P's scope, of module-level identifiers,
while the qualified search had to go one entry deeper, to resolve
identifier P, which then refers to the (already traversed) P scope.

That's why e.g. C++ uses (AFAIR) "::" to denote the global (static)
scope, not the name of any module. This allows to start the search for
the following identifier(s) immediately in the correct scope, with no
chance for possible mismatches with identifiers P in some other active
scope.

In ObjectPascal (Delphi, FPC) every module (unit) has a public scope.
When a unit is "used", its public scope is pushed onto the scope stack.
This makes available all public external identifiers without
qualification. Newer Wirthian languages instead require full
qualification for references to external symbols, what reduces the
number of scopes to be searched, and eliminates errors that otherwise
can occur when the sequence of the used units is changed, for some reason.

> In your example to process s.x, work along the dictionary chain
> looking for the x.  When it's found (in this case immediately) in a
> given dictionary D, match id(D) with s.  If the path is longer, say
> P.s.x, you'd continue along the chain matching dictionary id's with
> path elements.  If the whole path matches, you've found the right x.
> Otherwise keep looking for x's further down the chain.

This would require that *all* local dictionaries are in the search path,
what IMO doesn't make sense (performance wise), because a full match of
s.x only can occur in the s dictionary, regardless of how many other
scopes contain an x.

DoDi

[toc] | [prev] | [next] | [standalone]

#172

From	Gene <gene.ressler@gmail.com>
Date	2011-06-24 19:19 -0700
Message-ID	<11-06-046@comp.compilers>
In reply to	#169

On Jun 24, 2:56 am, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote:
> Gene schrieb:
>
> > When the language allows (as does yours) explicit scope paths, each
> > dictionary D should also be paired with its identifier id(D) for the
> > scope that caused it to be created, here function or program name. The
> > lookup of a qualified id reference then proceeds right-to-left.
>
> This IMO is the wrong way. Structured data types require left-to-right
> evaluation of the qualified name, i.e. for P.x an identifier P would
> be searched first, followed by a search for x in P's *local*
> dictionary.

Certainly there are lots of ways to implement. The one I proposed is
not the very best for speed. It is rather good for space, though, and
it's beautifully simple. Your proposal here though is I think
incorrect.  If we have

procedure P:
  var x;
  procedure P:
    var x;
    ... P.x ...

The P.x reference would conventionally be associated with the inner
x.  If you start searching for P.x outer scope first, you get the
wrong one.

> This is problematic with the intended approach, to refer to the
> *current* module by its name (P), because the dictionary of P will
> already be in the scope-stack, and this one does *not normally*
> include the identifier P itself.

You're right about P not being in its own dictionary.  Each dictionary
lies in a tuple (the pair I spoke of), where the other element is the
identifier of the dictionary's scope.  The identifier will also appear
in the dictionary for the enclosing scope.  The program identifier
lies in no dictionary at all because it has no enclosing scope. With
this convention I can't see a problem as long as procedure/program
names inhabit the same name space as variables.  Maybe an example
would help me see your concern.

I have actually used this technique to implement an assembler with
scopes, and it worked quite well.

> That's why e.g. C++ uses (AFAIR) "::" to denote the global (static)
> scope, not the name of any module. This allows to start the search for
> the following identifier(s) immediately in the correct scope, with no
> chance for possible mismatches with identifiers P in some other active
> scope.

You're right, but this language isn't C++.  Nested procedures are
allowed, and the s.x example shows that complete paths are not
required.

> This would require that *all* local dictionaries are in the search path,
> what IMO doesn't make sense (performance wise), because a full match of
> s.x only can occur in the s dictionary, regardless of how many other
> scopes contain an x.

The dictionaries in the path are those for the static scopes currently
open for parsing.  When parsing a procedure nested N deep, you have N
dictionaries.  In a realistic program, N is often 1, maybe 2, seldom
3, almost never 4...  In my experience id lookups are a trivial cost
of processing compared to once-per-character operations like scanning.
And in practice the most frequent case by far is simple id's in the
inner scope, which need only one dictionary lookup. So you'd be
unlikely to find this algorithm is a bottleneck.

Thanks for the discussion.

[toc] | [prev] | [next] | [standalone]

#173

From	Hans-Peter Diettrich <DrDiettrich1@aol.com>
Date	2011-06-25 11:55 +0100
Message-ID	<11-06-047@comp.compilers>
In reply to	#172

Gene schrieb:

> procedure P:
>   var x;
>   procedure P:
>     var x;
>     ... P.x ...
>
> The P.x reference would conventionally be associated with the inner
> x.  If you start searching for P.x outer scope first, you get the
> wrong one.

I dare to disagree. When P.x would refer to the inner x, how then would
you refer to the outer x?

Also: how would you call the inner P, from inside the outer P?
More general, how to distinguish between an recursive call, and a call
of a local subroutine, when both have the same name?

IMO qualifiers should allow to address identifiers in *external* (here:
outer) scopes, which otherwise would be hidden by a local identifier of
the same name.

A more elaborate example:

program P;
var P;
var x;
procedure P;
   var P: RecordContainingX;
   var x;
   procedure P;
     var x;
     ... P.x ...
     ... P.P.x ...
   end; //inner P
   ... P.x ... //x in var P or in outer procedure P?
   ... P() ... //recursive or local procedure call?

Some general questions:

Should we allow for nested scopes of equal names at all?
(procedure P inside program P and procedure P)

Should we allow for local names, duplicating the scope name?
(var P inside program P or procedure P)?

How to return values from functions?
(by assignment to the function name (Pascal), or to a Result variable
(Delphi), or by explicit "return <value>" statement (C) )

How to distinguish, if ever, between partial (relative) and fully
qualified (absolute) references? Relative to what?

IMO a name in an inner scope hides identifiers of the same name, in all
outer scopes. This should be perfectly allowed.

But duplicate names inside the same scope should be disallowed. This
will make a difference, depending on whether the scope name is part of
its own scope (as you suggest for P.x), or whether the name of a scope
is a member of its parent scope (with inner procedure P being a member
of the outer procedure P).


>> This is problematic with the intended approach, to refer to the
>> *current* module by its name (P), because the dictionary of P will
>> already be in the scope-stack, and this one does *not normally*
>> include the identifier P itself.
>
> You're right about P not being in its own dictionary.  Each dictionary
> lies in a tuple (the pair I spoke of), where the other element is the
> identifier of the dictionary's scope.  The identifier will also appear
> in the dictionary for the enclosing scope.  The program identifier
> lies in no dictionary at all because it has no enclosing scope. With
> this convention I can't see a problem as long as procedure/program
> names inhabit the same name space as variables.  Maybe an example
> would help me see your concern.

When we have these nested socpes:

program P
   procedure P
     procedure P
       procedure P

what (unique!?) scope ID should be assigned to each scope?

IMO this example only reveals, that nested scopes of same names tend to
*fully* hide outer scopes of the same name. I assume that you want to
bypass this restriction, by qualifier evalution from right to left. I.e.
P.x would address an x in the nearest enclosing scope P, while P.P.x
would refer to an x in the next *outer* P scope. Right?

Then an implementation had to search for in identifer x first and, when
found, successively match the next qualifiers with the names of the
containing and enclosing scopes. Very nice, indeed :-)

But then I wonder how such a resolution will fit together with (nested)
scopes of structured types. Imagine a record variable R, then R.x will
refer to the immediate member x of this record. But then R.x.y has to
refer to member y of the scope of x, what IMO means that here the
qualifiers *must* be evaluated left-to-right.

A more concrete example:

type r1 = record x:integer ... end;
type r2 = record x:r1 ... end;
type r3 = record x:r1 ... end;
var R:r2;
... R.x.x ...

Where would you place the scopes for the records, so that their "x"s are
found when starting qualifier resolution from the right? And finally,
when x.x has been matched in scope r2, how to resolve the next qualifier
"R"?

> I have actually used this technique to implement an assembler with
> scopes, and it worked quite well.

Did your assembler include structured types, and if so, how did you
resolve beforementioned situation?


>> That's why e.g. C++ uses (AFAIR) "::" to denote the global (static)
>> scope, not the name of any module. This allows to start the search for
>> the following identifier(s) immediately in the correct scope, with no
>> chance for possible mismatches with identifiers P in some other active
>> scope.
>
> You're right, but this language isn't C++.  Nested procedures are
> allowed, and the s.x example shows that complete paths are not
> required.

Incomplete (relative) paths IMO deserve a unique evaluation order,
either RTL or LTR. Otherwise a different syntax must be used, so that
the starting point can be found out, from which evaluation can proceed
to both ends, e.g. P<-P<-R->x->y.

>> This would require that *all* local dictionaries are in the search path,
>> what IMO doesn't make sense (performance wise), because a full match of
>> s.x only can occur in the s dictionary, regardless of how many other
>> scopes contain an x.
>
> The dictionaries in the path are those for the static scopes currently
> open for parsing.  When parsing a procedure nested N deep, you have N
> dictionaries.  In a realistic program, N is often 1, maybe 2, seldom
> 3, almost never 4...  In my experience id lookups are a trivial cost
> of processing compared to once-per-character operations like scanning.
> And in practice the most frequent case by far is simple id's in the
> inner scope, which need only one dictionary lookup. So you'd be
> unlikely to find this algorithm is a bottleneck.

ACK, the order is O(n) for both strategies. My only objection addresses
the handling of structured types, in your model.

BTW, I once reduced the order of the search to almost O(1), by "merging"
new scopes into a single scope, with a stack for all visible
identifiers, and a hash table. The hashtable refers, after resolution of
name clashes, to the topmost (visible) identifier. When a scope is
closed, it's entries are popped off the stack, and the hashtable is
updated with the references to the next outer (now unidden) identifiers,
from a reference stored in every popped entry. It should be possible to
add qualified references into outer scopes to this model, somehow - my
implementation was for a C parser with no such requirement.

> Thanks for the discussion.

Thanks2 for the inspiration :-)

DoDi

[toc] | [prev] | [next] | [standalone]

#176

From	noitalmost <noitalmost@cox.net>
Date	2011-06-29 13:13 -0400
Message-ID	<11-06-050@comp.compilers>
In reply to	#173

On Saturday, June 25, 2011 06:55:06 am Hans-Peter Diettrich wrote:
> IMO qualifiers should allow to address identifiers in *external* (here:
> outer) scopes, which otherwise would be hidden by a local identifier of
> the same name.

For my language, this is my intention.

> A more elaborate example:
>
> program P;
> var P;
> var x;
> procedure P;
>    var P: RecordContainingX;
>    var x;
>    procedure P;
>      var x;
>      ... P.x ...
>      ... P.P.x ...
>    end; //inner P
>    ... P.x ... //x in var P or in outer procedure P?
		# should be x in the record var P
>    ... P() ... //recursive or local procedure call?
		# local call

My language's scopes for this program would have the following symbols:
global scope:
	P # the program
program P's scope:
	P, x	# variables
	P	# procedure
procedure P's scope:
	P	# record containing x
	x
	P	# procedure
inner procedure P's scope:
	x

Thanks for this example. It has helped me to consider some additional cases. I
never thought about the recursive vs. local call of P() in the above.

I'm parsing the dots left to right, which seems like it will give the results
I desire.

[toc] | [prev] | [next] | [standalone]

#170

From	"[Linux Magazine]" <uu3kw29sb7@snkmail.com>
Date	2011-06-24 13:58 +0200
Message-ID	<11-06-044@comp.compilers>
In reply to	#166

A follow up to the discussion on displays and linked list of stack
frames.  Many languages allow for storing references to procedures in
variables, or passing procedures as parameters.  One example is the
GNU variant of C and C++ implemented in GCC, which allows for both AND
nested procedures.

If thread starters language include storing references to procedures
and/or passing procedures as parameters, then thread starter needs to
consider, how to store references to procedures, and he better
consider it now.

When using linked lists of stack frames, the normal way of storing
references to procedures is to store a reference to the start of the
code implementing the procedure and a reference to the start of the
linked list of stack frames.  Thus you have a record with two fields.
When using displays you have the problem of storing the display
somewhere.  In some cases GCC uses something like trapezes for storing
references to procedures without having a record with two fields.
Unfortunately that involves storing code in writable storage, and that
is a big no no for security reasons.  Storing code in writable storage
makes life so much easier for hackers.

In practice people do not write procedures in procedures that much, and
procedures in procedures in procedures are rare.  This also means that
you need efficient access to global variables and variables local to the
procedure currently being executed.  Further the access to variables
declared in the directly surrounding scope should be fairly efficient.
The rest just needs to work.  You will do fine with an implementation
where all procedures have direct access to its own variables and global
variables, and variables declared in surrounding procedures may be more
difficult to access.

[toc] | [prev] | [next] | [standalone]

#174

From	George Neuner <gneuner2@comcast.net>
Date	2011-06-25 17:11 -0400
Message-ID	<11-06-048@comp.compilers>
In reply to	#170

On Fri, 24 Jun 2011 13:58:23 +0200, "[Linux Magazine]" <uu3kw29sb7@snkmail.com> wrote:
>In practice people do not write procedures in procedures that much, and
>procedures in procedures in procedures are rare.

Nested procedures and 1st class closures are very common in functional
languages.  C and its current popular variants don't permit them
(though C++0x has added anonymous lambda) and so, IMO, too many
programmers have no experience of their power.  1st class closures are
more general and more powerful than most object systems - closures
permit the construction of ad hoc objects that can bind arbitrary
functions to arbitrary data without the need for class definitions or
template objects.

>This also means that you need efficient access to global variables and
>variables local to the procedure currently being executed.  Further the
>access to variables declared in the directly surrounding scope should be
>fairly efficient.

This goes without saying.  However, it's fairly easy to arrange fast
access to nonlocals even in scopes several nest levels distant.

As has been mentioned, display is the canon method.  Displays, though,
have a problem dealing with deep nesting ... which, admittedly is
rare, but not unheard of.  Typically, the compiler will allow only so
many nesting levels and if you exceed that number, the compilation
fails.

Closure conversion - aka lambda lifting - is IMO a better method
because it can handle any nesting depth.  It takes a bit more analysis
during compilation, but it is more flexible than a display.

>The rest just needs to work.  You will do fine with an implementation
>where all procedures have direct access to its own variables and global
>variables, and variables declared in surrounding procedures may be more
>difficult to access.

Yes ... asymmetric access is acceptable for an academic exercise, but
not for a compiler that you would want others to use.  I know the OP
is a novice compiler developer, but let's advocate learning general
solutions that will continue to work for more advanced projects in the
future.

George

[toc] | [prev] | [next] | [standalone]

#168

From	noitalmost <noitalmost@cox.net>
Date	2011-06-23 12:43 -0400
Message-ID	<11-06-042@comp.compilers>
In reply to	#163

On Monday, June 20, 2011 03:43:24 pm John wrote:
>> Is this a reasonable way to approach the problem?
> [Pretty much.  The standard way to handle references to an enclosing
> scope is with a display, the calling procedure passes the enclosing
> stack frame addresses as hidden parameters.  If your language doesn't
> allow variable sized declarations or recursion, P.x will be at a fixed
> distance below the current stack frame so you can address it directly,
> but in a more general case, you need the display.  See any 1970s
> compiler text for details. -John]

My language does allow recursion, so my approach won't work as
written. So the AST node for an Ident should have a field for
enclosing scope levels, not stack levels?

I looked at Aho's description of displays. Currently, my interpreter
is using a more abstract stack. It's a stack of pointers, so the first
declared variable in a scope goes at frame offset 0, the second at
offset 1, etc. I was thinking of dedicating offset 0 to be a pointer
to all the bookkeeping info, such as enclosing scope pointers. Will
this work as my language matures, or are there some glaring gotchas?
[You more or less need a static place to pick up a pointer to the
stack frame for the current invocation of each lexically enclosing
routine. I don't see any reason that pointer to the display in a fixed
place wouldn't work. It's covered in detail in any compiler text
written when people still worried about compiling Pascal. -John]

[toc] | [prev] | [next] | [standalone]

#178

From	Tony Finch <dot@dotat.at>
Date	2011-06-29 18:55 +0100
Message-ID	<11-07-002@comp.compilers>
In reply to	#168

noitalmost <noitalmost@cox.net> wrote:
>
>I looked at Aho's description of displays. Currently, my interpreter
>is using a more abstract stack. It's a stack of pointers, so the first
>declared variable in a scope goes at frame offset 0, the second at
>offset 1, etc. I was thinking of dedicating offset 0 to be a pointer
>to all the bookkeeping info, such as enclosing scope pointers. Will
>this work as my language matures, or are there some glaring gotchas?

The paper on the implementation of Lua 5.0 describes how it deals with
references to variables in outer scopes.
http://www.lua.org/doc/jucs05.pdf
Lua keeps book-keeping information outside the main evaluation stack;
there's a parallel stack of procedure invocation records, and function
closures are stored on the heap.

Tony.
--
f.anthony.n.finch  <dot@dotat.at>  http://dotat.at/

[toc] | [prev] | [next] | [standalone]

Page 1 of 2 [1] 2 Next page →

csiph-web

How to handle qualified identifiers such as x.y in a Pascal-like language

Contents

#163 — How to handle qualified identifiers such as x.y in a Pascal-like language

#164

#165

#171

#180

#182

#184

#187

#190

#193

#185

#166

#169

#172

#173

#176

#170

#174

#168

#178