Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #163 > unrolled thread
| Started by | noitalmost <noitalmost@cox.net> |
|---|---|
| First post | 2011-06-20 15:43 -0400 |
| Last post | 2011-06-29 15:51 -0700 |
| Articles | 20 on this page of 21 — 9 participants |
Back to article view | Back to comp.compilers
How to handle qualified identifiers such as x.y in a Pascal-like language noitalmost <noitalmost@cox.net> - 2011-06-20 15:43 -0400
Re: How to handle qualified identifiers such as x.y in a Pascal-like language torbenm@diku.dk (Torben Ægidius Mogensen) - 2011-06-22 10:57 +0200
Re: How to handle qualified identifiers such as x.y in a Pascal-like language Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2011-06-22 11:47 +0100
Re: How to handle qualified identifiers such as x.y in a Pascal-like language George Neuner <gneuner2@comcast.net> - 2011-06-24 18:13 -0400
Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-06-29 12:31 -0700
Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-07-01 12:46 -0700
Re: How to handle qualified identifiers such as x.y in a Pascal-like language anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2011-07-02 17:13 +0000
Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-07-03 13:14 -0700
Re: How to handle qualified identifiers such as x.y in a Pascal-like language torbenm@diku.dk (Torben Ægidius Mogensen) - 2011-07-07 10:27 +0200
Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-07-07 04:14 -0700
Re: How to handle qualified identifiers such as x.y in a Pascal-like language George Neuner <gneuner2@comcast.net> - 2011-07-02 16:58 -0400
Re: How to handle qualified identifiers such as x.y in a Pascal-like language Gene <gene.ressler@gmail.com> - 2011-06-22 19:21 -0700
Re: How to handle qualified identifiers such as x.y in a Pascal-like language Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2011-06-24 07:56 +0100
Re: How to handle qualified identifiers such as x.y in a Pascal-like language Gene <gene.ressler@gmail.com> - 2011-06-24 19:19 -0700
Re: How to handle qualified identifiers such as x.y in a Pascal-like language Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2011-06-25 11:55 +0100
Re: How to handle qualified identifiers such as x.y in a Pascal-like language noitalmost <noitalmost@cox.net> - 2011-06-29 13:13 -0400
Re: How to handle qualified identifiers such as x.y in a Pascal-like language "[Linux Magazine]" <uu3kw29sb7@snkmail.com> - 2011-06-24 13:58 +0200
Re: How to handle qualified identifiers such as x.y in a Pascal-like language George Neuner <gneuner2@comcast.net> - 2011-06-25 17:11 -0400
Re: How to handle qualified identifiers such as x.y in a Pascal-like language noitalmost <noitalmost@cox.net> - 2011-06-23 12:43 -0400
Re: How to handle qualified identifiers such as x.y in a Pascal-like language Tony Finch <dot@dotat.at> - 2011-06-29 18:55 +0100
Re: How to handle qualified identifiers such as x.y in a Pascal-like language BGB <cr88192@hotmail.com> - 2011-06-29 15:51 -0700
Page 1 of 2 [1] 2 Next page →
| From | noitalmost <noitalmost@cox.net> |
|---|---|
| Date | 2011-06-20 15:43 -0400 |
| Subject | How to handle qualified identifiers such as x.y in a Pascal-like language |
| Message-ID | <11-06-037@comp.compilers> |
Here's a silly example program in my language syntax. program P: var x : int; procedure s() : var x : int; procedure t() : var x : int; x := 3; s.x := x + 10; end t; x := 2; t(); P.x := x + 10; end s; x := 1; s(); print x; # should output 23 end P. I know how to handle the notation if the qualified identifier is a record, since there's no change in the stack frame. It's just an offset in the current frame. What I don't quite understand is how to parse access to a variable in an outer stack frame. And I think this is similar to the problem of nested procedures (which I also don't quite know how to handle). My parser is recursive descent and produces an AST. The AST consists of nodes such as Program, ProcDef, VarDef, WhileStmt, AssignStmt, Ident, Num, and ProcCall. The nodes ProcDef and VarDef are the nodes in the symbol tables, while Ident represents an identifier as used in a statement. Thus, an Ident node contains a pointer to a definition in a symbol table. That definition has a field which gives an offset relative to the current stack frame. A Program or a ProcDef contains a symbol table and a list of code statements. The interpreter walks the AST. When it encounters a node such as ProcDef, it creates a new stack frame, uses the procedure's symbol table to allocate local variables, and then walks its list of code statements. So when I encounter P.x := x + 10; in the above program, I get an AST fragment something like Assign( P.x, Add( x, 10) ) where x is an Ident and 10 is a Num. But what do I do with P.x? Should the parser do all the work and make P.x simply an Ident (whose symTabEntry points to the outermost x? Then Ident would need a frameCnt field so the interpreter knows how many stack frames to back out of. In the above example, frameCnt would be 1, while reference to a local variable would have frameCnt of 0. Is this a reasonable way to approach the problem? [Pretty much. The standard way to handle references to an enclosing scope is with a display, the calling procedure passes the enclosing stack frame addresses as hidden parameters. If your language doesn't allow variable sized declarations or recursion, P.x will be at a fixed distance below the current stack frame so you can address it directly, but in a more general case, you need the display. See any 1970s compiler text for details. -John]
[toc] | [next] | [standalone]
| From | torbenm@diku.dk (Torben Ægidius Mogensen) |
|---|---|
| Date | 2011-06-22 10:57 +0200 |
| Message-ID | <11-06-038@comp.compilers> |
| In reply to | #163 |
noitalmost <noitalmost@cox.net> writes: > What I don't quite understand is how to parse access to a variable in > an outer stack frame. And I think this is similar to the problem of > nested procedures (which I also don't quite know how to handle). Parsing qualified names is easy enough, so I think you mean how to get access to a variable in an outer frame. John mentioned displays, but I think it is easier to use static links. Displays can be slightly more efficient, though. Basically, the stack frame for a procedure g has a pointer (the static link) to the frame of the procedure f in which g is declared. Note that this is not necessarily the previous frame, as g can be called from a procedure declared locally in f or g. In the simplest case, all variable for f are stored in f's frame, so g can access these by following the static link to get to f's frame and then using an offset from this to find the variable. If you need to access a variable from a scope even further out than f, you first follow the static link to f's frame and then in f's frame find the static link to its outer scope and so on, until you find the right frame. Your compile-time symbol table should contain information about the scope depth and offset of any variable, so you (by comparing to the current scope depth) can see how many static links you must follow. If you register-allocate variables, you must make sure that variables declared in outer scopes are stored at known offsets in the frames of their procedure at the time the are accessed in inner scopes. This can be done by using a caller-saves calling convention: At a call, store all live variables in known offsets in the frame. Note that variables accessed in inner scopes should be considered live at the call site. If you use a mixed caller-saves/callee-saves convention, allocate all variables that are accessed in inner scope in caller-saves registers and make sure these are stored at fixed offsets in the frame. If using a pure callee-saves strategy, spill variables that are called from inner scopes to the frame, so they have a known location. An alternative is lambda lifting. This transforms the program so all procedures are global. Any variables from (previously) outer scopes that are acccessed inside a lifted procedure are now passed as extra reference parameters to the procedure. So if g is lifted out of f and accesses f's local variable x, g is given the address of x as an extra parameter. You need to handle reference parameters like Pascal's VAR parameters or the address-of (&) operator in C. Basically, any variable that gets its address taken in either way should be stored in the frame of its procedure. This can be done by spilling it. Variables that can not be modified after initialisation can be passed by value rather than reference. Torben
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@aol.com> |
|---|---|
| Date | 2011-06-22 11:47 +0100 |
| Message-ID | <11-06-039@comp.compilers> |
| In reply to | #163 |
noitalmost schrieb: > What I don't quite understand is how to parse access to a variable in > an outer stack frame. And I think this is similar to the problem of > nested procedures (which I also don't quite know how to handle). First of all: local procedures using local variables in an *outer* stackframe are problematic. Such a language feature deserves much work, in detail when you also want to allow to use local procedures as callbacks (procedure pointers). I'd drop that from my language. Otherwise John already explained how to pass a reference to outer stackspace(s) to the nested procedure, as an hidden parameter. But this added parameter will disallow to give away references to the local subroutine, because the caller then doesn't know about that parameter. DoDi
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2011-06-24 18:13 -0400 |
| Message-ID | <11-06-045@comp.compilers> |
| In reply to | #165 |
On Wed, 22 Jun 2011 11:47:19 +0100, Hans-Peter Diettrich <DrDiettrich1@aol.com> wrote: >noitalmost schrieb: >> What I don't quite understand is how to parse access to a variable in >> an outer stack frame. And I think this is similar to the problem of >> nested procedures (which I also don't quite know how to handle). > >First of all: local procedures using local variables in an *outer* >stackframe are problematic. Such a language feature deserves much >work, in detail when you also want to allow to use local procedures as >callbacks (procedure pointers). I'd drop that from my language. But doing so sacrifices quite a bit of power. In a Pascal-like language having strict stack semantics, the best way to handle non-local variables is to identify and move them all into a defined structure in the stack frame of the outermost enclosing function, then add a hidden pointer argument to all the inner functions and pass the address of the structure down the call chain. The same analysis allows for creating persistent closures by allocating the non-local variable structure on the heap instead of on the stack. [Of course, then you need other mechanisms to invoke the closures and clean up after them when they are no longer needed ... mechanisms which are beyond the scope of this discussion.] George
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@hotmail.com> |
|---|---|
| Date | 2011-06-29 12:31 -0700 |
| Message-ID | <11-07-004@comp.compilers> |
| In reply to | #171 |
On 6/24/2011 3:13 PM, George Neuner wrote: > On Wed, 22 Jun 2011 11:47:19 +0100, Hans-Peter Diettrich > <DrDiettrich1@aol.com> wrote: > >> noitalmost schrieb: >>> What I don't quite understand is how to parse access to a variable in >>> an outer stack frame. And I think this is similar to the problem of >>> nested procedures (which I also don't quite know how to handle). >> >> First of all: local procedures using local variables in an *outer* >> stackframe are problematic. Such a language feature deserves much >> work, in detail when you also want to allow to use local procedures as >> callbacks (procedure pointers). I'd drop that from my language. > > But doing so sacrifices quite a bit of power. > > In a Pascal-like language having strict stack semantics, the best way > to handle non-local variables is to identify and move them all into a > defined structure in the stack frame of the outermost enclosing > function, then add a hidden pointer argument to all the inner > functions and pass the address of the structure down the call chain. > > The same analysis allows for creating persistent closures by > allocating the non-local variable structure on the heap instead of on > the stack. > [Of course, then you need other mechanisms to invoke the closures and > clean up after them when they are no longer needed ... mechanisms > which are beyond the scope of this discussion.] well, I may elaborate, pardon if I am going outside the allowed scope here... one nifty trick I came up with in the context of my own stuff is: allow using GC (Garbage Collection) for machine-code thunks; create the actual function as accepting an extra argument to the closed-over binding frames (chained heap-allocated structs, also GC'ed); create a "closure object", which holds the captured frame, and re-calls the actual function with the frame reference added to the argument list. this process is fairly simple on x86 (with cdecl), but sadly gets very nasty with the SysV/AMD64 calling convention (among other things, making many such operations a good deal more expensive and error-prone than their 32-bit counterparts). even then, there are obvious deficiencies in my support of SysV/AMD64, for example, a lot of my code doesn't handle variable argument lists correctly, and technically the way literal structs are passed in my stuff is just wrong (struct passing rules are like in Win64, involving passing pointers to-be-copied memory, rather than decomposing them into registers). but, grr... I am still left to much wish Linux/... had adopted the Win64 ABI instead, or at least something less complicated to work with. I am left to consider later re-adopting an alternative calling convention in this case, using SysV/AMD64 mostly only for external interfacing (possible: AMD64 callee/caller-save status kept, but using Win64 argument passing, and using name-decoration or name-mangling). I technically supported closures in my C compiler (using the same syntax as GCC's nested functions), but this feature wasn't really used, since I mostly ended up using only "standard C friendly" features, or ones which could be emulated via macros. later, my C compiler has mostly fallen victim to bit-rot and my inability to effectively debug it (leaving me to mostly use native compilers for C). however, closures in the above form are still used, in a slightly less nice-looking form, in plain C. also they are used much more in my own BGBScript language, which is mostly in the same language family as JavaScript and ActionScript (and mostly conforms with ECMA-262). this language has closures as a default feature. the above mechanism (creating executable thunks) is mostly only used for external interfacing though, as most internal calls are via dynamically-typed "function objects", which are typically optimized some by caching an "apply handler" (such as in class vtables/...), which are called with the function-object and a pointer to the argument list or similar (calling into a native ABI, such as SysV/AMD64, generally involves "re-packing" the argument list at the last moment). note: it is technically possible to call pretty much anything which has an apply handler (note: most handlers are registered with type-specific vtables, and there are many other types of handlers as well for various operations). how much of this would make sense in others' projects, I don't really know... [This strikes me as all stuff the Lisp community figured out in about 1980. -John]
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@hotmail.com> |
|---|---|
| Date | 2011-07-01 12:46 -0700 |
| Message-ID | <11-07-006@comp.compilers> |
| In reply to | #180 |
On 6/29/2011 12:31 PM, BGB wrote: > how much of this would make sense in others' projects, I don't really > know... > [This strikes me as all stuff the Lisp community figured out in about 1980. > -John] well, sadly, I don't have any good sources of information related to VM implementation and design, as most of what I can find is fragmentary information on the internet (much of my VM technology is built up from trivia I had read about other VMs, or figured out myself as part of an ongoing process). my original exposure to how such a VM would work was gained mostly by looking over the source for GNU Guile, and throwing together a sort of makeshift clone (I think at the time I also looked at SIOD and PLT and others for ideas). most other information is unusably high-level (mostly focusing on high-level language features or "pie in the sky" language/computing/... concepts), rather than how to implement them effectively, or deal with more practical concerns (such as dealing calls to/from C or C++ code and with matters of sharing data and structures). actually, many people refer to a lot of these concerns derisively as "implementation details", which personally I find a bit frustrating. most information I can find on compiler technology often doesn't match well with my architecture (often focuses on statically-typed languages using an AST-driven or SSA-based processes). in my case I use a type-agnostic stack-machine, and handle types much later in the process, generally mixed with register allocation and emitting ASM. most other information I have found has been of commercial origin (mostly technical documentation for various products, ...). (decided to leave out a big description of the overall compilation process...). so, it is maybe fair enough that I don't know everything that others have done. too bad there is not a big wiki or similar dedicated to compiler and VM design and implementation topics, without being dedicated to a particular project or technology. say, if it would be sort of like Wikipedia, but much more technically-oriented and topic-based (rather than a random mis-mash of comments and "stream of consciousness" type stuff like on c2wiki...).
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2011-07-02 17:13 +0000 |
| Message-ID | <11-07-008@comp.compilers> |
| In reply to | #180 |
BGB <cr88192@hotmail.com> writes: >I am still left to much wish Linux/... had adopted the Win64 ABI >instead That would have been quite an achievement, because the Windows x64 ABI came out quite some time after Linux and others adopted the SysV ABI, so it would have required looking into the future. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@hotmail.com> |
|---|---|
| Date | 2011-07-03 13:14 -0700 |
| Message-ID | <11-07-011@comp.compilers> |
| In reply to | #184 |
On 7/2/2011 10:13 AM, Anton Ertl wrote: > BGB<cr88192@hotmail.com> writes: >> I am still left to much wish Linux/... had adopted the Win64 ABI >> instead > > That would have been quite an achievement, because the Windows x64 ABI > came out quite some time after Linux and others adopted the SysV ABI, > so it would have required looking into the future. > an idle wish doesn't need to be within the realm of "could have actually happened...". I am half still considering of using a custom calling-convention for internal calls. I decided to leave out a more detailed description of the considered register assignments, argument passing, ... and if/when I would do something like this is uncertain. I guess it is mostly just the annoyance that SysV is sufficiently complex that none of my reflective-call stuff interacting with it can do so particularly efficiently or correctly (lots of nasty cruft here). also, it seems to interact badly with my C coding practices in general. I suspect to work well, SysV/AMD64 expects functions accepting lots of arguments and with most of the execution/computation being in leaf functions. coding practices where most of the computation is in non-leaf functions and leaf-functions are generally short/trivial (such as performing a simple operation or returning a status value), ... seem to not be the ideal use-case. all this seems to lead to my code generally performing somewhat better on 32-bit x86 systems and on Win64. for example, my assembler seems to run about 2x slower on 64-bit Linux than on 64-bit Windows in my tests, ... I don't imagine my coding practices are all that novel though. however, other people have benchmarked other programs, and it seems many types of programs (apparently especially data-compression and encryption programs) seem to perform better on Linux x86-64 than on Win64 or on 32-bit systems. it generally seems to be the reverse for larger programs though (web browsers, games, office-type apps, ...) which seem to generally perform better on 32-bit systems and on Windows. granted, I can't claim any objectivity here, given my personal dislike (in general) of the SysV/AMD64 ABI... [It is my impression that a lot of languages that allow closures and the like end up with their own calling sequences. -John]
[toc] | [prev] | [next] | [standalone]
| From | torbenm@diku.dk (Torben Ægidius Mogensen) |
|---|---|
| Date | 2011-07-07 10:27 +0200 |
| Message-ID | <11-07-014@comp.compilers> |
| In reply to | #187 |
> [It is my impression that a lot of languages that allow closures and the > like end up with their own calling sequences. -John] Not surprising, since most vendor procedure call standards are targeted for C-like languages: Flat scope, single result, no exceptions, no tail call optimisation, and so on. Once you go beyond that you will need to either extend the calling standard or do something quite different. For example, most modern call standards are almost exclusively callee-saves: While some registers are designated as caller-saves, these are often only used for variables that are not live across any call, so they are never saved. Variables live across calls are all stored in callee-saves registers (or spilled). This makes both tail call optimisation and exceptions more costly. The first because you have to restore the callee-saves registers before the tail-call jump (unless it is to the same procedure, which is why many compilers only support tail recursion and not general tail calls) and the latter because the registers you have to restore at an exception will be stored in many different frames, so you will have to unwind the stack to restore them, where a pure caller-saves strategy will allow you to go directly to the frame of the exception handler and take the saved live registers from this. Sure, at leaf calls a callee-saves strategies might avoid saving registers at all, but inlining will get you the same (and other) benefits, though it will increase code size. Torben
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@hotmail.com> |
|---|---|
| Date | 2011-07-07 04:14 -0700 |
| Message-ID | <11-07-017@comp.compilers> |
| In reply to | #187 |
On 7/3/2011 1:14 PM, BGB wrote: <snip> > granted, I can't claim any objectivity here, given my personal dislike > (in general) of the SysV/AMD64 ABI... > [It is my impression that a lot of languages that allow closures and the > like end up with their own calling sequences. -John] fair enough, however, most calling conventions are not nearly so antisocial towards external interfacing. glueing code together with cdecl (AKA: SysV/i386 or similar) or Win64 is usually a relatively straightforward process (albeit there is the complexity in the Win64 case that simply stuffing a single argument onto the stack would lead to a misaligned stack, and one still has to generally know the number and types of arguments to adjust for this). SysV/AMD64 is relatively unique in requiring one to essentially rebuild the arguments list to make minor adjustments (making these tasks potentially a bit more expensive). but, yeah, there are merits potentially to devising ones own ABI. for example, there are many tasks that could be potentially done more efficiently, ... also, if the new ABI can be built on top of things that can be done in C, then it may sidestep having to do a bunch of nasty ASM-based cruft to make it work (a current downside of some of my present stuff). but, my reason for trying to stick so much to the C ABIs, is that it makes it generally much easier to interface things with C. meanwhile, my assembler (previously x86/x86-64 only) now also targets ARM. however, in general, ARM systems are giving me lots of trouble, making the prospect of actually testing the thing on/with ARM a bit more problematic at the moment.
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2011-07-02 16:58 -0400 |
| Message-ID | <11-07-009@comp.compilers> |
| In reply to | #180 |
On Wed, 29 Jun 2011 12:31:40 -0700, BGB <cr88192@hotmail.com> wrote: <some interesting stuff about his own implementation of closure invocation and GC> >[This strikes me as all stuff the Lisp community figured out in about >1980. -John] Most of it much earlier - the only thing yet missing is any kind of standard call convention - apart from foreign calls (e.g., to C libraries) nearly every Lisp handles its internal function calling differently. Closure invocation was implemented in the very first Lisp implementation (1965?), as was reference counting GC. By 1970 tracing GC was rising and reference counting was falling out of favor. Dijkstra and Lamport's incremental software GC and the Lisp-2 incremental GC which used VMM page protection to implement the tri-color abstraction, both debuted in 1967. Cheney described his semi-space copying collector in 1970. Unfortunately, I started this by mentioning closures in my post. I didn't intend to clutter the thread with concepts that are too advanced for the newbie OP, but rather just to indicate that the method I was advocating to solve his problem had utility beyond his current needs. George
[toc] | [prev] | [next] | [standalone]
| From | Gene <gene.ressler@gmail.com> |
|---|---|
| Date | 2011-06-22 19:21 -0700 |
| Message-ID | <11-06-040@comp.compilers> |
| In reply to | #163 |
On Jun 20, 3:43 pm, noitalmost <noitalm...@cox.net> wrote: > What I don't quite understand is how to parse access to a variable in > an outer stack frame. And I think this is similar to the problem of > nested procedures (which I also don't quite know how to handle). P.x and x have different scopes, which in this case are obviously nested. There are two separate issues: 1) How to represent different, potentially nested scopes in a symbol table and 2) what code to generate. As John says, any compiler text will deal with both. For 1) perhaps the simplest to implement is a chain of dictionaries. Every time a scope opens, create a new dictionary, chain it onto the current one, and start using the new. When a scope closes, throw away the current dictionary and resume using the next in the chain. In your sample program's case, there will be a dictionary for variables of program (P) scope and then one for each function. Always the P dictionary will be the last in the chain. Nested functions will each add to this chain, one per nesting level. Insertions occur for each declaration, always in the current dictionary. Lookups of plain identifiers check each dictionary in the chain until the id is found. The number of chain links followed before finding the id is important. Stay tuned. When the language allows (as does yours) explicit scope paths, each dictionary D should also be paired with its identifier id(D) for the scope that caused it to be created, here function or program name. The lookup of a qualified id reference then proceeds right-to-left. In your example to process s.x, work along the dictionary chain looking for the x. When it's found (in this case immediately) in a given dictionary D, match id(D) with s. If the path is longer, say P.s.x, you'd continue along the chain matching dictionary id's with path elements. If the whole path matches, you've found the right x. Otherwise keep looking for x's further down the chain. When generating code, each function call creates an "activation record" on the stack to hold parameters and local variables. The simplest way to implement static scopes is with static links. A SL is just a pointer at the same position in each activation record that points to the previous activation of the statically enclosing function. (It's the norm to have SL's point to the same location in the AR as the frame pointer would.) It turns out that when you compile a function call, the code that computes the new AR's static link must dereference the link of the current AR N times where N>=0 is the number of symbol table links needed to find the name of the called function during parsing. (N=0 corresponds to the current frame pointer. N=1 is the current AR's SL, N=2 is contents of current AR's SL, etc.) Cool or what? In this manner, static links always form a linked list of ARs with one AR per scope in the static nesting of the currently running function. So when you generate code for an identifier reference x, it will again need to dereference the current static link N times where N>=0 was the number of links needed to find x in the symbol table. If you read this rather confusing recipe a few times, it will at some point click and you'll see the beauty. If you need a picture, consult the aforementioned compiler or programming languages textbook. The "display" method John is talking about merely replaces the static links between ARs with what is effectively an array of pointers to ARs. Displays can be more efficient than static links, especially when functions are deeply nested and/or when the compiler is free to put oft-used display elements in registers, but implementing them correctly may be trickier, especially if your language allows function pointers and/or threads.
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@aol.com> |
|---|---|
| Date | 2011-06-24 07:56 +0100 |
| Message-ID | <11-06-043@comp.compilers> |
| In reply to | #166 |
Gene schrieb: > When the language allows (as does yours) explicit scope paths, each > dictionary D should also be paired with its identifier id(D) for the > scope that caused it to be created, here function or program name. The > lookup of a qualified id reference then proceeds right-to-left. This IMO is the wrong way. Structured data types require left-to-right evaluation of the qualified name, i.e. for P.x an identifier P would be searched first, followed by a search for x in P's *local* dictionary. This is problematic with the intended approach, to refer to the *current* module by its name (P), because the dictionary of P will already be in the scope-stack, and this one does *not normally* include the identifier P itself. Instead another (synthetic) outer scope would be required, that only contains the P identifier, which refers back to P's scope. Then the search for an global identifier x would end up immediately in P's scope, of module-level identifiers, while the qualified search had to go one entry deeper, to resolve identifier P, which then refers to the (already traversed) P scope. That's why e.g. C++ uses (AFAIR) "::" to denote the global (static) scope, not the name of any module. This allows to start the search for the following identifier(s) immediately in the correct scope, with no chance for possible mismatches with identifiers P in some other active scope. In ObjectPascal (Delphi, FPC) every module (unit) has a public scope. When a unit is "used", its public scope is pushed onto the scope stack. This makes available all public external identifiers without qualification. Newer Wirthian languages instead require full qualification for references to external symbols, what reduces the number of scopes to be searched, and eliminates errors that otherwise can occur when the sequence of the used units is changed, for some reason. > In your example to process s.x, work along the dictionary chain > looking for the x. When it's found (in this case immediately) in a > given dictionary D, match id(D) with s. If the path is longer, say > P.s.x, you'd continue along the chain matching dictionary id's with > path elements. If the whole path matches, you've found the right x. > Otherwise keep looking for x's further down the chain. This would require that *all* local dictionaries are in the search path, what IMO doesn't make sense (performance wise), because a full match of s.x only can occur in the s dictionary, regardless of how many other scopes contain an x. DoDi
[toc] | [prev] | [next] | [standalone]
| From | Gene <gene.ressler@gmail.com> |
|---|---|
| Date | 2011-06-24 19:19 -0700 |
| Message-ID | <11-06-046@comp.compilers> |
| In reply to | #169 |
On Jun 24, 2:56 am, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote:
> Gene schrieb:
>
> > When the language allows (as does yours) explicit scope paths, each
> > dictionary D should also be paired with its identifier id(D) for the
> > scope that caused it to be created, here function or program name. The
> > lookup of a qualified id reference then proceeds right-to-left.
>
> This IMO is the wrong way. Structured data types require left-to-right
> evaluation of the qualified name, i.e. for P.x an identifier P would
> be searched first, followed by a search for x in P's *local*
> dictionary.
Certainly there are lots of ways to implement. The one I proposed is
not the very best for speed. It is rather good for space, though, and
it's beautifully simple. Your proposal here though is I think
incorrect. If we have
procedure P:
var x;
procedure P:
var x;
... P.x ...
The P.x reference would conventionally be associated with the inner
x. If you start searching for P.x outer scope first, you get the
wrong one.
> This is problematic with the intended approach, to refer to the
> *current* module by its name (P), because the dictionary of P will
> already be in the scope-stack, and this one does *not normally*
> include the identifier P itself.
You're right about P not being in its own dictionary. Each dictionary
lies in a tuple (the pair I spoke of), where the other element is the
identifier of the dictionary's scope. The identifier will also appear
in the dictionary for the enclosing scope. The program identifier
lies in no dictionary at all because it has no enclosing scope. With
this convention I can't see a problem as long as procedure/program
names inhabit the same name space as variables. Maybe an example
would help me see your concern.
I have actually used this technique to implement an assembler with
scopes, and it worked quite well.
> That's why e.g. C++ uses (AFAIR) "::" to denote the global (static)
> scope, not the name of any module. This allows to start the search for
> the following identifier(s) immediately in the correct scope, with no
> chance for possible mismatches with identifiers P in some other active
> scope.
You're right, but this language isn't C++. Nested procedures are
allowed, and the s.x example shows that complete paths are not
required.
> This would require that *all* local dictionaries are in the search path,
> what IMO doesn't make sense (performance wise), because a full match of
> s.x only can occur in the s dictionary, regardless of how many other
> scopes contain an x.
The dictionaries in the path are those for the static scopes currently
open for parsing. When parsing a procedure nested N deep, you have N
dictionaries. In a realistic program, N is often 1, maybe 2, seldom
3, almost never 4... In my experience id lookups are a trivial cost
of processing compared to once-per-character operations like scanning.
And in practice the most frequent case by far is simple id's in the
inner scope, which need only one dictionary lookup. So you'd be
unlikely to find this algorithm is a bottleneck.
Thanks for the discussion.
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@aol.com> |
|---|---|
| Date | 2011-06-25 11:55 +0100 |
| Message-ID | <11-06-047@comp.compilers> |
| In reply to | #172 |
Gene schrieb:
> procedure P:
> var x;
> procedure P:
> var x;
> ... P.x ...
>
> The P.x reference would conventionally be associated with the inner
> x. If you start searching for P.x outer scope first, you get the
> wrong one.
I dare to disagree. When P.x would refer to the inner x, how then would
you refer to the outer x?
Also: how would you call the inner P, from inside the outer P?
More general, how to distinguish between an recursive call, and a call
of a local subroutine, when both have the same name?
IMO qualifiers should allow to address identifiers in *external* (here:
outer) scopes, which otherwise would be hidden by a local identifier of
the same name.
A more elaborate example:
program P;
var P;
var x;
procedure P;
var P: RecordContainingX;
var x;
procedure P;
var x;
... P.x ...
... P.P.x ...
end; //inner P
... P.x ... //x in var P or in outer procedure P?
... P() ... //recursive or local procedure call?
Some general questions:
Should we allow for nested scopes of equal names at all?
(procedure P inside program P and procedure P)
Should we allow for local names, duplicating the scope name?
(var P inside program P or procedure P)?
How to return values from functions?
(by assignment to the function name (Pascal), or to a Result variable
(Delphi), or by explicit "return <value>" statement (C) )
How to distinguish, if ever, between partial (relative) and fully
qualified (absolute) references? Relative to what?
IMO a name in an inner scope hides identifiers of the same name, in all
outer scopes. This should be perfectly allowed.
But duplicate names inside the same scope should be disallowed. This
will make a difference, depending on whether the scope name is part of
its own scope (as you suggest for P.x), or whether the name of a scope
is a member of its parent scope (with inner procedure P being a member
of the outer procedure P).
>> This is problematic with the intended approach, to refer to the
>> *current* module by its name (P), because the dictionary of P will
>> already be in the scope-stack, and this one does *not normally*
>> include the identifier P itself.
>
> You're right about P not being in its own dictionary. Each dictionary
> lies in a tuple (the pair I spoke of), where the other element is the
> identifier of the dictionary's scope. The identifier will also appear
> in the dictionary for the enclosing scope. The program identifier
> lies in no dictionary at all because it has no enclosing scope. With
> this convention I can't see a problem as long as procedure/program
> names inhabit the same name space as variables. Maybe an example
> would help me see your concern.
When we have these nested socpes:
program P
procedure P
procedure P
procedure P
what (unique!?) scope ID should be assigned to each scope?
IMO this example only reveals, that nested scopes of same names tend to
*fully* hide outer scopes of the same name. I assume that you want to
bypass this restriction, by qualifier evalution from right to left. I.e.
P.x would address an x in the nearest enclosing scope P, while P.P.x
would refer to an x in the next *outer* P scope. Right?
Then an implementation had to search for in identifer x first and, when
found, successively match the next qualifiers with the names of the
containing and enclosing scopes. Very nice, indeed :-)
But then I wonder how such a resolution will fit together with (nested)
scopes of structured types. Imagine a record variable R, then R.x will
refer to the immediate member x of this record. But then R.x.y has to
refer to member y of the scope of x, what IMO means that here the
qualifiers *must* be evaluated left-to-right.
A more concrete example:
type r1 = record x:integer ... end;
type r2 = record x:r1 ... end;
type r3 = record x:r1 ... end;
var R:r2;
... R.x.x ...
Where would you place the scopes for the records, so that their "x"s are
found when starting qualifier resolution from the right? And finally,
when x.x has been matched in scope r2, how to resolve the next qualifier
"R"?
> I have actually used this technique to implement an assembler with
> scopes, and it worked quite well.
Did your assembler include structured types, and if so, how did you
resolve beforementioned situation?
>> That's why e.g. C++ uses (AFAIR) "::" to denote the global (static)
>> scope, not the name of any module. This allows to start the search for
>> the following identifier(s) immediately in the correct scope, with no
>> chance for possible mismatches with identifiers P in some other active
>> scope.
>
> You're right, but this language isn't C++. Nested procedures are
> allowed, and the s.x example shows that complete paths are not
> required.
Incomplete (relative) paths IMO deserve a unique evaluation order,
either RTL or LTR. Otherwise a different syntax must be used, so that
the starting point can be found out, from which evaluation can proceed
to both ends, e.g. P<-P<-R->x->y.
>> This would require that *all* local dictionaries are in the search path,
>> what IMO doesn't make sense (performance wise), because a full match of
>> s.x only can occur in the s dictionary, regardless of how many other
>> scopes contain an x.
>
> The dictionaries in the path are those for the static scopes currently
> open for parsing. When parsing a procedure nested N deep, you have N
> dictionaries. In a realistic program, N is often 1, maybe 2, seldom
> 3, almost never 4... In my experience id lookups are a trivial cost
> of processing compared to once-per-character operations like scanning.
> And in practice the most frequent case by far is simple id's in the
> inner scope, which need only one dictionary lookup. So you'd be
> unlikely to find this algorithm is a bottleneck.
ACK, the order is O(n) for both strategies. My only objection addresses
the handling of structured types, in your model.
BTW, I once reduced the order of the search to almost O(1), by "merging"
new scopes into a single scope, with a stack for all visible
identifiers, and a hash table. The hashtable refers, after resolution of
name clashes, to the topmost (visible) identifier. When a scope is
closed, it's entries are popped off the stack, and the hashtable is
updated with the references to the next outer (now unidden) identifiers,
from a reference stored in every popped entry. It should be possible to
add qualified references into outer scopes to this model, somehow - my
implementation was for a C parser with no such requirement.
> Thanks for the discussion.
Thanks2 for the inspiration :-)
DoDi
[toc] | [prev] | [next] | [standalone]
| From | noitalmost <noitalmost@cox.net> |
|---|---|
| Date | 2011-06-29 13:13 -0400 |
| Message-ID | <11-06-050@comp.compilers> |
| In reply to | #173 |
On Saturday, June 25, 2011 06:55:06 am Hans-Peter Diettrich wrote: > IMO qualifiers should allow to address identifiers in *external* (here: > outer) scopes, which otherwise would be hidden by a local identifier of > the same name. For my language, this is my intention. > A more elaborate example: > > program P; > var P; > var x; > procedure P; > var P: RecordContainingX; > var x; > procedure P; > var x; > ... P.x ... > ... P.P.x ... > end; //inner P > ... P.x ... //x in var P or in outer procedure P? # should be x in the record var P > ... P() ... //recursive or local procedure call? # local call My language's scopes for this program would have the following symbols: global scope: P # the program program P's scope: P, x # variables P # procedure procedure P's scope: P # record containing x x P # procedure inner procedure P's scope: x Thanks for this example. It has helped me to consider some additional cases. I never thought about the recursive vs. local call of P() in the above. I'm parsing the dots left to right, which seems like it will give the results I desire.
[toc] | [prev] | [next] | [standalone]
| From | "[Linux Magazine]" <uu3kw29sb7@snkmail.com> |
|---|---|
| Date | 2011-06-24 13:58 +0200 |
| Message-ID | <11-06-044@comp.compilers> |
| In reply to | #166 |
A follow up to the discussion on displays and linked list of stack frames. Many languages allow for storing references to procedures in variables, or passing procedures as parameters. One example is the GNU variant of C and C++ implemented in GCC, which allows for both AND nested procedures. If thread starters language include storing references to procedures and/or passing procedures as parameters, then thread starter needs to consider, how to store references to procedures, and he better consider it now. When using linked lists of stack frames, the normal way of storing references to procedures is to store a reference to the start of the code implementing the procedure and a reference to the start of the linked list of stack frames. Thus you have a record with two fields. When using displays you have the problem of storing the display somewhere. In some cases GCC uses something like trapezes for storing references to procedures without having a record with two fields. Unfortunately that involves storing code in writable storage, and that is a big no no for security reasons. Storing code in writable storage makes life so much easier for hackers. In practice people do not write procedures in procedures that much, and procedures in procedures in procedures are rare. This also means that you need efficient access to global variables and variables local to the procedure currently being executed. Further the access to variables declared in the directly surrounding scope should be fairly efficient. The rest just needs to work. You will do fine with an implementation where all procedures have direct access to its own variables and global variables, and variables declared in surrounding procedures may be more difficult to access.
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2011-06-25 17:11 -0400 |
| Message-ID | <11-06-048@comp.compilers> |
| In reply to | #170 |
On Fri, 24 Jun 2011 13:58:23 +0200, "[Linux Magazine]" <uu3kw29sb7@snkmail.com> wrote: >In practice people do not write procedures in procedures that much, and >procedures in procedures in procedures are rare. Nested procedures and 1st class closures are very common in functional languages. C and its current popular variants don't permit them (though C++0x has added anonymous lambda) and so, IMO, too many programmers have no experience of their power. 1st class closures are more general and more powerful than most object systems - closures permit the construction of ad hoc objects that can bind arbitrary functions to arbitrary data without the need for class definitions or template objects. >This also means that you need efficient access to global variables and >variables local to the procedure currently being executed. Further the >access to variables declared in the directly surrounding scope should be >fairly efficient. This goes without saying. However, it's fairly easy to arrange fast access to nonlocals even in scopes several nest levels distant. As has been mentioned, display is the canon method. Displays, though, have a problem dealing with deep nesting ... which, admittedly is rare, but not unheard of. Typically, the compiler will allow only so many nesting levels and if you exceed that number, the compilation fails. Closure conversion - aka lambda lifting - is IMO a better method because it can handle any nesting depth. It takes a bit more analysis during compilation, but it is more flexible than a display. >The rest just needs to work. You will do fine with an implementation >where all procedures have direct access to its own variables and global >variables, and variables declared in surrounding procedures may be more >difficult to access. Yes ... asymmetric access is acceptable for an academic exercise, but not for a compiler that you would want others to use. I know the OP is a novice compiler developer, but let's advocate learning general solutions that will continue to work for more advanced projects in the future. George
[toc] | [prev] | [next] | [standalone]
| From | noitalmost <noitalmost@cox.net> |
|---|---|
| Date | 2011-06-23 12:43 -0400 |
| Message-ID | <11-06-042@comp.compilers> |
| In reply to | #163 |
On Monday, June 20, 2011 03:43:24 pm John wrote: >> Is this a reasonable way to approach the problem? > [Pretty much. The standard way to handle references to an enclosing > scope is with a display, the calling procedure passes the enclosing > stack frame addresses as hidden parameters. If your language doesn't > allow variable sized declarations or recursion, P.x will be at a fixed > distance below the current stack frame so you can address it directly, > but in a more general case, you need the display. See any 1970s > compiler text for details. -John] My language does allow recursion, so my approach won't work as written. So the AST node for an Ident should have a field for enclosing scope levels, not stack levels? I looked at Aho's description of displays. Currently, my interpreter is using a more abstract stack. It's a stack of pointers, so the first declared variable in a scope goes at frame offset 0, the second at offset 1, etc. I was thinking of dedicating offset 0 to be a pointer to all the bookkeeping info, such as enclosing scope pointers. Will this work as my language matures, or are there some glaring gotchas? [You more or less need a static place to pick up a pointer to the stack frame for the current invocation of each lexically enclosing routine. I don't see any reason that pointer to the display in a fixed place wouldn't work. It's covered in detail in any compiler text written when people still worried about compiling Pascal. -John]
[toc] | [prev] | [next] | [standalone]
| From | Tony Finch <dot@dotat.at> |
|---|---|
| Date | 2011-06-29 18:55 +0100 |
| Message-ID | <11-07-002@comp.compilers> |
| In reply to | #168 |
noitalmost <noitalmost@cox.net> wrote: > >I looked at Aho's description of displays. Currently, my interpreter >is using a more abstract stack. It's a stack of pointers, so the first >declared variable in a scope goes at frame offset 0, the second at >offset 1, etc. I was thinking of dedicating offset 0 to be a pointer >to all the bookkeeping info, such as enclosing scope pointers. Will >this work as my language matures, or are there some glaring gotchas? The paper on the implementation of Lua 5.0 describes how it deals with references to variables in outer scopes. http://www.lua.org/doc/jucs05.pdf Lua keeps book-keeping information outside the main evaluation stack; there's a parallel stack of procedure invocation records, and function closures are stored on the heap. Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/
[toc] | [prev] | [next] | [standalone]
Page 1 of 2 [1] 2 Next page →
Back to top | Article view | comp.compilers
csiph-web