Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #562 > unrolled thread

Good practical language and OS agnostic text?

Started bycompilers@is-not-my.name
First post2012-04-17 21:28 +0000
Last post2012-06-06 16:52 +0000
Articles 13 on this page of 53 — 20 participants

Back to article view | Back to comp.compilers


Contents

  Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-17 21:28 +0000
    Re: Good practical language and OS agnostic text? Philip Herron <redbrain@gcc.gnu.org> - 2012-04-18 14:25 +0100
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-19 16:32 +0000
        Re: Good practical language and OS agnostic text? arnold@skeeve.com (Aharon Robbins) - 2012-04-20 03:58 +0000
          Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-22 10:10 +0000
        Re: Good practical language and OS agnostic text? "BartC" <bc@freeuk.com> - 2012-04-20 09:45 +0100
          Re: Good practical language and OS agnostic text? "Jonathan Thornburg" <jthorn@astro.indiana.edu> - 2012-04-21 15:04 +0000
    Re: Good practical language and OS agnostic text? BGB <cr88192@hotmail.com> - 2012-04-18 08:39 -0700
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-19 17:32 +0000
    Re: Good practical language and OS agnostic text? Alain Ketterlin <alain@dpt-info.u-strasbg.fr> - 2012-04-18 18:24 +0200
      Re: Good practical language and OS agnostic text? Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-04-19 13:53 +0200
        Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-21 03:07 +0000
        Re: Good practical language and OS agnostic text? "BartC" <bc@freeuk.com> - 2012-04-21 12:01 +0100
          Re: code quality, was Good practical language and OS agnostic text? Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-04-22 12:41 +0200
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-19 11:31 +0000
        Re: Good practical language and OS agnostic text? "Jonathan Thornburg" <jthorn@astro.indiana.edu> - 2012-04-20 16:19 +0000
    Re: Good practical language and OS agnostic text? "Derek M. Jones" <derek@knosof.co.uk> - 2012-04-18 18:16 +0100
      Re: Good practical language and OS agnostic text? glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-04-18 22:43 +0000
        Re: Good practical language and OS agnostic text? BGB <cr88192@hotmail.com> - 2012-04-19 00:05 -0700
        Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-19 11:31 +0000
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-19 16:32 +0000
    Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-18 19:30 +0000
      Re: Good practical language and OS agnostic text? "BartC" <bc@freeuk.com> - 2012-04-19 18:43 +0100
    Re: Good practical language and OS agnostic text? glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-04-18 20:29 +0000
      Re: Good practical language and OS agnostic text? Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-04-19 14:20 +0200
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-19 19:05 +0000
        Re: Good practical language and OS agnostic text? Uli Kusterer <ulimakesacompiler@googlemail.com> - 2012-04-21 11:30 +0200
    Re: Good practical language and OS agnostic text? Roberto Waltman <usenet@rwaltman.com> - 2012-04-18 22:00 -0400
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-19 11:31 +0000
        Re: Good practical language and OS agnostic text? glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-04-20 07:02 +0000
          Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-22 11:10 +0000
            Re: Good practical language and OS agnostic text? glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2012-04-22 23:56 +0000
          Re: PL/360, was Good practical language and OS agnostic text? ArarghMail204@Arargh.com - 2012-04-24 19:13 -0500
    Re: Good practical language and OS agnostic text? Bakul Shah <usenet@bitblocks.com> - 2012-04-18 21:15 -0700
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-20 16:06 +0000
    Re: Good practical language and OS agnostic text? torbenm@diku.dk (Torben Ægidius Mogensen) - 2012-04-19 14:58 +0200
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-20 16:06 +0000
    Re: Good practical language and OS agnostic text? "Joe Schmo" <askmeforit@myisp.com> - 2012-04-21 02:53 -0600
      Re: Writing parsers, was Good practical language and OS agnostic text? Uli Kusterer <ulimakesacompiler@googlemail.com> - 2012-04-22 16:18 +0200
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-23 19:12 +0000
    Re: Good practical language and OS agnostic text? Uli Kusterer <ulimakesacompiler@googlemail.com> - 2012-04-21 11:22 +0200
      Re: Good practical language and OS agnostic text? BGB <cr88192@hotmail.com> - 2012-04-21 18:58 -0700
        Re: writing interpreters, was Good practical language and OS agnostic text? Uli Kusterer <ulimakesacompiler@googlemail.com> - 2012-04-22 12:53 +0200
          Re: writing interpreters, was Good practical language and OS agnostic text? BGB <cr88192@hotmail.com> - 2012-04-22 12:29 -0700
        Re: generating bytecode, was Good practical language and OS agnostic text? Uli Kusterer <ulimakesacompiler@googlemail.com> - 2012-04-22 13:12 +0200
        Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text? "BartC" <bc@freeuk.com> - 2012-04-22 12:51 +0100
          Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text? "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> - 2012-04-22 18:18 +0200
            Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text? "Bartc" <bartc@freeuk.com> - 2012-04-23 10:59 +0100
          Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text? BGB <cr88192@hotmail.com> - 2012-04-22 13:45 -0700
      Re: Good practical language and OS agnostic text? compilers@is-not-my.name - 2012-04-22 22:11 +0000
        Re: Good practical language and OS agnostic text? "BartC" <bc@freeuk.com> - 2012-04-23 18:41 +0100
    Re: Good practical language and OS agnostic text? basile@starynkevitch.net - 2012-05-02 22:16 -0700
      Re: Good practical language and OS agnostic text? Johann 'Myrkraverk' Oskarsson <johann@2ndquadrant.com> - 2012-06-06 16:52 +0000

Page 3 of 3 — ← Prev page 1 2 [3]


#599

FromUli Kusterer <ulimakesacompiler@googlemail.com>
Date2012-04-21 11:22 +0200
Message-ID<12-04-056@comp.compilers>
In reply to#562
On 17.04.2012, at 23:28, compilers@is-not-my.name wrote:
> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details. I want to understand the hows
> and whys of compiler writing.

Hi,

 I usually do C-style programming languages, and I'm by no means a
professional compiler writer, but I enjoy parsers, so I've been doing
stuff like this for ages and sponging up any bit of knowledge that
sounded useful.

 IMO if you know assembler or BASIC and general algorithms (i.e. you
could implement a binary tree and walk its nodes), and you can somehow
figure out what bytes your code gets compiled to (at worst by writing
a dummy assembler program and looking at what bytes show up between a
bunch of nop instructions whose bytes you know), you should be able to
at least get a basic working knowledge of how a compiler works. Just
write the naove approach of how you would design any program.

 The things that I benefited most from learning about compilers were:

- Recursive descent parsers: It's the obvious way to write a parser.
You write a function ReadProgram(), which calls ReadLine() in a loop,
which looks at the first word and then calls ReadIfStatement() if it's
"if" etc. If you find you go wrong, you add a way to either get back
to the last known good state (backtracking) or you just give an error
message. On the way you keep track of the variables in each function
and can later calculate their stack offsets once you know how many you
need etc.

- Tokenizing: Essentially grab all the words in your source text and
build an array with an entry for each so you can more quickly walk
forward and backward without having to scan individual characters. You
can also attach additional information to a token, i.e. an ID number
so you can quickly distinguish user-defined identifiers from known
identifiers like "if" or "repeat". Keep around a token's start offset
so you can report errors.

- Syntax tree: Build a tree structure that represents the parsed program. You
can then do things like evaluate constant nodes ahead of time (e.g. turn 5+4
into 9, but leave 5 + n as an addition node with a 5 and a reference to
variable 'n') by replacing groups of nodes with simpler nodes recursively.
Keep around the line number/first token offset in each node so you can report
errors.

- Code generation: A good starting point I've found is to generate a single
function. Write a little application that takes a file containing the raw
bytes of your function, load the pointer (mark it as executable if your
platform requires that), and then just call it. You write the prolog, the
epilog, and the raw instructions in between, and maybe do some labels. You may
have to remember the offset of a particular field and fill in the address
later on (e.g. for a return statement the offset to the epilog, so you don't
duplicate your clean-up code). Then advance to two functions that call each
other in the same block etc.

- Complete code generation: you generate several functions and put them in a
proper executable file, the way your OS expects. You may have to generate link
table entries ("trampolines") to call dynamically-linked system functions etc.
If you want to start simpler, write your own primitive linker just for the fun
of seeing how two functions that don't reside at fixed (relative) addresses
could call each other.

I can't really say anything about optimization on the compiled code level. I
suppose you'd build a data structure with additional information about each
instruction and then loop over it, looking for patterns that you know can be
simplified. Essentially the same as any other search-and-replace.

Is that un-computer-sciencey enough? This blog post may help with the basics
of my code generation approach:
http://orangejuiceliberationfront.com/generating-machine-code-at-runtime/ (but
it's C, and Intel, and badly wrapped by the new theme of my blog).

Cheers,
-- Uli Kusterer
http://stacksmith.com
[This is a really good summary. Thanks! -John]

[toc] | [prev] | [next] | [standalone]


#603

FromBGB <cr88192@hotmail.com>
Date2012-04-21 18:58 -0700
Message-ID<12-04-060@comp.compilers>
In reply to#599
On 4/21/2012 2:22 AM, Uli Kusterer wrote:
> On 17.04.2012, at 23:28, compilers@is-not-my.name wrote:
>> Guys, I'm having a bear of a time finding a good practical language
>> and OS agnostic text on writing a compiler. I'm weak in math and not
>> interested in the theoretical details. I want to understand the hows
>> and whys of compiler writing.
>
> Hi,
>
>   I usually do C-style programming languages, and I'm by no means a
> professional compiler writer, but I enjoy parsers, so I've been doing
> stuff like this for ages and sponging up any bit of knowledge that
> sounded useful.

agreed.

I am mostly implemented my own custom scripting language (for my own
uses), and wrote a C compiler before, although the C compiler "decayed"
mostly into being a code-processing tool (mostly as maintaining a full C
compiler turned out to be "not terribly useful" in my use-case).


as-is, the scripting language is probably at a vaguely similar
complexity level with C (in some ways it is simpler, and in others
likely more complex), as it gradually moves in the direction of being
"essentially" statically-typed, has pointers, class/instance OO, ...


(decided to leave out a bunch of stuff related to some of the hassles of
static type-checking, and dealing with the matter of having a scoping
model where the scope is both mutable and involves indirect visibility,
meaning that variable assignments can alter what bindings may be
potentially visible from a given piece of code, making determining all
this statically a relatively non-trivial problem).


I have recently been looking some at moving more logic for doing some of
this from the VM back-end into the front-end, as there are likely some
cases that the front-end can deal with more easily but for which it
would be a hassle to add to the back-end (at the cost of adding a big
pile more VM bytecode ops).

now, how is any of this simpler than a C compiler?
at least at present my main target is an interpreter, and weak
performance is generally passable.


>   IMO if you know assembler or BASIC and general algorithms (i.e. you
> could implement a binary tree and walk its nodes), and you can somehow
> figure out what bytes your code gets compiled to (at worst by writing
> a dummy assembler program and looking at what bytes show up between a
> bunch of nop instructions whose bytes you know), you should be able to
> at least get a basic working knowledge of how a compiler works. Just
> write the naove approach of how you would design any program.

yep.


>   The things that I benefited most from learning about compilers were:
>
> - Recursive descent parsers: It's the obvious way to write a parser.
> You write a function ReadProgram(), which calls ReadLine() in a loop,
> which looks at the first word and then calls ReadIfStatement() if it's
> "if" etc. If you find you go wrong, you add a way to either get back
> to the last known good state (backtracking) or you just give an error
> message. On the way you keep track of the variables in each function
> and can later calculate their stack offsets once you know how many you
> need etc.

although I use recursive descent, the above sounds different from what I
usually do.


the general structure of my parsers is more like:
ReadBlock: Reads statements until EOF or '}' is seen;
   calls ReadBlockStatement in a loop.

ReadBlockStatement:
   Peeks token;
   Sees if it is a known block-statement keyword (if/for/while/...):
     Invokes appropriate parsing logic for each keyword.
   Try to determine if it is a declaration:
     See if parsing as a declaration works.
   call ReadStatement
   call EatSemicolor

ReadStatement:
   checks for and handles vaious statement types
   calls ReadExpression

ReadExpression:
   (actually, this is a tower of functions, one for each precedence
level, working from lowest to highest precedence)
   typically (not all precedence levels are exactly the same):
     read expression at next lower level (n1)
     while(next token is a recognized operator)
       read operator token
       read expression at next lower level (n2)
       replace n1 with 'n1 <op> n2'
   (eventually) call ReadLiteral

ReadLiteral:
   peek token;
   dispatch to appropriate handler if a keyword;
   checks for token types, parsing out whatever:
     numbers become numeric literals;
     identifiers become loads;
     string tokens become string literals;
     ...


typically, the result of all of this is to produce a syntax tree.

C is a little uglier here, since it requires keeping track of typedefs
and checking for typedef when trying to parse declarations.


> - Tokenizing: Essentially grab all the words in your source text and
> build an array with an entry for each so you can more quickly walk
> forward and backward without having to scan individual characters. You
> can also attach additional information to a token, i.e. an ID number
> so you can quickly distinguish user-defined identifiers from known
> identifiers like "if" or "repeat". Keep around a token's start offset
> so you can report errors.

partly agreed, except that it isn't really strictly necessary to build
an array up-front.

most of my parsers don't bother using an array, but instead just call
the tokenizer function directly to peek and parse tokens based on the
current "stream position" (generally a raw char pointer in my language
parsers).

the version used in my C compiler currently uses a small hash table
(sort of, the "hash function" is simply to keep only the low-order bits
of the address) to avoid re-reading the same token multiple times.


practically though, it may be both higher performance and generally
cleaner to just build a token array up-front.

I had considered using a table in the C compiler, but the "hash" option
was less effort (the parser was already written in this case, so the
hash could be added via a quick hack, rather than by making significant
changes to the rest of the parser to use an token array instead of
direct function calls).

a potential advantage of not using an array though is not needing to
allocate memory for it.


> - Syntax tree: Build a tree structure that represents the parsed program. You
> can then do things like evaluate constant nodes ahead of time (e.g. turn 5+4
> into 9, but leave 5 + n as an addition node with a 5 and a reference to
> variable 'n') by replacing groups of nodes with simpler nodes recursively.
> Keep around the line number/first token offset in each node so you can report
> errors.

agreed.

in my case, I have generally used either "lists" built from "cons cells"
for this purpose, or in other cases a variant of XML DOM nodes.

I had actually thought this was fairly standard practice, until looking
into it and observing that, no, apparently most people store their
parse-trees in raw structures. either way I guess.


typically, my parsers do almost no direct processing on the ASTs while
building them, apart from building a semantic representation of the code
as-seen.

typically, things like simplifying expressions, propagating constants,
... is done in a stage I have typically called "reduction", which
happens between parsing and prior to producing (bytecode) output.


> - Code generation: A good starting point I've found is to generate a single
> function. Write a little application that takes a file containing the raw
> bytes of your function, load the pointer (mark it as executable if your
> platform requires that), and then just call it. You write the prolog, the
> epilog, and the raw instructions in between, and maybe do some labels. You may
> have to remember the offset of a particular field and fill in the address
> later on (e.g. for a return statement the offset to the epilog, so you don't
> duplicate your clean-up code). Then advance to two functions that call each
> other in the same block etc.
>
> - Complete code generation: you generate several functions and put them in a
> proper executable file, the way your OS expects. You may have to generate link
> table entries ("trampolines") to call dynamically-linked system functions etc.
> If you want to start simpler, write your own primitive linker just for the fun
> of seeing how two functions that don't reside at fixed (relative) addresses
> could call each other.

at this point, things vary go differently, and several more stages are
involved in my case:

- typically, after the AST has been "reduced", it will be transformed
into bytecode (basically, walk the AST, figure out expression types when
relevant, and spit out any relevant bytecode ops).

previously, only a very small amount of type-analysis was done here, but
I am looking to expand this (partly to simplify the work being done in
the interpreter back-end), such that the output produced here will be
"mostly statically typed".

I am looking at going to producing bytecode output more similar to
Java-ByteCode, basically where many opcodes are qualified for specific
types, rather than leaving it solely to the back-end to figure out what
the types are. this is a little lame, given I found it elegant how, for
example, MSIL/CIL had just used a few simple operations, and the
back-end would do its thing, but some things are currently more awkward
here, and at-present this leaves some awkward holes in the type-system.

it all still leaves a bit of uncertainty though (as to whether
adding/using piles of type-annotated opcodes is really "the right
direction to be going").


- typically (assuming the interpreter), the bytecode will be processed
and then transformed into threaded code, producing a threaded-code
version of the program. there are some parts which use generated native
code thunks, but for the most part, pretty much all of the machinery
takes place in native C functions. as-is it proves problematic to do any
non-trivial scope and type-analysis at this stage, short of making the
threaded-code logic contain most of the "hard parts" of a full native
code-generator (as well as likely making it considerably slower).

some past experiments (special purpose tests), have implied that
potentially threaded code can pull off "reasonably solid" interpreter
performance (potentially within 3-5x of native).

however, at present it is considerably slower (at least with tests like
the recursive Fibonacci sequence and bubble-sort, with most "practical"
code the slowdown seems to be smaller, but typically because most of the
actual logic takes place in native code in these cases).


- if a JIT / native code-generator is being used, the process works like
this:
complex operations are generally decomposed into simpler ones, and much
of the type information from the prior stage is discarded (the
code-generator uses its own type-analysis, unlike the "dumb"
interpreter, 1);
it will basically start figuring out how types propagate through the
stack and through variables;
it will map out operation to physical storage and/or registers;
it will then spit out the generated assembler code.

1: the interpreter uses some amount of explicit type information
already, as well as a number of "complex compound operations". these
are, while fairly helpful regarding interpreter performance, fairly
pointless for generating native code, so most such operations are broken
down into "simpler primitive operations" (for example, a single opcode
for "perform 2 loads, compare the values, and perform a conditional
jump", is less useful than the individual operations for doing so).

the JIT for my scripting language is not fully written or operational,
but partial results thus far have looked promising, but this is a low
priority. mostly it has big holes is the ISA and type-system (currently,
only the dynamically-typed path is really implemented).


- any ASM produced by the JIT is fed into the assembler, converting it
into a COFF module (COFF is used here regardless of whether the target
OS is Windows or Linux).

(note: the assembler is currently used by a number of different
components, so the lack of a currently functional JIT doesn't mean it is
goes unused).


- COFF modules are fed into an in-program linker, which proceeds to link
them against the current program image. there is some amount of trickery
here, involving potentially both invoking code-generation handlers, as
well as mechanisms to determine whether or not to "proxy" function calls
(IOW: whether to link a function call directly against the target
address, or use an indirect jump through another pointer).

reasons a proxy may be used: the call is out-of-range (on x86-64, this
usually means the call has gone outside the +-2GB window), or, either
the code has been "hot patched" or the linker has reason to suspect that
such "hot patching" is likely to occur (currently, this involves
heuristics).


> I can't really say anything about optimization on the compiled code level. I
> suppose you'd build a data structure with additional information about each
> instruction and then loop over it, looking for patterns that you know can be
> simplified. Essentially the same as any other search-and-replace.

this is one possibility...


the strategy used by the codegen backend for my C compiler involved
"large numbers of special cases", but this ultimately proved a bit
problematic to debug (this type of "optimization" turns out to be a
debugging mess in most cases where one doesn't have fairly immediate
feedback as to when/where/how something has broken).


funny though, is even just using "straightforward" strategies, like
caching variables in registers, ... and the code still manages to be a
bit more optimized-seeming than what usually seems to end up being spit
out by MSVC.

I have not often been entirely impressed by what I have seen being spit
out by MSVC, and my own code-generators have often been able to
outperform it (and be competitive with GCC in my tests), however...:
making my code generators "actually work" often proves a bit more difficult.


hence why I am still mostly using an interpreter-based strategy:
it is much easier IME to maintain and debug the interpreter, whereas IME
my native code generators have more often tended to blow up in my face
than actually work reliably, and it is much more difficult than it would
seem to dig through a big pile of assembler output looking for whatever
little bug is causing the code not to work.


only "I am using an interpreter" sounds a lot less impressive than "I am
using a JIT compiler", although I remember at least one VM I know of
which had claimed to be using a JIT, but the thing was using stupid
threaded code (in pretty much the same way I am using it in my
interpreter), so I was not terribly impressed...

I guess there would always be a "cheap but lame" way to write a JIT
(and, actually, how my first working JIT worked):
just use fixed-form globs of ASM for each VM opcode, and simply append
them together in-series (emitting prologues/epilogues and labels and
so-on as-needed).

ironically, at the time, I was getting nearly the same performance as
unoptimized GCC output doing this, so I guess it probably wouldn't be
entirely unworkable (rather than trying to write more complex
code-generators with type-analysis and register allocation and so on...).

( it sometimes almost makes me wonder if I am incompetent or something
in these regards. )


> Is that un-computer-sciencey enough? This blog post may help with the basics
> of my code generation approach:
> http://orangejuiceliberationfront.com/generating-machine-code-at-runtime/ (but
> it's C, and Intel, and badly wrapped by the new theme of my blog).
>

nifty...

[toc] | [prev] | [next] | [standalone]


#606 — Re: writing interpreters, was Good practical language and OS agnostic text?

FromUli Kusterer <ulimakesacompiler@googlemail.com>
Date2012-04-22 12:53 +0200
SubjectRe: writing interpreters, was Good practical language and OS agnostic text?
Message-ID<12-04-063@comp.compilers>
In reply to#603
On 22.04.2012, at 03:58, BGB wrote:
> as-is, the scripting language is probably at a vaguely similar
> complexity level with C (in some ways it is simpler, and in others
> likely more complex), as it gradually moves in the direction of being
> "essentially" statically-typed, has pointers, class/instance OO, ...

 My programming language, Hammer, goes in the opposite direction. It's
essentially dynamically typed scripting language (at least as far as the user
is concerned). Every variable is a variant, but I keep values around in their
native type. Also, the syntax is very English-like and the intention is that a
user *can not* make it crash. Of course, in practice that may not hold true
due to bugs, but it is different than C, where there are a lot of "user
errors" that are well in their rights to cause a crash.

> (decided to leave out a bunch of stuff related to some of the hassles of
> static type-checking, and dealing with the matter of having a scoping
> model where the scope is both mutable and involves indirect visibility,
> meaning that variable assignments can alter what bindings may be
> potentially visible from a given piece of code, making determining all
> this statically a relatively non-trivial problem).

 Yeah, that sounds more like it ;-)

> now, how is any of this simpler than a C compiler?
> at least at present my main target is an interpreter, and weak
> performance is generally passable.

 The nice part about writing your own interpreter is that you don't have to
mess as much with predefined platform specifics. You can define instructions
the way you need them to make them easy to parse. Also, it's easy to write a
debugger into your interpreter loop, and check out the contents of your stack
and lots of other things.

> although I use recursive descent, the above sounds different from what I
> usually do.
> (...) typically, the result of all of this is to produce a syntax tree.

 That may be more due to my lacking English skills than actual differences.
I've simplified a lot, and left out expression parsing in the description, to
get the big picture across. My parser and syntax tree are written in C++ and
nodes have a class hierarchy starting at a basic Node and progressing to
things like a function call (with an array of sub-nodes for parameters, and a
"name" string field) etc.

 My expression parser is kinda fun, because it essentially walks the subtree
of the expression currently parsed, looking at the precedence of each operator
(a simple numeric value), and then either inserts itself as the next param of
the deepest, rightmost node, or inserts itself in the middle of the tree
somewhere, taking what was previously in its slot as its left argument. So
precedence actually gets worked out during building of the tree fairly easily,
with a loop in one function call, not several functions for different
precedence levels.

 That reminds me, I should check whether I've implemented right-associative
operators yet. They're fairly easy though, just a slight flip on the criteria
when something gets appended or inserted.

> C is a little uglier here, since it requires keeping track of typedefs
> and checking for typedef when trying to parse declarations.

 My scripting language doesn't really have declarations, but it has something
similar in that any identifier can be used as a one-word string constant in
many places, and there aren't really any reserved identifiers. So originally I
used to look if there was a variable of that name already in use, and then
generated a variable reference, otherwise treated it as an unquoted string
constant.

 Later, I realized that the former is sufficient, if I pre-fill the variable
with its own name as a string. That way, my "do" command (think eval(), it
executes a string as code, and has access to the current function's local
variables etc.) can contain a command that uses what I think is a constant is
turned into a variable. The same thing could happen in loop conditions, where
the first time round nothing has been put into a variable yet, so it's treated
as a string, then inside the loop it gets turned into a variable.

 Fun little details of programming languages :-)

> partly agreed, except that it isn't really strictly necessary to build
> an array up-front.
>
> most of my parsers don't bother using an array, but instead just call
> the tokenizer function directly to peek and parse tokens based on the
> current "stream position" (generally a raw char pointer in my language
> parsers).
>
> the version used in my C compiler currently uses a small hash table
> (sort of, the "hash function" is simply to keep only the low-order bits
> of the address) to avoid re-reading the same token multiple times.
>
> practically though, it may be both higher performance and generally
> cleaner to just build a token array up-front.

 Yeah, my main reason was that my language, Hammer, due to its verbose,
english-like syntax, can require a lot of look-ahead. So it just made the code
much simpler, and much, much more maintainable. Also, it's very convenient to
have a struct with everything we know about the current token while debugging,
including, if you want, its line number to more easily find it in the source
code than using a pointer or global offset.

> I had considered using a table in the C compiler, but the "hash" option
> was less effort (the parser was already written in this case, so the
> hash could be added via a quick hack, rather than by making significant
> changes to the rest of the parser to use an token array instead of
> direct function calls).

 I actually have myToken = GoNextToken(), myToken = GoPrevToken() convenience
calls around my token array.

> a potential advantage of not using an array though is not needing to
> allocate memory for it.

 Definitely, though the overhead is small if you reference your code instead
of storing copies of strings, and stick to small numeric values for the rest
of the ivars, too. Depending on average token length, it comes out to about
the same size as your source script, or slightly larger allowing for lots of
operators etc.

>> - Syntax tree: Build a tree structure that represents the parsed program.
You
>> can then do things like evaluate constant nodes ahead of time (e.g. turn
5+4
>> into 9, but leave 5 + n as an addition node with a 5 and a reference to
>> variable 'n') by replacing groups of nodes with simpler nodes recursively.
>> Keep around the line number/first token offset in each node so you can
report
>> errors.
>
> agreed.
>
> in my case, I have generally used either "lists" built from "cons cells"
> for this purpose, or in other cases a variant of XML DOM nodes.

 So I guess you're building a more dynamic structure, not unlike a
dictionary/hash map? My low-level programming instincts kinda yell at me that
this is inefficient, but honestly, the parse tree is probably the last place
where one needs to worry about that. At least until one actually finds a
situation where one runs out of memory or finds it to be a performance
bottleneck. Premature optimization, root of all evil, yadda, yadda.

> I had actually thought this was fairly standard practice, until looking
> into it and observing that, no, apparently most people store their
> parse-trees in raw structures. either way I guess.

 Yeah, I use C++ classes, but then it could be argued that C++ doesn't really
have *real* classes, so I guess it's just a glorified struct as well. I have
some virtual methods, which make walking the tree as easy as kicking off a
recursive function call, but apart from that, they're very struct-y.

> typically, my parsers do almost no direct processing on the ASTs while
> building them, apart from building a semantic representation of the code
> as-seen.

 I certainly don't simplify nodes *while* building the tree. I actually
intentionally start out with the stupidest representation that is still
correct (i.e. I pick where to insert expression nodes so evaluating the tree
gives the right result, but that's it). That way, I can see parser errors
easily and early on. I can debug-dump my tree with a recursive function call.

 And *then* I call Simplify(). Which does a recursive depth-first traversal
and does simple optimizations. So each type of node only needs to know how to
simplify itself. E.g. operators know how to check their operands whether they
are constant or not (*after* calling Simplify() on them), and can then replace
themselves with their result if both are.

> typically, things like simplifying expressions, propagating constants,
> ... is done in a stage I have typically called "reduction", which
> happens between parsing and prior to producing (bytecode) output.

 Yes, that sounds about what I do. I just used a dumber word :-)

> at this point, things vary go differently, and several more stages are
> involved in my case:
>
> - typically, after the AST has been "reduced", it will be transformed
> into bytecode (basically, walk the AST, figure out expression types when
> relevant, and spit out any relevant bytecode ops).

 Yup. I recursively call GenerateCode(), passing a CodeBlock object into which
the raw instructions get written.

> previously, only a very small amount of type-analysis was done here, but
> I am looking to expand this (partly to simplify the work being done in
> the interpreter back-end), such that the output produced here will be
> "mostly statically typed".

 I don't have that currently, but what I did in an earlier approach (where my
code actually compiled to C++), I had a "nail down types" phase. What it
essentially did was ask up the hierarchy to find out which types each
expression could have with its current parameters (I have about 5 types, so it
was just a bit field), and then pick the narrowest type that fit. Often it
turned out that somewhere at the end there was a +1 or so, which meant I ended
up with the widest numeric type or so. Or there was concatenation involved and
I knew the end result would be a string. But of course, often it was just a
variant as well.

> I am looking at going to producing bytecode output more similar to
> Java-ByteCode, basically where many opcodes are qualified for specific
> types, rather than leaving it solely to the back-end to figure out what
> the types are. this is a little lame, given I found it elegant how, for
> example, MSIL/CIL had just used a few simple operations, and the
> back-end would do its thing, but some things are currently more awkward
> here, and at-present this leaves some awkward holes in the type-system.

 Yeah. In an early version, I just chickened out and always picked the widest
type. I.e. the + operator just always did floating point maths. These days, I
check the operand types at runtime and fall back on integer maths for stuff
like addition, subtraction and multiplication when possible.

> it all still leaves a bit of uncertainty though (as to whether
> adding/using piles of type-annotated opcodes is really "the right
> direction to be going").

 It might give you better performance, particularly once you go JIT (if you
do). And it saves you a bunch of branching instructions during runtime, which
are a major slowdown these days on modern CPUs.

> - typically (assuming the interpreter), the bytecode will be processed
> and then transformed into threaded code, producing a threaded-code
> version of the program. there are some parts which use generated native
> code thunks, but for the most part, pretty much all of the machinery
> takes place in native C functions. as-is it proves problematic to do any
> non-trivial scope and type-analysis at this stage, short of making the
> threaded-code logic contain most of the "hard parts" of a full native
> code-generator (as well as likely making it considerably slower).

 "threaded-code"? *looks it up on Wikipedia* Ah, so that's what it's called
when you just spit out a bunch of CALL instructions... Good to know :-) I
build a bytecode, with an index into an instruction array and two param
fields. I thought about generating CALL instructions instead, but it's easier
to debug this way, and subroutines in my language all have a variable number
of parameters, so I thought the advantage of avoiding one function pointer
dereference probably won't gain me much. And I'm still in the "get this dang
thing working" phase, after all.

> some past experiments (special purpose tests), have implied that
> potentially threaded code can pull off "reasonably solid" interpreter
> performance (potentially within 3-5x of native).
>
> however, at present it is considerably slower (at least with tests like
> the recursive Fibonacci sequence and bubble-sort, with most "practical"
> code the slowdown seems to be smaller, but typically because most of the
> actual logic takes place in native code in these cases).

 Yeah. My experiences are similar. Hammer is what the developer of another
HyperTalk-descended language called a "Very High Level Language" and is more
of the CISC than the RISC mindset, so the number of actual bytecode
instructions is comparatively small compared to the amount of actual
instructions the functions behind each bytecode are made up of.

> - if a JIT / native code-generator is being used, the process works like
> this:
> complex operations are generally decomposed into simpler ones, and much
> of the type information from the prior stage is discarded (the
> code-generator uses its own type-analysis, unlike the "dumb"
> interpreter, 1);

 That might actually make sense, considering all the temporaries, registers
and whatever that native code "adds" to the original source code. Will have to
keep in mind if I'm ever crazy enough to do that JIT :-)

> - COFF modules are fed into an in-program linker, which proceeds to link
> them against the current program image. there is some amount of trickery
> here, involving potentially both invoking code-generation handlers, as
> well as mechanisms to determine whether or not to "proxy" function calls
> (IOW: whether to link a function call directly against the target
> address, or use an indirect jump through another pointer).
>
> reasons a proxy may be used: the call is out-of-range (on x86-64, this
> usually means the call has gone outside the +-2GB window), or, either
> the code has been "hot patched" or the linker has reason to suspect that
> such "hot patching" is likely to occur (currently, this involves
> heuristics).

 Yeah, some of my early native-code tests also had a custom loader (seems to
be identical to your "in-program linker"). I essentially stored the offset to
the trampoline "external symbol table" in the file, and then fixed up the CALL
instructions to point to the symbols I looked up at runtime.

> the strategy used by the codegen backend for my C compiler involved
> "large numbers of special cases", but this ultimately proved a bit
> problematic to debug (this type of "optimization" turns out to be a
> debugging mess in most cases where one doesn't have fairly immediate
> feedback as to when/where/how something has broken).

 Yeah. At the least one would have to be very disciplined during code
generation and keep information on all the jump-in and jump-out points, so one
doesn't combine two instructions from different branches. But then I suppose
one would have to keep track of those anyway, so all offsets can be fixed up
if an instruction is removed.

> funny though, is even just using "straightforward" strategies, like
> caching variables in registers, ... and the code still manages to be a
> bit more optimized-seeming than what usually seems to end up being spit
> out by MSVC.

 Oooo... burrrrn ;-)

> I have not often been entirely impressed by what I have seen being spit
> out by MSVC, and my own code-generators have often been able to
> outperform it (and be competitive with GCC in my tests), however...:
> making my code generators "actually work" often proves a bit more
difficult.

 TBH, it's easy to generate faster code if it doesn't have to be correct. :-p
Isn't that how SSE and OpenGL and all that stuff came about? They found some
trade-offs they could make for a particular problem domain that would make it
faster. Effectively, you're making a trade-off for correctness ;-)

> hence why I am still mostly using an interpreter-based strategy:
> it is much easier IME to maintain and debug the interpreter, whereas IME
> my native code generators have more often tended to blow up in my face
> than actually work reliably, and it is much more difficult than it would
> seem to dig through a big pile of assembler output looking for whatever
> little bug is causing the code not to work.

 Fully agreed. Hence the suggestion to generate bytecode first. But of course,
if you plan to make an actual native code compiler, pointers in that direction
don't necessarily hurt.

> only "I am using an interpreter" sounds a lot less impressive than "I am
> using a JIT compiler", although I remember at least one VM I know of
> which had claimed to be using a JIT, but the thing was using stupid
> threaded code (in pretty much the same way I am using it in my
> interpreter), so I was not terribly impressed...

 I would like to see performance comparison between the two, say on ARM and
Intel. Does a function pointer de-ref and a loop really make that much
difference in practice? OK, Hammer has lots of slowdowns, and its main loop
isn't that simple, so there's a few exit flags to check each go through the
loop, and a call to update a source level debugger, but hey...

> I guess there would always be a "cheap but lame" way to write a JIT
> (and, actually, how my first working JIT worked):
> just use fixed-form globs of ASM for each VM opcode, and simply append
> them together in-series (emitting prologues/epilogues and labels and
> so-on as-needed).

 Actually, I could imagine that, with a good optimizer, that is probably the
common way of doing things. And it's also easy to maintain.

Cheers,
-- Uli Kusterer
http://stacksmith.org

[toc] | [prev] | [next] | [standalone]


#614 — Re: writing interpreters, was Good practical language and OS agnostic text?

FromBGB <cr88192@hotmail.com>
Date2012-04-22 12:29 -0700
SubjectRe: writing interpreters, was Good practical language and OS agnostic text?
Message-ID<12-04-071@comp.compilers>
In reply to#606
On 4/22/2012 3:53 AM, Uli Kusterer wrote:
> On 22.04.2012, at 03:58, BGB wrote:
>> as-is, the scripting language is probably at a vaguely similar
>> complexity level with C (in some ways it is simpler, and in others
>> likely more complex), as it gradually moves in the direction of being
>> "essentially" statically-typed, has pointers, class/instance OO, ...
>
>   My programming language, Hammer, goes in the opposite direction. It's
> essentially dynamically typed scripting language (at least as far as the user
> is concerned). Every variable is a variant, but I keep values around in their
> native type. Also, the syntax is very English-like and the intention is that a
> user *can not* make it crash. Of course, in practice that may not hold true
> due to bugs, but it is different than C, where there are a lot of "user
> errors" that are well in their rights to cause a crash.

this is not to say that the language in my case is not dynamically-typed
WRT the code the user types, but the objective is to essentially limit
what about of dynamic type-checking is used within the VM, as this can
hurt performance.

the language in my case is actually much closer to being an ECMAScript
variant, with some type annotations, and some basic level of
type-inference (it may attempt to "predict" the type of an untagged
variable at a given point in time, based on whatever was last assigned
into it).


as-is, in my VM, much of the code still ends up using dynamic
type-checking, and the goal is to shave this down somewhat (as well as
making what "is known" generally more effective).


>> (decided to leave out a bunch of stuff related to some of the hassles of
>> static type-checking, and dealing with the matter of having a scoping
>> model where the scope is both mutable and involves indirect visibility,
>> meaning that variable assignments can alter what bindings may be
>> potentially visible from a given piece of code, making determining all
>> this statically a relatively non-trivial problem).
>
>   Yeah, that sounds more like it ;-)

yeah.

basically, I use a scoping model itself vaguely derived from the Self
language, rather than the (more traditional) "pure lexical scoping" model.

internally, packages are objects, and imports are variable declarations
(declaring a "delegate variable" which links to the imported package).

a problem though is that it is problematic to try to statically
determine what all is potentially visible is fairly difficult. at the
moment, this largely means that anything being pulled in from a package
import uses dynamic checking.

a "cheap trick" here in the language semantics is that the visibility
order for any binding not directly visible as part of the lexical scope
is not determined. this allows the VM to use what it can figure out in
preference to "falling back" to a generic dynamically-typed lookup.

besides this, caching via "lookup hashes" is also used (so, the pure
dynamic case isn't necessarily "that" bad).


>> now, how is any of this simpler than a C compiler?
>> at least at present my main target is an interpreter, and weak
>> performance is generally passable.
>
>   The nice part about writing your own interpreter is that you don't have to
> mess as much with predefined platform specifics. You can define instructions
> the way you need them to make them easy to parse. Also, it's easy to write a
> debugger into your interpreter loop, and check out the contents of your stack
> and lots of other things.

pretty much.

one can also go and compile C code to an interpreter as well,
essentially allowing for a "portable C analogue". Quake3 actually did
something like this as well.


>> although I use recursive descent, the above sounds different from what I
>> usually do.
>> (...) typically, the result of all of this is to produce a syntax tree.
>
>   That may be more due to my lacking English skills than actual differences.
> I've simplified a lot, and left out expression parsing in the description, to
> get the big picture across. My parser and syntax tree are written in C++ and
> nodes have a class hierarchy starting at a basic Node and progressing to
> things like a function call (with an array of sub-nodes for parameters, and a
> "name" string field) etc.

yep.

as noted before, I have used either lists or XML nodes here (either
using a Scheme-like AST structure, or a different AST structure loosely
derived from XML-RPC, 1).

in my case, my whole VM framework is pretty much plain C, so the
alternative would be using structs.

I am not sure what would be best overall, as each has tradeoffs.


1: one of my interpreters had an origin story of roughly hacking
together a mock-up parser for a JavaScript style language on top of a
hacked-up version of XML-RPC (which I had previously implemented support
for). overall, this was probably one of the worst-performing
interpreters I had ever written.

I have little idea how slow it was, but do know this much:
at the time, all values were boxed heap objects;
at the time, checking the type of an object was O(n), essentially a
linear search over every object in the heap;
initially, it directly interpreted the XML nodes, but later switched to
using a 16-bit word-code (and using things like jumps for implementing
loops, rather than special instructions and blocks);
...

the one which directly followed (a different form of the same language)
was one of my fastest, but its use of precise GC and similar made it
very painful to work with.

then the third (partly recombining parts of the last two) was "sort of
in the middle". this one is the direct ancestor of the current
interpreter, which has since-then been mostly undergoing incremental
changes.



>   My expression parser is kinda fun, because it essentially walks the subtree
> of the expression currently parsed, looking at the precedence of each operator
> (a simple numeric value), and then either inserts itself as the next param of
> the deepest, rightmost node, or inserts itself in the middle of the tree
> somewhere, taking what was previously in its slot as its left argument. So
> precedence actually gets worked out during building of the tree fairly easily,
> with a loop in one function call, not several functions for different
> precedence levels.

interesting.

I had just used copy/paste/edit a bunch of times to make the various
precedence levels, so either way works I guess.

IIRC, at one point I had used naive left-associative arithmetic for all
operators (with the first horridly-slow VM), before noting that I could
copy/paste the function to get different levels.


>   That reminds me, I should check whether I've implemented right-associative
> operators yet. They're fairly easy though, just a slight flip on the criteria
> when something gets appended or inserted.

yes, or one can use recursion for the right-node at the same precedence
level, for the "stack of functions" strategy.


>> C is a little uglier here, since it requires keeping track of typedefs
>> and checking for typedef when trying to parse declarations.
>
>   My scripting language doesn't really have declarations, but it has something
> similar in that any identifier can be used as a one-word string constant in
> many places, and there aren't really any reserved identifiers. So originally I
> used to look if there was a variable of that name already in use, and then
> generated a variable reference, otherwise treated it as an unquoted string
> constant.
>
>   Later, I realized that the former is sufficient, if I pre-fill the variable
> with its own name as a string. That way, my "do" command (think eval(), it
> executes a string as code, and has access to the current function's local
> variables etc.) can contain a command that uses what I think is a constant is
> turned into a variable. The same thing could happen in loop conditions, where
> the first time round nothing has been put into a variable yet, so it's treated
> as a string, then inside the loop it gets turned into a variable.
>
>   Fun little details of programming languages :-)

I am not entirely sure I understand how this would work.


my language actually has two basic types of declarations:
via the "var" and "function" keywords;
via a different strategy (for using "C-like" declarations).

so, for example:
"var x;" or "var i:int;" or ...
"function foo(x, y) { ... }";
...

the alternative strategy relies on another detail:
<identifier> <identifier> ...

is not actually valid anywhere else in the syntax at statement level.
so, basically, if a construction like:
<identifier> <identifier>;
<modifier*> <identifier> <identifier>;
...

is seen, the parser can fairly safely assume it is a variable declaration.

actually, the rules are a little more complicated:
<modifier*> <type_expression> <identifier> ...

where type-expression is an special expression which is also a
syntactically valid type, consisting of, potentially:
several prefix modifier tokens (*, &, ...);
a known type-name keyword, or an identifier;
several allowed suffixes ("[]", "[...]", "<...>", "...", ...).

so, for example, "*int[] arr;", would declare an array of
integer-pointers (well, as would "var arr:*int[];").

there is a little annoyance though, from a design POV, of having two
clearly redundant declaration syntax forms is, well, redundant.



>> partly agreed, except that it isn't really strictly necessary to build
>> an array up-front.
>>
>> most of my parsers don't bother using an array, but instead just call
>> the tokenizer function directly to peek and parse tokens based on the
>> current "stream position" (generally a raw char pointer in my language
>> parsers).
>>
>> the version used in my C compiler currently uses a small hash table
>> (sort of, the "hash function" is simply to keep only the low-order bits
>> of the address) to avoid re-reading the same token multiple times.
>>
>> practically though, it may be both higher performance and generally
>> cleaner to just build a token array up-front.
>
>   Yeah, my main reason was that my language, Hammer, due to its verbose,
> english-like syntax, can require a lot of look-ahead. So it just made the code
> much simpler, and much, much more maintainable. Also, it's very convenient to
> have a struct with everything we know about the current token while debugging,
> including, if you want, its line number to more easily find it in the source
> code than using a pointer or global offset.

yeah...

if I want the line number, the typical strategy is to walk from the
start of the buffer to the current position and count lines.

this didn't work for C though, due to preprocessing, so a different
strategy was used:
nearly every line of PP output is annotated with the current line-number
(though it could be potentially skipped for the case of consecutive line
numbers).

it is lame, but hell, it works...


>> I had considered using a table in the C compiler, but the "hash" option
>> was less effort (the parser was already written in this case, so the
>> hash could be added via a quick hack, rather than by making significant
>> changes to the rest of the parser to use an token array instead of
>> direct function calls).
>
>   I actually have myToken = GoNextToken(), myToken = GoPrevToken() convenience
> calls around my token array.

yeah, that works.


>> a potential advantage of not using an array though is not needing to
>> allocate memory for it.
>
>   Definitely, though the overhead is small if you reference your code instead
> of storing copies of strings, and stick to small numeric values for the rest
> of the ivars, too. Depending on average token length, it comes out to about
> the same size as your source script, or slightly larger allowing for lots of
> operators etc.

yes, but at a potential cost, depending if the language has or how it
implements things like unicode escapes.


<snip>

>>
>> agreed.
>>
>> in my case, I have generally used either "lists" built from "cons cells"
>> for this purpose, or in other cases a variant of XML DOM nodes.
>
>   So I guess you're building a more dynamic structure, not unlike a
> dictionary/hash map? My low-level programming instincts kinda yell at me that
> this is inefficient, but honestly, the parse tree is probably the last place
> where one needs to worry about that. At least until one actually finds a
> situation where one runs out of memory or finds it to be a performance
> bottleneck. Premature optimization, root of all evil, yadda, yadda.

yeah.

it is more worrisome for the C parser because:
it parses chunks of code that are often measured in the MB range (like
what happens if "windows.h" or similar gets in the mix, esp the MSVC
version which results in about 8MB of output);
because the C parser uses XML nodes.

but, oddly enough, this is not apparently the main source of slowness.
the tokenizer tends to bog down considerably more than any code having
much to do with XML, and the preprocessor eats up more time than
building the AST.

granted, the XML implementation is itself somewhat micro-optimized,
IIRC, with direct support for numeric-valued attributes, omission of
unnecessary features for this use-case (such as namespaces), ....

so, in all, it doesn't seem "too terrible".

however, granted, the parser may take easily 500ms or 2s or so to parse
a source module if large headers are involved, so it isn't free.

given that GCC is similarly slow, I am probably not too far off.
MSVC tears through code much more quickly though, making me suspect MSVC
has some sort of "scary-fast parser" going on.


>> I had actually thought this was fairly standard practice, until looking
>> into it and observing that, no, apparently most people store their
>> parse-trees in raw structures. either way I guess.
>
>   Yeah, I use C++ classes, but then it could be argued that C++ doesn't really
> have *real* classes, so I guess it's just a glorified struct as well. I have
> some virtual methods, which make walking the tree as easy as kicking off a
> recursive function call, but apart from that, they're very struct-y.

yeah.


I don't think the present form of ASTs is all that bad though as, after
all, I can quickly dump them to or read them from files, and (at least
in concept) it allows people to more easily write independent code
compatible with the parser or compiler back-end.

not that it probably matters all that much though, more just a quirk of
my personal history it seems.


>> typically, my parsers do almost no direct processing on the ASTs while
>> building them, apart from building a semantic representation of the code
>> as-seen.
>
>   I certainly don't simplify nodes *while* building the tree. I actually
> intentionally start out with the stupidest representation that is still
> correct (i.e. I pick where to insert expression nodes so evaluating the tree
> gives the right result, but that's it). That way, I can see parser errors
> easily and early on. I can debug-dump my tree with a recursive function call.

agreed.


>   And *then* I call Simplify(). Which does a recursive depth-first traversal
> and does simple optimizations. So each type of node only needs to know how to
> simplify itself. E.g. operators know how to check their operands whether they
> are constant or not (*after* calling Simplify() on them), and can then replace
> themselves with their result if both are.

yeah, this sounds similar to how my "reduce" process works.
there is a little context though, used mostly for things like constant
propagation and similar.


>> typically, things like simplifying expressions, propagating constants,
>> ... is done in a stage I have typically called "reduction", which
>> happens between parsing and prior to producing (bytecode) output.
>
>   Yes, that sounds about what I do. I just used a dumber word :-)

fair enough.


>> at this point, things vary go differently, and several more stages are
>> involved in my case:
>>
>> - typically, after the AST has been "reduced", it will be transformed
>> into bytecode (basically, walk the AST, figure out expression types when
>> relevant, and spit out any relevant bytecode ops).
>
>   Yup. I recursively call GenerateCode(), passing a CodeBlock object into which
> the raw instructions get written.

sounds functionally equivalent, though terms like "Compile" and "Emit"
(ex: "CompileStatement" or "EmitOpcode") are used instead, and the
target structure is called a "Context".


>> previously, only a very small amount of type-analysis was done here, but
>> I am looking to expand this (partly to simplify the work being done in
>> the interpreter back-end), such that the output produced here will be
>> "mostly statically typed".
>
>   I don't have that currently, but what I did in an earlier approach (where my
> code actually compiled to C++), I had a "nail down types" phase. What it
> essentially did was ask up the hierarchy to find out which types each
> expression could have with its current parameters (I have about 5 types, so it
> was just a bit field), and then pick the narrowest type that fit. Often it
> turned out that somewhere at the end there was a +1 or so, which meant I ended
> up with the widest numeric type or so. Or there was concatenation involved and
> I knew the end result would be a string. But of course, often it was just a
> variant as well.

the current "real" type-system is considerably more complex.

as far as the VM itself is concerned, a simpler type-model is used:
ILFDV (Int, Long, Float, Double, Variant).

most other types map to Variant, which is (partly) subdivided:
FN: "Fixnum", actually means "any integer types represented as a variant
and within the int32 range" (originally, it did mean, specifically,
"fixnum" in prior VMs, but was relaxed to "int32 range", which may mean
either an actual fixnum or a "BVT int32");
FL: "Flonum", which represents flonum and potentially "BVT float32";
BVT: "Boxed Value Type" (should be mostly self-explanatory), these
generally represent most value types which don't fit into a fixnum.


currently (in the interpreter), the ILFD types are themselves BVT's,
mostly for sake of simplicity.

I have also recently been working on "optimizing" the handling of BVTs
somewhat, mostly so that they will be faster and less prone to leaking
memory (and making the ILFD types use dedicated free-lists rather than
general-purpose memory-allocation functions).

granted, some of this new interpreter logic consists almost more of
macros than it does of actual code (mostly because of large numbers of
very repetitive operation-handler functions).


>> I am looking at going to producing bytecode output more similar to
>> Java-ByteCode, basically where many opcodes are qualified for specific
>> types, rather than leaving it solely to the back-end to figure out what
>> the types are. this is a little lame, given I found it elegant how, for
>> example, MSIL/CIL had just used a few simple operations, and the
>> back-end would do its thing, but some things are currently more awkward
>> here, and at-present this leaves some awkward holes in the type-system.
>
>   Yeah. In an early version, I just chickened out and always picked the widest
> type. I.e. the + operator just always did floating point maths. These days, I
> check the operand types at runtime and fall back on integer maths for stuff
> like addition, subtraction and multiplication when possible.
>

mostly, in my case, it was just lots of cases where the interpreter just
ends up using "variant" for everything, rather than, say, producing
threaded code specialized for 32-bit integer values or whatever else.

this means a big pile of new opcodes though, most with names like:
"ADD_XI" for "add fixed integer" and "MUL_XL_C" for "multiply fixed long
by a constant" and similar.

I don't really like this, as the interpreter "should" be smart enough to
figure all this stuff out on its own, but at present, it is not.


my native code-generator back-ends could figure this stuff out, but
granted, they had considerably more complex logic for things like
working with the type-system.

when I tried before to write an interpreter which did all of this, I
soon realized that it was working out to nearly the same functional
complexity as a native code-generator, but with worse performance,
largely defeating the point.


>> it all still leaves a bit of uncertainty though (as to whether
>> adding/using piles of type-annotated opcodes is really "the right
>> direction to be going").
>
>   It might give you better performance, particularly once you go JIT (if you
> do). And it saves you a bunch of branching instructions during runtime, which
> are a major slowdown these days on modern CPUs.

well, or at least, it allows the possibility of a "better yet simpler"
JIT, since the types will already be mostly known, meaning that the JIT
wont need to be able to figure out all of this stuff to, errm, actually
work...

theoretically, with threaded-code, the run-time checks could be partly
skipped already simply by inferring the type and then choosing the
handler with the correct type, and then keeping track of type-flow and
similar when building the threaded code.

however, I have not actually generally done this, due to the relative
amount of added complexity involved.

so, although less elegant, annotating many opcodes with types could be a
lot more generally workable.

for most other cases (those not arithmetic ops), the annotations are
more likely to be in the form of "signatures" (already commonly used,
these basically encode type-information for a variable/argument list/...
into the form of a string).


>> - typically (assuming the interpreter), the bytecode will be processed
>> and then transformed into threaded code, producing a threaded-code
>> version of the program. there are some parts which use generated native
>> code thunks, but for the most part, pretty much all of the machinery
>> takes place in native C functions. as-is it proves problematic to do any
>> non-trivial scope and type-analysis at this stage, short of making the
>> threaded-code logic contain most of the "hard parts" of a full native
>> code-generator (as well as likely making it considerably slower).
>
>   "threaded-code"? *looks it up on Wikipedia* Ah, so that's what it's called
> when you just spit out a bunch of CALL instructions... Good to know :-) I
> build a bytecode, with an index into an instruction array and two param
> fields. I thought about generating CALL instructions instead, but it's easier
> to debug this way, and subroutines in my language all have a variable number
> of parameters, so I thought the advantage of avoiding one function pointer
> dereference probably won't gain me much. And I'm still in the "get this dang
> thing working" phase, after all.


well, there is bytecode, just the interpreter converts it to threaded
code prior to actual execution (and just uses the threaded-code for
every time the function is called).

the actual "working logic" is, in this case, actually contained in a
mass of C functions.

I had looked it up, and concluded at the time that "treaded code" is
still technically an interpreter.


>> some past experiments (special purpose tests), have implied that
>> potentially threaded code can pull off "reasonably solid" interpreter
>> performance (potentially within 3-5x of native).
>>
>> however, at present it is considerably slower (at least with tests like
>> the recursive Fibonacci sequence and bubble-sort, with most "practical"
>> code the slowdown seems to be smaller, but typically because most of the
>> actual logic takes place in native code in these cases).
>
>   Yeah. My experiences are similar. Hammer is what the developer of another
> HyperTalk-descended language called a "Very High Level Language" and is more
> of the CISC than the RISC mindset, so the number of actual bytecode
> instructions is comparatively small compared to the amount of actual
> instructions the functions behind each bytecode are made up of.

yeah.

my VM currently has a "huge" number of opcodes (several hundred),
although due to there being a number of (sometimes large) holes in the
opcode space, they are currently numbered up to 860.

previously (before adding a bunch of type-annotated arithmetic opcodes)
the limit was 540. I tried to put them all in one such "hole", but they
didn't fit, and I didn't want to break them up and spread them out.

so, I "relocated" the whole range up to a space starting at 768 (next
even multiple of 256), leading to the current range.

probably a large number of opcodes are due to "compound operations",
which perform several smaller operations in sequence. most of these
were, in turn, because a single larger piece of C code to do something
was faster than making more passes through the dispatch loop to invoke
several smaller operations.

or: a single "lpostinc" opcode is faster than "lload; dup; push_1; add;
lstore".

so, it is a tradeoff...

better performance at the cost of 100s of opcodes (at present, 459).


>> - if a JIT / native code-generator is being used, the process works like
>> this:
>> complex operations are generally decomposed into simpler ones, and much
>> of the type information from the prior stage is discarded (the
>> code-generator uses its own type-analysis, unlike the "dumb"
>> interpreter, 1);
>
>   That might actually make sense, considering all the temporaries, registers
> and whatever that native code "adds" to the original source code. Will have to
> keep in mind if I'm ever crazy enough to do that JIT :-)
>

yes.

things like temporaries, register allocation, ... all serve to largely
nullify most of the gains from these complex opcodes, as the
code-generator will figure most of this out again on its own.

although it seems pointless to build compound ops merely to break them
down again in the JIT, otherwise different bytecode code would need to
be generated depending on whether the interpreter or the JIT was the
target, and decomposing the opcodes in the JIT, although slightly more
bulky, is not particularly complicated (vs pretty much everything else
going on in the thing...).


it is much like the "complexity" of a C style syntax for a programming
language:
if a person is writing a compiler or VM for a language that is much more
than a toy, the added complexity of parsing a C-like syntax is probably
the least of the worries.


>> - COFF modules are fed into an in-program linker, which proceeds to link
>> them against the current program image. there is some amount of trickery
>> here, involving potentially both invoking code-generation handlers, as
>> well as mechanisms to determine whether or not to "proxy" function calls
>> (IOW: whether to link a function call directly against the target
>> address, or use an indirect jump through another pointer).
>>
>> reasons a proxy may be used: the call is out-of-range (on x86-64, this
>> usually means the call has gone outside the +-2GB window), or, either
>> the code has been "hot patched" or the linker has reason to suspect that
>> such "hot patching" is likely to occur (currently, this involves
>> heuristics).
>
>   Yeah, some of my early native-code tests also had a custom loader (seems to
> be identical to your "in-program linker"). I essentially stored the offset to
> the trampoline "external symbol table" in the file, and then fixed up the CALL
> instructions to point to the symbols I looked up at runtime.

pretty much.

the linker is basically just a large table for symbols and relocations,
and a special-purpose memory manager (for allocating memory for those
pesky code/data/bss sections).

originally, it would link things in aggressively (get handed object,
link it in directly), but I later switched to a "lazy" 2-stage strategy
(where an object is not fully linked until one of its exported symbols
is referenced) mostly to fix some nasty problems which existed
previously (linking being sensitive to object ordering, ...).


the modules used here are basically raw COFF objects though.

as-needed, the linker will also defer to OS-specific functions
("dlsym()" and "GetProcAddress()"), in order to link against DLL exports.


>> the strategy used by the codegen backend for my C compiler involved
>> "large numbers of special cases", but this ultimately proved a bit
>> problematic to debug (this type of "optimization" turns out to be a
>> debugging mess in most cases where one doesn't have fairly immediate
>> feedback as to when/where/how something has broken).
>
>   Yeah. At the least one would have to be very disciplined during code
> generation and keep information on all the jump-in and jump-out points, so one
> doesn't combine two instructions from different branches. But then I suppose
> one would have to keep track of those anyway, so all offsets can be fixed up
> if an instruction is removed.

well, potentially, except I tend to produce ASM code as a giant string
to be fed into the assembler.

the assembler is, in fact, plenty fast IMO, and I speculate that by the
time one is trying to feed 10s of MB per second into the assembler, and
having this be a bottleneck, something has likely gone horribly wrong...

at first I had tried using direct code-generation, and later
function-call driven code generation, but soon found textual ASM to be
most convenient to work with, so I generally used this.


elsewhere in the VM, there are a number of cases where code generation
follows a pattern like:
Begin()
...
Print("ASM stuff...")
...
End()
...
fptr=GetAddr("foo")
...
fptr();


>> funny though, is even just using "straightforward" strategies, like
>> caching variables in registers, ... and the code still manages to be a
>> bit more optimized-seeming than what usually seems to end up being spit
>> out by MSVC.
>
>   Oooo... burrrrn ;-)

I think the issue is that MSVC seems to spit out some very often rather
naive-looking code.

the bigger irony though, is that as naive looking as it often is, it
still often manages to perform reasonably well.


like, it often seems to have a hard time with things like, keeping its
variables stored in registers, as it will very often spill the contents
of a register in cases where it is not necessary, often only to load the
variable again into a different register, and often not even in obvious
cases (such as spilling the register during a jump such that all of the
jump points have the value in the same place).

I am not really sure what is going on there sometimes...


that or, maybe it is just the profiler tending to get a lot more hits on
heavily used pieces of code where the compiler decided to do something
wacky.

this is combined sometimes though with the tendency of the compiler to,
every once in a great while, actually crash during a build. usually then
bringing up a "special" send error report box (to the effect of
"Microsoft is especially interesting in collecting these error
reports"...). maybe so, a crash in the compiler is probably hardly a
good sign...


OTOH, GCC tends to produce rather optimized-looking spaghetti-code.
code at least looks efficient and preforms well, but, how does it
work?... sometimes it is not so obvious.


>> I have not often been entirely impressed by what I have seen being spit
>> out by MSVC, and my own code-generators have often been able to
>> outperform it (and be competitive with GCC in my tests), however...:
>> making my code generators "actually work" often proves a bit more
> difficult.
>
>   TBH, it's easy to generate faster code if it doesn't have to be correct. :-p
> Isn't that how SSE and OpenGL and all that stuff came about? They found some
> trade-offs they could make for a particular problem domain that would make it
> faster. Effectively, you're making a trade-off for correctness ;-)

probably...

when I was writing the code generator, I used an optimization process like:
look at output;
identify any obvious inefficiencies;
figure out the logic which was involved in producing the code;
write a special case to detect the specific situation and emit an
alternate faster code sequence instead;
...

it then performed comparably with the other compilers, but lots of code
tended to break in non-obvious ways.


I later largely stopped using this code generator as:
I couldn't debug it sufficiently to make the output actually work reliably;
most of the code it contains is actually pretty horrid (whole thing is
basically bit-twiddly and special cases);
never mind its use of a multi-pass "magic bits" system, where the
compiler basically used multiple passes which operated (mostly) in
lock-step, and communicated primarily by using a synchronized byte-array
which was used typically to pass receive bit-flags from prior passes or
pass bit-flags to the next pass;
...

"magic bits" would encode things like which registers to try using (and
which have already been tried), warnings about things like potentials
for things like impending register spills, or running out of available
registers, ... (in effect, optimization by using multiple passes and a
system analogous to "road signs"). whenever the logic ran into a problem
case, it would flag a warning about it ("do not use this register here",
"insert stack padding here", "try to perform this operation without
using any additional registers", ...).

the goal would then be for the system to, between 2 and N passes
(usually with a limit of 16 or so), "converge" on an "optimal"/"stable"
output, followed by a "final" pass to actually emit the final-state code.

typically, then, the overall process looks like:
set up initial state;
while(!OutputIsStable())
   Send Begin
   Send all opcodes for function
   Send End
Flatten final output to an ASM string.


but, sadly, this was also my high-point here:
most of my subsequent native code generators have not actually gotten
fully written, before usually falling to either "grr... this is getting
a bit hairy" or "I have more important things to be working on".

meanwhile, other parts of the VM are often moving targets, so as to add
to the challenge, ...


>> hence why I am still mostly using an interpreter-based strategy:
>> it is much easier IME to maintain and debug the interpreter, whereas IME
>> my native code generators have more often tended to blow up in my face
>> than actually work reliably, and it is much more difficult than it would
>> seem to dig through a big pile of assembler output looking for whatever
>> little bug is causing the code not to work.
>
>   Fully agreed. Hence the suggestion to generate bytecode first. But of course,
> if you plan to make an actual native code compiler, pointers in that direction
> don't necessarily hurt.


actually, typically bytecode has been the intermediate stage.

however, usually most of the analysis and similar would have been left
in the back-end, leading to:
simple/naive front-end, very complex back-end.

hence, the realization that I may need to shift a little more of the
work to the front-end.


>> only "I am using an interpreter" sounds a lot less impressive than "I am
>> using a JIT compiler", although I remember at least one VM I know of
>> which had claimed to be using a JIT, but the thing was using stupid
>> threaded code (in pretty much the same way I am using it in my
>> interpreter), so I was not terribly impressed...
>
>   I would like to see performance comparison between the two, say on ARM and
> Intel. Does a function pointer de-ref and a loop really make that much
> difference in practice? OK, Hammer has lots of slowdowns, and its main loop
> isn't that simple, so there's a few exit flags to check each go through the
> loop, and a call to update a source level debugger, but hey...



well, if it is like the current partially-written JIT implementation,
which just uses full dynamic checking and invoking link-time
code-generation for everything, it would probably actually work out
being slower.

the major advantage of a JIT is in being able to do things like:
produce small/compact instruction sequences;
effectively utilize the system stack and registers for storage;
...

otherwise, there is not a huge advantage over threaded code.


>> I guess there would always be a "cheap but lame" way to write a JIT
>> (and, actually, how my first working JIT worked):
>> just use fixed-form globs of ASM for each VM opcode, and simply append
>> them together in-series (emitting prologues/epilogues and labels and
>> so-on as-needed).
>
>   Actually, I could imagine that, with a good optimizer, that is probably the
> common way of doing things. And it's also easy to maintain.
>

yeah.

may just go back to that strategy eventually...

[toc] | [prev] | [next] | [standalone]


#608 — Re: generating bytecode, was Good practical language and OS agnostic text?

FromUli Kusterer <ulimakesacompiler@googlemail.com>
Date2012-04-22 13:12 +0200
SubjectRe: generating bytecode, was Good practical language and OS agnostic text?
Message-ID<12-04-065@comp.compilers>
In reply to#603
On 22.04.2012, at 12:53, Uli Kusterer wrote:
> Fully agreed. Hence the suggestion to generate bytecode first. But of
course, if you plan to make an actual native code compiler, pointers in that
direction don't necessarily hurt.

 Huh. Just realized I never actually *wrote* that suggestion, it seems. So I
guess I should re-qualify that as:

 Yes, I agree, it is actually a good idea to start with a simple bytecode
interpreter first and to compile against that, before you try to generate your
own assembler. It may look like more work, but it lets you get one module
working before you have to figure out the other one.

 Thanks for pointing that out, BGB.

While we're on things I wrote that are missing something I wanted to say:
> I build a bytecode, with an index into an instruction array and two param
fields. I thought about generating CALL instructions instead, but it's easier
to debug this way, and subroutines in my language all have a variable number
of parameters, so I thought the advantage of avoiding one function pointer
dereference probably won't gain me much. And I'm still in the "get this dang
thing working" phase, after all.


 Another reason I'm doing the bytecode is that I'm hoping I can eventually
just save that to disk. I would love to get this thing running on iOS
eventually, and having files contain compiled x86 code in them wouldn't go
over too well on that ARM CPU :-) Also, if I save byte code (right now I save
source code) would be a slight added level of obfuscation for people who want
to ship their programs commercially (compared to aforementioned source code),
and the ability to make sure only instructions I consider "safe" can actually
be used.

Cheers,
-- Uli Kusterer
http://stacksmith.org

[toc] | [prev] | [next] | [standalone]


#609 — Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?

From"BartC" <bc@freeuk.com>
Date2012-04-22 12:51 +0100
SubjectRe: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?
Message-ID<12-04-066@comp.compilers>
In reply to#603
"BGB" <cr88192@hotmail.com> wrote in message
> On 4/21/2012 2:22 AM, Uli Kusterer wrote:

>> - Recursive descent parsers: It's the obvious way to write a parser.

> although I use recursive descent, the above sounds different from what I
> usually do.

> ReadStatement:
>   checks for and handles vaious statement types
>   calls ReadExpression
>
> ReadExpression:
>   (actually, this is a tower of functions, one for each precedence
> level, working from lowest to highest precedence)

Sounds like it's influenced by the C grammar, which defines expressions
using something like 13 or 17 layers of precedence.

Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
don't use any precedence in the grammar at all; I have precedence as an
attribute of an operator, and an expression can be parsed with a single
function.

Or rather two: readfactor(priority), and readterm(). Readfactor() deals with
the  binary operators linking successive terms, while readterm() does all
the real work (since my syntax doesn't distinguish between expressions and
statements, that's quite a big workload).

>> - Tokenizing: Essentially grab all the words in your source text and
>> build an array with an entry for each so you can more quickly walk
>> forward and backward without having to scan individual characters.

> partly agreed, except that it isn't really strictly necessary to build
> an array up-front.

> most of my parsers don't bother using an array, but instead just call
> the tokenizer function directly to peek and parse tokens based on the
> current "stream position" (generally a raw char pointer in my language
> parsers).

I've tried reading all the tokens in a separate pass, but didn't really like
it. And it takes a lot more storage as well, especially with macro
expansions.

Instead I read them as I go along, but with provision for a one-symbol
look-ahead.

>> - Syntax tree: Build a tree structure that represents the parsed program.
>> You

> typically, things like simplifying expressions, propagating constants,
> ... is done in a stage I have typically called "reduction", which
> happens between parsing and prior to producing (bytecode) output.

I use the following passes (which seem to be fairly typical):

Syntax analysis (lexing and parsing)
Name resolution (a recent introduction for me)
Type analysis (static type checks and coercions, constant folding)
Code generation (to intermediate code or to byte-code)
Final pass (from intermediate code to the target code)

Usually invoked one after the other for the entire module, where a
compile-time expressions is needed, then the first three passes have to be
done immediately (and the result had better be a constant value..)

I use the same structure now when generating byte-code (originally such a
compiler was just single-pass). Because such code is usually dynamically
typed, the type analysis pass only needs a nominal amount of work, but still
takes care of a few things (l-values for example).

> some past experiments (special purpose tests), have implied that
> potentially threaded code can pull off "reasonably solid" interpreter
> performance (potentially within 3-5x of native).

Assuming you're talking about dynamic typing, I found it difficult to get
within 3-5x, unless some sort of type hinting is used, or perhaps you're
comparing with not too highly optimised native code. Or it's code that is
memory-intensive, then memory access will dominate.

--
Bartc

[toc] | [prev] | [next] | [standalone]


#612 — Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?

From"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Date2012-04-22 18:18 +0200
SubjectRe: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?
Message-ID<12-04-069@comp.compilers>
In reply to#609
On Sun, 22 Apr 2012 12:51:44 +0100, BartC wrote:

> "BGB" <cr88192@hotmail.com> wrote in message
>> On 4/21/2012 2:22 AM, Uli Kusterer wrote:
>
>> although I use recursive descent, the above sounds different from what I
>> usually do.
>
>> ReadStatement:
>>   checks for and handles vaious statement types
>>   calls ReadExpression
>>
>> ReadExpression:
>>   (actually, this is a tower of functions, one for each precedence
>> level, working from lowest to highest precedence)
>
> Sounds like it's influenced by the C grammar, which defines expressions
> using something like 13 or 17 layers of precedence.

Rather by an attempt to describe precedence using grammar means.

> Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
> don't use any precedence in the grammar at all; I have precedence as an
> attribute of an operator, and an expression can be parsed with a single
> function.

Yes, there is a very simple technique using two stacks, one for operands
another for operations.

> Or rather two: readfactor(priority), and readterm(). Readfactor() deals with
> the  binary operators linking successive terms, while readterm() does all
> the real work (since my syntax doesn't distinguish between expressions and
> statements, that's quite a big workload).

Actually operations have two priorities; left and right. When left < right
you have left to right association. When right > left, it becomes right to
left.

Some operators may have these priorities sufficiently different. For
example the assignment operator. If your unlucky languages allows it, then
A+B = C+D better be A+(B=(C+D)). That would require the following order of
partial priorities:

   LP("=") << LP( "+")  < RP("+")  << RP("=")

> Instead I read them as I go along, but with provision for a one-symbol
> look-ahead.

Same here. Except that I have one token look-ahead.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

[toc] | [prev] | [next] | [standalone]


#621 — Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?

From"Bartc" <bartc@freeuk.com>
Date2012-04-23 10:59 +0100
SubjectRe: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?
Message-ID<12-04-078@comp.compilers>
In reply to#612
"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message
> On Sun, 22 Apr 2012 12:51:44 +0100, BartC wrote:

>> Or rather two: readfactor(priority), and readterm(). Readfactor() deals
>> with
>> the  binary operators linking successive terms, while readterm() does all
>> the real work (since my syntax doesn't distinguish between expressions
>> and
>> statements, that's quite a big workload).
>
> Actually operations have two priorities; left and right. When left < right
> you have left to right association. When right > left, it becomes right to
> left.
>
> Some operators may have these priorities sufficiently different. For
> example the assignment operator. If your unlucky languages allows it, then
> A+B = C+D better be A+(B=(C+D)). That would require the following order of
> partial priorities:

My scheme doesn't work tidily when some operators have right-to-left
association.

Fortunately I only have two such operators, assignment and power (:= and
**), and some extra logic takes care of that. At least, the results are what
you would expect.

However, when I do something like A+B+:=C+D, then I might need a bit more
logic to deal with that, to avoid parentheses..

--
Bartc
[We theory weenies would use an operator precedence parser.  It'd be
about 12 lines of code and a tiny lookup table, and can easily handle
arbitrary precedence and associativity.  See the Wikipedia article for
details and pseudocode. The original Ritchie C compiler used recursive
descent for most of the language, and operator precedence for the
expressions.  It was two passes and fit in 24K bytes of RAM.  -John]

[toc] | [prev] | [next] | [standalone]


#616 — Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?

FromBGB <cr88192@hotmail.com>
Date2012-04-22 13:45 -0700
SubjectRe: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?
Message-ID<12-04-073@comp.compilers>
In reply to#609
On 4/22/2012 4:51 AM, BartC wrote:
> "BGB"<cr88192@hotmail.com>  wrote in message
>> On 4/21/2012 2:22 AM, Uli Kusterer wrote:
>
>>> - Recursive descent parsers: It's the obvious way to write a parser.
>
>> although I use recursive descent, the above sounds different from what I
>> usually do.
>
>> ReadStatement:
>>    checks for and handles vaious statement types
>>    calls ReadExpression
>>
>> ReadExpression:
>>    (actually, this is a tower of functions, one for each precedence
>> level, working from lowest to highest precedence)
>
> Sounds like it's influenced by the C grammar, which defines expressions
> using something like 13 or 17 layers of precedence.
>
> Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
> don't use any precedence in the grammar at all; I have precedence as an
> attribute of an operator, and an expression can be parsed with a single
> function.
>
> Or rather two: readfactor(priority), and readterm(). Readfactor() deals with
> the  binary operators linking successive terms, while readterm() does all
> the real work (since my syntax doesn't distinguish between expressions and
> statements, that's quite a big workload).


I have functions for each precedence level, and there is a lot of them.

luckily, I tend to have the stack of expression precedence level
functions in their own source-file, so it is not all that bad.


>>> - Tokenizing: Essentially grab all the words in your source text and
>>> build an array with an entry for each so you can more quickly walk
>>> forward and backward without having to scan individual characters.
>
>> partly agreed, except that it isn't really strictly necessary to build
>> an array up-front.
>
>> most of my parsers don't bother using an array, but instead just call
>> the tokenizer function directly to peek and parse tokens based on the
>> current "stream position" (generally a raw char pointer in my language
>> parsers).
>
> I've tried reading all the tokens in a separate pass, but didn't really like
> it. And it takes a lot more storage as well, especially with macro
> expansions.
>
> Instead I read them as I go along, but with provision for a one-symbol
> look-ahead.

my parsers typically look at 1 or 2 tokens at a time, as usually this
seems sufficient for nearly any "ordinary" construction in a C style syntax.

a partial exception here is declarations, which often have to be tried
to be parsed, and will generally fail (revert to a prior point) if the
syntax turns out to not be an expression.

an annoyance (part of how this makes parsing C slower) is that pretty
much every normal identifier as the first token on a statement ends up
being checked if it refers to a typedef, although I guess an
optimization could be to check if the next token is in a "known bad"
list (ex: "=", "->", ...) and then skip out on looking up a typedef in
this case (a known-bad token meaning "this is obviously not a
declaration"). probably not that this would help when parsing header
contents though (as pretty much everything there is a declaration).


<snip>

yes, ok.


>> some past experiments (special purpose tests), have implied that
>> potentially threaded code can pull off "reasonably solid" interpreter
>> performance (potentially within 3-5x of native).
>
> Assuming you're talking about dynamic typing, I found it difficult to get
> within 3-5x, unless some sort of type hinting is used, or perhaps you're
> comparing with not too highly optimised native code. Or it's code that is
> memory-intensive, then memory access will dominate.


no, this was assuming that the output was resolved to static types
(although the input HLL is largely dynamically-typed, the goal is to
have largely statically-typed logic going on in the lower levels of the
VM, due either to the use of explicit annotation, and/or, type-inference).

so, these tests assumed that the VM would be able to effectively resolve
the types to their primitive forms (basically, they were intended to try
some things out in advance to help guide the design of a then-planned
new interpreter core, which never got fully implemented).

all this was approx 6 months ago.


the tests involved various numeric calculations (generally using integer
and floating point types), and compared them against their analogues as
plain C code.

there would have been an approx 3-5x slowdown between the interpreter
and native code compiled with the same optimization settings (trying
different settings, although changing the total running times, seemed to
have little impact on the relative execution times).

these weren't really general-purpose tests though, but rather were
involving hand-crafted "instruction sequences" (as lists of function
pointers).

the "stack" in the case of the VM was represented as a sliding pointer,
and was using fixed 64-bit elements (as was the argument list).

so, a double addition would have looked something like:
*((double *)(ctx->stack+1))=
	*((double *)(ctx->stack+0))+
	*((double *)(ctx->stack+1));
ctx->stack++;


I ended up not actually using an interpreter structured this way
(instead continuing to use and develop my older interpreter), but
performance was sufficiently good to show that threaded-code was a
viable option and "not drastically slower" than native code.

the result was, me eventually, migrating the older interpreter to the
use of threaded code (for a modest speed-up vs the "big switch()"
strategy I had been using previously).

as-is, the current interpreter performance is closer to around 100x
slower than native, for tests like Fib and bubble-sort (and generally
much smaller if a lot of C-side logic is involved).


for the newer version of the interpreter, with faster/improved "Boxed
Value Type" operations, here is a version of the opcode-handler logic:

BSVM_ThreadOp *BSVM_ThOp_ADD_XD(BSVM_SVMState *ctx, BSVM_ThreadOp *cur)
{	BVT_DefPopUV_Ty(f64, u, v); BVT_AddUV(u, v);
	BVT_FreeF64(ctx, v); return(cur->next); }

and if all macros are manually expanded out...

BSVM_ThreadOp *BSVM_ThOp_ADD_XD(BSVM_SVMState *ctx, BSVM_ThreadOp *cur)
{
	f64 *u, *v;
	v=(f64 *)(ctx->stack[--ctx->stackpos]);
	u=(f64 *)(ctx->stack[ctx->stackpos-1])
	*u=(*u)+(*v);
	*(f64 **)v=ctx->fbvt_double;
	ctx->fbvt_double=v;
	return(cur->next);
}

where "f64" is a typedef for "double".

as can be seen, this isn't quite as efficient, but this sort of logic
should be a little faster than the current (dynamically-typed) logic.

note that "SVMState *ctx" is the interpreter context, and "BSVM_ThreadOp
*cur" is the current opcode (it contains a function-pointer to the
handler, as well as any operands used by the opcode).

note that, in this interpreter, the stack is an array of pointers.


I don't yet have an estimate for "overall" performance of the new logic,
considering that a good deal more logic is still be noted before I will
be able to actually test and benchmark a lot of this stuff... (but, at
least, with the changes made thus far the VM hasn't started violently
blowing up, which is at least itself a good sign...).

[toc] | [prev] | [next] | [standalone]


#618

Fromcompilers@is-not-my.name
Date2012-04-22 22:11 +0000
Message-ID<12-04-075@comp.compilers>
In reply to#599
Uli Kusterer <ulimakesacompiler@nospam.googlemail.com> wrote:

>  IMO if you know assembler or BASIC and general algorithms (i.e. you
> could implement a binary tree and walk its nodes), and you can somehow
> figure out what bytes your code gets compiled to (at worst by writing
> a dummy assembler program and looking at what bytes show up between a
> bunch of nop instructions whose bytes you know), you should be able to
> at least get a basic working knowledge of how a compiler works. Just
> write the naive approach of how you would design any program.

I have written a few interpreters and I thought about winging it but I
realize there is a science to compiling and there are right and wrong ways
to do things. I would like to do things the right way but maybe with my weak
background and broken undergrad CS degree that is expecting too much.

snip

> Is that un-computer-sciencey enough? This blog post may help with the basics
> of my code generation approach:
> http://orangejuiceliberationfront.com/generating-machine-code-at-runtime/ (but
> it's C, and Intel, and badly wrapped by the new theme of my blog).

Thanks, I found your comments very useful. Seems like a good summary.

[toc] | [prev] | [next] | [standalone]


#622

From"BartC" <bc@freeuk.com>
Date2012-04-23 18:41 +0100
Message-ID<12-04-079@comp.compilers>
In reply to#618
<compilers@is-not-my.name> wrote
> Uli Kusterer <ulimakesacompiler@nospam.googlemail.com> wrote:
>
>>  IMO if you know assembler or BASIC and general algorithms (i.e. you
>> could implement a binary tree and walk its nodes), and you can somehow

> I have written a few interpreters and I thought about winging it but I
> realize there is a science to compiling and there are right and wrong ways
> to do things. I would like to do things the right way but maybe with my
> weak background and broken undergrad CS degree that is expecting too much.

I'm intrigued as to why you think writing compilers is a science but writing
interpreters isn't? Interpreters can include a big chunk of what's in a
compiler, and these days I think can be just as challenging.

And I don't know about right ways and wrong ways to write programs, but for
compilers there are probably formal and informal ways of implementing one.

(Naturally, I've always done things informally; it wasn't my job to write
compilers, they were just useful tools I created. But despite probably being
considered toys, they were used to write actual commercial applications and
to earn a living with!)

BTW I don't think CS degrees existed when compilers started being created.

--
Bartc

[toc] | [prev] | [next] | [standalone]


#632

Frombasile@starynkevitch.net
Date2012-05-02 22:16 -0700
Message-ID<12-05-003@comp.compilers>
In reply to#562
On Tuesday, April 17, 2012 11:28:46 PM UTC+2, (unknown) wrote:
> Guys, I'm having a bear of a time finding a good practical language
> and OS agnostic text on writing a compiler. I'm weak in math and not
> interested in the theoretical details.

In addition to all the good advice given by others, you might be interested by Christian Queinnec's Lisp in Small Pieces
http://pagesperso-systeme.lip6.fr/Christian.Queinnec/WWW/LiSP.html
which describes many flavors of Lisp interpreters and compilers.

You might also want to look into the internal representations of existing compilers, even complex ones. For that, you could even play with GCC MELT (see http://gcc-melt.org/ for more), a high level domain specific language to work on GCC internals.

Regards.
--
Basile Starynkevitch         http://starynkevitch.net/Basile/
[Lisp compilers do some really interesting things, and since the language handles
data structures so well, the code can be surprisingly easy to follow once you
spend an hour or two learning the basics of the language. -John]

[toc] | [prev] | [next] | [standalone]


#668

FromJohann 'Myrkraverk' Oskarsson <johann@2ndquadrant.com>
Date2012-06-06 16:52 +0000
Message-ID<12-06-007@comp.compilers>
In reply to#632
basile@starynkevitch.net writes:

> On Tuesday, April 17, 2012 11:28:46 PM UTC+2, (unknown) wrote:
>> Guys, I'm having a bear of a time finding a good practical language
>> and OS agnostic text on writing a compiler. I'm weak in math and not
>> interested in the theoretical details.
>
> In addition to all the good advice given by others, you might be
> interested by Christian Queinnec's Lisp in Small Pieces
> http://pagesperso-systeme.lip6.fr/Christian.Queinnec/WWW/LiSP.html
> which describes many flavors of Lisp interpreters and compilers.

I have not read the English version, only the 2nd edition in French,
Principes d'Implantation de Scheme et Lisp.

Each chapter has its own version of a complete interpreter implementing
the features discussed.  So it's quite a down to the earth discussion.

Even though it discusses lisp with an implementation in lisp it's well
worth reading, or browsing through, for any language or compiler
project.

But you need to be able to read lisp.  Which may or may not be worse
than math.


It goes over lisp features, one at a time.  The difference between lisp
1 and 2; that is the difference between lexical and dynamic scoping and
the andvances & difficulties each imposes on the interpreter.

Recursion, single and mutual.  And how each lisp version influences the
implementation.

It goes into continuations and control flow.  A discussion on how
exceptions are handled in the by interpreter.

A chapter on side effects.

For the mathimatically inclined, a chapter in implementing Lisp in
Lambda Calculus.  To me, this was amusing, but not apparently relevant
to practical applications.

The chapters I haven't read yet are (or recently enough to comment) are
Fast Interpretation, Compilation, Reflection, Macros (not to be confused
with the C preprocessor), Compiling to C and Essence of an Object
System.


--
   Johann Oskarsson                http://www.2ndquadrant.com/    |[]
   PostgreSQL Development, 24x7 Support, Training and Services  --+--
                                                                  |
   Blog: http://my.opera.com/myrkraverk/blog/

[toc] | [prev] | [standalone]


Page 3 of 3 — ← Prev page 1 2 [3]

Back to top | Article view | comp.compilers


csiph-web