From: "BartC"
Newsgroups: comp.compilers
Subject: Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?
Date: Sun, 22 Apr 2012 12:51:44 +0100
Message-ID: <12-04-066@comp.compilers>
References: <12-04-019@comp.compilers> <12-04-056@comp.compilers> <12-04-060@comp.compilers>
Keywords: parse, design

"BGB" wrote in message
> On 4/21/2012 2:22 AM, Uli Kusterer wrote:
>> - Recursive descent parsers: It's the obvious way to write a parser.

> although I use recursive descent, the above sounds different from what I
> usually do.

> ReadStatement:
>   checks for and handles various statement types
>   calls ReadExpression
>
> ReadExpression:
>   (actually, this is a tower of functions, one for each precedence
>   level, working from lowest to highest precedence)

Sounds like it's influenced by the C grammar, which defines expressions
using something like 13 or 17 layers of precedence. Beyond about 3-4
levels, I found that unmanageable.

For expression syntax, I don't use any precedence in the grammar at all;
precedence is an attribute of each operator, and an expression can be
parsed with a single function. Or rather two: readfactor(priority) and
readterm().
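A minimal sketch of this kind of parser, in Python, might look like the
following. Only the names readfactor(priority) and readterm() come from
the description above; the PRIORITY table, the Lexer class, the tiny
arithmetic syntax and the tuple-shaped tree are assumptions made up for
illustration, not the code of any actual compiler.

```python
import re

# Hypothetical operator table: precedence is data, not grammar.
# A higher number binds tighter; all operators are left-associative.
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


class Lexer:
    """Reads tokens on demand, holding one token of look-ahead in .tok."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.tok = None
        self.advance()

    def advance(self):
        """Return the current token and load the next one (None at end)."""
        prev = self.tok
        m = TOKEN.match(self.text, self.pos)
        if m:
            self.tok = m.group(1)
            self.pos = m.end()
        elif self.text[self.pos:].strip():
            raise SyntaxError(f"bad character at offset {self.pos}")
        else:
            self.tok = None
        return prev


def readterm(lex):
    """Parse one operand: a number, a unary minus, or a bracketed expression."""
    tok = lex.advance()
    if tok is None:
        raise SyntaxError("unexpected end of input")
    if tok.isdigit():
        return int(tok)
    if tok == "-":
        return ("neg", readterm(lex))
    if tok == "(":
        node = readfactor(lex, 0)
        if lex.advance() != ")":
            raise SyntaxError("expected ')'")
        return node
    raise SyntaxError(f"unexpected token {tok!r}")


def readfactor(lex, priority):
    """Link successive terms with binary operators binding tighter than priority."""
    left = readterm(lex)
    while lex.tok in PRIORITY and PRIORITY[lex.tok] > priority:
        op = lex.advance()
        right = readfactor(lex, PRIORITY[op])
        left = (op, left, right)
    return left


def parse(text):
    """Parse a whole expression and reject trailing tokens."""
    lex = Lexer(text)
    tree = readfactor(lex, 0)
    if lex.tok is not None:
        raise SyntaxError(f"unexpected token {lex.tok!r}")
    return tree
```

Here parse("1+2*3") gives ("+", 1, ("*", 2, 3)). Adding an operator
means adding one entry to the table rather than another level of
functions.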
readfactor() deals with the binary operators linking successive terms,
while readterm() does all the real work (since my syntax doesn't
distinguish between expressions and statements, that's quite a big
workload).

>> - Tokenizing: Essentially grab all the words in your source text and
>> build an array with an entry for each so you can more quickly walk
>> forward and backward without having to scan individual characters.

> partly agreed, except that it isn't really strictly necessary to build
> an array up-front.

> most of my parsers don't bother using an array, but instead just call
> the tokenizer function directly to peek and parse tokens based on the
> current "stream position" (generally a raw char pointer in my language
> parsers).

I've tried reading all the tokens in a separate pass, but didn't really
like it. It also takes a lot more storage, especially with macro
expansions. Instead I read them as I go along, with provision for a
one-symbol look-ahead.

>> - Syntax tree: Build a tree structure that represents the parsed program.
>> You

> typically, things like simplifying expressions, propagating constants,
> ... is done in a stage I have typically called "reduction", which
> happens between parsing and prior to producing (bytecode) output.

I use the following passes (which seem to be fairly typical):

  Syntax analysis  (lexing and parsing)
  Name resolution  (a recent introduction for me)
  Type analysis    (static type checks and coercions, constant folding)
  Code generation  (to intermediate code or to byte-code)
  Final pass       (from intermediate code to the target code)

These are usually invoked one after the other for the entire module.
Where a compile-time expression is needed, the first three passes have
to be done immediately (and the result had better be a constant
value..)

I use the same structure now when generating byte-code (originally such
a compiler was just single-pass).
Because such code is usually dynamically typed, the type analysis pass
only needs to do a nominal amount of work, but it still takes care of a
few things (l-values, for example).

> some past experiments (special purpose tests), have implied that
> potentially threaded code can pull off "reasonably solid" interpreter
> performance (potentially within 3-5x of native).

Assuming you're talking about dynamic typing, I found it difficult to
get within 3-5x of native unless some sort of type hinting is used, or
perhaps you're comparing with not too highly optimised native code. Or
the code is memory-intensive, in which case memory access will dominate.

--
Bartc