Path: csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: anton@mips.complang.tuwien.ac.at (Anton Ertl) Newsgroups: comp.compilers Subject: Re: A minimal LL(1) parser generator ? Date: Thu, 26 Dec 2019 16:21:29 GMT Organization: Institut fuer Computersprachen, Technische Universitaet Wien Lines: 86 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <19-12-030@comp.compilers> References: <19-12-016@comp.compilers> Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="96917"; mail-complaints-to="abuse@iecc.com" Keywords: LL(1), tools Posted-Date: 26 Dec 2019 12:12:38 EST X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2411 Andy writes: >ANTLR has even LL(*) but is too complicated. I am searching maximal >simple and elegant generator which generates function call like >written by hand. Sounds like you want a generator that generates a recursive-descent parser. A while ago I compared a number of parser generators [ertl99ef], and among those that generate recursive-descent parsers the shortest one by far was Gray . Whether you consider it simple and elegant is for you to decide. @InProceedings{ertl99ef, author = {M. Anton Ertl}, title = {Is {Forth} Code Compact? {A} Case Study}, booktitle = {EuroForth'99 Conference Proceedings}, year = {1999}, address = {St. Petersburg, Russia}, url = {http://www.complang.tuwien.ac.at/papers/ertl99ef.ps.gz}, abstract = {Forth advocates often claim that Forth code is smaller, faster, and requires less development time than equivalent programs in other languages. This paper investigates this claim by comparing a number of parser generators written in various languages with respect to source code size. The smallest parser generator (14 lines) in this comparison is written in Forth, and the other Forth program is smaller than the others in its class by a factor of 8 or more; however, the Forth programs do not have all the features of their counterparts. I took a closer look at Gray (in Forth) and Coco/R (in Modula-2) and found that several Forth features missing from Modula-2 give Gray more than a factor of three advantage over Coco/R (even if the other size differences were solely due to differences in functionality): run-time code generation; access to the parser and a simple, flexible syntax; and Forth's dictionary.} } Concerning the other question that came up, about dealing with left recursion, and building the abstract syntax tree (AST) appropriately, that part works nicely in Gray: In BNF you would write expr->expr '-' number expr->number (and the parse tree would have the same structure as the AST). Gray uses a variant of regular right part grammars, and you would write that as a pure parsing rule as follows: (( number (( '-' number )) ** )) <- expr For buiding the AST, let's assume that "number" leaves a tree node on the stack, and we have a word new-bin-node ( node1 node2 -- node3 ) (i.e., it pops node1 and node2 from the stack, and pushes node3). Then we can extend the above as follows: (( number (( '-' number {{ new-bin-node }} )) ** )) <- expr ( -- node ) (The "( -- node )" comment documents the stack effect of parsing "expr"). Passing the nodes on Forth's stack rather than using pseudo-variables like yacc's $1 etc. works nicely when going beyond BNF. Now I wonder how Algol-based EBNF parser generators do it, but am too lazy to actually research it. Getting a right-recursive tree is actually slightly harder: Now we actually have to perform recursion (non-recursive variants exist, but are not simpler). Let's have a similar example, but with the right-associative operator '^'. nonterminal expr ( -- node ) \ declare before use (( number (( '^' expr {{ new-bin-node }} )) ?? )) expr rule - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/