From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: A minimal LL(1) parser generator ?
Date: Wed, 22 Jan 2020 17:08:19 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <20-01-027@comp.compilers>
References: <19-12-016@comp.compilers> <20-01-005@comp.compilers>
Keywords: LL(1)
Posted-Date: 23 Jan 2020 10:28:45 EST

rockbrentwood@gmail.com writes:
>On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote:
>> ANTLR has even LL(*) but is too complicated. I am searching maximal
>> simple and elegant generator which generates function call like
>> written by hand.
>
>A large set of parsers are lined up in the parser generator comparison on
>Wikipedia here
>https://en.wikipedia.org/wiki/Comparison_of_parser_generators
>
>The question of who in the list does bona fide code synthesis (as opposed to
>cookie-cutter code generation) is not directly addressed, as far as I can see.
>But the items can be reviewed individually.

It's unclear to me what you mean by those two terms.

"Like written by hand" is somewhat clearer, in that I don't manually write the kind of code that automata-based generators produce. The code generated by generators of recursive-descent parsers should look more like hand-written code, especially if you optimize for that.
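For comparison, here is a minimal Python sketch of what a hand-written recursive-descent parser for a small calculator rule, expr -> (term | "-" term) (("+" term) | ("-" term))*, might look like. The tokenizer, the Parser class and its peek/expect/term methods are hypothetical names chosen for illustration, and term is simplified to an integer literal.

```python
import re

def tokenize(text):
    # Hypothetical scanner: integer literals and the operators + and -.
    # Any other characters (e.g. spaces) are skipped by findall.
    return re.findall(r"\d+|[+-]", text)

class Parser:
    """Hand-written recursive-descent parser (hypothetical) for
        expr -> (term | "-" term) (("+" term) | ("-" term))*
    where term is simplified to an integer literal."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # One token of lookahead; None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def term(self):
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        self.pos += 1
        return int(tok)

    def expr(self):
        # The alternation becomes an if/else on the lookahead token.
        if self.peek() == "-":
            self.expect("-")
            n = -self.term()
        else:
            n = self.term()
        # The repetition becomes a while loop that runs as long as the
        # lookahead can start another "+" term or "-" term.
        while self.peek() in ("+", "-"):
            if self.peek() == "+":
                self.expect("+")
                n += self.term()
            else:
                self.expect("-")
                n -= self.term()
        return n

def evaluate(text):
    p = Parser(text)
    n = p.expr()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected {p.peek()!r}")
    return n
```

For example, evaluate("-4 + 10") returns 6 and evaluate("10-3-2") returns 5; the while loop makes "-" left-associative. Every grammar symbol here has a readable name of its own.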
In general, though, there are a number of reasons why the generated code will deviate from that ideal:

* The interface to the scanner is narrow and leads to stylized code, whereas a hand-written parser might make use of knowledge of the scanner (i.e., use a wider interface).

* Some things are simpler for a human, and others are simpler for a code generator, so if one does not optimize for looking like code written by a human, the result will look different.

As an example, consider the following rule in Gray:

  nonterminal expr
  ...
  (( (( term || "-" term {{ 0 swap - }} ))
     (( "+" term {{ + }} || "-" term {{ - }} )) **
  )) expr rule ( -- n )

(This is for a simple calculator.) Gray generates code for the rule that Gforth decompiles as:

  noname :
    testsym
    IF    @ execute
    ELSE  <"+"+$A0> testsym ?readnext @ execute
    THEN
    BEGIN testsym
    WHILE <")"+$A0> testsym
          IF    <")"+$A0> testsym ?readnext @ execute
          ELSE  <"+"+$A0> testsym ?readnext @ execute
          THEN
    REPEAT ;

You see control structures similar to what a human would write: the || results in an IF...ELSE...THEN, the ** in a BEGIN...WHILE...REPEAT. That part is fairly straightforward and does not need a detour through state machines (at least as long as you stay with LL(1) grammars).

But you also see that the word has no name, and many of the words it calls have no name either (all the <...+$...> things are addresses of unnamed words); that's because it's easier for the generator to deal with addresses than to generate new names. A human would use names instead.

You also see occurrences of testsym and ?readnext, which are the interface to the scanner.

- anton
-- 
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
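The translation described above can be sketched in Python under assumptions that are not Gray's. The grammar uses a hypothetical tuple representation ("tok", "nt", "seq", "alt", "rep", "act"). An alternation (Gray's ||) is emitted as an if/elif/else chain keyed on FIRST sets, and a repetition (Gray's **) as a while loop, so no state machine is built. The Scanner class, with its sym attribute and expect method, is a hypothetical stand-in for the testsym/?readnext interface. Semantic actions are Python statements on a value stack st, loosely mirroring Gray's {{ ... }} actions on the Forth stack. The sketch assumes no rule derives the empty string and does not check for LL(1) conflicts.

```python
import re

class Scanner:
    """Hypothetical scanner: `sym` is the current token kind ("num", "+",
    "-" or "eof"); `expect` checks and consumes it, keeping its text in
    `last`. Characters other than digits, + and - are skipped."""
    def __init__(self, text):
        self.toks = [("num", t) if t.isdigit() else (t, t)
                     for t in re.findall(r"\d+|[+-]", text)]
        self.i, self.last = 0, None

    @property
    def sym(self):
        return self.toks[self.i][0] if self.i < len(self.toks) else "eof"

    def expect(self, kind):
        if self.sym != kind:
            raise SyntaxError(f"expected {kind!r}, got {self.sym!r}")
        self.last = self.toks[self.i][1]
        self.i += 1

def first(node, rules):
    """FIRST set of a node; assumes no node derives the empty string."""
    kind = node[0]
    if kind == "tok":
        return {node[1]}
    if kind == "nt":
        return first(rules[node[1]], rules)
    if kind == "alt":
        return set().union(*(first(a, rules) for a in node[1]))
    if kind == "seq":
        for item in node[1]:
            if item[0] != "act":  # actions consume no input
                return first(item, rules)
    raise ValueError(f"no FIRST set for {node!r} (nullable or empty)")

def emit(node, rules, depth, out):
    pad, kind = "    " * depth, node[0]
    if kind == "tok":
        out.append(f"{pad}s.expect({node[1]!r})")
    elif kind == "nt":
        out.append(f"{pad}parse_{node[1]}(s, st)")
    elif kind == "act":  # semantic action on the value stack st
        out.append(f"{pad}{node[1]}")
    elif kind == "seq":
        for item in node[1]:
            emit(item, rules, depth, out)
    elif kind == "alt":  # Gray ||  ->  if/elif/else on the lookahead
        for i, alt in enumerate(node[1]):
            kw = "if" if i == 0 else "elif"
            out.append(f"{pad}{kw} s.sym in {sorted(first(alt, rules))!r}:")
            emit(alt, rules, depth + 1, out)
        out.append(f"{pad}else:")
        out.append(f"{pad}    raise SyntaxError(s.sym)")
    elif kind == "rep":  # Gray **  ->  while loop on the lookahead
        out.append(f"{pad}while s.sym in {sorted(first(node[1], rules))!r}:")
        emit(node[1], rules, depth + 1, out)

def generate(rules):
    out = []
    for name, body in rules.items():
        out.append(f"def parse_{name}(s, st):")
        emit(body, rules, 1, out)
    return "\n".join(out) + "\n"

# The calculator rule from the post; term is assumed to be a number token.
POP2 = "b = st.pop(); st.append(st.pop() {} b)"
RULES = {
    "term": ("seq", [("tok", "num"), ("act", "st.append(int(s.last))")]),
    "expr": ("seq", [
        ("alt", [("nt", "term"),
                 ("seq", [("tok", "-"), ("nt", "term"),
                          ("act", "st.append(-st.pop())")])]),
        ("rep", ("alt", [
            ("seq", [("tok", "+"), ("nt", "term"), ("act", POP2.format("+"))]),
            ("seq", [("tok", "-"), ("nt", "term"), ("act", POP2.format("-"))]),
        ])),
    ]),
}

def evaluate(text):
    ns = {}
    exec(generate(RULES), ns)  # defines parse_term and parse_expr in ns
    s, st = Scanner(text), []
    ns["parse_expr"](s, st)
    if s.sym != "eof":
        raise SyntaxError(f"unexpected {s.sym!r}")
    return st.pop()
```

For example, evaluate("10-3-2") returns 5, and print(generate(RULES)) shows parse_expr with the if/elif/else and while structure. Unlike the Gray output, the generated functions have names (parse_term, parse_expr). Naming is cheap here because this generator emits source text rather than working with addresses.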