Path: csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: A minimal LL(1) parser generator ?
Date: Wed, 22 Jan 2020 17:08:19 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 75
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <20-01-027@comp.compilers>
References: <19-12-016@comp.compilers> <20-01-005@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="8496"; mail-complaints-to="abuse@iecc.com"
Keywords: LL(1)
Posted-Date: 23 Jan 2020 10:28:45 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:2435

rockbrentwood@gmail.com writes:
>On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote:
>> ANTLR has even LL(*) but is too complicated. I am searching maximal
>> simple and elegant generator which generates function call like
>> written by hand.
>
>A large set of parsers are lined up in the parser generator comparison on
>Wikipedia here
>https://en.wikipedia.org/wiki/Comparison_of_parser_generators
>
>The question of who in the list does bona fide code synthesis (as opposed to
>cookie-cutter code generation) is not directly addressed, as far as I can see.
>But the items can be reviewed individually.

It's unclear to me what you mean with those two terms.  "Like written
by hand" is somewhat clearer in that I don't write code manually that
automata-based generators generate.  The code generated by generators
of recursive-descent parsers should look more like hand-written code;
especially if you optimize for that.  In general, though there are a
number of reasons why the code will deviate from that ideal:

* The interface to the scanner is narrow and leads to stylized code
  where a hand-written parser might make use of knowledge of the
  scanner (i.e., use a wider interface).

* Some things are simpler for a human, and others are simpler for a
  code generator, so if one does not optimize for looking like code
  written by a human, the result will look different.

As an example, consider the following rule in Gray:

nonterminal expr
...
(( (( term
   || "-" term {{ 0 swap - }} ))
   (( "+" term {{ + }}
   || "-" term {{ - }} )) **
)) expr rule ( -- n )

(This is for a simple calculator).

Gray generates code for the rule that Gforth decompiles as:

noname :
  <term+$D18> testsym
  IF     <term+$A8> @ execute
  ELSE   <"+"+$A0> testsym ?readnext <term+$A8> @ execute <term+$158>
  THEN
  BEGIN  <term+$D58> testsym
  WHILE  <")"+$A0> testsym
         IF     <")"+$A0> testsym ?readnext <term+$A8> @ execute <term+$428>
         ELSE   <"+"+$A0> testsym ?readnext <term+$A8> @ execute <term+$638>
         THEN
  REPEAT ;

You see control structures similar to what a human would write.  The
|| results in an IF...ELSE...THEN, the ** in a BEGIN...WHILE...REPEAT.
That part is fairly straightforward and does not need a detour through
state machines (at least as long as you stay with LL(1) grammars).

But you also see that the word has no name and lots of the words it
calls have no name, either (all the <...+$...> things are addresses of
unnamed words); that's because it's easier for the generator to deal
with addresses than to generate new names.  A human would use names
instead.

You also see occurences of testsym and ?readnext that are the
interface to the scanner.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/