Groups > comp.compilers > #2397 > unrolled thread

A minimal LL(1) parser generator ?

Started by	Andy <borucki.andrzej@gmail.com>
First post	2019-12-21 19:07 -0800
Last post	2020-01-25 15:27 -0500
Articles	2 on this page of 22 — 11 participants

Back to article view | Back to comp.compilers

  A minimal LL(1) parser generator ? Andy <borucki.andrzej@gmail.com> - 2019-12-21 19:07 -0800
    Re: A minimal LL(1) parser generator ? arnold@skeeve.com (Aharon Robbins) - 2019-12-22 19:29 +0000
    Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2019-12-26 16:21 +0000
      Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2019-12-29 14:47 -0800
        Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2019-12-31 16:30 +0000
          Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-01 01:02 -0800
            Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-02 17:25 +0000
              Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-05 11:59 -0800
                Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-05 13:59 -0800
                Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-05 14:44 -0800
                Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-22 17:12 +0000
                  Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-23 02:41 -0800
                    Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-25 18:25 +0000
    Re: A minimal LL(1) parser generator ? gaztoast@gmail.com - 2019-12-31 07:10 -0800
    Re: A minimal LL(1) parser generator ? honey crisis <gaztoast@gmail.com> - 2020-01-02 08:50 -0800
    Re: A minimal LL(1) parser generator ? George Neuner <gneuner2@comcast.net> - 2020-01-02 13:16 -0500
    Re: A minimal LL(1) parser generator ? rockbrentwood@gmail.com - 2020-01-04 10:37 -0800
      Re: A minimal LL(1) parser generator ? honey crisis <gaztoast@gmail.com> - 2020-01-05 05:05 -0800
      Branched gotos was: Re: A minimal LL(1) parser generator ? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-01-06 08:47 +0200
        Re: Branched gotos was: Re: A minimal LL(1) parser generator ? Ben Hanson <jamin.hanson@googlemail.com> - 2020-01-07 11:32 -0800
      Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-22 17:08 +0000
        Re: A minimal LL(1) parser generator ? "Fred J. Scipione" <FredJScipione@alum.RPI.edu> - 2020-01-25 15:27 -0500

Page 2 of 2 — ← Prev page 1 [2]

#2435

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2020-01-22 17:08 +0000
Message-ID	<20-01-027@comp.compilers>
In reply to	#2425

rockbrentwood@gmail.com writes:
>On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote:
>> ANTLR has even LL(*) but is too complicated. I am searching maximal
>> simple and elegant generator which generates function call like
>> written by hand.
>
>A large set of parsers are lined up in the parser generator comparison on
>Wikipedia here
>https://en.wikipedia.org/wiki/Comparison_of_parser_generators
>
>The question of who in the list does bona fide code synthesis (as opposed to
>cookie-cutter code generation) is not directly addressed, as far as I can see.
>But the items can be reviewed individually.

It's unclear to me what you mean with those two terms.  "Like written
by hand" is somewhat clearer in that I don't write code manually that
automata-based generators generate.  The code generated by generators
of recursive-descent parsers should look more like hand-written code;
especially if you optimize for that.  In general, though there are a
number of reasons why the code will deviate from that ideal:

* The interface to the scanner is narrow and leads to stylized code
  where a hand-written parser might make use of knowledge of the
  scanner (i.e., use a wider interface).

* Some things are simpler for a human, and others are simpler for a
  code generator, so if one does not optimize for looking like code
  written by a human, the result will look different.

As an example, consider the following rule in Gray:

nonterminal expr
...
(( (( term
   || "-" term {{ 0 swap - }} ))
   (( "+" term {{ + }}
   || "-" term {{ - }} )) **
)) expr rule ( -- n )

(This is for a simple calculator).

Gray generates code for the rule that Gforth decompiles as:

noname :
  <term+$D18> testsym
  IF     <term+$A8> @ execute
  ELSE   <"+"+$A0> testsym ?readnext <term+$A8> @ execute <term+$158>
  THEN
  BEGIN  <term+$D58> testsym
  WHILE  <")"+$A0> testsym
         IF     <")"+$A0> testsym ?readnext <term+$A8> @ execute <term+$428>
         ELSE   <"+"+$A0> testsym ?readnext <term+$A8> @ execute <term+$638>
         THEN
  REPEAT ;

You see control structures similar to what a human would write.  The
|| results in an IF...ELSE...THEN, the ** in a BEGIN...WHILE...REPEAT.
That part is fairly straightforward and does not need a detour through
state machines (at least as long as you stay with LL(1) grammars).

But you also see that the word has no name and lots of the words it
calls have no name, either (all the <...+$...> things are addresses of
unnamed words); that's because it's easier for the generator to deal
with addresses than to generate new names.  A human would use names
instead.

You also see occurences of testsym and ?readnext that are the
interface to the scanner.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#2438

From	"Fred J. Scipione" <FredJScipione@alum.RPI.edu>
Date	2020-01-25 15:27 -0500
Message-ID	<20-01-030@comp.compilers>
In reply to	#2435

In article <20-01-027@comp.compilers>, anton@mips.complang.tuwien.ac.at
says...
>
> rockbrentwood@gmail.com writes:
> >On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote:
> >> ANTLR has even LL(*) but is too complicated. I am searching maximal
> >> simple and elegant generator which generates function call like
> >> written by hand.
> >
> >A large set of parsers are lined up in the parser generator comparison on
> >Wikipedia here
> >https://en.wikipedia.org/wiki/Comparison_of_parser_generators
> >
> >The question of who in the list does bona fide code synthesis (as opposed to
> >cookie-cutter code generation) is not directly addressed, as far as I can see.
> >But the items can be reviewed individually.
>
> It's unclear to me what you mean with those two terms.  "Like written
> by hand" is somewhat clearer in that I don't write code manually that
....<snipped>...

To the original poster:  For "like written by hand" you might look at
RDP (recursive descent parser generator) -
<http://www.cs.rhul.ac.uk/research/languages/projects/rdp.html>

RDP provides sources for the generator and support libraries for lexing
and parsing.  Sources are in ANSI C and produce parsers in ANSI C
(including the RDP parser, of course).

The authors have not been looking for collaborators, but I have found
the project easy to modify for my personal 'improvements'.  They include
some extensions to the RDP grammar and producing 'switch' statements and
custom 'for', 'while', and 'do ... while' loops where possible in place
of the authors generic 'if ... else' chains and
'while(1)... if() break;' constructs.  I also made the generated calls
to library function uses macros, so that it would be easier to use
custom replacements where additional functionality was needed (e.g.
filters on symbol table searching).  The results are much closer to
"like written by hand".

One drawback to the RDP generated code that you might want to improve
is a way to give the generated symbol sets meaningful names in place
of the current auto-generated generic names (RDP001[], RDP002[], etc.).

[toc] | [prev] | [standalone]

Page 2 of 2 — ← Prev page 1 [2]

csiph-web

A minimal LL(1) parser generator ?

Contents

#2435

#2438