Groups > comp.compilers > #2397 > unrolled thread

A minimal LL(1) parser generator ?

Started by	Andy <borucki.andrzej@gmail.com>
First post	2019-12-21 19:07 -0800
Last post	2020-01-25 15:27 -0500
Articles	20 on this page of 22 — 11 participants

Back to article view | Back to comp.compilers

  A minimal LL(1) parser generator ? Andy <borucki.andrzej@gmail.com> - 2019-12-21 19:07 -0800
    Re: A minimal LL(1) parser generator ? arnold@skeeve.com (Aharon Robbins) - 2019-12-22 19:29 +0000
    Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2019-12-26 16:21 +0000
      Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2019-12-29 14:47 -0800
        Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2019-12-31 16:30 +0000
          Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-01 01:02 -0800
            Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-02 17:25 +0000
              Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-05 11:59 -0800
                Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-05 13:59 -0800
                Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-05 14:44 -0800
                Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-22 17:12 +0000
                  Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-23 02:41 -0800
                    Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-25 18:25 +0000
    Re: A minimal LL(1) parser generator ? gaztoast@gmail.com - 2019-12-31 07:10 -0800
    Re: A minimal LL(1) parser generator ? honey crisis <gaztoast@gmail.com> - 2020-01-02 08:50 -0800
    Re: A minimal LL(1) parser generator ? George Neuner <gneuner2@comcast.net> - 2020-01-02 13:16 -0500
    Re: A minimal LL(1) parser generator ? rockbrentwood@gmail.com - 2020-01-04 10:37 -0800
      Re: A minimal LL(1) parser generator ? honey crisis <gaztoast@gmail.com> - 2020-01-05 05:05 -0800
      Branched gotos was: Re: A minimal LL(1) parser generator ? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-01-06 08:47 +0200
        Re: Branched gotos was: Re: A minimal LL(1) parser generator ? Ben Hanson <jamin.hanson@googlemail.com> - 2020-01-07 11:32 -0800
      Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-22 17:08 +0000
        Re: A minimal LL(1) parser generator ? "Fred J. Scipione" <FredJScipione@alum.RPI.edu> - 2020-01-25 15:27 -0500

Page 1 of 2 [1] 2 Next page →

#2397 — A minimal LL(1) parser generator ?

From	Andy <borucki.andrzej@gmail.com>
Date	2019-12-21 19:07 -0800
Subject	A minimal LL(1) parser generator ?
Message-ID	<19-12-016@comp.compilers>

ANTLR has even LL(*) but is too complicated. I am searching maximal
simple and elegant generator which generates function call like
written by hand.

Temporary goal: must be very simple, bacause this will not parse
program/document, just only strings with my definitions of (augmented)
regular expressions. I can write it by hand, but generator minimizes
possibility making mistake and systemztize work.

[toc] | [next] | [standalone]

#2399

From	arnold@skeeve.com (Aharon Robbins)
Date	2019-12-22 19:29 +0000
Message-ID	<19-12-018@comp.compilers>
In reply to	#2397

In article <19-12-016@comp.compilers>, Andy  <borucki.andrzej@gmail.com> wrote:
>ANTLR has even LL(*) but is too complicated. I am searching maximal
>simple and elegant generator which generates function call like
>written by hand. ...


https://github.com/arnoldrobbins/stacc may be of interest.

Arnold
--
Aharon (Arnold) Robbins 		arnold AT skeeve DOT com

[toc] | [prev] | [next] | [standalone]

#2411

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2019-12-26 16:21 +0000
Message-ID	<19-12-030@comp.compilers>
In reply to	#2397

Andy <borucki.andrzej@gmail.com> writes:
>ANTLR has even LL(*) but is too complicated. I am searching maximal
>simple and elegant generator which generates function call like
>written by hand.

Sounds like you want a generator that generates a recursive-descent
parser.  A while ago I compared a number of parser generators
[ertl99ef], and among those that generate recursive-descent parsers
the shortest one by far was Gray
<http://www.complang.tuwien.ac.at/forth/Gray5.zip>.  Whether you
consider it simple and elegant is for you to decide.

@InProceedings{ertl99ef,
  author = 	 {M. Anton Ertl},
  title = 	 {Is {Forth} Code Compact? {A} Case Study},
  booktitle = 	 {EuroForth'99 Conference Proceedings},
  year =	 {1999},
  address =	 {St. Petersburg, Russia},
  url = 	 {http://www.complang.tuwien.ac.at/papers/ertl99ef.ps.gz},
  abstract = 	 {Forth advocates often claim that Forth code is
                  smaller, faster, and requires less development time
                  than equivalent programs in other languages.  This
                  paper investigates this claim by comparing a number
                  of parser generators written in various languages
                  with respect to source code size.  The smallest
                  parser generator (14 lines) in this comparison is
                  written in Forth, and the other Forth program is
                  smaller than the others in its class by a factor of
                  8 or more; however, the Forth programs do not have
                  all the features of their counterparts.  I took a
                  closer look at Gray (in Forth) and Coco/R (in
                  Modula-2) and found that several Forth features
                  missing from Modula-2 give Gray more than a factor
                  of three advantage over Coco/R (even if the other
                  size differences were solely due to differences in
                  functionality): run-time code generation; access to
                  the parser and a simple, flexible syntax; and
                  Forth's dictionary.}
}

Concerning the other question that came up, about dealing with left
recursion, and building the abstract syntax tree (AST) appropriately,
that part works nicely in Gray:

In BNF you would write

expr->expr '-' number
expr->number

(and the parse tree would have the same structure as the AST).

Gray uses a variant of regular right part grammars, and you would
write that as a pure parsing rule as follows:

(( number (( '-' number )) ** )) <- expr

For buiding the AST, let's assume that "number" leaves a tree node on
the stack, and we have a word

new-bin-node ( node1 node2 -- node3 )

(i.e., it pops node1 and node2 from the stack, and pushes node3).
Then we can extend the above as follows:

(( number (( '-' number {{ new-bin-node }} )) ** )) <- expr ( -- node )

(The "( -- node )" comment documents the stack effect of parsing "expr").

Passing the nodes on Forth's stack rather than using pseudo-variables
like yacc's $1 etc. works nicely when going beyond BNF.  Now I wonder
how Algol-based EBNF parser generators do it, but am too lazy to
actually research it.

Getting a right-recursive tree is actually slightly harder: Now we
actually have to perform recursion (non-recursive variants exist, but
are not simpler).  Let's have a similar example, but with the
right-associative operator '^'.

nonterminal expr ( -- node ) \ declare before use
(( number (( '^' expr {{ new-bin-node }} )) ?? )) expr rule

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#2413

From	carlglassberg@gmail.com
Date	2019-12-29 14:47 -0800
Message-ID	<19-12-032@comp.compilers>
In reply to	#2411

Lightweight and interesting, this Gray parser-generator.

Example:

Gray's meta notation uses a postix operator "&&" for "separator list".

"&&" seems particularly elegant and interesting:

If Gray: "a b &&" ---> EBNF: "a { b a }"

then  I assume:

"a b && c &&" ---> "a { b a } { c a }"

and so on ...

Further "a b && b && ..." ===> "a b &&"

??? Is my understanding correct?

Carl
----

On Thursday, December 26, 2019 at 9:12:41 AM UTC-8, Anton Ertl wrote:
> Andy <borucki.andrzej@gmail.com> writes:
> >ANTLR has even LL(*) but is too complicated. I am searching maximal
> >simple and elegant generator which generates function call like
> >written by hand.
>
> Sounds like you want a generator that generates a recursive-descent
> parser.  A while ago I compared a number of parser generators
> [ertl99ef], and among those that generate recursive-descent parsers
> the shortest one by far was Gray
> <http://www.complang.tuwien.ac.at/forth/Gray5.zip>.  Whether you
> consider it simple and elegant is for you to decide.
>
> @InProceedings{ertl99ef,
>   author = 	 {M. Anton Ertl},
>   title = 	 {Is {Forth} Code Compact? {A} Case Study},
>   booktitle = 	 {EuroForth'99 Conference Proceedings},
>   year =	 {1999},
>   address =	 {St. Petersburg, Russia},
>   url = 	 {http://www.complang.tuwien.ac.at/papers/ertl99ef.ps.gz},

[toc] | [prev] | [next] | [standalone]

#2420

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2019-12-31 16:30 +0000
Message-ID	<19-12-040@comp.compilers>
In reply to	#2413

carlglassberg@gmail.com writes:
>Gray's meta notation uses a postix operator "&&" for "separator list".
>
>"&&" seems particularly elegant and interesting:
>
>If Gray: "a b &&" ---> EBNF: "a { b a }"
>
>then  I assume:
>
>"a b && c &&" ---> "a { b a } { c a }"

The left operand of the second && would be "a b &&", so the whole
thing would be equivalent to

a {b a} {c {a {b a}}}

a (( b || c )) &&

would parse the same strings, but with different parse trees; so if
you add actions, the results would probably be different.

>and so on ...
>
>Further "a b && b && ..." ===> "a b &&"

They describe the same strings, yes.  However, the former has an
ambiguous grammar that allows to parse "aba" in two ways: either the b
is the first b or the second one.  In Gray, this leads to a conflict
(it just can deal with LL(1) grammars), and I guess one of these
parsing options never be exercised, which means that it works as if
you had written "a b &&" in the first place.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#2421

From	carlglassberg@gmail.com
Date	2020-01-01 01:02 -0800
Message-ID	<20-01-001@comp.compilers>
In reply to	#2420

On Tuesday, December 31, 2019 at 8:42:51 AM UTC-8, Anton Ertl wrote:
> carlglassberg@gmail.com writes:
> >Gray's meta notation uses a postix operator "&&" for "separator list".
...
> >If Gray: "a b &&" ---> EBNF: "a { b a }"
...
> >then  ...
...
> >"a b && c &&"
>
> The left operand of the second && would be "a b &&", so the whole
> thing would be equivalent to
>
> a {b a} {c {a {b a}}}
...

Should that be ?
a {b a} { c a {b a}}

Also, would
  a b && c d &&
parse as:
  a { b a } c { d c }
or as:
  a { b a } c { d a { b a } c }

???

Carl

[toc] | [prev] | [next] | [standalone]

#2423

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2020-01-02 17:25 +0000
Message-ID	<20-01-003@comp.compilers>
In reply to	#2421

carlglassberg@gmail.com writes:
>On Tuesday, December 31, 2019 at 8:42:51 AM UTC-8, Anton Ertl wrote:
>> carlglassberg@gmail.com writes:
>> >Gray's meta notation uses a postix operator "&&" for "separator list".
>...
>> >If Gray: "a b &&" ---> EBNF: "a { b a }"
...
>> >"a b && c &&"
...
>Should that be ?
>a {b a} { c a {b a}}

Yes.

>Also, would
>  a b && c d &&

I assume you mean a concatenation of "a b &&" and "c d &&".  In Gray
syntax that would be

(( a b && c d && ))

>parse as:
>  a { b a } c { d c }

Yes.

>or as:
>  a { b a } c { d a { b a } c }

If you want that, you would write it as

(( a b && c )) d &&

&& is just a binary postfix operator.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#2428

From	carlglassberg@gmail.com
Date	2020-01-05 11:59 -0800
Message-ID	<20-01-008@comp.compilers>
In reply to	#2423

On Thursday, January 2, 2020 at 10:21:53 AM UTC-8, Anton Ertl wrote:
...
> (( a b && c d && ))

1. For concatenation of these, I would write, in Gray:
   ((a b &&))((c d & &&)) for EBNF: a { b a } c { d c }
because wouldn't
   ((a b && c d & &&))
be the same as?
   a { b a } c { d a { b a } c }

2. Let us compare separator-list notation.

In Waite and Goos, "Compiler Construction", there is an infix operator, "||", for separator list, not to be confused with Gray's meta-symbol for "or".
And Eiffel has a special notation for separator list.

In comparison, Anton's Gray has a postfix operator, "&&", for separator-list.

/Comparison of separator-list notation in
Waite/Goos EBNF, Eiffel EBNF (BNF-E), and Gray EBNF/

                               Wirth EBNF         Gray EBNF:
-------------------------------------------------------------------
Waite/Goos: x || y ==========> x { y x }          x y &&

In Eiffel:  { x y ...}* ===> [ x { y x } ]     (( x y && )) ??
            { x y ...}+ ===>   x { y x }          x y &&
-------------------------------------------------------------------

What is better about Gray is that it only uses braces in action semantics,
in doubled form {{ ... }}, and no one is likely to be confused by the use of
braces.

Also it's postfix separator-list operator, "&&", is easier to remember than
Eiffel's more verbose special variation of standard EBNF braces.

And Gray's "&&" separator-list operator is just another postfix op, consistent with it's use of "??", "**" and "++"

Gray better separates the 2 styles of notation, it's "regular expression" style versus standard bracketed EBNF style.

The 2 styles ideally should not be mixed, or at least confusion should be avoided by not introducing yet another similiar notation, in the form of a special variation of standard EBNF braces for separator lists.

One minor criticism of Gray:

I do wish Gray had a production rule terminator, say semicolon ";" or stop "."

The end of 1 rule would be clearly distinguishable from the beginning of the next without special processing of end-of-line which makes scanning and
parsing space sensitive. We would avoid the need for LL(2) parsing or other
parsing and scanning techniques.

Carl
----

[toc] | [prev] | [next] | [standalone]

#2429

From	carlglassberg@gmail.com
Date	2020-01-05 13:59 -0800
Message-ID	<20-01-009@comp.compilers>
In reply to	#2428

correction:
  ((a b &&))((c d & &&)) for EBNF: a { b a } c { d c }
should be:
  ((a b &&))((c d &&)) for EBNF: a { b a } c { d c }

and
  ((a b && c d & &&))
should be:
  ((a b && c d &&))

Don't know how those extraneous single "&" got there!

Carl
---

[toc] | [prev] | [next] | [standalone]

#2430

From	carlglassberg@gmail.com
Date	2020-01-05 14:44 -0800
Message-ID	<20-01-010@comp.compilers>
In reply to	#2428

That table didn't align properly, so lets present the same results this way:
-------------------------------------------------------------------------
Waite/Goos: x || y ===> Wirth EBNF: x { y x } ===> Gray: x y &&
-------------------------------------------------------------------------
Eiffel: { x y ...}* => Wirth EBNF: [ x { y x } ] => Gray: (( x y && )) ??
-------------------------------------------------------------------------
Eiffel: { x y ...}+ => Wirth EBNF: x { y x } => Gray: xy &&
-------------------------------------------------------------------------
Carl
----

[toc] | [prev] | [next] | [standalone]

#2434

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2020-01-22 17:12 +0000
Message-ID	<20-01-025@comp.compilers>
In reply to	#2428

carlglassberg@gmail.com writes:
[Applied the corrections from later postings]
>On Thursday, January 2, 2020 at 10:21:53 AM UTC-8, Anton Ertl wrote:
>...
>> (( a b && c d && ))
>
>1. For concatenation of these, I would write, in Gray:
>   ((a b &&))((c d &&)) for EBNF: a { b a } c { d c }
>because wouldn't
>   ((a b && c d &&))
>be the same as?
>   a { b a } c { d a { b a } c }

In Gray, you cannot concatenate x and y by writing them adjacent to
one another, so

a b && c

are just two unrelated grammar expressions: "a b &&" and "c".  If you continue that with

a b && c d

you now have three unrelated grammar expressions.  Now apply the &&
operator:

a b && c d &&

and you have two unrelated grammar expressions: "a b &&" and "c d &&".
In order to combine them into a concatenatio, you normally surround
them with (( ... )):

(( a b && c d && ))

(or alternatively, you use the binary postfix operator "concat").  Likewise,

(( a b && )) (( c d && ))

would be two unrelated grammar expressions.

Also, note that each word (including grammar operators) has to be
separated from adjacent words with white space.

>2. Let us compare separator-list notation.
>
>In Waite and Goos, "Compiler Construction", there is an infix operator, "||", for separator list, not to be confused with Gray's meta-symbol for "or".
>And Eiffel has a special notation for separator list.
>
>In comparison, Anton's Gray has a postfix operator, "&&", for separator-list.
>
>/Comparison of separator-list notation in
>Waite/Goos EBNF, Eiffel EBNF (BNF-E), and Gray EBNF/
>
>                               Wirth EBNF         Gray EBNF:
>-------------------------------------------------------------------
>Waite/Goos: x || y ==========> x { y x }          x y &&
>
>In Eiffel:  { x y ...}* ===> [ x { y x } ]     (( x y && )) ??
>            { x y ...}+ ===>   x { y x }          x y &&
>-------------------------------------------------------------------

(( x y && )) ??

is equivalent to

x y && ??

>I do wish Gray had a production rule terminator, say semicolon ";" or stop "."
>
>The end of 1 rule would be clearly distinguishable from the beginning of the next without special processing of end-of-line which makes scanning and
>parsing space sensitive. We would avoid the need for LL(2) parsing or other
>parsing and scanning techniques.

Gray just uses Forth's parser (which just scans for white space, i.e.,
as far as the parser is concerned, it's a regular, not a context-free
language).  The rest works by putting grammar expressions on the
stack, and having grammar operators that take grammar expressions from
the stack and put the resulting grammar expression on the stack;
that's why the operators are postfix.  The

(( ... || ... || ... ))

syntax is syntactic sugar for a more readable syntax than using the
binary postfix operators "concat" and "alt".  If && had been part of
the original concept instead of an afterthought that demonstrates the
extensibility, maybe I would have had syntactic sugar for that, too.
Even as a Forth programmer, I admit that binary postfix operators are
not very readable when expressions with three or more binary operators
are involved (which is rare in most programming, but pretty common in
grammars).

Back to the question of production rule terminators: As a result of
the above, a rule typically reads

(( ... )) <- nonterminal

so it's pretty clear to the system and the programmer where a rule
starts and where it ends.  However, there is little error checking
involved, and if you write

(( ... (( ... )) <- nonterminal

(i.e., if you forget to write all the necessary closing "))"), you
will get stuff left over on the stack and the nonterminal will not do
what you want.  While the compiler does not flag that as error, this
error is easy to notice and find in testing on typical Forth systems
(and it's something you could also get with other Forth code).

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#2436

From	carlglassberg@gmail.com
Date	2020-01-23 02:41 -0800
Message-ID	<20-01-028@comp.compilers>
In reply to	#2434

Anton, I now see that, in Gray, the precedence of "&&" is higher than every other operator.

So Gray EBNF:
a b && c d && e f && <- x

is equivalent to Wirth EBNF:

x = a { b a } c { d c } e { f e } .

Likewise:

a b && ? <- x

is
x = [ a { b a } ] .

Carl

[toc] | [prev] | [next] | [standalone]

#2437

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2020-01-25 18:25 +0000
Message-ID	<20-01-029@comp.compilers>
In reply to	#2436

carlglassberg@gmail.com writes:
>Anton, I now see that, in Gray, the precedence of "&&" is higher than every other operator.

Among the postfix operators, precedence is determined by the way they
are written down.  But yes, if you see (( a b || c d )) as having two
infix operators, || and the empty operator, then you can write a
precedence hierarchy:

postfix
empty
||

>So Gray EBNF:
>a b && c d && e f && <- x

Correctly written as

(( a b && c d && e f && )) <- x

>is equivalent to Wirth EBNF:
>
>x = a { b a } c { d c } e { f e } .

Yes.

>Likewise:
>
>a b && ? <- x

a b && ?? <- x

>is
>x = [ a { b a } ] .

Yes.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#2418

From	gaztoast@gmail.com
Date	2019-12-31 07:10 -0800
Message-ID	<19-12-037@comp.compilers>
In reply to	#2397

On Sunday, December 22, 2019 at 8:17:44 AM UTC-8, Andy wrote:
> ANTLR has even LL(*) but is too complicated. I am searching maximal
> simple and elegant generator which generates function call like
> written by hand. ...

https://www.codeproject.com/Articles/5162249/How-to-Make-an-LL-1-Parser-Lesson-1

I wrote a lesson on building one in 3 steps

[toc] | [prev] | [next] | [standalone]

#2422

From	honey crisis <gaztoast@gmail.com>
Date	2020-01-02 08:50 -0800
Message-ID	<20-01-002@comp.compilers>
In reply to	#2397

On Sunday, December 22, 2019 at 8:17:44 AM UTC-8, Andy wrote:
> I am searching maximal simple and elegant generator which generates function call like written by hand.

I designed Parsley to do exactly that.

i don't know how minimal it is at this point, but I've been using the LL1.cs file in it in several of my parser projects to generate parse tables. Maybe it will help.

https://www.codeproject.com/Articles/5255160/Parse-Anything-with-Parsley-A-Different-Approach

[toc] | [prev] | [next] | [standalone]

#2424

From	George Neuner <gneuner2@comcast.net>
Date	2020-01-02 13:16 -0500
Message-ID	<20-01-004@comp.compilers>
In reply to	#2397

A little late to the discussion, but ...

On Sat, 21 Dec 2019 19:07:55 -0800 (PST), Andy
<borucki.andrzej@gmail.com> wrote:

>ANTLR has even LL(*) but is too complicated. I am searching maximal
>simple and elegant generator which generates function call like
>written by hand.

There are a number of decent tools that will generate recursive
descent code, but you won't find any that will produce code as tight
as custom hand written.

You neglected to mention what programming language you want to target.
However:

PCCTS is the ancestor of ANTLR. It is LL(k) for configurable k.  It
produces C or C++ code for the parser.  You need to supply a lexer
separately.

Coco/R is LL(k).  There are versions available for several languages,
including C, C++, Java, Delphi, VB, etc.  Like ANTLR, Coco/R also can
generate a lexer.

In my experience, PCCTS produces tighter code than ANTLR.  As long as
you don't need the flexibility of LL(*) - or need a lexer generated,
or to target a language other than C or C++  - then PCCTS may be the
better choice.

If you need one stop shopping for RD coded parsers and lexers, then
you do need something like ANTLR or Coco/R which can do both.

>Temporary goal: must be very simple, because this will not parse
>program/document, just only strings with my definitions of (augmented)
>regular expressions. I can write it by hand, but generator minimizes
>possibility making mistake and systemztize work.

If you are working with individual strings rather than a multi-string
"document", then perhaps you really only want a lexer generator.

Perhaps something like RE2C which takes regex patterns and generates
code (in C) to recognize them.  The recognizers are FA, but the code
is generated inline in the function that needs it.

Or boost::spirit, perhaps in combination with boost::regex (or
std::regex in C++11 or later).  Spirit has a fairly steep learning
curve also, but it has the advantage that it requires only the C++
compiler and does not need a separate tool.

Hope this helps,
George

[toc] | [prev] | [next] | [standalone]

#2425

From	rockbrentwood@gmail.com
Date	2020-01-04 10:37 -0800
Message-ID	<20-01-005@comp.compilers>
In reply to	#2397

On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote:
> ANTLR has even LL(*) but is too complicated. I am searching maximal
> simple and elegant generator which generates function call like
> written by hand.

A large set of parsers are lined up in the parser generator comparison on
Wikipedia here
https://en.wikipedia.org/wiki/Comparison_of_parser_generators

The question of who in the list does bona fide code synthesis (as opposed to
cookie-cutter code generation) is not directly addressed, as far as I can see.
But the items can be reviewed individually.

Perhaps you will end up writing a parser generator that does this. The obvious
"direct map" method is
Draft Version:
state = goto label
state transition = goto
general format:
Label:
   action
goto(s) - either single goto or branched gotos

Synthesized Version:
control flow structures are synthesized from this.

The transformation of jump tables to control flow structures is essentially
the same process as the transformation of finite state automata to regular
expressions. An intelligent transformation process will leave some goto's in
place in order to prevent the introduction of redundancy.

If the generated parser is linked to the parser specification, the
transformation has to be able to correctly inject "#line" directives.
Alternatively, a better method is NOT to inject "#line" directives, but
comments that interweave displays of the original code that the transformation
took place from.

[toc] | [prev] | [next] | [standalone]

#2427

From	honey crisis <gaztoast@gmail.com>
Date	2020-01-05 05:05 -0800
Message-ID	<20-01-007@comp.compilers>
In reply to	#2425

> Synthesized Version:
> control flow structures are synthesized from this.


Do you think this is something that's sought after more generally?

I've considered it for my parser generators but the problem i run into is a
kind of "state explosion" wherein since i can't freely branch, i can't
converge paths, so each dysjunction can only create more dysjunctions.
Basically, it ends up copying code to fit it inside whiles and ifs.

[toc] | [prev] | [next] | [standalone]

#2431 — Branched gotos was: Re: A minimal LL(1) parser generator ?

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Date	2020-01-06 08:47 +0200
Subject	Branched gotos was: Re: A minimal LL(1) parser generator ?
Message-ID	<20-01-011@comp.compilers>
In reply to	#2425

rockbrentwood@gmail.com wrote:
> The question of who in the list does bona fide code synthesis (as opposed to
> cookie-cutter code generation) is not directly addressed, as far as I can see.
> But the items can be reviewed individually.

> Perhaps you will end up writing a parser generator that does this. The obvious
> "direct map" method is
> Draft Version:
> state = goto label
> state transition = goto
> general format:
> Label:
>      action
> goto(s) - either single goto or branched gotos

> Synthesized Version:
> control flow structures are synthesized from this.

While I must admit that I am not sure I understand your question (your
conception of how automata work is too different than mine for me to
readily understand your points), if you are asking what I think you
are asking, we solve your question a slightly different way.

So, let me rephrase your question a little bit.  When you come to the
end of a production, you have a reduce action and that causes the
popping of a non-terminal from the stack.  Depending upon what you
pop, the state machine transitions to another state.  This is normally
coded as the "goto table" in standard LR constructions.  But the goto
table is in fact essentially a jump table, just as the shift table is
a jump table on incoming tokens.

If that is a correct interpretation of your question, then we do
something different for that, and what we do is similar to a code
generation process, and in fact, we have a version that generates the
code is if-then-else and/or case statements as appropriate, to
completely encode the tables in the target programming language.  Not
recursive ascent, but more a direct implementation of the FSA as code.

So, first our LR construction is non-standard.  We not only keep
tokens on our parser stack, but when we reduce a production, we put
the non-terminal itself on the stack.  Thus, in a state, you might see
either a token or a non-terminal as the item you are processing.  So,
the shift-table and the goto-table are merged into one table.

Now, to make that work, we don't encode what to do in the state
itself.  There are no goto states.   No reduce states.  No accept
states.  Etc. A state is simply a jump table that implements "see this
on the stack and take this transition".  The transition itself encodes
what the action should be.  I think this difference is similar to the
difference in Mealy and Moore machines.  One encodes the output in the
state, the other on the transition (and you can transform one to the
other).  I just happen to like my encoding to be all on transitions,
with states just as tables.

And we have an "assembly" language of actions that a transition can
invoke.  Thus, normally, upon seeing a token, the actions are "shift;
goto state n;"  two assembly language steps.  At the beginning of a
non-terminal (and only then) do we want to push something on the stack
(this is similar to what Mule does and I think similar to your bracket
notation, although I have not entirely figured that out), so if the
token is the first token in a rule, it does three assembly language
steps, "push; shift, goto state n".   Similarly if there is an
output/action associated with the transition, that will be added to
the assembly language sequence "shift; do act m; goto state n;".   I
am not going to detail all the possible actions.  There are more than
one would expect because it allows us to implement a variety of
things.  (I am currently considering making the machine deal with
captures and backreferences.  That is simply some more actions in the
assembly language.)

In any case, hopefully, you can imagine how we simply invoke those
actions in a switch/case statement as the steps (macro calls, function
calls, or by expanding such code inline) for that case.  You can then
decide whether you want two nested switch/case statements (one on
state, one per state on input) to encode them or to merge them or
generate them as code.  (At Intel, I used a similar technique to
compress the states as many states have only a few cases to consider.
So some states want to be a jump table, but many (most) states in a
FSA are simply check for one or two items and if not found fail.  You
can often turn than into linear code.)  There are numerous choices
that we implement as various table generation options.  The internal
model of jump tables with transitions that have specific actions does
not limit that.

Best regards,
Chris

******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------

[toc] | [prev] | [next] | [standalone]

#2432 — Re: Branched gotos was: Re: A minimal LL(1) parser generator ?

From	Ben Hanson <jamin.hanson@googlemail.com>
Date	2020-01-07 11:32 -0800
Subject	Re: Branched gotos was: Re: A minimal LL(1) parser generator ?
Message-ID	<20-01-012@comp.compilers>
In reply to	#2431

On Monday, 6 January 2020 14:39:06 UTC, Christopher F Clark  wrote:
> (I am currently considering making the machine deal with
> captures and backreferences.  That is simply some more actions in the
> assembly language.)

I can confirm that this works well. Adding the ability to search using a grammar rather than simply match also opens up a whole other world of possibilities.

Regards,

Ben

[toc] | [prev] | [next] | [standalone]

Page 1 of 2 [1] 2 Next page →

csiph-web

A minimal LL(1) parser generator ?

Contents

#2397 — A minimal LL(1) parser generator ?

#2399

#2411

#2413

#2420

#2421

#2423

#2428

#2429

#2430

#2434

#2436

#2437

#2418

#2422

#2424

#2425

#2427

#2431 — Branched gotos was: Re: A minimal LL(1) parser generator ?

#2432 — Re: Branched gotos was: Re: A minimal LL(1) parser generator ?