Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #2397 > unrolled thread
| Started by | Andy <borucki.andrzej@gmail.com> |
|---|---|
| First post | 2019-12-21 19:07 -0800 |
| Last post | 2020-01-25 15:27 -0500 |
| Articles | 20 on this page of 22 — 11 participants |
Back to article view | Back to comp.compilers
A minimal LL(1) parser generator ? Andy <borucki.andrzej@gmail.com> - 2019-12-21 19:07 -0800
Re: A minimal LL(1) parser generator ? arnold@skeeve.com (Aharon Robbins) - 2019-12-22 19:29 +0000
Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2019-12-26 16:21 +0000
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2019-12-29 14:47 -0800
Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2019-12-31 16:30 +0000
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-01 01:02 -0800
Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-02 17:25 +0000
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-05 11:59 -0800
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-05 13:59 -0800
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-05 14:44 -0800
Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-22 17:12 +0000
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com - 2020-01-23 02:41 -0800
Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-25 18:25 +0000
Re: A minimal LL(1) parser generator ? gaztoast@gmail.com - 2019-12-31 07:10 -0800
Re: A minimal LL(1) parser generator ? honey crisis <gaztoast@gmail.com> - 2020-01-02 08:50 -0800
Re: A minimal LL(1) parser generator ? George Neuner <gneuner2@comcast.net> - 2020-01-02 13:16 -0500
Re: A minimal LL(1) parser generator ? rockbrentwood@gmail.com - 2020-01-04 10:37 -0800
Re: A minimal LL(1) parser generator ? honey crisis <gaztoast@gmail.com> - 2020-01-05 05:05 -0800
Branched gotos was: Re: A minimal LL(1) parser generator ? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-01-06 08:47 +0200
Re: Branched gotos was: Re: A minimal LL(1) parser generator ? Ben Hanson <jamin.hanson@googlemail.com> - 2020-01-07 11:32 -0800
Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2020-01-22 17:08 +0000
Re: A minimal LL(1) parser generator ? "Fred J. Scipione" <FredJScipione@alum.RPI.edu> - 2020-01-25 15:27 -0500
Page 1 of 2 [1] 2 Next page →
| From | Andy <borucki.andrzej@gmail.com> |
|---|---|
| Date | 2019-12-21 19:07 -0800 |
| Subject | A minimal LL(1) parser generator ? |
| Message-ID | <19-12-016@comp.compilers> |
ANTLR has even LL(*) but is too complicated. I am searching maximal simple and elegant generator which generates function call like written by hand. Temporary goal: must be very simple, bacause this will not parse program/document, just only strings with my definitions of (augmented) regular expressions. I can write it by hand, but generator minimizes possibility making mistake and systemztize work.
[toc] | [next] | [standalone]
| From | arnold@skeeve.com (Aharon Robbins) |
|---|---|
| Date | 2019-12-22 19:29 +0000 |
| Message-ID | <19-12-018@comp.compilers> |
| In reply to | #2397 |
In article <19-12-016@comp.compilers>, Andy <borucki.andrzej@gmail.com> wrote: >ANTLR has even LL(*) but is too complicated. I am searching maximal >simple and elegant generator which generates function call like >written by hand. ... https://github.com/arnoldrobbins/stacc may be of interest. Arnold -- Aharon (Arnold) Robbins arnold AT skeeve DOT com
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2019-12-26 16:21 +0000 |
| Message-ID | <19-12-030@comp.compilers> |
| In reply to | #2397 |
Andy <borucki.andrzej@gmail.com> writes:
>ANTLR has even LL(*) but is too complicated. I am searching maximal
>simple and elegant generator which generates function call like
>written by hand.
Sounds like you want a generator that generates a recursive-descent
parser. A while ago I compared a number of parser generators
[ertl99ef], and among those that generate recursive-descent parsers
the shortest one by far was Gray
<http://www.complang.tuwien.ac.at/forth/Gray5.zip>. Whether you
consider it simple and elegant is for you to decide.
@InProceedings{ertl99ef,
author = {M. Anton Ertl},
title = {Is {Forth} Code Compact? {A} Case Study},
booktitle = {EuroForth'99 Conference Proceedings},
year = {1999},
address = {St. Petersburg, Russia},
url = {http://www.complang.tuwien.ac.at/papers/ertl99ef.ps.gz},
abstract = {Forth advocates often claim that Forth code is
smaller, faster, and requires less development time
than equivalent programs in other languages. This
paper investigates this claim by comparing a number
of parser generators written in various languages
with respect to source code size. The smallest
parser generator (14 lines) in this comparison is
written in Forth, and the other Forth program is
smaller than the others in its class by a factor of
8 or more; however, the Forth programs do not have
all the features of their counterparts. I took a
closer look at Gray (in Forth) and Coco/R (in
Modula-2) and found that several Forth features
missing from Modula-2 give Gray more than a factor
of three advantage over Coco/R (even if the other
size differences were solely due to differences in
functionality): run-time code generation; access to
the parser and a simple, flexible syntax; and
Forth's dictionary.}
}
Concerning the other question that came up, about dealing with left
recursion, and building the abstract syntax tree (AST) appropriately,
that part works nicely in Gray:
In BNF you would write
expr->expr '-' number
expr->number
(and the parse tree would have the same structure as the AST).
Gray uses a variant of regular right part grammars, and you would
write that as a pure parsing rule as follows:
(( number (( '-' number )) ** )) <- expr
For buiding the AST, let's assume that "number" leaves a tree node on
the stack, and we have a word
new-bin-node ( node1 node2 -- node3 )
(i.e., it pops node1 and node2 from the stack, and pushes node3).
Then we can extend the above as follows:
(( number (( '-' number {{ new-bin-node }} )) ** )) <- expr ( -- node )
(The "( -- node )" comment documents the stack effect of parsing "expr").
Passing the nodes on Forth's stack rather than using pseudo-variables
like yacc's $1 etc. works nicely when going beyond BNF. Now I wonder
how Algol-based EBNF parser generators do it, but am too lazy to
actually research it.
Getting a right-recursive tree is actually slightly harder: Now we
actually have to perform recursion (non-recursive variants exist, but
are not simpler). Let's have a similar example, but with the
right-associative operator '^'.
nonterminal expr ( -- node ) \ declare before use
(( number (( '^' expr {{ new-bin-node }} )) ?? )) expr rule
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | carlglassberg@gmail.com |
|---|---|
| Date | 2019-12-29 14:47 -0800 |
| Message-ID | <19-12-032@comp.compilers> |
| In reply to | #2411 |
Lightweight and interesting, this Gray parser-generator.
Example:
Gray's meta notation uses a postix operator "&&" for "separator list".
"&&" seems particularly elegant and interesting:
If Gray: "a b &&" ---> EBNF: "a { b a }"
then I assume:
"a b && c &&" ---> "a { b a } { c a }"
and so on ...
Further "a b && b && ..." ===> "a b &&"
??? Is my understanding correct?
Carl
----
On Thursday, December 26, 2019 at 9:12:41 AM UTC-8, Anton Ertl wrote:
> Andy <borucki.andrzej@gmail.com> writes:
> >ANTLR has even LL(*) but is too complicated. I am searching maximal
> >simple and elegant generator which generates function call like
> >written by hand.
>
> Sounds like you want a generator that generates a recursive-descent
> parser. A while ago I compared a number of parser generators
> [ertl99ef], and among those that generate recursive-descent parsers
> the shortest one by far was Gray
> <http://www.complang.tuwien.ac.at/forth/Gray5.zip>. Whether you
> consider it simple and elegant is for you to decide.
>
> @InProceedings{ertl99ef,
> author = {M. Anton Ertl},
> title = {Is {Forth} Code Compact? {A} Case Study},
> booktitle = {EuroForth'99 Conference Proceedings},
> year = {1999},
> address = {St. Petersburg, Russia},
> url = {http://www.complang.tuwien.ac.at/papers/ertl99ef.ps.gz},
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2019-12-31 16:30 +0000 |
| Message-ID | <19-12-040@comp.compilers> |
| In reply to | #2413 |
carlglassberg@gmail.com writes:
>Gray's meta notation uses a postix operator "&&" for "separator list".
>
>"&&" seems particularly elegant and interesting:
>
>If Gray: "a b &&" ---> EBNF: "a { b a }"
>
>then I assume:
>
>"a b && c &&" ---> "a { b a } { c a }"
The left operand of the second && would be "a b &&", so the whole
thing would be equivalent to
a {b a} {c {a {b a}}}
a (( b || c )) &&
would parse the same strings, but with different parse trees; so if
you add actions, the results would probably be different.
>and so on ...
>
>Further "a b && b && ..." ===> "a b &&"
They describe the same strings, yes. However, the former has an
ambiguous grammar that allows to parse "aba" in two ways: either the b
is the first b or the second one. In Gray, this leads to a conflict
(it just can deal with LL(1) grammars), and I guess one of these
parsing options never be exercised, which means that it works as if
you had written "a b &&" in the first place.
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | carlglassberg@gmail.com |
|---|---|
| Date | 2020-01-01 01:02 -0800 |
| Message-ID | <20-01-001@comp.compilers> |
| In reply to | #2420 |
On Tuesday, December 31, 2019 at 8:42:51 AM UTC-8, Anton Ertl wrote:
> carlglassberg@gmail.com writes:
> >Gray's meta notation uses a postix operator "&&" for "separator list".
...
> >If Gray: "a b &&" ---> EBNF: "a { b a }"
...
> >then ...
...
> >"a b && c &&"
>
> The left operand of the second && would be "a b &&", so the whole
> thing would be equivalent to
>
> a {b a} {c {a {b a}}}
...
Should that be ?
a {b a} { c a {b a}}
Also, would
a b && c d &&
parse as:
a { b a } c { d c }
or as:
a { b a } c { d a { b a } c }
???
Carl
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2020-01-02 17:25 +0000 |
| Message-ID | <20-01-003@comp.compilers> |
| In reply to | #2421 |
carlglassberg@gmail.com writes:
>On Tuesday, December 31, 2019 at 8:42:51 AM UTC-8, Anton Ertl wrote:
>> carlglassberg@gmail.com writes:
>> >Gray's meta notation uses a postix operator "&&" for "separator list".
>...
>> >If Gray: "a b &&" ---> EBNF: "a { b a }"
...
>> >"a b && c &&"
...
>Should that be ?
>a {b a} { c a {b a}}
Yes.
>Also, would
> a b && c d &&
I assume you mean a concatenation of "a b &&" and "c d &&". In Gray
syntax that would be
(( a b && c d && ))
>parse as:
> a { b a } c { d c }
Yes.
>or as:
> a { b a } c { d a { b a } c }
If you want that, you would write it as
(( a b && c )) d &&
&& is just a binary postfix operator.
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | carlglassberg@gmail.com |
|---|---|
| Date | 2020-01-05 11:59 -0800 |
| Message-ID | <20-01-008@comp.compilers> |
| In reply to | #2423 |
On Thursday, January 2, 2020 at 10:21:53 AM UTC-8, Anton Ertl wrote:
...
> (( a b && c d && ))
1. For concatenation of these, I would write, in Gray:
((a b &&))((c d & &&)) for EBNF: a { b a } c { d c }
because wouldn't
((a b && c d & &&))
be the same as?
a { b a } c { d a { b a } c }
2. Let us compare separator-list notation.
In Waite and Goos, "Compiler Construction", there is an infix operator, "||", for separator list, not to be confused with Gray's meta-symbol for "or".
And Eiffel has a special notation for separator list.
In comparison, Anton's Gray has a postfix operator, "&&", for separator-list.
/Comparison of separator-list notation in
Waite/Goos EBNF, Eiffel EBNF (BNF-E), and Gray EBNF/
Wirth EBNF Gray EBNF:
-------------------------------------------------------------------
Waite/Goos: x || y ==========> x { y x } x y &&
In Eiffel: { x y ...}* ===> [ x { y x } ] (( x y && )) ??
{ x y ...}+ ===> x { y x } x y &&
-------------------------------------------------------------------
What is better about Gray is that it only uses braces in action semantics,
in doubled form {{ ... }}, and no one is likely to be confused by the use of
braces.
Also it's postfix separator-list operator, "&&", is easier to remember than
Eiffel's more verbose special variation of standard EBNF braces.
And Gray's "&&" separator-list operator is just another postfix op, consistent with it's use of "??", "**" and "++"
Gray better separates the 2 styles of notation, it's "regular expression" style versus standard bracketed EBNF style.
The 2 styles ideally should not be mixed, or at least confusion should be avoided by not introducing yet another similiar notation, in the form of a special variation of standard EBNF braces for separator lists.
One minor criticism of Gray:
I do wish Gray had a production rule terminator, say semicolon ";" or stop "."
The end of 1 rule would be clearly distinguishable from the beginning of the next without special processing of end-of-line which makes scanning and
parsing space sensitive. We would avoid the need for LL(2) parsing or other
parsing and scanning techniques.
Carl
----
[toc] | [prev] | [next] | [standalone]
| From | carlglassberg@gmail.com |
|---|---|
| Date | 2020-01-05 13:59 -0800 |
| Message-ID | <20-01-009@comp.compilers> |
| In reply to | #2428 |
correction:
((a b &&))((c d & &&)) for EBNF: a { b a } c { d c }
should be:
((a b &&))((c d &&)) for EBNF: a { b a } c { d c }
and
((a b && c d & &&))
should be:
((a b && c d &&))
Don't know how those extraneous single "&" got there!
Carl
---
[toc] | [prev] | [next] | [standalone]
| From | carlglassberg@gmail.com |
|---|---|
| Date | 2020-01-05 14:44 -0800 |
| Message-ID | <20-01-010@comp.compilers> |
| In reply to | #2428 |
That table didn't align properly, so lets present the same results this way:
-------------------------------------------------------------------------
Waite/Goos: x || y ===> Wirth EBNF: x { y x } ===> Gray: x y &&
-------------------------------------------------------------------------
Eiffel: { x y ...}* => Wirth EBNF: [ x { y x } ] => Gray: (( x y && )) ??
-------------------------------------------------------------------------
Eiffel: { x y ...}+ => Wirth EBNF: x { y x } => Gray: xy &&
-------------------------------------------------------------------------
Carl
----
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2020-01-22 17:12 +0000 |
| Message-ID | <20-01-025@comp.compilers> |
| In reply to | #2428 |
carlglassberg@gmail.com writes:
[Applied the corrections from later postings]
>On Thursday, January 2, 2020 at 10:21:53 AM UTC-8, Anton Ertl wrote:
>...
>> (( a b && c d && ))
>
>1. For concatenation of these, I would write, in Gray:
> ((a b &&))((c d &&)) for EBNF: a { b a } c { d c }
>because wouldn't
> ((a b && c d &&))
>be the same as?
> a { b a } c { d a { b a } c }
In Gray, you cannot concatenate x and y by writing them adjacent to
one another, so
a b && c
are just two unrelated grammar expressions: "a b &&" and "c". If you continue that with
a b && c d
you now have three unrelated grammar expressions. Now apply the &&
operator:
a b && c d &&
and you have two unrelated grammar expressions: "a b &&" and "c d &&".
In order to combine them into a concatenatio, you normally surround
them with (( ... )):
(( a b && c d && ))
(or alternatively, you use the binary postfix operator "concat"). Likewise,
(( a b && )) (( c d && ))
would be two unrelated grammar expressions.
Also, note that each word (including grammar operators) has to be
separated from adjacent words with white space.
>2. Let us compare separator-list notation.
>
>In Waite and Goos, "Compiler Construction", there is an infix operator, "||", for separator list, not to be confused with Gray's meta-symbol for "or".
>And Eiffel has a special notation for separator list.
>
>In comparison, Anton's Gray has a postfix operator, "&&", for separator-list.
>
>/Comparison of separator-list notation in
>Waite/Goos EBNF, Eiffel EBNF (BNF-E), and Gray EBNF/
>
> Wirth EBNF Gray EBNF:
>-------------------------------------------------------------------
>Waite/Goos: x || y ==========> x { y x } x y &&
>
>In Eiffel: { x y ...}* ===> [ x { y x } ] (( x y && )) ??
> { x y ...}+ ===> x { y x } x y &&
>-------------------------------------------------------------------
(( x y && )) ??
is equivalent to
x y && ??
>I do wish Gray had a production rule terminator, say semicolon ";" or stop "."
>
>The end of 1 rule would be clearly distinguishable from the beginning of the next without special processing of end-of-line which makes scanning and
>parsing space sensitive. We would avoid the need for LL(2) parsing or other
>parsing and scanning techniques.
Gray just uses Forth's parser (which just scans for white space, i.e.,
as far as the parser is concerned, it's a regular, not a context-free
language). The rest works by putting grammar expressions on the
stack, and having grammar operators that take grammar expressions from
the stack and put the resulting grammar expression on the stack;
that's why the operators are postfix. The
(( ... || ... || ... ))
syntax is syntactic sugar for a more readable syntax than using the
binary postfix operators "concat" and "alt". If && had been part of
the original concept instead of an afterthought that demonstrates the
extensibility, maybe I would have had syntactic sugar for that, too.
Even as a Forth programmer, I admit that binary postfix operators are
not very readable when expressions with three or more binary operators
are involved (which is rare in most programming, but pretty common in
grammars).
Back to the question of production rule terminators: As a result of
the above, a rule typically reads
(( ... )) <- nonterminal
so it's pretty clear to the system and the programmer where a rule
starts and where it ends. However, there is little error checking
involved, and if you write
(( ... (( ... )) <- nonterminal
(i.e., if you forget to write all the necessary closing "))"), you
will get stuff left over on the stack and the nonterminal will not do
what you want. While the compiler does not flag that as error, this
error is easy to notice and find in testing on typical Forth systems
(and it's something you could also get with other Forth code).
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | carlglassberg@gmail.com |
|---|---|
| Date | 2020-01-23 02:41 -0800 |
| Message-ID | <20-01-028@comp.compilers> |
| In reply to | #2434 |
Anton, I now see that, in Gray, the precedence of "&&" is higher than every other operator.
So Gray EBNF:
a b && c d && e f && <- x
is equivalent to Wirth EBNF:
x = a { b a } c { d c } e { f e } .
Likewise:
a b && ? <- x
is
x = [ a { b a } ] .
Carl
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2020-01-25 18:25 +0000 |
| Message-ID | <20-01-029@comp.compilers> |
| In reply to | #2436 |
carlglassberg@gmail.com writes:
>Anton, I now see that, in Gray, the precedence of "&&" is higher than every other operator.
Among the postfix operators, precedence is determined by the way they
are written down. But yes, if you see (( a b || c d )) as having two
infix operators, || and the empty operator, then you can write a
precedence hierarchy:
postfix
empty
||
>So Gray EBNF:
>a b && c d && e f && <- x
Correctly written as
(( a b && c d && e f && )) <- x
>is equivalent to Wirth EBNF:
>
>x = a { b a } c { d c } e { f e } .
Yes.
>Likewise:
>
>a b && ? <- x
a b && ?? <- x
>is
>x = [ a { b a } ] .
Yes.
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | gaztoast@gmail.com |
|---|---|
| Date | 2019-12-31 07:10 -0800 |
| Message-ID | <19-12-037@comp.compilers> |
| In reply to | #2397 |
On Sunday, December 22, 2019 at 8:17:44 AM UTC-8, Andy wrote: > ANTLR has even LL(*) but is too complicated. I am searching maximal > simple and elegant generator which generates function call like > written by hand. ... https://www.codeproject.com/Articles/5162249/How-to-Make-an-LL-1-Parser-Lesson-1 I wrote a lesson on building one in 3 steps
[toc] | [prev] | [next] | [standalone]
| From | honey crisis <gaztoast@gmail.com> |
|---|---|
| Date | 2020-01-02 08:50 -0800 |
| Message-ID | <20-01-002@comp.compilers> |
| In reply to | #2397 |
On Sunday, December 22, 2019 at 8:17:44 AM UTC-8, Andy wrote: > I am searching maximal simple and elegant generator which generates function call like written by hand. I designed Parsley to do exactly that. i don't know how minimal it is at this point, but I've been using the LL1.cs file in it in several of my parser projects to generate parse tables. Maybe it will help. https://www.codeproject.com/Articles/5255160/Parse-Anything-with-Parsley-A-Different-Approach
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2020-01-02 13:16 -0500 |
| Message-ID | <20-01-004@comp.compilers> |
| In reply to | #2397 |
A little late to the discussion, but ... On Sat, 21 Dec 2019 19:07:55 -0800 (PST), Andy <borucki.andrzej@gmail.com> wrote: >ANTLR has even LL(*) but is too complicated. I am searching maximal >simple and elegant generator which generates function call like >written by hand. There are a number of decent tools that will generate recursive descent code, but you won't find any that will produce code as tight as custom hand written. You neglected to mention what programming language you want to target. However: PCCTS is the ancestor of ANTLR. It is LL(k) for configurable k. It produces C or C++ code for the parser. You need to supply a lexer separately. Coco/R is LL(k). There are versions available for several languages, including C, C++, Java, Delphi, VB, etc. Like ANTLR, Coco/R also can generate a lexer. In my experience, PCCTS produces tighter code than ANTLR. As long as you don't need the flexibility of LL(*) - or need a lexer generated, or to target a language other than C or C++ - then PCCTS may be the better choice. If you need one stop shopping for RD coded parsers and lexers, then you do need something like ANTLR or Coco/R which can do both. >Temporary goal: must be very simple, because this will not parse >program/document, just only strings with my definitions of (augmented) >regular expressions. I can write it by hand, but generator minimizes >possibility making mistake and systemztize work. If you are working with individual strings rather than a multi-string "document", then perhaps you really only want a lexer generator. Perhaps something like RE2C which takes regex patterns and generates code (in C) to recognize them. The recognizers are FA, but the code is generated inline in the function that needs it. Or boost::spirit, perhaps in combination with boost::regex (or std::regex in C++11 or later). Spirit has a fairly steep learning curve also, but it has the advantage that it requires only the C++ compiler and does not need a separate tool. Hope this helps, George
[toc] | [prev] | [next] | [standalone]
| From | rockbrentwood@gmail.com |
|---|---|
| Date | 2020-01-04 10:37 -0800 |
| Message-ID | <20-01-005@comp.compilers> |
| In reply to | #2397 |
On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote: > ANTLR has even LL(*) but is too complicated. I am searching maximal > simple and elegant generator which generates function call like > written by hand. A large set of parsers are lined up in the parser generator comparison on Wikipedia here https://en.wikipedia.org/wiki/Comparison_of_parser_generators The question of who in the list does bona fide code synthesis (as opposed to cookie-cutter code generation) is not directly addressed, as far as I can see. But the items can be reviewed individually. Perhaps you will end up writing a parser generator that does this. The obvious "direct map" method is Draft Version: state = goto label state transition = goto general format: Label: action goto(s) - either single goto or branched gotos Synthesized Version: control flow structures are synthesized from this. The transformation of jump tables to control flow structures is essentially the same process as the transformation of finite state automata to regular expressions. An intelligent transformation process will leave some goto's in place in order to prevent the introduction of redundancy. If the generated parser is linked to the parser specification, the transformation has to be able to correctly inject "#line" directives. Alternatively, a better method is NOT to inject "#line" directives, but comments that interweave displays of the original code that the transformation took place from.
[toc] | [prev] | [next] | [standalone]
| From | honey crisis <gaztoast@gmail.com> |
|---|---|
| Date | 2020-01-05 05:05 -0800 |
| Message-ID | <20-01-007@comp.compilers> |
| In reply to | #2425 |
> Synthesized Version: > control flow structures are synthesized from this. Do you think this is something that's sought after more generally? I've considered it for my parser generators but the problem i run into is a kind of "state explosion" wherein since i can't freely branch, i can't converge paths, so each dysjunction can only create more dysjunctions. Basically, it ends up copying code to fit it inside whiles and ifs.
[toc] | [prev] | [next] | [standalone]
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2020-01-06 08:47 +0200 |
| Subject | Branched gotos was: Re: A minimal LL(1) parser generator ? |
| Message-ID | <20-01-011@comp.compilers> |
| In reply to | #2425 |
rockbrentwood@gmail.com wrote: > The question of who in the list does bona fide code synthesis (as opposed to > cookie-cutter code generation) is not directly addressed, as far as I can see. > But the items can be reviewed individually. > Perhaps you will end up writing a parser generator that does this. The obvious > "direct map" method is > Draft Version: > state = goto label > state transition = goto > general format: > Label: > action > goto(s) - either single goto or branched gotos > Synthesized Version: > control flow structures are synthesized from this. While I must admit that I am not sure I understand your question (your conception of how automata work is too different than mine for me to readily understand your points), if you are asking what I think you are asking, we solve your question a slightly different way. So, let me rephrase your question a little bit. When you come to the end of a production, you have a reduce action and that causes the popping of a non-terminal from the stack. Depending upon what you pop, the state machine transitions to another state. This is normally coded as the "goto table" in standard LR constructions. But the goto table is in fact essentially a jump table, just as the shift table is a jump table on incoming tokens. If that is a correct interpretation of your question, then we do something different for that, and what we do is similar to a code generation process, and in fact, we have a version that generates the code is if-then-else and/or case statements as appropriate, to completely encode the tables in the target programming language. Not recursive ascent, but more a direct implementation of the FSA as code. So, first our LR construction is non-standard. We not only keep tokens on our parser stack, but when we reduce a production, we put the non-terminal itself on the stack. Thus, in a state, you might see either a token or a non-terminal as the item you are processing. So, the shift-table and the goto-table are merged into one table. Now, to make that work, we don't encode what to do in the state itself. There are no goto states. No reduce states. No accept states. Etc. A state is simply a jump table that implements "see this on the stack and take this transition". The transition itself encodes what the action should be. I think this difference is similar to the difference in Mealy and Moore machines. One encodes the output in the state, the other on the transition (and you can transform one to the other). I just happen to like my encoding to be all on transitions, with states just as tables. And we have an "assembly" language of actions that a transition can invoke. Thus, normally, upon seeing a token, the actions are "shift; goto state n;" two assembly language steps. At the beginning of a non-terminal (and only then) do we want to push something on the stack (this is similar to what Mule does and I think similar to your bracket notation, although I have not entirely figured that out), so if the token is the first token in a rule, it does three assembly language steps, "push; shift, goto state n". Similarly if there is an output/action associated with the transition, that will be added to the assembly language sequence "shift; do act m; goto state n;". I am not going to detail all the possible actions. There are more than one would expect because it allows us to implement a variety of things. (I am currently considering making the machine deal with captures and backreferences. That is simply some more actions in the assembly language.) In any case, hopefully, you can imagine how we simply invoke those actions in a switch/case statement as the steps (macro calls, function calls, or by expanding such code inline) for that case. You can then decide whether you want two nested switch/case statements (one on state, one per state on input) to encode them or to merge them or generate them as code. (At Intel, I used a similar technique to compress the states as many states have only a few cases to consider. So some states want to be a jump table, but many (most) states in a FSA are simply check for one or two items and if not found fail. You can often turn than into linear code.) There are numerous choices that we implement as various table generation options. The internal model of jump tables with transitions that have specific actions does not limit that. Best regards, Chris ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
[toc] | [prev] | [next] | [standalone]
| From | Ben Hanson <jamin.hanson@googlemail.com> |
|---|---|
| Date | 2020-01-07 11:32 -0800 |
| Subject | Re: Branched gotos was: Re: A minimal LL(1) parser generator ? |
| Message-ID | <20-01-012@comp.compilers> |
| In reply to | #2431 |
On Monday, 6 January 2020 14:39:06 UTC, Christopher F Clark wrote: > (I am currently considering making the machine deal with > captures and backreferences. That is simply some more actions in the > assembly language.) I can confirm that this works well. Adding the ability to search using a grammar rather than simply match also opens up a whole other world of possibilities. Regards, Ben
[toc] | [prev] | [next] | [standalone]
Page 1 of 2 [1] 2 Next page →
Back to top | Article view | comp.compilers
csiph-web