Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #2442 > unrolled thread
| Started by | Andy <borucki.andrzej@gmail.com> |
|---|---|
| First post | 2020-02-06 10:43 -0800 |
| Last post | 2020-02-08 18:18 -0800 |
| Articles | 7 — 4 participants |
Back to article view | Back to comp.compilers
FIRST_k, FOLLOW_k, k>1 Andy <borucki.andrzej@gmail.com> - 2020-02-06 10:43 -0800
Re: FIRST_k, FOLLOW_k, k>1 Andy <borucki.andrzej@gmail.com> - 2020-02-06 14:16 -0800
Re: FIRST_k, FOLLOW_k, k>1 Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-02-08 11:00 +0100
Re: FIRST_k, FOLLOW_k, k>1 Andy <borucki.andrzej@gmail.com> - 2020-02-08 11:54 -0800
Re: FIRST_k, FOLLOW_k, k>1 Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-02-09 02:46 +0100
FIRST_k, FOLLOW_k, k>1 Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-02-07 16:12 +0200
Re: FIRST_k, FOLLOW_k, k>1 honey crisis <gaztoast@gmail.com> - 2020-02-08 18:18 -0800
| From | Andy <borucki.andrzej@gmail.com> |
|---|---|
| Date | 2020-02-06 10:43 -0800 |
| Subject | FIRST_k, FOLLOW_k, k>1 |
| Message-ID | <20-02-004@comp.compilers> |
If LL(l) and LR(k) need sets FIRST_k, FOLLOW_k, k>1, for example LR(3) need this sets of degree 3? How make it? How is the best structure for these sets? I think about pyramid: - one bit - is epsilon - bit table size n - bit table size n^2 ... - bit table size n^k because sets degree of k also have shorter strings. This exponentially grows, in real grammars, number of tokens can be quite big, ~80, It need then 0(80^k) bits. This sets will sparse and is better organization of substrings in sets? For example, not bit set but set of trees or DFA's ?
[toc] | [next] | [standalone]
| From | Andy <borucki.andrzej@gmail.com> |
|---|---|
| Date | 2020-02-06 14:16 -0800 |
| Message-ID | <20-02-005@comp.compilers> |
| In reply to | #2442 |
I search examples,
E->aWbXYcdZ
W->w
X->x
Y->y
Z->z
if for FIRST(k=4) wil be: E={awbx} W={wbxy} X={xycd} Y={ycdz} Z={z}
what is convention?
E is obviously {awbx} but W is (difficult){wbxy} or only {w}?
FIRST(k=4)(W) we compute for alone W or W in context whole E?
similar:
E->aWbXYcdZ
W->w
X->x
Y->y
Y->
Z->z
;y->epsilon
will?
FIRST3: E={awb} W={wbx} X={xyc xcd} Y={ycd cdz} Z={z} ;dotyczy też X
FIRST4: E={awbx} W={wbxy wbxc} X={xycd xcdz} Y={ycdz cdz} Z={z}
E->aWbXYcdZ
>E->X
W->w
X->x
Y->y
Y->
Z->z
will?
FIRST1: E={a x} W={w} X={x} Y={y c} Z={z}
FIRST2: E={aw x} W={wb} X={xy xc x} Y={yc cd} Z={z}
FIRST3: E={awb x} W={wbx} X={xyc xcd x} Y={ycd cdz} Z={z}
FIRST4: E={awbx x} W={wbxy wbxc} X={xycd xcdz x} Y={ycdz cdz} Z={z}
FOLLOW1: E={$} W={b} X={y c $} Y={c} Z={$}
FOLLOW2: E={$} W={bx} X={yc cd $} Y={cd} Z={$}
FOLLOW3: E={$} W={bxy bxc} X={ycd cdz $} Y={cdz} Z={$}
FOLLOW4: E={$} W={bxyc bxcd} X={ycdz cdz $} Y={cdz} Z={$}
etc
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-02-08 11:00 +0100 |
| Message-ID | <20-02-007@comp.compilers> |
| In reply to | #2443 |
Am 06.02.2020 um 23:16 schrieb Andy:
> I search examples,
> E->aWbXYcdZ
> W->w
> X->x
> Y->y
> Z->z
>
> if for FIRST(k=4) wil be: E={awbx} W={wbxy} X={xycd} Y={ycdz} Z={z}
> what is convention?
Your grammar definitely is LL(1). You should provide a grammar that
requires longer lookahead.
IMO the FIRST set covers all *different* sequences, in your case
FIRST(E)={a} hence LL(1).
Only if there exist multiple alternatives starting with 'a' they have to
be listed as e.g. FIRST(E)={ax, ay}.
Dunno about the FOLLOW set, perhaps it can stay LL(1)?
DoDi
[toc] | [prev] | [next] | [standalone]
| From | Andy <borucki.andrzej@gmail.com> |
|---|---|
| Date | 2020-02-08 11:54 -0800 |
| Message-ID | <20-02-008@comp.compilers> |
| In reply to | #2445 |
W dniu sobota, 8 lutego 2020 15:36:39 UTC+1 użytkownik Hans-Peter Diettrich
napisał:
> IMO the FIRST set covers all *different* sequences, in your case
> FIRST(E)={a} hence LL(1).
> Only if there exist multiple alternatives starting with 'a' they have to
> be listed as e.g. FIRST(E)={ax, ay}.
Suppose that A is non-starting nonterminal.
FIRST_k(A) is set of token strings lengths 1..k derived from A or derived from
start symbol and begins at A begin?
for example, if we have E->AB, First_4(A) are also strings from B? (if in
First_4(A) exists shorter strings than 4)
It seems that in both situation Follow_4(A) will complicated and contains all
strings from B,C,D... for E->ABCD (if exists in A,B,C shorter stings)
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-02-09 02:46 +0100 |
| Message-ID | <20-02-009@comp.compilers> |
| In reply to | #2446 |
Am 08.02.2020 um 20:54 schrieb Andy:
> W dniu sobota, 8 lutego 2020 15:36:39 UTC+1 użytkownik Hans-Peter
Diettrich
> napisał:
>> IMO the FIRST set covers all *different* sequences, in your case
>> FIRST(E)={a} hence LL(1).
>> Only if there exist multiple alternatives starting with 'a' they have to
>> be listed as e.g. FIRST(E)={ax, ay}.
>
> Suppose that A is non-starting nonterminal.
> FIRST_k(A) is set of token strings lengths 1..k derived from A or derived
from
> start symbol and begins at A begin?
What's the purpose of exploding the FIRST and FOLLOW sets of a LL(1)
grammar into something more complicated?
IMO you misunderstand the classification of LL grammars and languages :-(
DoDi
[toc] | [prev] | [next] | [standalone]
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2020-02-07 16:12 +0200 |
| Subject | FIRST_k, FOLLOW_k, k>1 |
| Message-ID | <20-02-006@comp.compilers> |
| In reply to | #2442 |
> If LL(l) and LR(k) need sets FIRST_k, FOLLOW_k, k>1, for example LR(3) need this sets of degree 3?
> How make it? How is the best structure for these sets? I think about pyramid:
...
> For example, not bit set but set of trees or DFA's ?
I've been meaning to write a paper on this for awhile, so I'll give
you a very rough draft here.
We implemented a version of it in Yacc++, although my understanding of
it was much more
crude at that time.
Everything you need to know is in the LR(0) machine, which is
essentially a DFA (with a stack).
The problem is you need to disentangle it.
Let's take a reduce-reduce conflict. You are in some state. I diagram
states by showing their
dotted item, using Yacc++ notation for the rules.
That state looks something like this:
state r-r:
x: a b c .; // the dotted item says we will reduce x in this state
y: z c.; // we also reduce a y in this state.
So, now we need to know where the goto-function will take us if we
reduce an x and where it
will take us if we reduce a y.
Now, in Yacc++ we know where we started the productions for x and y,
we call them dot-states
(as they are the states with the dotted item before an x or a y).
There is often more than one dot state for a given non-terminal. But,
we will assume just one
for ease of exposition.
state dot-x: // There will often be a set of these, one for every
context in which x was used.
x1: a .x b; // if see an x, goto state goto-x;
state goto-x:
x1: a x .b c; // b is the lookahead if you see x.
state dot-y:
y1: a b .y b; // if see a y, goto state goto-y;
state goto-y:
y1: a b y .b d; // b is the lookahead for y.
If you look carefully, you see that the goto states have dots before
(transitions on) exactly the
tokens that are lookaheads for the reductions that conflict.
And we have a conflict because the token b is in both states (sets).
If the two states are disjoint, have no tokens in common, then there
will be no conflict the tokens
in goto-x will be the lookaheads for reducing x and the tokens in
goto-y will be the lookaheads
for reducing y.
Now, if both states goto-x and goto-y are the same, it is an ambiguity
not just a conflict.
There will be two ways to parse some sentence (with an x or a y),
because once you get to the
shared goto state, the parse will be the same for both.
However, if there is a conflict (some tokens appear in both goto-x and
goto-y and they are not
the same state), then you need to look for another token of lookahead.
You do that by taking the transition(s) on the common tokens and
seeing what that state that
takes you to.
state goto-x-b:
x1: a x b .c; // bc is the lookahead if you see x.
state goto-y-b:
y1: a b y b .d; // bd is the lookahead if you see y
You continue doing this until you have disambiguated the lookahead or
you have proven the
grammar is truly ambiguous for some conflict.
There are a few more considerations.
Now, if one of those goto states is also a reduce state, you have to
apply the rules to backtrack
through calling rule.
e.g.
state dot-y;
y1: a b .y;
state dot-y1;:
y2: c .y1 b;
state goto-y1:
y2: c y1 .b;
If you merged states (ala LALR), you may need to un-merge (split) them
to resolve conflicts,
that are LALR only and not-LR.
You also have to deal with loops in the states.
If there is a finite k lookahead that solves the conflict, this is trivial.
You simply are doing an induction on k as you step through.
However, there are grammars that would be GLR parsable (and not
ambiguous) that include
loops.
That takes more work to recognize, although I am convinced that you
can compute a closure
that resolves it (and when I have proven that I will do the formal
paper on this).
What you are building could be called a lookahead automaton,
especially if you build something
that accepts regular expressions (or even more powerfully LR-type grammars).
But, this is the basic outline.
You can even use this to construct an LL (or recursive descent)
parser. You simply need to remember to do "calls" when you get to
states where you have computed a closure and you are processing one of
the non-terminals you have expanded in doing so. We do this in Yacc++
so we can handle variable length productions in ELR grammars without
backtracking, counting, or lookbacks.
And, you need one more trick if you have nullable productions (that's
another paper I need to write).
--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------
[toc] | [prev] | [next] | [standalone]
| From | honey crisis <gaztoast@gmail.com> |
|---|---|
| Date | 2020-02-08 18:18 -0800 |
| Message-ID | <20-02-010@comp.compilers> |
| In reply to | #2442 |
Antlr used to use trees (don't know if it still does) to represent its lookahead with LL(k) This paper gets into it: https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1311&context=ecetr
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web