Path: csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Christopher F Clark Newsgroups: comp.compilers Subject: FIRST_k, FOLLOW_k, k>1 Date: Fri, 7 Feb 2020 16:12:43 +0200 Organization: Compilers Central Lines: 135 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <20-02-006@comp.compilers> References: <20-02-004@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="78562"; mail-complaints-to="abuse@iecc.com" Keywords: parse Posted-Date: 07 Feb 2020 22:17:31 EST X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2444 > If LL(l) and LR(k) need sets FIRST_k, FOLLOW_k, k>1, for example LR(3) need this sets of degree 3? > How make it? How is the best structure for these sets? I think about pyramid: ... > For example, not bit set but set of trees or DFA's ? I've been meaning to write a paper on this for awhile, so I'll give you a very rough draft here. We implemented a version of it in Yacc++, although my understanding of it was much more crude at that time. Everything you need to know is in the LR(0) machine, which is essentially a DFA (with a stack). The problem is you need to disentangle it. Let's take a reduce-reduce conflict. You are in some state. I diagram states by showing their dotted item, using Yacc++ notation for the rules. That state looks something like this: state r-r: x: a b c .; // the dotted item says we will reduce x in this state y: z c.; // we also reduce a y in this state. So, now we need to know where the goto-function will take us if we reduce an x and where it will take us if we reduce a y. Now, in Yacc++ we know where we started the productions for x and y, we call them dot-states (as they are the states with the dotted item before an x or a y). There is often more than one dot state for a given non-terminal. But, we will assume just one for ease of exposition. state dot-x: // There will often be a set of these, one for every context in which x was used. x1: a .x b; // if see an x, goto state goto-x; state goto-x: x1: a x .b c; // b is the lookahead if you see x. state dot-y: y1: a b .y b; // if see a y, goto state goto-y; state goto-y: y1: a b y .b d; // b is the lookahead for y. If you look carefully, you see that the goto states have dots before (transitions on) exactly the tokens that are lookaheads for the reductions that conflict. And we have a conflict because the token b is in both states (sets). If the two states are disjoint, have no tokens in common, then there will be no conflict the tokens in goto-x will be the lookaheads for reducing x and the tokens in goto-y will be the lookaheads for reducing y. Now, if both states goto-x and goto-y are the same, it is an ambiguity not just a conflict. There will be two ways to parse some sentence (with an x or a y), because once you get to the shared goto state, the parse will be the same for both. However, if there is a conflict (some tokens appear in both goto-x and goto-y and they are not the same state), then you need to look for another token of lookahead. You do that by taking the transition(s) on the common tokens and seeing what that state that takes you to. state goto-x-b: x1: a x b .c; // bc is the lookahead if you see x. state goto-y-b: y1: a b y b .d; // bd is the lookahead if you see y You continue doing this until you have disambiguated the lookahead or you have proven the grammar is truly ambiguous for some conflict. There are a few more considerations. Now, if one of those goto states is also a reduce state, you have to apply the rules to backtrack through calling rule. e.g. state dot-y; y1: a b .y; state dot-y1;: y2: c .y1 b; state goto-y1: y2: c y1 .b; If you merged states (ala LALR), you may need to un-merge (split) them to resolve conflicts, that are LALR only and not-LR. You also have to deal with loops in the states. If there is a finite k lookahead that solves the conflict, this is trivial. You simply are doing an induction on k as you step through. However, there are grammars that would be GLR parsable (and not ambiguous) that include loops. That takes more work to recognize, although I am convinced that you can compute a closure that resolves it (and when I have proven that I will do the formal paper on this). What you are building could be called a lookahead automaton, especially if you build something that accepts regular expressions (or even more powerfully LR-type grammars). But, this is the basic outline. You can even use this to construct an LL (or recursive descent) parser. You simply need to remember to do "calls" when you get to states where you have computed a closure and you are processing one of the non-terminals you have expanded in doing so. We do this in Yacc++ so we can handle variable length productions in ELR grammars without backtracking, counting, or lookbacks. And, you need one more trick if you have nullable productions (that's another paper I need to write). -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------