Path: csiph.com!3.us.feeder.erje.net!feeder.erje.net!news.snarked.org!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: rockbrentwood@gmail.com Newsgroups: comp.compilers Subject: Shift/Accept & Reduce/Accept Conflicts - parsing in mid-stream Date: Sat, 4 Jan 2020 11:28:07 -0800 (PST) Organization: Compilers Central Lines: 106 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <20-01-006@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="72019"; mail-complaints-to="abuse@iecc.com" Keywords: parse Posted-Date: 04 Jan 2020 21:19:25 EST X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2426 This is an example grammar, where I'll discuss the issue. Base Grammar: L -> | L S, S -> x | u L v, -> S w L Canonical Bottom-Up/Rightmost Transduction: L -> a | L S b, S -> x c | u L v d, -> S w L z Inputs/Tokens: {u,v,w,x}, Outputs/Actions: {a,b,c,d,z} LR Table: States: 0-8 Shift: u: [0,1,2]>3 v: [1]>8 w: [5]>4 x: [0,1,2]>7 Goto: L: 3>1,4>2 S: 0>5,[1,2]>6 Parser: State: S Start: -> >0 S Shift: S -> u [0,1,2]>3 S S -> v [1]>8 S S -> w [5]>4 S S -> x [0,1,2]>7 S Reduce: S -> a [3]>1,[4]>2 S S -> b 6<(1<[3]>1,2<[4]>2) S S -> c 7<([0]>5,[1,2]>6) S S -> d 8<1<3<([0]>5,[1,2]>6) S Accept: S -> z 2<4<5<0< An example describing the format used above: The (b) action pops 6, and either 1 & 3, pushing 3 and 1 back, or 2 & 4, pushing 4 and 2 back; i.e. [3] = 3<>3; similarly ([1,2] = 1<>1,2<>2, so [0]>5,[1,2]>6 is equivalent to 0<>0>5,1<>1>6,2<>2>6). The shift states are: 0,1,2,5 The reduce states are: 3,4,6,8 The accept states are: 2 There are no shift/reduce or reduce/reduce conflicts. But there is one shift/accept conflict in state 2. Normally for LR parsing, you ignore this issue by adopting a convention that an extra "end marker" is present. But this is NOT something you can do for free. There's a price to pay for this convention; namely: that the parsing application is restricted to only those cases where the input is being processed all together, like a batch process; rather than piecemeal in mid-stream. The convention is equivalent to the "longest first" rule: * All shift-accept conflicts are resolved in favor of shift. * All reduce-accept conflicts are resolved in favor of reduce. Example input: uvwxx Up to uvw, it is processed as: >0 (u [0,1,2]>3) ([3]>1,[4]>2 a) ([1]>8 v) (8<1<3<([0]>5,[1,2]>6) d) ([5]>4 w) ([3]>1,[4]>2 a) = >0 (u [0]>3) ([3]>1) a) ([1]>8 v) (8<1<3<[0]>5 d) ([5]>4 w) ([4]>2 a) = >0>5>4>2 uvw ada There's a shift-accept conflict at the first x. So, it branches as >0>5>4>2 uvw ada (2<4<5<0< z) = uvw adaz for the accept, with xx remaining and >0>5>4>2 uvw ada ([2]>7 x) (7<[2]>6 c) (6<2<[4]>2) b) = >0>5>4>2 uvwx adacb with one more x left remaining. This creates a second shift-accept conflict and second branch. Overall, the final results are: Branch #1: uvw adaz, with xx left over Branch #2: uvwx adacbz, with x left over Branch #3: uvwxx adacbcbz The kinds of places where you might actually WANT to do mid-stream parsing are - Processing macros intelligently (i.e. without in-line substituting them!) - Code synthesis ... particularly with comments and macros left intact - Intelligent translations of code fragments, where context is taken into account (particularly important for processing macros). An example of the last item: the old "cfront" routine (where "old" means "release 1") can be almost entirely in-line translated into C99 with little more than name-mangling. But there is one macro that makes reference to a class member whose expansion in the translated C99 code would be a function of the class itself. So, if the macro were f(X), the translated macro would be f(T,X) where T is the name of the class type. Merely to recognize which macros are in-line substitute-able as expressions and statements or other types of phrases is the essence of an in-line parsing problem. More generally, just to recognize which input fragments make up which types of phrases is the essence of the problem. In rare cases, you will find macros that cut across phrase boundaries; e.g. macros that include commas or unbalanced parentheses in them or ending parts of one expression/statement and beginning parts of another.