Path: csiph.com!3.us.feeder.erje.net!feeder.erje.net!news.snarked.org!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: rockbrentwood@gmail.com
Newsgroups: comp.compilers
Subject: Shift/Accept & Reduce/Accept Conflicts - parsing in mid-stream
Date: Sat, 4 Jan 2020 11:28:07 -0800 (PST)
Organization: Compilers Central
Lines: 106
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <20-01-006@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="72019"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 04 Jan 2020 21:19:25 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:2426

This is an example grammar, where I'll discuss the issue.

Base Grammar:
L -> | L S, S -> x | u L v, -> S w L
Canonical Bottom-Up/Rightmost Transduction:
L -> a | L S b, S -> x c | u L v d, -> S w L z
Inputs/Tokens: {u,v,w,x}, Outputs/Actions: {a,b,c,d,z}

LR Table:
States: 0-8
Shift:
        u: [0,1,2]>3
        v: [1]>8
        w: [5]>4
        x: [0,1,2]>7
Goto:
        L: 3>1,4>2
        S: 0>5,[1,2]>6

Parser:
State: S
Start:
        ->      >0                      S
Shift:
S       -> u    [0,1,2]>3               S
S       -> v    [1]>8                   S
S       -> w    [5]>4                   S
S       -> x    [0,1,2]>7               S
Reduce:
S       -> a    [3]>1,[4]>2             S
S       -> b    6<(1<[3]>1,2<[4]>2)     S
S       -> c    7<([0]>5,[1,2]>6)       S
S       -> d    8<1<3<([0]>5,[1,2]>6)   S
Accept:
S       -> z    2<4<5<0<
An example describing the format used above:
The (b) action pops 6, and either 1 & 3, pushing 3 and 1 back, or 2 & 4,
pushing 4 and 2 back; i.e. [3] = 3<>3; similarly ([1,2] = 1<>1,2<>2, so
[0]>5,[1,2]>6 is equivalent to 0<>0>5,1<>1>6,2<>2>6).

The shift states are: 0,1,2,5
The reduce states are: 3,4,6,8
The accept states are: 2
There are no shift/reduce or reduce/reduce conflicts.
But there is one shift/accept conflict in state 2.

Normally for LR parsing, you ignore this issue by adopting a convention that
an extra "end marker" is present. But this is NOT something you can do for
free. There's a price to pay for this convention; namely: that the parsing
application is restricted to only those cases where the input is being
processed all together, like a batch process; rather than piecemeal in
mid-stream.

The convention is equivalent to the "longest first" rule:
* All shift-accept conflicts are resolved in favor of shift.
* All reduce-accept conflicts are resolved in favor of reduce.

Example input: uvwxx

Up to uvw, it is processed as:
>0 (u [0,1,2]>3) ([3]>1,[4]>2 a) ([1]>8 v) (8<1<3<([0]>5,[1,2]>6) d) ([5]>4 w)
([3]>1,[4]>2 a)
=
>0 (u [0]>3) ([3]>1) a) ([1]>8 v) (8<1<3<[0]>5 d) ([5]>4 w) ([4]>2 a)
=
>0>5>4>2 uvw ada

There's a shift-accept conflict at the first x. So, it branches as
>0>5>4>2 uvw ada (2<4<5<0< z) = uvw adaz for the accept, with xx remaining

and

>0>5>4>2 uvw ada ([2]>7 x) (7<[2]>6 c) (6<2<[4]>2) b)
=
>0>5>4>2 uvwx adacb

with one more x left remaining. This creates a second shift-accept conflict
and second branch. Overall, the final results are:

Branch #1: uvw adaz, with xx left over
Branch #2: uvwx adacbz, with x left over
Branch #3: uvwxx adacbcbz

The kinds of places where you might actually WANT to do mid-stream parsing
are
- Processing macros intelligently (i.e. without in-line substituting them!)
- Code synthesis ... particularly with comments and macros left intact
- Intelligent translations of code fragments, where context is taken into
account (particularly important for processing macros).

An example of the last item: the old "cfront" routine (where "old" means
"release 1") can be almost entirely in-line translated into C99 with little
more than name-mangling. But there is one macro that makes reference to a
class member whose expansion in the translated C99 code would be a function of
the class itself. So, if the macro were f(X), the translated macro would be
f(T,X) where T is the name of the class type.

Merely to recognize which macros are in-line substitute-able as expressions
and statements or other types of phrases is the essence of an in-line parsing
problem.

More generally, just to recognize which input fragments make up which types of
phrases is the essence of the problem. In rare cases, you will find macros
that cut across phrase boundaries; e.g. macros that include commas or
unbalanced parentheses in them or ending parts of one expression/statement and
beginning parts of another.