Path: csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder3.hal-mli.net!nx02.iad01.newshosting.com!newshosting.com!69.16.185.11.MISMATCH!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!postnews.google.com!news2.google.com!border1.nntp.dca.giganews.com!border4.nntp.dca.giganews.com!border2.nntp.dca.giganews.com!nntp.giganews.com!news.iecc.com!nerds-end
From: Paul B Mann <paul@paulbmann.com>
Newsgroups: comp.compilers
Subject: Re: coupling LALR with a scanner?
Date: Tue, 13 Sep 2011 13:38:48 -0700 (PDT)
Organization: Compilers Central
Lines: 57
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <11-09-016@comp.compilers>
References: <11-07-013@comp.compilers> <11-07-015@comp.compilers> <11-07-018@comp.compilers> <11-08-004@comp.compilers>
NNTP-Posting-Host: news.iecc.com
X-Trace: gal.iecc.com 1316148681 52953 64.57.183.58 (16 Sep 2011 04:51:21 GMT)
X-Complaints-To: abuse@iecc.com
NNTP-Posting-Date: Fri, 16 Sep 2011 04:51:21 +0000 (UTC)
Keywords: LALR, parse
Posted-Date: 16 Sep 2011 00:51:21 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: x330-a1.tempe.blueboxinc.net comp.compilers:266

/* Test1 grammar (like the one above). */

   G -> A <eof>
   A -> a B c
      -> a B d
   B -> b

Test1 grammar is LR(0) and requires NO lookahead symbols.
By default, b always reduces to B.

/* Test2 grammar. */

   G -> A <eof>
   A -> a B1 c
      -> a B2 d
   B1 -> b
   B2 -> b

Test2 grammar is LALR(1) and probably SLR(1) also.  The parser
requires one symbol of lookahead to know which reduction to make (B1
or B2).

/* Test3 grammar. */

   G -> A <eof>
   A -> a B1 c
      -> a B2 d
      -> a B1 d
      -> a B2 c
   B1 -> b
   B2 -> b

Test3 grammar is NOT LALR(1), but it is LR(1).  The parser requires
one symbol of lookahead to know which reduction to make (B1 or B2).

Test3 is a highly contrived situation, not usually seen in real-world
computer languages (as far as I know).  Test3 could be written as
Test2.

It seems that LALR(1) parsers are more powerful than some people
realize.  LR(1) parser tables can be huge.  A COBOL-85 grammar created
an LR(1) parser with over 2,000,000 states, whereas a IELR(1) parser
for that grammar created about 1668 states.  Bison can create IELR(1)
parsers. HYACC can also create IELR(1) parsers.

I don't understand why you want to have a different scanner for each
state.  The parser can easily make the decision, whether a token is
valid.  In fact, an LALR parser already has this information in the
parser tables, so why make a simple situation complicated?

Also in case of a syntax error, the parser can tell the user what
tokens were expected at the error point.  So there is no need for
different lexers, unless the syntax of the tokens changes from one
state to the next and this could be confusing to the the user of your
language.

Paul B Mann