Path: csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder3.hal-mli.net!nx02.iad01.newshosting.com!newshosting.com!69.16.185.11.MISMATCH!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!postnews.google.com!news2.google.com!border1.nntp.dca.giganews.com!border4.nntp.dca.giganews.com!border2.nntp.dca.giganews.com!nntp.giganews.com!news.iecc.com!nerds-end From: Paul B Mann Newsgroups: comp.compilers Subject: Re: coupling LALR with a scanner? Date: Tue, 13 Sep 2011 13:38:48 -0700 (PDT) Organization: Compilers Central Lines: 57 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <11-09-016@comp.compilers> References: <11-07-013@comp.compilers> <11-07-015@comp.compilers> <11-07-018@comp.compilers> <11-08-004@comp.compilers> NNTP-Posting-Host: news.iecc.com X-Trace: gal.iecc.com 1316148681 52953 64.57.183.58 (16 Sep 2011 04:51:21 GMT) X-Complaints-To: abuse@iecc.com NNTP-Posting-Date: Fri, 16 Sep 2011 04:51:21 +0000 (UTC) Keywords: LALR, parse Posted-Date: 16 Sep 2011 00:51:21 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: x330-a1.tempe.blueboxinc.net comp.compilers:266 /* Test1 grammar (like the one above). */ G -> A A -> a B c -> a B d B -> b Test1 grammar is LR(0) and requires NO lookahead symbols. By default, b always reduces to B. /* Test2 grammar. */ G -> A A -> a B1 c -> a B2 d B1 -> b B2 -> b Test2 grammar is LALR(1) and probably SLR(1) also. The parser requires one symbol of lookahead to know which reduction to make (B1 or B2). /* Test3 grammar. */ G -> A A -> a B1 c -> a B2 d -> a B1 d -> a B2 c B1 -> b B2 -> b Test3 grammar is NOT LALR(1), but it is LR(1). The parser requires one symbol of lookahead to know which reduction to make (B1 or B2). Test3 is a highly contrived situation, not usually seen in real-world computer languages (as far as I know). Test3 could be written as Test2. It seems that LALR(1) parsers are more powerful than some people realize. LR(1) parser tables can be huge. A COBOL-85 grammar created an LR(1) parser with over 2,000,000 states, whereas a IELR(1) parser for that grammar created about 1668 states. Bison can create IELR(1) parsers. HYACC can also create IELR(1) parsers. I don't understand why you want to have a different scanner for each state. The parser can easily make the decision, whether a token is valid. In fact, an LALR parser already has this information in the parser tables, so why make a simple situation complicated? Also in case of a syntax error, the parser can tell the user what tokens were expected at the error point. So there is no need for different lexers, unless the syntax of the tokens changes from one state to the next and this could be confusing to the the user of your language. Paul B Mann