From: Paul B Mann
Newsgroups: comp.compilers
Subject: Re: coupling LALR with a scanner?
Date: Mon, 19 Sep 2011 12:12:40 -0700 (PDT)
Organization: Compilers Central
Message-ID: <11-09-022@comp.compilers>
References: <11-07-013@comp.compilers> <11-07-015@comp.compilers> <11-07-018@comp.compilers> <11-08-004@comp.compilers> <11-09-016@comp.compilers> <11-09-017@comp.compilers>
Keywords: parse

> IELR was exactly made for that reason, as a first step to PSLR: some
> grammars have no 'tokens' and 'grammar rules', they just have a
> 'grammar' where mutually exclusive tokens are present, e.g. you
> cannot make a Javascript single lexer as there are state where /
> (slash) means 'start of regular expression' (of course the content
> of the regular expression follows totally different lexing rules
> than the rest of the text) whereas in other states it means
> 'division' operator. If your parser cannot tell which of the two
> lexers to use, you are off.

IELR(1) does not solve this problem. It solves the problem of
reduce-reduce conflicts in an LR(0) state machine where a state has
multiple "lookback nonterminal transitions" which cause conflicts.
IELR(1) does state splitting (or duplicating) in order to remove the
conflicts (if possible). It has nothing to do with how many scanners
you may need.
It only pertains to the language defined by the grammar. It looks
like you have two languages or two lexical languages (and need two
scanners).

To show that IELR(1) solves this problem, you would have to have an
LALR(1) grammar that has reduce-reduce conflicts caused by the need
for two different lexers (scanners), and show that IELR(1) removes
the conflicts.

What you described with Javascript is a little language inside of
another language. Why not switch parsers when a slash ('/') is
encountered, if necessary? Parsing expression grammars are more
appropriate for this Javascript problem, if you don't want to
hand-code a "fix" within an LALR parser.

A similar situation occurs when parsing C. You encounter a "#if ..."
while parsing C. At the '#' you must leave the C parser and enter the
C-preprocessor parser. After the conditional statement has been
parsed, you must switch back to the C parser.

/ Paul Mann
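
A minimal sketch of the slash problem discussed above, assuming a tiny Javascript-like expression subset. It hand-codes the "switch scanners at '/'" idea: one flag, `expect_operand`, stands in for whatever the parser state would say about whether an operand may start at this point. The token patterns, the flag, and the function name `tokenize` are all hypothetical and are not taken from any LALR or IELR tool.

```python
import re

# Hypothetical "second scanner": a regex literal such as /ab+c/g.
REGEX_LITERAL = re.compile(r"/(?:\\.|[^/\\\n])+/[a-z]*")

# Hypothetical "main scanner": numbers, identifiers, single-char operators.
MAIN_TOKEN = re.compile(r"(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*/()=])")


def tokenize(text):
    """Return (kind, lexeme) pairs, choosing the scanner for '/' by context."""
    tokens = []
    pos = 0
    # True when an operand may start here; a real parser would derive this
    # from its current state instead of tracking it in the scanner.
    expect_operand = True
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        if expect_operand and text[pos] == "/":
            # An operand is expected, so '/' opens a regular expression.
            match = REGEX_LITERAL.match(text, pos)
            if match is None:
                raise SyntaxError(f"bad regex literal at offset {pos}")
            tokens.append(("regex", match.group()))
            expect_operand = False
        else:
            match = MAIN_TOKEN.match(text, pos)
            if match is None:
                raise SyntaxError(f"unexpected character at offset {pos}")
            kind, lexeme = match.lastgroup, match.group()
            tokens.append((kind, lexeme))
            # After a value or ')' an operator is expected, so a following
            # '/' is division; after any other operator an operand is expected.
            expect_operand = kind == "op" and lexeme != ")"
        pos = match.end()
    return tokens
```

With this sketch, `tokenize("a = b / 2")` treats the slash as a division operator, while `tokenize("x = /ab+c/g")` hands the slash to the regex-literal scanner. The grammar itself is untouched in both cases; only the choice of scanner changes.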