Path: csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!news.linkpendium.com!news.linkpendium.com!news.iecc.com!nerds-end
From: "Karsten Nyblad"
Newsgroups: comp.compilers
Subject: Re: coupling LALR with a scanner?
Date: Fri, 08 Jul 2011 14:39:02 +0200
Organization: Compilers Central
Lines: 27
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <11-07-018@comp.compilers>
References: <11-07-013@comp.compilers> <11-07-015@comp.compilers>
NNTP-Posting-Host: news.iecc.com
X-Trace: gal.iecc.com 1310316983 41493 64.57.183.58 (10 Jul 2011 16:56:23 GMT)
X-Complaints-To: abuse@iecc.com
NNTP-Posting-Date: Sun, 10 Jul 2011 16:56:23 +0000 (UTC)
Keywords: parse, lex
Posted-Date: 10 Jul 2011 12:56:22 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: x330-a1.tempe.blueboxinc.net comp.compilers:194

>the LALR generator is one of mine and the idea here would be to select
>automatically the right lexer from the currently accepted tokens. I
>have the feeling that it must be doable. The target is to be able to
>write grammars with such dependencies naturally without any (user
>level) grammar actions and very minimal lexer meta-information (such as
>which lexer produces which tokens).

You have not said whether your generator supports LALR(k) or just LALR(1). In the latter case, my suggestion is that you change your generator into an LR(1) generator. Not only will you get an easy implementation of what you want; you will also catch syntax errors right after the offending token is put in the window, so that you can get better error reporting and recovery. Do not be afraid that the parsers will be too big. That would have been the case 25 years ago, but it has not been for the last 10 years.

If you don't want to go that way, then do as I suggested in my previous posting: Say you have a parse stack W and a terminal has been pushed on the stack.
Now you want to know whether a terminal t is acceptable. Copy W onto a new stack W' and use W' to parse with t in the window. If t is stacked, then t is acceptable. My guess is that you will find that this approach makes the parsers a bit slow.

The latter approach is so simple to implement that I think somebody would have put it into some parser generator if it were a facility that would make scanning and parsing significantly easier. To me it looks like you have an idea for a solution with no problem.
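
For illustration only, here is a minimal sketch in Python of the stack-copy test described above. Everything in it is an assumption made for the example and not taken from any real generator: the toy grammar S -> ( S ) | x, the hand-written LALR(1) tables ACTION and GOTO, and the function names is_acceptable and acceptable_terminals. The parse stack W is represented as a list of state numbers.

```python
# Hypothetical LALR(1) tables for the toy grammar  S -> ( S ) | x
# (hand-built for this sketch; "$" stands for end of input).
PRODUCTIONS = {1: ("S", 3),   # S -> ( S )
               2: ("S", 1)}   # S -> x

ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    # State 3 holds S -> x. with the merged LALR lookahead set { ")", "$" }.
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}


def is_acceptable(stack, t):
    """Return True if terminal t would eventually be shifted (or accepted)."""
    trial = list(stack)                 # W': a copy, so W itself is untouched
    while True:
        action = ACTION.get((trial[-1], t))
        if action is None:              # error entry: t is not acceptable
            return False
        kind, arg = action
        if kind in ("shift", "accept"):
            return True                 # t is stacked, so t is acceptable
        lhs, rhs_len = PRODUCTIONS[arg] # reduce: pop the handle, then goto
        del trial[len(trial) - rhs_len:]
        trial.append(GOTO[(trial[-1], lhs)])


def acceptable_terminals(stack, terminals=("(", ")", "x", "$")):
    """All terminals that pass the trial parse from the given stack."""
    return {t for t in terminals if is_acceptable(stack, t)}
```

With these tables, the stack [0, 3] (just after shifting a top-level x) has a reduce entry for ")" in state 3, yet the trial parse rejects ")" because state 1 has no action for it after the reduction. That is the case where reading the action row of the top state alone would give the wrong answer for an LALR(1) table. A scanner selector could call acceptable_terminals before each token is read, at the cost of one trial parse per candidate terminal.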