From: Chris F Clark
Newsgroups: comp.compilers
Subject: Re: coupling LALR with a scanner?
Date: Mon, 03 Oct 2011 11:59:42 -0400
Organization: The World Public Access UNIX, Brookline, MA
Message-ID: <11-10-006@comp.compilers>
References: <11-07-013@comp.compilers> <11-07-015@comp.compilers> <11-07-018@comp.compilers> <11-08-004@comp.compilers> <11-09-016@comp.compilers> <11-09-017@comp.compilers> <11-09-022@comp.compilers> <11-09-023@comp.compilers> <11-10-003@comp.compilers> <11-10-005@comp.compilers>
Keywords: parse

"Armel" writes:
>> there is an argument for introducing such a gap to "segment"
>> tokens into smaller chunks for both performance and expressibility
>> reasons.
>
> could you elaborate on this segmentation mechanism?

I would like to, but the idea is going through the patent process, as
it has hardware design implications. Once I know that the patent has
been filed, or get other ok from Intel legal, I will write the idea up.

> In my lexer generator, the developer can introduce start states by himself
> and 'cut' complex expressions into smaller expressions which still respect
> DFA capabilities and introduce dynamic regular expressions where absolutely
> necessary, for languages allowing dynamic string delimiters for example
> (e.g. doc-strings like << END_OF_STR_MARKER, some lines then a line with
> END_OF_STR_MARKER only on the line), languages such as Ruby are very funny
> from that point of view if I remember well.

Those sound like useful extensions. In the hands of an experienced
parser writer, they could make the process much easier. Let the system
figure out what it can, then adjust where you must.
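
To make the quoted idea concrete, here is a minimal sketch of a hand-written scanner with two start states and a run-time delimiter. It is not Armel's generator or its syntax. The token names, the `INITIAL`/`HEREDOC` state names, the `tokenize` function and the toy input language are all assumptions made for illustration. The static rules are ordinary regular expressions of the kind a generator could compile to a DFA ahead of time. Only the heredoc terminator, which is not known until the `<<` token is seen, is checked dynamically.

```python
import re

# Static rules for the (hypothetical) INITIAL start state. Everything here
# is a fixed regular expression, so a generator could build a DFA for it.
INITIAL_RULES = re.compile(r"""
    (?P<HEREDOC_START><<\s*[A-Za-z_]\w*)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<OP>[=+\-*/();])
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, value) pairs for a toy language with heredoc strings."""
    tokens = []
    state = "INITIAL"   # current start state
    marker = None       # heredoc terminator, discovered at run time
    body = []           # lines collected while in the HEREDOC state
    for line in text.splitlines():
        if state == "HEREDOC":
            # Dynamic rule: the terminator is a line equal to the marker.
            if line == marker:
                tokens.append(("HEREDOC_BODY", "\n".join(body)))
                state, marker, body = "INITIAL", None, []
            else:
                body.append(line)
            continue
        pos = 0
        while pos < len(line):
            m = INITIAL_RULES.match(line, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {line[pos]!r}")
            pos = m.end()
            kind = m.lastgroup
            if kind == "WS":
                continue
            if kind == "HEREDOC_START":
                # Drop the leading "<<" and any spaces to get the marker.
                marker = m.group()[2:].strip()
                tokens.append((kind, marker))
            else:
                tokens.append((kind, m.group()))
        # The heredoc body starts on the line after the one naming the marker.
        if marker is not None:
            state = "HEREDOC"
    if state == "HEREDOC":
        raise SyntaxError(f"unterminated heredoc, expected {marker!r}")
    return tokens


if __name__ == "__main__":
    sample = "x = << END\nhello\n  world\nEND\ny = 2"
    for token in tokenize(sample):
        print(token)
```

The sketch handles only one pending heredoc per line and compares the terminator line exactly. A real scanner for Ruby or a shell would need more cases, but the split is the same: static rules do most of the work, and a start-state switch covers the part a fixed DFA cannot express.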