From: Christopher F Clark
Newsgroups: comp.compilers
Subject: Re: How make multifinished DFA for merged regexps?
Date: Tue, 24 Dec 2019 04:42:58 -0500
Organization: Compilers Central
Message-ID: <19-12-029@comp.compilers>
Keywords: lex, design

Hans-Peter Diettrich wrote:
> Why should "123." not form a valid float number? In fact it's the C way
> to force a possibly int number into a float.

This is actually a good point. If you are defining a language (rather
than simply implementing a standard language that is already well
specified), your tokens should generally not have prefixes that are
errors. Thus, if 123.0 is a float, then 1, 12, 123, and 123. should all
be legal tokens or, if they are errors, specific "error" tokens, as in

    "123." -> errorMissingDigitsAfterDot

If you do that, the lexer needs only one character of lookahead and no
complicated backtracking.

So, after you generate your lexer, you should look at all the error
states (or error transitions, depending on how your FSA implements
errors) and determine whether you should change your token definitions
to cover those cases. You can do that by generalizing some token
definition so that the error becomes legal, by defining an error token
to cover that case, or, as I suggested previously, by making a rule
that matches that case as two (or more) tokens.
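
To make the idea concrete, here is a minimal sketch in Python of a
hand-written number lexer. It is not taken from any particular
generator. The state names, the table layout, and the extra
errorUnexpectedChar token are assumptions made for illustration; only
errorMissingDigitsAfterDot comes from the example above. Every state
reachable after consuming a character is labelled with a token kind, so
the scanner peeks at one character and never backs up.

```python
# Token kinds. ERR_DOT follows the example in the post; ERR_CHAR is a
# hypothetical catch-all added so this sketch can consume any input.
INT = "INT"
FLOAT = "FLOAT"
ERR_DOT = "errorMissingDigitsAfterDot"
ERR_CHAR = "errorUnexpectedChar"

# DFA over character classes (hypothetical state names).
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}

# Every non-start state carries a token kind, legal or error.
ACCEPTS = {"int": INT, "dot": ERR_DOT, "frac": FLOAT}


def char_class(ch):
    """Map a character to the class used as the DFA input symbol."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def states_needing_a_decision():
    """Return reachable states with no token kind attached.

    These are the states where a scanner would have to back up or
    report a generic error, i.e. the ones to review by hand.
    """
    reachable = set(TRANSITIONS.values())
    return sorted(reachable - set(ACCEPTS))


def tokenize(text):
    """Split text into (kind, lexeme) pairs using one-character lookahead."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state = "start"
        begin = pos
        # Peek at one character; consume it only if the DFA can move.
        while pos < len(text):
            nxt = TRANSITIONS.get((state, char_class(text[pos])))
            if nxt is None:
                break
            state = nxt
            pos += 1
        if state == "start":
            # No token starts here: emit a one-character error token.
            tokens.append((ERR_CHAR, text[pos]))
            pos += 1
        else:
            # Whatever state we stopped in already names a token.
            tokens.append((ACCEPTS[state], text[begin:pos]))
    return tokens
```

Calling tokenize("123.") yields a single errorMissingDigitsAfterDot
token rather than forcing the scanner to retreat to "123". If "123."
should be a legal float instead, changing the label of the "dot" state
to FLOAT is the only edit needed.
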
There may be cases where you do nothing and just leave the FSA as is,
but you should do so consciously, as a deliberate choice.

-- 
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris