Re: How make multifinished DFA for merged regexps?

Path csiph.com!3.us.feeder.erje.net!feeder.erje.net!news.snarked.org!news.linkpendium.com!news.linkpendium.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups comp.compilers
Subject Re: How make multifinished DFA for merged regexps?
Date Tue, 24 Dec 2019 04:42:58 -0500
Organization Compilers Central
Lines 30
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <19-12-029@comp.compilers>
Mime-Version 1.0
Content-Type text/plain; charset="UTF-8"
Injection-Info gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="26936"; mail-complaints-to="abuse@iecc.com"
Keywords lex, design
Posted-Date 25 Dec 2019 21:23:09 EST
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
Xref csiph.com comp.compilers:2410

Hans-Peter Diettrich <DrDiettrich1@netscape.net> wrote:

> Why should "123." not form a valid float number? In fact it's the C way
> to force a possibly int number into a float.

This is actually a good point.  If you are defining a language (rather
than simply implementing a standard language that is already well
specified), your tokens should not generally have prefixes that are
errors.  Thus, if 123.0 is a float, then 1, 12, 123, and 123. should all
be legal tokens, or, if they are errors, they should be specific "error"
tokens, as in
"123." -> errorMissingDigitsAfterDot

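A minimal sketch of that idea in Python, assuming a hypothetical token set of just integers and decimal floats (the function scan_number and the kinds INT and FLOAT are illustrative names, not from any particular lexer generator). Every prefix of "123.0" ends either in a legal token or in the named error token, so the scanner never has to give back characters it has already read:

```python
def scan_number(text: str, pos: int = 0) -> tuple[str, str, int]:
    """Scan one numeric token starting at text[pos].

    Returns (kind, lexeme, next_pos).  Hypothetical token kinds:
    INT, FLOAT, and the error token errorMissingDigitsAfterDot.
    """
    start = pos
    if pos >= len(text) or not text[pos].isdigit():
        raise ValueError(f"no number at position {pos}")
    # "1", "12", "123": every run of digits is a legal INT token.
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    if pos >= len(text) or text[pos] != ".":
        return ("INT", text[start:pos], pos)
    pos += 1  # consume the dot
    if pos >= len(text) or not text[pos].isdigit():
        # "123." is not silently rejected; it becomes a specific error token.
        return ("errorMissingDigitsAfterDot", text[start:pos], pos)
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    return ("FLOAT", text[start:pos], pos)
```

If you prefer the C convention Hans-Peter describes, return "FLOAT" instead of the error kind for "123."; the shape of the scanner does not change.
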
If you do that, you need only one character of lookahead and no
complicated backtracking.
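
To make the backtracking point concrete, here is a sketch of a table-driven longest-match scanner in Python, assuming a hypothetical four-state DFA for integers and floats (the state numbering, character classes, and the function longest_match are illustrative). It reports how many characters it read past the token it finally returns, i.e. how far it had to back up. With only INT and FLOAT accepting, "123.x" forces a backup; when the "digits then dot" state accepts an error token, the backup disappears:

```python
# Hypothetical DFA: 0 = start, 1 = digits, 2 = digits then ".", 3 = fraction.
TRANSITIONS = {
    (0, "digit"): 1,
    (1, "digit"): 1,
    (1, "dot"): 2,
    (2, "digit"): 3,
    (3, "digit"): 3,
}

def char_class(c: str) -> str:
    """Map a character to one of the DFA's input classes."""
    if c.isdigit():
        return "digit"
    if c == ".":
        return "dot"
    return "other"

def longest_match(text: str, pos: int, accepting: dict) -> tuple:
    """Run the DFA from text[pos] as far as it goes, then fall back to the
    last accepting state seen.  Returns (kind, end, backed_up), where
    backed_up counts characters consumed past the returned token."""
    state, i = 0, pos
    last_kind, last_end = None, pos
    while i < len(text):
        nxt = TRANSITIONS.get((state, char_class(text[i])))
        if nxt is None:  # the one lookahead character says "stop here"
            break
        state, i = nxt, i + 1
        if state in accepting:
            last_kind, last_end = accepting[state], i
    return last_kind, last_end, i - last_end

WITHOUT_ERROR_TOKEN = {1: "INT", 3: "FLOAT"}
WITH_ERROR_TOKEN = {1: "INT", 2: "errorMissingDigitsAfterDot", 3: "FLOAT"}

print(longest_match("123.x", 0, WITHOUT_ERROR_TOKEN))  # ('INT', 3, 1)
print(longest_match("123.x", 0, WITH_ERROR_TOKEN))     # ('errorMissingDigitsAfterDot', 4, 0)
```

Here the backup is a single character, but with longer token shapes (for example a float with an exponent, where "1.5e+" is not itself a token) a lexer without such error tokens can have to back up several characters.
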

So, after you generate your lexer, you should look at all error states
(or error transitions, depending on how your FSA implements errors) and
determine whether you should change your token definitions to cover
those cases: either by generalizing some token definition so that the
erroneous input becomes legal, by defining an error token to cover that
case, or, as I suggested previously, by writing a rule that matches that
case as two (or more) tokens.  There may be cases where you do nothing
and leave the FSA as is, but you should do so consciously, as a
deliberate choice.

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
