
Re: State-of-the-art algorithms for lexical analysis?

Path csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Newsgroups comp.compilers
Subject Re: State-of-the-art algorithms for lexical analysis?
Date Tue, 7 Jun 2022 06:52:45 +0200
Organization Compilers Central
Lines 27
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <22-06-015@comp.compilers> (permalink)
References <22-06-006@comp.compilers> <22-06-007@comp.compilers> <22-06-008@comp.compilers> <22-06-013@comp.compilers>
Mime-Version 1.0
Content-Type text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding 8bit
Injection-Info gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="28212"; mail-complaints-to="abuse@iecc.com"
Keywords lex, comment
Posted-Date 07 Jun 2022 10:51:05 EDT
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
Xref csiph.com comp.compilers:3053



On 6/6/22 8:16 PM, Christopher F Clark wrote:

> In fact, there is only one thing that I have not seen a DFA lexer do that I think is
> worth doing at the lexical level (and not via a screener).  That is recognizing
> tokens that start with a length prefix, e.g. 10Habcdefghij.  Such tokens are
> common in things like network protocols and they would be relatively easy
> to implement, but I've not seen it done.

I'm not sure what you mean. The nnH syntax has to be included in the
general number syntax (like 0x... or nnE...).

Or do you mean a token built from the next nn input characters? In that
case both a lower and an upper bound would be interesting, e.g. for the
(recognized) identifier length or for distinguishing Unicode codepoint
formats.

> Beyond that it is my relatively firm belief that one should almost always
> have only simple regular expressions, e.g. that the one for floating point
> numbers should be one of the most complex ones.  Otherwise you are trying
> to do too much in the scanner.  And you are asking for trouble when you do.

ACK

DoDi
[I believe he means Fortran style Hollerith strings, where the number says
how many characters are in the following string.  The number is just a count,
not semantically a number in the language.  DFAs can't do that other than by
enumerating every possible length. -John]
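
For illustration, here is a minimal sketch of how a hand-written scanner
could handle such a counted token. It is not taken from any lexer
generator; the function name scan_hollerith and its return convention are
made up for this example. The point is that the digits are read as a count,
and the count then drives how many characters are consumed, which is the
step a pure DFA cannot take without one path per length.

```python
def scan_hollerith(text, pos=0):
    """Scan a Hollerith-style token such as 10Habcdefghij starting at pos.

    Hypothetical helper: returns (payload, next_pos) on success, or None
    if no well-formed counted token starts at pos.
    """
    i = pos
    # Read the decimal count (ASCII digits only).
    while i < len(text) and text[i] in "0123456789":
        i += 1
    # Need at least one digit followed by the H marker.
    if i == pos or i >= len(text) or text[i] not in "Hh":
        return None
    count = int(text[pos:i])
    start = i + 1
    end = start + count
    # Assumption: a zero count or a truncated payload is an error.
    if count == 0 or end > len(text):
        return None
    # The payload is taken verbatim; blanks and commas inside it are data.
    return text[start:end], end
```

Under these assumptions, scan_hollerith("10Habcdefghij") yields the
ten-character payload and the position just past it, while "5Hab" is
rejected because fewer than five characters follow the H.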



Thread

State-of-the-art algorithms for lexical analysis? Roger L Costello <costello@mitre.org> - 2022-06-05 20:53 +0000
  Re: State-of-the-art algorithms for lexical analysis? gah4 <gah4@u.washington.edu> - 2022-06-05 16:05 -0700
    Re: State-of-the-art algorithms for lexical analysis? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-06 08:59 +0200
      State-of-the-art algorithms for lexical analysis? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-06 21:16 +0300
        Re: State-of-the-art algorithms for lexical analysis? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-07 06:52 +0200
          Re: State-of-the-art algorithms for lexical analysis? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-07 19:40 +0300
            Re: State-of-the-art algorithms for lexical analysis? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-08 05:32 +0200
              Re: counted strings, was State-of-the-art algorithms for lexical analysis? gah4 <gah4@u.washington.edu> - 2022-06-09 11:54 -0700
                Re: counted characters in strings "Robin Vowels" <robin51@dodo.com.au> - 2022-06-10 12:21 +1000
                Re: counted characters in strings Martin Ward <martin@gkc.org.uk> - 2022-06-11 10:52 +0100
                Re: counted characters in strings drb@msu.edu (Dennis Boone) - 2022-06-11 11:09 -0500
    Re: State-of-the-art algorithms for lexical analysis? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-06-06 16:00 +0000
    References for PSL ? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-06 20:11 +0300
