Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #3046

Re: State-of-the-art algorithms for lexical analysis?

Path csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Newsgroups comp.compilers
Subject Re: State-of-the-art algorithms for lexical analysis?
Date Mon, 6 Jun 2022 08:59:47 +0200
Organization Compilers Central
Lines 54
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <22-06-008@comp.compilers> (permalink)
References <22-06-006@comp.compilers> <22-06-007@comp.compilers>
Mime-Version 1.0
Content-Type text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding 8bit
Injection-Info gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="35996"; mail-complaints-to="abuse@iecc.com"
Keywords lex
Posted-Date 06 Jun 2022 11:04:44 EDT
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
In-Reply-To <22-06-007@comp.compilers>
Xref csiph.com comp.compilers:3046

Show key headers only | View raw


On 6/6/22 1:05 AM, gah4 wrote:

> It has a completely different PSL, Pattern Specification Language,
> much more powerful than the usual regular expression.

I wonder about the need for powerful patterns in programming languages.
Most items (operators, punctuators, keywords) are fixed literals with a
fixed ID for use by the parser and code generator. If source code is
written by humans then the remaining types (identifiers, literals,
comments) should not have a too complicated syntax. For machine
generated source code a lexer is not required, the generator can
immediately produce the tokens for the parser. And if humans should
understand the code produced by the generator then again the syntax has
to be as simple and easy understandable as possible to humans.


> Among others, PSL has the ability to define approximate matches,
> such as a word with one or more misspellings, that is insertions,
> deletions, or substitutions. Usual RE don't have that ability.

That's fine for keywords but does not help with user defined
identifiers. Still a nice to have feature :-)

> There are also PSL expressions for ranges of numbers.
> You can often do that with very complicated RE, considering
> all of the possibilities.  PSL automatically processes those
> possibilities.  (Some can expand to complicated code.)

If this feature is really helpful to the user?


> I suspect that in many cases the usual RE is not optimal for
> lexical analysis, other than being well known.
>
> But as noted, DFA are likely the best way to do them.

ACK

> Though that could change with changes in computer hardware.

Or with the style of writing. APL already tried to simplify typing, in
the near future a Chinese programming language with a glyph for each
token (except literals) would eliminate the need for a lexer. Then a
demand may arise for speech-to-text and reverse tools instead of a
lexer, for each natural language.

DoDi
[Regular expressions have the advantage that once you've paid the one-time cost
of making a DFA, the matching is extremely fast.  Since the lexer is usually
one of the slowest parts of a compiler since it is the only part that has to
look at each character of the source program, this is a place where speed
matters.  Anyone know how fast PSLs are?  I've seen fuzzy matchers but they
haven't been very fast. -John]

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

State-of-the-art algorithms for lexical analysis? Roger L Costello <costello@mitre.org> - 2022-06-05 20:53 +0000
  Re: State-of-the-art algorithms for lexical analysis? gah4 <gah4@u.washington.edu> - 2022-06-05 16:05 -0700
    Re: State-of-the-art algorithms for lexical analysis? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-06 08:59 +0200
      State-of-the-art algorithms for lexical analysis? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-06 21:16 +0300
        Re: State-of-the-art algorithms for lexical analysis? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-07 06:52 +0200
          Re: State-of-the-art algorithms for lexical analysis? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-07 19:40 +0300
            Re: State-of-the-art algorithms for lexical analysis? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-08 05:32 +0200
              Re: counted strings, was State-of-the-art algorithms for lexical analysis? gah4 <gah4@u.washington.edu> - 2022-06-09 11:54 -0700
                Re: counted characters in strings "Robin Vowels" <robin51@dodo.com.au> - 2022-06-10 12:21 +1000
                Re: counted characters in strings Martin Ward <martin@gkc.org.uk> - 2022-06-11 10:52 +0100
                Re: counted characters in strings drb@msu.edu (Dennis Boone) - 2022-06-11 11:09 -0500
    Re: State-of-the-art algorithms for lexical analysis? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-06-06 16:00 +0000
    References for PSL ? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-06 20:11 +0300

csiph-web