Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah4 <gah4@u.washington.edu>
Newsgroups: comp.compilers
Subject: Re: State-of-the-art algorithms for lexical analysis?
Date: Sun, 5 Jun 2022 16:05:38 -0700 (PDT)
Organization: Compilers Central
Lines: 35
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-06-007@comp.compilers>
References: <22-06-006@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="46525"; mail-complaints-to="abuse@iecc.com"
Keywords: lex
Posted-Date: 05 Jun 2022 21:11:12 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-06-006@comp.compilers>
Xref: csiph.com comp.compilers:3045

On Sunday, June 5, 2022 at 2:08:12 PM UTC-7, Roger L Costello wrote:

(snip)

> Are regular expressions still the best way to specify tokens?

Some years ago, I worked for a company that sold hardware search
processors to a certain three-letter agency that we are not supposed
to mention, but everyone knows.

Its processors used a completely different language, PSL (Pattern
Specification Language), much more powerful than the usual regular
expressions.

Both standard (basic) and extended regular expressions are nice, in
that we get used to them, especially with grep, and use them without
thinking too much about them.

I suspect, though, that if they hadn't previously been defined, we
might come up with something different today.

Among other things, PSL can define approximate matches, such as a
word with one or more misspellings, that is, insertions, deletions,
or substitutions. The usual REs can't express that.
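To make the idea concrete, here is a minimal Python sketch of
approximate matching. It is not PSL, whose syntax and implementation
aren't described here; it simulates the standard Levenshtein NFA, in
which a state (i, e) means "i characters of the word matched, using e
edits", and accepts a whole string within k edits. The name
approx_match and the parameter k are my own, for illustration only.

```python
def approx_match(word: str, text: str, k: int) -> bool:
    """True if text is within k insertions/deletions/substitutions of word.

    Illustrative sketch: simulates a Levenshtein NFA over the whole string
    (not a substring search). Not PSL's actual mechanism.
    """
    n = len(word)

    def closure(states: set) -> set:
        # Deleting a word character consumes no input, so it is an epsilon move.
        result, stack = set(states), list(states)
        while stack:
            i, e = stack.pop()
            if i < n and e < k and (i + 1, e + 1) not in result:
                result.add((i + 1, e + 1))
                stack.append((i + 1, e + 1))
        return result

    states = closure({(0, 0)})
    for ch in text:
        nxt = set()
        for i, e in states:
            if i < n and word[i] == ch:
                nxt.add((i + 1, e))          # exact match, no edit
            if e < k:
                nxt.add((i, e + 1))          # insertion: extra char in text
                if i < n:
                    nxt.add((i + 1, e + 1))  # substitution
        states = closure(nxt)
        if not states:
            return False                     # every path used too many edits
    return any(i == n for i, _ in states)
```

For example, approx_match("lexical", "lexicl", 1) accepts a single
deletion, while "lezicl" needs k = 2. A hardware matcher can update
every state in parallel on each input character, which is part of why
this style of matching suits special-purpose search processors.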

PSL also has expressions for ranges of numbers. With an ordinary RE
you can often do the same thing, but only by writing a complicated
pattern that spells out all of the possibilities. PSL generates those
possibilities automatically. (Some ranges can expand into complicated
code.)
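A rough idea of what such an expansion involves, as a Python sketch.
range_regex is a hypothetical helper, not a PSL feature or any
library's API. It splits [lo, hi] by digit count, then splits each
same-length range by leading digit until every piece is a fixed prefix,
one digit class, and some free \d digits.

```python
def _digit_class(x: int, y: int) -> str:
    return str(x) if x == y else f"[{x}-{y}]"

def _same_length(a: str, b: str) -> list[str]:
    """Patterns for every number from a to b; a and b have the same length."""
    n = len(a) - 1  # digits after the leading one
    if a[1:] == "0" * n and b[1:] == "9" * n:
        # Full block: a range of leading digits followed by n free digits.
        return [_digit_class(int(a[0]), int(b[0])) + r"\d" * n]
    if a[0] == b[0]:
        return [a[0] + p for p in _same_length(a[1:], b[1:])]
    pats = [a[0] + p for p in _same_length(a[1:], "9" * n)]   # low edge
    if int(a[0]) + 1 <= int(b[0]) - 1:                        # full middle
        pats.append(_digit_class(int(a[0]) + 1, int(b[0]) - 1) + r"\d" * n)
    pats += [b[0] + p for p in _same_length("0" * n, b[1:])]  # high edge
    return pats

def range_regex(lo: int, hi: int) -> str:
    """Regex for the decimal integers lo..hi, without leading zeros."""
    assert 0 <= lo <= hi
    pats = []
    for d in range(len(str(lo)), len(str(hi)) + 1):
        start = max(lo, 0 if d == 1 else 10 ** (d - 1))
        end = min(hi, 10 ** d - 1)
        pats += _same_length(str(start), str(end))
    return "(?:" + "|".join(pats) + ")"
```

For example, range_regex(0, 255) produces six alternatives, ending in
25[0-5], for what a person would simply call "a number from 0 to 255".
The result is correct but not minimized, which hints at why such
expansions can get complicated.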

I suspect that in many cases the usual RE is not the best notation
for lexical analysis; its main advantage is that it is well known.

But, as noted, DFAs are likely the best way to implement them.
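For reference, a minimal table-driven DFA lexer in Python, using
longest match (maximal munch). The token set (identifiers and
integers) and the names tokenize and TRANSITIONS are assumptions for
illustration; a generator such as lex builds a much larger table from
the regular expressions rather than by hand.

```python
import string

# Hypothetical DFA states and accepting states for a tiny token set.
START, IDENT, INT = 0, 1, 2
ACCEPTING = {IDENT: "IDENT", INT: "INT"}

def char_class(c: str) -> str:
    if c in string.ascii_letters or c == "_":
        return "letter"
    if c in string.digits:
        return "digit"
    return "other"

# Transition table: (state, character class) -> next state.
TRANSITIONS = {
    (START, "letter"): IDENT,
    (START, "digit"): INT,
    (IDENT, "letter"): IDENT,
    (IDENT, "digit"): IDENT,
    (INT, "digit"): INT,
}

def tokenize(src: str) -> list[tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        state, i, last_accept = START, pos, None
        # Run the DFA as far as it goes, remembering the last accepting point.
        while i < len(src):
            state = TRANSITIONS.get((state, char_class(src[i])))
            if state is None:
                break
            i += 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])
        if last_accept is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        end, kind = last_accept
        tokens.append((kind, src[pos:end]))
        pos = end
    return tokens
```

Each input character costs one table lookup, regardless of how many
token patterns the table encodes, which is the usual argument for DFAs.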

That could change, though, as computer hardware changes.