From: gah4
Newsgroups: comp.compilers
Subject: Re: State-of-the-art algorithms for lexical analysis?
Date: Sun, 5 Jun 2022 16:05:38 -0700 (PDT)
Organization: Compilers Central
Message-ID: <22-06-007@comp.compilers>
References: <22-06-006@comp.compilers>
Keywords: lex

On Sunday, June 5, 2022 at 2:08:12 PM UTC-7, Roger L Costello wrote:

(snip)

> Are regular expressions still the best way to specify tokens?

Some years ago, I used to work with a company that sold hardware search
processors to a certain three letter agency that we are not supposed to
mention, but everyone knows.

Those processors have a completely different PSL, Pattern Specification
Language, much more powerful than the usual regular expression.

Both standard and extended regular expressions are nice, in that we get
used to using them, especially with grep, without thinking too much
about them. I suspect, though, that if they hadn't previously been
defined, we might come up with something different today.

Among other things, PSL has the ability to define approximate matches,
such as a word with one or more misspellings, that is, insertions,
deletions, or substitutions. The usual REs don't have that ability.

There are also PSL expressions for ranges of numbers. You can often do
that with a very complicated RE that spells out all of the
possibilities. PSL automatically processes those possibilities.
(Some can expand to complicated code.)

I suspect that in many cases the usual RE is not optimal for lexical
analysis, other than being well known. But as noted, DFAs are likely
the best way to implement them, though that could change with changes
in computer hardware.
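
To make the approximate-match idea above concrete, here is a minimal
sketch in Python of accepting a word that is within a given number of
insertions, deletions, or substitutions of a pattern. This is not PSL
and says nothing about how the search hardware worked; it is the
ordinary edit-distance dynamic program, and the names within_edits and
approx_find are made up for the illustration.

```python
def within_edits(word: str, pattern: str, max_edits: int) -> bool:
    """True if word is within max_edits insertions, deletions, or
    substitutions of pattern (plain Levenshtein distance)."""
    # prev[j] = edit distance between the word prefix seen so far
    # and the first j characters of pattern.
    prev = list(range(len(pattern) + 1))
    for i, wc in enumerate(word, start=1):
        cur = [i]  # turning i word characters into an empty pattern
        for j, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[j] + 1,               # drop wc from the word
                cur[j - 1] + 1,            # insert pc into the word
                prev[j - 1] + (wc != pc),  # match or substitute
            ))
        prev = cur
    return prev[-1] <= max_edits


def approx_find(text: str, pattern: str, max_edits: int) -> list[str]:
    """Hypothetical helper: whitespace-separated words in text that
    approximately match pattern."""
    return [w for w in text.split() if within_edits(w, pattern, max_edits)]


if __name__ == "__main__":
    print(approx_find("the lexer and the lexxer and a laxer parser", "lexer", 1))
```

With one allowed edit, "lexxer" (an insertion) and "laxer" (a
substitution) are accepted along with the exact "lexer". Saying the
same thing with an ordinary RE means writing out every variant by hand.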