Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: John R Levine Newsgroups: comp.compilers Subject: TDFA regular expression matching Date: Thu, 22 Feb 2024 14:28:40 -0500 Organization: Compilers Central Sender: johnl%iecc.com Approved: comp.compilers@iecc.com Message-ID: <24-02-006@comp.compilers> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="93830"; mail-complaints-to="abuse@iecc.com" Keywords: lex, performance Posted-Date: 22 Feb 2024 14:29:18 EST X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:3556 This 2022 paper describes tagged DFA matching of regular expressions that have tags to track which input matched where in the regex. Tagging is easy when matching with an NFA but harder with a DFA since the tag locations in the NFA may end up combined or split in the DFA. It seems pretty clever. And it does work, implemented in re2c. https://arxiv.org/abs/2206.01398 Regards, John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY Please consider the environment before reading this e-mail. https://jl.ly