Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: John R Levine Newsgroups: comp.compilers Subject: Interesting paper on regex NFA matching Date: Thu, 25 Jan 2024 22:10:56 -0500 Organization: Compilers Central Sender: johnl%iecc.com Approved: comp.compilers@iecc.com Message-ID: <24-01-003@comp.compilers> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="55512"; mail-complaints-to="abuse@iecc.com" Keywords: lex, NFA, performance Posted-Date: 25 Jan 2024 22:12:43 EST X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:3545 The easiest way to implement regular expressions is to turn them into NFAs, but that has well known problems that if you feed hostile input to the NFAs, they can take exponential time and it's a way to DDoS the server. If you translate the NFA to a DFA it runs in linear time, but if the regex is ambiguous, as many are, the DFA may match something different from the NFA. This paper on the Arxiv preprint server proposes some rather complex tweaks to NFA regex matching to fix many performance issues while giving the same answers as before. https://arxiv.org/abs/2401.12639 Regards, John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY Please consider the environment before reading this e-mail. https://jl.ly