Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3548
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Re: Interesting paper on regex NFA matching |
| Date | 2024-01-27 12:47 +0200 |
| Organization | Compilers Central |
| Message-ID | <24-01-006@comp.compilers> (permalink) |
Kaz Kylheku <433-929-6894@kylheku.com> wrote: > (I personaly am not so interested in backtracking regexes; and they are > not directly applicable to compilers. You'd never want to design a > lexical analyzer that requires "lookaround" or whatnot in order to > recognize tokens. It's nice they are generating Arxiv papers for > someone; no genuine intellectual inquiry is invalid; good luck! It's > not applicable to the "Programming Languages" category it has been filed > under. I will skim through it, though.) You are mistaken that they are not applicable to programming languages. Even Pascal [originally] had cases where lookahead and backtracking are required to properly tokenize the input. The well-known case is "1." if followed by another "." it is two tokens, and if followed by a digit it is a floating point number. I don't recall what it should be if followed by any other character. In any case, parser generators like ANTLR use backtracking to great effect (and yes, we "borrowed" that idea in Yacc++) and so did the entire PEG (parsing expression grammar) community, using memoizartion to give linear time performance in many cases. Thus, to me, it is no surprise to see this reintegrated back into the PCRE world. Moreover, even if it is of little interest [mistakenly in my opinion] to the "programming language" community, it is of definite interest in the "real world:" PCRE regular expressions are both used (and abused) in many applications, including SNORT where linear performance and preventing DDOS attacks would be of prime importance. (While at Intel, I had the honor of working with the Hyperscan (Sensory Networks) team and improving PCRE performance for SNORT was a major "selling point" And "greedy" (longest match), "non-greedy" (shortest match) and other variations like "lexically first in the grammar" matching all have their applications. Saying that only one of them is correct is ignoring the needs of actual users. And, yes, we had discussions about "automata theory correct" interpretations of what certain PCRE expressions meant and how even "canonical" implementations had various "bugs" that made their meaning ambiguous. By the way, to give credit where due, we "borrowed" the idea of "tunnel automatons" from Josef Grosch to implement controlled backtracking in Yacc++. It lets one do subset construction (i.e. the LR(0) algorithm) for items, breaking out for "syntactic predicates" exactly as needed. -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
Back to comp.compilers | Previous | Next — Next in thread | Find similar
Re: Interesting paper on regex NFA matching Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2024-01-27 12:47 +0200 Re: Interesting paper on regex NFA matching Kaz Kylheku <433-929-6894@kylheku.com> - 2024-01-27 20:20 +0000
csiph-web