Path: csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: rockbrentwood@gmail.com Newsgroups: comp.compilers Subject: Algebraic RegEx toolbase with a grep superset clone. Date: Mon, 30 Dec 2019 16:34:14 -0800 (PST) Organization: Compilers Central Lines: 48 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <19-12-036@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="53472"; mail-complaints-to="abuse@iecc.com" Keywords: tools, lex Posted-Date: 30 Dec 2019 20:12:12 EST X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2417 Regular expressions and finite state automata using Kleene algebraic techniques. DFA: converts extended regular expressions to minimal deterministic finite state automata NFA: converts regular expressions to small non-deterministic finite state automata FSC: a "finite state classifier" infers a classification delimited by mutually disjoint regular expressions from a set of exemplars REX: a GREP-like utility that works with extended regular expressions. The "automata" are listed in algebraic form as a system of equations - equivalent to their formulation as a regular grammar. In addition to the standard Kleene algebraic operations ("|" for union, "*" for 0,1,2+ iterations, "+" for 1,2+ iterations, "?" for 0,1 iterations), extended regular expressions include "&" for intersection, "-" for relative compliment and "^" for interleave. Formerly archived in the comp.compilers archive Formerly in the comp.compilers archive in the 1990's, it's on GitHub here now: https://github.com/RockBrentwood/RegEx The programs were written in ANSI-C (and have been updated to C99). The programs are small: at present, 154 lines (fsc.c), 365 lines (nfa.c), 483 lines (dfa.c), 615 lines (rex.c). The build is set up for POSIX-based systems. There are legacy hooks in rex.c to handle wildcard expansions for DOS/Windows, but they've not been recently tested. The underlying algebraic framework for regular expressions has been extended up the Chomsky hierarchy to type 2 - context-free expressions. Two mathematical formulations * one, based on the fixed-point operator, descending from the earlier 1970's Grusky/McWhirter/Yntema, expanded by Leiss/Esik/Kozen in the 2000's and 2010's; * the other described here and elsewhere in the 1990's, whose underlying theory - an Eilenberg-Moore category - was published in 2008 were proven equivalent in 2018. In the future, the RegEx library will be expanded to embody context-free expressions.