Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2417

Algebraic RegEx toolbase with a grep superset clone.

From rockbrentwood@gmail.com
Newsgroups comp.compilers
Subject Algebraic RegEx toolbase with a grep superset clone.
Date 2019-12-30 16:34 -0800
Organization Compilers Central
Message-ID <19-12-036@comp.compilers> (permalink)

Show all headers | View raw


Regular expressions and finite state automata using Kleene algebraic
techniques.

DFA: converts extended regular expressions to minimal deterministic finite
state automata

NFA: converts regular expressions to small non-deterministic finite state
automata

FSC: a "finite state classifier" infers a classification delimited by mutually
disjoint regular expressions from a set of exemplars

REX: a GREP-like utility that works with extended regular expressions.

The "automata" are listed in algebraic form as a system of equations -
equivalent to their formulation as a regular grammar.

In addition to the standard Kleene algebraic operations ("|" for union, "*"
for 0,1,2+ iterations, "+" for 1,2+ iterations, "?" for 0,1 iterations),
extended regular expressions include "&" for intersection, "-" for relative
compliment and "^" for interleave. Formerly archived in the comp.compilers
archive

Formerly in the comp.compilers archive in the 1990's, it's on GitHub here
now:
https://github.com/RockBrentwood/RegEx

The programs were written in ANSI-C (and have been updated to C99). The
programs are small: at present, 154 lines (fsc.c), 365 lines (nfa.c), 483
lines (dfa.c), 615 lines (rex.c). The build is set up for POSIX-based systems.
There are legacy hooks in rex.c to handle wildcard expansions for DOS/Windows,
but they've not been recently tested.

The underlying algebraic framework for regular expressions has been extended
up the Chomsky hierarchy to type 2 - context-free expressions. Two
mathematical formulations

* one, based on the fixed-point operator, descending from the earlier 1970's
Grusky/McWhirter/Yntema, expanded by Leiss/Esik/Kozen in the 2000's and
2010's;

* the other described here and elsewhere in the 1990's, whose underlying
theory - an Eilenberg-Moore category - was published in 2008

were proven equivalent in 2018.

In the future, the RegEx library will be expanded to embody context-free
expressions.

Back to comp.compilers | Previous | Next | Find similar


Thread

Algebraic RegEx toolbase with a grep superset clone. rockbrentwood@gmail.com - 2019-12-30 16:34 -0800

csiph-web