Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2414

Re: How make multifinished DFA for merged regexps?

Path csiph.com!tncsrv06.tnetconsulting.net!news.snarked.org!news.linkpendium.com!news.linkpendium.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From rockbrentwood@gmail.com
Newsgroups comp.compilers
Subject Re: How make multifinished DFA for merged regexps?
Date Sun, 29 Dec 2019 20:56:21 -0800 (PST)
Organization Compilers Central
Lines 77
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <19-12-033@comp.compilers> (permalink)
References <19-12-005@comp.compilers>
Mime-Version 1.0
Content-Type text/plain; charset="UTF-8"
Content-Transfer-Encoding 8bit
Injection-Info gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="7825"; mail-complaints-to="abuse@iecc.com"
Keywords lex, DFA
Posted-Date 30 Dec 2019 11:30:03 EST
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
In-Reply-To <19-12-005@comp.compilers>
Xref csiph.com comp.compilers:2414

Show key headers only | View raw


On Thursday, December 19, 2019 at 10:01:24 PM UTC-6, Andy wrote:
> I can create DFA direct from regexp.
> But for language lexer I must have DFA for couple regexp.
> One solution is crating DFA with multi finished states.
> For example
> r0 = ab
> r1 = ac
>
>   | 0 | 1
> a | 1 |
> b |   | 2(F)
> c |   | 3(F)
>
> How to check if r0 and r1 are disjoint?

In a day or so, I'll put back up on GitHub our regex utilities that had once
been up on the comp.compilers archive. The "DFA" program can process boolean
operations - including intersection and relative compliment. (The "NFA"
program produces efficient linear-sized near-deterministic FA, "REX" is like
GREP except it can process boolean combinations too, and "FSC" is a "finite
state classifier"). I'll post a link when it's up on GitHub.

Denote intersection by &. The regular expressions ab, ac are the least
fixed-point solutions to the following systems
[0] = ab >= a[1]
[1] = b >= b[2]
[2] = 1

[3] = ac >= a[4]
[4] = c >= c[5]
[5] = 1

The system for the intersection is obtained by distributivity:
[0]&[3] >= a([1]&[4])
[1]&[4] >= (b[2] | c0) & (b0 | c[2]) = b([2]&0) | c(0&[2]) = b0 | c0 = 0
The least fixed-point solution is [1]&[4] = 0, [0]&[3] = a0 = 0.
So, there's 0 intersection.

A more interesting example with relative compliment "-":
(a|b)* - (a|b)*(aa|bb)(a|b)*

The first term (a|b)* has the following right-linear system
[0] = (a|b)* = 1 | (a|b)(a|b)* = 1 | a(a|b)* | b(a|b)* >= 1 | a[0] | b[0]

The second term (a|b)*(aa|bb)(a|b)* has the following:
[1] = (a|b)*(aa|bb)(a|b)* = (aa|bb)(a|b)* | a[1] | b[1]
   = a(a(a|b)* | [1]) | b(b(a|b)* | [1])
   >= a[2] | b[3]
[2] = a(a|b)* | [1] = a(a|b)* | 1 | a[1] | b[1] = 1 | a((a|b)* | [1]) | b[1]
   >= 1 | a[4] | b[1]
[3] = b(a|b)* | [1] = b(a|b)* | 1 | a[1] | b[1] = 1 | a[1] | b((a|b)* | [1])
   >= 1 | a[1] | b[4]
[4] = (a|b)* | [1] = 1 | (a|b)(a|b)* | 1 | a[1] | b[1]
   = 1 | 1 | a((a|b)* | [1]) | b((a|b)* | [1])
   >= 1 | a[4] | b[4]

The system for the relative compliment can be found by distributivity:
[0]-[1] >= (1|a[0]|b[0]) - (0|a[2]|b[3])
   = 1-0 | a([0]-[2]) | b([0]-[3])
   = 1 | a([0]-[2]) | b([0]-[3])
[0]-[2] >= (1-1) | a([0]-[4]) | b([0]-[1]) = a([0]-[4]) | b([0]-[1])
[0]-[3] >= (1-1) | a([0]-[1]) | b([0]-[4]) = a([0]-[1]) | b([0]-[4])
[0]-[4] >= (1-1) | a([0]-[4]) | b([0]-[4]) = a([0]-[4]) | b([0]-[4])

The least fixed-point solution x >= ax | bx is x = 0, so for the last item, it
is [0]-[4] = 0. For the remaining items, the system therefore reduces to
[0]-[1] >= 1 | a([0]-[2]) | b([0]-[3])
[0]-[2] >= a0 | b([0]-[1]) = b([0]-[1])
[0]-[3] >= a([0]-[1]) | b0 = a([0]-[1])
which upon substitution reduces to a single inequality
[0]-[1] >= 1 | ab([0]-[1]) | ba([0]-[1]) = 1 | (ab|ba)([0]-[1]).

The least fixed point solution to x >= 1 | cx is x = c*. Thus, for [0]-[1], it
is
[0]-[1] = (ab|ba)*
Therefore, the relative compliment evaluates to:
(a|b)* - (a|b)*(aa|bb)(a|b)* = (ab|ba)*.

Back to comp.compilers | Previous | NextPrevious in thread | Find similar


Thread

How make multifinished DFA for merged regexps? Andy <borucki.andrzej@gmail.com> - 2019-12-19 18:19 -0800
  Re: How make multifinished DFA for merged regexps? Kaz Kylheku <493-878-3164@kylheku.com> - 2019-12-20 05:54 +0000
  Re: How make multifinished DFA for merged regexps? Andy <borucki.andrzej@gmail.com> - 2019-12-20 16:29 -0800
    Re: How make multifinished DFA for merged regexps? Kaz Kylheku <493-878-3164@kylheku.com> - 2019-12-21 04:04 +0000
    Re: How make multifinished DFA for merged regexps? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2019-12-24 02:15 +0100
  Re: How make multifinished DFA for merged regexps? Matt Timmermans <matt.timmermans@gmail.com> - 2019-12-23 22:29 -0800
  Re: How make multifinished DFA for merged regexps? rockbrentwood@gmail.com - 2019-12-29 20:56 -0800

csiph-web