Path: csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Ben Hanson Newsgroups: comp.compilers Subject: Re: Regular expression string searching & matching Date: Tue, 13 Mar 2018 11:30:30 -0700 (PDT) Organization: Compilers Central Lines: 28 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <18-03-054@comp.compilers> References: <18-03-016@comp.compilers> <18-03-032@comp.compilers> <18-03-034@comp.compilers> <18-03-035@comp.compilers> <18-03-041@comp.compilers> <18-03-045@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="60596"; mail-complaints-to="abuse@iecc.com" Keywords: lex Posted-Date: 13 Mar 2018 15:36:30 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2000 > It also looks like you are running a DFA minimizer (like Hopcroft) on your > result since I am not producing a minimal DFA. That also may help me figure > out if I'm producing the right automaton because they'd match... Actually, although I have a minimising routine I'm not applying it here. I intersect every character class in the regex to produce non-overlapping sets and then these are grouped in equivalence sets for each state in the DFA. This is the part that I didn't find in the Dragon Book (or any other compiler book I looked in) and it was a message on this group from Vern Paxon that tipped me off: https://compilers.iecc.com/comparch/article/94-10-091 (I only paid attention to the equivalence classes bit by the way!) I intersect the equivalence sets as I build the DFA and I have found with this technique you kind of have to go out of your way to get a non minimal DFA by default. I've imagined an even more rigorous approach where character classes are intersected as they are added to the syntax tree up front, but I've not got around to trying to implement it. I'd be interested to see your DFA output once you've cleaned it up a bit. Using derivatives is interesting too - I've seen it discussed but never tried to get into that. Regards, Ben