Path: csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder3.hal-mli.net!news.panservice.it!feeder.news-service.com!border3.nntp.ams.giganews.com!Xl.tags.giganews.com!border1.nntp.ams.giganews.com!nntp.giganews.com!local2.nntp.ams.giganews.com!nntp.bt.com!news.bt.com.POSTED!not-for-mail NNTP-Posting-Date: Sun, 02 Oct 2011 06:21:39 -0500 From: "Chris Uppal" Newsgroups: comp.programming References: <9I4hq.4$yO1.1@newsfe21.iad> Subject: Re: Deterministic Finite State Machine for regexs. Date: Sun, 2 Oct 2011 12:13:19 +0100 X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 6.00.2900.5512 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5512 X-RFC2646: Format=Flowed; Response Message-ID: Lines: 34 X-Usenet-Provider: http://www.giganews.com X-AuthenticatedUsername: NoAuthUser X-Trace: sv3-xpO17Jn4hTENTxWJxSAHS87F/A+6EEW6llIj6FWmhTB+c2hEU/1f1L/txY+IGHcTHFm+WcduY/iKDoc!Wpn8c/XKcuTVb8xelBh0rcE6qNrmoUcTdOcT/VoLtt36Xx23qo7S++fbn+bhu7Fi8pOGIUgikhg= X-Complaints-To: abuse@btinternet.com X-DMCA-Complaints-To: abuse@btinternet.com X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly X-Postfilter: 1.3.40 X-Original-Bytes: 2834 Xref: x330-a1.tempe.blueboxinc.net comp.programming:897 Daniel Pitts wrote: [on the subject of irregular extensions to regular language parsing] > I was wondering if anyone has any thoughts on how to implement that kind > of thing? I'd really like to the internal representation of a word > boundary to be equivalent to (last != \w && current == \w || last == \w > && current != \w) > > On an related note, I'd like to know if anyone has thoughts about > supporting back-references and/or group-capturing with such a system. > Its not as important for my current use-cases, but if I can build such a > generic tool, it will almost certainly be useful in the future. If you haven't already found Ross Cox's pages on regular expressions, then you may find it helpful. A fair bit of it is going over the NFA/DFA stuff which you already know, but there are historical notes, and thoughts on implementations of non-regular features, plus descriptions of how his own library (RE2) implements such things. Start with: http://swtch.com/~rsc/regexp/regexp3.html BTW, if have my own little library too, and have faced (without actually bothering to solve -- yet) similar problems. The approach I want to take, but haven't yet planned in enough detail to see if it'll even work let alone be acceptable fast, is to allow nodes in the NFA to be annotated with more-or-less arbitrary code (to be executed by the automaton) and which would (in some magical way) be preserved during translation to the DFA form (presumably it would normally result in larger DFA, since reachable state-sets couldn't be joined unless the code segments were equivalent along all paths leading to those states). -- chris