Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: gah4 Newsgroups: comp.compilers Subject: Re: Parallel lexers by chunking the input Date: Fri, 25 Mar 2022 07:58:04 -0700 (PDT) Organization: Compilers Central Lines: 22 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-03-066@comp.compilers> References: <22-03-058@comp.compilers> <22-03-064@comp.compilers> <22-03-065@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="48623"; mail-complaints-to="abuse@iecc.com" Keywords: lex, parallel Posted-Date: 25 Mar 2022 10:59:33 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com In-Reply-To: <22-03-065@comp.compilers> Xref: csiph.com comp.compilers:2960 On Friday, March 25, 2022 at 4:45:41 AM UTC-7, Anton Ertl wrote: (snip) > Yes. But you can represent this uncertainty in the state of the DFA. > Essentially you can take the original DFA, and make all states of that > (nondeterministic) start states of our continuation DFA. Process the > pieces of the input in parallel with the continuation automaton. At some point, it is Aho-Corasick algorithm. That works well if you are looking for a large number of things, and you don't know where they start. The DFA needs one bit, in addition to the next state indicator, to indicate that you found something. A favorite implementation is the low bit of a pointer on a byte addressed system, which will normally be zero. I suspect this might be used for malware search programs, which look for any of a large list of matching strings though a large file. Not quite as useful for programming language scanners, though.