Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: anton@mips.complang.tuwien.ac.at (Anton Ertl) Newsgroups: comp.compilers Subject: Re: Parallel lexers by chunking the input Date: Fri, 25 Mar 2022 08:04:59 GMT Organization: Institut fuer Computersprachen, Technische Universitaet Wien Lines: 34 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-03-065@comp.compilers> References: <22-03-058@comp.compilers> <22-03-064@comp.compilers> Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="97958"; mail-complaints-to="abuse@iecc.com" Keywords: lex, parallel, comment Posted-Date: 25 Mar 2022 07:45:38 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2959 Christopher F Clark writes: >But, the real issue is strings, comments, and markdown like features. >They all suffer from the same issue, which is that they are generally >unbounded strings that start and stop with very short sequences. And, >by picking some arbitrary point in the future to start scanning, you >cannot be sure whether you are inside or outside one of those >features. Yes. But you can represent this uncertainty in the state of the DFA. Essentially you can take the original DFA, and make all states of that (nondeterministic) start states of our continuation DFA. Process the pieces of the input in parallel with the continuation automaton. Now you know the end state of the original automaton the first piece, and the end state of the continuation automaton of the second piece; you can look up the end state of the original automaton for the second piece from these informations. And so on for the further pieces. This is a sequential component, but it is fast. Now you can reprocess the second-to-last pieces in parallel using the original automaton and performing the actions (e.g., converting numbers into binary representation). Not sure how to interface that to the parser, but if we have a similar idea for parallelizing the parser, even the lex/flex interface might be ok. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/ [Experiments would be useful here. Yes, you can do this, but do you end up throwing away so much work that it's not worth it? -John]