Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2960

Re: Parallel lexers by chunking the input

From gah4 <gah4@u.washington.edu>
Newsgroups comp.compilers
Subject Re: Parallel lexers by chunking the input
Date 2022-03-25 07:58 -0700
Organization Compilers Central
Message-ID <22-03-066@comp.compilers> (permalink)
References <22-03-058@comp.compilers> <22-03-064@comp.compilers> <22-03-065@comp.compilers>

Show all headers | View raw


On Friday, March 25, 2022 at 4:45:41 AM UTC-7, Anton Ertl wrote:

(snip)

> Yes. But you can represent this uncertainty in the state of the DFA.
> Essentially you can take the original DFA, and make all states of that
> (nondeterministic) start states of our continuation DFA. Process the
> pieces of the input in parallel with the continuation automaton.

At some point, it is Aho-Corasick algorithm.

That works well if you are looking for a large number of things,
and you don't know where they start.  The DFA needs one bit,
in addition to the next state indicator, to indicate that you
found something.  A favorite implementation is the low bit
of a pointer on a byte addressed system, which will normally
be zero.

I suspect this might be used for malware search programs,
which look for any of a large list of matching strings though a
large file.  Not quite as useful for programming language
scanners, though.

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-23 18:49 +0000
  Re: How can the speed of a scanner be independent of the number of rules? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-03-23 21:27 +0000
    Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-23 19:57 -0700
    Re: How can the speed of a scanner be independent of the number of rules? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 13:12 +0200
    Re: How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-24 11:53 +0000
      Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:56 -0700
        Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-24 13:08 -0700
  Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 02:01 -0700
    Re: Parallel scanning, was How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:18 -0700
      Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 22:32 +0200
        Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 08:04 +0000
          Re: Parallel lexers by chunking the input gah4 <gah4@u.washington.edu> - 2022-03-25 07:58 -0700
          Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 17:47 +0000
            Parallel lexers by chunking the input. Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 00:36 +0200
            Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 01:00 +0200
            Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-26 09:27 +0000

csiph-web