Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2961

Re: Parallel lexers by chunking the input

From anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups comp.compilers
Subject Re: Parallel lexers by chunking the input
Date 2022-03-25 17:47 +0000
Organization Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID <22-03-067@comp.compilers> (permalink)
References <22-03-058@comp.compilers> <22-03-064@comp.compilers> <22-03-065@comp.compilers>

Show all headers | View raw


anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
...
>[Experiments would be useful here.  Yes, you can do this, but do you
>end up throwing away so much work that it's not worth it? -John]

There is no throwing away, but there are two parallelized passes
through the input (compared to one for the sequential version), plus
the sequential component (should be comparatively small if the chunks
are large), plus maybe some additional overhead for the interface to
the parser.  Still, if you run the scanner on an 8-core CPU, I would
expect a nice real-time speedup (but CPU-time slowdown) for scanning a single input.

However, for typical C/C++ compilation jobs, there often is enough
parallelism from "make -j", so you don't want a scanner that takes
more CPU time.

For load-and-go systems, this kind of parallelization may be more
appropriate, but often there is some interleaving with semantic stuff
that may prevent having the complete scanner run before the rest, and
thus prevent parallelizing the scanner.

@gah4: I don't think that the Aho-Corasick algorithm is particularly
relevant.  It seems to be a less general variant of what lex/flex
does, and is completely sequential in any case.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[I don't understand where you get only two passes.  If you divide the input
into arbitrary sized chunks, you could potentially start at any state in the
scanner and while most of those states would quickly fail, that's a lot of
startup overhead for each block. -John]

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-23 18:49 +0000
  Re: How can the speed of a scanner be independent of the number of rules? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-03-23 21:27 +0000
    Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-23 19:57 -0700
    Re: How can the speed of a scanner be independent of the number of rules? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 13:12 +0200
    Re: How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-24 11:53 +0000
      Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:56 -0700
        Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-24 13:08 -0700
  Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 02:01 -0700
    Re: Parallel scanning, was How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:18 -0700
      Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 22:32 +0200
        Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 08:04 +0000
          Re: Parallel lexers by chunking the input gah4 <gah4@u.washington.edu> - 2022-03-25 07:58 -0700
          Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 17:47 +0000
            Parallel lexers by chunking the input. Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 00:36 +0200
            Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 01:00 +0200
            Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-26 09:27 +0000

csiph-web