Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2952

Re: How can the speed of a scanner be independent of the number of rules?

From Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Newsgroups comp.compilers
Subject Re: How can the speed of a scanner be independent of the number of rules?
Date 2022-03-24 02:01 -0700
Organization Compilers Central
Message-ID <22-03-052@comp.compilers> (permalink)
References <22-03-047@comp.compilers>

Show all headers | View raw


On Wednesday, March 23, 2022 at 8:24:44 PM UTC+1, Roger L Costello wrote:
> Hi Folks,
>
> On page 48 of the Flex manual [1] it says this amazing thing:
>
> Note that adding rules does not slow down the scanner! The speed of the
> scanner is independent of the number of rules or (modulo the considerations
> given at the beginning of this section) how complicated the rules are with
> regard to operators such as '*' and '|'.
>
> That is amazing! And counterintuitive. How can it possibly be that a scanner
> containing 1000 rules can operate as fast as a scanner containing 10 rules?
> Would you give some intuition to help me understand this, please?
>
> /Roger
>
> [1] https://epaperpress.com/lexandyacc/download/flex.pdf
>
> [Flex compiles the rules into a finite state machine. When the scanner
> runs, it just looks up each character it reads in the table for the current
> state to decide what to do. Creating the state tables for 1000 rules takes
> a lot longer than creating the tables for 10 rules, but that just happens
> once when you build the scanner, not when it's running.
> For more details on regular expressions and state machines, see any compiler
> textbook. It's one of the standard topics. -John]

I am not sure what answer you are expecting in respect to the complexity of
the answer, or with which viewpoint you are the most comfortable with. There
are many (more than just the 2 answers mentioned below) answers to the posed
question.

(1) Flex is performing an optimization that is similar to common subexpression
elimination (CSE). For example, conceptually, two scanner rules written using
BASIC-like notation:

10: IF in[p]='a' AND in[p+1]='b' THEN { p+=2; GOTO 20 }
10: IF in[p]='a' AND in[p+1]='c' THEN { p+=2; GOTO 30 }
....

are rewritten/converted by Flex into:

10: IF in[p]='a' THEN { p+=1; GOTO 11 }
11: IF in[p]='b' THEN { p+=1; GOTO 20 }
11: IF in[p]='c' THEN { p+=1; GOTO 30 }
....

And, as can be easily seen from the above example, the originally
non-deterministic line "10" has been converted into a single (deterministic)
line of code, which is a basic principle of how Flex is operating when
generating a scanner.

(2) Flex is old and was designed for single-threaded CPUs. It is unable to
generate a multi-threaded scanner. It is for example obvious that the language
(ab)* benefits from multi-core optimizations in case of sufficiently long
inputs. Note that this viewpoint is in direct contradiction to the
"old-school" viewpoint presented by John (in the square brackets in the text
cited above).

-atom
[Is there anything published about parallel scanning?  I'd think it'd be
inherently sequential since you don't know the state for a character
until you've processed all the previous characters. -John]

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-23 18:49 +0000
  Re: How can the speed of a scanner be independent of the number of rules? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-03-23 21:27 +0000
    Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-23 19:57 -0700
    Re: How can the speed of a scanner be independent of the number of rules? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 13:12 +0200
    Re: How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-24 11:53 +0000
      Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:56 -0700
        Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-24 13:08 -0700
  Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 02:01 -0700
    Re: Parallel scanning, was How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:18 -0700
      Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 22:32 +0200
        Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 08:04 +0000
          Re: Parallel lexers by chunking the input gah4 <gah4@u.washington.edu> - 2022-03-25 07:58 -0700
          Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 17:47 +0000
            Parallel lexers by chunking the input. Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 00:36 +0200
            Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 01:00 +0200
            Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-26 09:27 +0000

csiph-web