Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2951

Re: How can the speed of a scanner be independent of the number of rules?

From gah4 <gah4@u.washington.edu>
Newsgroups comp.compilers
Subject Re: How can the speed of a scanner be independent of the number of rules?
Date 2022-03-23 19:57 -0700
Organization Compilers Central
Message-ID <22-03-050@comp.compilers> (permalink)
References <22-03-047@comp.compilers> <22-03-048@comp.compilers>

Show all headers | View raw


On Wednesday, March 23, 2022 at 2:49:30 PM UTC-7, Kaz Kylheku wrote:
> On 2022-03-23, Roger L Costello <cost...@mitre.org> wrote:

(snip)
> > Note that adding rules does not slow down the scanner! The speed of the
> > scanner is independent of the number of rules or (modulo the considerations
> > given at the beginning of this section) how complicated the rules are with
> > regard to operators such as '*' and '|'.

(snip)

> In terms of theoretical computer science, it cannot be true that there
> is no slowdown regardless of the number of rules added. This is because
> the rules are compiled into tables, and tables are indexed by integers.
> Integers have to get wider (more bits) with increasing table size.

Yes, but 32 bit integers will get a huge number of states.

(snip)

> If the lexer is in a test case that does nothing but discard tokens,
> it may be that even though the 1000 rule lexer has a bigger cache
> footprint, it doesn't matter.

Yes, cache is the complication of just about any speed comparison.
And in this case, it depends not only on the scanner, but could be
sensitive to the actual input data.

It would seem that you could do the comparison based on the same
scanner, but using either UTF-8 or UTF-16 coded characters.

That would be closer to an apple vs. apple comparison.

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-23 18:49 +0000
  Re: How can the speed of a scanner be independent of the number of rules? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-03-23 21:27 +0000
    Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-23 19:57 -0700
    Re: How can the speed of a scanner be independent of the number of rules? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 13:12 +0200
    Re: How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-24 11:53 +0000
      Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:56 -0700
        Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-24 13:08 -0700
  Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 02:01 -0700
    Re: Parallel scanning, was How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:18 -0700
      Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 22:32 +0200
        Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 08:04 +0000
          Re: Parallel lexers by chunking the input gah4 <gah4@u.washington.edu> - 2022-03-25 07:58 -0700
          Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 17:47 +0000
            Parallel lexers by chunking the input. Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 00:36 +0200
            Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 01:00 +0200
            Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-26 09:27 +0000

csiph-web