Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2957

Re: How can the speed of a scanner be independent of the number of rules?

Path csiph.com!1.us.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From gah4 <gah4@u.washington.edu>
Newsgroups comp.compilers
Subject Re: How can the speed of a scanner be independent of the number of rules?
Date Thu, 24 Mar 2022 13:08:20 -0700 (PDT)
Organization Compilers Central
Lines 39
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <22-03-062@comp.compilers> (permalink)
References <22-03-047@comp.compilers> <22-03-048@comp.compilers> <22-03-056@comp.compilers> <22-03-059@comp.compilers>
Mime-Version 1.0
Content-Type text/plain; charset="UTF-8"
Injection-Info gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="47455"; mail-complaints-to="abuse@iecc.com"
Keywords lex, performance
Posted-Date 24 Mar 2022 19:18:16 EDT
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
In-Reply-To <22-03-059@comp.compilers>
Xref csiph.com comp.compilers:2957

Show key headers only | View raw


On Thursday, March 24, 2022 at 12:19:52 PM UTC-7, Jan Ziak wrote:

(snip)

> That is, in my opinion, an invalid way of looking at scanning, because it
> isn't computer science. A better way (I am not saying "the only right way") is
> for example to compare the size of integers, the size of a memory address,
> that a typical year-2022 CPU can process/access/read/write/lookup using in a
> single CPU instruction.

This distinction comes up more often for sorting algorithms,
but I suppose it applies here, too.

If one wants to sort a gigabyte of bytes, that is elements of length
one (8 bit) byte, it is faster to count them, and then generate an output
file with the appropriate number of each byte.

Depending on how you do the measurement, that violates some rule
on sorting algorithms.

> The fact is that the size of those integers and
> addresses (such as: 32 bits, 64 bits) is extremely large compared to the
> number of rules and complexity of any grammar that a human is able to directly
> (by hand, by means of keyboard/mouse/touchscreen) enter into a computer.

Some years ago, I was working on using DFA for DNA sequence comparison.

I did, at least, do some tests with millions of states, maybe tens of
millions.  (The alphabet size is four, so you can fit a lot of states in memory.)

But yes, I didn't try to enter the states using a keyboard, mouse,
or other human interface device.

Reminds me, though, of some years ago, close to the time I was working
on that one, it was said that much of the DNA sequences in GenBank were
obtained by OCR scanning the pages of journal articles.  The fraction that
were obtained that way kept decreasing, but the number was increasing.
(Like Moore's law, the size of the GenBank increases exponentially,
last I knew at 1%/week.)

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-23 18:49 +0000
  Re: How can the speed of a scanner be independent of the number of rules? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-03-23 21:27 +0000
    Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-23 19:57 -0700
    Re: How can the speed of a scanner be independent of the number of rules? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 13:12 +0200
    Re: How can the speed of a scanner be independent of the number of rules? Roger L Costello <costello@mitre.org> - 2022-03-24 11:53 +0000
      Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:56 -0700
        Re: How can the speed of a scanner be independent of the number of rules? gah4 <gah4@u.washington.edu> - 2022-03-24 13:08 -0700
  Re: How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 02:01 -0700
    Re: Parallel scanning, was How can the speed of a scanner be independent of the number of rules? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-03-24 11:18 -0700
      Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-24 22:32 +0200
        Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 08:04 +0000
          Re: Parallel lexers by chunking the input gah4 <gah4@u.washington.edu> - 2022-03-25 07:58 -0700
          Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-25 17:47 +0000
            Parallel lexers by chunking the input. Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 00:36 +0200
            Parallel lexers by chunking the input Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-26 01:00 +0200
            Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-03-26 09:27 +0000

csiph-web