Path: csiph.com!1.us.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah4 <gah4@u.washington.edu>
Newsgroups: comp.compilers
Subject: Re: How can the speed of a scanner be independent of the number of rules?
Date: Thu, 24 Mar 2022 13:08:20 -0700 (PDT)
Organization: Compilers Central
Lines: 39
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-062@comp.compilers>
References: <22-03-047@comp.compilers> <22-03-048@comp.compilers> <22-03-056@comp.compilers> <22-03-059@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="47455"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, performance
Posted-Date: 24 Mar 2022 19:18:16 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-03-059@comp.compilers>
Xref: csiph.com comp.compilers:2957

On Thursday, March 24, 2022 at 12:19:52 PM UTC-7, Jan Ziak wrote:

(snip)

> That is, in my opinion, an invalid way of looking at scanning, because it
> isn't computer science. A better way (I am not saying "the only right way") is
> for example to compare the size of integers, the size of a memory address,
> that a typical year-2022 CPU can process/access/read/write/lookup using in a
> single CPU instruction.

This distinction comes up more often for sorting algorithms,
but I suppose it applies here, too.

If one wants to sort a gigabyte of bytes, that is elements of length
one (8 bit) byte, it is faster to count them, and then generate an output
file with the appropriate number of each byte.

Depending on how you do the measurement, that violates some rule
on sorting algorithms.

> The fact is that the size of those integers and
> addresses (such as: 32 bits, 64 bits) is extremely large compared to the
> number of rules and complexity of any grammar that a human is able to directly
> (by hand, by means of keyboard/mouse/touchscreen) enter into a computer.

Some years ago, I was working on using DFA for DNA sequence comparison.

I did, at least, do some tests with millions of states, maybe tens of
millions.  (The alphabet size is four, so you can fit a lot of states in memory.)

But yes, I didn't try to enter the states using a keyboard, mouse,
or other human interface device.

Reminds me, though, of some years ago, close to the time I was working
on that one, it was said that much of the DNA sequences in GenBank were
obtained by OCR scanning the pages of journal articles.  The fraction that
were obtained that way kept decreasing, but the number was increasing.
(Like Moore's law, the size of the GenBank increases exponentially,
last I knew at 1%/week.)