Path: csiph.com!1.us.feeder.erje.net!3.us.feeder.erje.net!feeder.erje.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Jan Ziak <0xe2.0x9a.0x9b@gmail.com> Newsgroups: comp.compilers Subject: Re: How can the speed of a scanner be independent of the number of rules? Date: Thu, 24 Mar 2022 11:56:45 -0700 (PDT) Organization: Compilers Central Lines: 29 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-03-059@comp.compilers> References: <22-03-047@comp.compilers> <22-03-048@comp.compilers> <22-03-056@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="92259"; mail-complaints-to="abuse@iecc.com" Keywords: lex, performance Posted-Date: 24 Mar 2022 15:19:49 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com In-Reply-To: <22-03-056@comp.compilers> Xref: csiph.com comp.compilers:2956 On Thursday, March 24, 2022 at 6:39:31 PM UTC+1, Roger L Costello wrote: > Kaz Kylheku wrote: > > > suppose we have some existing scanner with 10 rules, > > which correctly tokenizes an input. Then suppose we > > add 990 rules to it. None of these rules take precedence > > over the 10 rules, and so the the input is handled by the > > same rules. > > Ouch!!! > > Such a letdown. So the statement "adding rules does not slow down the scanner" > really isn't remarkable or awesome. Add 990 more irrelevant rules, and the > scanner operates just as fast. Big deal. That is, in my opinion, an invalid way of looking at scanning, because it isn't computer science. A better way (I am not saying "the only right way") is for example to compare the size of integers, the size of a memory address, that a typical year-2022 CPU can process/access/read/write/lookup using in a single CPU instruction. The fact is that the size of those integers and addresses (such as: 32 bits, 64 bits) is extremely large compared to the number of rules and complexity of any grammar that a human is able to directly (by hand, by means of keyboard/mouse/touchscreen) enter into a computer. And, creating a program just to generate or download an extremely large complex grammar, containing quantities exceeding the capabilities of a single CPU instruction and exceeding the size of memory installed in a typical year-2022 computer, is pointless. -atom