Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Kaz Kylheku <480-992-1380@kylheku.com> Newsgroups: comp.compilers Subject: Re: Integer sizes and DFAs Date: Sat, 26 Mar 2022 18:39:53 -0000 (UTC) Organization: A noiseless patient Spider Lines: 57 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-03-072@comp.compilers> References: <22-03-071@comp.compilers> Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="19106"; mail-complaints-to="abuse@iecc.com" Keywords: performance, architecture, comment Posted-Date: 26 Mar 2022 14:49:17 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2966 On 2022-03-26, Christopher F Clark wrote: > One of the posters mentioned that ignoring the size of integers used > in the DFAs was "not computer science". Actually, it is. It's called > the RAM model of computing. And, in it you assume each "integer" > occupies one location and is also sufficient to address any memory > location. Sa, accessing any integer any where in memory is an O(1) > operation. The O notation is based on the independent variable(s) like n being able to taken on unlimited, arbitrarily large values. In certain common situations we model elementary operations as being O(1), with certain justifications. That simplifies the analysis, while leaving its results still relevant and true for practical applications running on present hardware. The justification needs to be articulated though. > Note how that even works for the DNA DFAs, assuming 32 bit integers > and a byte addressed machine. With that model, you can fit 10 million > (1E7) states in the 2 billion byte (2GB) address space (each state has > 4, 4 byte entries or 16 bytes). Each 32 bit state has 4 byte entries only if it doesn't express that it branches to other states for certain input symbols. A DFA state is a row in a table, not just the number which identifies/enumerates that row. If the input symbols are bytes, then the abstract row size capped to 256 entries. (Larger symbol sets can exist (e.g. the example of Unicode) but it's a reasonabl assumption that the symbol set is finite. Still, it can be large.) If we have a DFA built from certain lexical rules, and we extend those rules, it could be the case that existing states blow up in size, because of having to branch to many more other states on new characters that were not included in the old rules. If a machine initially recongnizes the set of strings { "foo", "bar" } then its state zero might have a condensed representation of two transitions on the sybmols 'f' and 'b', such as a tiny two-element array that is subject to a linear search. If we extend the recognized set of strings, state 0 may have to branch on more symbols than just 'f' and 'b'. Depending on how we are representing states, the size of the object may change. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal [No existing computer is actually Turing equivalent because every real computer has a finite memory, but we manage to do O(whatever) calculations anyway. I'm not sure how productive this line of argument is beyond noting what sorts of tasks are or aren't likely to fit in real computers. -John]