Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #2967
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | RE: Integer sizes and DFAs |
| Date | 2022-03-27 00:54 +0200 |
| Organization | Compilers Central |
| Message-ID | <22-03-073@comp.compilers> (permalink) |
Kaz, I don't know whether you simply didn't read what I wrote or chose to ignore it. > In certain common situations we model elementary operations as being > O(1), with certain justifications. That simplifies the analysis, while > leaving its results still relevant and true for practical applications > running on present hardware. And, my point was 2**32 is large enough to be considered arbitrarily large with respect to most DFAs. Not quite the human genome, see extended analysis below. Here was my first analysis. me > Note how that even works for the DNA DFAs, assuming 32 bit integers me > and a byte addressed machine. With that model, you can fit 10 million me > (1E7) states in the 2 billion byte (2GB) address space (each state has me > 4, 4 byte entries or 16 bytes). > Each 32 bit state has 4 byte entries only if it doesn't express that > it branches to other states for certain input symbols. A state isn't 32 bits, a transition is. A transition (in a DFA) is just the address of another state. A state in a DFA for DNA has 4 such transitions (one for A, T, C, and G--the 4 DNA bases). Those are the only legal symbols in DNA. So a state is 4 * 4 bytes, each transition is 4 bytes and there are 4 of them per state. Thus, I explained what the justification is and how that allows more than 100 million states in the combined DFA. That's without any fancy tricks. Now, I just looked up the size of the human genome. it is 3 billion, so that's a little more than another order or magnitude bigger, so you definitely need slightly bigger integers (or to do some compression, although 64 bit integers are overkill, but convenient to work with and with those you could easily fit the genome in an 128 GB SSD, which would be addressible by 64 bit integer/pointers). And, for most other applications even 1 million states are more than sufficient. Thus, saying a transition can happen in O(1) time is a perfectly reasonable assumption. You can layout nearly any DFA we expect to see in memory and directly address it. Thus, the RAM model holds. By the way with 40 bit integers, you could fit it in 60GB (20 bytes per state and 3G states), We don't quite have laptops with that much DRAM memory yet. -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
Back to comp.compilers | Previous | Next — Next in thread | Find similar
RE: Integer sizes and DFAs Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-27 00:54 +0200
Re: Integer sizes and DFAs gah4 <gah4@u.washington.edu> - 2022-03-26 19:32 -0700
RE: Integer sizes and DFAs Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-27 15:02 +0300
Re: Integer sizes and DFAs gah4 <gah4@u.washington.edu> - 2022-03-26 19:45 -0700
csiph-web