Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2968

Re: Integer sizes and DFAs

From gah4 <gah4@u.washington.edu>
Newsgroups comp.compilers
Subject Re: Integer sizes and DFAs
Date 2022-03-26 19:32 -0700
Organization Compilers Central
Message-ID <22-03-074@comp.compilers> (permalink)
References <22-03-073@comp.compilers>

Show all headers | View raw


On Saturday, March 26, 2022 at 4:42:55 PM UTC-7, Christopher F Clark wrote:

(snip)

> And, my point was 2**32 is large enough to be considered arbitrarily large with
> respect to most DFAs. Not quite the human genome, see extended analysis
> below. Here was my first analysis.

About 24 years ago I was working with a DNA sequencing group, and was
interested in speeding up this problem.  The one I was most interested in
was special purpose hardware with many of the largest DRAM I could find,
arranged just to do this operation.

(Note that you need one more bit, to indicate when a match is found.)

There would be logic to read data off disk, and pass it directly to the DFA
array.  There is, then, logic to store the offset into the disk file, and the
state at which the hit occured, to be read out later.

But we went onto other projects, and I never got to build one.

Since then, DRAM has gotten much larger, but so has the DNA database.

Yes the human genome is 3 gigabase, but the whole of GenBank is
now about 16 terabase, including WGS (whole genome sequences).

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

RE: Integer sizes and DFAs Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-27 00:54 +0200
  Re: Integer sizes and DFAs gah4 <gah4@u.washington.edu> - 2022-03-26 19:32 -0700
    RE: Integer sizes and DFAs Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-27 15:02 +0300
  Re: Integer sizes and DFAs gah4 <gah4@u.washington.edu> - 2022-03-26 19:45 -0700

csiph-web