


Re: Integer sizes and DFAs

Path csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From gah4 <gah4@u.washington.edu>
Newsgroups comp.compilers
Subject Re: Integer sizes and DFAs
Date Sat, 26 Mar 2022 19:32:17 -0700 (PDT)
Organization Compilers Central
Lines 25
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <22-03-074@comp.compilers>
References <22-03-073@comp.compilers>
Mime-Version 1.0
Content-Type text/plain; charset="UTF-8"
Injection-Info gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="97762"; mail-complaints-to="abuse@iecc.com"
Keywords lex, performance
Posted-Date 26 Mar 2022 22:39:33 EDT
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
In-Reply-To <22-03-073@comp.compilers>
Xref csiph.com comp.compilers:2968



On Saturday, March 26, 2022 at 4:42:55 PM UTC-7, Christopher F Clark wrote:

(snip)

> And, my point was 2**32 is large enough to be considered arbitrarily large with
> respect to most DFAs. Not quite the human genome, see extended analysis
> below. Here was my first analysis.

About 24 years ago I was working with a DNA sequencing group, and was
interested in speeding up this problem. The approach I was most interested in
was special-purpose hardware with as many of the largest DRAMs as I could find,
arranged just to do this operation.

(Note that you need one more bit to indicate when a match is found.)
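As a rough check on how large such an array gets, assume (none of this is stated in the post) a 4-symbol alphabet (A, C, G, T), 2**32 states as in the quoted analysis, and tightly packed table entries of 32 state bits plus the one match bit:

```latex
\begin{aligned}
\text{entries} &= 2^{32}\ \text{states} \times 4\ \text{symbols} = 2^{34} \\
\text{bits per entry} &= 32 + 1 = 33 \\
\text{table size} &= 2^{34} \times 33\ \text{bits} = 2^{31} \times 33\ \text{bytes} = 66\ \text{GiB}
\end{aligned}
```

Padding each entry to a 64-bit word instead would give 2^34 x 8 bytes = 128 GiB.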

There would be logic to read data off disk and pass it directly to the DFA
array. There would also be logic to store the offset into the disk file, and the
state at which the hit occurred, to be read out later.
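The data path described above (one table lookup per input symbol, with a hit recorded as an offset and a state) can be sketched in software. This is a minimal, hypothetical sketch, not the design from the post: it assumes an A/C/G/T alphabet, a single literal pattern, and a KMP-style construction to fill the table, and a Python list stands in for the DRAM array.

```python
# Hypothetical sketch: a flat DFA transition table, as it might sit in DRAM,
# where each entry packs the next state plus "one more bit" flagging a match.
# Assumed, not from the post: the A/C/G/T alphabet, a single literal pattern,
# and a KMP-style construction to fill the table.

STATE_BITS = 32                   # state number width discussed in the thread
MATCH_FLAG = 1 << STATE_BITS      # the extra bit: set when the next state accepts
STATE_MASK = MATCH_FLAG - 1
ALPHABET = {"A": 0, "C": 1, "G": 2, "T": 3}
K = len(ALPHABET)


def build_table(pattern):
    """Return a flat table indexed by state * K + symbol for finding `pattern`."""
    m = len(pattern)
    sym = [ALPHABET[ch] for ch in pattern]
    nxt = [[0] * K for _ in range(m + 1)]
    nxt[0][sym[0]] = 1
    restart = 0  # state reached on pattern[1:j], used for mismatch transitions
    for j in range(1, m + 1):
        nxt[j] = list(nxt[restart])      # mismatches behave like the restart state
        if j < m:
            nxt[j][sym[j]] = j + 1       # match: advance one state
            restart = nxt[restart][sym[j]]
    # Flatten, setting the match bit on every entry that lands in the final state.
    table = []
    for row in nxt:
        for target in row:
            table.append(target | (MATCH_FLAG if target == m else 0))
    return table


def scan(table, data):
    """Stream `data` through the table; record (offset, state) at each hit."""
    hits = []
    state = 0
    for offset, base in enumerate(data):
        entry = table[state * K + ALPHABET[base]]
        state = entry & STATE_MASK
        if entry & MATCH_FLAG:
            hits.append((offset, state))
    return hits


if __name__ == "__main__":
    print(scan(build_table("ACA"), "ACACA"))  # prints [(2, 3), (4, 3)]
```

Because the match bit is stored in the entry itself, the hit check needs no second memory access, which is the point of spending the extra bit. A real table with 2**32 states would of course not be held in a Python list.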

But we went on to other projects, and I never got to build one.

Since then, DRAM has gotten much larger, but so has the DNA database.

Yes, the human genome is 3 gigabases, but the whole of GenBank is
now about 16 terabases, including WGS (Whole Genome Shotgun) sequences.
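The offset counter mentioned above has an integer-size question of its own. Assuming one offset per base (an assumption; the post does not say how offsets are counted), a 32-bit offset covers the human genome but not all of GenBank:

```latex
\begin{aligned}
3 \times 10^{9} &< 2^{32} \approx 4.29 \times 10^{9} \\
2^{43} \approx 8.80 \times 10^{12} &< 16 \times 10^{12} < 2^{44} \approx 1.76 \times 10^{13}
\end{aligned}
```

So scanning the full database in one pass would need an offset counter of at least 44 bits.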



Thread

RE: Integer sizes and DFAs Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-27 00:54 +0200
  Re: Integer sizes and DFAs gah4 <gah4@u.washington.edu> - 2022-03-26 19:32 -0700
    RE: Integer sizes and DFAs Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-03-27 15:02 +0300
  Re: Integer sizes and DFAs gah4 <gah4@u.washington.edu> - 2022-03-26 19:45 -0700
