Groups | Search | Server Info | Login | Register


Groups > comp.compilers > #162

Re: Matching very large patterns, was Re: Inverse grep

From glen herrmannsfeldt <gah@ugcs.caltech.edu>
Newsgroups comp.compilers
Subject Re: Matching very large patterns, was Re: Inverse grep
Date 2011-06-20 04:52 +0000
Organization A noiseless patient Spider
Message-ID <11-06-036@comp.compilers> (permalink)
References <11-06-015@comp.compilers> <11-06-022@comp.compilers> <11-06-033@comp.compilers>

Show all headers | View raw


Chris F Clark <cfc@shell01.theworld.com> wrote:
> Our moderator asked:
>> How serious is the explosion problem in practice, now that compilers
>> no longer have to fit into 64k? We use lex scanners with a thousand
>> tokens, and spamassassin feeds about 2000 patterns to re2c if you use
>> the compiled patterns feature. None of them seem to strain our storage
>> systems. -John

> For compilers on mainline chips this is generally not a problem.
> However, this is such a useful tool, it gets used in areas where the
> problems are much-much larger (or the device is smaller).  The
> Aho-Corasick algorithm is one of the key algorithms in matching a set
> of strings to a large set of incoming data. The two places where I've
> seen it used and where size is a problem is networking and
> computational biology.

The favorite for computational biology for many years was dynamic
programming, though often it is too slow.  The dynamic programming
algorithm used for diff originated in the one Needleman-Wunsch used
for protein sequences.  (I believe that it is referenced in the diff
documentation.)

More recently, the Burrows-Wheeler transform, commonly used in
compression, has become popular for computational biology.

BLAST is based on a DFA similar to the first part of Aho-Corasick, but
dynamic programming is used instead of the usual second part.

-- glen

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

Inverse grep glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-06-08 23:01 +0000
  Re: Inverse grep Chris F Clark <cfc@shell01.TheWorld.com> - 2011-06-12 14:16 -0400
    Re: Inverse grep Tony Finch <dot@dotat.at> - 2011-06-13 17:48 +0100
  Re: Inverse grep torbenm@diku.dk (Torben Ægidius Mogensen) - 2011-06-14 11:16 +0200
    Re: Inverse grep anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2011-06-15 12:28 +0000
    Matching very large patterns, was Re: Inverse grep Chris F Clark <cfc@shell01.TheWorld.com> - 2011-06-19 21:45 -0400
      Re: Matching very large patterns, was Re: Inverse grep glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-06-20 04:52 +0000
  Re: Inverse grep Paul Wankadia <junyer@gmail.com> - 2012-05-28 07:50 -0700

csiph-web