Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #10853

Re: Fast String search algorithm

Path csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail
From Travers Naran <tnaran@gmail.com>
Newsgroups comp.lang.java.programmer
Subject Re: Fast String search algorithm
Date Sun, 18 Dec 2011 09:17:47 -0800
Organization A noiseless patient Spider
Lines 21
Message-ID <jcl77s$g27$1@dont-email.me> (permalink)
References <i02ne7ptvvl8ofv9uracdu2jm3oq61v2h3@4ax.com>
Mime-Version 1.0
Content-Type text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding 7bit
Injection-Date Sun, 18 Dec 2011 17:17:49 +0000 (UTC)
Injection-Info mx04.eternal-september.org; posting-host="AaN2uAKV8098/kNQ464RLg"; logging-data="16455"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX180GAn0UFGHpWPlzZynT8g7"
User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20111105 Thunderbird/8.0
In-Reply-To <i02ne7ptvvl8ofv9uracdu2jm3oq61v2h3@4ax.com>
Cancel-Lock sha1:jXOEvC5j6MGlIsYRyXsWpNHgMAw=
Xref x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:10853

Show key headers only | View raw


On 16/12/2011 10:08 AM, Roedy Green wrote:
> Let's say I had a million records each with a text field.  I wanted to
> find all records that contained a given substring.  Are there fast
> algorithms to do that or do you have to scan the whole thing linearly?
>
> I imagine you could build an index of words.  Then maybe you could
> figure out which words contain the string, and narrow your search to
> records with those words.
>
> Similarly, I was wondering how you might build an index to all the
> files on a computer so that you could find one just by giving a wild
> card or partial filename.

In addition to the other links mentioned, here are some other links I 
found helpful for the same problem:

http://en.wikipedia.org/wiki/Bloom_filter
http://en.wikipedia.org/wiki/Bitap_algorithm
http://johannburkard.de/software/stringsearch/

The Bloom Filter can be used to help reduce the set of files to search.

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Find similar | Unroll thread


Thread

Fast String search algorithm Roedy Green <see_website@mindprod.com.invalid> - 2011-12-16 10:08 -0800
  Re: Fast String search algorithm Tom Anderson <twic@urchin.earth.li> - 2011-12-16 18:58 +0000
    Re: Fast String search algorithm markspace <-@.> - 2011-12-17 09:26 -0800
      Re: Fast String search algorithm Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2011-12-17 21:07 -0800
  Re: Fast String search algorithm Tassilo Horn <tassilo@member.fsf.org> - 2011-12-16 20:07 +0100
    Re: Fast String search algorithm Roedy Green <see_website@mindprod.com.invalid> - 2011-12-16 13:32 -0800
  Re: Fast String search algorithm Gene Wirchenko <genew@ocis.net> - 2011-12-16 12:12 -0800
    Re: Fast String search algorithm glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-12-16 21:22 +0000
  Re: Fast String search algorithm Arne Vajhøj <arne@vajhoej.dk> - 2011-12-16 22:20 -0500
    Re: Fast String search algorithm Lew <lewbloch@gmail.com> - 2011-12-16 21:39 -0800
  Re: Fast String search algorithm Wanja Gayk <brixomatic@yahoo.com> - 2011-12-17 16:15 +0100
  Re: Fast String search algorithm Travers Naran <tnaran@gmail.com> - 2011-12-18 09:17 -0800

csiph-web