Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail From: markspace <-@.> Newsgroups: comp.lang.java.programmer Subject: Re: Fast String search algorithm Date: Sat, 17 Dec 2011 09:26:44 -0800 Organization: A noiseless patient Spider Lines: 25 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Injection-Date: Sat, 17 Dec 2011 17:26:46 +0000 (UTC) Injection-Info: mx04.eternal-september.org; posting-host="XjIWM99mD7Ijfdu600oVPA"; logging-data="22320"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19qnEeW5ECzOgeuLzhjZB7UH/cseEHJZiA=" User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20111105 Thunderbird/8.0 In-Reply-To: Cancel-Lock: sha1:ppYBcKSEIqSU83LuRiygEHxRSHQ= Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:10828 On 12/16/2011 10:58 AM, Tom Anderson wrote: > On Fri, 16 Dec 2011, Roedy Green wrote: > >> Let's say I had a million records each with a text field. I wanted to >> find all records that contained a given substring. Are there fast >> algorithms to do that or do you have to scan the whole thing linearly? > > http://en.wikipedia.org/wiki/Inverted_index > An implementation of an inverted index is often called a search engine. One of the best known implementations is a web search engine. Alt Vista, Yahoo, and Google are well known examples. There's a Java based search engine I happen to know about, Lucene. This is a roll-your-own library, but it has a lot of features. Beyond a cursory reading of the documentation I can't really say much about it unfortunately.