Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!gegeweb.org!de-l.enfer-du-nord.net!feeder1.enfer-du-nord.net!feeder.erje.net!news-1.dfn.de!news.dfn.de!cache.uni-koblenz.de!not-for-mail From: Tassilo Horn Newsgroups: comp.lang.java.programmer Subject: Re: Fast String search algorithm Date: Fri, 16 Dec 2011 20:07:25 +0100 Organization: University Koblenz-Landau Campus Koblenz Lines: 25 Message-ID: <87r504s5ci.fsf@thinkpad.tsdh.de> References: NNTP-Posting-Host: 91-67-11-43-dynip.superkabel.de Mime-Version: 1.0 Content-Type: text/plain X-Trace: cache.uni-koblenz.de 1324062446 30966 91.67.11.43 (16 Dec 2011 19:07:26 GMT) X-Complaints-To: news@cache.uni-koblenz.de NNTP-Posting-Date: Fri, 16 Dec 2011 19:07:26 +0000 (UTC) User-Agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.92 (gnu/linux) Cancel-Lock: sha1:UCJwt5ybRyMv04CnMACCyEyG8jY= Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:10809 Roedy Green writes: > Let's say I had a million records each with a text field. I wanted to > find all records that contained a given substring. Are there fast > algorithms to do that or do you have to scan the whole thing linearly? Sure. See http://en.wikipedia.org/wiki/Substring_index > Similarly, I was wondering how you might build an index to all the > files on a computer so that you could find one just by giving a wild > card or partial filename. On UNIXes, that's exactly what the locate tool does. Then you can search with globbing wildcards % # all files containing the substring foo % locate *foo* or even with regular expressions % # all files that end with "r.c" or "r.cpp" % locate -r '.*r\.\(c\|cpp\)$' Bye, Tassilo