From: Hallvard B Furuseth
Newsgroups: comp.lang.java.programmer,comp.programming,comp.lang.java.databases
Subject: Re: Storing large strings for future equality checks
Date: Thu, 09 Jun 2011 12:38:08 +0200
Organization: University of Oslo, Norway

Gene Wirchenko writes:
>On Wed, 08 Jun 2011 22:05:30 +0530, Abu Yahya wrote:
>>For my application, I get a feeling that storing these strings in my
>>table will be a waste of space, and will impact performance due to
>>retrieval and storage times, as well as comparison times.
>
>     Your feeling is irrelevant.  You should benchmark.  If you do
> not, you may end up jumping through hoops for something that is
> unneeded (though you may never find out that it is unneeded).

Indeed.  No point in spending a lot of time solving a non-problem.

But after the benchmark, the decision can depend on who the application
is for.  If it scales poorly, that can bite other users with different
input data.

OTOH, if he'll be the only user and he finds that full strings and SHA
are both too slow: another approach would be to use a faster hash,
count hash collisions, and not bother with more if the count is
acceptable.  Or try tries, as Tom Anderson suggested.

-- 
Hallvard
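
A rough sketch of the "faster hash, count collisions" idea, for anyone who wants to try it on their own data. Everything named here is an assumption of mine, not something from the thread: the class name CollisionCounter is made up, and CRC32 from java.util.zip is only one candidate for the faster hash. The harness keeps the full strings in memory so that it can tell a real collision (two different strings, same hash) from a plain duplicate, so it is meant for a test run over sample input, not for production storage.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.ToLongFunction;
import java.util.zip.CRC32;

/** Hypothetical measurement harness: counts distinct strings that share a hash value. */
class CollisionCounter {
    // hash value -> distinct strings seen with that hash (kept only for measurement)
    private final Map<Long, Set<String>> seen = new HashMap<>();
    private final ToLongFunction<String> hash;
    private int collisions = 0;

    CollisionCounter(ToLongFunction<String> hash) {
        this.hash = hash;
    }

    /** CRC32 stands in for "a faster hash" here; any cheap hash could be swapped in. */
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Records s; counts a collision when a new distinct string lands on an occupied hash. */
    void add(String s) {
        Set<String> bucket = seen.computeIfAbsent(hash.applyAsLong(s), k -> new HashSet<>());
        if (bucket.add(s) && bucket.size() > 1) {
            collisions++;
        }
    }

    int collisions() {
        return collisions;
    }

    public static void main(String[] args) {
        CollisionCounter counter = new CollisionCounter(CollisionCounter::crc32);
        // Sample input only; feed it the real strings to get a meaningful count.
        for (String s : new String[] {"alpha", "beta", "alpha", "gamma"}) {
            counter.add(s);
        }
        System.out.println("collisions: " + counter.collisions());
    }
}
```

Repeated identical strings are not counted, since those are exactly the equality hits the application wants. Whether the resulting count is "acceptable" still depends on the application and on the benchmark.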