From: Abu Yahya
Newsgroups: comp.lang.java.programmer,comp.programming,comp.lang.java.databases
Subject: Re: Storing large strings for future equality checks
Date: Wed, 08 Jun 2011 23:58:31 +0530

On 6/8/2011 11:37 PM, Gene Wirchenko wrote:
> On Wed, 08 Jun 2011 22:05:30 +0530, Abu Yahya wrote:
>
>> I considered using an SHA-512 hash of these strings and storing them in
>> the database. However, while these will save on storage space, it will
>> take time to do the hashing before comparing an incoming string. So I'm
>> still wasting time. (Collisions due to hashing will not be a problem,
>> since an occasional false positive will not be fatal for my application).
>
> What does "occasional" mean?
>

The application will be doing some probabilistic evaluation using the
data it is storing, and its result needs to be fairly close to the actual
value /most/ of the time. Because it would be impossible to be correct
100% of the time anyway, using a wrong value for a particular piece of
input data will not matter, provided it does not happen too often.

(In my text above, "rare" would have been a better choice of word than
"occasional".)
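
For illustration, the hash-then-compare idea quoted above could be sketched in Java roughly as follows. This is only a sketch under assumptions that are not in the thread: the class and method names (HashedStringStore, sha512Hex, remember, probablySeen) are hypothetical, an in-memory set stands in for the database column, and the strings are assumed to be encoded as UTF-8 before hashing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: stores SHA-512 digests instead of the large strings.
final class HashedStringStore {
    // Stand-in for the database table holding the digests (assumption).
    private final Set<String> digests = new HashSet<>();

    // Returns the SHA-512 digest of s as 128 lowercase hex characters.
    static String sha512Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Thrown only if the runtime has no SHA-512 provider.
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    // Records the digest of a string; the string itself is not kept.
    void remember(String s) {
        digests.add(sha512Hex(s));
    }

    // True if a string with the same digest was stored earlier.
    // A hash collision would show up here as a false positive.
    boolean probablySeen(String s) {
        return digests.contains(sha512Hex(s));
    }
}
```

Each incoming string is hashed once and only the fixed-size digest is compared, so the per-lookup cost is one pass over the string plus a set (or index) lookup. With a 512-bit digest, accidental collisions should be far rarer than the "rare" errors the application is said to tolerate.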