Groups | Search | Server Info | Keyboard shortcuts | Login | Register


Groups > comp.lang.java.programmer > #5128

Re: Storing large strings for future equality checks

Path csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!weretis.net!feeder4.news.weretis.net!news.musoftware.de!wum.musoftware.de!feeder.erje.net!news.internetdienste.de!news.tu-darmstadt.de!news.belwue.de!rz.uni-karlsruhe.de!feed.news.schlund.de!schlund.de!news.online.de!not-for-mail
From Lothar Kimmeringer <news200709@kimmeringer.de>
Newsgroups comp.lang.java.programmer, comp.programming, comp.lang.java.databases
Subject Re: Storing large strings for future equality checks
Followup-To comp.lang.java.programmer
Date Wed, 8 Jun 2011 20:28:11 +0200
Organization Organization?! Only chaos here!
Lines 42
Message-ID <171dpt2926br2.dlg@kimmeringer.de> (permalink)
References <iso8cm$a80$1@speranza.aioe.org>
Reply-To news@kimmeringer.de
NNTP-Posting-Host mnch-4d044605.pool.mediaways.net
Mime-Version 1.0
Content-Type text/plain; charset="us-ascii"
Content-Transfer-Encoding 7bit
X-Trace online.de 1307557691 32691 77.4.70.5 (8 Jun 2011 18:28:11 GMT)
X-Complaints-To abuse@einsundeins.com
NNTP-Posting-Date Wed, 8 Jun 2011 18:28:11 +0000 (UTC)
User-Agent 40tude_Dialog/2.0.15.1de
Xref x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:5128 comp.programming:444 comp.lang.java.databases:466

Cross-posted to 3 groups.

Followups directed to: comp.lang.java.programmer

Show key headers only | View raw


F'up to cljp

Abu Yahya wrote:

> I considered using an SHA-512 hash of these strings and storing them in 
> the database. However, while these will save on storage space, it will 
> take time to do the hashing before comparing an incoming string. So I'm 
> still wasting time. (Collisions due to hashing will not be a problem, 
> since an occasional false positive will not be fatal for my application).

If you write seldom and read often, why not using two columns:

string_hashcode
sha1_hashcode

If the first is equal, you can calculate the sha1-hash for the string
to be checked and if that is equal as well, you can consider the
string as equal. That both hashes collide I expect to be very
very unlikely (which is why I changed the other alg to sha-1, that
should be considerably more performant than sha512).

So calculation of the more complex algorithm is only done while
storing to the database and when checking a string that is already
in the database. If you have that case very often you still might
get a better performance with String.hashcode and SHA1 than with
just SHA512.

> What would be the best approach?

There is no single best approach, only an optimal one. Which
one it is dependend on what defines one way to be better than
the other (in terms of performance, storage-space, collision-
rates, etc).


Regards, Lothar
-- 
Lothar Kimmeringer                E-Mail: spamfang@kimmeringer.de
               PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
                 questions!

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 22:05 +0530
  Re: Storing large strings for future equality checks markspace <-@.> - 2011-06-08 09:49 -0700
    Re: Storing large strings for future equality checks Willem <willem@toad.stack.nl> - 2011-06-08 17:28 +0000
      Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
    Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
  Re: Storing large strings for future equality checks David Kerber <dkerber@WarrenRogersAssociates.invalid> - 2011-06-08 12:58 -0400
    Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:49 +0530
    Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:31 +0200
      Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-09 10:50 +0200
        Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-09 11:44 +0100
      Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-10 10:15 +0200
  Re: Storing large strings for future equality checks Gene Wirchenko <genew@ocis.net> - 2011-06-08 11:07 -0700
    Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:58 +0530
    Re: Storing large strings for future equality checks Hallvard B Furuseth <h.b.furuseth@usit.uio.no> - 2011-06-09 12:38 +0200
    Re: Storing large strings for future equality checks Michael Wojcik <mwojcik@newsguy.com> - 2011-06-09 17:32 -0400
      Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-10 10:51 +0100
  Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:28 +0200
    Re: Storing large strings for future equality checks Martin Gregorie <martin@address-in-sig.invalid> - 2011-06-08 22:02 +0000
  Re: Storing large strings for future equality checks rossum <rossum48@coldmail.com> - 2011-06-08 21:38 +0100
  Re: Storing large strings for future equality checks Robert Klemme <shortcutter@googlemail.com> - 2011-06-08 23:20 +0200
  Re: Storing large strings for future equality checks Tom Anderson <twic@urchin.earth.li> - 2011-06-08 23:02 +0100
  Re: Storing large strings for future equality checks Joshua Maurice <joshuamaurice@gmail.com> - 2011-06-09 15:01 -0700

csiph-web