Re: Storing large strings for future equality checks

From rossum <rossum48@coldmail.com>
Newsgroups comp.lang.java.programmer, comp.programming, comp.lang.java.databases
Subject Re: Storing large strings for future equality checks
Date 2011-06-08 21:38 +0100
Message-ID <eimvu65abk3j0n0l80a85u979bkij3rv8a@4ax.com>
References <iso8cm$a80$1@speranza.aioe.org>

Cross-posted to 3 groups.


On Wed, 08 Jun 2011 22:05:30 +0530, Abu Yahya <abu_yahya@invalid.com>
wrote:

>A small application that I'm making requires me to store very long 
>strings (>1000 characters) in a database.
>
>I will need to use these strings later to compare for equality to 
>incoming strings from another application. I will also want to add some 
>of the incoming strings to the storage, if they meet certain criteria.
>
>For my application, I get a feeling that storing these strings in my 
>table will be a waste of space, and will impact performance due to 
>retrieval and storage times, as well as comparison times.
>
>I considered using an SHA-512 hash of these strings and storing them in 
>the database. However, while these will save on storage space, it will 
>take time to do the hashing before comparing an incoming string. So I'm 
>still wasting time. (Collisions due to hashing will not be a problem, 
>since an occasional false positive will not be fatal for my application).
>
>What would be the best approach?

As others have said, write the simple, obvious approach first and see
whether it is good enough.  Measure, then tune only where required.

Lothar's suggestion of using SHA-1 is good.  You could even drop back
to MD4 if you are sure that nobody is going to be deliberately trying
to create false collisions.  MD4 is far too badly broken for any
cryptographic purpose, but it is even faster than SHA-1.
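
A minimal sketch of the hashing step, assuming the strings are encoded
as UTF-8 and the digest is stored as a 40-character hex column.  The
class and method names are made up for illustration.  It uses SHA-1
because the standard JDK providers ship it; MD4 is not normally
available through MessageDigest without a third-party provider.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: names are illustrative, not from the thread.
final class StringDigest {

    /** Returns the SHA-1 digest of the string's UTF-8 bytes as 40 lowercase hex characters. */
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // %02x on a byte prints its unsigned value as two hex digits
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required of every Java platform implementation
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

An incoming string is then checked by computing sha1Hex(incoming) and
looking that value up in an indexed column, so the long strings never
have to be read back for the comparison.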

If the amount of storage needed is a problem, then you might want to
zip (compress) the strings before storing them.
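
One way the zipping could look, as a sketch only: it uses the JDK's
java.util.zip Deflater and Inflater on the UTF-8 bytes, and assumes the
result goes into a binary (BLOB-style) column.  The class name and the
choice of BEST_COMPRESSION are assumptions, not something from the
thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: names are illustrative, not from the thread.
final class StringCompressor {

    /** Compresses the string's UTF-8 bytes with DEFLATE (zlib format). */
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses compress(); rejects data that is not a complete zlib stream. */
    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

How much this saves depends entirely on the data; strings of around
1000 characters with little repetition may not shrink much, so it is
worth measuring on real samples.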

If you can be sure that equal strings always give identical zipped
versions (not always the case with Unicode combining characters, where
the same text can be encoded in more than one way) then you could hash
the zipped version rather than the original for more time saving.
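
The combining-character caveat can be illustrated with a small sketch.
It assumes that canonically equivalent strings should count as equal
and normalizes to NFC before hashing; that is one possible policy, not
something the thread prescribes, and the names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical helper: names are illustrative, not from the thread.
final class NormalizedDigest {

    /** SHA-1 of the NFC-normalized string, encoded as UTF-8. */
    static byte[] sha1OfNormalized(String text) {
        // NFC composes "e" + U+0301 into the single code point U+00E9
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(nfc.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** True when both strings hash to the same digest after normalization. */
    static boolean sameText(String a, String b) {
        return Arrays.equals(sha1OfNormalized(a), sha1OfNormalized(b));
    }
}
```

Here "caf\u00e9" and "cafe\u0301" display the same but are different
char sequences, so String.equals and a hash of the raw bytes both treat
them as different, while sameText treats them as equal.  The same
normalization would have to be applied before zipping for the zipped
bytes to match.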

rossum
