| From | rossum <rossum48@coldmail.com> |
|---|---|
| Newsgroups | comp.lang.java.programmer, comp.programming, comp.lang.java.databases |
| Subject | Re: Storing large strings for future equality checks |
| Date | 2011-06-08 21:38 +0100 |
| Message-ID | <eimvu65abk3j0n0l80a85u979bkij3rv8a@4ax.com> |
| References | <iso8cm$a80$1@speranza.aioe.org> |
Cross-posted to 3 groups.
On Wed, 08 Jun 2011 22:05:30 +0530, Abu Yahya <abu_yahya@invalid.com> wrote:

>A small application that I'm making requires me to store very long
>strings (>1000 characters) in a database.
>
>I will need to use these strings later to compare for equality to
>incoming strings from another application. I will also want to add some
>of the incoming strings to the storage, if they meet certain criteria.
>
>For my application, I get a feeling that storing these strings in my
>table will be a waste of space, and will impact performance due to
>retrieval and storage times, as well as comparison times.
>
>I considered using an SHA-512 hash of these strings and storing them in
>the database. However, while these will save on storage space, it will
>take time to do the hashing before comparing an incoming string. So I'm
>still wasting time. (Collisions due to hashing will not be a problem,
>since an occasional false positive will not be fatal for my application).
>
>What would be the best approach?

As others have said, write the simple obvious approach and see if that is good enough. Tune where required after measuring.

Lothar's suggestion of using SHA-1 is good. You could even drop back to MD-4 if you are sure that nobody is going to be deliberately trying to create false collisions. MD-4 is much too badly broken for any cryptographic purposes, but is even faster than SHA-1.

If the amount of storage needed is a problem then you might want to zip the strings before storing them. If you can be sure that the zipped versions are identical (not always possible with unicode combining characters) then you could hash the zipped version rather than the originals for more time saving.

rossum
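For what it's worth, here is a rough sketch of the hash-and-compare idea in Java, plus the zip step. The class and method names (`StringDigest`, `sha1`, `probablyEqual`, `compress`) are made up for illustration and are not from anyone's post. It assumes Java 7 or later for `StandardCharsets`, UTF-8 as the byte encoding, and NFC normalization as one possible way to deal with the combining-character issue. As far as I know the standard JDK providers offer MD5 and SHA-1 but not MD4, so the sketch sticks with SHA-1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.Normalizer;
import java.util.Arrays;
import java.util.zip.Deflater;

public class StringDigest {

    // Assumption: normalize to NFC so that "e" + combining acute and the
    // precomposed "é" produce the same bytes, and therefore the same digest.
    static byte[] canonicalBytes(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.getBytes(StandardCharsets.UTF_8);
    }

    // SHA-1 digest (20 bytes) of the normalized string; this is what
    // would be stored in the table instead of, or next to, the full text.
    static byte[] sha1(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return md.digest(canonicalBytes(s));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // True if the incoming string hashes to the stored digest. A match
    // can, very rarely, be a false positive caused by a collision.
    static boolean probablyEqual(byte[] storedDigest, String incoming) {
        return Arrays.equals(storedDigest, sha1(incoming));
    }

    // Deflate-compress the normalized string to save storage space.
    static byte[] compress(String s) {
        Deflater deflater = new Deflater();
        deflater.setInput(canonicalBytes(s));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

With something like this you would store `sha1(text)` in an indexed column and look incoming strings up by digest, keeping `compress(text)` only if the full text is ever needed again. Whether any of it is faster than a plain string comparison is something to measure first.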
Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 22:05 +0530
Re: Storing large strings for future equality checks markspace <-@.> - 2011-06-08 09:49 -0700
Re: Storing large strings for future equality checks Willem <willem@toad.stack.nl> - 2011-06-08 17:28 +0000
Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
Re: Storing large strings for future equality checks David Kerber <dkerber@WarrenRogersAssociates.invalid> - 2011-06-08 12:58 -0400
Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:49 +0530
Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:31 +0200
Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-09 10:50 +0200
Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-09 11:44 +0100
Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-10 10:15 +0200
Re: Storing large strings for future equality checks Gene Wirchenko <genew@ocis.net> - 2011-06-08 11:07 -0700
Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:58 +0530
Re: Storing large strings for future equality checks Hallvard B Furuseth <h.b.furuseth@usit.uio.no> - 2011-06-09 12:38 +0200
Re: Storing large strings for future equality checks Michael Wojcik <mwojcik@newsguy.com> - 2011-06-09 17:32 -0400
Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-10 10:51 +0100
Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:28 +0200
Re: Storing large strings for future equality checks Martin Gregorie <martin@address-in-sig.invalid> - 2011-06-08 22:02 +0000
Re: Storing large strings for future equality checks rossum <rossum48@coldmail.com> - 2011-06-08 21:38 +0100
Re: Storing large strings for future equality checks Robert Klemme <shortcutter@googlemail.com> - 2011-06-08 23:20 +0200
Re: Storing large strings for future equality checks Tom Anderson <twic@urchin.earth.li> - 2011-06-08 23:02 +0100
Re: Storing large strings for future equality checks Joshua Maurice <joshuamaurice@gmail.com> - 2011-06-09 15:01 -0700