| From | Abu Yahya <abu_yahya@invalid.com> |
|---|---|
| Newsgroups | comp.lang.java.programmer, comp.programming, comp.lang.java.databases |
| Subject | Re: Storing large strings for future equality checks |
| Date | 2011-06-08 23:45 +0530 |
| Organization | Aioe.org NNTP Server |
| Message-ID | <isoe78$pdj$1@speranza.aioe.org> (permalink) |
| References | <iso8cm$a80$1@speranza.aioe.org> <iso96r$pml$1@dont-email.me> |
Cross-posted to 3 groups.
On 6/8/2011 10:19 PM, markspace wrote:
> On 6/8/2011 9:35 AM, Abu Yahya wrote:
>
>> I considered using an SHA-512 hash of these strings and storing them in
>> the database. However, while these will save on storage space, it will
>> take time to do the hashing before comparing an incoming string. So I'm
>> still wasting time. (Collisions due to hashing will not be a problem,
>> since an occasional false positive will not be fatal for my application).
>
> You have to store the whole string. Even if the SHA-512 hash codes are
> equal, it could be that the strings are different. You'll have to
> eventually compare the raw string, even if the SHA is used as a
> quick-out case.

You're right about comparing the whole strings anyway, but, for this application I'm creating, I wouldn't mind very very rare incorrect results.

> No one can really tell what is "faster" or "wasting time" until you can
> better characterize the usage patterns. How big can these strings get?
> How often will you get an actual duplicate? What's the penalty when you
> need to add a new string? You'll need to implement a few algorithms,
> profile them and then make a decision based on actual data.

That makes sense. I'd need to analyze my input data and then run some empirical tests.

> For Java, I'd store the strings in a WeakHashMap or similar to allow
> them to be cached, but tossed away when more storage is needed. Also you
> should look into getting some DB caching library, much easier than
> implementing this yourself (sorry I can't personally recommend any).

The WeakHashMap idea looks useful. As for a DB caching library, would you recommend this as a replacement for the WeakHashMap, or as a complement?

Thanks for the useful pointers.
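For anyone following along, here is a rough sketch of the digest-only approach discussed above: store just the SHA-512 digest of each string and treat a digest match as "seen before", accepting the (extremely unlikely) false positive from a collision. The class name `DigestStore`, the method names, and the in-memory `HashSet` standing in for the database table are all made up for illustration; this is not code from the thread, and it says nothing about which approach would actually be faster for a given workload.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers strings by their SHA-512 digest only.
final class DigestStore {
    // Stand-in for the database column that would hold the digests.
    private final Set<String> seenDigests = new HashSet<>();

    // Returns the SHA-512 digest of s (UTF-8 encoded) as 128 lowercase hex characters.
    static String sha512Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the running JRE ships a SHA-512 provider (the standard ones do).
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    // Returns true if a string with the same digest was recorded earlier;
    // otherwise records the digest and returns false.
    // A true result may, very rarely, be a false positive caused by a collision.
    boolean checkAndRecord(String s) {
        return !seenDigests.add(sha512Hex(s));
    }
}
```

Usage would be along the lines of `new DigestStore().checkAndRecord(incoming)`. If false positives ever became unacceptable, the full strings would have to be kept and compared on a digest match, as markspace points out.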
Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 22:05 +0530
Re: Storing large strings for future equality checks markspace <-@.> - 2011-06-08 09:49 -0700
Re: Storing large strings for future equality checks Willem <willem@toad.stack.nl> - 2011-06-08 17:28 +0000
Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
Re: Storing large strings for future equality checks David Kerber <dkerber@WarrenRogersAssociates.invalid> - 2011-06-08 12:58 -0400
Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:49 +0530
Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:31 +0200
Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-09 10:50 +0200
Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-09 11:44 +0100
Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-10 10:15 +0200
Re: Storing large strings for future equality checks Gene Wirchenko <genew@ocis.net> - 2011-06-08 11:07 -0700
Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:58 +0530
Re: Storing large strings for future equality checks Hallvard B Furuseth <h.b.furuseth@usit.uio.no> - 2011-06-09 12:38 +0200
Re: Storing large strings for future equality checks Michael Wojcik <mwojcik@newsguy.com> - 2011-06-09 17:32 -0400
Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-10 10:51 +0100
Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:28 +0200
Re: Storing large strings for future equality checks Martin Gregorie <martin@address-in-sig.invalid> - 2011-06-08 22:02 +0000
Re: Storing large strings for future equality checks rossum <rossum48@coldmail.com> - 2011-06-08 21:38 +0100
Re: Storing large strings for future equality checks Robert Klemme <shortcutter@googlemail.com> - 2011-06-08 23:20 +0200
Re: Storing large strings for future equality checks Tom Anderson <twic@urchin.earth.li> - 2011-06-08 23:02 +0100
Re: Storing large strings for future equality checks Joshua Maurice <joshuamaurice@gmail.com> - 2011-06-09 15:01 -0700