Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #5166

Re: Storing large strings for future equality checks

From Michael Wojcik <mwojcik@newsguy.com>
Newsgroups comp.lang.java.programmer, comp.programming, comp.lang.java.databases
Subject Re: Storing large strings for future equality checks
Date 2011-06-09 17:32 -0400
Organization Micro Focus
Message-ID <isre5k32k44@news3.newsguy.com> (permalink)
References <iso8cm$a80$1@speranza.aioe.org> <g6evu6t9847r0g5u2nj17aj61gsktspb93@4ax.com>

Cross-posted to 3 groups.

Show all headers | View raw


Gene Wirchenko wrote:
> On Wed, 08 Jun 2011 22:05:30 +0530, Abu Yahya <abu_yahya@invalid.com>
> wrote:
> 
>> I considered using an SHA-512 hash of these strings and storing them in 
>> the database. However, while these will save on storage space, it will 
>> take time to do the hashing before comparing an incoming string. So I'm 
>> still wasting time. (Collisions due to hashing will not be a problem, 
>> since an occasional false positive will not be fatal for my application).
> 
>      What does "occasional" mean?

Assuming the application doesn't accidentally run afoul of a
hitherto-unknown flaw in SHA-512 - a tremendously unlikely event - it
means "once in every N/2**256 uses", where N is the current number of
hashes in the database. (2**256 because of the Birthday Paradox; we're
interested in the probability of two arbitrary colliding pre-images.)

Or, in other words, "probably not before the heat death of the
universe". (Or false vacuum decay, zombie apocalypse, Rapture, etc.)

Worrying about the time to hash the string is silly. It's linear in
the length of the string, so it's roughly equivalent to the time taken
to do a few comparisons (where "a few" depends on how long, on
average, the matching prefix of the new string and the candidates is).

However, as various others have pointed out, this is premature
optimization. There's no reason to use any design other than the
obvious until a problem is demonstrated.

-- 
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 22:05 +0530
  Re: Storing large strings for future equality checks markspace <-@.> - 2011-06-08 09:49 -0700
    Re: Storing large strings for future equality checks Willem <willem@toad.stack.nl> - 2011-06-08 17:28 +0000
      Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
    Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
  Re: Storing large strings for future equality checks David Kerber <dkerber@WarrenRogersAssociates.invalid> - 2011-06-08 12:58 -0400
    Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:49 +0530
    Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:31 +0200
      Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-09 10:50 +0200
        Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-09 11:44 +0100
      Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-10 10:15 +0200
  Re: Storing large strings for future equality checks Gene Wirchenko <genew@ocis.net> - 2011-06-08 11:07 -0700
    Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:58 +0530
    Re: Storing large strings for future equality checks Hallvard B Furuseth <h.b.furuseth@usit.uio.no> - 2011-06-09 12:38 +0200
    Re: Storing large strings for future equality checks Michael Wojcik <mwojcik@newsguy.com> - 2011-06-09 17:32 -0400
      Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-10 10:51 +0100
  Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:28 +0200
    Re: Storing large strings for future equality checks Martin Gregorie <martin@address-in-sig.invalid> - 2011-06-08 22:02 +0000
  Re: Storing large strings for future equality checks rossum <rossum48@coldmail.com> - 2011-06-08 21:38 +0100
  Re: Storing large strings for future equality checks Robert Klemme <shortcutter@googlemail.com> - 2011-06-08 23:20 +0200
  Re: Storing large strings for future equality checks Tom Anderson <twic@urchin.earth.li> - 2011-06-08 23:02 +0100
  Re: Storing large strings for future equality checks Joshua Maurice <joshuamaurice@gmail.com> - 2011-06-09 15:01 -0700

csiph-web