Groups | Search | Server Info | Keyboard shortcuts | Login | Register


Groups > comp.lang.java.programmer > #5174

Re: Storing large strings for future equality checks

Date 2011-06-10 10:51 +0100
From bugbear <bugbear@trim_papermule.co.uk_trim>
Newsgroups comp.lang.java.programmer, comp.programming, comp.lang.java.databases
Subject Re: Storing large strings for future equality checks
References <iso8cm$a80$1@speranza.aioe.org> <g6evu6t9847r0g5u2nj17aj61gsktspb93@4ax.com> <isre5k32k44@news3.newsguy.com>
Message-ID <z6qdnRH0SMqAdGzQnZ2dnUVZ8n-dnZ2d@brightview.co.uk> (permalink)

Cross-posted to 3 groups.

Show all headers | View raw


Michael Wojcik wrote:
> Gene Wirchenko wrote:
>> On Wed, 08 Jun 2011 22:05:30 +0530, Abu Yahya<abu_yahya@invalid.com>
>> wrote:
>>
>>> I considered using an SHA-512 hash of these strings and storing them in
>>> the database. However, while these will save on storage space, it will
>>> take time to do the hashing before comparing an incoming string. So I'm
>>> still wasting time. (Collisions due to hashing will not be a problem,
>>> since an occasional false positive will not be fatal for my application).
>>
>>       What does "occasional" mean?
>
> Assuming the application doesn't accidentally run afoul of a
> hitherto-unknown flaw in SHA-512 - a tremendously unlikely event - it
> means "once in every N/2**256 uses", where N is the current number of
> hashes in the database. (2**256 because of the Birthday Paradox; we're
> interested in the probability of two arbitrary colliding pre-images.)
>
> Or, in other words, "probably not before the heat death of the
> universe". (Or false vacuum decay, zombie apocalypse, Rapture, etc.)
>
> Worrying about the time to hash the string is silly. It's linear in
> the length of the string, so it's roughly equivalent to the time taken
> to do a few comparisons (where "a few" depends on how long, on
> average, the matching prefix of the new string and the candidates is).
>
> However, as various others have pointed out, this is premature
> optimization. There's no reason to use any design other than the
> obvious until a problem is demonstrated.

I any case, it's plausible that the DB is using "something"
to index the strings, in other words the string search in the DB
already has some-kind-of optimisation.

   BugBear

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 22:05 +0530
  Re: Storing large strings for future equality checks markspace <-@.> - 2011-06-08 09:49 -0700
    Re: Storing large strings for future equality checks Willem <willem@toad.stack.nl> - 2011-06-08 17:28 +0000
      Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
    Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:45 +0530
  Re: Storing large strings for future equality checks David Kerber <dkerber@WarrenRogersAssociates.invalid> - 2011-06-08 12:58 -0400
    Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:49 +0530
    Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:31 +0200
      Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-09 10:50 +0200
        Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-09 11:44 +0100
      Re: Storing large strings for future equality checks Harry Tuttle <OTPXDAJCSJVU@spammotel.com> - 2011-06-10 10:15 +0200
  Re: Storing large strings for future equality checks Gene Wirchenko <genew@ocis.net> - 2011-06-08 11:07 -0700
    Re: Storing large strings for future equality checks Abu Yahya <abu_yahya@invalid.com> - 2011-06-08 23:58 +0530
    Re: Storing large strings for future equality checks Hallvard B Furuseth <h.b.furuseth@usit.uio.no> - 2011-06-09 12:38 +0200
    Re: Storing large strings for future equality checks Michael Wojcik <mwojcik@newsguy.com> - 2011-06-09 17:32 -0400
      Re: Storing large strings for future equality checks bugbear <bugbear@trim_papermule.co.uk_trim> - 2011-06-10 10:51 +0100
  Re: Storing large strings for future equality checks Lothar Kimmeringer <news200709@kimmeringer.de> - 2011-06-08 20:28 +0200
    Re: Storing large strings for future equality checks Martin Gregorie <martin@address-in-sig.invalid> - 2011-06-08 22:02 +0000
  Re: Storing large strings for future equality checks rossum <rossum48@coldmail.com> - 2011-06-08 21:38 +0100
  Re: Storing large strings for future equality checks Robert Klemme <shortcutter@googlemail.com> - 2011-06-08 23:20 +0200
  Re: Storing large strings for future equality checks Tom Anderson <twic@urchin.earth.li> - 2011-06-08 23:02 +0100
  Re: Storing large strings for future equality checks Joshua Maurice <joshuamaurice@gmail.com> - 2011-06-09 15:01 -0700

csiph-web