Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.c > #385067

Re: Good hash for pointers

From Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups comp.lang.c
Subject Re: Good hash for pointers
Date 2024-05-25 11:12 -0700
Organization A noiseless patient Spider
Message-ID <86y17xg3qc.fsf@linuxsc.com> (permalink)
References (1 earlier) <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org> <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me> <86fru6gsqr.fsf@linuxsc.com> <v2si1h$2rs39$1@dont-email.me>

Show all headers | View raw


bart <bc@freeuk.com> writes:

> On 25/05/2024 10:12, Tim Rentsch wrote:
>
>> bart <bc@freeuk.com> writes:
>> [...]
>> It looks like your hash function was tuned for this testing
>> setup.  With different choices for testing it does much
>> worse.
>
> It wasn't tuned at all;  it was a copy of what I've longed used
> within my lexers.
>
> It is designed to give a good spread when inputs are similar or
> change sequentially.  (Well, not exactly designed;  more trial
> and error.)

The trial and error process you mention had the effect of tuning
your function for the testing setup used, whether you meant it to
or not.


>> The testing is done with malloc() blocks all of the same
>> requested size, and that size is 1.  Typical workloads
>> are likely to be both larger and more variable.
>>
>> When adding a new entry finds a collision with an old
>> entry, linear probing is used to find a free slot.  It's
>> well understood that linear probing suffers badly as the
>> load factor goes up.  Better to take a few high bits of
>> the hash value -- as few as five or six is fine -- to
>> have the reprobe sequences have different strides.
>>
>> Your hash function is expensive to compute, moreso even
>> than the "FNV" function shown earlier.
>
> You mean Malcolm's version?  That also uses a loop.  Neither
> need to process all 8 bytes of a 64-bit pointer; 6 will do, and
> probably just 4.
> And such a loop can be trivially unrolled.

Both your function and the function posted by Malcolm are slow.
It's just that yours is slower.  It isn't hard to produce a
hash function of equal quality that runs more than twice as
fast.

>> In a case like
>> this one where the compares are cheap, it's better to
>> have a dumb-but-fast hash function that might need a
>> few more looks to find an open slot, because the cost
>> of looking is so cheap compared to computing the hash
>> function.
>
> My first thought in this thread was to just make use of the
> pointer value directly.  My version was written only to compare
> against Malcolm's FNV code.
>
> If I try the other suggestions, RH's and KK's which simply
> shift the pointer value (similar to my idea), the results I got
> were interesting:  sometimes there were zero clashes (almost as
> though bits 4 to 19 of the pointer were a perfect hash), but
> sometimes there were millions.
>
> But when I tried to mix up the input values, then all methods
> seemed to converge to similar results.
>
> In that case you might as well use the simplest and fastest
> method.
>
> However it really needs testing within the actual application.

Let me tell you a little secret.  Hash functions are like random
number generators:  good ones never do exceptionally well or
exceptionally poorly.  The reason for this is simple - whenever
there are scenarios where one does well then inevitably there are
other scenarios where it does poorly.  A really high quality hash
function will never do badly, no matter what mix of inputs is
given.  The key takeaway here is to try as many different input
sets as you can, but don't look for candidates that do well;
rather, throw away any that do poorly on any input mix.  The ones
that are left are all, with high probability, just fine no matter
what application they are used in.  That is not to say there is
no need to test within the intended application, it's always good
to do a sanity check, but the point of the sanity check is just
to make sure the earlier process didn't miss something;  it isn't
the right way to compare alternatives.

Back to comp.lang.c | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-23 12:11 +0100
  Re: Good hash for pointers Richard Harnden <richard.nospam@gmail.invalid> - 2024-05-23 15:37 +0100
    Re: Good hash for pointers Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-05-23 15:51 -0700
      Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 00:42 +0100
  Re: Good hash for pointers Kaz Kylheku <643-408-1753@kylheku.com> - 2024-05-23 20:34 +0000
  Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-23 15:49 -0700
    Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 00:43 +0100
      Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-23 16:52 -0700
        Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 01:28 +0100
          Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-23 18:39 -0700
            Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-24 11:14 +0100
              Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 12:05 +0100
                Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-05-24 10:49 -0700
                Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-05-24 10:51 -0700
              Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-24 06:18 -0700
                Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 15:07 +0100
                Re: Good hash for pointers scott@slp53.sl.home (Scott Lurndal) - 2024-05-24 14:51 +0000
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 02:49 -0700
              Re: Good hash for pointers David Brown <david.brown@hesbynett.no> - 2024-05-24 17:00 +0200
                Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 17:10 +0100
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-05-24 19:27 +0300
          Re: Good hash for pointers David Brown <david.brown@hesbynett.no> - 2024-05-24 09:41 +0200
  Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-05-23 17:32 -0700
    Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-05-23 18:59 -0700
      Re: Good hash for pointers jak <nospam@please.ty> - 2024-05-24 04:09 +0200
  Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-24 20:28 +0200
    Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 19:57 +0100
      Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-25 00:54 +0100
        Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 02:12 -0700
          Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-25 12:28 +0100
            Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 11:12 -0700
              Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-25 20:31 +0100
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 22:54 -0700
          Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-25 17:00 +0200
            Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 10:40 -0700
              Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-25 18:56 +0100
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 11:23 -0700
                Re: Good hash for pointers Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-05-25 23:13 +0200
                Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-25 23:07 +0100
                Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-25 23:42 +0100
                Re: Good hash for pointers Richard Harnden <richard.nospam@gmail.invalid> - 2024-05-26 19:58 +0100
                Re: Good hash for pointers Kaz Kylheku <643-408-1753@kylheku.com> - 2024-05-26 22:42 +0000
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 18:05 +0200
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 18:07 +0200
              Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 18:04 +0200
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-26 09:24 -0700
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 18:36 +0200
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-26 10:20 -0700
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 19:39 +0200
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 19:54 +0200
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-27 08:07 +0200
                Re: Good hash for pointers Ben Bacarisse <ben@bsb.me.uk> - 2024-05-28 11:07 +0100
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-30 10:10 +0200
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-30 11:27 +0200
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-30 19:26 -0700
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-30 19:27 -0700
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-02 10:45 +0300
                Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-06-02 12:42 -0700
                Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-06-03 12:35 -0700
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-06-02 16:02 -0700
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 10:50 +0300
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-06-03 18:02 -0700
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-04 11:38 +0300
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-06-03 16:34 +0200
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 17:46 +0300
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-06-03 17:54 +0200
                Re: Good hash for pointers bart <bc@freeuk.com> - 2024-06-03 17:24 +0100
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 20:16 +0300
                Re: Good hash for pointers bart <bc@freeuk.com> - 2024-06-03 19:48 +0100
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 22:41 +0300
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 22:51 +0300
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-06-03 16:51 -0700
                Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-06-03 17:01 -0700
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-06-03 20:25 +0200
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 19:50 +0300
                Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-06-03 20:31 +0200
                Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-05 00:59 +0300
  Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 20:06 +0200
    Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-26 20:10 +0100
  Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 20:24 +0200

csiph-web