Re: Good hash for pointers

From Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups comp.lang.c
Subject Re: Good hash for pointers
Date 2024-05-25 02:12 -0700
Organization A noiseless patient Spider
Message-ID <86fru6gsqr.fsf@linuxsc.com>
References <v2n88p$1nlcc$1@dont-email.me> <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org> <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me>


bart <bc@freeuk.com> writes:

> On 24/05/2024 19:57, Malcolm McLean wrote:
>
>> On 24/05/2024 19:28, Bonita Montero wrote:
>>
>>> Am 23.05.2024 um 13:11 schrieb Malcolm McLean:
>>>
>>>> What is a good hash function for pointers to use in portable ANSI C?
>>>>
>>>> The pointers are nodes of a tree, which are read only, and I want
>>>> to associate read/write data with them.  So potentially a large
>>>> number of pointers, and they might be consecutively ordered if they
>>>> are taken from an array, or they might be returned from repeated
>>>> calls to malloc() with small allocations.  Obviously I have no
>>>> control over pointer size or internal representation.
>>>
>>> Use FNV.
>>
>> Here's an attempt.
>>
>> /* FNV hash of a pointer */
>> static unsigned int hash(void *address)
>> {
>>   int i;
>>   unsigned long answer = 2166136261;
>>   unsigned char *byte = (unsigned char *) &address;
>>
>>   for (i = 0; i < sizeof(void *); i++)
>>   {
>>     answer *= 16777619;
>>     answer ^= byte[i];
>>   }
>>   return (unsigned int) (answer & 0xFFFFFFFF);
>> }
>>
>> Now what will compilers make of that?
>
> Compiler, or performance?
>
> I tried this function with the test program shown below.  I used it to
> populate a hash table of 64K entries with pointers from successive
> calls to malloc.
>
> Results, in terms of clashes, for different numbers N of entries
> populated out of 64K were:
>
>      N  clashes
>  10000     1100
>  30000    12000
>  50000    67000
>  60000   216000
>  65535  5500000    (largest N possible)
>
> Results were rather variable as malloc produces different patterns of
> pointers on different runs.  These were simply the results from the
> first run.
>
> Was this good?  I'd no idea.  But as a comparison, I used my own hash
> function, normally used to hash identifiers, shown below the main
> program as the function 'bchash'.
>
> If this is substituted instead, the results were:
>
>      N  clashes
>  10000      230
>  30000     3800
>  50000    10300
>  60000    50300
>  65535  2700000
>
> Hash tables need a certain amount of free capacity to stay efficient,
> so 3/4 full (about 50K/64K) is about right.
>
> Again I don't know if these figures are good, they are just better
> than from your hash() function, for the inputs I used, within this
> test, and for this size of table.
>
> No doubt there are much better ones.
>
>
>
> ------------------------------------------
> #include <stdio.h>
> #include <stdlib.h>
>
> static unsigned int hash(void *address)
> {
>     int i;
>     unsigned long answer = 2166136261;
>     unsigned char *byte = (unsigned char *) &address;
>
>     for (i = 0; i < sizeof(void *); i++)
>     {
>         answer *= 16777619;
>         answer ^= byte[i];
>     }
>     return (unsigned int) (answer & 0xFFFFFFFF);
> }
>
> void* table[65536];
>
> int main(void) {
>     void* p;
>
>     int clashes=0, wrapped;
>     int j;
>
>     for (int i=0; i<30000; ++i) {
>         p = malloc(1);
>         /* a null pointer would look like an empty slot, so stop here */
>         if (!p) { puts("Out of memory"); exit(1);}
>         j = hash(p) & 65535;
>
>         /* linear probing: step to the next slot until one is free */
>         wrapped=0;
>         while (table[j]) {
>             ++clashes;
>             ++j;
>             if (j>65535) {
>                 if (wrapped) { puts("Table full"); exit(1);}
>                 wrapped=1;
>                 j=0;
>             }
>         }
>         table[j] = p;
>     }
>     printf("Clashes %d\n", clashes);
> }
>
>
> ------------------------------------------
>
> static unsigned int bchash(void *address)
> {
>     int i;
>     unsigned long hsum = 0;
>     unsigned char *byte = (unsigned char *) &address;
>
>     for (i = 0; i < sizeof(void *); i++)   {
>         hsum = (hsum<<4) - hsum + byte[i];
>     }
>     return (hsum<<5) - hsum;
> }

It looks like your hash function was tuned for this testing
setup.  With different choices for testing it does much
worse.

The testing is done with malloc() blocks all of the same
requested size, and that size is 1.  Typical workloads
are likely to be both larger and more variable.
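
To make that concrete, here is a minimal sketch of the same clash
count with allocation sizes that vary.  The names count_clashes,
max_size and seed are made up for this illustration, rand() is only
a stand-in for a real workload, and the blocks are deliberately
leaked, as in the quoted test.

```c
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 65536UL

static void *table[TABLE_SIZE];

/* The FNV-style hash quoted above, repeated so the sketch is self-contained. */
static unsigned long hash(void *address)
{
    size_t i;
    unsigned long answer = 2166136261UL;
    unsigned char *byte = (unsigned char *) &address;

    for (i = 0; i < sizeof(void *); i++) {
        answer *= 16777619UL;
        answer ^= byte[i];
    }
    return answer & 0xFFFFFFFFUL;
}

/*
 * Insert n pointers from malloc() calls whose sizes vary over
 * 1..max_size, using linear probing as in the quoted test.
 * Returns the number of clashes, or -1 for bad arguments or a
 * failed allocation.  Blocks are deliberately leaked.
 */
static long count_clashes(unsigned long n, unsigned max_size, unsigned seed)
{
    long clashes = 0;
    unsigned long i, j;

    if (n >= TABLE_SIZE || max_size == 0)
        return -1;              /* keep one slot free so probing ends */
    for (i = 0; i < TABLE_SIZE; i++)
        table[i] = NULL;
    srand(seed);

    for (i = 0; i < n; i++) {
        void *p = malloc((size_t) (rand() % max_size) + 1);
        if (p == NULL)
            return -1;
        j = hash(p) & (TABLE_SIZE - 1);
        while (table[j] != NULL) {
            ++clashes;
            j = (j + 1) & (TABLE_SIZE - 1);
        }
        table[j] = p;
    }
    return clashes;
}
```

Comparing count_clashes(30000, 1, 1) with count_clashes(30000, 256, 1)
should give an idea of how sensitive a given hash function is to the
allocation pattern.  The numbers depend on the malloc implementation,
so none are quoted here.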

When adding a new entry finds a collision with an old
entry, linear probing is used to find a free slot.  It's
well understood that linear probing suffers badly as the
load factor goes up.  Better to use a few high bits of
the hash value -- as few as five or six is fine -- to
give the reprobe sequences different strides.
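
A minimal sketch of one way to do that, assuming a 32-bit hash
value and a table whose size is a power of two.  The name
find_free_slot and the choice of the top five bits are only for
illustration.  The stride is forced to be odd so that it is
coprime with the table size and the probe sequence visits every
slot.

```c
#include <stddef.h>

#define TABLE_BITS 16
#define TABLE_SIZE (1UL << TABLE_BITS)
#define TABLE_MASK (TABLE_SIZE - 1UL)

/*
 * Return the index of a free (null) slot for hash value h, or
 * TABLE_SIZE if every slot is occupied.  The start index comes
 * from the low bits of h; the stride comes from its top five bits.
 */
static unsigned long find_free_slot(void *const table[], unsigned long h)
{
    unsigned long h32 = h & 0xFFFFFFFFUL;
    unsigned long j = h32 & TABLE_MASK;
    /* top five bits give 0..31; map to the odd strides 1, 3, ..., 63 */
    unsigned long stride = ((h32 >> 27) << 1) | 1UL;
    unsigned long tries;

    for (tries = 0; tries < TABLE_SIZE; tries++) {
        if (table[j] == NULL)
            return j;
        j = (j + stride) & TABLE_MASK;
    }
    return TABLE_SIZE;
}
```

Two keys that land on the same slot now usually continue along
different paths, which is what breaks up the long runs that plain
linear probing builds at high load factors.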

Your hash function is expensive to compute, more so even
than the "FNV" function shown earlier.  In a case like
this one where the compares are cheap, it's better to
have a dumb-but-fast hash function that might need a
few more looks to find an open slot, because the cost
of looking is so cheap compared to computing the hash
function.
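
For illustration only, here is one possible shape for a
dumb-but-fast function.  It is not taken from this thread.  It
assumes C99's optional uintptr_t (so it is not strict ANSI C),
that uintptr_t is at least 32 bits wide, and that the low four
bits of a malloc() result carry little information; the name
cheap_hash and the shift counts are arbitrary choices.

```c
#include <stdint.h>

/*
 * ASSUMPTIONS: uintptr_t exists and is at least 32 bits wide, and
 * pointers are typically aligned so their low 4 bits say little.
 * Two shifts and one XOR, with no loop over the bytes.
 */
static unsigned long cheap_hash(const void *address)
{
    uintptr_t bits = (uintptr_t) address;

    return (unsigned long) ((bits >> 4) ^ (bits >> 20));
}
```

The caller would mask the result down to the table size as before.
Whether the extra probes this costs are repaid by the cheaper
computation is something to measure rather than assume.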
