From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Tue, 04 Jun 2024 22:10:41 -0700
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <86ed9cm0tq.fsf@linuxsc.com>
References: <v2n88p$1nlcc$1@dont-email.me> <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org> <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me> <86fru6gsqr.fsf@linuxsc.com> <v2sudq$2trh1$1@raubtier-asyl.eternal-september.org> <8634q5hjsp.fsf@linuxsc.com> <v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org> <86le3wfsmd.fsf@linuxsc.com> <v2voe7$3fr50$1@raubtier-asyl.eternal-september.org> <86ed9ofq14.fsf@linuxsc.com> <v2vs40$3gflh$1@raubtier-asyl.eternal-september.org> <86sexypvff.fsf@linuxsc.com> <20240602104506.000072e4@yahoo.com> <86le3nne36.fsf@linuxsc.com> <20240603105005.0000091f@yahoo.com> <86r0ddmsf6.fsf@linuxsc.com> <20240604113839.000068f5@yahoo.com>

Michael S <already5chosen@yahoo.com> writes:

> On Mon, 03 Jun 2024 18:02:21 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Michael S <already5chosen@yahoo.com> writes:
>>
>>
>>> I am less interested in axioms and more interested in your
>>> experimental findings.
>>
>> I'm not sure what you're looking for here.
>
> I'll give an example.
> You said that some of the variants had 4x differences between cases.
> From my perspective, if you found a hash function that performs up to 3
> times better* than "crypto-alike" hash in the majority of tests and is
> 1.33x worse than "crypto-alike" in a few other tests, it's something
> that I'd consider a valuable option.
>
> * - i.e. produces 3x fewer collisions at, say, occupation ratio of 0.7

Okay.  For the sake of discussion let me call a "crypto-alike" hash
an "average hash" (meaning in some sense statistically average, or
never very far from random hash values).

Unfortunately the example you describe is something that won't
happen and can't happen.  I say it won't happen because for all
the hash functions I've looked at, I've never seen one that is
"better than average" most of the time and perhaps a little bit
"worse than average" only occasionally.  The reverse situation
pops up fairly often:  a hash function that is a little bit
"better than average" in a small number of cases, and "no
better than average or worse than average (sometimes by quite
a lot)" in most cases.

For the second part, I say it can't happen because there isn't
enough headroom for the amount of performance improvement you
mention.  At an occupancy rate of 0.7, an average hash function
using a rehashing approach averages only about 1.7 probes per
insertion (the minimum possible being 1) while filling the table.
Even a perfect hash function, at exactly 1 probe per insertion,
would be less than a 2x improvement, so there is no way to get
a dramatic performance improvement.  Even with an 85% load
factor, an average hash function takes just a little over 2
probes (roughly 2.2 probes) per value inserted.
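
As a rough check on those figures, here is a minimal sketch in C.
It assumes an idealized model in which every probe lands on a
uniformly random slot, which is only an approximation of what a
real rehashing scheme does and is not necessarily the setup used
for the measurements above.  The names expected_probes,
simulated_probes and next_random are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Expected probes per insertion when filling an empty table of
   `size` slots with `count` entries, ASSUMING each probe hits a
   uniformly random slot.  With n slots already occupied, an
   insertion needs size/(size-n) probes on average. */
double expected_probes(size_t size, size_t count)
{
    assert(count > 0 && count < size);
    double total = 0.0;
    for (size_t n = 0; n < count; n++)
        total += (double)size / (double)(size - n);
    return total / (double)count;
}

/* Small xorshift64 generator, used only so the simulation is
   reproducible; the state must be nonzero. */
static uint64_t next_random(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Monte Carlo version of the same model: keep probing random
   slots until a free one turns up, and count every probe. */
double simulated_probes(size_t size, size_t count, uint64_t seed)
{
    assert(count > 0 && count < size && seed != 0);
    unsigned char *used = calloc(size, 1);
    assert(used != NULL);
    uint64_t state = seed, probes = 0;
    for (size_t i = 0; i < count; i++) {
        size_t slot;
        do {
            slot = (size_t)(next_random(&state) % size);
            probes++;
        } while (used[slot]);
        used[slot] = 1;
    }
    free(used);
    return (double)probes / (double)count;
}
```

Under this model, expected_probes(100000, 70000) comes out at
about 1.72 and expected_probes(100000, 85000) at about 2.23; for
large tables the sum is close to (1/a) * ln(1/(1-a)) at load
factor a.  The simulation should land near the same values.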

Going the other direction, I've seen examples of hash functions
that in some circumstances are _worse_ than average by a factor of
10 or more.  The bad examples just come up - I don't especially go
looking for them.  The small possible upside gain is basically
never worth the much larger potential downside risk.