From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Tue, 04 Jun 2024 22:10:41 -0700
Message-ID: <86ed9cm0tq.fsf@linuxsc.com>

Michael S writes:

> On Mon, 03 Jun 2024 18:02:21 -0700
> Tim Rentsch wrote:
>
>> Michael S writes:
>>
>>
>>> I am less in axioms and more interested in your experimental
>>> findings.
>>
>> I'm not sure what you're looking for here.
>
> I'd give an example.
> You said that some of the variants had 4x differences between cases.
> From my perspective, if you found a hash function that performs up to 3
> times better* than "crypto-alike" hash in majority of tests and is 1.33x
> worse that "crypto-alike" in few other tests, it's something that I'd
> consider as valuable option.
>
> * - i.e. produces 3x less collisions at, say, occupation ratio of 0.7

Okay. For the sake of discussion let me call a "crypto-alike" hash
an "average hash" (meaning in some sense statistically average, or
never very far from random hash values).

Unfortunately the example you describe is something that won't
happen and can't happen.

I say it won't happen because for all the hash functions I've
looked at, I've never seen one that is "better than average" most
of the time and perhaps a little bit "worse than average" only
occasionally. The reverse situation pops up fairly often: a hash
function that is a little bit "better than average" in a small
number of cases, and "no better than average or worse than average
(sometimes by quite a lot)" in most cases.

For the second part, I say it can't happen because there isn't
enough headroom for the amount of performance improvement you
mention. For an occupancy rate of 0.7, an average hash function
using a rehashing approach uses only 1.7 probes per insertion (with
a minimum of 1) to fill the table. There is no way to get a
dramatic performance improvement. Even with an 85% load factor, an
average hash function takes just a little over 2 probes (roughly
2.2 probes) per value inserted.

Going the other direction, I've seen examples of hash functions
that in some circumstances are _worse_ than average by a factor of
10 or more. The bad examples just come up - I don't especially go
looking for them. The small possible upside gain is basically
never worth the much larger potential downside risk.