Path: csiph.com!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Mon, 03 Jun 2024 17:01:26 -0700
Organization: A noiseless patient Spider
Lines: 56
Message-ID: <86v82pmv8p.fsf@linuxsc.com>
References: <86fru6gsqr.fsf@linuxsc.com> <8634q5hjsp.fsf@linuxsc.com> <86le3wfsmd.fsf@linuxsc.com> <86ed9ofq14.fsf@linuxsc.com> <86sexypvff.fsf@linuxsc.com> <20240602104506.000072e4@yahoo.com> <20240603174604.000014d4@yahoo.com> <20240603201646.0000319d@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Tue, 04 Jun 2024 02:01:27 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2e3473276f70e973f0e20911b7f6b99d"; logging-data="112210"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+aNadqD+iHn7PM3xPUnD9k894vcN9eMFs="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:t6wj5hjhRwwAcHg8h3623GPNPDU= sha1:fJLF0rVOX3/PrJthwG4wOK4PHoo=
Xref: csiph.com comp.lang.c:385496
Michael S writes:
> On Mon, 3 Jun 2024 17:24:36 +0100
> bart wrote:
>
>> On 03/06/2024 16:54, Bonita Montero wrote:
>>
>>> Am 03.06.2024 um 16:46 schrieb Michael S:
>>>
>>>> On Mon, 3 Jun 2024 16:34:37 +0200
>>>> Bonita Montero wrote:
>>>>
>>>>> Am 02.06.2024 um 09:45 schrieb Michael S:
>>>>>
>>>>>> So, what were your conclusions?
>>>>>> Ignoring the speed of computation, would something like
>>>>>> cryptographic hash scaled to bucket size be a best hash for this
>>>>>> type of application? Or some sort of less thorough grinding of
>>>>>> the bits is better?
>>>>>
>>>>> There's no need for a crypto-hash here.
>>>>
>>>> Do you think I don't know?
>>>> Crypto hash is just an example of near-ideal pseudo-random
>>>> uniformity.
>>>
>>> As I've shown for pointers you get a perfect equal distribution with
>>> multiplying by an appropriate prime.
>>
>> A pointer with 8-byte or 16-byte alignment will have the bottom 3-4
>> bits zero.
>>
>> No matter what number you multiply them by, prime or otherwise, those
>> 3-4 bits will always be zero.
>>
>> If you mask the result to fit a table of size power-of-two, then the
>> resulting index will can only ever refer to every 8th or every 16th
>> slot; there will 8-16x as many clashes as there ought to be.
>>
>> So some extra work is needed to get around that, for example
>> right-shifting before masking as some here have done, something you
>> have never mentioned.
>
> According to my understanding, Bonita and Tim are discussing hash
> generator which output is not used as is. They assume that index of
> the slot will be calculated as (Hash(key)*bucket_size)/(Hash_max+1).
To clarify, I have been talking about hash functions that deliver
a "full sized" hash value, either 32 or 64 bits, without regard
to how hash table management code might make use of that value.
I think that's sort of what you said, but it seemed reasonable
to offer a clarification.
As a point of information, I've decided to adopt a policy of
not reading any further postings from Bonita Montero, who
seems to have nothing useful or interesting to say.