Path: csiph.com!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch Newsgroups: comp.lang.c Subject: Re: Good hash for pointers Date: Mon, 03 Jun 2024 17:01:26 -0700 Organization: A noiseless patient Spider Lines: 56 Message-ID: <86v82pmv8p.fsf@linuxsc.com> References: <86fru6gsqr.fsf@linuxsc.com> <8634q5hjsp.fsf@linuxsc.com> <86le3wfsmd.fsf@linuxsc.com> <86ed9ofq14.fsf@linuxsc.com> <86sexypvff.fsf@linuxsc.com> <20240602104506.000072e4@yahoo.com> <20240603174604.000014d4@yahoo.com> <20240603201646.0000319d@yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Date: Tue, 04 Jun 2024 02:01:27 +0200 (CEST) Injection-Info: dont-email.me; posting-host="2e3473276f70e973f0e20911b7f6b99d"; logging-data="112210"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+aNadqD+iHn7PM3xPUnD9k894vcN9eMFs=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:t6wj5hjhRwwAcHg8h3623GPNPDU= sha1:fJLF0rVOX3/PrJthwG4wOK4PHoo= Xref: csiph.com comp.lang.c:385496 Michael S writes: > On Mon, 3 Jun 2024 17:24:36 +0100 > bart wrote: > >> On 03/06/2024 16:54, Bonita Montero wrote: >> >>> Am 03.06.2024 um 16:46 schrieb Michael S: >>> >>>> On Mon, 3 Jun 2024 16:34:37 +0200 >>>> Bonita Montero wrote: >>>> >>>>> Am 02.06.2024 um 09:45 schrieb Michael S: >>>>> >>>>>> So, what were your conclusions? >>>>>> Ignoring the speed of computation, would something like >>>>>> cryptographic hash scaled to bucket size be a best hash for this >>>>>> type of application? Or some sort of less thorough grinding of >>>>>> the bits is better? >>>>> >>>>> There's no need for a crypto-hash here. >>>> >>>> Do you think I don't know? >>>> Crypto hash is just an example of near-ideal pseudo-random >>>> uniformity. >>> >>> As I've shown for pointers you get a perfect equal distribution with >>> multiplying by an appropriate prime. >> >> A pointer with 8-byte or 16-byte alignment will have the bottom 3-4 >> bits zero. >> >> No matter what number you multiply them by, prime or otherwise, those >> 3-4 bits will always be zero. >> >> If you mask the result to fit a table of size power-of-two, then the >> resulting index will can only ever refer to every 8th or every 16th >> slot; there will 8-16x as many clashes as there ought to be. >> >> So some extra work is needed to get around that, for example >> right-shifting before masking as some here have done, something you >> have never mentioned. > > According to my understanding, Bonita and Tim are discussing hash > generator which output is not used as is. They assume that index of > the slot will be calculated as (Hash(key)*bucket_size)/(Hash_max+1). To clarify, I have been talking about hash functions that deliver a "full sized" hash value, either 32 or 64 bits, without regard to how hash table management code might make use of that value. I think that's sort of what you said, but it seemed reasonable to offer a clarification. As a point of information, I've decided to adopt a policy of not reading any further postings from Bonita Montero, who seems to have nothing useful or interesting to say.