Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.c > #385084
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Newsgroups | comp.lang.c |
| Subject | Re: Good hash for pointers |
| Date | 2024-05-25 22:54 -0700 |
| Organization | A noiseless patient Spider |
| Message-ID | <86plt9f77w.fsf@linuxsc.com> (permalink) |
| References | (3 earlier) <v2r9br$2hva2$1@dont-email.me> <86fru6gsqr.fsf@linuxsc.com> <v2si1h$2rs39$1@dont-email.me> <86y17xg3qc.fsf@linuxsc.com> <v2tea9$30qg0$1@dont-email.me> |
bart <bc@freeuk.com> writes: > On 25/05/2024 19:12, Tim Rentsch wrote: > >> bart <bc@freeuk.com> writes: >> >>> On 25/05/2024 10:12, Tim Rentsch wrote: >>> >>>> bart <bc@freeuk.com> writes: >>>> [...] >>>> It looks like your hash function was tuned for this testing >>>> setup. With different choices for testing it does much >>>> worse. >>> >>> It wasn't tuned at all; it was a copy of what I've longed used >>> within my lexers. >>> >>> It is designed to give a good spread when inputs are similar or >>> change sequentially. (Well, not exactly designed; more trial >>> and error.) >> >> The trial and error process you mention had the effect of tuning >> your function for the testing setup used, whether you meant it to >> or not. >> >> >>>> The testing is done with malloc() blocks all of the same >>>> requested size, and that size is 1. Typical workloads >>>> are likely to be both larger and more variable. >>>> >>>> When adding a new entry finds a collision with an old >>>> entry, linear probing is used to find a free slot. It's >>>> well understood that linear probing suffers badly as the >>>> load factor goes up. Better to take a few high bits of >>>> the hash value -- as few as five or six is fine -- to >>>> have the reprobe sequences have different strides. >>>> >>>> Your hash function is expensive to compute, moreso even >>>> than the "FNV" function shown earlier. >>> >>> You mean Malcolm's version? That also uses a loop. Neither >>> need to process all 8 bytes of a 64-bit pointer; 6 will do, and >>> probably just 4. >>> And such a loop can be trivially unrolled. >> >> Both your function and the function posted by Malcolm are slow. >> It's just that yours is slower. > > With unoptimised code, mine is up to 50% faster (suggesting an > inherently faster method). > > Optimised via gcc -O3, it's 17% slower. On my test platform the Malcolm code runs between dead even up to about 44% faster, depending on optimization level, except for -O0 where it runs about 12% slower. Note that these results represent only one measurement of each case, run on only one target environment. > If I duplicate the tests in my language, which has a meagre optimiser, > then mine is 3 times the speed (I haven't looked at why; something > about the MM code that keeps locals out of registers I guess): > > Unopt gcc-O3 My lang opt (uses 64 bits) > > MM hash 2.7 0.34 1.9 seconds > BC hash 1.7 0.4 0.6 > > There are anyway any number of tweaks that can be made; no need to do > all 64 bits for a start, and the loops can be manually > unrolled. Suggesting it is slower is premature. I am simply reporting results of empirical observations made on my test system. Obviously different environments might produce different results. >> It isn't hard to produce a >> hash function of equal quality that runs more than twice as >> fast. > > With or without the aid of an optimising compiler? You can't always > tell with the latter whether it's you who's clever, or the compiler. As long as both functions are run at the same level of optimization I don't think it matters much. For the sake of concreteness we can stipulate code quality comparable to gcc -O1 if that helps you. >> Let me tell you a little secret. Hash functions are like random >> number generators: good ones never do exceptionally well or >> exceptionally poorly. The reason for this is simple - whenever >> there are scenarios where one does well then inevitably there are >> other scenarios where it does poorly. > The ones >> that are left are all, with high probability, just fine no matter >> what application they are used in. That is not to say there is >> no need to test within the intended application, it's always good >> to do a sanity check, but the point of the sanity check is just >> to make sure the earlier process didn't miss something; it isn't >> the right way to compare alternatives. > > So what's a good one for use within the identifier lookup of a > compiler's lexer? That depends a lot on what sort of lookup is done and on the degree to which the hash computation is integrated into the lexing code. How many entries will the hash table have? That also makes a difference. If the hash table is never more than 50% full then we can expect most lookups will succeed on the first try and almost all will succeed in no more than two tries.
Back to comp.lang.c | Previous | Next — Previous in thread | Next in thread | Find similar
Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-23 12:11 +0100
Re: Good hash for pointers Richard Harnden <richard.nospam@gmail.invalid> - 2024-05-23 15:37 +0100
Re: Good hash for pointers Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-05-23 15:51 -0700
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 00:42 +0100
Re: Good hash for pointers Kaz Kylheku <643-408-1753@kylheku.com> - 2024-05-23 20:34 +0000
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-23 15:49 -0700
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 00:43 +0100
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-23 16:52 -0700
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 01:28 +0100
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-23 18:39 -0700
Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-24 11:14 +0100
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 12:05 +0100
Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-05-24 10:49 -0700
Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-05-24 10:51 -0700
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-24 06:18 -0700
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 15:07 +0100
Re: Good hash for pointers scott@slp53.sl.home (Scott Lurndal) - 2024-05-24 14:51 +0000
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 02:49 -0700
Re: Good hash for pointers David Brown <david.brown@hesbynett.no> - 2024-05-24 17:00 +0200
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 17:10 +0100
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-05-24 19:27 +0300
Re: Good hash for pointers David Brown <david.brown@hesbynett.no> - 2024-05-24 09:41 +0200
Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-05-23 17:32 -0700
Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-05-23 18:59 -0700
Re: Good hash for pointers jak <nospam@please.ty> - 2024-05-24 04:09 +0200
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-24 20:28 +0200
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-24 19:57 +0100
Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-25 00:54 +0100
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 02:12 -0700
Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-25 12:28 +0100
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 11:12 -0700
Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-25 20:31 +0100
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 22:54 -0700
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-25 17:00 +0200
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 10:40 -0700
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-25 18:56 +0100
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-25 11:23 -0700
Re: Good hash for pointers Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-05-25 23:13 +0200
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-25 23:07 +0100
Re: Good hash for pointers bart <bc@freeuk.com> - 2024-05-25 23:42 +0100
Re: Good hash for pointers Richard Harnden <richard.nospam@gmail.invalid> - 2024-05-26 19:58 +0100
Re: Good hash for pointers Kaz Kylheku <643-408-1753@kylheku.com> - 2024-05-26 22:42 +0000
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 18:05 +0200
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 18:07 +0200
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 18:04 +0200
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-26 09:24 -0700
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 18:36 +0200
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-26 10:20 -0700
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 19:39 +0200
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 19:54 +0200
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-27 08:07 +0200
Re: Good hash for pointers Ben Bacarisse <ben@bsb.me.uk> - 2024-05-28 11:07 +0100
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-30 10:10 +0200
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-30 11:27 +0200
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-30 19:26 -0700
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-05-30 19:27 -0700
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-02 10:45 +0300
Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-06-02 12:42 -0700
Re: Good hash for pointers "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-06-03 12:35 -0700
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-06-02 16:02 -0700
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 10:50 +0300
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-06-03 18:02 -0700
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-04 11:38 +0300
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-06-04 22:10 -0700
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-06-03 16:34 +0200
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 17:46 +0300
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-06-03 17:54 +0200
Re: Good hash for pointers bart <bc@freeuk.com> - 2024-06-03 17:24 +0100
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 20:16 +0300
Re: Good hash for pointers bart <bc@freeuk.com> - 2024-06-03 19:48 +0100
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 22:41 +0300
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 22:51 +0300
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-06-03 16:51 -0700
Re: Good hash for pointers Tim Rentsch <tr.17687@z991.linuxsc.com> - 2024-06-03 17:01 -0700
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-06-03 20:25 +0200
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-03 19:50 +0300
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-06-03 20:31 +0200
Re: Good hash for pointers Michael S <already5chosen@yahoo.com> - 2024-06-05 00:59 +0300
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-06-05 11:10 +0200
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 20:06 +0200
Re: Good hash for pointers Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-05-26 20:10 +0100
Re: Good hash for pointers Bonita Montero <Bonita.Montero@gmail.com> - 2024-05-26 20:24 +0200
csiph-web