Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!border4.nntp.dca.giganews.com!border2.nntp.dca.giganews.com!nntp.giganews.com!nx01.iad01.newshosting.com!newshosting.com!news-out.readnews.com!transit3.readnews.com!news-out.news.tds.net!newsreading01.news.tds.net!53ab2750!not-for-mail From: "Peter Duniho" Subject: Re: hashCode Message-ID: <50269FD3.56658.calajapr@time.synchro.net> X-Comment-To: Roedy Green Newsgroups: comp.lang.java.programmer In-Reply-To: <50269FD1.56651.calajapr@time.synchro.net> References: <50269FD1.56651.calajapr@time.synchro.net> X-FTN-AREA: COMP.LANG.JAVA.PROGRAMMER X-FTN-MSGID: 1:261/38 abc1ae19 X-FTN-REPLY: 1:261/38 8dbf78a7 Content-Type: text/plain; charset=IBM437 Content-Transfer-Encoding: 8bit X-Gateway: time.synchro.net [Synchronet 3.16a-Win32 NewsLink 1.98] Lines: 53 Date: Sat, 11 Aug 2012 18:17:56 GMT NNTP-Posting-Host: 69.21.70.65 X-Complaints-To: news@tds.net X-Trace: newsreading01.news.tds.net 1344709076 69.21.70.65 (Sat, 11 Aug 2012 13:17:56 CDT) NNTP-Posting-Date: Sat, 11 Aug 2012 13:17:56 CDT Organization: tds.net Xref: csiph.com comp.lang.java.programmer:17711 To: Roedy Green From: Peter Duniho On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote: > On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew > wrote, quoted or indirectly quoted someone who said : > >> h =3D 31 * h + attribute.hashCode(); >> } > In my essay I recommend XOR which is an inherentely faster operation > than multiply. I wonder which actually works out better. If you had a > large number of fields, the multiply effect could fall off the left > hand end. It is the algorithm used for String which could have very > long strings, so Sun must have thought of that. Lew's example is better than XOR. I think you should fix your essay. There are of course even better approaches if one knows something specific about the input data. But the "multiply previous by prime, add next value" is a reasonably standard, "pretty good" composition function. XOR a well-known problems in the context of hash codes: in many situations, large numbers of combinations of values wind up mapping to the same result. Consider a pair of integers: any time those values are identical, the result of XOR is 0. On the other hand, with the multiply-and-add approach, it's much less common for "typical" integer values to combine to a hash value of 0. (Integers are an important scenario, because a 32-bit integer naturally is its own hash value...there's no reason to try to mix the bits any further as is done with other data types, so they don't). Now (unless I've done my math wrong) if your component hash codes are actually well-distributed across the entire range of a 32-bit integer (i.e. are completely uncorrelated to each other and have entirely random distribution), the XOR should work fine. It's just that it's very susceptible to creating hash value collisions when the input values aren't themselves well-distributed and so as a general technique it's not nearly as good as multiply-and-add. As far as performance goes, even if the multiply-and-add costs more (and as Joerg points out, this may only be a 1-cycle vs 2-cycle difference anyway on that specific operation), your code should not be spending a lot of time computing hash values in the first place. If that's a significant bottleneck, there are better ways to improve performance than worrying about XOR vs multiply-and-add, especially since a poor hash function can hurt performance much worse than using the wrong math operation would. Pete --- BBBS/Li6 v4.10 Dada-1 * Origin: Prism bbs (1:261/38) --- Synchronet 3.16a-Win32 NewsLink 1.98 Time Warp of the Future BBS - telnet://time.synchro.net:24