Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #17644 > unrolled thread

hashCode

Started by"bob smith" <bob.smith@1:261/38.remove-t9h-this>
First post2012-08-10 18:39 +0000
Last post2012-08-13 18:36 +0000
Articles 20 on this page of 52 — 30 participants

Back to article view | Back to comp.lang.java.programmer


Contents

  hashCode "bob smith" <bob.smith@1:261/38.remove-t9h-this> - 2012-08-10 18:39 +0000
    Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-t9h-this> - 2012-08-10 18:39 +0000
      Re: hashCode "markspace" <markspace@1:261/38.remove-t9h-this> - 2012-08-10 18:39 +0000
        Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-t9h-this> - 2012-08-10 18:39 +0000
          Re: hashCode "rossum" <rossum@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
    Re: hashCode "Eric Sosman" <eric.sosman@1:261/38.remove-t9h-this> - 2012-08-10 18:39 +0000
      Re: hashCode "bob smith" <bob.smith@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
        Re: hashCode "Lew" <lew@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
          Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
            Re: hashCode "Lew" <lew@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
              Re: hashCode "Arne Vajhøj" <������
høj@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
                Re: hashCode "Arne Vajhøj" <������
høj@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
                  Re: hashCode "Eric Sosman" <eric.sosman@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
                    Re: hashCode "Arne Vajhøj" <������
høj@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
        Re: hashCode "Eric Sosman" <eric.sosman@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
          Re: hashCode "Lew" <lew@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
            Re: hashCode "Arne Vajhøj" <������
høj@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
        Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
          Re: hashCode "Robert Klemme" <robert.klemme@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
            Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-nlb-this> - 2012-08-13 18:36 +0000
              Re: hashCode "Robert Klemme" <robert.klemme@1:261/38.remove-nlb-this> - 2012-08-13 18:36 +0000
                Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-fzq-this> - 2012-08-20 18:58 +0000
                Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-fzq-this> - 2012-08-20 18:58 +0000
        Re: hashCode "Patricia Shanahan" <patricia.shanahan@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
        Re: hashCode "Joshua Cranmer" <joshua.cranmer@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
          Re: hashCode "Patricia Shanahan" <patricia.shanahan@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
            Re: hashCode "Eric Sosman" <eric.sosman@1:261/38.remove-6fd-this> - 2012-08-12 20:00 +0000
              Re: hashCode "Patricia Shanahan" <patricia.shanahan@1:261/38.remove-6fd-this> - 2012-08-12 20:00 +0000
            Re: hashCode "Arne Vajhøj" <������
høj@1:261/38.remove-nlb-this> - 2012-08-13 18:36 +0000
          Re: hashCode "Jan Burse" <jan.burse@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
    Re: hashCode "Roedy Green" <roedy.green@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
      Re: hashCode "Lew" <lew@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
        Re: hashCode "Roedy Green" <roedy.green@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
          Re: hashCode "Joerg Meier" <joerg.meier@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
            Re: hashCode "Mike Winter" <mike.winter@1:261/38.remove-ya-this> - 2012-08-11 19:19 +0000
          Re: hashCode "Peter Duniho" <peter.duniho@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
          Re: hashCode "Jan Burse" <jan.burse@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
          Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
        Re: hashCode "rossum" <rossum@1:261/38.remove-ya-this> - 2012-08-11 19:19 +0000
      Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
        Re: hashCode "Eric Sosman" <eric.sosman@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
          Re: hashCode "Arne Vajhøj" <arne.vajhøj@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
    Re: hashCode "Jan Burse" <jan.burse@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
      Re: hashCode "Jan Burse" <jan.burse@1:261/38.remove-pjg-this> - 2012-08-11 18:17 +0000
      Re: hashCode "Lew" <lew@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
        Re: hashCode "Lew" <lew@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
        Re: hashCode "Arne Vajhøj" <������
høj@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
          Re: hashCode "Lew" <lew@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
      Re: hashCode "Jan Burse" <jan.burse@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
        Re: hashCode "Jan Burse" <jan.burse@1:261/38.remove-m2z-this> - 2012-08-12 18:58 +0000
        Re: hashCode "Lew" <lew@1:261/38.remove-6fd-this> - 2012-08-12 20:00 +0000
          Re: hashCode "Arne Vajhøj" <������
høj@1:261/38.remove-nlb-this> - 2012-08-13 18:36 +0000

Page 2 of 3 — ← Prev page 1 [2] 3  Next page →


#17836

From"Robert Klemme" <robert.klemme@1:261/38.remove-nlb-this>
Date2012-08-13 18:36 +0000
Message-ID<502943B8.56784.calajapr@time.synchro.net>
In reply to#17807
  To: Arne Vajhøj
From: Robert Klemme <shortcutter@googlemail.com>

On 12.08.2012 22:59, Arne Vajhoj wrote:
> On 8/12/2012 11:06 AM, Robert Klemme wrote:
>> On 11.08.2012 01:27, Arne Vajhoj wrote:
>>> On 8/10/2012 6:22 PM, bob smith wrote:
>>>> On Friday, August 10, 2012 11:34:28 AM UTC-5, Eric Sosman wrote:
>>>>> On 8/10/2012 11:47 AM, bob smith wrote:
>>>>>> Is it always technically correct to override the hashCode function
>>>>>> like so:
>>>>>>     @Override
>>>>>>     public int hashCode() {
>>>>>>         return 1;
>>>>>>     }
>>>>>> Would it be potentially better if that was Object's implementation?
>>>>>
>>>>>       Define "better."
>>>>
>>>> Better in the sense that you would never HAVE to override hashCode.
>>>>
>>>> Now, there are cases where you HAVE to override it, or your code is
>>>> very broken.
>>>
>>> It is not broken.
>>>
>>> It will perform poorly in many cases.
>>
>> Well, I would go as far as to say that it will perform poorly in all
>> cases where hashCode() is actually needed
>
> More or less.
>
>>                                        - and that makes it broken.
>
> This thread started about whether it is correct. The term correct
> has a very specific meaning in programming and that always return 1
> code is correct.
>
> Now you talk about broken.

Actually it wasn't me who brought up the term. :-)

> If broken means not good, then you are right.
> If broken means not correct, then you are wrong. I suspect
> that broken is not very well defined.

Right. :-)

>> The whole idea of hashing is based on the fact that the hash code
>> _somehow_ represents the item to be hashed.  If all items have the same
>> constant hash code there is no point in hashing at all.  So while it
>> does work, it does not work as intended.
>
> It disable the entire hashing functionality and a HashMap get
> characteristics of ArrayList.

An ArrayList allows multiple occurrences of the same instance - and does not 
store key value pairs.  That won't be the case with HashMap as equals() (if 
properly implemented) will prevent that (albeit slowly, or more correct: slower 
than with a proper implementation of hashCode()). Also, a HashMap will 
degenerate more to a LinkedList via the chaining of a bucket's entries.

> But the code should actually work.

Yes, sort of - depending on whether O(1) is a requirement or not. Still, the 
idea to implement hashCode() in Object in a way to return a constant to avoid 
necessity of overriding it is crazy - especially since you then cannot 
efficiently use Object as hash key - which you can today.

Kind regards

        robert


--
remember.guy do |as, often| as.you_can - without end 
http://blog.rubybestpractices.com/

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#18231

From"Arne Vajhøj" <arne.vajhøj@1:261/38.remove-fzq-this>
Date2012-08-20 18:58 +0000
Message-ID<50327C3B.57156.calajapr@time.synchro.net>
In reply to#17836
  To: Robert Klemme
From: Arne Vajhoj <arne@vajhoej.dk>

On 8/13/2012 1:17 PM, Robert Klemme wrote:
> On 12.08.2012 22:59, Arne Vajhoj wrote:
>> On 8/12/2012 11:06 AM, Robert Klemme wrote:
>>> The whole idea of hashing is based on the fact that the hash code
>>> _somehow_ represents the item to be hashed.  If all items have the same
>>> constant hash code there is no point in hashing at all.  So while it
>>> does work, it does not work as intended.
>>
>> It disable the entire hashing functionality and a HashMap get
>> characteristics of ArrayList.
>
> An ArrayList allows multiple occurrences of the same instance - and does
> not store key value pairs.  That won't be the case with HashMap as
> equals() (if properly implemented) will prevent that (albeit slowly, or
> more correct: slower than with a proper implementation of hashCode()).
> Also, a HashMap will degenerate more to a LinkedList via the chaining of
> a bucket's entries.

I guess my statement was a bit misleading.

... get O(1) characteristics for getting data similar to various List 
implementation.

Arne

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#18232

From"Arne Vajhøj" <arne.vajhøj@1:261/38.remove-fzq-this>
Date2012-08-20 18:58 +0000
Message-ID<50327C3B.57157.calajapr@time.synchro.net>
In reply to#17836
  To: Robert Klemme
From: Arne Vajhoj <arne@vajhoej.dk>

On 8/13/2012 1:17 PM, Robert Klemme wrote:
> On 12.08.2012 22:59, Arne Vajhoj wrote:
>> But the code should actually work.
>
> Yes, sort of - depending on whether O(1) is a requirement or not.

That is a rare functional requirement.

(common non-functional)

>                                                                 Still,
> the idea to implement hashCode() in Object in a way to return a constant
> to avoid necessity of overriding it is crazy - especially since you then
> cannot efficiently use Object as hash key - which you can today.

Actually the more I think about the question the more logical it sounds!

We are looking at two alternatives: A) default functions properly but good 
performance requires an override B) default gives good performance but may need 
an override to function properly

#A actually sounds more the Java way to me.

Arne

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17756

From"Patricia Shanahan" <patricia.shanahan@1:261/38.remove-m2z-this>
Date2012-08-12 18:58 +0000
Message-ID<5027F2CB.56701.calajapr@time.synchro.net>
In reply to#17691
  To: bob smith
From: Patricia Shanahan <pats@acm.org>

On 8/10/2012 3:22 PM, bob smith wrote: ...
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very
broken.
>

I've decided to go back to this message, because I feel the key issue is the 
circumstances under which hashCode would need to be overridden if Object's 
version returned a constant, compared to the current situation.

If Object's hashCode returned a constant, in practice anyone using an object as 
a key in a hash structure would want it overridden with one that has at least 
some chance of using multiple buckets. Without that property, a HashMap is an 
over-complicated, space-wasting cousin of a linked list.

The problem with this is that the programmer who knows that Widget instances 
are going to be used as HashMap keys does not necessarily control the Widget 
implementation. The programmer writing Widget has no idea whether it will ever 
be used as a HashMap key, and therefore no way of knowing whether it is safe, 
assuming Widget inherits the Object equals, to also inherit the Object 
hashCode.

Now compare to the current situation. The programmer implementing Widget 
decides whether to inherit a superclass equals or to write a Widget-specific 
equals. That programmer can assume the superclass has a hashCode that would be 
effective for a HashMap key, and only has to override hashCode if they are 
overriding equals.

In practice, it is a long time since I've written a hashCode manually. 
Generally, when I decide to override equals I tell Eclipse to generate an 
equals/hashCode pair based on the fields that control whether two instances are 
equal. Overriding hashCode is no additional work given that I would be telling 
Eclipse to generate an equals based on those fields anyway.

To me, the current situation seems "better".

Patricia

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17758

From"Joshua Cranmer" <joshua.cranmer@1:261/38.remove-m2z-this>
Date2012-08-12 18:58 +0000
Message-ID<5027F2CB.56703.calajapr@time.synchro.net>
In reply to#17691
  To: bob smith
From: Joshua Cranmer <Pidgeot18@verizon.invalid>

On 8/10/2012 6:22 PM, bob smith wrote:
> Better in the sense that you would never HAVE to override hashCode.
>
> Now, there are cases where you HAVE to override it, or your code is very
broken.

Returning a constant hash code is correct in the same sense that answering 
"yes" to the question "Can you tell me the correct way to do this?" would 
be--syntactically and semantically correct, but completely contrary to the 
actual intent of the question.

The point of the hash code is to provide a cheap way to quickly distinguish 
inputs (in the sense that Pr(a.hashCode() == b.hashCode() and !a.equals(b)) 
should be as small as possible [1]). A constant-value hash completely negates 
the purpose of the hash code, and this renders the hashCode again completely 
unusable for anything that actually wants to use it.

In the default case, a.hashCode() == b.hashCode() only when a == b (this 
definitely holds true with 32-bit machines and I'm pretty sure it still holds 
true with 64-bit machines, but I'd have to reverify the JVM source code to be 
certain). It is thus correct so long as identity equals is correct. It is also 
potentially correct in a limited set of cases where a.equals(b) and a != b. In 
all of these cases, it would not only be correct but also extremely useful, 
having pretty strong guarantees about the distribution of hash values.

[1] Actually, for good performance, hash codes should go one step further and 
make slightly stronger guarantees about independence with respect to the size 
of the hash table. But I digress.

--
Beware of bugs in the above code; I have only proved it correct, not tried it. 
-- Donald E. Knuth

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17759

From"Patricia Shanahan" <patricia.shanahan@1:261/38.remove-m2z-this>
Date2012-08-12 18:58 +0000
Message-ID<5027F2CB.56704.calajapr@time.synchro.net>
In reply to#17758
  To: Joshua Cranmer
From: Patricia Shanahan <pats@acm.org>

On 8/12/2012 9:23 AM, Joshua Cranmer wrote:
> On 8/10/2012 6:22 PM, bob smith wrote:
>> Better in the sense that you would never HAVE to override hashCode.
>>
>> Now, there are cases where you HAVE to override it, or your code is
>> very broken.
>
> Returning a constant hash code is correct in the same sense that
> answering "yes" to the question "Can you tell me the correct way to do
> this?" would be--syntactically and semantically correct, but completely
> contrary to the actual intent of the question.
>
> The point of the hash code is to provide a cheap way to quickly
> distinguish inputs (in the sense that Pr(a.hashCode() == b.hashCode()
> and !a.equals(b)) should be as small as possible [1]). A constant-value
> hash completely negates the purpose of the hash code, and this renders
> the hashCode again completely unusable for anything that actually wants
> to use it.
>
> In the default case, a.hashCode() == b.hashCode() only when a == b (this
> definitely holds true with 32-bit machines and I'm pretty sure it still
> holds true with 64-bit machines, but I'd have to reverify the JVM source
> code to be certain). It is thus correct so long as identity equals is
> correct. It is also potentially correct in a limited set of cases where
> a.equals(b) and a != b. In all of these cases, it would not only be
> correct but also extremely useful, having pretty strong guarantees about
> the distribution of hash values.
>
> [1] Actually, for good performance, hash codes should go one step
> further and make slightly stronger guarantees about independence with
> respect to the size of the hash table. But I digress.
>

I think there are two reasonably usable ways of handling this issue. One is the 
current arrangement, in which every class has a hashCode that is expected to be 
usable for selecting a hash table bucket.

Keeping hashCode as an Object method but making it useless for bucket selection 
unless overridden would not be a good alternative.

A more reasonable alternative would be to have hashCode as the only member of a 
HashKey interface that would be implemented by every class whose objects are 
intended to be suitable for use as has keys. Those objects that have a hashCode 
would still have to have a usable one, but some classes would not implement 
HashKey and not have a hashCode at all.

Patricia

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17761

From"Eric Sosman" <eric.sosman@1:261/38.remove-6fd-this>
Date2012-08-12 20:00 +0000
Message-ID<5027FE12.56709.calajapr@time.synchro.net>
In reply to#17759
  To: Patricia Shanahan
From: Eric Sosman <esosman@ieee-dot-org.invalid>

On 8/12/2012 12:40 PM, Patricia Shanahan wrote:
>[...]
> I think there are two reasonably usable ways of handling this issue. One
> is the current arrangement, in which every class has a hashCode that is
> expected to be usable for selecting a hash table bucket.
>
> Keeping hashCode as an Object method but making it useless for bucket
> selection unless overridden would not be a good alternative.
>
> A more reasonable alternative would be to have hashCode as the only
> member of a HashKey interface that would be implemented by every class
> whose objects are intended to be suitable for use as has keys. Those
> objects that have a hashCode would still have to have a usable one, but
> some classes would not implement HashKey and not have a hashCode at all.

     Ugh.  So if J. Random Programmer is too lazy or unimaginative to
write hashCode(), that means I can't use his class as a HashMap key, or even 
put instances in a HashSet?  Ugh, again.

     (And, no: I don't think a HashCalculator interface along the lines
of Comparable would save the day.)

--
Eric Sosman
esosman@ieee-dot-org.invalid

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17762

From"Patricia Shanahan" <patricia.shanahan@1:261/38.remove-6fd-this>
Date2012-08-12 20:00 +0000
Message-ID<5027FE12.56710.calajapr@time.synchro.net>
In reply to#17761
  To: Eric Sosman
From: Patricia Shanahan <pats@acm.org>

On 8/12/2012 10:59 AM, Eric Sosman wrote:
> On 8/12/2012 12:40 PM, Patricia Shanahan wrote:
>> [...]
>> I think there are two reasonably usable ways of handling this issue. One
>> is the current arrangement, in which every class has a hashCode that is
>> expected to be usable for selecting a hash table bucket.
>>
>> Keeping hashCode as an Object method but making it useless for bucket
>> selection unless overridden would not be a good alternative.
>>
>> A more reasonable alternative would be to have hashCode as the only
>> member of a HashKey interface that would be implemented by every class
>> whose objects are intended to be suitable for use as has keys. Those
>> objects that have a hashCode would still have to have a usable one, but
>> some classes would not implement HashKey and not have a hashCode at all.
>
>      Ugh.  So if J. Random Programmer is too lazy or unimaginative to
> write hashCode(), that means I can't use his class as a HashMap key,
> or even put instances in a HashSet?  Ugh, again.
>
>      (And, no: I don't think a HashCalculator interface along the lines
> of Comparable would save the day.)
>

I'm not saying that it would be better than the current situation, just better 
than having hashCode implementations that appear to be there, but in practice 
must not be used for hash bucket selection.

Patricia

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17811

From"Arne Vajhøj" <������ høj@1:261/38.remove-nlb-this>
Date2012-08-13 18:36 +0000
Message-ID<502943B4.56759.calajapr@time.synchro.net>
In reply to#17759
  To: Patricia Shanahan
From: =?UTF-8?B?QXJuZSBWYWpow7hq?= <arne@vajhoej.dk>

On 8/12/2012 12:40 PM, Patricia Shanahan wrote:
> Keeping hashCode as an Object method but making it useless for bucket
> selection unless overridden would not be a good alternative.
>
> A more reasonable alternative would be to have hashCode as the only
> member of a HashKey interface that would be implemented by every class
> whose objects are intended to be suitable for use as has keys. Those
> objects that have a hashCode would still have to have a usable one, but
> some classes would not implement HashKey and not have a hashCode at all.

That would avoid using the hash for classes where it does not make much sense.

I like it.

Given that there is a gazillion lines of code out there that stores Object, 
then it can not be changed.

Arne

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17760

From"Jan Burse" <jan.burse@1:261/38.remove-m2z-this>
Date2012-08-12 18:58 +0000
Message-ID<5027F2CB.56705.calajapr@time.synchro.net>
In reply to#17758
  To: Joshua Cranmer
From: Jan Burse <janburse@fastmail.fm>

Joshua Cranmer schrieb:
>
> In the default case, a.hashCode() == b.hashCode() only when a == b (this
> definitely holds true with 32-bit machines and I'm pretty sure it still
> holds true with 64-bit machines, but I'd have to reverify the JVM source
> code to be certain).

Not at all. Even not for 32-bit machines. Not only because the hashCode() 
usually uses less than 32 bits. But also for other reasons, namely the internal 
algorithm how the hash is produced (although the below bug doesn't reveal much 
internals):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873

But in the bottom of the above bug you can find code that searches for a clash. 
It can take quite a while, but it is not excluded. This is what a sample run 
produced according to the bug:

62028120: java.lang.Object@9cab16 - java.lang.Object@9cab16

So the 62028120-th new Object produced the same hashcode.

So

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17685

From"Roedy Green" <roedy.green@1:261/38.remove-pjg-this>
Date2012-08-11 18:17 +0000
Message-ID<50269FCD.56632.calajapr@time.synchro.net>
In reply to#17644
  To: bob smith
From: Roedy Green <see_website@mindprod.com.invalid>

On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
<bob@coolfone.comze.com> wrote, quoted or indirectly quoted someone
who said :

>       @Override
>       public int hashCode() {
>               return 1;
>       }

that's about the worst possible hashCode function.

See http://mindprod.com/jgloss/hashcode.html for tips on how to write good 
ones.
--
Roedy Green Canadian Mind Products http://mindprod.com A new scientific truth 
does not triumph by convincing its opponents and making them see the light,
but rather because its opponents eventually die, and a new generation grows up 
that is familiar with it.
~ Max Planck 1858-04-23 1947-10-04

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17686

From"Lew" <lew@1:261/38.remove-pjg-this>
Date2012-08-11 18:17 +0000
Message-ID<50269FCD.56633.calajapr@time.synchro.net>
In reply to#17685
  To: Roedy Green
From: Lew <lewbloch@gmail.com>

Roedy Green wrote:
> bob smith wrote, quoted or indirectly quoted someone
> who said :
>
>>      @Override
> >     public int hashCode() {
> >             return 1;
> >     }
>
> that's about the worst possible hashCode function.

Normally that's correct, but it's conceivable that one might do it for some 
hackish reason. In most situations where one might do such an override as this, 
one would do better not to override hashCode().

> See http://mindprod.com/jgloss/hashcode.html for tips on how to write
> good ones.

The default of assembling it via the mix-in of attribute hash codes using the 
Knuth constants is usually good enough.

h(object) = Sum(i rnn 0.. n-1) of ( attribute[i] * pow(31, n - 1 - i) );

or

public static int calculateHash(Foo arg) {
   int h = 0;

   for ( each attribute of Foo that contributes to 'equals()' )
   {
        h = 31 * h + attribute.hashCode();
   }
   return h;
}

http://en.wikipedia.org/wiki/Hash_function
http://en.wikipedia.org/wiki/Java_hashCode()
http://en.wikipedia.org/wiki/Java_hashCode()#Java

--
Lew

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17703

From"Roedy Green" <roedy.green@1:261/38.remove-pjg-this>
Date2012-08-11 18:17 +0000
Message-ID<50269FD1.56651.calajapr@time.synchro.net>
In reply to#17686
  To: Lew
From: Roedy Green <see_website@mindprod.com.invalid>

On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewbloch@gmail.com> wrote, 
quoted or indirectly quoted someone who said :

>        h =3D 31 * h + attribute.hashCode();
>   }
In my essay I recommend XOR which is an inherentely  faster operation than 
multiply.  I wonder which actually works out better. If you had a large number 
of fields, the multiply effect could fall off the left hand end.  It is the 
algorithm used for String which could have  very long strings, so Sun must have 
thought of that.
--
Roedy Green Canadian Mind Products http://mindprod.com A new scientific truth 
does not triumph by convincing its opponents and making them see the light,
but rather because its opponents eventually die, and a new generation grows up 
that is familiar with it.
~ Max Planck 1858-04-23 1947-10-04

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17709

From"Joerg Meier" <joerg.meier@1:261/38.remove-pjg-this>
Date2012-08-11 18:17 +0000
Message-ID<50269FD2.56657.calajapr@time.synchro.net>
In reply to#17703
  To: Roedy Green
From: Joerg Meier <joergmmeier@arcor.de>

On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:

> On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewbloch@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>>        h =3D 31 * h + attribute.hashCode();
>>   }
> In my essay I recommend XOR which is an inherentely  faster operation
> than multiply.

Hasn't that been wrong since about the invention of the 80386 processor family 
? Pretty sure by now MUL and XOR both take one cycle and that's it.


Liebe Gruesse,
                Joerg

--
Ich lese meine Emails nicht, replies to Email bleiben also leider ungelesen.

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17714

From"Mike Winter" <mike.winter@1:261/38.remove-ya-this>
Date2012-08-11 19:19 +0000
Message-ID<5026AEE3.56663.calajapr@time.synchro.net>
In reply to#17709
  To: Joerg Meier
From: Mike Winter <usenet@michael-winter.me.invalid>

On 11/08/2012 17:25, Joerg Meier wrote:
> On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:
>[...]
>> In my essay I recommend XOR which is an inherentely  faster operation
>> than multiply.
>
> Hasn't that been wrong since about the invention of the 80386 processor
> family ?

Not that far back: the Pentium required 9-11 cycles to complete a MUL 
instruction compared to 1-3 for XOR (and the like), depending on operand 
locations and widths.

> Pretty sure by now MUL and XOR both take one cycle and that's it.

More-or-less, but the former is still slower for wider operands. However, your 
point is well-taken: it needn't be as much a concern in most cases.

--
Mike Winter
Replace ".invalid" with ".uk" to reply by e-mail.

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17711

From"Peter Duniho" <peter.duniho@1:261/38.remove-pjg-this>
Date2012-08-11 18:17 +0000
Message-ID<50269FD3.56658.calajapr@time.synchro.net>
In reply to#17703
  To: Roedy Green
From: Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com>

On Sat, 11 Aug 2012 04:54:09 -0700, Roedy Green wrote:

> On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewbloch@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>
>>        h =3D 31 * h + attribute.hashCode();
>>   }
> In my essay I recommend XOR which is an inherentely  faster operation
> than multiply.  I wonder which actually works out better. If you had a
> large number of fields, the multiply effect could fall off the left
> hand end.  It is the algorithm used for String which could have  very
> long strings, so Sun must have thought of that.

Lew's example is better than XOR. I think you should fix your essay.

There are of course even better approaches if one knows something specific 
about the input data. But the "multiply previous by prime, add next value" is a 
reasonably standard, "pretty good" composition function.

XOR a well-known problems in the context of hash codes: in many situations, 
large numbers of combinations of values wind up mapping to the same result. 
Consider a pair of integers: any time those values are identical, the result of 
XOR is 0. On the other hand, with the multiply-and-add approach, it's much less 
common for "typical" integer values to combine to a hash value of 0.

(Integers are an important scenario, because a 32-bit integer naturally is its 
own hash value...there's no reason to try to mix the bits any further as is 
done with other data types, so they don't).

Now (unless I've done my math wrong) if your component hash codes are actually 
well-distributed across the entire range of a 32-bit integer (i.e. are 
completely uncorrelated to each other and have entirely random distribution), 
the XOR should work fine.  It's just that it's very susceptible to creating 
hash value collisions when the input values aren't themselves well-distributed 
and so as a general technique it's not nearly as good as multiply-and-add.

As far as performance goes, even if the multiply-and-add costs more (and as 
Joerg points out, this may only be a 1-cycle vs 2-cycle difference anyway on 
that specific operation), your code should not be spending a lot of time 
computing hash values in the first place.  If that's a significant bottleneck, 
there are better ways to improve performance than worrying about XOR vs 
multiply-and-add, especially since a poor hash function can hurt performance 
much worse than using the wrong math operation would.

Pete

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17712

From"Jan Burse" <jan.burse@1:261/38.remove-pjg-this>
Date2012-08-11 18:17 +0000
Message-ID<50269FD3.56659.calajapr@time.synchro.net>
In reply to#17703
  To: Roedy Green
From: Jan Burse <janburse@fastmail.fm>

Roedy Green schrieb:
> If you had a
> large number of fields, the multiply effect could fall off the left
> hand end.

Actually this does not happen, since you multiply with 31, which is 1+2+4+8+16. 
So that:

     a*31+b = a*16+a*8+a*4+a*2+a+b

So for a HashMap that uses an index = hash & (2^n - 1) (which is the same as 
hash mod 2^n), the impact of a will be still seen, even when it occurs at the 
very left hand side.

There is some Microsoft C# HashMap implementation which does not use mod 2^n, 
but instead some primes. In case the implementation choses 31 as the designated 
prime, all information but for the first field will be lost. But since mod 2^32 
is also applied, this might not be completely true.

For 2^n I don't know exactly how the impact could be described. I guess in a 
HashMap with index = hash mod 2^1 the hash amounts to a parity bit, since the 
sum in a+b acts like an xor on the first right hand bit. But 2^n with n>1 the 
31 multiplication is a little more crude.

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17750

From"Arne Vajhøj" <arne.vajhøj@1:261/38.remove-m2z-this>
Date2012-08-12 18:58 +0000
Message-ID<5027F2C9.56694.calajapr@time.synchro.net>
In reply to#17703
  To: Roedy Green
From: Arne Vajhoj <arne@vajhoej.dk>

On 8/11/2012 7:54 AM, Roedy Green wrote:
> On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewbloch@gmail.com>
> wrote, quoted or indirectly quoted someone who said :
>
>>         h =3D 31 * h + attribute.hashCode();
>>    }
> In my essay I recommend XOR which is an inherentely  faster operation
> than multiply.  I wonder which actually works out better.

Multiply.

XOR has several problems:
- many small values give small result
- same values in different fields give same result
- two identical values give result zero
+ all those I did not think of.

>                                                     If you had a
> large number of fields, the multiply effect could fall off the left
> hand end.  It is the algorithm used for String which could have  very
> long strings, so Sun must have thought of that.

The multiply effect does not fall off the left with a value like 31 (it would 
with 32).

Arne

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17713

From"rossum" <rossum@1:261/38.remove-ya-this>
Date2012-08-11 19:19 +0000
Message-ID<5026AEE2.56662.calajapr@time.synchro.net>
In reply to#17686
  To: Lew
From: rossum <rossum48@coldmail.com>

On Fri, 10 Aug 2012 12:45:07 -0700 (PDT), Lew <lewbloch@gmail.com> wrote:

>public static int calculateHash(Foo arg) {
>   int h = 0;
>
>   for ( each attribute of Foo that contributes to 'equals()' )
>   {
>        h = 31 * h + attribute.hashCode();
>   }
>   return h;
>}
Bloch starts with:

  int h = 17;

He says that works beter in cases where the first one or more 
attribute.hashCode() values are zero, and hence will not register.

He suggessts any constant non-zero value.

rossum

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


#17698

From"Arne Vajhøj" <arne.vajhøj@1:261/38.remove-pjg-this>
Date2012-08-11 18:17 +0000
Message-ID<50269FD0.56645.calajapr@time.synchro.net>
In reply to#17685
  To: Roedy Green
From: Arne Vajhoj <arne@vajhoej.dk>

On 8/10/2012 3:17 PM, Roedy Green wrote:
> On Fri, 10 Aug 2012 08:47:12 -0700 (PDT), bob smith
> <bob@coolfone.comze.com> wrote, quoted or indirectly quoted someone
> who said :
>
>>      @Override
>>      public int hashCode() {
>>              return 1;
>>      }
>
> that's about the worst possible hashCode function.

Yes, but the posted asked "Is it always technically correct to ..." not whether 
is was "best possible".

Arne

--- BBBS/Li6 v4.10 Dada-1
 * Origin: Prism bbs (1:261/38)
--- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

[toc] | [prev] | [next] | [standalone]


Page 2 of 3 — ← Prev page 1 [2] 3  Next page →

Back to top | Article view | comp.lang.java.programmer


csiph-web