Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.help > #1808 > unrolled thread

Overflowing HashMap

Started byRoedy Green <see_website@mindprod.com.invalid>
First post2012-06-03 07:39 -0700
Last post2012-06-03 21:13 -0700
Articles 12 — 7 participants

Back to article view | Back to comp.lang.java.help


Contents

  Overflowing HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-06-03 07:39 -0700
    Re: Overflowing HashMap Patricia Shanahan <pats@acm.org> - 2012-06-03 08:08 -0700
      Re: Overflowing HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-06-03 19:59 -0700
        Re: Overflowing HashMap Gene Wirchenko <genew@ocis.net> - 2012-06-04 10:25 -0700
    Re: Overflowing HashMap Knute Johnson <nospam@knutejohnson.com> - 2012-06-03 08:40 -0700
      Re: Overflowing HashMap David Lamb <dalamb@cs.queensu.ca> - 2012-06-03 12:55 -0400
        Re: Overflowing HashMap Lew <noone@lewscanon.com> - 2012-06-03 12:06 -0700
          Re: Overflowing HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-03 13:30 -0700
            Re: Overflowing HashMap Lew <noone@lewscanon.com> - 2012-06-03 13:52 -0700
      Re: Overflowing HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-06-03 20:51 -0700
        Re: Overflowing HashMap Patricia Shanahan <pats@acm.org> - 2012-06-03 20:59 -0700
          Re: Overflowing HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-06-03 21:13 -0700

#1808 — Overflowing HashMap

FromRoedy Green <see_website@mindprod.com.invalid>
Date2012-06-03 07:39 -0700
SubjectOverflowing HashMap
Message-ID<0rrms7198ugnfvv3otifhdccfogib2a1vk@4ax.com>
This obvious but it just bit me when I tried to spider and check links
on http://www.richarddawkins.net, a rather large site.

I used 64 bit java.exe and gave it 10MB of RAM to play with but it
died anyway.  

One of the problems is HashMaps, even in 64-bit JVMs, can still only
hold 1 << 30 elements.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Controlling complexity is the essence of computer programming.
~ Brian W. Kernighan 1942-01-01
.

[toc] | [next] | [standalone]


#1809

FromPatricia Shanahan <pats@acm.org>
Date2012-06-03 08:08 -0700
Message-ID<9ZednT4BhcJt4FbSnZ2dnUVZ_qudnZ2d@earthlink.com>
In reply to#1808
On 6/3/2012 7:39 AM, Roedy Green wrote:
> This obvious but it just bit me when I tried to spider and check links
> on http://www.richarddawkins.net, a rather large site.
>
> I used 64 bit java.exe and gave it 10MB of RAM to play with but it
> died anyway.

10MB?????

[toc] | [prev] | [next] | [standalone]


#1815

FromRoedy Green <see_website@mindprod.com.invalid>
Date2012-06-03 19:59 -0700
Message-ID<n29os7lvavsufoj6gknas7jo3of7g483i4@4ax.com>
In reply to#1809
On Sun, 03 Jun 2012 08:08:35 -0700, Patricia Shanahan <pats@acm.org>
wrote, quoted or indirectly quoted someone who said :

>10MB?????

oops 10 GB.  I am so old I can remember thinking that only the richest
people would afford to max out their 8080s with 64K.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Controlling complexity is the essence of computer programming.
~ Brian W. Kernighan 1942-01-01
.

[toc] | [prev] | [next] | [standalone]


#1822

FromGene Wirchenko <genew@ocis.net>
Date2012-06-04 10:25 -0700
Message-ID<nqrps79mq3qgddgu80nak8irtg5fcas8tq@4ax.com>
In reply to#1815
On Sun, 03 Jun 2012 19:59:42 -0700, Roedy Green
<see_website@mindprod.com.invalid> wrote:

>On Sun, 03 Jun 2012 08:08:35 -0700, Patricia Shanahan <pats@acm.org>
>wrote, quoted or indirectly quoted someone who said :
>
>>10MB?????
>
>oops 10 GB.  I am so old I can remember thinking that only the richest
>people would afford to max out their 8080s with 64K.

      Or 48K on one's TRS-80 Model I.  Bought cheaper than Radio
Shack's prices.  Only $150 for 16K.

Sincerely,

Gene Wirchenko

[toc] | [prev] | [next] | [standalone]


#1810

FromKnute Johnson <nospam@knutejohnson.com>
Date2012-06-03 08:40 -0700
Message-ID<jqg0h6$5hc$1@dont-email.me>
In reply to#1808
On 6/3/2012 7:39 AM, Roedy Green wrote:
> One of the problems is HashMaps, even in 64-bit JVMs, can still only
> hold 1<<  30 elements.

What does that mean?

-- 

Knute Johnson

[toc] | [prev] | [next] | [standalone]


#1811

FromDavid Lamb <dalamb@cs.queensu.ca>
Date2012-06-03 12:55 -0400
Message-ID<jqg4tj$hp$1@dont-email.me>
In reply to#1810
On 03/06/2012 11:40 AM, Knute Johnson wrote:
> On 6/3/2012 7:39 AM, Roedy Green wrote:
>> One of the problems is HashMaps, even in 64-bit JVMs, can still only
>> hold 1<< 30 elements.
>
> What does that mean?
>

Maybe hashmaps use only (most of) an int for indices?

[toc] | [prev] | [next] | [standalone]


#1812

FromLew <noone@lewscanon.com>
Date2012-06-03 12:06 -0700
Message-ID<jqgckd$gn2$1@news.albasani.net>
In reply to#1811
David Lamb wrote:
> Knute Johnson wrote:
>> Roedy Green wrote:
>>> One of the problems is HashMaps, even in 64-bit JVMs, can still only
>>> hold 1<< 30 elements.
>>
>> What does that mean?
>>
> Maybe hashmaps use only (most of) an int for indices?

Maps don't have indices.

But whether they're limited to 'int' range capacity is very easy to check.

Sure enough, right there in the source for 'HashMap', is

     transient Entry[] table;

But all of that is unnecessary effort. I can't believe I bothered, lazy as I 
am, given that the Javadocs guarantee that a 'Map' can hold at most 
'Integer.MAX_VALUE' elements usefully. (In principle there's no reason a 
'TreeMap' couldn't hold more than that, but it would likely become unuseful at 
that point. For example, 'putAll(sortedMapOverSized)' might surprise you.)

<http://docs.oracle.com/javase/7/docs/api/java/util/Map.html#size()>
<http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/HashMap.java>
<http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/TreeMap.java>

P.S., 'TreeMap' has an interesting expression to handle the infamous overflow 
bug for tree sorting/searching of which Josh Bloch has red-facedly written:

   int mid = (lo + hi) >>> 1;

P.P.S, Yes, I know that source comes from a pretty old Java version (b147), 
but that shouldn't matter here. That site didn't have newer and it's too well 
organized to ignore.

-- 
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg

[toc] | [prev] | [next] | [standalone]


#1813

FromDaniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date2012-06-03 13:30 -0700
Message-ID<iPPyr.56488$ax3.6389@newsfe05.iad>
In reply to#1812
On 6/3/12 12:06 PM, Lew wrote:
> David Lamb wrote:
>> Knute Johnson wrote:
>>> Roedy Green wrote:
>>>> One of the problems is HashMaps, even in 64-bit JVMs, can still only
>>>> hold 1<< 30 elements.
>>>
>>> What does that mean?
>>>
>> Maybe hashmaps use only (most of) an int for indices?
>
> Maps don't have indices.
>
> But whether they're limited to 'int' range capacity is very easy to check.
>
> Sure enough, right there in the source for 'HashMap', is
>
> transient Entry[] table;
>
> But all of that is unnecessary effort. I can't believe I bothered, lazy
> as I am, given that the Javadocs guarantee that a 'Map' can hold at most
> 'Integer.MAX_VALUE' elements usefully. (In principle there's no reason a
> 'TreeMap' couldn't hold more than that, but it would likely become
> unuseful at that point. For example, 'putAll(sortedMapOverSized)' might
> surprise you.)
>
> <http://docs.oracle.com/javase/7/docs/api/java/util/Map.html#size()>
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/HashMap.java>
>
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/TreeMap.java>
>
>
> P.S., 'TreeMap' has an interesting expression to handle the infamous
> overflow bug for tree sorting/searching of which Josh Bloch has
> red-facedly written:
>
> int mid = (lo + hi) >>> 1;
>
> P.P.S, Yes, I know that source comes from a pretty old Java version
> (b147), but that shouldn't matter here. That site didn't have newer and
> it's too well organized to ignore.
>

Actually, this is fine if 0 <= lo <= hi <= Integer.MAX_VALUE, even if lo 
= hi = Integer.MAX_VALUE. The ">>>" handles it, since that is an 
"unsigned" operation.  While lo+hi might be out of range for signed 
Integer, the 32bit twos-compliment signed value and the 32 bit unsigned 
value result in the same bit-pattern.

[toc] | [prev] | [next] | [standalone]


#1814

FromLew <noone@lewscanon.com>
Date2012-06-03 13:52 -0700
Message-ID<jqgipp$5fa$1@news.albasani.net>
In reply to#1813
Daniel Pitts wrote:
> Lew wrote:
>> P.S., 'TreeMap' has an interesting expression to handle the infamous
>> overflow bug for tree sorting/searching of which Josh Bloch has
>> red-facedly written:
>>
>> int mid = (lo + hi) >>> 1;
>>
>> ...
>
> Actually, this is fine if 0 <= lo <= hi <= Integer.MAX_VALUE, even if lo = hi
> = Integer.MAX_VALUE. The ">>>" handles it, since that is an "unsigned"
> operation. While lo+hi might be out of range for signed Integer, the 32bit
> twos-compliment signed value and the 32 bit unsigned value result in the same
> bit-pattern.

Yes, as I said, I found this an interesting way to handle it. In particular, 
it shows why there is the '>>>' operator and why one might wish to use it. It 
also highlights the importance of precision in reading code as well as writing 
it. It also is elegant.

-- 
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg

[toc] | [prev] | [next] | [standalone]


#1816

FromRoedy Green <see_website@mindprod.com.invalid>
Date2012-06-03 20:51 -0700
Message-ID<unbos7hos4sj70sm0isvl0pke6te6i014r@4ax.com>
In reply to#1810
On Sun, 03 Jun 2012 08:40:22 -0700, Knute Johnson
<nospam@knutejohnson.com> wrote, quoted or indirectly quoted someone
who said :

>> One of the problems is HashMaps, even in 64-bit JVMs, can still only
>> hold 1<<  30 elements.
>
>What does that mean?

see http://www.mathe-online.at/JavaCalc/jcintro.html
it will evaluate it for you to :
1,073,741,824
Note it is smaller than
Integer.MAX_VALUE
+2,147,483,647 [2^31-1]  aka 2 gig, roughly 2 billion

I "cheated" and looked inside HashMap source.  The docs lie. The max
value is controlled by a constant
MAX_CAPACITY = 1 << 30;

Internally it needs an array sized a power of 2 with some breathing
room to track when the various hash chains start. The power of two
restriction allows it to convert a hashCode to an array index with
just a shift rather than a divide/modulus.


-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Controlling complexity is the essence of computer programming.
~ Brian W. Kernighan 1942-01-01
.

[toc] | [prev] | [next] | [standalone]


#1817

FromPatricia Shanahan <pats@acm.org>
Date2012-06-03 20:59 -0700
Message-ID<OOWdnaKU06A9r1HSnZ2dnUVZ_oudnZ2d@earthlink.com>
In reply to#1816
On 6/3/2012 8:51 PM, Roedy Green wrote:
> On Sun, 03 Jun 2012 08:40:22 -0700, Knute Johnson
> <nospam@knutejohnson.com>  wrote, quoted or indirectly quoted someone
> who said :
>
>>> One of the problems is HashMaps, even in 64-bit JVMs, can still only
>>> hold 1<<   30 elements.
>>
>> What does that mean?
>
> see http://www.mathe-online.at/JavaCalc/jcintro.html
> it will evaluate it for you to :
> 1,073,741,824
> Note it is smaller than
> Integer.MAX_VALUE
> +2,147,483,647 [2^31-1]  aka 2 gig, roughly 2 billion
>
> I "cheated" and looked inside HashMap source.  The docs lie. The max
> value is controlled by a constant
> MAX_CAPACITY = 1<<  30;
>
> Internally it needs an array sized a power of 2 with some breathing
> room to track when the various hash chains start. The power of two
> restriction allows it to convert a hashCode to an array index with
> just a shift rather than a divide/modulus.
>
>

I thought that was a restriction on the number of buckets, not on the
number of entries.

Patricia

[toc] | [prev] | [next] | [standalone]


#1818

FromRoedy Green <see_website@mindprod.com.invalid>
Date2012-06-03 21:13 -0700
Message-ID<04dos7p5ru620sdqokjah6bhd1dbbvcfa0@4ax.com>
In reply to#1817
On Sun, 03 Jun 2012 20:59:48 -0700, Patricia Shanahan <pats@acm.org>
wrote, quoted or indirectly quoted someone who said :

>I thought that was a restriction on the number of buckets, not on the
>number of entries.

I am so used to having the number of buckets larger than the number of
entries I forgot that is an efficiency measure not an absolute
requirement to function.

 Given the way chain lengths explode when space gets tight, for all
practical purposes the max capacity is in the order of 1 << 30.

My first paid programming job back when I was 15 was writing a
Hashtable on punch cards in FORTRAN.  Back then a prime 149 was a
reasonable default number of buckets.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Controlling complexity is the essence of computer programming.
~ Brian W. Kernighan 1942-01-01
.

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.java.help


csiph-web