Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.help > #1808 > unrolled thread
| Started by | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| First post | 2012-06-03 07:39 -0700 |
| Last post | 2012-06-03 21:13 -0700 |
| Articles | 12 — 7 participants |
Back to article view | Back to comp.lang.java.help
Overflowing HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-06-03 07:39 -0700
Re: Overflowing HashMap Patricia Shanahan <pats@acm.org> - 2012-06-03 08:08 -0700
Re: Overflowing HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-06-03 19:59 -0700
Re: Overflowing HashMap Gene Wirchenko <genew@ocis.net> - 2012-06-04 10:25 -0700
Re: Overflowing HashMap Knute Johnson <nospam@knutejohnson.com> - 2012-06-03 08:40 -0700
Re: Overflowing HashMap David Lamb <dalamb@cs.queensu.ca> - 2012-06-03 12:55 -0400
Re: Overflowing HashMap Lew <noone@lewscanon.com> - 2012-06-03 12:06 -0700
Re: Overflowing HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-03 13:30 -0700
Re: Overflowing HashMap Lew <noone@lewscanon.com> - 2012-06-03 13:52 -0700
Re: Overflowing HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-06-03 20:51 -0700
Re: Overflowing HashMap Patricia Shanahan <pats@acm.org> - 2012-06-03 20:59 -0700
Re: Overflowing HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-06-03 21:13 -0700
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-06-03 07:39 -0700 |
| Subject | Overflowing HashMap |
| Message-ID | <0rrms7198ugnfvv3otifhdccfogib2a1vk@4ax.com> |
This obvious but it just bit me when I tried to spider and check links on http://www.richarddawkins.net, a rather large site. I used 64 bit java.exe and gave it 10MB of RAM to play with but it died anyway. One of the problems is HashMaps, even in 64-bit JVMs, can still only hold 1 << 30 elements. -- Roedy Green Canadian Mind Products http://mindprod.com Controlling complexity is the essence of computer programming. ~ Brian W. Kernighan 1942-01-01 .
[toc] | [next] | [standalone]
| From | Patricia Shanahan <pats@acm.org> |
|---|---|
| Date | 2012-06-03 08:08 -0700 |
| Message-ID | <9ZednT4BhcJt4FbSnZ2dnUVZ_qudnZ2d@earthlink.com> |
| In reply to | #1808 |
On 6/3/2012 7:39 AM, Roedy Green wrote: > This obvious but it just bit me when I tried to spider and check links > on http://www.richarddawkins.net, a rather large site. > > I used 64 bit java.exe and gave it 10MB of RAM to play with but it > died anyway. 10MB?????
[toc] | [prev] | [next] | [standalone]
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-06-03 19:59 -0700 |
| Message-ID | <n29os7lvavsufoj6gknas7jo3of7g483i4@4ax.com> |
| In reply to | #1809 |
On Sun, 03 Jun 2012 08:08:35 -0700, Patricia Shanahan <pats@acm.org> wrote, quoted or indirectly quoted someone who said : >10MB????? oops 10 GB. I am so old I can remember thinking that only the richest people would afford to max out their 8080s with 64K. -- Roedy Green Canadian Mind Products http://mindprod.com Controlling complexity is the essence of computer programming. ~ Brian W. Kernighan 1942-01-01 .
[toc] | [prev] | [next] | [standalone]
| From | Gene Wirchenko <genew@ocis.net> |
|---|---|
| Date | 2012-06-04 10:25 -0700 |
| Message-ID | <nqrps79mq3qgddgu80nak8irtg5fcas8tq@4ax.com> |
| In reply to | #1815 |
On Sun, 03 Jun 2012 19:59:42 -0700, Roedy Green
<see_website@mindprod.com.invalid> wrote:
>On Sun, 03 Jun 2012 08:08:35 -0700, Patricia Shanahan <pats@acm.org>
>wrote, quoted or indirectly quoted someone who said :
>
>>10MB?????
>
>oops 10 GB. I am so old I can remember thinking that only the richest
>people would afford to max out their 8080s with 64K.
Or 48K on one's TRS-80 Model I. Bought cheaper than Radio
Shack's prices. Only $150 for 16K.
Sincerely,
Gene Wirchenko
[toc] | [prev] | [next] | [standalone]
| From | Knute Johnson <nospam@knutejohnson.com> |
|---|---|
| Date | 2012-06-03 08:40 -0700 |
| Message-ID | <jqg0h6$5hc$1@dont-email.me> |
| In reply to | #1808 |
On 6/3/2012 7:39 AM, Roedy Green wrote: > One of the problems is HashMaps, even in 64-bit JVMs, can still only > hold 1<< 30 elements. What does that mean? -- Knute Johnson
[toc] | [prev] | [next] | [standalone]
| From | David Lamb <dalamb@cs.queensu.ca> |
|---|---|
| Date | 2012-06-03 12:55 -0400 |
| Message-ID | <jqg4tj$hp$1@dont-email.me> |
| In reply to | #1810 |
On 03/06/2012 11:40 AM, Knute Johnson wrote: > On 6/3/2012 7:39 AM, Roedy Green wrote: >> One of the problems is HashMaps, even in 64-bit JVMs, can still only >> hold 1<< 30 elements. > > What does that mean? > Maybe hashmaps use only (most of) an int for indices?
[toc] | [prev] | [next] | [standalone]
| From | Lew <noone@lewscanon.com> |
|---|---|
| Date | 2012-06-03 12:06 -0700 |
| Message-ID | <jqgckd$gn2$1@news.albasani.net> |
| In reply to | #1811 |
David Lamb wrote:
> Knute Johnson wrote:
>> Roedy Green wrote:
>>> One of the problems is HashMaps, even in 64-bit JVMs, can still only
>>> hold 1<< 30 elements.
>>
>> What does that mean?
>>
> Maybe hashmaps use only (most of) an int for indices?
Maps don't have indices.
But whether they're limited to 'int' range capacity is very easy to check.
Sure enough, right there in the source for 'HashMap', is
transient Entry[] table;
But all of that is unnecessary effort. I can't believe I bothered, lazy as I
am, given that the Javadocs guarantee that a 'Map' can hold at most
'Integer.MAX_VALUE' elements usefully. (In principle there's no reason a
'TreeMap' couldn't hold more than that, but it would likely become unuseful at
that point. For example, 'putAll(sortedMapOverSized)' might surprise you.)
<http://docs.oracle.com/javase/7/docs/api/java/util/Map.html#size()>
<http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/HashMap.java>
<http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/TreeMap.java>
P.S., 'TreeMap' has an interesting expression to handle the infamous overflow
bug for tree sorting/searching of which Josh Bloch has red-facedly written:
int mid = (lo + hi) >>> 1;
P.P.S, Yes, I know that source comes from a pretty old Java version (b147),
but that shouldn't matter here. That site didn't have newer and it's too well
organized to ignore.
--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
[toc] | [prev] | [next] | [standalone]
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-06-03 13:30 -0700 |
| Message-ID | <iPPyr.56488$ax3.6389@newsfe05.iad> |
| In reply to | #1812 |
On 6/3/12 12:06 PM, Lew wrote: > David Lamb wrote: >> Knute Johnson wrote: >>> Roedy Green wrote: >>>> One of the problems is HashMaps, even in 64-bit JVMs, can still only >>>> hold 1<< 30 elements. >>> >>> What does that mean? >>> >> Maybe hashmaps use only (most of) an int for indices? > > Maps don't have indices. > > But whether they're limited to 'int' range capacity is very easy to check. > > Sure enough, right there in the source for 'HashMap', is > > transient Entry[] table; > > But all of that is unnecessary effort. I can't believe I bothered, lazy > as I am, given that the Javadocs guarantee that a 'Map' can hold at most > 'Integer.MAX_VALUE' elements usefully. (In principle there's no reason a > 'TreeMap' couldn't hold more than that, but it would likely become > unuseful at that point. For example, 'putAll(sortedMapOverSized)' might > surprise you.) > > <http://docs.oracle.com/javase/7/docs/api/java/util/Map.html#size()> > <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/HashMap.java> > > <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/TreeMap.java> > > > P.S., 'TreeMap' has an interesting expression to handle the infamous > overflow bug for tree sorting/searching of which Josh Bloch has > red-facedly written: > > int mid = (lo + hi) >>> 1; > > P.P.S, Yes, I know that source comes from a pretty old Java version > (b147), but that shouldn't matter here. That site didn't have newer and > it's too well organized to ignore. > Actually, this is fine if 0 <= lo <= hi <= Integer.MAX_VALUE, even if lo = hi = Integer.MAX_VALUE. The ">>>" handles it, since that is an "unsigned" operation. While lo+hi might be out of range for signed Integer, the 32bit twos-compliment signed value and the 32 bit unsigned value result in the same bit-pattern.
[toc] | [prev] | [next] | [standalone]
| From | Lew <noone@lewscanon.com> |
|---|---|
| Date | 2012-06-03 13:52 -0700 |
| Message-ID | <jqgipp$5fa$1@news.albasani.net> |
| In reply to | #1813 |
Daniel Pitts wrote: > Lew wrote: >> P.S., 'TreeMap' has an interesting expression to handle the infamous >> overflow bug for tree sorting/searching of which Josh Bloch has >> red-facedly written: >> >> int mid = (lo + hi) >>> 1; >> >> ... > > Actually, this is fine if 0 <= lo <= hi <= Integer.MAX_VALUE, even if lo = hi > = Integer.MAX_VALUE. The ">>>" handles it, since that is an "unsigned" > operation. While lo+hi might be out of range for signed Integer, the 32bit > twos-compliment signed value and the 32 bit unsigned value result in the same > bit-pattern. Yes, as I said, I found this an interesting way to handle it. In particular, it shows why there is the '>>>' operator and why one might wish to use it. It also highlights the importance of precision in reading code as well as writing it. It also is elegant. -- Lew Honi soit qui mal y pense. http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
[toc] | [prev] | [next] | [standalone]
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-06-03 20:51 -0700 |
| Message-ID | <unbos7hos4sj70sm0isvl0pke6te6i014r@4ax.com> |
| In reply to | #1810 |
On Sun, 03 Jun 2012 08:40:22 -0700, Knute Johnson <nospam@knutejohnson.com> wrote, quoted or indirectly quoted someone who said : >> One of the problems is HashMaps, even in 64-bit JVMs, can still only >> hold 1<< 30 elements. > >What does that mean? see http://www.mathe-online.at/JavaCalc/jcintro.html it will evaluate it for you to : 1,073,741,824 Note it is smaller than Integer.MAX_VALUE +2,147,483,647 [2^31-1] aka 2 gig, roughly 2 billion I "cheated" and looked inside HashMap source. The docs lie. The max value is controlled by a constant MAX_CAPACITY = 1 << 30; Internally it needs an array sized a power of 2 with some breathing room to track when the various hash chains start. The power of two restriction allows it to convert a hashCode to an array index with just a shift rather than a divide/modulus. -- Roedy Green Canadian Mind Products http://mindprod.com Controlling complexity is the essence of computer programming. ~ Brian W. Kernighan 1942-01-01 .
[toc] | [prev] | [next] | [standalone]
| From | Patricia Shanahan <pats@acm.org> |
|---|---|
| Date | 2012-06-03 20:59 -0700 |
| Message-ID | <OOWdnaKU06A9r1HSnZ2dnUVZ_oudnZ2d@earthlink.com> |
| In reply to | #1816 |
On 6/3/2012 8:51 PM, Roedy Green wrote: > On Sun, 03 Jun 2012 08:40:22 -0700, Knute Johnson > <nospam@knutejohnson.com> wrote, quoted or indirectly quoted someone > who said : > >>> One of the problems is HashMaps, even in 64-bit JVMs, can still only >>> hold 1<< 30 elements. >> >> What does that mean? > > see http://www.mathe-online.at/JavaCalc/jcintro.html > it will evaluate it for you to : > 1,073,741,824 > Note it is smaller than > Integer.MAX_VALUE > +2,147,483,647 [2^31-1] aka 2 gig, roughly 2 billion > > I "cheated" and looked inside HashMap source. The docs lie. The max > value is controlled by a constant > MAX_CAPACITY = 1<< 30; > > Internally it needs an array sized a power of 2 with some breathing > room to track when the various hash chains start. The power of two > restriction allows it to convert a hashCode to an array index with > just a shift rather than a divide/modulus. > > I thought that was a restriction on the number of buckets, not on the number of entries. Patricia
[toc] | [prev] | [next] | [standalone]
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-06-03 21:13 -0700 |
| Message-ID | <04dos7p5ru620sdqokjah6bhd1dbbvcfa0@4ax.com> |
| In reply to | #1817 |
On Sun, 03 Jun 2012 20:59:48 -0700, Patricia Shanahan <pats@acm.org> wrote, quoted or indirectly quoted someone who said : >I thought that was a restriction on the number of buckets, not on the >number of entries. I am so used to having the number of buckets larger than the number of entries I forgot that is an efficiency measure not an absolute requirement to function. Given the way chain lengths explode when space gets tight, for all practical purposes the max capacity is in the order of 1 << 30. My first paid programming job back when I was 15 was writing a Hashtable on punch cards in FORTRAN. Back then a prime 149 was a reasonable default number of buckets. -- Roedy Green Canadian Mind Products http://mindprod.com Controlling complexity is the essence of computer programming. ~ Brian W. Kernighan 1942-01-01 .
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.java.help
csiph-web