| From | Eric Sosman <esosman@ieee-dot-org.invalid> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: HashSet keeps all nonidentical equal objects in memory |
| Date | 2011-07-20 07:30 -0400 |
| Organization | A noiseless patient Spider |
| Message-ID | <j06eb1$ph5$1@dont-email.me> |
| References | <2f8556b7-4d08-4adb-a455-7997fcff0829@m10g2000yqd.googlegroups.com> |
On 7/20/2011 5:43 AM, Frederik wrote:
> Hi,
>
> I've been doing Java programming for over 10 years, but now I've
> encountered a phenomenon that I wasn't aware of at all.
> I had an application in which I have a HashSet<String>. I added a lot
> of different String objects to this HashSet, but many of the String
> objects are equal to each other. Now, after a while my application ran
> out of memory, even with -Xmx1500M. This happened when there were only
> about 7000 different Strings in the set! I didn't understand this,
> until I started adding the "intern()" of every String object to the
> set instead of the original String object. Now the program needs
> virtually no memory anymore.
> There is only one explanation: before I used "intern()", ALL the
> different String objects, even the ones that are equal, were kept in
> memory by the HashSet! No matter how strange it sounds. I was
> wondering, does anybody have an explanation as to why this is the case?
I'm unable to reproduce your problem (see test program below).
Perhaps you've overlooked another possible explanation: Before you
switched to using intern(), maybe you were retaining your own
references to all those Strings accidentally.
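To illustrate the distinction, here is a minimal sketch (not the original poster's code; the class name and the `seen` list are hypothetical stand-ins for whatever the application might be holding on to). It adds many distinct but equal Strings to a HashSet and, separately, to a list. The set ends up holding only the first instance, while the list keeps every one of them reachable:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetentionSketch {
    // Hypothetical stand-in for application code that keeps its own references.
    static final List<String> seen = new ArrayList<String>();
    static final Set<String> set = new HashSet<String>();

    static void fill(int count) {
        seen.clear();
        set.clear();
        for (int i = 0; i < count; ++i) {
            String s = new String("hello"); // a distinct object with equal content
            seen.add(s);                    // the list retains every instance
            set.add(s);                     // the set ignores all but the first
        }
    }

    public static void main(String[] args) {
        fill(1000);
        System.out.println("set size  = " + set.size());
        System.out.println("list size = " + seen.size());
        System.out.println("set holds first instance: "
                + (set.iterator().next() == seen.get(0)));
    }
}
```

If something like `seen` exists in the real application, switching to intern() would shrink memory use simply because all the retained references would then point at one shared String.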
Here's my test program: It inserts twenty thousand distinct but
equal Strings (separate objects, same content) into a HashSet,
pausing every thousand insertions to report how much memory is used
(with some heavy-handed attempts to force garbage collection):
package esosman.misc;

import java.util.HashSet;

public class HashSpace {

    public static void main(String[] unused) {
        HashSet<String> set = new HashSet<String>();
        String value = "x";
        for (int n = 0; n < 20; ++n) {
            report(n * 1000);
            for (int i = 0; i < 1000; ++i) {
                // Create a new String object whose content is still "x".
                value = (value + "x").substring(1);
                set.add(value);
            }
        }
        report(20 * 1000);
    }

    private static void report(int insertions) {
        long memUsed = runtime.totalMemory() - runtime.freeMemory();
        long memPrev = Long.MAX_VALUE;
        // Request collection up to five times, while usage keeps dropping.
        for (int gc = 0; (memUsed < memPrev) && gc < 5; ++gc) {
            runtime.runFinalization();
            runtime.gc();
            Thread.yield();
            memPrev = memUsed;
            memUsed = runtime.totalMemory() - runtime.freeMemory();
        }
        System.out.printf("After %d insertions, memory used = %d\n",
                insertions, memUsed);
    }

    private static final Runtime runtime = Runtime.getRuntime();
}
... and here's what I get for output:
After 0 insertions, memory used = 125656
After 1000 insertions, memory used = 133272
After 2000 insertions, memory used = 133664
After 3000 insertions, memory used = 133272
After 4000 insertions, memory used = 133312
After 5000 insertions, memory used = 133272
After 6000 insertions, memory used = 133312
After 7000 insertions, memory used = 133272
After 8000 insertions, memory used = 133312
After 9000 insertions, memory used = 133272
After 10000 insertions, memory used = 133312
After 11000 insertions, memory used = 133272
After 12000 insertions, memory used = 133312
After 13000 insertions, memory used = 133448
After 14000 insertions, memory used = 133840
After 15000 insertions, memory used = 133448
After 16000 insertions, memory used = 133488
After 17000 insertions, memory used = 133272
After 18000 insertions, memory used = 133312
After 19000 insertions, memory used = 133272
After 20000 insertions, memory used = 133312
I see no evidence that all those String instances are being
retained anywhere: They need ~24 bytes apiece, which would come
to about half a megabyte.
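The arithmetic behind that estimate can be written out as a small sketch. The 24-byte figure is only the rough per-String estimate given above, the two readings are copied from the first and last lines of the output, and the class and method names are made up for illustration:

```java
public class RetentionEstimate {
    static final long INSERTIONS = 20000L;
    static final long BYTES_PER_STRING = 24L; // rough estimate, not a measured value

    // Memory the Strings would occupy if every inserted instance stayed reachable.
    static long expectedIfRetained() {
        return INSERTIONS * BYTES_PER_STRING;
    }

    // Growth actually reported between 0 and 20000 insertions.
    static long observedGrowth() {
        return 133312L - 125656L;
    }

    public static void main(String[] args) {
        System.out.println("expected if all retained: " + expectedIfRetained() + " bytes");
        System.out.println("observed growth:          " + observedGrowth() + " bytes");
    }
}
```

This prints 480000 bytes for the expected figure against 7656 bytes of observed growth, and nearly all of that growth appears within the first thousand insertions.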
--
Eric Sosman
esosman@ieee-dot-org.invalid