From: Eric Sosman
Newsgroups: comp.lang.java.programmer
Subject: Re: HashSet keeps all nonidentical equal objects in memory
Date: Wed, 20 Jul 2011 07:30:35 -0400

On 7/20/2011 5:43 AM, Frederik wrote:
> Hi,
>
> I've been doing java programming for over 10 years, but now I've
> encountered a phenomenon that I wasn't aware of at all.
> I had an application in which I have a HashSet. I added a lot
> of different String objects to this HashSet, but many of the String
> objects are equal to each other. Now, after a while my application ran
> out of memory, even with -Xmx1500M. This happened when there were only
> about 7000 different Strings in the set! I didn't understand this,
> until I started adding the "intern()" of every String object to the
> set instead of the original String object. Now the program needs
> virtually no memory anymore.
> There is only one explanation: before I used "intern()", ALL the
> different String objects, even the ones that are equal, were kept in
> memory by the HashSet!
> No matter how strange it sounds. I was
> wondering, does anybody have an explanation as to why this is the case?

    I'm unable to reproduce your problem (see test program below).
Perhaps you've overlooked another possible explanation: Before you
switched to using intern(), maybe you were accidentally retaining
your own references to all those Strings.

    Here's my test program: It inserts twenty thousand distinct but
equal String instances into a HashSet, pausing every now and then to
report how much memory is used (with some heavy-handed attempts to
force garbage collection):

    package esosman.misc;

    import java.util.HashSet;

    public class HashSpace {

        public static void main(String[] unused) {
            HashSet<String> set = new HashSet<String>();
            String value = "x";
            for (int n = 0; n < 20; ++n) {
                report(n * 1000);
                for (int i = 0; i < 1000; ++i) {
                    // Build a new String instance that equals "x".
                    value = (value + "x").substring(1);
                    set.add(value);
                }
            }
            report(20 * 1000);
        }

        // Print heap usage, re-running GC (at most 5 times) for as
        // long as each pass keeps reducing the amount in use.
        private static void report(int insertions) {
            long memUsed = runtime.totalMemory() - runtime.freeMemory();
            long memPrev = Long.MAX_VALUE;
            for (int gc = 0; (memUsed < memPrev) && gc < 5; ++gc) {
                runtime.runFinalization();
                runtime.gc();
                Thread.yield();
                memPrev = memUsed;
                memUsed = runtime.totalMemory() - runtime.freeMemory();
            }
            System.out.printf("After %d insertions, memory used = %d%n",
                insertions, memUsed);
        }

        private static final Runtime runtime = Runtime.getRuntime();
    }

    ... and here's what I get for output:

    After 0 insertions, memory used = 125656
    After 1000 insertions, memory used = 133272
    After 2000 insertions, memory used = 133664
    After 3000 insertions, memory used = 133272
    After 4000 insertions, memory used = 133312
    After 5000 insertions, memory used = 133272
    After 6000 insertions, memory used = 133312
    After 7000 insertions, memory used = 133272
    After 8000 insertions, memory used = 133312
    After 9000 insertions, memory used = 133272
    After 10000 insertions, memory used = 133312
    After 11000 insertions, memory used = 133272
    After 12000 insertions, memory used = 133312
    After 13000 insertions, memory used = 133448
    After 14000 insertions, memory used = 133840
    After 15000 insertions, memory used = 133448
    After 16000 insertions, memory used = 133488
    After 17000 insertions, memory used = 133272
    After 18000 insertions, memory used = 133312
    After 19000 insertions, memory used = 133272
    After 20000 insertions, memory used = 133312

    I see no evidence that all those String instances are being
retained anywhere: They need ~24 bytes apiece, so twenty thousand
of them would come to about half a megabyte (20000 * 24 = 480000
bytes), yet the reported usage stays flat at roughly 133000 bytes.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
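
    The flat memory figures above are consistent with how HashSet.add()
is documented to behave: if an equal element is already present, the
set is left unchanged and add() returns false, so the newly offered
instance is never stored. Here is a minimal sketch that checks this
directly. The class name DuplicateAddDemo and the helper
keepsFirstInstance() are made up for illustration and are not part of
the test program above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: shows that adding an equal String
// does not make the set hold a second instance.
public class DuplicateAddDemo {

    // Returns true if, after adding two distinct-but-equal Strings,
    // the set holds exactly one element and it is the first instance.
    static boolean keepsFirstInstance() {
        Set<String> set = new HashSet<String>();
        String first = new String("x");   // two distinct instances...
        String second = new String("x");  // ...that are equal()

        set.add(first);
        boolean added = set.add(second);  // equal element already present

        // Fetch the single stored element and compare by identity.
        String stored = set.iterator().next();
        return !added
            && set.size() == 1
            && stored == first
            && stored != second;
    }

    public static void main(String[] args) {
        System.out.println(keepsFirstInstance());  // prints "true"
    }
}
```

    Once nothing else refers to the second instance it is eligible for
garbage collection, which is why any duplicates that stay alive must
be reachable from somewhere other than the set.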