Re: HashSet keeps all nonidentical equal objects in memory

From Eric Sosman <esosman@ieee-dot-org.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: HashSet keeps all nonidentical equal objects in memory
Date 2011-07-20 07:30 -0400
Organization A noiseless patient Spider
Message-ID <j06eb1$ph5$1@dont-email.me>
References <2f8556b7-4d08-4adb-a455-7997fcff0829@m10g2000yqd.googlegroups.com>


On 7/20/2011 5:43 AM, Frederik wrote:
> Hi,
>
> I've been doing java programming for over 10 years, but now I've
> encountered a phenomenon that I wasn't aware of at all.
> I had an application in which I have a HashSet<String>. I added a lot
> of different String objects to this HashSet, but many of the String
> objects are equal to each other. Now, after a while my application ran
> out of memory, even with -Xmx1500M. This happened when there were only
> about 7000 different Strings in the set! I didn't understand this,
> until I started adding the "intern()" of every String object to the
> set instead of the original String object. Now the program needs
> virtually no memory anymore.
> There is only one explanation: before I used "intern()", ALL the
> different String objects, even the ones that are equal, were kept in
> memory by the HashSet! No matter how strange it sounds. I was
> wondering, does anybody have an explanation as to why this is the case?

     I'm unable to reproduce your problem (see test program below).
Perhaps you've overlooked another possible explanation: Before you
switched to using intern(), maybe you were retaining your own
references to all those Strings accidentally.
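
     A minimal sketch of how that could happen, not taken from your
application: the list named `seen` below is a hypothetical stand-in for
any stray reference holder (a cache, a log buffer, a field). The set
ends up holding one String, because add() leaves the set unchanged when
an equal element is already present, while the list keeps every
instance reachable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetentionSketch {

    // Adds `count` equal-but-distinct Strings to a set and to a list,
    // then returns { set size, list size }.
    static int[] fill(int count) {
        Set<String> set = new HashSet<String>();
        // Hypothetical stray holder; stands in for any accidental reference.
        List<String> seen = new ArrayList<String>();
        for (int i = 0; i < count; ++i) {
            String s = new String("x");  // new object each time, equal to the rest
            set.add(s);                  // returns false after the first; s is not stored
            seen.add(s);                 // the list, not the set, keeps s alive
        }
        return new int[] { set.size(), seen.size() };
    }

    public static void main(String[] args) {
        int[] sizes = fill(7000);
        // prints "set holds 1, list holds 7000"
        System.out.println("set holds " + sizes[0] + ", list holds " + sizes[1]);
    }
}
```

If the list is dropped, every String after the first becomes garbage as
soon as the loop iteration ends, which is the behavior intern() would
otherwise be masking.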

     Here's my test program: It inserts twenty thousand distinct but
equal Strings (separate objects with the same contents) into a HashSet,
pausing every thousand insertions to report how much memory is used
(with some heavy-handed attempts to force garbage collection):

package esosman.misc;
import java.util.HashSet;

public class HashSpace {

     public static void main(String[] unused) {
         HashSet<String> set = new HashSet<String>();
         String value = "x";
         for (int n = 0;  n < 20;  ++n) {
             report(n * 1000);
             for (int i = 0;  i < 1000;  ++i) {
                 // Build a fresh String object that equals "x" every time.
                 value = (value + "x").substring(1);
                 // Only the first add() changes the set; later ones return false.
                 set.add(value);
             }
         }
         report(20 * 1000);
     }

     // Print heap usage, requesting GC repeatedly (at most five rounds)
     // for as long as the measured usage keeps shrinking.
     private static void report(int insertions) {
         long memUsed = runtime.totalMemory() - runtime.freeMemory();
         long memPrev = Long.MAX_VALUE;
         for (int gc = 0;  (memUsed < memPrev) && gc < 5;  ++gc) {
             runtime.runFinalization();
             runtime.gc();
             Thread.yield();
             memPrev = memUsed;
             memUsed = runtime.totalMemory() - runtime.freeMemory();
         }
         System.out.printf("After %d insertions, memory used = %d\n",
                 insertions, memUsed);
     }

     private static final Runtime runtime = Runtime.getRuntime();
}

... and here's what I get for output:

After 0 insertions, memory used = 125656
After 1000 insertions, memory used = 133272
After 2000 insertions, memory used = 133664
After 3000 insertions, memory used = 133272
After 4000 insertions, memory used = 133312
After 5000 insertions, memory used = 133272
After 6000 insertions, memory used = 133312
After 7000 insertions, memory used = 133272
After 8000 insertions, memory used = 133312
After 9000 insertions, memory used = 133272
After 10000 insertions, memory used = 133312
After 11000 insertions, memory used = 133272
After 12000 insertions, memory used = 133312
After 13000 insertions, memory used = 133448
After 14000 insertions, memory used = 133840
After 15000 insertions, memory used = 133448
After 16000 insertions, memory used = 133488
After 17000 insertions, memory used = 133272
After 18000 insertions, memory used = 133312
After 19000 insertions, memory used = 133272
After 20000 insertions, memory used = 133312

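     I see no evidence that all those String instances are being
retained anywhere: They need ~24 bytes apiece, so twenty thousand of
them would come to about half a megabyte, yet the reported usage never
rises more than about 8 KB above the starting figure.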
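
     As a rough check of that arithmetic, here is a small sketch that
compares the expected footprint with the growth actually reported. The
24-byte figure is the estimate used above, not a measured value, and
the class and method names are made up for illustration.

```java
public class RetainedEstimate {

    // Expected heap cost if every inserted String were retained.
    static long expectedBytes(long instances, long bytesEach) {
        return instances * bytesEach;
    }

    public static void main(String[] args) {
        long expected = expectedBytes(20000, 24);  // assumes ~24 bytes per String
        long observed = 133312L - 125656L;         // last report minus first report
        // prints "expected 480000 bytes, observed growth 7656 bytes"
        System.out.println("expected " + expected
                + " bytes, observed growth " + observed + " bytes");
    }
}
```

The observed growth is under 2% of what full retention would cost, so
the set cannot be holding on to the duplicates.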

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
