


Re: HashSet keeps all nonidentical equal objects in memory

From lewbloch <lewbloch@gmail.com>
Newsgroups comp.lang.java.programmer
Subject Re: HashSet keeps all nonidentical equal objects in memory
Date 2011-07-20 09:31 -0700
Organization http://groups.google.com
Message-ID <2537b5c3-5526-436c-94bc-c19428e1cd6b@e20g2000prf.googlegroups.com>
References <2f8556b7-4d08-4adb-a455-7997fcff0829@m10g2000yqd.googlegroups.com> <c8b56e6e-b04f-4831-b6ab-712b10402a50@x10g2000vbl.googlegroups.com>



On Jul 20, 8:38 am, Robert Klemme <shortcut...@googlemail.com> wrote:
> On 20 Jul., 11:43, Frederik <landcglo...@gmail.com> wrote:
>
> > I've been doing java programming for over 10 years, but now I've
> > encountered a phenomenon that I wasn't aware of at all.
>
> Apparently you didn't - as you found out in the meantime. :-)
>
> > I had an application with a HashSet<String>. I added a lot
> > of different String objects to this HashSet, but many of the String
> > objects are equal to each other. Now, after a while my application ran
> > out of memory, even with -Xmx1500M. This happened when there were only
> > about 7000 different Strings in the set! I didn't understand this,
> > until I started adding the "intern()" of every String object to the
> > set instead of the original String object. Now the program needs
> > virtually no memory anymore.
> > There is only one explanation: before I used "intern()", ALL the
> > different String objects, even the ones that are equal, were kept in
> > memory by the HashSet! No matter how strange it sounds. I was
> > wondering, does anybody have an explanation as to why this is the case?
>
> No, that conclusion is not warranted by the facts.  You only know that
> *something* kept hold of a lot of memory (String instances).  Since we
> know neither all of the code nor the application architecture, we can
> only speculate, but it seems a realistic assumption that those String
> instances are referenced not only by the HashSet but also from
> somewhere else.
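
A small sketch of the situation described above, assuming a hypothetical 'Record' class that stands in for the "somewhere else": a HashSet keeps only one instance per distinct value ('add' returns false for an equal String and leaves the stored instance alone), so extra copies stay alive only because other objects still reference them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: the HashSet holds one instance per distinct value, while a list of
// hypothetical Record objects keeps every freshly created String reachable.
class DuplicateDemo {

    // Hypothetical stand-in for "an object which - among other things - holds the String".
    static final class Record {
        final String key;

        Record(String key) {
            this.key = key;
        }
    }

    // Simulates reading 'lines' input lines that cycle through 'distinctValues' values.
    static List<Record> load(int lines, int distinctValues, Set<String> distinct) {
        List<Record> records = new ArrayList<>();
        for (int i = 0; i < lines; i++) {
            // Runtime concatenation creates a new String object on every iteration,
            // much like reading the same text from a file repeatedly.
            String s = "value-" + (i % distinctValues);
            records.add(new Record(s)); // every instance stays reachable from here
            distinct.add(s);            // returns false if an equal String is already present
        }
        return records;
    }

    public static void main(String[] args) {
        Set<String> distinct = new HashSet<>();
        List<Record> records = load(100_000, 7_000, distinct);
        System.out.println(distinct.size() + " distinct, " + records.size() + " records");
        // prints "7000 distinct, 100000 records"

        // Equal but not identical: the second copy is held by its Record, not by the set.
        String first = records.get(0).key;
        String later = records.get(7_000).key;
        System.out.println(first.equals(later) + " " + (first == later));
        // prints "true false"
    }
}
```

The set ends up with 7,000 entries, yet all 100,000 String objects remain reachable through the records, which matches the pattern of memory use without the HashSet being responsible for it.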
>
> An easy way to create such a situation is to read repeated content
> from some external source (a file) and create an object which - among
> other things - holds the String.  Now you have 1,000,000 objects holding
> on to 1,000,000 String instances, but there are only 7,000 different
> character sequences.  In such a situation it may be better to have a
> HashMap<String,String> where you store each String only once and reuse
> that first instance.  Basically this is what happened when you used
> String.intern(), except that you no longer have control over this
> storage, which - depending on the application type - can still create a
> serious memory leak, e.g. in a long-running app which over time reads
> multiple files with different sets of repeated strings.
>
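
The HashMap<String,String> idea can be sketched as a small canonicalizing pool. The class name 'StringCanonicalizer' is hypothetical; the point of the sketch is that the application owns the map, so it can clear or drop it once an input file has been processed, which is the control that String.intern() gives up.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the HashMap<String,String> approach: the first instance seen
// for each value is stored and handed back for every later equal String.
class StringCanonicalizer {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the first instance that was registered with this value.
    String canonicalize(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    // Empties the pool so the GC can reclaim strings no longer referenced elsewhere,
    // e.g. after one input file has been processed.
    void clear() {
        pool.clear();
    }

    public static void main(String[] args) {
        StringCanonicalizer c = new StringCanonicalizer();
        String first = new String("value-1");  // two distinct but equal objects
        String second = new String("value-1");
        String a = c.canonicalize(first);
        String b = c.canonicalize(second);
        System.out.println(first == second); // prints "false"
        System.out.println(a == b);          // prints "true"
        System.out.println(a == first);      // prints "true"
    }
}
```

Because putIfAbsent returns the previously stored value, the first instance seen always wins, and later equal strings become garbage as soon as the caller stops referencing them.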

To highlight one of Robert's points more specifically, undisciplined
use of 'intern()' can itself create memory pressure. It's not really a
good idea to intern every single 'String', because that fills up the
intern pool, which the GC cleans out far less readily than the ordinary
heap.

Sweeping dirt under the carpet makes dirt less visible, not the floor
clean.
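
If a pool should also let the GC reclaim strings that nothing else references any more, one alternative technique (not something proposed in this thread) is a weak pool built on java.util.WeakHashMap; Guava's Interners.newWeakInterner() packages the same idea. A minimal sketch, with the hypothetical class name 'WeakStringPool':

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of a canonicalizing pool whose entries can be garbage-collected once no
// caller holds the canonical String any more. Class name is hypothetical.
class WeakStringPool {
    // WeakHashMap holds its keys weakly. The value must be weak too:
    // a strong value pointing at its own key would keep the entry alive forever.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the canonical instance for s, registering s if no live one exists.
    synchronized String canonicalize(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        WeakStringPool pool = new WeakStringPool();
        String a = pool.canonicalize(new String("value-1"));
        String b = pool.canonicalize(new String("value-1"));
        // Same instance while 'a' is still strongly reachable.
        System.out.println(a == b); // prints "true"
    }
}
```

The trade-off is that identity is only guaranteed while some caller still holds the canonical instance; once it has been collected, the next equal string becomes the new canonical one.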

--
Lew



Thread

HashSet keeps all nonidentical equal objects in memory Frederik <landcglobal@gmail.com> - 2011-07-20 02:43 -0700
  Re: HashSet keeps all nonidentical equal objects in memory Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-07-20 07:30 -0400
  Re: HashSet keeps all nonidentical equal objects in memory Frederik <landcglobal@gmail.com> - 2011-07-20 04:09 -0700
    Re: HashSet keeps all nonidentical equal objects in memory markspace <-@.> - 2011-07-20 08:22 -0700
  Re: HashSet keeps all nonidentical equal objects in memory Robert Klemme <shortcutter@googlemail.com> - 2011-07-20 08:38 -0700
    Re: HashSet keeps all nonidentical equal objects in memory lewbloch <lewbloch@gmail.com> - 2011-07-20 09:31 -0700
