Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #15280 > unrolled thread

Infinite loop from HashMap.keySet iteration.

Started byDaniel Pitts <newsgroup.nospam@virtualinfinity.net>
First post2012-06-14 09:54 -0700
Last post2012-06-18 15:30 -0700
Articles 11 — 4 participants

Back to article view | Back to comp.lang.java.programmer


Contents

  Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-14 09:54 -0700
    Re: Infinite loop from HashMap.keySet iteration. markspace <-@.> - 2012-06-14 10:17 -0700
      Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-14 10:34 -0700
    Re: Infinite loop from HashMap.keySet iteration. Roedy Green <see_website@mindprod.com.invalid> - 2012-06-14 11:34 -0700
      Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-14 12:19 -0700
    Re: Infinite loop from HashMap.keySet iteration. Robert Klemme <shortcutter@googlemail.com> - 2012-06-18 04:42 -0700
      Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-18 09:08 -0700
        Re: Infinite loop from HashMap.keySet iteration. Robert Klemme <shortcutter@googlemail.com> - 2012-06-18 18:34 +0200
          Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-18 13:32 -0700
            Re: Infinite loop from HashMap.keySet iteration. Robert Klemme <shortcutter@googlemail.com> - 2012-06-18 23:26 +0200
              Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-18 15:30 -0700

#15280 — Infinite loop from HashMap.keySet iteration.

FromDaniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date2012-06-14 09:54 -0700
SubjectInfinite loop from HashMap.keySet iteration.
Message-ID<EHoCr.13285$ji7.6276@newsfe20.iad>
Just wanted to share some experience with the group.

I've come across a very strange situation, and I think I have the 
solution (still needs to be verified in production, but looks promising).

I've inherited some code which basically does the following. Obfuscated 
and simplified for demonstration purposes.

void logResults(Map<String, Result> resultsMap) {
    StringBuilder logLine = new StringBuilder("Some stuff");
    Set<String> keySet = resultsMap.keySet();
    for (String key: keySet) {
       logLine.append(key).append(": ").append(resultsMap.get(key));
    }
    logger.info(logLine.toString());
}

One day, we upgraded our application container, and this (which had been 
working fine), started to cause OOM exceptions.  Digging through an 
HPROF, I found the "Some stuff" string in a StringBuilder which was 
600MB is size, and had the same values of a couple of keys repeated over 
and over again.

I found a newer version of this library function which someone had added 
a simple counter to break out of the loop if the count  > keySet.size();

Fixes the catastrophic symptom, but not the cause.

My belief is that this is an unusual manifestation of a concurrency 
issue.  the resultsMap is actually written to in this manor, by multiple 
threads.


void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
    synchronize(factory) {
        Result result;
        result = resultsMap.get(factory.getName());
        if (result == null) {
             result = new Result();
             resultsMap.put(factory.getName(), result);
        }
        result.recordResult(factory.getResult());
    }
}

This may look "safe", but really the resultsMap internal data structure 
can become inconsistent because it is being accessed by two threads at 
the same time.

Since this is legacy code and I don't have time to do a full analysis 
and repair the fundamental design flaws, my solution was to make 
resultsMap a Collections.synchronizedMap(new HashMap<String, Result>()), 
instead of a straight up HashMap.

 From what I could tell, addResult is the only method which writes to 
that map (but since its a Map being sent around all over the place, 
that's hard to tell for sure). addResult's synchronization should be 
enough to handle the "atomic" operation of "add if absent".  Given that 
factory.getName() is unique to that factory instance.

Anyway, just thought I'd share this interesting case-study with the 
group.  I just wish my predecessor had read /Java Concurrency in 
Practice/ before trying to write a threaded framework. This is only 
*one* of the issues that's come out of this framework.

Sincerely,
Daniel.

[toc] | [next] | [standalone]


#15281

Frommarkspace <-@.>
Date2012-06-14 10:17 -0700
Message-ID<jrd6b5$mh1$1@dont-email.me>
In reply to#15280
Well, it doesn't look "safe" to me because you are reading 
unsynchronized from a shared object in the logResults() method.  Always 
always always you must synchronized both the read and the write, or it's 
not thread safe.

You're also not MUTEX safe either, which clearly needs to be dealt with. 
  If you were using some form of Collections iterator, you'd probably be 
getting concurrent modification exceptions.

For the MUTEX/concurrent modification problem, I don't think using 
Collections.synchronizedMap() is enough.  You're going to have to block 
all access during the read.  This could ruin the concurrency of your 
app, so there are even further issues here to consider, but strictly 
from a thread safety issue it's the better solution I think.

void logResults(Map<String, Result> resultsMap) {
  StringBuilder logLine = new StringBuilder("Some stuff");
  synchronized( resultsMap ) {
    Set<String> keySet = resultsMap.keySet();
    for (String key: keySet) {
       logLine.append(key).append(": ").append(resultsMap.get(key));
    }
  }
  logger.info(logLine.toString());
}

Assuming "Some stuff" is thread safe.

However I think a different scheme altogether is needed.  Something like 
a ConcurrentLinkedQueue rather than a Map is probably an even better 
solution overall, from both an application performance & concurrency 
standpoint, and a thread safe & correct standpoint.

I realize you say your app is legacy.  I'm just tossing ideas out for 
future reference.

[toc] | [prev] | [next] | [standalone]


#15283

FromDaniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date2012-06-14 10:34 -0700
Message-ID<0hpCr.13948$l_1.9019@newsfe21.iad>
In reply to#15281
On 6/14/12 10:17 AM, markspace wrote:
> Well, it doesn't look "safe" to me because you are reading
> unsynchronized from a shared object in the logResults() method. Always
> always always you must synchronized both the read and the write, or it's
> not thread safe.
Collections.synchronizedMap adds the synchronization necessary to safely 
read/write (as separate operations, not atomic).
>
> You're also not MUTEX safe either, which clearly needs to be dealt with.
> If you were using some form of Collections iterator, you'd probably be
> getting concurrent modification exceptions.
The logResults is only called after all the addResults methods have 
completed.
>
> For the MUTEX/concurrent modification problem, I don't think using
> Collections.synchronizedMap() is enough. You're going to have to block
> all access during the read. This could ruin the concurrency of your app,
> so there are even further issues here to consider, but strictly from a
> thread safety issue it's the better solution I think.
If you look at the addResult method, that handles the MUTEX case.
>
> void logResults(Map<String, Result> resultsMap) {
> StringBuilder logLine = new StringBuilder("Some stuff");
> synchronized( resultsMap ) {
> Set<String> keySet = resultsMap.keySet();
> for (String key: keySet) {
> logLine.append(key).append(": ").append(resultsMap.get(key));
> }
> }
> logger.info(logLine.toString());
> }
>
> Assuming "Some stuff" is thread safe.
>
> However I think a different scheme altogether is needed. Something like
> a ConcurrentLinkedQueue rather than a Map is probably an even better
> solution overall, from both an application performance & concurrency
> standpoint, and a thread safe & correct standpoint.
Actually, the entire scheme that requires all of this is junk, so it 
wouldn't be worth fixing. A complete rewrite the better idioms and real 
abstractions is what is needed.  Unfortunately, this code-base is being 
abandoned so we can move to a PHP based platform. Yes, I know. I don't 
want to hear it, because you're preaching to the choir. That decision 
was made by the CTO, and I've given all the feedback my position allows.
>
> I realize you say your app is legacy. I'm just tossing ideas out for
> future reference.
Understood. :-)

[toc] | [prev] | [next] | [standalone]


#15286

FromRoedy Green <see_website@mindprod.com.invalid>
Date2012-06-14 11:34 -0700
Message-ID<ejbkt796o006jku7l0pvgntjc73qlbf4bn@4ax.com>
In reply to#15280
On Thu, 14 Jun 2012 09:54:59 -0700, Daniel Pitts
<newsgroup.nospam@virtualinfinity.net> wrote, quoted or indirectly
quoted someone who said :

>This may look "safe", but really the resultsMap internal data structure 
>can become inconsistent because it is being accessed by two threads at 
>the same time.

I did a cursory glance, but are you not modifying the values of keys?
That is a definite no no, even on the same thread.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Controlling complexity is the essence of computer programming.
~ Brian W. Kernighan 1942-01-01
.

[toc] | [prev] | [next] | [standalone]


#15288

FromDaniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date2012-06-14 12:19 -0700
Message-ID<QOqCr.14184$GJ4.1574@newsfe16.iad>
In reply to#15286
On 6/14/12 11:34 AM, Roedy Green wrote:
> On Thu, 14 Jun 2012 09:54:59 -0700, Daniel Pitts
> <newsgroup.nospam@virtualinfinity.net>  wrote, quoted or indirectly
> quoted someone who said :
>
>> This may look "safe", but really the resultsMap internal data structure
>> can become inconsistent because it is being accessed by two threads at
>> the same time.
>
> I did a cursory glance, but are you not modifying the values of keys?
> That is a definite no no, even on the same thread.
Keys are String, String is immutable.

[toc] | [prev] | [next] | [standalone]


#15377

FromRobert Klemme <shortcutter@googlemail.com>
Date2012-06-18 04:42 -0700
Message-ID<313a279f-0e72-4431-9879-c0deb4385f1c@googlegroups.com>
In reply to#15280
On Thursday, June 14, 2012 6:54:59 PM UTC+2, Daniel Pitts wrote:
> Just wanted to share some experience with the group.
> 
> I've come across a very strange situation, and I think I have the 
> solution (still needs to be verified in production, but looks promising).
> 
> I've inherited some code which basically does the following. Obfuscated 
> and simplified for demonstration purposes.
> 
> void logResults(Map<String, Result> resultsMap) {
>     StringBuilder logLine = new StringBuilder("Some stuff");
>     Set<String> keySet = resultsMap.keySet();
>     for (String key: keySet) {
>        logLine.append(key).append(": ").append(resultsMap.get(key));
>     }
>     logger.info(logLine.toString());
> }

That should look roughly like this (more efficient to use the entrySet() and synchronize added):

void logResults(Map<String, Result> resultsMap) {
    final StringBuilder logLine = new StringBuilder("Some stuff");

    synchronized (resultsMap) {
      for (final Entry<String, Result> entry: resultsMap.entrySet()) {
         logLine.append(entry.getKey()).append(": ").append(entry.getValue());
      }
    }

    logger.info(logLine.toString());
}

> void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
>     synchronize(factory) {
>         Result result;
>         result = resultsMap.get(factory.getName());
>         if (result == null) {
>              result = new Result();
>              resultsMap.put(factory.getName(), result);
>         }
>         result.recordResult(factory.getResult());
>     }
> }

It looks like there is a spelling error and it was intended to be

void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
    synchronize(resultsMap) { // error fixed
        Result result = resultsMap.get(factory.getName());
        if (result == null) {
             result = new Result();
             resultsMap.put(factory.getName(), result);
        }
        result.recordResult(factory.getResult());
    }
}

Or the synchronize(resultsMap) was simply forgotten.

Kind regards

robert

[toc] | [prev] | [next] | [standalone]


#15380

FromDaniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date2012-06-18 09:08 -0700
Message-ID<RnIDr.19181$l_1.6246@newsfe21.iad>
In reply to#15377
On 6/18/12 4:42 AM, Robert Klemme wrote:
> On Thursday, June 14, 2012 6:54:59 PM UTC+2, Daniel Pitts wrote:
>> Just wanted to share some experience with the group.
>>
>> I've come across a very strange situation, and I think I have the
>> solution (still needs to be verified in production, but looks promising).
>>
>> I've inherited some code which basically does the following. Obfuscated
>> and simplified for demonstration purposes.
>>
>> void logResults(Map<String, Result>  resultsMap) {
>>      StringBuilder logLine = new StringBuilder("Some stuff");
>>      Set<String>  keySet = resultsMap.keySet();
>>      for (String key: keySet) {
>>         logLine.append(key).append(": ").append(resultsMap.get(key));
>>      }
>>      logger.info(logLine.toString());
>> }
>
> That should look roughly like this (more efficient to use the entrySet() and synchronize added):
>
> void logResults(Map<String, Result>  resultsMap) {
>      final StringBuilder logLine = new StringBuilder("Some stuff");
>
>      synchronized (resultsMap) {
>        for (final Entry<String, Result>  entry: resultsMap.entrySet()) {
>           logLine.append(entry.getKey()).append(": ").append(entry.getValue());
>        }
>      }
>
>      logger.info(logLine.toString());
> }
Agreed. However, the code I was handed used keySet().  I didn't feel 
like refactoring this (for many reasons).
>
>> void addResult(ResultFactory factory, Map<String, Result>  resultsMap) {
>>      synchronize(factory) {
>>          Result result;
>>          result = resultsMap.get(factory.getName());
>>          if (result == null) {
>>               result = new Result();
>>               resultsMap.put(factory.getName(), result);
>>          }
>>          result.recordResult(factory.getResult());
>>      }
>> }
>
> It looks like there is a spelling error and it was intended to be
>
> void addResult(ResultFactory factory, Map<String, Result>  resultsMap) {
>      synchronize(resultsMap) { // error fixed
Nope, that is not, in fact, the original author's intent. There was no 
other synchronizing on the resultMap. The original author was only 
worried about protecting the key, not the whole map.  This of course is 
not sufficient, but it is what they implemented.
>          Result result = resultsMap.get(factory.getName());
>          if (result == null) {
>               result = new Result();
>               resultsMap.put(factory.getName(), result);
>          }
>          result.recordResult(factory.getResult());
>      }
> }
>
> Or the synchronize(resultsMap) was simply forgotten.
Exactly, which is why I ensure resultsMap is a synchronized Map instead.
>
> Kind regards
>
> robert

[toc] | [prev] | [next] | [standalone]


#15381

FromRobert Klemme <shortcutter@googlemail.com>
Date2012-06-18 18:34 +0200
Message-ID<a493k2FpcfU1@mid.individual.net>
In reply to#15380
On 06/18/2012 06:08 PM, Daniel Pitts wrote:
> On 6/18/12 4:42 AM, Robert Klemme wrote:

>> [refactoring from keySet() to entrySet()]

> Agreed. However, the code I was handed used keySet(). I didn't feel like
> refactoring this (for many reasons).

For example?

> Nope, that is not, in fact, the original author's intent. There was no
> other synchronizing on the resultMap. The original author was only
> worried about protecting the key, not the whole map. This of course is
> not sufficient, but it is what they implemented.

The key is a String which does not need synchronization because it is 
immutable.  Did you mean to say "factory" instead?  I have no idea what 
ResultFactory actually does but getName() looks like something which 
always returns the same name.  If that was the case then no 
synchronization would be necessary.  And if getResult() involves a 
calculation which needs to be thread safe then that method could be 
synchronized.  Can you share more details about that class?

>> Result result = resultsMap.get(factory.getName());
>> if (result == null) {
>> result = new Result();
>> resultsMap.put(factory.getName(), result);
>> }
>> result.recordResult(factory.getResult());
>> }
>> }
>>
>> Or the synchronize(resultsMap) was simply forgotten.
> Exactly, which is why I ensure resultsMap is a synchronized Map instead.

This won't make the code entirely thread safe - at least if you want to 
ensure that the fist thread which detects the missing entry also adds it 
to the Map and - maybe more important - that getResult() is invoked only 
once per key.  Collections.synchronizedMap() only ensures all accesses 
are synchronized - not more.

Kind regards

	robert

[toc] | [prev] | [next] | [standalone]


#15391

FromDaniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date2012-06-18 13:32 -0700
Message-ID<xfMDr.19346$l_1.5660@newsfe21.iad>
In reply to#15381
On 6/18/12 9:34 AM, Robert Klemme wrote:
> On 06/18/2012 06:08 PM, Daniel Pitts wrote:
>> On 6/18/12 4:42 AM, Robert Klemme wrote:
>
>>> [refactoring from keySet() to entrySet()]
>
>> Agreed. However, the code I was handed used keySet(). I didn't feel like
>> refactoring this (for many reasons).
>
> For example?
1. This is legacy code.
2. Changing the semantics of this method might introduce bugs. I don't 
feel like debugging this package.
3. This is a centrally used library for many different development 
teams. I didn't want to worry about side-effects on other teams.
4. This code base *will* be replaced in the next year.
5. As an engineer with decades of experience, I have decided that 
changing this code-base was going to be more expensive than leaving it 
alone.
>
>> Nope, that is not, in fact, the original author's intent. There was no
>> other synchronizing on the resultMap. The original author was only
>> worried about protecting the key, not the whole map. This of course is
>> not sufficient, but it is what they implemented.
>
> The key is a String which does not need synchronization because it is
> immutable. Did you mean to say "factory" instead? I have no idea what
> ResultFactory actually does but getName() looks like something which
> always returns the same name. If that was the case then no
> synchronization would be necessary. And if getResult() involves a
> calculation which needs to be thread safe then that method could be
> synchronized. Can you share more details about that class?
getName always returned the same name for the specific factory. 
getResult itself needn't be synchronized.

Again, see the reasons above why I'm not making sweeping changes to this 
package.
>
>>> Result result = resultsMap.get(factory.getName());
>>> if (result == null) {
>>> result = new Result();
>>> resultsMap.put(factory.getName(), result);
>>> }
>>> result.recordResult(factory.getResult());
>>> }
>>> }
>>>
>>> Or the synchronize(resultsMap) was simply forgotten.
>> Exactly, which is why I ensure resultsMap is a synchronized Map instead.
>
> This won't make the code entirely thread safe - at least if you want to
> ensure that the fist thread which detects the missing entry also adds it
> to the Map and - maybe more important - that getResult() is invoked only
> once per key. Collections.synchronizedMap() only ensures all accesses
> are synchronized - not more.
That is taking care of by the synchronization on the factory.  In 
reality, the Factory should have been the key, but that isn't how the 
original coder decided to do things.

In any case, the real problem was that there was no happens-before 
between any two "put" method calls, which lead to inconsistent state.

[toc] | [prev] | [next] | [standalone]


#15395

FromRobert Klemme <shortcutter@googlemail.com>
Date2012-06-18 23:26 +0200
Message-ID<a49kp5FesU1@mid.individual.net>
In reply to#15391
On 18.06.2012 22:32, Daniel Pitts wrote:
> On 6/18/12 9:34 AM, Robert Klemme wrote:
>> On 06/18/2012 06:08 PM, Daniel Pitts wrote:
>>> On 6/18/12 4:42 AM, Robert Klemme wrote:
>>
>>>> [refactoring from keySet() to entrySet()]
>>
>>> Agreed. However, the code I was handed used keySet(). I didn't feel like
>>> refactoring this (for many reasons).
>>
>> For example?
> 1. This is legacy code.
> 2. Changing the semantics of this method might introduce bugs. I don't
> feel like debugging this package.
> 3. This is a centrally used library for many different development
> teams. I didn't want to worry about side-effects on other teams.
> 4. This code base *will* be replaced in the next year.
> 5. As an engineer with decades of experience, I have decided that
> changing this code-base was going to be more expensive than leaving it
> alone.

Yep, that sounds reasonable.  I'd just say that iterating via entrySet 
instead of keySet is only marginally semantically different: you'll even 
get more efficient iteration because you spare all the map lookups. 
Plus you are more likely to get the actually corresponding value to a key.

>>> Nope, that is not, in fact, the original author's intent. There was no
>>> other synchronizing on the resultMap. The original author was only
>>> worried about protecting the key, not the whole map. This of course is
>>> not sufficient, but it is what they implemented.
>>
>> The key is a String which does not need synchronization because it is
>> immutable. Did you mean to say "factory" instead? I have no idea what
>> ResultFactory actually does but getName() looks like something which
>> always returns the same name. If that was the case then no
>> synchronization would be necessary. And if getResult() involves a
>> calculation which needs to be thread safe then that method could be
>> synchronized. Can you share more details about that class?
> getName always returned the same name for the specific factory.
> getResult itself needn't be synchronized.
>
> Again, see the reasons above why I'm not making sweeping changes to this
> package.

Absolutely reasonable.

>>>> Or the synchronize(resultsMap) was simply forgotten.
>>> Exactly, which is why I ensure resultsMap is a synchronized Map instead.
>>
>> This won't make the code entirely thread safe - at least if you want to
>> ensure that the fist thread which detects the missing entry also adds it
>> to the Map and - maybe more important - that getResult() is invoked only
>> once per key. Collections.synchronizedMap() only ensures all accesses
>> are synchronized - not more.
> That is taking care of by the synchronization on the factory.

Not strictly: it will be unsafe if multiple factories can return the 
same (equivalent) name.

> In
> reality, the Factory should have been the key, but that isn't how the
> original coder decided to do things.

Often quoted "historic reasons". :-)  Maybe also the factory should 
really be the value in the map (with the name as key) and cache the key 
internally once it is calculated.

> In any case, the real problem was that there was no happens-before
> between any two "put" method calls, which lead to inconsistent state.

I understood your previous posting that the issue was more with missing 
synchronization between reading (iteration for logging) and writing 
(method addResult()).

Note also that wrapping the Map in a synchronized version does not 
prevent CME.

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]


#15397

FromDaniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date2012-06-18 15:30 -0700
Message-ID<f_NDr.12400$Bp1.5373@newsfe10.iad>
In reply to#15395
On 6/18/12 2:26 PM, Robert Klemme wrote:
> On 18.06.2012 22:32, Daniel Pitts wrote:
>> In any case, the real problem was that there was no happens-before
>> between any two "put" method calls, which lead to inconsistent state.
>
> I understood your previous posting that the issue was more with missing
> synchronization between reading (iteration for logging) and writing
> (method addResult()).
>
> Note also that wrapping the Map in a synchronized version does not
> prevent CME.

Right, the iteration only happens after all the the writing has been 
completed, so no CME is possible from that. The damage is from the 
improperly synchronized "puts", causing the data structure to become 
corrupt.

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.java.programmer


csiph-web