Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.programmer > #15280 > unrolled thread
| Started by | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| First post | 2012-06-14 09:54 -0700 |
| Last post | 2012-06-18 15:30 -0700 |
| Articles | 11 — 4 participants |
Back to article view | Back to comp.lang.java.programmer
Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-14 09:54 -0700
Re: Infinite loop from HashMap.keySet iteration. markspace <-@.> - 2012-06-14 10:17 -0700
Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-14 10:34 -0700
Re: Infinite loop from HashMap.keySet iteration. Roedy Green <see_website@mindprod.com.invalid> - 2012-06-14 11:34 -0700
Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-14 12:19 -0700
Re: Infinite loop from HashMap.keySet iteration. Robert Klemme <shortcutter@googlemail.com> - 2012-06-18 04:42 -0700
Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-18 09:08 -0700
Re: Infinite loop from HashMap.keySet iteration. Robert Klemme <shortcutter@googlemail.com> - 2012-06-18 18:34 +0200
Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-18 13:32 -0700
Re: Infinite loop from HashMap.keySet iteration. Robert Klemme <shortcutter@googlemail.com> - 2012-06-18 23:26 +0200
Re: Infinite loop from HashMap.keySet iteration. Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-06-18 15:30 -0700
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-06-14 09:54 -0700 |
| Subject | Infinite loop from HashMap.keySet iteration. |
| Message-ID | <EHoCr.13285$ji7.6276@newsfe20.iad> |
Just wanted to share some experience with the group.
I've come across a very strange situation, and I think I have the
solution (still needs to be verified in production, but looks promising).
I've inherited some code which basically does the following. Obfuscated
and simplified for demonstration purposes.
void logResults(Map<String, Result> resultsMap) {
StringBuilder logLine = new StringBuilder("Some stuff");
Set<String> keySet = resultsMap.keySet();
for (String key: keySet) {
logLine.append(key).append(": ").append(resultsMap.get(key));
}
logger.info(logLine.toString());
}
One day, we upgraded our application container, and this (which had been
working fine), started to cause OOM exceptions. Digging through an
HPROF, I found the "Some stuff" string in a StringBuilder which was
600MB is size, and had the same values of a couple of keys repeated over
and over again.
I found a newer version of this library function which someone had added
a simple counter to break out of the loop if the count > keySet.size();
Fixes the catastrophic symptom, but not the cause.
My belief is that this is an unusual manifestation of a concurrency
issue. the resultsMap is actually written to in this manor, by multiple
threads.
void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
synchronize(factory) {
Result result;
result = resultsMap.get(factory.getName());
if (result == null) {
result = new Result();
resultsMap.put(factory.getName(), result);
}
result.recordResult(factory.getResult());
}
}
This may look "safe", but really the resultsMap internal data structure
can become inconsistent because it is being accessed by two threads at
the same time.
Since this is legacy code and I don't have time to do a full analysis
and repair the fundamental design flaws, my solution was to make
resultsMap a Collections.synchronizedMap(new HashMap<String, Result>()),
instead of a straight up HashMap.
From what I could tell, addResult is the only method which writes to
that map (but since its a Map being sent around all over the place,
that's hard to tell for sure). addResult's synchronization should be
enough to handle the "atomic" operation of "add if absent". Given that
factory.getName() is unique to that factory instance.
Anyway, just thought I'd share this interesting case-study with the
group. I just wish my predecessor had read /Java Concurrency in
Practice/ before trying to write a threaded framework. This is only
*one* of the issues that's come out of this framework.
Sincerely,
Daniel.
[toc] | [next] | [standalone]
| From | markspace <-@.> |
|---|---|
| Date | 2012-06-14 10:17 -0700 |
| Message-ID | <jrd6b5$mh1$1@dont-email.me> |
| In reply to | #15280 |
Well, it doesn't look "safe" to me because you are reading
unsynchronized from a shared object in the logResults() method. Always
always always you must synchronized both the read and the write, or it's
not thread safe.
You're also not MUTEX safe either, which clearly needs to be dealt with.
If you were using some form of Collections iterator, you'd probably be
getting concurrent modification exceptions.
For the MUTEX/concurrent modification problem, I don't think using
Collections.synchronizedMap() is enough. You're going to have to block
all access during the read. This could ruin the concurrency of your
app, so there are even further issues here to consider, but strictly
from a thread safety issue it's the better solution I think.
void logResults(Map<String, Result> resultsMap) {
StringBuilder logLine = new StringBuilder("Some stuff");
synchronized( resultsMap ) {
Set<String> keySet = resultsMap.keySet();
for (String key: keySet) {
logLine.append(key).append(": ").append(resultsMap.get(key));
}
}
logger.info(logLine.toString());
}
Assuming "Some stuff" is thread safe.
However I think a different scheme altogether is needed. Something like
a ConcurrentLinkedQueue rather than a Map is probably an even better
solution overall, from both an application performance & concurrency
standpoint, and a thread safe & correct standpoint.
I realize you say your app is legacy. I'm just tossing ideas out for
future reference.
[toc] | [prev] | [next] | [standalone]
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-06-14 10:34 -0700 |
| Message-ID | <0hpCr.13948$l_1.9019@newsfe21.iad> |
| In reply to | #15281 |
On 6/14/12 10:17 AM, markspace wrote:
> Well, it doesn't look "safe" to me because you are reading
> unsynchronized from a shared object in the logResults() method. Always
> always always you must synchronized both the read and the write, or it's
> not thread safe.
Collections.synchronizedMap adds the synchronization necessary to safely
read/write (as separate operations, not atomic).
>
> You're also not MUTEX safe either, which clearly needs to be dealt with.
> If you were using some form of Collections iterator, you'd probably be
> getting concurrent modification exceptions.
The logResults is only called after all the addResults methods have
completed.
>
> For the MUTEX/concurrent modification problem, I don't think using
> Collections.synchronizedMap() is enough. You're going to have to block
> all access during the read. This could ruin the concurrency of your app,
> so there are even further issues here to consider, but strictly from a
> thread safety issue it's the better solution I think.
If you look at the addResult method, that handles the MUTEX case.
>
> void logResults(Map<String, Result> resultsMap) {
> StringBuilder logLine = new StringBuilder("Some stuff");
> synchronized( resultsMap ) {
> Set<String> keySet = resultsMap.keySet();
> for (String key: keySet) {
> logLine.append(key).append(": ").append(resultsMap.get(key));
> }
> }
> logger.info(logLine.toString());
> }
>
> Assuming "Some stuff" is thread safe.
>
> However I think a different scheme altogether is needed. Something like
> a ConcurrentLinkedQueue rather than a Map is probably an even better
> solution overall, from both an application performance & concurrency
> standpoint, and a thread safe & correct standpoint.
Actually, the entire scheme that requires all of this is junk, so it
wouldn't be worth fixing. A complete rewrite the better idioms and real
abstractions is what is needed. Unfortunately, this code-base is being
abandoned so we can move to a PHP based platform. Yes, I know. I don't
want to hear it, because you're preaching to the choir. That decision
was made by the CTO, and I've given all the feedback my position allows.
>
> I realize you say your app is legacy. I'm just tossing ideas out for
> future reference.
Understood. :-)
[toc] | [prev] | [next] | [standalone]
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2012-06-14 11:34 -0700 |
| Message-ID | <ejbkt796o006jku7l0pvgntjc73qlbf4bn@4ax.com> |
| In reply to | #15280 |
On Thu, 14 Jun 2012 09:54:59 -0700, Daniel Pitts <newsgroup.nospam@virtualinfinity.net> wrote, quoted or indirectly quoted someone who said : >This may look "safe", but really the resultsMap internal data structure >can become inconsistent because it is being accessed by two threads at >the same time. I did a cursory glance, but are you not modifying the values of keys? That is a definite no no, even on the same thread. -- Roedy Green Canadian Mind Products http://mindprod.com Controlling complexity is the essence of computer programming. ~ Brian W. Kernighan 1942-01-01 .
[toc] | [prev] | [next] | [standalone]
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-06-14 12:19 -0700 |
| Message-ID | <QOqCr.14184$GJ4.1574@newsfe16.iad> |
| In reply to | #15286 |
On 6/14/12 11:34 AM, Roedy Green wrote: > On Thu, 14 Jun 2012 09:54:59 -0700, Daniel Pitts > <newsgroup.nospam@virtualinfinity.net> wrote, quoted or indirectly > quoted someone who said : > >> This may look "safe", but really the resultsMap internal data structure >> can become inconsistent because it is being accessed by two threads at >> the same time. > > I did a cursory glance, but are you not modifying the values of keys? > That is a definite no no, even on the same thread. Keys are String, String is immutable.
[toc] | [prev] | [next] | [standalone]
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-06-18 04:42 -0700 |
| Message-ID | <313a279f-0e72-4431-9879-c0deb4385f1c@googlegroups.com> |
| In reply to | #15280 |
On Thursday, June 14, 2012 6:54:59 PM UTC+2, Daniel Pitts wrote:
> Just wanted to share some experience with the group.
>
> I've come across a very strange situation, and I think I have the
> solution (still needs to be verified in production, but looks promising).
>
> I've inherited some code which basically does the following. Obfuscated
> and simplified for demonstration purposes.
>
> void logResults(Map<String, Result> resultsMap) {
> StringBuilder logLine = new StringBuilder("Some stuff");
> Set<String> keySet = resultsMap.keySet();
> for (String key: keySet) {
> logLine.append(key).append(": ").append(resultsMap.get(key));
> }
> logger.info(logLine.toString());
> }
That should look roughly like this (more efficient to use the entrySet() and synchronize added):
void logResults(Map<String, Result> resultsMap) {
final StringBuilder logLine = new StringBuilder("Some stuff");
synchronized (resultsMap) {
for (final Entry<String, Result> entry: resultsMap.entrySet()) {
logLine.append(entry.getKey()).append(": ").append(entry.getValue());
}
}
logger.info(logLine.toString());
}
> void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
> synchronize(factory) {
> Result result;
> result = resultsMap.get(factory.getName());
> if (result == null) {
> result = new Result();
> resultsMap.put(factory.getName(), result);
> }
> result.recordResult(factory.getResult());
> }
> }
It looks like there is a spelling error and it was intended to be
void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
synchronize(resultsMap) { // error fixed
Result result = resultsMap.get(factory.getName());
if (result == null) {
result = new Result();
resultsMap.put(factory.getName(), result);
}
result.recordResult(factory.getResult());
}
}
Or the synchronize(resultsMap) was simply forgotten.
Kind regards
robert
[toc] | [prev] | [next] | [standalone]
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-06-18 09:08 -0700 |
| Message-ID | <RnIDr.19181$l_1.6246@newsfe21.iad> |
| In reply to | #15377 |
On 6/18/12 4:42 AM, Robert Klemme wrote:
> On Thursday, June 14, 2012 6:54:59 PM UTC+2, Daniel Pitts wrote:
>> Just wanted to share some experience with the group.
>>
>> I've come across a very strange situation, and I think I have the
>> solution (still needs to be verified in production, but looks promising).
>>
>> I've inherited some code which basically does the following. Obfuscated
>> and simplified for demonstration purposes.
>>
>> void logResults(Map<String, Result> resultsMap) {
>> StringBuilder logLine = new StringBuilder("Some stuff");
>> Set<String> keySet = resultsMap.keySet();
>> for (String key: keySet) {
>> logLine.append(key).append(": ").append(resultsMap.get(key));
>> }
>> logger.info(logLine.toString());
>> }
>
> That should look roughly like this (more efficient to use the entrySet() and synchronize added):
>
> void logResults(Map<String, Result> resultsMap) {
> final StringBuilder logLine = new StringBuilder("Some stuff");
>
> synchronized (resultsMap) {
> for (final Entry<String, Result> entry: resultsMap.entrySet()) {
> logLine.append(entry.getKey()).append(": ").append(entry.getValue());
> }
> }
>
> logger.info(logLine.toString());
> }
Agreed. However, the code I was handed used keySet(). I didn't feel
like refactoring this (for many reasons).
>
>> void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
>> synchronize(factory) {
>> Result result;
>> result = resultsMap.get(factory.getName());
>> if (result == null) {
>> result = new Result();
>> resultsMap.put(factory.getName(), result);
>> }
>> result.recordResult(factory.getResult());
>> }
>> }
>
> It looks like there is a spelling error and it was intended to be
>
> void addResult(ResultFactory factory, Map<String, Result> resultsMap) {
> synchronize(resultsMap) { // error fixed
Nope, that is not, in fact, the original author's intent. There was no
other synchronizing on the resultMap. The original author was only
worried about protecting the key, not the whole map. This of course is
not sufficient, but it is what they implemented.
> Result result = resultsMap.get(factory.getName());
> if (result == null) {
> result = new Result();
> resultsMap.put(factory.getName(), result);
> }
> result.recordResult(factory.getResult());
> }
> }
>
> Or the synchronize(resultsMap) was simply forgotten.
Exactly, which is why I ensure resultsMap is a synchronized Map instead.
>
> Kind regards
>
> robert
[toc] | [prev] | [next] | [standalone]
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-06-18 18:34 +0200 |
| Message-ID | <a493k2FpcfU1@mid.individual.net> |
| In reply to | #15380 |
On 06/18/2012 06:08 PM, Daniel Pitts wrote:
> On 6/18/12 4:42 AM, Robert Klemme wrote:
>> [refactoring from keySet() to entrySet()]
> Agreed. However, the code I was handed used keySet(). I didn't feel like
> refactoring this (for many reasons).
For example?
> Nope, that is not, in fact, the original author's intent. There was no
> other synchronizing on the resultMap. The original author was only
> worried about protecting the key, not the whole map. This of course is
> not sufficient, but it is what they implemented.
The key is a String which does not need synchronization because it is
immutable. Did you mean to say "factory" instead? I have no idea what
ResultFactory actually does but getName() looks like something which
always returns the same name. If that was the case then no
synchronization would be necessary. And if getResult() involves a
calculation which needs to be thread safe then that method could be
synchronized. Can you share more details about that class?
>> Result result = resultsMap.get(factory.getName());
>> if (result == null) {
>> result = new Result();
>> resultsMap.put(factory.getName(), result);
>> }
>> result.recordResult(factory.getResult());
>> }
>> }
>>
>> Or the synchronize(resultsMap) was simply forgotten.
> Exactly, which is why I ensure resultsMap is a synchronized Map instead.
This won't make the code entirely thread safe - at least if you want to
ensure that the fist thread which detects the missing entry also adds it
to the Map and - maybe more important - that getResult() is invoked only
once per key. Collections.synchronizedMap() only ensures all accesses
are synchronized - not more.
Kind regards
robert
[toc] | [prev] | [next] | [standalone]
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-06-18 13:32 -0700 |
| Message-ID | <xfMDr.19346$l_1.5660@newsfe21.iad> |
| In reply to | #15381 |
On 6/18/12 9:34 AM, Robert Klemme wrote:
> On 06/18/2012 06:08 PM, Daniel Pitts wrote:
>> On 6/18/12 4:42 AM, Robert Klemme wrote:
>
>>> [refactoring from keySet() to entrySet()]
>
>> Agreed. However, the code I was handed used keySet(). I didn't feel like
>> refactoring this (for many reasons).
>
> For example?
1. This is legacy code.
2. Changing the semantics of this method might introduce bugs. I don't
feel like debugging this package.
3. This is a centrally used library for many different development
teams. I didn't want to worry about side-effects on other teams.
4. This code base *will* be replaced in the next year.
5. As an engineer with decades of experience, I have decided that
changing this code-base was going to be more expensive than leaving it
alone.
>
>> Nope, that is not, in fact, the original author's intent. There was no
>> other synchronizing on the resultMap. The original author was only
>> worried about protecting the key, not the whole map. This of course is
>> not sufficient, but it is what they implemented.
>
> The key is a String which does not need synchronization because it is
> immutable. Did you mean to say "factory" instead? I have no idea what
> ResultFactory actually does but getName() looks like something which
> always returns the same name. If that was the case then no
> synchronization would be necessary. And if getResult() involves a
> calculation which needs to be thread safe then that method could be
> synchronized. Can you share more details about that class?
getName always returned the same name for the specific factory.
getResult itself needn't be synchronized.
Again, see the reasons above why I'm not making sweeping changes to this
package.
>
>>> Result result = resultsMap.get(factory.getName());
>>> if (result == null) {
>>> result = new Result();
>>> resultsMap.put(factory.getName(), result);
>>> }
>>> result.recordResult(factory.getResult());
>>> }
>>> }
>>>
>>> Or the synchronize(resultsMap) was simply forgotten.
>> Exactly, which is why I ensure resultsMap is a synchronized Map instead.
>
> This won't make the code entirely thread safe - at least if you want to
> ensure that the fist thread which detects the missing entry also adds it
> to the Map and - maybe more important - that getResult() is invoked only
> once per key. Collections.synchronizedMap() only ensures all accesses
> are synchronized - not more.
That is taking care of by the synchronization on the factory. In
reality, the Factory should have been the key, but that isn't how the
original coder decided to do things.
In any case, the real problem was that there was no happens-before
between any two "put" method calls, which lead to inconsistent state.
[toc] | [prev] | [next] | [standalone]
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-06-18 23:26 +0200 |
| Message-ID | <a49kp5FesU1@mid.individual.net> |
| In reply to | #15391 |
On 18.06.2012 22:32, Daniel Pitts wrote: > On 6/18/12 9:34 AM, Robert Klemme wrote: >> On 06/18/2012 06:08 PM, Daniel Pitts wrote: >>> On 6/18/12 4:42 AM, Robert Klemme wrote: >> >>>> [refactoring from keySet() to entrySet()] >> >>> Agreed. However, the code I was handed used keySet(). I didn't feel like >>> refactoring this (for many reasons). >> >> For example? > 1. This is legacy code. > 2. Changing the semantics of this method might introduce bugs. I don't > feel like debugging this package. > 3. This is a centrally used library for many different development > teams. I didn't want to worry about side-effects on other teams. > 4. This code base *will* be replaced in the next year. > 5. As an engineer with decades of experience, I have decided that > changing this code-base was going to be more expensive than leaving it > alone. Yep, that sounds reasonable. I'd just say that iterating via entrySet instead of keySet is only marginally semantically different: you'll even get more efficient iteration because you spare all the map lookups. Plus you are more likely to get the actually corresponding value to a key. >>> Nope, that is not, in fact, the original author's intent. There was no >>> other synchronizing on the resultMap. The original author was only >>> worried about protecting the key, not the whole map. This of course is >>> not sufficient, but it is what they implemented. >> >> The key is a String which does not need synchronization because it is >> immutable. Did you mean to say "factory" instead? I have no idea what >> ResultFactory actually does but getName() looks like something which >> always returns the same name. If that was the case then no >> synchronization would be necessary. And if getResult() involves a >> calculation which needs to be thread safe then that method could be >> synchronized. Can you share more details about that class? > getName always returned the same name for the specific factory. > getResult itself needn't be synchronized. > > Again, see the reasons above why I'm not making sweeping changes to this > package. Absolutely reasonable. >>>> Or the synchronize(resultsMap) was simply forgotten. >>> Exactly, which is why I ensure resultsMap is a synchronized Map instead. >> >> This won't make the code entirely thread safe - at least if you want to >> ensure that the fist thread which detects the missing entry also adds it >> to the Map and - maybe more important - that getResult() is invoked only >> once per key. Collections.synchronizedMap() only ensures all accesses >> are synchronized - not more. > That is taking care of by the synchronization on the factory. Not strictly: it will be unsafe if multiple factories can return the same (equivalent) name. > In > reality, the Factory should have been the key, but that isn't how the > original coder decided to do things. Often quoted "historic reasons". :-) Maybe also the factory should really be the value in the map (with the name as key) and cache the key internally once it is calculated. > In any case, the real problem was that there was no happens-before > between any two "put" method calls, which lead to inconsistent state. I understood your previous posting that the issue was more with missing synchronization between reading (iteration for logging) and writing (method addResult()). Note also that wrapping the Map in a synchronized version does not prevent CME. Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/
[toc] | [prev] | [next] | [standalone]
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-06-18 15:30 -0700 |
| Message-ID | <f_NDr.12400$Bp1.5373@newsfe10.iad> |
| In reply to | #15395 |
On 6/18/12 2:26 PM, Robert Klemme wrote: > On 18.06.2012 22:32, Daniel Pitts wrote: >> In any case, the real problem was that there was no happens-before >> between any two "put" method calls, which lead to inconsistent state. > > I understood your previous posting that the issue was more with missing > synchronization between reading (iteration for logging) and writing > (method addResult()). > > Note also that wrapping the Map in a synchronized version does not > prevent CME. Right, the iteration only happens after all the the writing has been completed, so no CME is possible from that. The damage is from the improperly synchronized "puts", causing the data structure to become corrupt.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.java.programmer
csiph-web