Groups > comp.lang.java.programmer > #14544 > unrolled thread

multithreaded cache?

Started by	bugbear <bugbear@trim_papermule.co.uk_trim>
First post	2012-05-15 10:14 +0100
Last post	2012-06-02 21:54 +0200
Articles	20 on this page of 40 — 12 participants

Back to article view | Back to comp.lang.java.programmer

  multithreaded cache? bugbear <bugbear@trim_papermule.co.uk_trim> - 2012-05-15 10:14 +0100
    Re: multithreaded cache? Silvio Bierman <silvio@moc.com> - 2012-05-15 11:41 +0200
      Re: multithreaded cache? Roedy Green <see_website@mindprod.com.invalid> - 2012-05-15 15:57 -0700
        Re: multithreaded cache? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-15 16:13 -0700
          Re: multithreaded cache? Silvio Bierman <silvio@moc.com> - 2012-05-16 01:22 +0200
        Re: multithreaded cache? Silvio Bierman <silvio@moc.com> - 2012-05-16 01:19 +0200
          Re: multithreaded cache? Eric Sosman <esosman@ieee-dot-org.invalid> - 2012-05-15 20:26 -0400
            Re: multithreaded cache? Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-05-15 21:41 -0300
            Re: multithreaded cache? Silvio Bierman <silvio@moc.com> - 2012-05-16 09:26 +0200
              Re: multithreaded cache? "Mike Schilling" <mscottschilling@hotmail.com> - 2012-05-26 14:54 -0700
          Re: multithreaded cache? Roedy Green <see_website@mindprod.com.invalid> - 2012-05-16 22:03 -0700
            Re: multithreaded cache? Silvio Bierman <silvio@moc.com> - 2012-05-21 15:25 +0200
    Re: multithreaded cache? Eric Sosman <esosman@ieee-dot-org.invalid> - 2012-05-15 08:22 -0400
    Re: multithreaded cache? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-15 09:56 -0700
    Re: multithreaded cache? bugbear <bugbear@trim_papermule.co.uk_trim> - 2012-05-16 14:33 +0100
    Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-05-17 11:54 +0200
      Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-05-17 12:06 +0200
        Re: multithreaded cache? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-17 09:51 -0700
          Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-05-17 20:48 +0200
        Re: multithreaded cache? Sebastian <sebastian@undisclosed.invalid> - 2012-05-20 01:35 +0200
          Re: multithreaded cache? Sebastian <sebastian@undisclosed.invalid> - 2012-05-20 01:48 +0200
            Re: multithreaded cache? Lew <noone@lewscanon.com> - 2012-05-19 18:11 -0700
      Re: multithreaded cache? Silvio Bierman <silvio@moc.com> - 2012-05-18 20:42 +0200
        Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-05-18 23:45 +0200
          Re: multithreaded cache? Eric Sosman <esosman@ieee-dot-org.invalid> - 2012-05-18 18:31 -0400
            Re: multithreaded cache? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-05-18 22:15 -0700
              Re: multithreaded cache? Eric Sosman <esosman@ieee-dot-org.invalid> - 2012-05-19 07:09 -0400
            Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-05-19 12:33 +0200
              Re: multithreaded cache? Lew <noone@lewscanon.com> - 2012-05-19 14:24 -0700
                Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-05-21 21:15 +0200
                  Re: multithreaded cache? Lew <noone@lewscanon.com> - 2012-05-23 23:11 -0700
                    Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-05-24 23:05 +0200
                      Re: multithreaded cache? Gene Wirchenko <genew@ocis.net> - 2012-05-24 20:11 -0700
                        Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-05-25 17:42 +0200
                  Re: multithreaded cache? Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-05-25 07:09 -0300
          Re: multithreaded cache? Silvio Bierman <silvio@moc.com> - 2012-05-21 15:21 +0200
    Re: multithreaded cache? Kevin McMurtrie <mcmurtrie@pixelmemory.us> - 2012-05-24 23:22 -0700
      Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-05-25 17:36 +0200
        Re: multithreaded cache? Kevin McMurtrie <mcmurtrie@pixelmemory.us> - 2012-06-02 12:27 -0700
          Re: multithreaded cache? Robert Klemme <shortcutter@googlemail.com> - 2012-06-02 21:54 +0200

Page 1 of 2 [1] 2 Next page →

#14544 — multithreaded cache?

From	bugbear <bugbear@trim_papermule.co.uk_trim>
Date	2012-05-15 10:14 +0100
Subject	multithreaded cache?
Message-ID	<zsWdnVufD87Egy_SnZ2dnUVZ7oednZ2d@brightview.co.uk>

I'm using various Apache Commons Maps as a
multithread cache, protected using
ReentrantReadWriteLock, so that getting() uses a read lock,
and putting() uses a write lock.

But I've got an issue; in the
case of a cache miss (protected by a read lock),
the required value is acquired using the "underlying function"
that the cache is over; this value is then put() into
the cache (protected by a write lock)

This is all perfectly thread safe, and gives
correct results.

However, if the underlying function is slow
and/or resource hungry (consider cacheing
a ray traced image!) many threads can
end up calling the real function (second
and subsequent threads to the first get a miss
during the first threads call to the underlying function).

"obviously..." what I want is for only
the thread that FIRST has a cache miss
calls the underlying function, whilst other
threads (for the same key) wait.

This seems "obvious" enough that I'm guessing
there's a standard solution.

Googling led me to several "heavy" libraries;

This appears more a locking/cacheing issue
than a Java issue, although my implementation
is in Java.

Can anyone (please) point me at a
canonical solution/implementation?

  BugBear

[toc] | [next] | [standalone]

#14545

From	Silvio Bierman <silvio@moc.com>
Date	2012-05-15 11:41 +0200
Message-ID	<4fb224b2$0$6867$e4fe514c@news2.news.xs4all.nl>
In reply to	#14544

On 05/15/2012 11:14 AM, bugbear wrote:
> I'm using various Apache Commons Maps as a
> multithread cache, protected using
> ReentrantReadWriteLock, so that getting() uses a read lock,
> and putting() uses a write lock.
>
> But I've got an issue; in the
> case of a cache miss (protected by a read lock),
> the required value is acquired using the "underlying function"
> that the cache is over; this value is then put() into
> the cache (protected by a write lock)
>
> This is all perfectly thread safe, and gives
> correct results.
>
> However, if the underlying function is slow
> and/or resource hungry (consider cacheing
> a ray traced image!) many threads can
> end up calling the real function (second
> and subsequent threads to the first get a miss
> during the first threads call to the underlying function).
>
> "obviously..." what I want is for only
> the thread that FIRST has a cache miss
> calls the underlying function, whilst other
> threads (for the same key) wait.
>
> This seems "obvious" enough that I'm guessing
> there's a standard solution.
>
> Googling led me to several "heavy" libraries;
>
> This appears more a locking/cacheing issue
> than a Java issue, although my implementation
> is in Java.
>
> Can anyone (please) point me at a
> canonical solution/implementation?
>
> BugBear


The simplest approach is to wrap your cached values in an object and 
make all cache access go through two stages.

Stage one is to lookup the wrapper in the cache, possibly inserting a 
new (empty) wrapper if not present.

Stage two is to get a lock on the wrapper to get the actual cache value. 
If that is not present (or perhaps stale) then the thread owing the lock 
on the wrapper can recompute the value.

If you cache values that are slow to compute this is usually the way to go.

[toc] | [prev] | [next] | [standalone]

#14553

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2012-05-15 15:57 -0700
Message-ID	<rpn5r7lbt7ckkj0l67n7us7m9srs2plprp@4ax.com>
In reply to	#14545

On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman <silvio@moc.com>
wrote, quoted or indirectly quoted someone who said :

>The simplest approach is to wrap your cached values in an object and 
>make all cache access go through two stages.

Where did you learn this?
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Programmers love to create simplified replacements for HTML. 
They forget that the simplest language is the one you 
already know. They also forget that their simple little 
markup language will bit by bit become even more convoluted 
and complicated than HTML because of the unplanned way it grows.
.

[toc] | [prev] | [next] | [standalone]

#14554

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-05-15 16:13 -0700
Message-ID	<qqBsr.11866$3y3.7423@newsfe20.iad>
In reply to	#14553

On 5/15/12 3:57 PM, Roedy Green wrote:
> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<silvio@moc.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> The simplest approach is to wrap your cached values in an object and
>> make all cache access go through two stages.
>
> Where did you learn this?
I think he meant "an easy efficient approach", rather than "the simplest".

The simplest of course is a global lock for the cache.

[toc] | [prev] | [next] | [standalone]

#14556

From	Silvio Bierman <silvio@moc.com>
Date	2012-05-16 01:22 +0200
Message-ID	<4fb2e52f$0$6854$e4fe514c@news2.news.xs4all.nl>
In reply to	#14554

On 05/16/2012 01:13 AM, Daniel Pitts wrote:
> On 5/15/12 3:57 PM, Roedy Green wrote:
>> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<silvio@moc.com>
>> wrote, quoted or indirectly quoted someone who said :
>>
>>> The simplest approach is to wrap your cached values in an object and
>>> make all cache access go through two stages.
>>
>> Where did you learn this?
> I think he meant "an easy efficient approach", rather than "the simplest".
>
> The simplest of course is a global lock for the cache.

Yep, with "simplest" I meant the simplest way to get around the 
mentioned disadvantage of a global lock.

[toc] | [prev] | [next] | [standalone]

#14555

From	Silvio Bierman <silvio@moc.com>
Date	2012-05-16 01:19 +0200
Message-ID	<4fb2e46c$0$6854$e4fe514c@news2.news.xs4all.nl>
In reply to	#14553

On 05/16/2012 12:57 AM, Roedy Green wrote:
> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<silvio@moc.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> The simplest approach is to wrap your cached values in an object and
>> make all cache access go through two stages.
>
> Where did you learn this?

Why do you ask? Do you disagree?

It is something I came up with the first time I encountered the same 
situation, probably many years ago.
It is obvious you need to decouple key lookups/insertions, which are 
inherently cache-global, from value insertions/updates, which could be 
considered key-local.

Eric and Daniel responded with more specific solutions that both match 
my general pattern.

[toc] | [prev] | [next] | [standalone]

#14558

From	Eric Sosman <esosman@ieee-dot-org.invalid>
Date	2012-05-15 20:26 -0400
Message-ID	<jous86$cc7$2@dont-email.me>
In reply to	#14555

On 5/15/2012 7:19 PM, Silvio Bierman wrote:
> On 05/16/2012 12:57 AM, Roedy Green wrote:
>> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<silvio@moc.com>
>> wrote, quoted or indirectly quoted someone who said :
>>
>>> The simplest approach is to wrap your cached values in an object and
>>> make all cache access go through two stages.
>>
>> Where did you learn this?
>
> Why do you ask? Do you disagree?
>
> It is something I came up with the first time I encountered the same
> situation, probably many years ago.
> It is obvious you need to decouple key lookups/insertions, which are
> inherently cache-global, from value insertions/updates, which could be
> considered key-local.
>
> Eric and Daniel responded with more specific solutions that both match
> my general pattern.

     Three independent votes for one pattern: What we have here is a

	M  A  N  D  A  T  E  !!!

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]

#14559

From	Arved Sandstrom <asandstrom3minus1@eastlink.ca>
Date	2012-05-15 21:41 -0300
Message-ID	<DICsr.2195$9Q6.238@newsfe18.iad>
In reply to	#14558

On 12-05-15 09:26 PM, Eric Sosman wrote:
> On 5/15/2012 7:19 PM, Silvio Bierman wrote:
>> On 05/16/2012 12:57 AM, Roedy Green wrote:
>>> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<silvio@moc.com>
>>> wrote, quoted or indirectly quoted someone who said :
>>>
>>>> The simplest approach is to wrap your cached values in an object and
>>>> make all cache access go through two stages.
>>>
>>> Where did you learn this?
>>
>> Why do you ask? Do you disagree?
>>
>> It is something I came up with the first time I encountered the same
>> situation, probably many years ago.
>> It is obvious you need to decouple key lookups/insertions, which are
>> inherently cache-global, from value insertions/updates, which could be
>> considered key-local.
>>
>> Eric and Daniel responded with more specific solutions that both match
>> my general pattern.
> 
>     Three independent votes for one pattern: What we have here is a
> 
>     M  A  N  D  A  T  E  !!!
> 
Dunno about that, but I did go out and buy a lottery ticket...

AHS
-- 
Never interrupt your enemy when he is making a mistake.
--Napoleon

[toc] | [prev] | [next] | [standalone]

#14561

From	Silvio Bierman <silvio@moc.com>
Date	2012-05-16 09:26 +0200
Message-ID	<4fb35691$0$6944$e4fe514c@news2.news.xs4all.nl>
In reply to	#14558

On 05/16/2012 02:26 AM, Eric Sosman wrote:
> On 5/15/2012 7:19 PM, Silvio Bierman wrote:
>> On 05/16/2012 12:57 AM, Roedy Green wrote:
>>> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<silvio@moc.com>
>>> wrote, quoted or indirectly quoted someone who said :
>>>
>>>> The simplest approach is to wrap your cached values in an object and
>>>> make all cache access go through two stages.
>>>
>>> Where did you learn this?
>>
>> Why do you ask? Do you disagree?
>>
>> It is something I came up with the first time I encountered the same
>> situation, probably many years ago.
>> It is obvious you need to decouple key lookups/insertions, which are
>> inherently cache-global, from value insertions/updates, which could be
>> considered key-local.
>>
>> Eric and Daniel responded with more specific solutions that both match
>> my general pattern.
>
> Three independent votes for one pattern: What we have here is a
>
> M A N D A T E !!!
>

Let's hope this is inspires the Greeks...

[toc] | [prev] | [next] | [standalone]

#14823

From	"Mike Schilling" <mscottschilling@hotmail.com>
Date	2012-05-26 14:54 -0700
Message-ID	<jprjfe$1je$1@dont-email.me>
In reply to	#14561

"Silvio Bierman" <silvio@moc.com> wrote in message 
news:4fb35691$0$6944$e4fe514c@news2.news.xs4all.nl...
> On 05/16/2012 02:26 AM, Eric Sosman wrote:
>> On 5/15/2012 7:19 PM, Silvio Bierman wrote:
>>> On 05/16/2012 12:57 AM, Roedy Green wrote:
>>>> On Tue, 15 May 2012 11:41:06 +0200, Silvio Bierman<silvio@moc.com>
>>>> wrote, quoted or indirectly quoted someone who said :
>>>>
>>>>> The simplest approach is to wrap your cached values in an object and
>>>>> make all cache access go through two stages.
>>>>
>>>> Where did you learn this?
>>>
>>> Why do you ask? Do you disagree?
>>>
>>> It is something I came up with the first time I encountered the same
>>> situation, probably many years ago.
>>> It is obvious you need to decouple key lookups/insertions, which are
>>> inherently cache-global, from value insertions/updates, which could be
>>> considered key-local.
>>>
>>> Eric and Daniel responded with more specific solutions that both match
>>> my general pattern.
>>
>> Three independent votes for one pattern: What we have here is a
>>
>> M A N D A T E !!!
>>
>
> Let's hope this is inspires the Greeks...

They've been dating men for millenia.

[toc] | [prev] | [next] | [standalone]

#14568

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2012-05-16 22:03 -0700
Message-ID	<rh19r7th8q0r7cvvbn9ldmd2forprodisi@4ax.com>
In reply to	#14555

On Wed, 16 May 2012 01:19:07 +0200, Silvio Bierman <silvio@moc.com>
wrote, quoted or indirectly quoted someone who said :

>
>Why do you ask? Do you disagree?

No. I thought there were two possibilities.  There is a some great
book you read or website that I should reference in the Java glossary,
or that you figured this out by exhaustive experiment, in which case I
should consider worship.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Plants" with "leaves" no more efficient than today’s solar cells 
could out-compete real plants, crowding the biosphere with an
inedible foliage. Tough omnivorous "bacteria" could out-compete 
real bacteria: They could spread like blowing pollen, replicate 
swiftly, and reduce the biosphere to dust in a matter of days.
Dangerous replicators could easily be too tough, small, and 
rapidly spreading to stop -- at least if we make no preparation. 
We have trouble enough controlling viruses and fruit flies. 
~ Eric Drexler (born: 1955-04-25 age: 57)  
Engines of Creation: The Coming Era of Nanotechnology.
.

[toc] | [prev] | [next] | [standalone]

#14735

From	Silvio Bierman <silvio@moc.com>
Date	2012-05-21 15:25 +0200
Message-ID	<4fba4237$0$6846$e4fe514c@news2.news.xs4all.nl>
In reply to	#14568

On 05/17/2012 07:03 AM, Roedy Green wrote:
> On Wed, 16 May 2012 01:19:07 +0200, Silvio Bierman<silvio@moc.com>
> wrote, quoted or indirectly quoted someone who said :
>
>>
>> Why do you ask? Do you disagree?
>
> No. I thought there were two possibilities.  There is a some great
> book you read or website that I should reference in the Java glossary,
> or that you figured this out by exhaustive experiment, in which case I
> should consider worship.

Actually, exhaustive experiment would be a huge overstatement. It was 
the second thing that came up after I immediately discarded the first 
impulse of a global lock.

Some people would call that premature optimization...

[toc] | [prev] | [next] | [standalone]

#14546

From	Eric Sosman <esosman@ieee-dot-org.invalid>
Date	2012-05-15 08:22 -0400
Message-ID	<jothpq$pub$1@dont-email.me>
In reply to	#14544

On 5/15/2012 5:14 AM, bugbear wrote:
> I'm using various Apache Commons Maps as a
> multithread cache, protected using
> ReentrantReadWriteLock, so that getting() uses a read lock,
> and putting() uses a write lock.
>
> But I've got an issue; in the
> case of a cache miss (protected by a read lock),
> the required value is acquired using the "underlying function"
> that the cache is over; this value is then put() into
> the cache (protected by a write lock)
>
> This is all perfectly thread safe, and gives
> correct results.
>
> However, if the underlying function is slow
> and/or resource hungry (consider cacheing
> a ray traced image!) many threads can
> end up calling the real function (second
> and subsequent threads to the first get a miss
> during the first threads call to the underlying function).
>
> "obviously..." what I want is for only
> the thread that FIRST has a cache miss
> calls the underlying function, whilst other
> threads (for the same key) wait.
>
> This seems "obvious" enough that I'm guessing
> there's a standard solution.
>
> Googling led me to several "heavy" libraries;
>
> This appears more a locking/cacheing issue
> than a Java issue, although my implementation
> is in Java.
>
> Can anyone (please) point me at a
> canonical solution/implementation?

     Don't know whether it's "canonical," but one fairly obvious
approach is to insert the missed key in the map immediately, but
with a value that means "Coming Soon To A Cache Near You."  The
original thread can then release its read lock and fetch the true
value from its source.

     Other threads that look for the same value will "hit" the
cache, but will recognize the "Coming Soon" as an indication to
wait until the first thread comes up with the value.

     There must be many ways to arrange this, but one that requires
very little awareness on the part of the querying threads is to
make the hashed "value" a value-holder object.  Then a querying
thread could do something like

	mapLock.lockForReading();
	ValueHolder holder = map.get(key);
	mapLock.release();

	if (holder == null) {
	    mapLock.lockForWriting();
	    holder = map.get(key);  // inserted by other thread?
	    if (holder == null) {
	        holder = new ValueHolder(key);
	        map.put(key, holder);
	    }
	    mapLock.release();
	}

	Value = holder.getValue();

The ValueHolder class might look like

	class ValueHolder {
	    private final Key key;
	    private Value cachedValue;
	    private SourceException fetchError;

	    ValueHolder(Key key) {
	        this.key = key;
	    }

	    synchronized
	    Value getValue() throws SourceException {
	        if (cachedValue == null) {
	            if (fetchError != null) {
	                // notify new thread of old error
	                throw fetchError;
	            }
	            try {
	                cachedValue = fetchValueFromSource(key);
	            } catch (SourceException ex) {
	                fetchError = ex;
	                throw fetchError;
	            }
	        }
	        return cachedValue;
	    }
	}

(That's not the only way to deal with errors in fetching, but it
seems perfectly reasonable to me even if it's a little odd to throw
the same exception object multiple times.)

     The cache-querying threads lock the map only briefly, long
enough to determine whether a ValueHolder is present.  ValueHolder
can delay its callers for a long time (if not, why cache at all?),
but will only delay callers that are interested in that particular
key; callers looking for other keys will get different ValueHolder
instances and won't be interfered with.  And only one thread will
actually attempt fetchValueFromSource() on any given key.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]

#14547

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-05-15 09:56 -0700
Message-ID	<bVvsr.9500$TC4.4320@newsfe14.iad>
In reply to	#14544

On 5/15/12 2:14 AM, bugbear wrote:
> I'm using various Apache Commons Maps as a
> multithread cache, protected using
> ReentrantReadWriteLock, so that getting() uses a read lock,
> and putting() uses a write lock.
>
> But I've got an issue; in the
> case of a cache miss (protected by a read lock),
> the required value is acquired using the "underlying function"
> that the cache is over; this value is then put() into
> the cache (protected by a write lock)
>
> This is all perfectly thread safe, and gives
> correct results.
>
> However, if the underlying function is slow
> and/or resource hungry (consider cacheing
> a ray traced image!) many threads can
> end up calling the real function (second
> and subsequent threads to the first get a miss
> during the first threads call to the underlying function).
>
> "obviously..." what I want is for only
> the thread that FIRST has a cache miss
> calls the underlying function, whilst other
> threads (for the same key) wait.
>
> This seems "obvious" enough that I'm guessing
> there's a standard solution.
>
> Googling led me to several "heavy" libraries;
>
> This appears more a locking/cacheing issue
> than a Java issue, although my implementation
> is in Java.
>
> Can anyone (please) point me at a
> canonical solution/implementation?
>
> BugBear

What I think I would do in this situation is have the "write" operation 
be fast. It would put in a "FutureTask<V>" object, which hasn't been run 
yet.

The FutureTask<V> class would basically be a "lazy loader".  The reason 
to do this is to be able to release the lock on the Map, while still 
blocking the return of the get until the value is ready.

public V get(K key) throw ExecutionException, InterruptedException {
     readLock.lockInterruptibly();
     try {
         final Future<V> future = map.get(key);
         if (future != null) { return future.getValue(); }
     } finally {
       readLock.unlock();
     }
     final FutureTask<V> future;

     writeLock.lockInterruptibly();
     try {
         // We need to double check, to make sure no one else
         // has added the future.
         final FutureTask<V> cachedFuture = map.get(key);
         if (cachedFuture != null) {
           future = cachedFuture;
         } else {
           future = new FutureTask<V>(new 
UnderlyingFunctionCallable<V>(key));
           map.put(key, future);
         }

     } finally {
      writeLock.unlock();
     }
     future.run();
     return future.getValue();
}


Note, since the "get" and "put" operations should be fairly fast, a 
read/write lock may be over-kill, and the whole thing could be simplified:

public V get(K key) throw ExecutionException, InterruptedException {
    synchronize(mySync) {
         final FutureTask<V> cachedFuture = map.get(key);
         if (cachedFuture != null) {
           future = cachedFuture;
         } else {
           future = new FutureTask<V>(new 
UnderlyingFunctionCallable<V>(key));
           map.put(key, future);
         }

     }
     future.run();
     return future.getValue();
}

This basically gives you "per key" synchronization, with the "whole map" 
synch being only for an O(1) operation of "check for key, add place-hold 
if absent".

[toc] | [prev] | [next] | [standalone]

#14563

From	bugbear <bugbear@trim_papermule.co.uk_trim>
Date	2012-05-16 14:33 +0100
Message-ID	<_P2dnU4rLbwxMS7SnZ2dnUVZ8uGdnZ2d@brightview.co.uk>
In reply to	#14544

bugbear wrote:
>
> Can anyone (please) point me at a
> canonical solution/implementation?

The answer is "yes" !!

Thanks to the members for some
great thoughts, analysis
and reasoned discussion.

  BugBear

[toc] | [prev] | [next] | [standalone]

#14571

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-05-17 11:54 +0200
Message-ID	<a1k06fFtdtU1@mid.individual.net>
In reply to	#14544

On 05/15/2012 11:14 AM, bugbear wrote:
> However, if the underlying function is slow
> and/or resource hungry (consider cacheing
> a ray traced image!) many threads can
> end up calling the real function (second
> and subsequent threads to the first get a miss
> during the first threads call to the underlying function).
>
> "obviously..." what I want is for only
> the thread that FIRST has a cache miss
> calls the underlying function, whilst other
> threads (for the same key) wait.

I provide a variant of Silvio's, Eric's and Daniel's solution which 
should yield higher throughput because it works without read write 
locking.  You can find it as gist in case the code is garbled in the 
newsgroup posting:
https://gist.github.com/2717818

Kind regards

	robert


The code (untested):

package clj;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
  * The cache works with as few locking as possible. Lookup is done in 
two steps
  * on cache miss:
  * <ol>
  * <li>On a cache miss a retriever is inserted into the cache which 
will obtain
  * the value synchronized from a {@link Calculator}.</li>
  * <li>Once calculation has finished a simple lock free reference to 
the value
  * replaces the retriever in the cache and the value is returned.</li>
  * </ol>
  *
  * @author robert klemme
  *
  * @param <K>
  *            key type
  * @param <V>
  *            value type
  */
public final class LazyCache<K, V> {
     /**
      * Calculate values from given keys.
      *
      * @param <K>
      *            key type
      * @param <V>
      *            value type
      */
     public interface Calculator<K, V> {
         V get(K key);
     }

     /**
      * Obtain a value.
      *
      * @param <V>
      *            value type.
      */
     private interface Reference<V> {
         V get();
     }

     /**
      * Stupid simple reference which only hands out a fixed value all 
the time
      * without synchronization.
      *
      * @param <V>
      *            value type.
      */
     private static final class Ref<V> implements Reference<V> {
         private final V val;

         public Ref(V val) {
             this.val = val;
         }

         @Override
         public V get() {
             return val;
         }
     }

     /** Mapping from keys to objects which yield values. */
     private final ConcurrentMap<K, Reference<V>> map = new 
ConcurrentHashMap<K, Reference<V>>();

     /** User provided. */
     private final Calculator<K, V> calc;

     /**
      * Create a cache.
      *
      * @param calc
      *            user must provide a reasonable implementation, not
      *            <code>null</code>.
      */
     public LazyCache(final Calculator<K, V> calc) {
         if (calc == null)
             throw new NullPointerException();
         this.calc = calc;
     }

     /**
      * Get a value from the cache. The value might have to be calculated.
      *
      * @param key
      *            lookup key.
      * @return value, might even be <code>null</code> depending on 
algorithm.
      */
     public V get(final K key) {
         Reference<V> ref = map.get(key);

         if (ref == null) {
             // miss
             ref = new Reference<V>() {
                 @Override
                 public synchronized V get() {
                     final V val = calc.get(key);
                     // next time lock free access:
                     Reference<V> x = map.put(key, new Ref<V>(val));
                     assert x == this;
                     return val;
                 }
             };

             final Reference<V> old = map.putIfAbsent(key, ref);

             if (old != null)
                 ref = old; // someone else was faster
         }

         return ref.get();
     }
}

[toc] | [prev] | [next] | [standalone]

#14572

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-05-17 12:06 +0200
Message-ID	<a1k0taF206U1@mid.individual.net>
In reply to	#14571

On 05/17/2012 11:54 AM, Robert Klemme wrote:
> On 05/15/2012 11:14 AM, bugbear wrote:
>> However, if the underlying function is slow
>> and/or resource hungry (consider cacheing
>> a ray traced image!) many threads can
>> end up calling the real function (second
>> and subsequent threads to the first get a miss
>> during the first threads call to the underlying function).
>>
>> "obviously..." what I want is for only
>> the thread that FIRST has a cache miss
>> calls the underlying function, whilst other
>> threads (for the same key) wait.
>
> I provide a variant of Silvio's, Eric's and Daniel's solution which
> should yield higher throughput because it works without read write
> locking. You can find it as gist in case the code is garbled in the
> newsgroup posting:
> https://gist.github.com/2717818

There was one important detail missing.  This is the corrected code 
(gist is updated as well):

package clj;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
  * The cache works with as few locking as possible. Lookup is done in 
two steps
  * on cache miss:
  * <ol>
  * <li>On a cache miss a retriever is inserted into the cache which 
will obtain
  * the value synchronized from a {@link Calculator}.</li>
  * <li>Once calculation has finished a simple lock free reference to 
the value
  * replaces the retriever in the cache and the value is returned.</li>
  * </ol>
  *
  * @author robert klemme
  *
  * @param <K>
  *            key type
  * @param <V>
  *            value type
  */
public final class LazyCache<K, V> {
     /**
      * Calculate values from given keys.
      *
      * @param <K>
      *            key type
      * @param <V>
      *            value type
      */
     public interface Calculator<K, V> {
         V get(K key);
     }

     /**
      * Obtain a value.
      *
      * @param <V>
      *            value type.
      */
     private interface Reference<V> {
         V get();
     }

     /**
      * Stupid simple reference which only hands out a fixed value all 
the time
      * without synchronization.
      *
      * @param <V>
      *            value type.
      */
     private static final class Ref<V> implements Reference<V> {
         private final V val;

         public Ref(V val) {
             this.val = val;
         }

         @Override
         public V get() {
             return val;
         }
     }

     /** Mapping from keys to objects which yield values. */
     private final ConcurrentMap<K, Reference<V>> map = new 
ConcurrentHashMap<K, Reference<V>>();

     /** User provided. */
     private final Calculator<K, V> calc;

     /**
      * Create a cache.
      *
      * @param calc
      *            user must provide a reasonable implementation, not
      *            <code>null</code>.
      */
     public LazyCache(final Calculator<K, V> calc) {
         if (calc == null)
             throw new NullPointerException();
         this.calc = calc;
     }

     /**
      * Get a value from the cache. The value might have to be calculated.
      *
      * @param key
      *            lookup key.
      * @return value, might even be <code>null</code> depending on 
algorithm.
      */
     public V get(final K key) {
         Reference<V> ref = map.get(key);

         if (ref == null) {
             // miss
             ref = new Reference<V>() {
                 private V val;
                 private boolean here = false; // superfluous but explicit

                 @Override
                 public synchronized V get() {
                     if (!here) {
                         val = calc.get(key);
                         here = true;
                         // next time lock free access:
                         Reference<V> x = map.put(key, new Ref<V>(val));
                         assert x == this;
                     }

                     return val;
                 }
             };

             final Reference<V> old = map.putIfAbsent(key, ref);

             if (old != null)
                 ref = old; // someone else was faster
         }

         return ref.get();
     }
}

[toc] | [prev] | [next] | [standalone]

#14583

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-05-17 09:51 -0700
Message-ID	<v0atr.23997$_l.2497@newsfe15.iad>
In reply to	#14572

On 5/17/12 3:06 AM, Robert Klemme wrote:
> On 05/17/2012 11:54 AM, Robert Klemme wrote:
>> On 05/15/2012 11:14 AM, bugbear wrote:
>>> However, if the underlying function is slow
>>> and/or resource hungry (consider cacheing
>>> a ray traced image!) many threads can
>>> end up calling the real function (second
>>> and subsequent threads to the first get a miss
>>> during the first threads call to the underlying function).
>>>
>>> "obviously..." what I want is for only
>>> the thread that FIRST has a cache miss
>>> calls the underlying function, whilst other
>>> threads (for the same key) wait.
>>
>> I provide a variant of Silvio's, Eric's and Daniel's solution which
>> should yield higher throughput because it works without read write
>> locking. You can find it as gist in case the code is garbled in the
>> newsgroup posting:
>> https://gist.github.com/2717818
>
> There was one important detail missing. This is the corrected code (gist
> is updated as well):
[snip]
>  if (!here) {
>   val = calc.get(key);
>   here = true;
>   // next time lock free access:
>   Reference<V> x = map.put(key, new Ref<V>(val));
>   assert x == this;
>  }
This assert (while true) is unnecessary and potentially harmful. 
Nothing becomes inconsistent if the value you replace is not the 
calculating reference. However, your assert may break at runtime if 
something else changes about this code.  Assert should be used to 
validate consistency.

I like this class, though I have a couple of style comments.

I personally would name "Ref" as "ConstantReference", and your anonymous 
Reference implementation I would make a private static class 
CalculatingReference.

In my own personal library of utilities, I actually have two interfaces 
"Factory<T>" with a "T create()" method and "ParameterizedFactory<T,P>" 
with a "T create(P parameter)".  Including a few interesting 
implementations of those.  In my case, I would probably implement this 
cache as a ParameterizedFactory impl that takes another 
ParameterizedFactory as the "calc" field.

[toc] | [prev] | [next] | [standalone]

#14590

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-05-17 20:48 +0200
Message-ID	<a1kvglF557U1@mid.individual.net>
In reply to	#14583

On 17.05.2012 18:51, Daniel Pitts wrote:
> On 5/17/12 3:06 AM, Robert Klemme wrote:
>> On 05/17/2012 11:54 AM, Robert Klemme wrote:
>>> On 05/15/2012 11:14 AM, bugbear wrote:
>>>> However, if the underlying function is slow
>>>> and/or resource hungry (consider cacheing
>>>> a ray traced image!) many threads can
>>>> end up calling the real function (second
>>>> and subsequent threads to the first get a miss
>>>> during the first threads call to the underlying function).
>>>>
>>>> "obviously..." what I want is for only
>>>> the thread that FIRST has a cache miss
>>>> calls the underlying function, whilst other
>>>> threads (for the same key) wait.
>>>
>>> I provide a variant of Silvio's, Eric's and Daniel's solution which
>>> should yield higher throughput because it works without read write
>>> locking. You can find it as gist in case the code is garbled in the
>>> newsgroup posting:
>>> https://gist.github.com/2717818
>>
>> There was one important detail missing. This is the corrected code (gist
>> is updated as well):
> [snip]
>> if (!here) {
>> val = calc.get(key);
>> here = true;
>> // next time lock free access:
>> Reference<V> x = map.put(key, new Ref<V>(val));
>> assert x == this;
>> }
> This assert (while true) is unnecessary and potentially harmful. Nothing
> becomes inconsistent if the value you replace is not the calculating
> reference.

The algorithm works under the assumption that the first element placed 
into the map is the one doing the calculation and only after that the 
constant reference is inserted.  If that is not the case (e.g. because 
another calculator is put into the map) then obviously something must be 
wrong and we might even have more than one lengthy calculation of the 
cached value.  This assumption is checked with the assert which also 
serves as documentation.

> However, your assert may break at runtime if something else
> changes about this code. Assert should be used to validate consistency.

Well, if the logic is changed then of course asserts need to change, 
too.  There is nothing strange or harmful about that.  With your 
argument no asserts would make sense because they would need to be 
changed if a class is changed.  Note that internal class invariants can 
change without changing the observable invariant.  So even though a 
class "looks" the same and behaves the same from the outside asserts 
might have to be changed if internal logic of the class changes.

> I like this class, though I have a couple of style comments.

Thank you!

> I personally would name "Ref" as "ConstantReference",

Well, I was lazy typing and the class isn't public anyway.  This is not 
production code but an illustrating example for cljp.

> and your anonymous
> Reference implementation I would make a private static class
> CalculatingReference.

Why?  You would need to either pass in a reference to LazyCache or pass 
map and calc - plus declaring a constructor and type parameters.  In 
this case I don't really see the benefit.

> In my own personal library of utilities, I actually have two interfaces
> "Factory<T>" with a "T create()" method and "ParameterizedFactory<T,P>"
> with a "T create(P parameter)". Including a few interesting
> implementations of those. In my case, I would probably implement this
> cache as a ParameterizedFactory impl that takes another
> ParameterizedFactory as the "calc" field.

That could be done.  But then method naming could be considered 
misleading because the create() of the cache isn't really a create in 
all cases.

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#14658

From	Sebastian <sebastian@undisclosed.invalid>
Date	2012-05-20 01:35 +0200
Message-ID	<jp9anv$a6a$1@news.albasani.net>
In reply to	#14572

Am 17.05.2012 12:06, schrieb Robert Klemme:
> On 05/17/2012 11:54 AM, Robert Klemme wrote:
>> On 05/15/2012 11:14 AM, bugbear wrote:
>>> However, if the underlying function is slow
>>> and/or resource hungry (consider cacheing
>>> a ray traced image!) many threads can
>>> end up calling the real function (second
>>> and subsequent threads to the first get a miss
>>> during the first threads call to the underlying function).
>>>
>>> "obviously..." what I want is for only
>>> the thread that FIRST has a cache miss
>>> calls the underlying function, whilst other
>>> threads (for the same key) wait.
>>
>> I provide a variant of Silvio's, Eric's and Daniel's solution which
>> should yield higher throughput because it works without read write
>> locking. You can find it as gist in case the code is garbled in the
>> newsgroup posting:
>> https://gist.github.com/2717818
>
> There was one important detail missing. This is the corrected code (gist
> is updated as well):
>
[snip]
The important detail obiously is the extra check with the "here" flag.
Would you mind explaining why that is necessary? Everyone seems to have
got it, but I must admit I haven't. Why is it not enough that
Reference#get() is synchronized? As you yourself have missed this at
your first attempt, I hope the reason isn't trivial...
-- Sebastian

[toc] | [prev] | [next] | [standalone]

Page 1 of 2 [1] 2 Next page →

csiph-web

multithreaded cache?

Contents

#14544 — multithreaded cache?

#14545

#14553

#14554

#14556

#14555

#14558

#14559

#14561

#14823

#14568

#14735

#14546

#14547

#14563

#14571

#14572

#14583

#14590

#14658