Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail From: Eric Sosman Newsgroups: comp.lang.java.programmer Subject: Re: multithreaded cache? Date: Tue, 15 May 2012 08:22:16 -0400 Organization: A noiseless patient Spider Lines: 114 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Injection-Date: Tue, 15 May 2012 12:22:18 +0000 (UTC) Injection-Info: mx04.eternal-september.org; posting-host="03ebLEozl+tXCe7JS60Feg"; logging-data="26571"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Rb0f2YIJQieAf2WJzikPU" User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 In-Reply-To: Cancel-Lock: sha1:de/wsJOmdxW6LotGZgtVOLH8JEw= Xref: csiph.com comp.lang.java.programmer:14546 On 5/15/2012 5:14 AM, bugbear wrote: > I'm using various Apache Commons Maps as a > multithread cache, protected using > ReentrantReadWriteLock, so that getting() uses a read lock, > and putting() uses a write lock. > > But I've got an issue; in the > case of a cache miss (protected by a read lock), > the required value is acquired using the "underlying function" > that the cache is over; this value is then put() into > the cache (protected by a write lock) > > This is all perfectly thread safe, and gives > correct results. > > However, if the underlying function is slow > and/or resource hungry (consider cacheing > a ray traced image!) many threads can > end up calling the real function (second > and subsequent threads to the first get a miss > during the first threads call to the underlying function). > > "obviously..." what I want is for only > the thread that FIRST has a cache miss > calls the underlying function, whilst other > threads (for the same key) wait. > > This seems "obvious" enough that I'm guessing > there's a standard solution. > > Googling led me to several "heavy" libraries; > > This appears more a locking/cacheing issue > than a Java issue, although my implementation > is in Java. > > Can anyone (please) point me at a > canonical solution/implementation? Don't know whether it's "canonical," but one fairly obvious approach is to insert the missed key in the map immediately, but with a value that means "Coming Soon To A Cache Near You." The original thread can then release its read lock and fetch the true value from its source. Other threads that look for the same value will "hit" the cache, but will recognize the "Coming Soon" as an indication to wait until the first thread comes up with the value. There must be many ways to arrange this, but one that requires very little awareness on the part of the querying threads is to make the hashed "value" a value-holder object. Then a querying thread could do something like mapLock.lockForReading(); ValueHolder holder = map.get(key); mapLock.release(); if (holder == null) { mapLock.lockForWriting(); holder = map.get(key); // inserted by other thread? if (holder == null) { holder = new ValueHolder(key); map.put(key, holder); } mapLock.release(); } Value = holder.getValue(); The ValueHolder class might look like class ValueHolder { private final Key key; private Value cachedValue; private SourceException fetchError; ValueHolder(Key key) { this.key = key; } synchronized Value getValue() throws SourceException { if (cachedValue == null) { if (fetchError != null) { // notify new thread of old error throw fetchError; } try { cachedValue = fetchValueFromSource(key); } catch (SourceException ex) { fetchError = ex; throw fetchError; } } return cachedValue; } } (That's not the only way to deal with errors in fetching, but it seems perfectly reasonable to me even if it's a little odd to throw the same exception object multiple times.) The cache-querying threads lock the map only briefly, long enough to determine whether a ValueHolder is present. ValueHolder can delay its callers for a long time (if not, why cache at all?), but will only delay callers that are interested in that particular key; callers looking for other keys will get different ValueHolder instances and won't be interfered with. And only one thread will actually attempt fetchValueFromSource() on any given key. -- Eric Sosman esosman@ieee-dot-org.invalid