Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail From: "aminer" Newsgroups: comp.programming.threads,comp.programming Subject: Reader-Wrioter lock implementation and Distributed Reader-Writer Mutex Date: Fri, 14 Sep 2012 19:16:06 -0500 Organization: A noiseless patient Spider Lines: 257 Message-ID: Injection-Date: Fri, 14 Sep 2012 23:16:20 +0000 (UTC) Injection-Info: mx04.eternal-september.org; posting-host="c43ca82f9e8d62a602307fe9d2e9b807"; logging-data="2136"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+U7SgsIqkHyjOdox+qvSiY" X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5512 X-RFC2646: Format=Flowed; Original X-Newsreader: Microsoft Outlook Express 6.00.2900.5512 Cancel-Lock: sha1:26rZ+ao4un1eylxTxmqB/Ofo7OA= X-Priority: 3 X-MSMail-Priority: Normal Xref: csiph.com comp.programming.threads:1072 comp.programming:2216 Hello, I have implemented the Object Pascal Distributed Reader-Writer Mutex based on the C++ Distributed Reader-Writer Mutex by Dmitry Vyukov, in this post i will discuss the TomniMREW reader-writer lock by Primoz Gabrijelcic. Lets take a look at the source code and my explaination will follow: ==== unit MREW; {$IFDEF FPC} {$ASMMODE intel} {$ENDIF FPC} interface uses sysutils, classes; {$I defines.inc} type {$IFDEF CPU64} int = int64; {$ENDIF CPU64} {$IFDEF CPU32} int = integer; {$ENDIF CPU32} typecache1 = array[0..15] of longword; TOmniMREW = class private omrewReference: int; //Reference.Bit0 is 'writing in progress' flag // cache:typecache1; public procedure EnterReadLock; procedure EnterWriteLock; procedure ExitReadLock; procedure ExitWriteLock; end; { TOmniMREW } implementation { TOmniMREW } function LockedExchangeAdd(var Target: Int; Value: Int): Integer; asm {$IFDEF CPU32} // --> EAX Target // EDX Value // <-- EAX Result MOV ECX, EAX MOV EAX, EDX // ECX Target // EAX Value LOCK XADD [ECX], EAX {$ENDIF CPU32} {$IFDEF CPU64} // --> RCX Target // EDX Value // <-- EAX Result MOV RAX, EDX // RCX Target // RAX Value LOCK XADD [RCX], RAX {$ENDIF CPU64} end; function CAS(var Target:int;Comp ,Exch : int): boolean;assembler;stdcall; asm {$IFDEF CPU64} mov rax, comp lock cmpxchg [Target], Exch setz al {$ENDIF CPU64} {$IFDEF CPU32} mov eax, comp mov ecx,Target mov edx,exch lock cmpxchg [ecx], edx setz al {$ENDIF CPU32} end; { CAS } ? procedure TOmniMREW.EnterReadLock; var currentReference: int; begin //Wait on writer to reset write flag so Reference.Bit0 must be 0 than increase Reference repeat currentReference := omrewReference AND NOT 1; until CAS(omrewReference, currentReference,currentReference + 2); end; { TOmniMREW.EnterReadLock } procedure TOmniMREW.EnterWriteLock; var currentReference: int; begin //Wait on writer to reset write flag so omrewReference.Bit0 must be 0 then set omrewReference.Bit0 repeat currentReference := omrewReference AND NOT 1; until CAS(omrewReference, currentReference,currentReference + 1); //Now wait on all readers repeat until omrewReference = 1; end; { TOmniMREW.EnterWriteLock } procedure TOmniMREW.ExitReadLock; begin //Decrease omrewReference LockedExchangeAdd(omrewReference, -2); end; { TOmniMREW.ExitReadLock } procedure TOmniMREW.ExitWriteLock; begin omrewReference := 0; end; { TOmniMREW.ExitWriteLock } end. === This method is using the following technique: currentReference := omrewReference AND NOT 1; So if omrewReference is 0 or multiple of 2 and is an even number the reader can enter the MREW lock and increment omrewReference by -2 , or the writer will enter and increment omrewReference by 1 and this will stop the reader and the writer from crossing the CAS, and after that the writer will wait on the reader to exit by using a repeat loop like this: repeat until omrewReference = 1; and as you have noticed with this method the writers will not starve forever, and thie MREW lock is very fast also, but even if it's very fast it doesn't scale cause there is a single point of access on the CAS and this can cause a lot of contention , also the inter-thread communication can be expensive if the reader's time under the MREW lock is not so signifant.. and about Distributed Reader-Writer Mutex, based on the Dmitry Vyukov C++ Distributed Reader-Writer Mutex , I have included the following Reader-Writer Mutexes inside this Distributed Reader-Writer mutex: TOmniMREW a lightweight MREW that is very fast and TMultiReadExclusiveWrite from JCL and now both of them can scale better with Distributed Reader-Writer Mutex, and i have modified the Dmitry Vyukov Distributed Reader-Writer Mutex, in the first version i have not used GetCurrentProcessor() but i have used GetCurrentThreadID(), and i have also provided you with a second version that scales better, to be ableto use the second version please use the version2 in defines.inc, i have given you a test.pas example for the first version and test1.pas for the second version, but don't forget to use version2 inside defines.inc, to use the second version just uncomment the version2 inside defines.inc and comment version1. I have also done a cache line alignement in TOmniMREW, this has allowed Drwlock to scale better. I have provided you with the source code, please take a look at the source code to understand better.The Object Pascal Distributed Reader-Writer Mutex is based on the following C++ Distributed Reader-Writer Mutex by Dmitry Vyukov, read more here: http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-writer-mutex I have also modified the Dmitry Vyukov's Distributed Reader-Writer Mutex to use a variable number of MREWs, you can pass the number of MREWs to the constructor like this: drw:=TDRWLOCK.create(100); You have four methods: procedure wlock; // same as EnterWriteLock of TOmniMREW procedure wunlock; // same as ExitWriteLock procedure rlock; // same as EnterReadLock procedure runlock; // same as ExitReadLock and you have to pass the number of MREWs(multiple-readers-exclusive-writer) to the constructor like this: drw:=TDRWLOCK.create(200); // here we are creating 200 MEWs Here is some scalability numbers: I have used TOmniMREW of the Omnithread library and used only EnterReadLock() and ExitReadLock() with four threads on a quad cores and TOmniMREW gave a negative scalability of -5.51x And when i have used the second version of Distributed Reader-Writer Mutex using only rlock() and runlock() , it gave me +3.94x scalability with four threads on four cores. So now it's scaling. And about the second version , don't forget to initialize the number that you pass to rlock() and runlock() to 0 before calling rlock() and runlock() . In the previous versions i have aligned the array elements on cache line bounderies like have done Dmitry Vyukov, and it didn't work correctly when i have tested the second version, so i have thought about that and after that i have decided to not align the array elements on cache line bounderied but just add a cache line padding to TOmniMREW for example and this time it has worked perfectly and now the second version is scaling perfectly.. And if you have noticed , there is still a weakness with the Dmitry Vyukov C++ Distributed Reader-Writer Mutex, cause since he is using GetCurrentProcessorNumber() he is limiting the array of rwlocks to the number of avaiblable cores and this is not good i think , cause if you have many more threads than the avaiblable cores and there is high contention this will cause the performance to degrade, so i have decided to change that in my implementation and i have used a variable number of rwlocks/MREWs so that you can lower more the contention and this is better for scalability. .You can download Distributed Reader-Writer Mutex version 1.03 from: http://pages.videotron.com/aminer/ Thank you, Amine Moulay Ramdane.