Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.programming > #1602

Lockfree_mpmc version 1.12 ...

From "aminer" <aminer@videotron.ca>
Newsgroups comp.programming.threads, comp.programming
Subject Lockfree_mpmc version 1.12 ...
Date 2012-05-19 19:55 -0500
Organization A noiseless patient Spider
Message-ID <jp9bu4$plq$1@dont-email.me> (permalink)

Cross-posted to 2 groups.

Show all headers | View raw


Hello,


Look at: http://www.emadar.com/fpc/lockfree.htm


In the pop() side of this lockfree quueue we have:
-------------- 

function gFLQueue.pop(var tm:_R):boolean;
var newhead, lastHead : integer;

begin
repeat
 lastHead:=head;
  if tail<>head
   then
     begin
       pointer(newHead):=interlockedCompareExchange(pointer(head),pointer(lastHead
       +1),pointer(lasthead));
 if newHead=lastHead
  then
    begin
    tm:=getObject(lastHead);
    result:=true; exit;
end; end
else
begin
 result:=false;
 exit;
end;
until false;
end;
end.
---- 

If you have noticed, after he is incrementing head with an
interlockedCompareExchange(), he is doing a tm:=getObject(lastHead);
and this is an error i think, cause if the thread
is prempeted and another thread have put another value in the same
place as lashead, the value willl be invalid and corrupted.
I have corrected this problem in my lockfree_mpmc fifo queue version
1.12

So, becarefull with those lockfree algorithms.

And i have tried to enhance  my lockfree_mpmc and i have
used  an exponential backoff in the pop side of the algorithm and it
gives better performance under contention with  28.317 millions pop 
transations
per second.

The queue with a windows critical section on both the push and the pop
sides of the algorithm scored very bad under contention with
4 threads. Here is the benchmarks using the windows critical section:
Only 370692 push transactions per second
and only 503024 push transactions per second

Now the list of memory ordering guarantees in the Intel x86 memory model:
Loads are not reordered with other loads.
Stores are not reordered with other stores.
Stores are not reordered with older loads.
In a multiprocessor system, memory ordering obeys causality (memory
ordering respects transitive visibility).
In a multiprocessor system, stores to the same location have a total
order.

In a multiprocessor system, locked instructions have a total order.
Loads and stores are not reordered with locked instructions.
But Loads may be reordered with older stores to different locations
But what is important for us is MEMORY VISIBILITY across threads..

So, the only variables in lockfree_mpmc that are used across threads
are tail and head.. And since in a multiprocessor system, locked 
instructions
have a total order. So, the lock cmpxchg (inside the CAS) in the push() side
of lockfree_mpmc does in fact achieve sequential consistency and

LockedIncLong(temp) in the push side of lockfree_mpmc 1.12 also
achieve sequential consistency

And tail and head inside lockfree_mpmc.pas are properly aligned to avoid 
false sharing...

And of course i was speaking about sequential consistency in the x86 
architecture..
Hello,

Now how to verify that lockfree_mpmc is correct ?

You can for example reason like this

First , in the source code i am using a margin of
1000 threads, so you can not have more than 1000 threads
accessing the lockfree queue at the same time...

And here is the ObjectPascal source code of the push() side:

---
function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
var lasttail,newtemp:longword;
begin
if getlength >= fsize
then
begin
result:=false;
exit;
end;
result:=true;
newTemp:=LockedIncLong(temp);
lastTail:=newTemp-1;
setObject(lastTail,tm);
repeat
if CAS(tail,lasttail,newtemp)
then
begin
exit;
end;
asm pause end;
until false;
end;
----

So, let say the size of the bounded lockfree_mpmc queue is 1000 ,
and imagine that the threads are executing the "if getlength >= fsize "
all at the same time, and imagine that the getlength() returns 999, so, the
"if getlength >= fsize" will returns false , and since we have
fSize:=(1 shl aPower) - margin inside the constructor, with a margin of
1000 (margin must be >= to the number of threads) , there will be no 
overflow .
And here is the source code of the pop() side:

----
function TLockfree_MPMC.pop(var obj:tNodeQueue):boolean;
var lastHead : longword;
begin
repeat
lastHead:=head;
if tail<>head
then
begin
obj:=getObject(lastHead);
if CAS(head,lasthead,lasthead+1)
then
begin
result:=true;
exit;
end
else backoff2.delay;
end
else
begin
result:=false;
exit;
end;
until false;
end;
----

As you have noticed, there is a test like this:
if CAS(head,lasthead,lasthead+1) , and this test
avoids something like the ABA problem, cause if head
have changed , the CAS() will fail.


You  can download lockfree_mpmc version 1.12 from:

http://pages.videotron.com/aminer/



Amine Moulay Ramdane.





Back to comp.programming | Previous | Next | Find similar


Thread

Lockfree_mpmc version 1.12 ... "aminer" <aminer@videotron.ca> - 2012-05-19 19:55 -0500

csiph-web