


Re: About Lockfree_mpmc and scalability ...

From "aminer" <aminer@videotron.ca>
Newsgroups comp.programming.threads, comp.programming, comp.arch
Subject Re: About Lockfree_mpmc and scalability ...
Date 2012-05-29 15:06 -0500
Organization A noiseless patient Spider
Message-ID <jq36ot$kqo$1@dont-email.me>
References <jq0kn9$oe3$1@dont-email.me> <jq36i1$jea$1@dont-email.me>

Cross-posted to 3 groups.



Hello,

Of course, I was speaking about the x86 architecture...


Amine Moulay Ramdane.

"aminer" <aminer@videotron.ca> wrote in message 
news:jq36i1$jea$1@dont-email.me...
>
> Hello,
>
>
> I have received benchmarks from some people whose computers have an
> L3 cache, and I have noticed that lockfree_mpmc doesn't scale on them
> either. Do you know why this lock-free FIFO doesn't scale? Look at the
> following code on the push() side:
>
> ---
>
> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
> var
>   lasttail, newtemp : longword;
> begin
>   // Reject the push if the queue already looks full.
>   if getlength >= fsize then
>   begin
>     result := false;
>     exit;
>   end;
>   result := true;
>
>   // [1] Reserve a slot with a locked increment of the shared temp.
>   newTemp := LockedIncLong(temp);
>   lastTail := newTemp - 1;
>   setObject(lastTail, tm);
>
>   // [2] Publish the slot: spin until tail equals lastTail, then
>   // advance it to newTemp with a CAS.
>   repeat
>     if CAS(tail, lasttail, newtemp) then
>       exit;
>     asm pause end;
>   until false;
> end;
>
> ---
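For anyone who wants to experiment outside Delphi, here is a minimal C11 sketch of the same push() algorithm. It assumes that LockedIncLong returns the incremented value (as `newTemp - 1` implies) and that CAS(target, comparand, new) is an ordinary compare-and-swap. It also assumes a fixed power-of-two slot array and a `head` index for consumers. The names `mpmc_queue`, `mpmc_push`, `QSIZE` and `head` are hypothetical and do not come from the lockfree_mpmc source. The sketch shows the two shared-variable operations numbered [1] and [2]: a ticket taken with an atomic fetch-and-add, followed by an in-order publication that spins until the predecessor has advanced tail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()   /* the PAUSE instruction, like "asm pause end" */
#else
#define CPU_RELAX() ((void)0)     /* no-op fallback on non-x86 targets */
#endif

#define QSIZE 8u  /* hypothetical capacity (fsize) */
_Static_assert((QSIZE & (QSIZE - 1u)) == 0, "QSIZE must be a power of two");

/* Hypothetical layout; only temp and tail correspond to names in the post. */
typedef struct {
    atomic_uint temp;   /* producers' reservation counter ([1]) */
    atomic_uint tail;   /* end of the published region ([2]) */
    atomic_uint head;   /* consumers' index, only read here */
    void *slots[QSIZE];
} mpmc_queue;

static bool mpmc_push(mpmc_queue *q, void *item)
{
    assert(q != NULL);

    /* Stand-in for "getlength >= fsize" (assumed tail - head); racy, as in the original. */
    if (atomic_load(&q->tail) - atomic_load(&q->head) >= QSIZE)
        return false;

    /* [1] Reserve a slot: every producer does a locked RMW on temp. */
    unsigned newtemp = atomic_fetch_add(&q->temp, 1u) + 1u;
    unsigned lasttail = newtemp - 1u;
    q->slots[lasttail % QSIZE] = item;   /* like setObject(lastTail, tm) */

    /* [2] Publish in reservation order: tail moves lasttail -> newtemp only
       after every earlier producer has published its own slot. */
    for (;;) {
        unsigned expected = lasttail;
        if (atomic_compare_exchange_weak(&q->tail, &expected, newtemp))
            return true;
        CPU_RELAX();
    }
}
```

Every producer performs a locked read-modify-write on both temp and tail. Each push therefore needs both cache lines in exclusive state, and with several cores pushing, those lines migrate between caches on almost every call.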
>
>
> There are two operations on shared variables:
>
> [1] newTemp:=LockedIncLong(temp);
>
> [2] CAS(tail,lasttail,newtemp)
>
> In the 4-thread scenario, as you can see, in [1] temp has to be loaded
> from the L3 cache shared with the other cores on computers that have an
> L3 cache, but from memory on my Intel Core 2 Quad Q6600, which doesn't
> have an L3 cache (just one L2 cache for every two cores). That makes the
> four-thread test a little bit slower than the single-thread version on a
> computer with an L3 cache, and much slower on one without an L3 cache,
> compared with the single-thread version, which loads the values from the
> L1 cache. The same is true for [2]: tail has to be loaded the same way.
>
> That's why I am getting retrograde throughput on my Intel Core 2 Quad
> Q6600, and almost the same throughput as the single-thread test on a
> computer with an L3 cache.
>
> In the two-thread scenario, [1] and [2] each have to load from the local
> (shared) L2 cache, and these loads make the serial part S of Amdahl's
> equation much bigger than the parallel part P; that's why the two-thread
> version doesn't scale either.
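For reference, Amdahl's law with serial fraction S and parallel fraction P = 1 - S gives a speedup on N threads of 1 / (S + P/N). A minimal C sketch follows; the function name is hypothetical, and any S values plugged into it are illustrative, not measurements. For a fixed S the result never drops below 1, because S + P/N <= S + P = 1. Amdahl's law alone therefore cannot produce retrograde throughput. That requires the cost of each push to grow with the number of threads, which is what the cache-line transfers described above do.

```c
#include <assert.h>

/* Amdahl's law: speedup on n threads when a fraction s of each operation is
   serial and the remaining p = 1 - s parallelizes perfectly. */
static double amdahl_speedup(double s, unsigned n)
{
    assert(s >= 0.0 && s <= 1.0 && n > 0);
    double p = 1.0 - s;
    return 1.0 / (s + p / (double)n);
}
```

For example, with an illustrative S = 0.9, four threads give 1 / (0.9 + 0.025), about 1.08.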
>
> So, in general, I think it's not possible to make lock-free FIFO queues
> scale when the lock-free code shares variables between the cores,
> because sharing variables is so expensive...
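One related layout detail is an assumption, not something stated in the post. If temp and tail sit in the same cache line, the increment in [1] and the CAS in [2] also contend with each other. The C11 sketch below gives each index its own line; the struct name and the 64-byte line size are hypothetical. This only removes that false sharing. It cannot remove the true sharing described above, since every producer still has to write both temp and tail.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Hypothetical layout: each shared index starts on its own cache line. */
typedef struct {
    alignas(CACHE_LINE) atomic_uint temp;  /* written by every producer in [1] */
    alignas(CACHE_LINE) atomic_uint tail;  /* written by every producer in [2] */
    alignas(CACHE_LINE) atomic_uint head;  /* written by consumers */
} padded_indices;

static_assert(offsetof(padded_indices, tail) - offsetof(padded_indices, temp) >= CACHE_LINE,
              "temp and tail must not share a cache line");
static_assert(offsetof(padded_indices, head) - offsetof(padded_indices, tail) >= CACHE_LINE,
              "tail and head must not share a cache line");
```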
>
>
> Thank you.
>
> Amine Moulay Ramdane.
>
>
> "aminer" <aminer@videotron.ca> wrote in message 
> news:jq0kn9$oe3$1@dont-email.me...
>>
>> Hello all,
>>
>>
>> I have finally found why lockfree_mpmc doesn't scale...
>>
>> You can get the source code of lockfree_mpmc from:
>>
>> http://pages.videotron.com/aminer/
>>
>> So please follow along with me...
>>
>> If you take a look at the lockfree_mpmc Object Pascal
>> source code, you will see this on the push side:
>>
>>
>> ---
>>
>> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
>> var
>>   lasttail, newtemp : longword;
>> begin
>>   // Reject the push if the queue already looks full.
>>   if getlength >= fsize then
>>   begin
>>     result := false;
>>     exit;
>>   end;
>>   result := true;
>>
>>   // Reserve a slot with a locked increment of temp, then fill it.
>>   newTemp := LockedIncLong(temp);
>>   lastTail := newTemp - 1;
>>   setObject(lastTail, tm);
>>
>>   // Publish the slot: wait until tail reaches lastTail, then advance it.
>>   repeat
>>     if CAS(tail, lasttail, newtemp) then
>>       exit;
>>     asm pause end;
>>   until false;
>> end;
>>
>> ---
>>
>> When I tested the push() side with 4 threads, I noticed that
>> lockfree_mpmc doesn't scale at all; in fact I got retrograde
>> throughput, which means that I got less throughput than in a
>> single-thread test. And I have finally found why lockfree_mpmc doesn't
>> scale. When you use lockfree_mpmc in a single-thread test, the CAS reads
>> and updates the variables in the level 1 cache, and that's fast; but
>> when you use 4 threads it gets too slow, because we are reading and
>> updating them from the L2 cache and from memory.
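One rough way to observe this effect is sketched below in C11 with POSIX threads; the names `run_increments` and `hammer` are hypothetical. Several threads perform locked increments on one shared counter, like LockedIncLong(temp). Comparing the elapsed time of one thread doing all the work against several threads splitting it gives a crude view of the cost of moving one cache line between cores. The timings depend entirely on the machine, so none are quoted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_THREADS 8

/* A single shared counter: every increment needs its cache line exclusively. */
static atomic_ulong shared_counter;

/* Thread body: iters locked increments, like LockedIncLong(temp). */
static void *hammer(void *arg)
{
    unsigned long iters = *(const unsigned long *)arg;
    for (unsigned long i = 0; i < iters; i++)
        atomic_fetch_add(&shared_counter, 1ul);
    return NULL;
}

/* Runs nthreads threads, each doing iters increments; returns elapsed seconds. */
static double run_increments(unsigned nthreads, unsigned long iters)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    struct timespec t0, t1;

    atomic_store(&shared_counter, 0ul);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, hammer, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (unsigned i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For example, compare `run_increments(1, 4000000ul)` with `run_increments(4, 1000000ul)`; both perform the same total number of increments.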
>>
>> I have tried playing with the affinity mask, and I have found that when
>> I use two threads in my tests, on different cores that read and update
>> through the same level 2 cache, it scales a little bit: I got more
>> throughput with two threads on different cores sharing the same level 2
>> cache than in the single-thread test.
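On Linux, the affinity experiment can be reproduced with something like the sketch below. It assumes glibc, since `pthread_setaffinity_np`, `sched_getaffinity` and the `CPU_*` macros are GNU extensions. On Windows, which the Delphi code suggests, the corresponding call is SetThreadAffinityMask. Which CPU numbers share an L2 cache is machine-specific; on Linux, the `cache/index*/shared_cpu_list` files under `/sys/devices/system/cpu/cpuN/` report it. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Returns the lowest-numbered CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Pins the calling thread to a single CPU; returns 0 or a pthread error code. */
static int pin_self_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}
```

In a two-thread test, each worker would call pin_self_to_cpu() with a different CPU number. To share an L2 cache, pick both numbers from the same shared_cpu_list; to avoid sharing it, pick them from different lists.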
>>
>>
>> I have also modified lockfree_mpmc so that it doesn't execute the CAS
>> (and so doesn't request the cache line for writing) when tail and
>> lasttail are not equal, by using the following code inside the
>> repeat..until loop:
>>
>>   if tail <> lasttail then
>>   begin
>>     continue;
>>   end;
>>
>> and it does give better performance with this method.
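This change is the test-and-test-and-set idea: spin on a plain load and attempt the locked CAS only when it can succeed. A C11 sketch of the publication loop under that scheme follows; the function name is hypothetical. It uses a relaxed load for the spin and a sequentially consistent CAS. The sketch deviates from the posted code in one respect. The Pascal loop's `continue` skips `asm pause end`, whereas this sketch also executes PAUSE during the read-only spin.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()   /* x86 PAUSE, as in "asm pause end" */
#else
#define CPU_RELAX() ((void)0)     /* no-op fallback on non-x86 targets */
#endif

/* Waits until *tail == lasttail, then advances it to newtemp. */
static void publish_in_order(atomic_uint *tail, unsigned lasttail, unsigned newtemp)
{
    assert(newtemp == lasttail + 1u);
    for (;;) {
        /* Test: a plain load. While tail is unchanged, the line stays in this
           core's cache in shared state and the spin causes no coherence traffic. */
        if (atomic_load_explicit(tail, memory_order_relaxed) != lasttail) {
            CPU_RELAX();
            continue;
        }
        /* Test-and-set: only now issue the locked CAS, which needs the line
           in exclusive state. Only this producer can move tail away from
           lasttail, so a failure here is a spurious failure of the weak CAS. */
        unsigned expected = lasttail;
        if (atomic_compare_exchange_weak(tail, &expected, newtemp))
            return;
        CPU_RELAX();
    }
}
```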
>>
>> Here is the final code of the push() side of lockfree_mpmc.
>>
>> I think I will modify the pop() side the same way...
>>
>>
>> ---
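>> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
>> var
>>   lasttail, newtemp : longword;
>> begin
>>   // Reject the push if the queue already looks full.
>>   if getlength >= fsize then
>>   begin
>>     result := false;
>>     exit;
>>   end;
>>   result := true;
>>
>>   // Reserve a slot with a locked increment of temp, then fill it.
>>   newTemp := LockedIncLong(temp);
>>   lastTail := newTemp - 1;
>>   setObject(lastTail, tm);
>>
>>   repeat
>>     // Spin on a plain read first, so the CAS (which needs the cache
>>     // line exclusively) is attempted only when it can succeed.
>>     if tail <> lasttail then
>>       continue;
>>
>>     // Publish the slot by advancing tail from lasttail to newtemp.
>>     if CAS(tail, lasttail, newtemp) then
>>       exit;
>>     asm pause end;
>>   until false;
>> end;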
>> ---
>>
>> But as I said before, lockfree_mpmc doesn't scale when we use different
>> cores that do NOT share the same cache. That means that on my Intel
>> Core 2 Quad Q6600 it scales only when we use 2 threads on different
>> cores that share the same level 2 cache.
>>
>>
>>
>> Thank you.
>>
>>
>> Amine Moulay Ramdane.
>>
>
> 



