Re: About Lockfree_mpmc and scalability ...

From "aminer" <aminer@videotron.ca>
Newsgroups comp.programming.threads, comp.programming, comp.arch
Subject Re: About Lockfree_mpmc and scalability ...
Date 2012-05-28 16:39 -0500
Organization A noiseless patient Spider
Message-ID <jq0nr3$cfu$1@dont-email.me>
References <jq0kn9$oe3$1@dont-email.me>

Cross-posted to 3 groups.


Hello,


I have only tested lockfree_mpmc on an Intel Core 2 Quad Q6600,
which has no L3 cache, but perhaps lockfree_mpmc
will scale on an x86 that has an L3 cache.


I haven't tested it with an L3 cache. Could you please do it for me
if your x86 computer has an L3 cache and four or more cores?



Just download the Lockfree MPMC and SPMC fifo queues version 1.12
from http://pages.videotron.com/aminer/ and look inside the zip
file: I have put two tests there, push.pas and pop.pas. Open, for example,
the push.pas test and run it with a single thread and then with 4 threads,
by giving the variable a the value 1 and then 4. After that, just
email me the throughput for 1 and 4 threads.
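
To give an idea of what such a test measures, here is a minimal sketch.
It is NOT the push.pas from the zip file. It assumes Free Pascal 3.0 or
later, and the names TWorker, Run, OpsPerThread and shared are made up
for illustration. Each thread does atomic increments on one shared
counter, which only stands in for the contended update inside push(),
and the program prints the total throughput for 1 and 4 threads.

```pascal
program throughput_sketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SysUtils;

const
  OpsPerThread = 10000000; // hypothetical workload size per thread

var
  shared: longint = 0; // stands in for the queue's shared counter

type
  // Hypothetical worker: hammers the shared counter with atomic increments.
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  k: integer;
begin
  for k := 1 to OpsPerThread do
    InterlockedIncrement(shared); // one contended atomic operation per "push"
end;

// Runs threadCount workers to completion and returns operations per second.
// The measured time includes thread start-up, so keep OpsPerThread large.
function Run(threadCount: integer): double;
var
  workers: array of TWorker;
  i: integer;
  t0, elapsed: QWord;
begin
  shared := 0;
  SetLength(workers, threadCount);
  t0 := GetTickCount64;
  for i := 0 to threadCount - 1 do
    workers[i] := TWorker.Create(false); // false = start running immediately
  for i := 0 to threadCount - 1 do
  begin
    workers[i].WaitFor;
    workers[i].Free;
  end;
  elapsed := GetTickCount64 - t0;
  if elapsed = 0 then
    elapsed := 1; // avoid dividing by zero on very fast runs
  Result := (threadCount * OpsPerThread) / (elapsed / 1000.0);
end;

begin
  WriteLn('1 thread : ', Run(1):0:0, ' ops/s');
  WriteLn('4 threads: ', Run(4):0:0, ' ops/s');
end.
```

If the 4 threads figure comes out lower than the 1 thread figure, that
is the same kind of retrograde throughput as described in the quoted
message below.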



Thank you.

Amine Moulay Ramdane.



"aminer" <aminer@videotron.ca> wrote in message 
news:jq0kn9$oe3$1@dont-email.me...
>
> Hello all,
>
>
> I have finally found why lockfree_mpmc doesn't scale...
>
> You can get the source code of lockfree_mpmc from:
>
> http://pages.videotron.com/aminer/
>
> So please follow along with me.
>
> If you take a look at the lockfree_mpmc Object Pascal
> source code, you will read this on the push side:
>
>
> ---
>
> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
> var lasttail,newtemp:longword;
> begin
>   // refuse the push when the queue is already full
>   if getlength >= fsize then
>   begin
>     result:=false;
>     exit;
>   end;
>
>   result:=true;
>
>   // reserve a slot: temp is incremented atomically
>   newTemp:=LockedIncLong(temp);
>   lastTail:=newTemp-1;
>
>   setObject(lastTail,tm);
>
>   // publish the slot: the CAS succeeds only once tail equals lasttail
>   repeat
>     if CAS(tail,lasttail,newtemp) then exit;
>     asm pause end;
>   until false;
> end;
>
> ---
>
> When I tested the push() side with 4 threads, I noticed that
> lockfree_mpmc doesn't scale at all. In fact I got retrograde
> throughput, meaning less throughput than in the single-thread
> test, and I have finally found why. When you use lockfree_mpmc
> in a single-thread test, the CAS reads and updates the variables in
> the level 1 cache, and it's fast. But with 4 threads it gets too
> slow, because we are reading and updating from the L2 cache and
> from memory.
>
> I have tried playing with the affinity mask, and I found that when
> two threads run on different cores but read and update through the
> same level 2 cache, it scales a little better: I got more throughput
> with those two threads than in the single-thread test.
>
>
> I have also modified lockfree_mpmc so that it does not execute the
> CAS, and so does not request the cache line for writing, while tail
> and lasttail are not equal, by using the following code inside
> the repeat until loop:
>
> if tail <> lasttail then
> begin
>   continue;
> end;
>
> This method gives better performance.
>
> Here is the final code of the push() side of lockfree_mpmc.
>
> I think I will modify the pop() side the same way.
>
>
> ---
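> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
> var lasttail,newtemp:longword;
> begin
>   // refuse the push when the queue is already full
>   if getlength >= fsize then
>   begin
>     result:=false;
>     exit;
>   end;
>
>   result:=true;
>
>   // reserve a slot: temp is incremented atomically
>   newTemp:=LockedIncLong(temp);
>   lastTail:=newTemp-1;
>
>   setObject(lastTail,tm);
>
>   repeat
>     // plain read first: skip the CAS while it cannot succeed
>     // (note that continue also skips the pause below)
>     if tail <> lasttail then
>     begin
>       continue;
>     end;
>
>     // publish the slot: tail was equal to lasttail a moment ago
>     if CAS(tail,lasttail,newtemp) then exit;
>     asm pause end;
>   until false;
> end;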
> ---
>
> But as I said before, lockfree_mpmc doesn't scale when we are
> using different cores that DO NOT share the same cache. That
> means that on my Intel Core 2 Quad Q6600 it scales only
> when we are using 2 threads on different cores that share the same
> level 2 cache.
>
>
>
> Thank you.
>
>
> Amine Moulay Ramdane.
> 
