Re: More information about the benchmarks...

From	aminer <aminer@toto.net>
Newsgroups	comp.programming.threads, comp.programming
Subject	Re: More information about the benchmarks...
Date	2014-03-18 17:08 -0700
Organization	albasani.net
Message-ID	<lgackn$ne5$1@news.albasani.net>
References	<lgac4c$ma1$1@news.albasani.net>

Cross-posted to 2 groups.

On 3/18/2014 4:59 PM, aminer wrote:
>
> Hello,
>
>
>
> Question:
>
>
> Amine, even if you don't have a computer with 32 cores,
> you have done some calculations and projected how well
> all your RWLock variants will scale to 32 cores. How were you
> able to do this?
>
>
> Answer:
>
> First, I said that I made the read and write sections a little
> bigger, that the execution time inside the read and write sections was
> equal, that I took an average number of cache-line transfers, and that
> the code inside the reader section was scalable. If you look
> at the source code of my RWLockX you will notice that I am using the
> following code inside the WLock() method:
>
> ---
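> repeat
>    event2.setEvent;      // signal the event object on the reader side
>    asm pause end;        // x86 spin-wait hint
>    event2.resetEvent;
> until nbr^.nbr=0;
>
> // spin until every FCount1 slot reads zero
> for i:=0 to GetSystemThreadCount-1 do
>   begin
>     while (FCount1^[i].fcount1<>0) do
>       begin
>         // busy-wait, nothing to do
>       end;
>   end;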
>
> ---
>
> First, if you are using 32 threads on 32 cores (on an x86), the
> following code will transfer a number of cache lines between cores,
> because it signals the portable event object on the reader
> side. This will generate, in the worst case, 32 cache-line
> transfers; on average we will take, for example, 16 cache-line
> transfers. Here is the code:
>
> repeat
>    event2.setEvent;
>    asm pause end;
>    event2.resetEvent;
> until nbr^.nbr=0;
>
> The same holds for the following code, which also generates cache-line
> transfers between cores:
>
> for i:=0 to GetSystemThreadCount-1 do
>   begin
>     while (FCount1^[i].fcount1<>0) do
>       begin
>       end;
>   end;
>
>
> So, to be able to project the scalability on 32 cores even if we
> don't have a 32-core computer, we must take, for example, a scenario
> with 0.1% of writers. Let us say the writers will enter the WLock()
> method 2000 times, so the readers will enter RLock() 2000000
> times. Now, to be able to project with Amdahl's law, we have to
> calculate how much time the serial part will take. First we must
> benchmark the time taken by the serial part of the writers
> without readers, and we must benchmark how much time the
> cache-line transfers take; this part is easy to do with a small program.
> After that we will add the two previous serial parts, and this will give
> us the overall serial part of the Amdahl equation, which we will call
> serial part S. After that we will calculate how much time it will take for
> the readers to enter the reader sections 20000000 times; let us call it


I mean 2000000 times.


> Parallel part P. After that we will plug all the results into the
> Amdahl equation, and this will give us the projected
> scalability of my RWLocks on an x86 computer with 32 cores.
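
The projection described above can be written out as a short sketch. It uses S and P as defined in the text, both as measured times. The symbols S_w (writer serial time without readers), S_c (time spent on cache-line transfers), k (average number of cache-line transfers per WLock() call, where the text takes 16 as an example average for one loop), t_c (measured cost of one transfer) and N (number of cores) are names introduced here for illustration only. The sketch also assumes that S stays fixed as N changes, which is a simplification.

```latex
\begin{aligned}
S &= S_w + S_c, \qquad S_c \approx 2000 \cdot k \cdot t_c \\
T(N) &= S + \frac{P}{N} \\
\mathrm{speedup}(N) &= \frac{S + P}{S + P/N}, \qquad N = 32 \\
\lim_{N \to \infty} \mathrm{speedup}(N) &= \frac{S + P}{S}
\end{aligned}
```

The last line is the usual Amdahl bound: however many cores are added, the projected speedup cannot exceed (S + P) / S, so the measured serial part dominates the result.
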
>
>
> I hope this is clear now.
>
>
> Thank you,
> Amine Moulay Ramdane.
>
