| From | "aminer" <aminer@videotron.ca> |
|---|---|
| Newsgroups | comp.programming.threads, comp.programming, comp.arch |
| Subject | Re: About Lockfree_mpmc and scalability ... |
| Date | 2012-05-29 15:06 -0500 |
| Organization | A noiseless patient Spider |
| Message-ID | <jq36ot$kqo$1@dont-email.me> |
| References | <jq0kn9$oe3$1@dont-email.me> <jq36i1$jea$1@dont-email.me> |
Cross-posted to 3 groups.
Hello,

Of course I was speaking about the x86 architecture...

Amine Moulay Ramdane.

"aminer" <aminer@videotron.ca> wrote in message
news:jq36i1$jea$1@dont-email.me...
>
> Hello,
>
> I have received benchmarks from some people who have an L3 cache,
> and I have noticed that lockfree_mpmc doesn't scale with an L3 cache
> either. Do you know why this lock-free FIFO doesn't scale? Look at
> the following code on the push() side:
>
> ---
>
> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
> var lasttail,newtemp:longword;
>     i,j:integer;
> begin
>   // Refuse the push when the queue is full.
>   if getlength >= fsize
>   then
>     begin
>       result:=false;
>       exit;
>     end;
>   result:=true;
>
>   // [1] Atomically take a ticket; newTemp-1 is this thread's slot.
>   newTemp:=LockedIncLong(temp);
>   lastTail:=newTemp-1;
>   setObject(lastTail,tm);
>
>   // [2] Spin until tail reaches our slot, then advance it.
>   repeat
>     if CAS(tail,lasttail,newtemp)
>     then
>       begin
>         exit;
>       end;
>     asm pause end;
>   until false;
> end;
>
> ---
>
> You have two things:
>
> [1] newTemp:=LockedIncLong(temp);
>
> [2] CAS(tail,lasttail,newtemp)
>
> In the 4-thread scenario, as you can see in [1], temp has to be
> loaded from the L3 cache shared with the other cores on computers
> that have an L3 cache, but on my Intel Core 2 Quad Q6600, which
> doesn't have an L3 cache (just an L2 cache for every two cores), it
> also has to be loaded from memory. That makes the four-thread test
> a little bit slower than the single-thread version with an L3 cache,
> and much slower without an L3 cache, since the single-thread version
> loads the values from the L1 cache. It's the same for [2]: tail has
> to be loaded the same way.
>
> That's why I am getting a retrograde throughput on my Intel Core 2
> Quad Q6600, and almost the same throughput as the single thread on a
> computer with an L3 cache.
>
> In the two-thread scenario, you have to do a load from the local L2
> cache in [1] and [2], and these loads make the S part of the Amdahl
> equation much bigger than the P part; that's why the two-thread
> version doesn't scale either.
>
> So in general I think it's not possible to make lock-free FIFO
> queues scale when the lock-free code shares variables between the
> cores, because sharing variables is so expensive.
>
> Thank you.
>
> Amine Moulay Ramdane.
>
>
> "aminer" <aminer@videotron.ca> wrote in message
> news:jq0kn9$oe3$1@dont-email.me...
>>
>> Hello all,
>>
>> I have finally found why lockfree_mpmc doesn't scale...
>>
>> You can get the source code of lockfree_mpmc from:
>>
>> http://pages.videotron.com/aminer/
>>
>> So please follow along with me.
>>
>> If you take a look at the lockfree_mpmc Object Pascal source code,
>> you will read this on the push side:
>>
>> ---
>>
>> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
>> var lasttail,newtemp:longword;
>>     i,j:integer;
>> begin
>>   if getlength >= fsize
>>   then
>>     begin
>>       result:=false;
>>       exit;
>>     end;
>>
>>   result:=true;
>>
>>   newTemp:=LockedIncLong(temp);
>>   lastTail:=newTemp-1;
>>
>>   setObject(lastTail,tm);
>>
>>   repeat
>>     if CAS(tail,lasttail,newtemp)
>>     then
>>       begin
>>         exit;
>>       end;
>>     asm pause end;
>>   until false;
>> end;
>>
>> ---
>>
>> When I tested the push() side with 4 threads, I noticed that
>> lockfree_mpmc doesn't scale at all. In fact I got a retrograde
>> throughput, meaning that I got less throughput than on a
>> single-thread test, and I have finally found why. When you use
>> lockfree_mpmc in a single-thread test, the CAS reads and updates
>> the variables in the level 1 cache, and it's fast; but when you use
>> 4 threads it gets too slow, because we are reading and updating
>> from the L2 cache and from memory.
>>
>> I have tried playing with the affinity mask, and I have found that
>> when I use two threads in my tests, reading and updating from the
>> same level 2 cache, it does scale a little bit more: I got more
>> throughput with two threads on different cores sharing the same
>> level 2 cache than in the single-thread test.
>>
>> I have also modified lockfree_mpmc to skip the CAS, and the cache
>> traffic it causes, when tail and lasttail are not equal, by using
>> the following code inside the repeat-until loop:
>>
>>   if tail <> lasttail
>>   then
>>     begin
>>       continue;
>>     end;
>>
>> and it gives better performance with this method.
>>
>> Here is the final code of the push() side of lockfree_mpmc.
>> I think I will modify the pop() side like that...
>>
>> ---
>>
>> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
>> var lasttail,newtemp:longword;
>>     i,j:integer;
>> begin
>>   if getlength >= fsize
>>   then
>>     begin
>>       result:=false;
>>       exit;
>>     end;
>>
>>   result:=true;
>>
>>   newTemp:=LockedIncLong(temp);
>>   lastTail:=newTemp-1;
>>
>>   setObject(lastTail,tm);
>>
>>   repeat
>>     // Plain read first: only attempt the CAS once tail has
>>     // reached our slot.
>>     if tail <> lasttail
>>     then
>>       begin
>>         continue;
>>       end;
>>
>>     if CAS(tail,lasttail,newtemp)
>>     then
>>       begin
>>         exit;
>>       end;
>>     asm pause end;
>>   until false;
>> end;
>>
>> ---
>>
>> But as I have said before, lockfree_mpmc doesn't scale when we are
>> using different cores that DO NOT share the same cache. That means
>> that on my Intel Core 2 Quad Q6600 it scales only when we are using
>> 2 threads on different cores that share the same level 2 cache.
>>
>> Thank you.
>>
>> Amine Moulay Ramdane.
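
As a rough illustration of the Amdahl argument made in the quoted message, take $S$ as the serial fraction of each push (the contended `LockedIncLong` and `CAS`) and $P = 1 - S$ as the parallel fraction. The speedup on $N$ threads is then bounded by:

$$\mathrm{speedup}(N) = \frac{1}{S + \dfrac{P}{N}}$$

The numbers below are hypothetical and were not measured on lockfree_mpmc. Assuming $S = 0.8$ and $N = 4$:

$$\mathrm{speedup}(4) = \frac{1}{0.8 + \dfrac{0.2}{4}} = \frac{1}{0.85} \approx 1.18$$

Under that assumption, four threads would gain at most about 18% over one. Note that this formula alone never gives a speedup below 1, so the retrograde throughput reported on the Q6600 would need an extra cost that grows with the number of threads. That is consistent with the post's explanation that the shared variables have to be fetched from another cache or from memory instead of the local L1 cache.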