| From | "aminer" <aminer@videotron.ca> |
|---|---|
| Newsgroups | comp.programming.threads, comp.programming, comp.arch |
| Subject | Re: About Lockfree_mpmc and scalability ... |
| Date | 2012-05-28 16:39 -0500 |
| Organization | A noiseless patient Spider |
| Message-ID | <jq0nr3$cfu$1@dont-email.me> |
| References | <jq0kn9$oe3$1@dont-email.me> |
Cross-posted to 3 groups.
Hello,

I have only tested lockfree_mpmc on an Intel Core 2 Quad Q6600, which has no L3 cache, but perhaps lockfree_mpmc will scale on an x86 that has an L3 cache. I have not tested it on such a machine. Could you please do it for me if you have an x86 computer with an L3 cache and four or more cores?

Just download the Lockfree MPMC and SPMC fifo queues version 1.12 from

http://pages.videotron.com/aminer/

and look inside the zip file, where I have put two tests, push.pas and pop.pas. Open, for example, the push.pas test and run it with a single thread and then with 4 threads, by giving the variable a the value 1 and then 4. After that, just email me the throughput for 1 and 4 threads.

Thank you.

Amine Moulay Ramdane.

"aminer" <aminer@videotron.ca> wrote in message news:jq0kn9$oe3$1@dont-email.me...
> Hello all,
>
> I have finally found why lockfree_mpmc doesn't scale...
>
> You can get the source code of lockfree_mpmc from:
>
> http://pages.videotron.com/aminer/
>
> So please follow along with me.
>
> If you take a look at the lockfree_mpmc Object Pascal
> source code, you will read this on the push side:
>
> ---
>
> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
> var lasttail,newtemp:longword;
>     i,j:integer;
> begin
>
>   if getlength >= fsize
>   then
>   begin
>     result:=false;
>     exit;
>   end;
>
>   result:=true;
>
>   newTemp:=LockedIncLong(temp);
>   lastTail:=newTemp-1;
>
>   setObject(lastTail,tm);
>
>   repeat
>     if CAS(tail,lasttail,newtemp)
>     then
>     begin
>       exit;
>     end;
>     asm pause end;
>   until false;
> end;
>
> ---
>
> When I tested the push() side with 4 threads, I noticed that
> lockfree_mpmc doesn't scale at all. In fact I got a retrograde
> throughput, meaning that I got less throughput than on the single
> thread test, and I have finally found why lockfree_mpmc doesn't scale.
>
> When you are using lockfree_mpmc on a single thread test, the CAS
> reads and updates the variables in the level 1 cache, and it's fast.
> But when you are using 4 threads it gets too slow, because we are
> reading and updating from the L2 cache and from memory.
>
> I have tried to play with the affinity mask, and I have found that
> when I use two threads in my tests, on different cores that read and
> update through the same level 2 cache, it does scale a little: I got
> more throughput with those two threads than on the single thread test.
>
> I have also modified lockfree_mpmc so that it does not execute the CAS,
> and so does not write to the cache line, when tail and lasttail are
> not equal, by using the following code inside the repeat until loop:
>
>   if tail <> lasttail
>   then
>   begin
>     continue;
>   end;
>
> and it does give better performance with this method.
>
> Here is the final code of the push() side of lockfree_mpmc.
>
> I think I will modify the pop() side like that...
>
> ---
>
> function TLockfree_MPMC.push(tm : tNodeQueue):boolean;
> var lasttail,newtemp:longword;
>     i,j:integer;
> begin
>
>   if getlength >= fsize
>   then
>   begin
>     result:=false;
>     exit;
>   end;
>
>   result:=true;
>
>   newTemp:=LockedIncLong(temp);
>   lastTail:=newTemp-1;
>
>   setObject(lastTail,tm);
>
>   repeat
>
>     if tail <> lasttail
>     then
>     begin
>       continue;
>     end;
>
>     if CAS(tail,lasttail,newtemp)
>     then
>     begin
>       exit;
>     end;
>     asm pause end;
>   until false;
> end;
>
> ---
>
> But as I have said before, lockfree_mpmc doesn't scale when we are
> using different cores and WE ARE NOT sharing the same cache.
> That means that on my Intel Core 2 Quad Q6600 it scales only
> when we are using 2 threads on different cores that share the same
> level 2 cache.
>
> Thank you.
>
> Amine Moulay Ramdane.
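
For anyone who wants to try the 1-thread versus 4-thread comparison without the full queue, the read-before-CAS loop from the quoted push() can be sketched as a small standalone Free Pascal program. This is only a sketch under stated assumptions, not the lockfree_mpmc source: InterlockedIncrement and InterlockedCompareExchange from the FPC System unit stand in for LockedIncLong and CAS, ThreadSwitch stands in for the pause instruction, the counters are longint rather than longword, the setObject() store and the getlength check are left out, and the names TPusher, RunTest and OpsPerThread are made up for illustration.

```pascal
program ttas_sketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SysUtils;

const
  OpsPerThread = 1000000;  // hypothetical workload size per thread

var
  Ticket: longint = 0;  // plays the role of "temp" in the quoted code
  Tail: longint = 0;    // plays the role of "tail" in the quoted code

type
  TPusher = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TPusher.Execute;
var
  n, newTemp, lastTail: longint;
begin
  for n := 1 to OpsPerThread do
  begin
    // Take a ticket, as push() does with LockedIncLong(temp).
    newTemp := InterlockedIncrement(Ticket);
    lastTail := newTemp - 1;
    repeat
      // Plain read first: attempt the CAS only when it can succeed,
      // so waiting threads do not keep writing to the cache line.
      if Tail = lastTail then
        // FPC argument order is (target, new value, comparand).
        if InterlockedCompareExchange(Tail, newTemp, lastTail) = lastTail then
          break;
      ThreadSwitch;  // yield instead of "asm pause end" (assumption)
    until false;
  end;
end;

// Runs threadCount pushers and returns operations per second.
function RunTest(threadCount: integer): double;
var
  threads: array of TPusher;
  k: integer;
  t0, elapsed: QWord;
begin
  Ticket := 0;
  Tail := 0;
  SetLength(threads, threadCount);
  t0 := GetTickCount64;
  for k := 0 to threadCount - 1 do
    threads[k] := TPusher.Create(false);  // start immediately
  for k := 0 to threadCount - 1 do
  begin
    threads[k].WaitFor;
    threads[k].Free;
  end;
  elapsed := GetTickCount64 - t0;
  if elapsed = 0 then
    elapsed := 1;  // avoid division by zero on very fast runs
  Result := (threadCount * OpsPerThread) / (elapsed / 1000.0);
end;

begin
  WriteLn('1 thread : ', RunTest(1):0:0, ' ops/s');
  WriteLn('4 threads: ', RunTest(4):0:0, ' ops/s');
end.
```

Note that in the quoted push() the continue statement jumps straight to the until test, so the pause is skipped while tail <> lasttail. In the sketch the yield runs on every failed iteration instead. Since each thread has to wait for all earlier tickets to publish before its own CAS can succeed, the numbers will depend heavily on the cache topology and on how the scheduler places the threads.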