Groups > comp.programming.threads > #1871

I have corrected some typos , please read again...

From	aminer <aminer@toto.net>
Newsgroups	comp.programming.threads, comp.programming
Subject	I have corrected some typos , please read again...
Date	2013-10-08 09:25 -0700
Organization	albasani.net
Message-ID	<l3113c$ck0$4@news.albasani.net> (permalink)

Cross-posted to 2 groups.

Show all headers | View raw

I have corrected some typos , please read again...


Hello,

I have took a look at the following FIFO MPMC queue
that Dmitry Vyukov and Chris Thomasson wrote, and i have understood
it very well, this FIFO queue is Waitfree on the push() side
but Lockfree on the pop side... but what i don't like is that
on the pop() side when there is high contention , there will be
too many cache misses and cache lines transfer and this
will be so bad ... in a worst case scenario when the threads will be 
looping and all crossing the "node next = cmp.m_next;" there will be 
progress but i think this FIFO queue will have to tranfer
1+2+3+4+ ..N = ((N^2+N)/2) cache lines or even worse N+N+N+N...N cache lines
transfers if we have have everytime N threads , and this will be
so bad, this is what i don't like, so to lower much those cache lines 
tranfer the FIFO queue must be Waitfree on both the push() and the pop().

Here is the FIFO queue:

http://pastebin.com/f72cc3cc1

Amine, where did you come with the ((N^2+N)/2)?

I mean  with lockfree FIFO queue algorithms and in a worst case scenario 
when N threads are waiting and spinning on a global variable you will 
have 1+2+3+4...+N = (N^2+N)/2 cache lines tranfers , cause the first 
thread will cross the CAS() and it will have a cache miss and one 
cache-line transfer and the others threads waiting will spin and have 
N-1 cache misses and N-1 cache
line transfers , this will continue like that, so the second thread
will cross the CAS() and have one cache line transfer and the others
will have (N-1)-1 cache misses and  (N-1)-1 cache lines transfers
this will continue like that until the last thread , so this
will equal to 1+...N = (N^2 + N) /2 , but when you will change that and 
wrap the push() and pop() with two  Locks that lower the cache-coherence 
traffic such us  my scalable distributed Lock or the scalable Anderson 
Lock or the scalable MCS lock,
this (N^2+N)/2 cache lines tranfers, in worst cache scenarios, will be 
reduced to a constant qual to N(number of threads) cache lines 
transfers, where did i come with the constant N ? since in the scalable 
Anderson Lock or in my scalable distributed algorithm
(this is true also for the scalable MCS algorithm) every threads is 
spining on a separate element inside the array , so this will reduce a 
lotthe cache-coherence traffic to N cache misses and N cache lines 
transfers. But in lockfree FIFO queue algorithms and in a worst case 
scenario if we have everytime N threads crossing the CAS , you will have 
N+N+N+N+...N cache misses and cache lines tranfers and this too much 
cache-coherence traffic and it is not acceptable, this why lockfree FIFO 
queue algorithms are bad, but with the scalable Anderson Lock or with my 
scalable distributed Lock or the scalable MCS Lock this N+N+N+N+...N 
cache misses and cache lines tranfers will be reduced to 1+1+1...1 and 
this much much better, other than that the lockfree FIFO queue 
algorithms are not FIFO fair hence they are not starvation-free , so 
they can have starvation, so this is why i advice you to use the Locks 
that are also FIFO fair around the push() and the pop() such us my 
scalable distributed Lock or the Anderson Lock or the MCS Lock.


Thank you,
Amine Moulay Ramdane.

Back to comp.programming.threads | Previous | Next | Find similar

Thread

I have corrected some typos , please read again... aminer <aminer@toto.net> - 2013-10-08 09:25 -0700

csiph-web