Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.programming.threads > #1871
| From | aminer <aminer@toto.net> |
|---|---|
| Newsgroups | comp.programming.threads, comp.programming |
| Subject | I have corrected some typos , please read again... |
| Date | 2013-10-08 09:25 -0700 |
| Organization | albasani.net |
| Message-ID | <l3113c$ck0$4@news.albasani.net> (permalink) |
Cross-posted to 2 groups.
I have corrected some typos , please read again... Hello, I have took a look at the following FIFO MPMC queue that Dmitry Vyukov and Chris Thomasson wrote, and i have understood it very well, this FIFO queue is Waitfree on the push() side but Lockfree on the pop side... but what i don't like is that on the pop() side when there is high contention , there will be too many cache misses and cache lines transfer and this will be so bad ... in a worst case scenario when the threads will be looping and all crossing the "node next = cmp.m_next;" there will be progress but i think this FIFO queue will have to tranfer 1+2+3+4+ ..N = ((N^2+N)/2) cache lines or even worse N+N+N+N...N cache lines transfers if we have have everytime N threads , and this will be so bad, this is what i don't like, so to lower much those cache lines tranfer the FIFO queue must be Waitfree on both the push() and the pop(). Here is the FIFO queue: http://pastebin.com/f72cc3cc1 Amine, where did you come with the ((N^2+N)/2)? I mean with lockfree FIFO queue algorithms and in a worst case scenario when N threads are waiting and spinning on a global variable you will have 1+2+3+4...+N = (N^2+N)/2 cache lines tranfers , cause the first thread will cross the CAS() and it will have a cache miss and one cache-line transfer and the others threads waiting will spin and have N-1 cache misses and N-1 cache line transfers , this will continue like that, so the second thread will cross the CAS() and have one cache line transfer and the others will have (N-1)-1 cache misses and (N-1)-1 cache lines transfers this will continue like that until the last thread , so this will equal to 1+...N = (N^2 + N) /2 , but when you will change that and wrap the push() and pop() with two Locks that lower the cache-coherence traffic such us my scalable distributed Lock or the scalable Anderson Lock or the scalable MCS lock, this (N^2+N)/2 cache lines tranfers, in worst cache scenarios, will be reduced to a constant qual to N(number of threads) cache lines transfers, where did i come with the constant N ? since in the scalable Anderson Lock or in my scalable distributed algorithm (this is true also for the scalable MCS algorithm) every threads is spining on a separate element inside the array , so this will reduce a lotthe cache-coherence traffic to N cache misses and N cache lines transfers. But in lockfree FIFO queue algorithms and in a worst case scenario if we have everytime N threads crossing the CAS , you will have N+N+N+N+...N cache misses and cache lines tranfers and this too much cache-coherence traffic and it is not acceptable, this why lockfree FIFO queue algorithms are bad, but with the scalable Anderson Lock or with my scalable distributed Lock or the scalable MCS Lock this N+N+N+N+...N cache misses and cache lines tranfers will be reduced to 1+1+1...1 and this much much better, other than that the lockfree FIFO queue algorithms are not FIFO fair hence they are not starvation-free , so they can have starvation, so this is why i advice you to use the Locks that are also FIFO fair around the push() and the pop() such us my scalable distributed Lock or the Anderson Lock or the MCS Lock. Thank you, Amine Moulay Ramdane.
Back to comp.programming.threads | Previous | Next | Find similar
I have corrected some typos , please read again... aminer <aminer@toto.net> - 2013-10-08 09:25 -0700
csiph-web