More about scalable algorithms

From	aminer <aminer@toto.net>
Newsgroups	comp.lang.pascal.misc
Subject	More about scalable algorithms
Date	2014-04-28 12:06 -0700
Organization	albasani.net
Message-ID	<ljlua7$2i5$3@news.albasani.net> (permalink)

Show all headers | View raw

Hello,

You have to understand that the Chriss Thomasson concurrent FIFO queue
that uses the bakery algorithm, here it is:


https://groups.google.com/d/topic/lock-free/acjQ3-89abE/discussion


Look at the producer() source code:

void producer(double state) {
     uint32_t ver = XADD(&head, 1);
     cell& c = cells[ver & (N - 1)];
     while (LOAD(&c.ver) != ver) backoff();
     c.state = state;
     STORE(&c.ver, ver + 1);
}



Since the algorithm is more parralelized so the "c.state = state;"
will be wrote in parallel by many threads to the cache or write-back 
cache,  so this will higher the throughput of the pop() side, but
even if the push() side is faster , notice that under contention the 
throughtput islimited by the throughput of the pop() side.


But look at the consumer() function:

double consumer() {
     uint32_t ver = XADD(&tail, 1);
     cell& c = cells[ver & (N - 1)];
     while (LOAD(&c.ver) != ver + 1) backoff();
     double state = c.state;
     STORE(&c.ver, ver + N);
     return state;
}



So as you have noticed the "double state = c.state" is serialized
cause the "Bus" serializes the data movements between the caches and the 
mmemory subsystem and the caches... but notice how many variables
causes data movements between caches and between memory and caches on
the Chriss Thomasson algorithm on consumer() side,
There is the "tail" variable right on "XADD(&tail, 1);", there
is the "&c.ver" variable right on  the "while (LOAD(&c.ver) != ver + 1)" 
and there is  "c.state"variable right on  "double state = c.state;"
but since those variables generate data movements between the caches
and between the memory and the caches, they will cause contention
on the Bus as i have explained it to you in my previous posts, cause 
those data movements are serialized on the "Bus" subsystem , so this 
contention will cause more waiting time and more waiting time means less 
throughput... but notice with me that in my concurrent FIFO queue that 
uses the two lock algorithm  since almost all the variable are located 
inside the critial section, that means they will generate less 
contention , hence the two locks algorithm is more efficient than the 
Chriss Thomasson algorithm cause it lowers the contention efficently, 
this is why even if my two locks algorithm uses more variables (4 in 
total) than the Chriss Thomasson algorithm that is using 3 variables,
the two locks algorithm is scoring the same throughput on the
pop() side than the Chriss Thomasson algorithm , that means
that the two locks algorithm is more efficient when it comes
to "contention" than the Chriss Thomasson algorithm.


Hope you have undertstood what i want you to understand in this post.



Thank you,
Amine Moulay Ramdane.

Back to comp.lang.pascal.misc | Previous | Next — Next in thread | Find similar

Thread

More about scalable algorithms aminer <aminer@toto.net> - 2014-04-28 12:06 -0700
  Re: More about scalable algorithms aminer <aminer@toto.net> - 2014-04-28 12:11 -0700
    Re: More about scalable algorithms aminer <aminer@toto.net> - 2014-04-28 12:28 -0700

csiph-web