Groups > comp.lang.java.programmer > #18747 > unrolled thread

CAS and scalability [...]

Started by	"aminer" <aminer@videotron.ca>
First post	2012-09-13 16:48 -0500
Last post	2012-09-13 19:02 -0500
Articles	13 — 5 participants

Back to article view | Back to comp.lang.java.programmer

  CAS and scalability [...] "aminer" <aminer@videotron.ca> - 2012-09-13 16:48 -0500
    Re: CAS and scalability [...] Lew <lewbloch@gmail.com> - 2012-09-13 14:20 -0700
      Re: CAS and scalability [...] "aminer" <aminer@videotron.ca> - 2012-09-13 19:08 -0500
      Re: CAS and scalability [...] "aminer" <aminer@videotron.ca> - 2012-09-13 19:11 -0500
        Re: CAS and scalability [...] Arne Vajhøj <arne@vajhoej.dk> - 2012-09-13 20:22 -0400
        Re: CAS and scalability [...] Kevin McMurtrie <mcmurtrie@pixelmemory.us> - 2012-09-13 21:47 -0700
          Re: CAS and scalability [...] Patricia Shanahan <pats@acm.org> - 2012-09-13 22:34 -0700
    Re: CAS and scalability [...] Patricia Shanahan <pats@acm.org> - 2012-09-13 15:04 -0700
      Re: CAS and scalability [...] "aminer" <aminer@videotron.ca> - 2012-09-13 19:17 -0500
        Re: CAS and scalability [...] Lew <lewbloch@gmail.com> - 2012-09-13 16:21 -0700
        Re: CAS and scalability [...] Patricia Shanahan <pats@acm.org> - 2012-09-13 16:44 -0700
        Re: CAS and scalability [...] Arne Vajhøj <arne@vajhoej.dk> - 2012-09-13 20:24 -0400
    Re: CAS and scalability [...] "aminer" <aminer@videotron.ca> - 2012-09-13 19:02 -0500

#18747 — CAS and scalability [...]

From	"aminer" <aminer@videotron.ca>
Date	2012-09-13 16:48 -0500
Subject	CAS and scalability [...]
Message-ID	<k2tgrh$j5f$1@dont-email.me>

Hello...

when the CAS operation "goes on the bus", use of CAS can impair scalability.
but CAS can be accomplished locally -- that is, with no bus transactions --
and then it can scale.

If we then change the CAS operation that goes on the bus to a normal store
you'll also see a similar slow-down in terms of coherency bus traffic, CAS
isn't appreciably different than a normal store. Also the lock: prefix
caused
the LOCK# signal to be asserted, acquiring exclusive access to the bus.
This doesn't scale of course

As you have noticed i  have wrote parallelhashlist (a parallel hashtable),
you can find parallelhashlist here:

http://pages.videotron.com/aminer/

It's a parallel Hashtable with O(1) best case and O(log(n)) worst case
access that uses lock striping and lightweight MREWs(multiple-readers
-exclusive-writer) , this allows multiple threads to write and read
concurently. also parallelhashlist maintains an independant counter , that
counts the number of entries , for each segment of the hashtable and uses
a lock for each counter, this is also for better scalability. and
parallelhashlist
is scaling very well,  but since it is a parallel hashtable so the
possibility of
contention is low so why doi need the distributed reader-writer lock of
Dmitry Vyukov inside my parallel hashlist ?

Other than that I have done some tests with the lightweight MREW that i am
using inside my parallelhashlist and i have done also some tests with my
lockfree mpmc fifo queue and what i think is that the CAS is generating
a lot of contention this is is why the lightweight MREW and my lockfree_mpmc
are not scaling , but parallelhashlist is scaling very well cause i am using
lock-striping that is lowering contention.

What are doing Dmitry Vyukov in his distributed rwlock is lowering
the contention using the same method as lock striping that i am using inside
parallelhashlist it is why it is scaling, but there is still a possibility
of contention in his distributed rwlock that can cause a problem to the
scalability if there is too many threads and not a sufficient number of
rwlocks in the Dmitry distributed rwlock to be able to lower the contention.

I have tested parallelhashlist(a parallel hashtable that i have implemented)
with four threads on a quad core and it's giving a very well scaling on both
reads and writes.

Also i have done some scalability tests on my parallelsort library and i
have
come
to the conclusion that parallel heapsort is better on scalability than
parallel quicksort
cause the P part (of the Amdahl equation) is bigger in parallel heapsort
than in parallel
quicksort, the parallel heapsort is doing more on the parallel part, it's
why it scales better than parallel quicksort, but parallel quicksort is
still
faster than parallel heapsort and parallel merge sort on my tests on a
quad core processor.

And about lockfree_mpmc( a lockfree fifo queue), i have done some tests
and it's not scaling cause when you are using a single thread some variables
are updated locally on the L1 cache but using multiple threads those
variables are
loaded from the L2 cache and it's more expensive to load them from the L2
cache.and this does generate much more contention

But even though lockfree_mpmc is not scalable, you can increase
the P (parallel) part by doing more of the same: Increase the volume of
data processed by the P part (and therefore the percentage p of time spent
in computing). This is Gustafson's Law and you will get more scalability.

For example i have used the IntToStr() function on each of the four threads
(on
a quad core) on my lockfree_mpmc test programs to convert from and integer
to a string, so i have increased the P (parallel) part and i have got more
scalability,
this is Gustafson's Law, and you have to remember Gustafson's Law ,
this is very important.


You can download my parallel libraries from

http://pages.videotron.com/aminer/




Sincerely,
Amine Moulay Ramdane.

[toc] | [next] | [standalone]

#18748

From	Lew <lewbloch@gmail.com>
Date	2012-09-13 14:20 -0700
Message-ID	<cd213bd3-a7f9-4b99-acf6-0e74ab901103@googlegroups.com>
In reply to	#18747

aminer wrote:
> when the CAS operation "goes on the bus", use of CAS can impair scalability.
> but CAS can be accomplished locally -- that is, with no bus transactions --
> and then it can scale.
> 
> If we then change the CAS operation that goes on the bus to a normal store
> you'll also see a similar slow-down in terms of coherency bus traffic, CAS
> isn't appreciably different than a normal store. Also the lock: prefix
> caused
> the LOCK# signal to be asserted, acquiring exclusive access to the bus.
> This doesn't scale of course
> 
> As you have noticed i  have wrote parallelhashlist (a parallel hashtable),

Why do you spew your irrelevant and illiterate spam on the comp.lang.java.programmer 
newsgroup?

You are a spammer, and not a good writer.

Please stop spamming.

-- 
Lew

[toc] | [prev] | [next] | [standalone]

#18753

From	"aminer" <aminer@videotron.ca>
Date	2012-09-13 19:08 -0500
Message-ID	<k2tp23$2f7$1@dont-email.me>
In reply to	#18748

Lew wrote;
> and not a good writer


Sorry for my poor english , i am not an english man,
i only speak well french and arabic.



Amine Moulay Ramdane



"Lew" <lewbloch@gmail.com> wrote in message 
news:cd213bd3-a7f9-4b99-acf6-0e74ab901103@googlegroups.com...
> aminer wrote:
>> when the CAS operation "goes on the bus", use of CAS can impair 
>> scalability.
>> but CAS can be accomplished locally -- that is, with no bus 
>> transactions --
>> and then it can scale.
>>
>> If we then change the CAS operation that goes on the bus to a normal 
>> store
>> you'll also see a similar slow-down in terms of coherency bus traffic, 
>> CAS
>> isn't appreciably different than a normal store. Also the lock: prefix
>> caused
>> the LOCK# signal to be asserted, acquiring exclusive access to the bus.
>> This doesn't scale of course
>>
>> As you have noticed i  have wrote parallelhashlist (a parallel 
>> hashtable),
>
> Why do you spew your irrelevant and illiterate spam on the 
> comp.lang.java.programmer
> newsgroup?
>
> You are a spammer, and not a good writer.
>
> Please stop spamming.
>
> -- 
> Lew
>

[toc] | [prev] | [next] | [standalone]

#18754

From	"aminer" <aminer@videotron.ca>
Date	2012-09-13 19:11 -0500
Message-ID	<k2tp7p$99g$1@dont-email.me>
In reply to	#18748


Lew wrote:;
> You are a spammer


Sorry Lew, it's not spam , cause it's an exchange of ideas on programming
and the source code is free and this can also inspire a Java progammer to 
port
part or all of the Object Pascal source code to Java for example.

Do you understand what i mean. And sorry for my poor english.


Thank you,
Amine Moulay Ramdane.







"Lew" <lewbloch@gmail.com> wrote in message 
news:cd213bd3-a7f9-4b99-acf6-0e74ab901103@googlegroups.com...
> aminer wrote:
>> when the CAS operation "goes on the bus", use of CAS can impair 
>> scalability.
>> but CAS can be accomplished locally -- that is, with no bus 
>> transactions --
>> and then it can scale.
>>
>> If we then change the CAS operation that goes on the bus to a normal 
>> store
>> you'll also see a similar slow-down in terms of coherency bus traffic, 
>> CAS
>> isn't appreciably different than a normal store. Also the lock: prefix
>> caused
>> the LOCK# signal to be asserted, acquiring exclusive access to the bus.
>> This doesn't scale of course
>>
>> As you have noticed i  have wrote parallelhashlist (a parallel 
>> hashtable),
>
> Why do you spew your irrelevant and illiterate spam on the 
> comp.lang.java.programmer
> newsgroup?
>
> You are a spammer, and not a good writer.
>
> Please stop spamming.
>
> -- 
> Lew
>

[toc] | [prev] | [next] | [standalone]

#18758

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-09-13 20:22 -0400
Message-ID	<505278b9$0$286$14726298@news.sunsite.dk>
In reply to	#18754

On 9/13/2012 8:11 PM, aminer wrote:
> Lew wrote:;
>> You are a spammer
>
> Sorry Lew, it's not spam , cause it's an exchange of ideas on programming

I believe there are other groups that covers programming in general.

This group is for Java.

> and the source code is free

Not relevant. This group is about Java. That you software is free
does not make it about Java.

>                           and this can also inspire a Java progammer to
> port
> part or all of the Object Pascal source code to Java for example.

That it could be written in Java does not make it relevant here.

Only if it was actually written in Java.

Arne

[toc] | [prev] | [next] | [standalone]

#18762

From	Kevin McMurtrie <mcmurtrie@pixelmemory.us>
Date	2012-09-13 21:47 -0700
Message-ID	<5052b6d8$0$65525$742ec2ed@news.sonic.net>
In reply to	#18754

In article <k2tp7p$99g$1@dont-email.me>, "aminer" <aminer@videotron.ca> 
wrote:

> Lew wrote:;
> > You are a spammer
> 
> 
> Sorry Lew, it's not spam , cause it's an exchange of ideas on programming
> and the source code is free and this can also inspire a Java progammer to 
> port
> part or all of the Object Pascal source code to Java for example.
> 
> Do you understand what i mean. And sorry for my poor english.
> 
> 
> Thank you,
> Amine Moulay Ramdane.
> 
> 

I don't think it's spam either, but a more thorough explanation of the 
algorithm would be more appropriate than C++ source when posting to a 
Java group.

Concurrency is something often researched by students and those working 
on very large systems.  I doubt there's a shortage of Java Maps that are 
lock-free and employing data structures to minimize CPU cache syncing.  
They're usually tuned for a specific use case, as Sun's 
ConcurrentHashMap is fine general use.  What's needed is more people 
understanding what they're about.
-- 
I will not see posts from Google because I must filter them as spam

[toc] | [prev] | [next] | [standalone]

#18765

From	Patricia Shanahan <pats@acm.org>
Date	2012-09-13 22:34 -0700
Message-ID	<BZqdnabzt6tHXM_NnZ2dnUVZ_r2dnZ2d@earthlink.com>
In reply to	#18762

On 9/13/2012 9:47 PM, Kevin McMurtrie wrote:
...
> I don't think it's spam either, but a more thorough explanation of the
> algorithm would be more appropriate than C++ source when posting to a
> Java group.
...

I think a general, language neutral, discussion of concurrency and
algorithms belongs in comp.programming.

If the subject involved Java-specfic issues, then it would be on-topic here.

Patricia

[toc] | [prev] | [next] | [standalone]

#18750

From	Patricia Shanahan <pats@acm.org>
Date	2012-09-13 15:04 -0700
Message-ID	<k4SdnV-aBOjlxc_NnZ2dnUVZ_t-dnZ2d@earthlink.com>
In reply to	#18747

On 9/13/2012 2:48 PM, aminer wrote:
> Hello...
>
> when the CAS operation "goes on the bus", use of CAS can impair scalability.
> but CAS can be accomplished locally -- that is, with no bus transactions --
> and then it can scale.
...

I am interested in such issues, and will happily discuss them somewhere
where they are on-topic, but I don't see the relevance to Java
programming, the subject of this newsgroup.

Patricia

[toc] | [prev] | [next] | [standalone]

#18755

From	"aminer" <aminer@videotron.ca>
Date	2012-09-13 19:17 -0500
Message-ID	<k2tpik$apg$1@dont-email.me>
In reply to	#18750

Hello,

Exchange of ideas on programming is also relevant to this newsgroup, no ?
and my source code is free and this can also inspire a Java progammer
to port part or all of the Object Pascal source code to Java for example,
and this is also relevant to Java i think.

Do you agree or not Patricia Shanahan?

Amine Moulay Ramdane.

"Patricia Shanahan" <pats@acm.org> wrote in message 
news:k4SdnV-aBOjlxc_NnZ2dnUVZ_t-dnZ2d@earthlink.com...
> On 9/13/2012 2:48 PM, aminer wrote:
>> Hello...
>>
>> when the CAS operation "goes on the bus", use of CAS can impair 
>> scalability.
>> but CAS can be accomplished locally -- that is, with no bus 
>> transactions --
>> and then it can scale.
> ...
>
> I am interested in such issues, and will happily discuss them somewhere
> where they are on-topic, but I don't see the relevance to Java
> programming, the subject of this newsgroup.
>
> Patricia
>
>

[toc] | [prev] | [next] | [standalone]

#18756

From	Lew <lewbloch@gmail.com>
Date	2012-09-13 16:21 -0700
Message-ID	<b0131fea-f0bd-4651-9274-a2b35f1058cd@googlegroups.com>
In reply to	#18755

On Thursday, September 13, 2012 4:17:40 PM UTC-7, aminer wrote:
> Exchange of ideas on programming is also relevant to this newsgroup, no ?

If they're relevant to Java and not spam like yours.

> and my source code is free and this can also inspire a Java progammer
> to port part or all of the Object Pascal source code to Java for example,
> and this is also relevant to Java i think.
> 
> Do you agree or not Patricia Shanahan?

Oh, so you thought that first, that your work was so important you had to 
inform the Java folks, or are you just coming up with that now that you've 
been challenged?

If you really had meant to open your work up to the Java community, you 
would have mentioned that explicitly, no? Because you didn't. 

You just plunged into a bunch of terminology and ideas that had no obvious 
relevance to this group.

So let's hear more about how your brilliant work will help the Java community
on any post you include in this newsgroup, OK?

-- 
Lew

[toc] | [prev] | [next] | [standalone]

#18757

From	Patricia Shanahan <pats@acm.org>
Date	2012-09-13 16:44 -0700
Message-ID	<Ue2dnepl_P5d8s_NnZ2dnUVZ_q2dnZ2d@earthlink.com>
In reply to	#18755

On 9/13/2012 5:17 PM, aminer wrote:
> Hello,
>
> Exchange of ideas on programming is also relevant to this newsgroup, no ?

No. Exchange of ideas on *Java* programming is the only subject that is
relevant in this newsgroup.

> and my source code is free and this can also inspire a Java progammer
> to port part or all of the Object Pascal source code to Java for example,
> and this is also relevant to Java i think.
>
>
> Do you agree or not Patricia Shanahan?
...

I very strongly disagree. So much so that I only became aware of the
article to which I'm responding because Lew quoted it. By the time it
arrived I already had a filter set up to ignore anything you post in
comp.lang.java.programmer.

If you want to talk about Pascal programming, go to a comp.lang.pascal
newsgroup. If you want to talk about programming in general, I suggest
comp.programming. If you want to talk about your libraries I suggest
creating a newsgroup or mailing list for the purpose.

Patricia

[toc] | [prev] | [next] | [standalone]

#18759

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-09-13 20:24 -0400
Message-ID	<5052793f$0$286$14726298@news.sunsite.dk>
In reply to	#18755

On 9/13/2012 8:17 PM, aminer wrote:
> Exchange of ideas on programming is also relevant to this newsgroup, no ?

No.

Only of Java is involved.

> and my source code is free and this can also inspire a Java progammer
> to port part or all of the Object Pascal source code to Java for example,
> and this is also relevant to Java i think.

Questions about the port would be on relevant here.

The fact that it could be written in Java does not make it relevant.

Almost all software could be written in Java.

Arne

[toc] | [prev] | [next] | [standalone]

#18752

From	"aminer" <aminer@videotron.ca>
Date	2012-09-13 19:02 -0500
Message-ID	<k2tom3$eo$1@dont-email.me>
In reply to	#18747

I wrote:
> And about lockfree_mpmc( a lockfree fifo queue), i have done some tests
> and it's not scaling cause when you are using a single thread some 
> variables
> are updated locally on the L1 cache but using multiple threads those
> variables are
> loaded from the L2 cache and it's more expensive to load them from the L2
> cache.and this does generate much more contention

I mean that it's more expensive to load them from the L2 cache than the L1 
cache.

And i wrote: "it does generate a lot of contention", i mean the CAS is a 
single point
of access that can generate a lot of contention, in parallelhashlist i am 
using
lock striping and distributing the access to many points(ea: many rwlocks, 
many CASes)
and this is lowering the contention and this is better for scalability.

What are doing Dmitry Vyukov in his distributed rwlock is also distributing
thje access to many rwlocks using the same method as lock-striping and this 
is
lowering the contention and this is better for scalability , it's why my 
parallelhashlist
and the Distributed Reader-Writer Mutex 1.03 are scaling very well.

And if you have noticed , there is still a weakness with the Dmitry Vyukov
C++ Distributed Reader-Writer Mutex, cause since he is using
GetCurrentProcessorNumber() he is limiting the array of rwlocks to the
number of avaiblable cores and this is not good i think , cause if you have
many more threads than the avaiblable cores and there is high contention
this will cause the performance to degrade, so i have decided to change
that in my implementation of the Distributed Reader-Writer Mutex 1.03
and i have used a  variable number of rwlocks/MREWs so that you can lower
more the contention and this is better for scalability.



Thank you,
Amine Moulay Ramdane.









"aminer" <aminer@videotron.ca> wrote in message 
news:k2tgrh$j5f$1@dont-email.me...
> Hello...
>
> when the CAS operation "goes on the bus", use of CAS can impair 
> scalability.
> but CAS can be accomplished locally -- that is, with no bus 
> transactions --
> and then it can scale.
>
> If we then change the CAS operation that goes on the bus to a normal store
> you'll also see a similar slow-down in terms of coherency bus traffic, CAS
> isn't appreciably different than a normal store. Also the lock: prefix
> caused
> the LOCK# signal to be asserted, acquiring exclusive access to the bus.
> This doesn't scale of course
>
> As you have noticed i  have wrote parallelhashlist (a parallel hashtable),
> you can find parallelhashlist here:
>
> http://pages.videotron.com/aminer/
>
> It's a parallel Hashtable with O(1) best case and O(log(n)) worst case
> access that uses lock striping and lightweight MREWs(multiple-readers
> -exclusive-writer) , this allows multiple threads to write and read
> concurently. also parallelhashlist maintains an independant counter , that
> counts the number of entries , for each segment of the hashtable and uses
> a lock for each counter, this is also for better scalability. and
> parallelhashlist
> is scaling very well,  but since it is a parallel hashtable so the
> possibility of
> contention is low so why doi need the distributed reader-writer lock of
> Dmitry Vyukov inside my parallel hashlist ?
>
> Other than that I have done some tests with the lightweight MREW that i am
> using inside my parallelhashlist and i have done also some tests with my
> lockfree mpmc fifo queue and what i think is that the CAS is generating
> a lot of contention this is is why the lightweight MREW and my 
> lockfree_mpmc
> are not scaling , but parallelhashlist is scaling very well cause i am 
> using
> lock-striping that is lowering contention.
>
> What are doing Dmitry Vyukov in his distributed rwlock is lowering
> the contention using the same method as lock striping that i am using 
> inside
> parallelhashlist it is why it is scaling, but there is still a possibility
> of contention in his distributed rwlock that can cause a problem to the
> scalability if there is too many threads and not a sufficient number of
> rwlocks in the Dmitry distributed rwlock to be able to lower the 
> contention.
>
> I have tested parallelhashlist(a parallel hashtable that i have 
> implemented)
> with four threads on a quad core and it's giving a very well scaling on 
> both
> reads and writes.
>
> Also i have done some scalability tests on my parallelsort library and i
> have
> come
> to the conclusion that parallel heapsort is better on scalability than
> parallel quicksort
> cause the P part (of the Amdahl equation) is bigger in parallel heapsort
> than in parallel
> quicksort, the parallel heapsort is doing more on the parallel part, it's
> why it scales better than parallel quicksort, but parallel quicksort is
> still
> faster than parallel heapsort and parallel merge sort on my tests on a
> quad core processor.
>
> And about lockfree_mpmc( a lockfree fifo queue), i have done some tests
> and it's not scaling cause when you are using a single thread some 
> variables
> are updated locally on the L1 cache but using multiple threads those
> variables are
> loaded from the L2 cache and it's more expensive to load them from the L2
> cache.and this does generate much more contention
>
> But even though lockfree_mpmc is not scalable, you can increase
> the P (parallel) part by doing more of the same: Increase the volume of
> data processed by the P part (and therefore the percentage p of time spent
> in computing). This is Gustafson's Law and you will get more scalability.
>
> For example i have used the IntToStr() function on each of the four 
> threads
> (on
> a quad core) on my lockfree_mpmc test programs to convert from and integer
> to a string, so i have increased the P (parallel) part and i have got more
> scalability,
> this is Gustafson's Law, and you have to remember Gustafson's Law ,
> this is very important.
>
>
> You can download my parallel libraries from
>
> http://pages.videotron.com/aminer/
>
>
>
>
> Sincerely,
> Amine Moulay Ramdane.
>
>
>
>
>
>

[toc] | [prev] | [standalone]

csiph-web

CAS and scalability [...]

Contents

#18747 — CAS and scalability [...]

#18748

#18753

#18754

#18758

#18762

#18765

#18750

#18755

#18756

#18757

#18759

#18752