Re: Thread scaling in Windows

Date 2013-05-31 15:53 +0200
From Marcel Müller <news.5.maazl@spamgourmet.org>
Newsgroups comp.programming.threads
Subject Re: Thread scaling in Windows
References <XnsA1CFD0BECDBADmyfirstnameosapriee@216.196.109.131> <51a7c499$0$6565$9b4e6d93@newsspool4.arcor-online.net> <XnsA1D1132254871myfirstnameosapriee@216.196.109.131>
Message-ID <51a8ab52$0$6567$9b4e6d93@newsspool4.arcor-online.net>
Organization Arcor


On 31.05.13 00.52, Paavo Helde wrote:
>> Most likely you enter the mutexes very often during calculation. This is
>> bad practice. If you want to run in parallel, then run in parallel and
>> do not talk to each other over and over.
>
> Mutexes are indeed entered quite often, but most of the cases there
> should be no contention and the mutex is locked for a very short time,

That does not matter. Think of it this way: entering a mutex requires
synchronizing all cores with memory. Even at the lowest level, one clock
cycle at 1 GHz (a very low clock rate nowadays) is only enough to
synchronize over a distance of about 10 cm, simply because of the
propagation speed of electric fields (about 70% of the speed of light):
a signal covers roughly 20 cm in 1 ns, and synchronizing needs a round
trip. That is enough to reach the memory chips, but no more. Switching
times of the electronics reduce the speed further. So even at assembly
level, e.g. with lock-free algorithms, synchronization (= memory
barriers) is expensive nowadays.
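
As a rough illustration of "run in parallel and do not talk to each
other", here is a minimal C++11 sketch. The function name parallel_sum
and the workload (summing integers) are made up for this example. Each
thread accumulates into a private variable and enters the mutex exactly
once, instead of locking on every addition.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical workload: sum the integers 0..n-1 using several threads.
long long parallel_sum(std::size_t n, unsigned num_threads) {
    assert(num_threads > 0);
    long long total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            long long local = 0;  // private to this thread, no sharing
            for (std::size_t i = t; i < n; i += num_threads)
                local += static_cast<long long>(i);
            // One synchronized write per thread, not one per addition.
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

Calling parallel_sum(1000, 4) enters the mutex four times in total, no
matter how large n is.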

> like updating a shared object reference count by a smartpointer.

This is really expensive! Not only because of the mutex. Writing to a
shared memory location from different cores also invalidates the related
cache content of all cores except the one currently writing. So
cache misses are very likely to occur after that. The problem is called
cache line hopping (also known as cache line bouncing), and you simply
should not do that. Each thread should write to its own memory. Writes
to shared memory should be rare. Reads from shared memory are fine, as
long as no one writes to it. And note that memory locations that are
close to each other are also affected (false sharing), because CPUs
manage the cache memory in chunks, the cache lines, of typically 64
bytes (sometimes 128). So an array where each thread has its own small
slot is also bad practice.
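
If a per-thread array is needed anyway, one common workaround is to pad
each slot to a full cache line. The following is only a sketch: the
names kCacheLine, PaddedCounter and count_in_parallel are invented here,
the 64-byte line size is an assumption that must be checked for the
target CPU, and C++17 is assumed so that std::vector honors the
over-alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache line size; common on x86 but not universal.
constexpr std::size_t kCacheLine = 64;

// One counter per thread, aligned and padded so that no two counters
// can share a cache line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};
static_assert(sizeof(PaddedCounter) % kCacheLine == 0,
              "each slot must occupy whole cache lines");

long count_in_parallel(unsigned num_threads, long increments_per_thread) {
    assert(increments_per_thread >= 0);
    std::vector<PaddedCounter> slots(num_threads);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&slots, t, increments_per_thread] {
            for (long i = 0; i < increments_per_thread; ++i)
                ++slots[t].value;  // only this thread writes this line
        });
    }
    for (auto& w : workers) w.join();

    long total = 0;  // read the slots only after all writers are done
    for (const auto& s : slots) total += s.value;
    return total;
}
```

The padding trades memory for isolation: every slot costs a full cache
line even though it holds a single long.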

This is a place where smart pointers can really slow things down, even
when the different threads only read the pointers and the referred
objects. Simply because copying a smart pointer always writes to a
common memory location. Although some smart pointer implementations are
lock-free (which is not that easy when you require strong thread
safety), the overhead can be considerable if you pass them as function
arguments all the time.
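
The hidden write can be made visible with a small sketch. It uses
std::shared_ptr from C++11 as a stand-in for boost::shared_ptr; the
names Data and use_count_by_value are invented for this example.

```cpp
#include <cassert>
#include <memory>

struct Data {
    int value = 42;
};

// Taking the smart pointer by value copies it: the shared reference
// count is incremented on entry and decremented again on return.
// Both are writes to memory shared by all owners.
long use_count_by_value(std::shared_ptr<Data> p) {
    assert(p != nullptr);
    return p.use_count();
}
```

With a single owner p, use_count_by_value(p) reports 2 during the call
while the caller sees 1 before and after, so each such call writes to
the shared counter twice even though the function only reads.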

To get around that I often use reference types as function arguments
instead of smart pointers. This is safe as long as the caller needs to
hold a strong reference anyway. However, these references may not be put
into a new smart pointer unless you have consistently chosen intrusive
implementations, which do no additional allocation for the smart
pointer's internal structure. So boost::shared_ptr is out.
This is the way I always go. I have never really used shared_ptr.
Instead, my objects that are to be reference counted have a public base
class with the (private) reference counter, which has the intrusive
counter access as friend. Using this pattern allows me to pass the value
of a smart pointer as a raw pointer (or reference) to a function and
assign this raw pointer to a smart pointer inside the function again if
necessary.
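
A minimal sketch of such an intrusive pattern might look as follows.
This is not the actual code described above: RefCounted, IntrusivePtr,
Widget, keep and the public use_count() accessor are all hypothetical
names chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

template <class T> class IntrusivePtr;

// Hypothetical base class: the counter lives inside the object itself.
class RefCounted {
public:
    RefCounted() = default;
    RefCounted(const RefCounted&) = delete;
    RefCounted& operator=(const RefCounted&) = delete;
    // Accessor added only so the example can be observed.
    long use_count() const { return refs_.load(std::memory_order_relaxed); }
protected:
    virtual ~RefCounted() = default;
private:
    std::atomic<long> refs_{0};
    template <class T> friend class IntrusivePtr;  // counter access
};

template <class T>
class IntrusivePtr {
public:
    IntrusivePtr() = default;
    explicit IntrusivePtr(T* p) : p_(p) { acquire(); }
    IntrusivePtr(const IntrusivePtr& o) : p_(o.p_) { acquire(); }
    IntrusivePtr& operator=(IntrusivePtr o) { std::swap(p_, o.p_); return *this; }
    ~IntrusivePtr() { release(); }
    T* get() const { return p_; }
    T& operator*() const { return *p_; }
    T* operator->() const { return p_; }
private:
    void acquire() {
        RefCounted* base = p_;
        if (base) base->refs_.fetch_add(1, std::memory_order_relaxed);
    }
    void release() {
        RefCounted* base = p_;
        // The last owner deletes the object.
        if (base && base->refs_.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete p_;
    }
    T* p_ = nullptr;
};

struct Widget : RefCounted {
    int value = 7;
};

// Takes a plain reference, so the call itself touches no counter.
// Because the counter is part of the object, the callee can still
// create a new owner from the raw reference when it needs one.
IntrusivePtr<Widget> keep(Widget& w) {
    return IntrusivePtr<Widget>(&w);
}
```

The key point is that IntrusivePtr<Widget>(&w) finds the existing
counter through the object. With a non-intrusive pointer, wrapping a raw
pointer a second time would create a second, independent counter.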

> I have
> understood that mutex lock is next to harmless if there is no contention,
> does this change in multicore? NUMA?

Yes, this makes a significant change. With NUMA it is even more significant.

> In the VTune Amplifier I see many "Synchronization context switches"
> (about 1000 per second, 10 times more than "Preemption context
> switches"). Are these caused by mutex lockings?

Yes, likely. But maybe also by atomic increment/decrement instructions 
used by smart pointers.

> Are these numbers too large?

Some people say a context switch costs about 5,000 to 10,000
clock cycles. Doing that 1000 times a second creates a significant
overhead. This is only a rough estimate. Details depend on whether the
MMU must be reprogrammed (process switch), which CPU components have
been used since the last switch (FPU, AVX ...) and, of course, on the
operating system kernel.

> It also tells me that the CPI rate (cycles per instruction retired) is
> too high (1.691, ideal would be something like 0.25) and lists memory
> stalls as the first possible cause.

That could be your cache line hopping.


> Minimal logging level, no GUI, no progress indicators. Tasks are very
> large (maybe this causes problems instead?).

No, no problem with large tasks.

But if so, why do you write to a shared smart pointer all the time?


Marcel



