| Date | 2013-05-31 15:53 +0200 |
|---|---|
| From | Marcel Müller <news.5.maazl@spamgourmet.org> |
| Newsgroups | comp.programming.threads |
| Subject | Re: Thread scaling in Windows |
| References | <XnsA1CFD0BECDBADmyfirstnameosapriee@216.196.109.131> <51a7c499$0$6565$9b4e6d93@newsspool4.arcor-online.net> <XnsA1D1132254871myfirstnameosapriee@216.196.109.131> |
| Message-ID | <51a8ab52$0$6567$9b4e6d93@newsspool4.arcor-online.net> |
| Organization | Arcor |
On 31.05.13 00.52, Paavo Helde wrote:
>> Most likely you enter the mutexes very often during calculation. This is
>> bad practice. If you want to run in parallel, then run in parallel and
>> do not talk to each other over and over.
>
> Mutexes are indeed entered quite often, but most of the cases there
> should be no contention and the mutex is locked for a very short time,

That does not matter. Think of it this way: entering a mutex requires synchronizing all cores with memory. Even at the lowest level, one clock cycle at 1 GHz (which is very little nowadays) is only long enough to synchronize over a distance of about 10 cm, simply because of the propagation speed of electric fields (about 70% of the speed of light). That is enough to reach the memory chips, but no more. Switching times of the electronics reduce the speed further. So even at assembly level, e.g. with lock-free algorithms, synchronization (= memory barriers) is expensive nowadays.

> like updating a shared object reference count by a smartpointer.

This is really expensive! Not only because of the mutex. Writing to a shared memory location from different cores also invalidates the corresponding cache content of all cores except the one currently writing. So cache misses are very likely to follow. The problem is called cache line hopping, and you simply should not do that. Each thread should write to its own memory. Writes to shared memory should be rare. Reads from shared memory are fine as long as no one writes to it.

And note that memory locations close to each other are affected as well, because CPUs manage the cache memory in chunks (cache lines), typically 64 bytes on current x86 CPUs. So an array where each thread has its own slot is also bad practice, unless each slot is padded to a full cache line.

This is a place where smart pointers can really slow things down, even when the different threads only read the pointers and the referenced objects, simply because copying a smart pointer always writes to shared memory (the reference count).
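The remark above about neighbouring memory locations can be illustrated with a small sketch. It is not code from this thread: `PaddedCounter`, `count_in_parallel`, `kThreads` and the 64-byte line size are assumptions made for the example. Each worker writes only to its own slot, and each slot is aligned and padded so that no two slots share a cache line.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. Adjust for the target CPU.
constexpr std::size_t kCacheLine = 64;
// Hypothetical worker count, chosen only for the example.
constexpr unsigned kThreads = 4;

// One counter per thread, aligned and padded to a full cache line so that
// two threads never write to the same line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one slot per cache line");

// Each worker increments only its own slot. The slots are summed by the
// caller after all workers have been joined, so no locking is needed.
std::uint64_t count_in_parallel(std::uint64_t per_thread) {
    std::array<PaddedCounter, kThreads> slots{};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < kThreads; ++t) {
        // Every slot starts on a cache-line boundary.
        assert(reinterpret_cast<std::uintptr_t>(&slots[t]) % kCacheLine == 0);
        workers.emplace_back([&slots, t, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i)
                ++slots[t].value;
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& s : slots) total += s.value;
    return total;
}
```

Calling `count_in_parallel(1000000)` returns 4000000 with or without the `alignas`. The padding changes only where the slots live: without it, all four 8-byte counters would fit into a single line and the cores would keep invalidating each other's copy of it.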
Although some smart pointer implementations are lock-free (which is not that easy when you require strong thread safety), the overhead can be considerable if you pass them as function arguments all the time. To get around that I often use reference types as function arguments instead of smart pointers. This is safe as long as the caller has to hold a strong reference anyway. However, these references must not be put into a new smart pointer unless you have consistently chosen intrusive implementations, which do no additional allocation for the smart pointer's internal structure. So boost::shared_ptr is out.

This is the way I always go. I have never really used shared_ptr. Instead, my objects that are to be reference counted have a public base class with the (private) reference counter, which has the intrusive counter access as a friend. This pattern allows me to pass the value of a smart pointer as a raw pointer (or reference) to a function and to assign this raw pointer to a smart pointer again inside the function if necessary.

> I have
> understood that mutex lock is next to harmless if there is no contention,
> does this change in multicore? NUMA?

Yes, it makes a significant difference. With NUMA it is even more significant.

> In the VTune Amplifier I see many "Synchronization context switches"
> (about 1000 per second, 10 times more than "Preemption context
> switches"). Are these caused by mutex lockings?

Yes, likely. But maybe also by atomic increment/decrement instructions used by smart pointers.

> Are these numbers too large?

Some people say a context switch costs about 5,000 to 10,000 clock cycles. At 1000 switches per second that is 5 to 10 million cycles per second, and the cache misses after each switch come on top. This is only a rough estimate. The details depend on whether the MMU must be reprogrammed (process switch), on which CPU components have been used since the last switch (FPU, AVX ...) and, of course, on the operating system kernel.
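The intrusive counting pattern described earlier in this post could look roughly like the sketch below. All names (`RefCounted`, `IntrusivePtr`, `Node`, `read_value`, `keep`) are hypothetical and chosen for illustration. This is neither the poster's actual code nor boost::intrusive_ptr, only one possible shape of a public base class with a private counter and a befriended smart pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>

template <class T> class IntrusivePtr;

// Public base class with a private counter. Only IntrusivePtr may touch it.
class RefCounted {
    template <class T> friend class IntrusivePtr;
    mutable std::atomic<std::size_t> refs_{0};
protected:
    RefCounted() = default;
    RefCounted(const RefCounted&) noexcept {}  // a copy gets its own count
    RefCounted& operator=(const RefCounted&) noexcept { return *this; }
    ~RefCounted() = default;  // objects are deleted through T*, not the base
};

// T must derive publicly from RefCounted and be allocated with new.
template <class T>
class IntrusivePtr {
    T* p_ = nullptr;
    void acquire() noexcept {
        if (p_) p_->refs_.fetch_add(1, std::memory_order_relaxed);
    }
    void release() noexcept {
        if (p_ && p_->refs_.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete p_;
    }
public:
    IntrusivePtr() = default;
    // Adopting a raw pointer needs no allocation: the counter is in *p.
    explicit IntrusivePtr(T* p) noexcept : p_(p) { acquire(); }
    IntrusivePtr(const IntrusivePtr& o) noexcept : p_(o.p_) { acquire(); }
    IntrusivePtr(IntrusivePtr&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }
    IntrusivePtr& operator=(IntrusivePtr o) noexcept {
        std::swap(p_, o.p_);
        return *this;
    }
    ~IntrusivePtr() { release(); }

    T* get() const noexcept { return p_; }
    T& operator*() const noexcept { assert(p_ != nullptr); return *p_; }
    T* operator->() const noexcept { assert(p_ != nullptr); return p_; }
    std::size_t use_count() const noexcept {
        return p_ ? p_->refs_.load(std::memory_order_relaxed) : 0;
    }
};

// Example payload type.
struct Node : RefCounted {
    int value;
    explicit Node(int v) : value(v) {}
};

// Takes a plain reference: the counter is not written at all.
int read_value(const Node& n) { return n.value; }

// Takes a reference and turns it back into a strong reference when the
// callee decides it needs one: a single increment, no extra allocation.
IntrusivePtr<Node> keep(Node& n) { return IntrusivePtr<Node>(&n); }
```

With `IntrusivePtr<Node> a(new Node(7));`, a call to `read_value(*a)` leaves the count at 1, while `keep(*a)` raises it to 2 for as long as the returned pointer lives. The caller must hold its own strong reference for the duration of either call, as stated above.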
> It also tells me that the CPI rate (cycles per instruction retired) is
> too high (1.691, ideal would be something like 0.25) and lists memory
> stalls as the first possible cause.

That could be your cache line hopping.

> Minimal logging level, no GUI, no progress indicators. Tasks are very
> large (maybe this causes problems instead?).

No, large tasks are not a problem. But if the tasks are that large, why do you write to a shared smart pointer all the time?

Marcel