Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.c++ > #4373 > unrolled thread
| Started by | jl_post@hotmail.com |
|---|---|
| First post | 2011-04-26 09:58 -0700 |
| Last post | 2011-05-01 14:47 -0700 |
| Articles | 20 on this page of 33 — 9 participants |
Back to article view | Back to comp.lang.c++
Please disprove this Double-Checked Locking "fix" jl_post@hotmail.com - 2011-04-26 09:58 -0700
Re: Please disprove this Double-Checked Locking "fix" Leigh Johnston <leigh@i42.co.uk> - 2011-04-26 18:17 +0100
Re: Please disprove this Double-Checked Locking "fix" Pete Becker <pete@versatilecoding.com> - 2011-04-26 13:50 -0400
Re: Please disprove this Double-Checked Locking "fix" Scott Meyers <NeverRead@aristeia.com> - 2011-05-01 17:14 -0700
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-02 15:59 -0700
Re: Please disprove this Double-Checked Locking "fix" Pete Becker <pete@versatilecoding.com> - 2011-05-03 08:39 -0400
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-04-26 11:16 -0700
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-04-26 11:19 -0700
Re: Please disprove this Double-Checked Locking "fix" Pete Becker <pete@versatilecoding.com> - 2011-04-26 14:30 -0400
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-04-26 11:50 -0700
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-11 19:55 -0700
Re: Please disprove this Double-Checked Locking "fix" Gerhard Fiedler <gelists@gmail.com> - 2011-05-13 19:56 -0300
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-13 16:59 -0700
Re: Please disprove this Double-Checked Locking "fix" Gerhard Fiedler <gelists@gmail.com> - 2011-05-18 18:12 -0300
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-18 14:53 -0700
Re: Please disprove this Double-Checked Locking "fix" Gerhard Fiedler <gelists@gmail.com> - 2011-05-19 13:46 -0300
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-19 15:30 -0700
Re: Please disprove this Double-Checked Locking "fix" Gerhard Fiedler <gelists@gmail.com> - 2011-05-21 11:55 -0300
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-22 01:08 -0700
Re: Please disprove this Double-Checked Locking "fix" James Kanze <james.kanze@gmail.com> - 2011-04-30 15:54 -0700
Re: Please disprove this Double-Checked Locking "fix" Leigh Johnston <leigh@i42.co.uk> - 2011-05-01 21:49 +0100
Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-01 17:26 -0400
Re: Please disprove this Double-Checked Locking "fix" Leigh Johnston <leigh@i42.co.uk> - 2011-05-01 22:44 +0100
Re: Please disprove this Double-Checked Locking "fix" Leigh Johnston <leigh@i42.co.uk> - 2011-05-02 01:01 +0100
Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-01 22:04 -0400
Re: Please disprove this Double-Checked Locking "fix" "Chris M. Thomasson" <cristom@charter.net> - 2011-05-04 11:49 -0700
Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-06 00:16 -0400
Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-02 15:43 -0700
Re: Please disprove this Double-Checked Locking "fix" James Kanze <james.kanze@gmail.com> - 2011-05-01 14:53 -0700
Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-01 19:23 -0400
Re: Please disprove this Double-Checked Locking "fix" James Kanze <james.kanze@gmail.com> - 2011-05-02 09:02 -0700
Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-05 23:46 -0400
Re: Please disprove this Double-Checked Locking "fix" James Kanze <james.kanze@gmail.com> - 2011-05-01 14:47 -0700
Page 1 of 2 [1] 2 Next page →
| From | jl_post@hotmail.com |
|---|---|
| Date | 2011-04-26 09:58 -0700 |
| Subject | Please disprove this Double-Checked Locking "fix" |
| Message-ID | <78f3178b-efdc-4af5-8f84-7ff6fa995af7@e25g2000prf.googlegroups.com> |
Hi,
Recently I've been reading up on "Double-Checked Locking" in C++
and how it's often implemented imperfectly. The Article "C++ and the
Perils of Double-Checked Locking" by Scott Meyers and Andrei
Alexandrescu ( http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
) provides a good overview of how it's usually done and why it's often
inadequate.
I won't go into details here, but the article states how this code:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
pInstance = new Singleton;
}
}
return pInstance;
}
and its variants aren't completely thread safe.
Now, I've been thinking about how to make the code thread-safe, and
a few days ago I came up with a couple of solutions. As I implemented
them, I realized that these variants of mine also had problems.
One of the "solutions" I came up with was this:
// Assign the singleton object to a temp pointer:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
pInstance = temp; // assign temp to pInstance
}
}
return pInstance;
}
Now, this approach isn't new (and is in fact covered in Meyer's and
Alexandrescu's article). The immediate problem is that the compiler
can optimize away the temporary variable, so there's no guarantee that
pInstance won't get the memory before it is properly initialized.
So I thought about making sure pInstance was unset in the
constructor, like this:
Singleton::Singleton() {
// initializing member data
assert( pInstance == 0 );
}
This way, the assert() statement ensures that pInstance will stay NULL
until the constructor is finished.
BUT! I realized that this solution was flawed, because the
compiler can inline the constructor into the ::instance() method, like
this:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = operator new(sizeof(Singleton)); //
initialize to temp
new (temp) Singleton;
// initializing member data
assert( pInstance == 0 );
pInstance = temp; // assign temp to pInstance
}
}
return pInstance;
}
and further rearrange the lines like this:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = operator new(sizeof(Singleton)); //
initialize to temp
new (temp) Singleton;
assert( pInstance == 0 );
pInstance = temp; // assign temp to pInstance
// initializing member data
}
}
return pInstance;
}
(The compiler is allowed to do this, I think, because the compiler can
reorder code as long as it's not detectable in a single-threaded
program.)
So I realized pInstance still has the possibility of being assigned
memory before it is properly initialized.
Then I thought: What if I included a second lock to make sure that
the constuctor is finished before pInstance is set? So I tried this:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
secondLock.lock();
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
}
return pInstance;
}
Singleton::Singleton() {
secondLock.lock();
// initializing member data
secondLock.unlock();
}
But eventually I realized that the order of the locks is not
guaranteed, that pInstance could still get assigned before the
constructor gets entered!
So I shot down my own two solutions. However, I kept thinking of
variants, and I started to wonder: What if I combined the two
variants? In other words, I'd take advantage of both assert() and a
second lock, like this:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
secondLock.lock();
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
}
return pInstance;
}
Singleton::Singleton() {
secondLock.lock();
// initializing member data
assert( pInstance == 0 );
secondLock.unlock();
}
In this way, the second lock guarantees that pInstance is not assigned
to inside the constructor code, and the assert() statement ensures
that the constructor code executes before the code that assigns to
pInstance.
This looks good as far as I can tell, but I'm thinking it's too
good to be true.
Since I'm a bit skeptical about this last solution, could someone
poke holes in it? (I'm eager to see if I really did find a solid
solution, or if it's just another pipe dream.)
Thanks!
[toc] | [next] | [standalone]
| From | Leigh Johnston <leigh@i42.co.uk> |
|---|---|
| Date | 2011-04-26 18:17 +0100 |
| Message-ID | <0tGdnY61aIy7YyvQnZ2dnUVZ7o-dnZ2d@giganews.com> |
| In reply to | #4373 |
On 26/04/2011 17:58, jl_post@hotmail.com wrote:
> Hi,
>
> Recently I've been reading up on "Double-Checked Locking" in C++
> and how it's often implemented imperfectly. The Article "C++ and the
> Perils of Double-Checked Locking" by Scott Meyers and Andrei
> Alexandrescu ( http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> ) provides a good overview of how it's usually done and why it's often
> inadequate.
>
> I won't go into details here, but the article states how this code:
>
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> pInstance = new Singleton;
> }
> }
> return pInstance;
> }
>
> and its variants aren't completely thread safe.
>
> Now, I've been thinking about how to make the code thread-safe, and
> a few days ago I came up with a couple of solutions. As I implemented
> them, I realized that these variants of mine also had problems.
>
> One of the "solutions" I came up with was this:
>
> // Assign the singleton object to a temp pointer:
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> Singleton* temp = new Singleton; // initialize to temp
> pInstance = temp; // assign temp to pInstance
> }
> }
> return pInstance;
> }
>
> Now, this approach isn't new (and is in fact covered in Meyer's and
> Alexandrescu's article). The immediate problem is that the compiler
> can optimize away the temporary variable, so there's no guarantee that
> pInstance won't get the memory before it is properly initialized.
>
> So I thought about making sure pInstance was unset in the
> constructor, like this:
>
> Singleton::Singleton() {
> // initializing member data
> assert( pInstance == 0 );
> }
>
> This way, the assert() statement ensures that pInstance will stay NULL
> until the constructor is finished.
>
> BUT! I realized that this solution was flawed, because the
> compiler can inline the constructor into the ::instance() method, like
> this:
>
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> Singleton* temp = operator new(sizeof(Singleton)); //
> initialize to temp
> new (temp) Singleton;
> // initializing member data
> assert( pInstance == 0 );
> pInstance = temp; // assign temp to pInstance
> }
> }
> return pInstance;
> }
>
> and further rearrange the lines like this:
>
>
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> Singleton* temp = operator new(sizeof(Singleton)); //
> initialize to temp
> new (temp) Singleton;
> assert( pInstance == 0 );
> pInstance = temp; // assign temp to pInstance
> // initializing member data
> }
> }
> return pInstance;
> }
>
> (The compiler is allowed to do this, I think, because the compiler can
> reorder code as long as it's not detectable in a single-threaded
> program.)
>
> So I realized pInstance still has the possibility of being assigned
> memory before it is properly initialized.
>
> Then I thought: What if I included a second lock to make sure that
> the constuctor is finished before pInstance is set? So I tried this:
>
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> Singleton* temp = new Singleton; // initialize to temp
> secondLock.lock();
> pInstance = temp; // assign temp to pInstance
> secondLock.unlock();
> }
> }
> }
> return pInstance;
> }
>
> Singleton::Singleton() {
> secondLock.lock();
> // initializing member data
> secondLock.unlock();
> }
>
> But eventually I realized that the order of the locks is not
> guaranteed, that pInstance could still get assigned before the
> constructor gets entered!
>
> So I shot down my own two solutions. However, I kept thinking of
> variants, and I started to wonder: What if I combined the two
> variants? In other words, I'd take advantage of both assert() and a
> second lock, like this:
>
>
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> Singleton* temp = new Singleton; // initialize to temp
> secondLock.lock();
> pInstance = temp; // assign temp to pInstance
> secondLock.unlock();
> }
> }
> }
> return pInstance;
> }
>
> Singleton::Singleton() {
> secondLock.lock();
> // initializing member data
> assert( pInstance == 0 );
> secondLock.unlock();
> }
>
> In this way, the second lock guarantees that pInstance is not assigned
> to inside the constructor code, and the assert() statement ensures
> that the constructor code executes before the code that assigns to
> pInstance.
>
> This looks good as far as I can tell, but I'm thinking it's too
> good to be true.
>
> Since I'm a bit skeptical about this last solution, could someone
> poke holes in it? (I'm eager to see if I really did find a solid
> solution, or if it's just another pipe dream.)
>
A general solution to this problem requires the use of memory barriers
as that article by Meyers and Alexandrescu intimates; you need to
gaurantee that the write to the pInstance pointer occurs after the
initialization writes of the object it points to and I don't see how
your final "solution" achieves this.
Here is my singleton template FWIW:
template <typename T>
class singleton
{
public:
static T& instance()
{
T* ret = sInstancePtr;
memory_barrier_acquire_dependant();
if (ret == 0)
{
lock theLock(sLock);
static T sInstance;
memory_barrier_release();
sInstancePtr = &sInstance;
ret = sInstancePtr;
}
return *ret;
}
private:
static lockable sLock;
static T* sInstancePtr;
};
template <typename T>
lockable singleton<T>::sLock;
template <typename T>
T* singleton<T>::sInstancePtr;
/Leigh
[toc] | [prev] | [next] | [standalone]
| From | Pete Becker <pete@versatilecoding.com> |
|---|---|
| Date | 2011-04-26 13:50 -0400 |
| Message-ID | <2011042613500411162-pete@versatilecodingcom> |
| In reply to | #4373 |
On 2011-04-26 12:58:34 -0400, jl_post@hotmail.com said:
> Hi,
>
> Recently I've been reading up on "Double-Checked Locking" in C++
> and how it's often implemented imperfectly. The Article "C++ and the
> Perils of Double-Checked Locking" by Scott Meyers and Andrei
> Alexandrescu ( http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> ) provides a good overview of how it's usually done and why it's often
> inadequate.
>
> I won't go into details here, but the article states how this code:
>
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> pInstance = new Singleton;
> }
> }
> return pInstance;
> }
>
> and its variants aren't completely thread safe.
>
As Leigh said, the issue is not so much compiler optimizations (as
detailed in the snipped portion of this message) as memory visibility.
There are systems where writes that occur in one order in the compiled
code can become visible to other threads in a different order. (read
the previous sentence several times; it's counterintiutive, but true)
So even if you can persuade the compiler to generate exactly the code
that you want, a different thread may see a non-zero value of pinstance
before it sees the changes made to the thing that the new value of
pinstance points to. The C++0x solution is:
#include <atomic>
std::atomic<Singleton*> pinstance;
if (pinstance == 0) {
Lock lock;
if (pinstance == 0)
pinstance = new Singleton;
}
return pinstance;
Making pinstance an atomic variable ensures that all writes that
"happen before" the assignment to pinstance in one thread are seen
before the result of that assignment is seen in any other thread. In
essence, it provides a portable wrapper around the memory barrier
operations in Leigh's suggested solution. (Yes, if you care, his
solution uses acquire/release semantics, and the solution I've given
enforces sequential consistency, which is a stronger guarantee; if you
need to, you can make the atomic variable use acquire/release)
--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)
[toc] | [prev] | [next] | [standalone]
| From | Scott Meyers <NeverRead@aristeia.com> |
|---|---|
| Date | 2011-05-01 17:14 -0700 |
| Message-ID | <ipkt23$2sj$1@news.albasani.net> |
| In reply to | #4376 |
On 4/26/2011 10:50 AM, Pete Becker wrote:
> pinstance points to. The C++0x solution is:
>
> #include <atomic>
> std::atomic<Singleton*> pinstance;
>
> if (pinstance == 0) {
> Lock lock;
> if (pinstance == 0)
> pinstance = new Singleton;
> }
> return pinstance;
>
> Making pinstance an atomic variable ensures that all writes that "happen
> before" the assignment to pinstance in one thread are seen before the
> result of that assignment is seen in any other thread.
True, but that's not enough to make this code work. You also need to
know that assignment to pinstance can't take place until the Singleton
constructor has run to completion. You have that guarantee in C++0x,
too, so the code is correct, but it's not just the use of atomics that
gives you that correctness. You also need the additional guarantees
that C++0x offers regarding the behavior of new expressions (compared to
the weaker guarantees that C++03 gives you).
Scott
--
* C++ and Beyond: Meyers, Sutter, & Alexandrescu, Aug 7-10 in Banff
(http://cppandbeyond.com/)
[toc] | [prev] | [next] | [standalone]
| From | Joshua Maurice <joshuamaurice@gmail.com> |
|---|---|
| Date | 2011-05-02 15:59 -0700 |
| Message-ID | <428ddf1c-7074-4117-ae13-6d427cbe2ec4@34g2000pru.googlegroups.com> |
| In reply to | #4566 |
On May 1, 5:14 pm, Scott Meyers <NeverR...@aristeia.com> wrote:
> On 4/26/2011 10:50 AM, Pete Becker wrote:
>
> > pinstance points to. The C++0x solution is:
>
> > #include <atomic>
> > std::atomic<Singleton*> pinstance;
>
> > if (pinstance == 0) {
> > Lock lock;
> > if (pinstance == 0)
> > pinstance = new Singleton;
> > }
> > return pinstance;
>
> > Making pinstance an atomic variable ensures that all writes that "happen
> > before" the assignment to pinstance in one thread are seen before the
> > result of that assignment is seen in any other thread.
>
> True, but that's not enough to make this code work. You also need to
> know that assignment to pinstance can't take place until the Singleton
> constructor has run to completion. You have that guarantee in C++0x,
> too, so the code is correct, but it's not just the use of atomics that
> gives you that correctness. You also need the additional guarantees
> that C++0x offers regarding the behavior of new expressions (compared to
> the weaker guarantees that C++03 gives you).
I don't think I'd describe it that way.
Consider:
pinstance = new Singleton;
For a simple pointer in C++03, there are no threading guarantees
given, so of course it can implement it however it wants, which is
apparently frequently enough something like the following pseudo-code
pinstance = malloc(sizeof(Singleton));
new(pinstance) Singleton();
Even under POSIX, to detect the difference between the above and the
following pseudo-code
temp = malloc(sizeof(Singleton));
new(temp) Singleton();
pinstance = temp;
would require code that has a race condition, which means undefined
behavior, which means that the compiler can do whatever it wants.
In C++0x, the situation is unchanged for a simple pointer. There are
no guarantees specific to "new expressions" which change this. The
compiler is allowed to implement it in exactly the same way. If you
have some code that could detect this, then you still have a race
condition, and thus undefined behavior, and thus all bets are off, so
the compiler free to do as it wants.
What does change is when we throw std::atomic into the mix. When
pinstance is of type std::atomic, the following:
pinstance = new Singleton;
is equivalent to the following ala operator overloading pseudo-code
(forgive me for not knowing the specific function name offhand):
pinstance.set(new Singleton);
And C++0x does have very specific guarantees concerning std::atomic.
It doesn't matter that the argument is a new expression. It could be
any sort of expression or function call. There is no guarantee
specific to new expressions. There are specific guarantees relating to
std::atomic::set and std::atomic::get, and there are guarantees that
describe all expressions, but not new expressions in particular.
Perhaps you were talking about the guarantees specific to function
local static variables? C++0x does guarantee that their initialization
is properly synchronized so that the first access will initialize it,
and all other threads will wait until it's done, and all other threads
will properly see the initialized function local static variable.
However, Pete Becker's code does not use a function local static, and
even then the guarantees about function local statics are not specific
to new expressions. You could initialize one with a normal function
call or some other expression which isn't a new expression, and the
same guarantees apply.
[toc] | [prev] | [next] | [standalone]
| From | Pete Becker <pete@versatilecoding.com> |
|---|---|
| Date | 2011-05-03 08:39 -0400 |
| Message-ID | <2011050308393971813-pete@versatilecodingcom> |
| In reply to | #4596 |
On 2011-05-02 18:59:49 -0400, Joshua Maurice said: > > What does change is when we throw std::atomic into the mix. When > pinstance is of type std::atomic, the following: > pinstance = new Singleton; > is equivalent to the following ala operator overloading pseudo-code > (forgive me for not knowing the specific function name offhand): > pinstance.set(new Singleton); Let me say that more strongly. std::atomic<Singleton*> is a template instantiation. It has an assignment operator that takes an argument of type Singleton*. So in this code: std::atomic<Singleton*> pinstance; pinstance = new Singleton; the assignment is implemented as a function call. All of the side effects of evaluating the argument to the function call happen before the call to the function, so the memory barrier that the assignment establishes ensures that all of the side effects will be visible to other threads before the assignment is visible. That guarantee, though, depends on both the definition of the atomic template and the new guarantees that compilers can't reorder instructions across atomic operations. -- Pete Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The Standard C++ Library Extensions: a Tutorial and Reference (www.petebecker.com/tr1book)
[toc] | [prev] | [next] | [standalone]
| From | Joshua Maurice <joshuamaurice@gmail.com> |
|---|---|
| Date | 2011-04-26 11:16 -0700 |
| Message-ID | <d6774102-147b-4e3d-a457-a6bfd4c98a13@f15g2000pro.googlegroups.com> |
| In reply to | #4373 |
On Apr 26, 9:58 am, jl_p...@hotmail.com wrote:
> Hi,
>
> Recently I've been reading up on "Double-Checked Locking" in C++
> and how it's often implemented imperfectly. The Article "C++ and the
> Perils of Double-Checked Locking" by Scott Meyers and Andrei
> Alexandrescu (http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> ) provides a good overview of how it's usually done and why it's often
> inadequate.
[Snipped discussion of double checked locking]
> Since I'm a bit skeptical about this last solution, could someone
> poke holes in it? (I'm eager to see if I really did find a solid
> solution, or if it's just another pipe dream.)
I'm sorry for being so gruff in the following (not really), but it's
evident that you have not even fully read the paper which you cited,
the only paper, and that is highly irritating.
Look, follow this, and understand just how royally screwed your
approach is. Given the initial conditions:
int x = 0;
int y = 0;
Thread 1 executing:
x = 1;
y = 2;
And threads 2-5 executing the following all concurrently with threads
1-5:
cout << x << ' ' << y << endl;
On non-obscure hardware and compilers, that can print /on a single
execution/, all 4 combinations:
00
02
10
12
Repeat, /on a single execution/. Different threads can see some writes
done by different threads in different orders! Thread 2 could see the
write to x without seeing the write to y, and thread 3 can see the
write to y without seeing the write to x.
Technically, the standard guarantees even worse - undefined behavior,
but the above is an example that /really happens/ on /real/ hardware
and /real/ compilers. This is largely due to hardware "reorderings",
but let me quote the paper for the relevant bit:
[quote]
Nothing you do can alter the fundamental problem: you need to be able
to specify a constraint on instruction ordering, and your language
gives you no way to do it.
[/quote]
To emphasize, it might be the compiler reordering it, it might be the
hardware reordering it, and it could even be some new kind of thing
which hasn't been invented yet! In practice the compiler writers and
hardware makers give guarantees for C++03 which mean that your code is
fundamentally broken. There is no standard or given guarantee of any
kind that any code written without proper synchronization will work.
None. Your code, give it to any compiler writer, and they will say
"won't work", which means while it might incidentally work today, but
tomorrow they might put in a new optimization (software or hardware),
and your code breaks. **This is the fundamental problem which you
cannot dodge!!**
Here's how you can break your code. Note again the general problem
that there are no guarantees that could even make it work, so I want
you to focus on the above basic problem, and do not spend too much
time on this, but I present it completeness. For example:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
secondLock.lock();
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
return pInstance;
}
A sufficiently smart compiler is allowed to move things from before a
lock to after the lock, and from after an unlock to before a lock. It
can transform the above to:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
secondLock.lock();
Singleton* temp = new Singleton; // initialize to temp
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
return pInstance;
}
And once we get that, it's trivial to change it to:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
secondLock.lock();
pInstance = new Singleton;
secondLock.unlock();
}
}
return pInstance;
}
Which means we're back to screwed for the reasons known to you. I
didn't even need to resort to the wonderful DEC Alpha and its split
cache, but I could have.
You need to reread the paper which you cited. Here's the link again
for your benefit.
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
And pay attention this time.
[toc] | [prev] | [next] | [standalone]
| From | Joshua Maurice <joshuamaurice@gmail.com> |
|---|---|
| Date | 2011-04-26 11:19 -0700 |
| Message-ID | <26785e91-f1d2-43b2-b128-a521a190cfa6@r35g2000prj.googlegroups.com> |
| In reply to | #4378 |
On Apr 26, 11:16 am, Joshua Maurice <joshuamaur...@gmail.com> wrote: > A sufficiently smart compiler is allowed to move things from before a > lock to after the lock, and from after an unlock to before a lock. Ack, I meant: > A sufficiently smart compiler is allowed to move things from before a > lock to after the lock, and from after an unlock to before > **an unlock**. Hopefully it's clear from context. My mistake. My apologies.
[toc] | [prev] | [next] | [standalone]
| From | Pete Becker <pete@versatilecoding.com> |
|---|---|
| Date | 2011-04-26 14:30 -0400 |
| Message-ID | <2011042614300497308-pete@versatilecodingcom> |
| In reply to | #4378 |
On 2011-04-26 14:16:33 -0400, Joshua Maurice said: > This is largely due to hardware "reorderings", > but let me quote the paper for the relevant bit: > > [quote] > Nothing you do can alter the fundamental problem: you need to be able > to specify a constraint on instruction ordering, and your language > gives you no way to do it. > [/quote] > > To emphasize, it might be the compiler reordering it, it might be the > hardware reordering it, Let me underscore the problem, as the quotation above misses part of the issue. Even when the instructions are executed in exactly the order that you want them to be executed, different threads can see results in a different order from the order in which the thread doing the stores actually did them. When data is shared between threads and at least one thread is writing that data, all accesses to that data must be synchronized. C++0x says that the behavior of a program that does not synchronize such accesses is undefined. -- Pete Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The Standard C++ Library Extensions: a Tutorial and Reference (www.petebecker.com/tr1book)
[toc] | [prev] | [next] | [standalone]
| From | Joshua Maurice <joshuamaurice@gmail.com> |
|---|---|
| Date | 2011-04-26 11:50 -0700 |
| Message-ID | <8a01f4c1-d9de-40fa-b8d7-acb20e50847b@z27g2000prz.googlegroups.com> |
| In reply to | #4380 |
On Apr 26, 11:30 am, Pete Becker <p...@versatilecoding.com> wrote: > On 2011-04-26 14:16:33 -0400, Joshua Maurice said: > > > This is largely due to hardware "reorderings", > > but let me quote the paper for the relevant bit: > > > [quote] > > Nothing you do can alter the fundamental problem: you need to be able > > to specify a constraint on instruction ordering, and your language > > gives you no way to do it. > > [/quote] > > > To emphasize, it might be the compiler reordering it, it might be the > > hardware reordering it, > > Let me underscore the problem, as the quotation above misses part of > the issue. Even when the instructions are executed in exactly the order > that you want them to be executed, different threads can see results in > a different order from the order in which the thread doing the stores > actually did them. > > When data is shared between threads and at least one thread is writing > that data, all accesses to that data must be synchronized. > > C++0x says that the behavior of a program that does not synchronize > such accesses is undefined. I fully agree. I also stated that /very explicitly/ just before the snipped quote with my 1+4 thread example. I did intend to clearly state exactly that. At least I think I did. For future reference, was this unclear? Or did you merely mean to emphasize this point? That is a good point to emphasize. I'm just a little confused.
[toc] | [prev] | [next] | [standalone]
| From | Joshua Maurice <joshuamaurice@gmail.com> |
|---|---|
| Date | 2011-05-11 19:55 -0700 |
| Message-ID | <5b81d426-bc25-46b2-b97d-0ea29355dcb1@z13g2000prk.googlegroups.com> |
| In reply to | #4378 |
On Apr 26, 11:16 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> Here's how you can break your code. Note again the general problem
> that there are no guarantees that could even make it work, so I want
> you to focus on the above basic problem, and do not spend too much
> time on this, but I present it completeness. For example:
>
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> Singleton* temp = new Singleton; // initialize to temp
> secondLock.lock();
> pInstance = temp; // assign temp to pInstance
> secondLock.unlock();
> }
> }
> return pInstance;
> }
>
> A sufficiently smart compiler is allowed to move things from before a
> lock to after the lock, and from after an unlock to before a lock. It
> can transform the above to:
>
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> secondLock.lock();
> Singleton* temp = new Singleton; // initialize to temp
> pInstance = temp; // assign temp to pInstance
> secondLock.unlock();
> }
> }
> return pInstance;
> }
>
> And once we get that, it's trivial to change it to:
>
> Singleton* Singleton::instance() {
> if (pInstance == 0) {
> Lock lock;
> if (pInstance == 0) {
> secondLock.lock();
> pInstance = new Singleton;
> secondLock.unlock();
> }
> }
> return pInstance;
> }
>
> Which means we're back to screwed for the reasons known to you. I
> didn't even need to resort to the wonderful DEC Alpha and its split
> cache, but I could have.
>
> You need to reread the paper which you cited. Here's the link again
> for your benefit.http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> And pay attention this time.
I was looking over this while searching for interview questions, and I
realized I made a mistake. You cannot make the first reordering so
easily due to the lock nonsense inside the constructor. However, a
sufficiently smart compiler could notice your clever ruse, optimize
away the assert as always true, see a lock and unlock pair guarding
nothing, optimize that away, and then move the assignment to temp past
the mutex acquire, as demonstrated above.
Of course, as I tried to emphasize, and as I will emphasize again now,
that's not the only kind of thing which can break it. Cache coherency
problems can lead to quite bizarre situations, especially with a
processor with a weak memory model. Of course, the most important
detail is that you are not given guarantees that it will work. Thus
implementers, hardware and compiler and linker, and anything new which
hasn't been invented yet, are free to break your code, because it has
undefined behavior, because your program breaks the rules.
[toc] | [prev] | [next] | [standalone]
| From | Gerhard Fiedler <gelists@gmail.com> |
|---|---|
| Date | 2011-05-13 19:56 -0300 |
| Message-ID | <1bzq6ymjbt91c$.dlg@gelists.gmail.com> |
| In reply to | #4901 |
Joshua Maurice wrote: > However, a sufficiently smart compiler could notice your clever ruse, > optimize away the assert as always true, see a lock and unlock pair > guarding nothing, optimize that away, and then move the assignment to > temp past the mutex acquire, as demonstrated above. Regarding the compiler optimizing away a lock/unlock pair guarding "nothing": AIUI both lock and unlock need to provide certain fences. Therefore, again AIUI, they can't be optimized away by the compiler even if there's nothing in between, because that would remove the fences and alter the behavior. Am I missing something here? Gerhard
[toc] | [prev] | [next] | [standalone]
| From | Joshua Maurice <joshuamaurice@gmail.com> |
|---|---|
| Date | 2011-05-13 16:59 -0700 |
| Message-ID | <4676ab04-fd5a-4c87-acea-63a951daf1b4@r35g2000prj.googlegroups.com> |
| In reply to | #4951 |
On May 13, 3:56 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
> Joshua Maurice wrote:
> > However, a sufficiently smart compiler could notice your clever ruse,
> > optimize away the assert as always true, see a lock and unlock pair
> > guarding nothing, optimize that away, and then move the assignment to
> > temp past the mutex acquire, as demonstrated above.
>
> Regarding the compiler optimizing away a lock/unlock pair guarding
> "nothing": AIUI both lock and unlock need to provide certain fences.
> Therefore, again AIUI, they can't be optimized away by the compiler even
> if there's nothing in between, because that would remove the fences and
> alter the behavior.
>
> Am I missing something here?
That may be how they're commonly implemented, but that's not the
guaranteed semantics. Two different mutexes may as a matter of fact on
a given implementation give "happens-before" effects between the two
different mutexes, but there's nothing guaranteed about it.
I was ambiguously describing the situation, possibly to the extent of
being wrong. Allow me to correct myself.
Remember the code:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
secondLock.lock();
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
return pInstance;
}
Singleton::Singleton() {
secondLock.lock();
assert( pInstance == 0 );
secondLock.unlock();
}
The compiler can expand inline as follows (pseudo-code):
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = (Singleton*) ::operator
new(sizeof(Singleton));
secondLock.lock();
assert( pInstance == 0 );
secondLock.unlock();
secondLock.lock();
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
return pInstance;
}
With that, it sees:
someMutex.lock();
<blah1>
someMutex.unlock();
<blah2>
someMutex.lock();
<blah3>
someMutex.unlock();
The compiler sees a lock, unlock, lock, unlock, in straightline code,
without branching (or exceptions, or volatile (to keep signal handlers
correct)). The compiler is totally free to replace that with:
someMutex.lock();
<blah1>
<blah2>
<blah3>
someMutex.unlock();
It may be bad from a QoI to do that. It depends. It depends heavily on
the situation. I see it as quite reasonable that the compiler could
employ some heuristics to deduce when it's a savings to combine
critical sections. In this case, <blah1> and <blah2> are actually
empty, no-ops, so it's always a savings to combine the two adjacent
critical sections as such.
So, it can transform it to:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = (Singleton*) ::operator
new(sizeof(Singleton));
secondLock.lock();
assert( pInstance == 0 );
pInstance = temp; // assign temp to pInstance
secondLock.unlock();
}
}
return pInstance;
}
With some simple data flow analysis, and allowed motions past locks,
it can transform it to:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
secondLock.lock();
pInstance = (Singleton*) ::operator new(sizeof(Singleton));
secondLock.unlock();
}
}
return pInstance;
}
----
Now, back to my original much more controversial statement - can a
compiler simply remove a lock unlock pair? Ex:
mutex.lock();
mutex.unlock();
Maybe. I mentioned "clever ruse" with whole program optimization in
mind. (However, upon thinking about it, I just showed that you don't
even need whole program optimization.) Without whole program
optimization, I think no. Could someone please more educated weigh
in?
Thus far, after 10 minutes of attempts just now to write a conforming
race-free program where you could tell the difference if a compiler
simply removed an "empty" mutex acquire release pair, the only
programs I can find are ones that would deadlock before the change,
and not deadlock after the change. A deadlock is observable behavior,
so a compiler cannot remove it for that reason.
[toc] | [prev] | [next] | [standalone]
| From | Gerhard Fiedler <gelists@gmail.com> |
|---|---|
| Date | 2011-05-18 18:12 -0300 |
| Message-ID | <qqe5nc4lm7rr.dlg@gelists.gmail.com> |
| In reply to | #4952 |
Joshua Maurice wrote: > On May 13, 3:56 pm, Gerhard Fiedler <geli...@gmail.com> wrote: >> Joshua Maurice wrote: >>> However, a sufficiently smart compiler could notice your clever ruse, >>> optimize away the assert as always true, see a lock and unlock pair >>> guarding nothing, optimize that away, and then move the assignment to >>> temp past the mutex acquire, as demonstrated above. >> >> Regarding the compiler optimizing away a lock/unlock pair guarding >> "nothing": AIUI both lock and unlock need to provide certain fences. >> Therefore, again AIUI, they can't be optimized away by the compiler even >> if there's nothing in between, because that would remove the fences and >> alter the behavior. >> >> Am I missing something here? > > That may be how they're commonly implemented, but that's not the > guaranteed semantics. Two different mutexes may as a matter of fact on > a given implementation give "happens-before" effects between the two > different mutexes, but there's nothing guaranteed about it. > > ... > > With that, it sees: > someMutex.lock(); > <blah1> > someMutex.unlock(); > <blah2> > someMutex.lock(); > <blah3> > someMutex.unlock(); > > The compiler sees a lock, unlock, lock, unlock, in straightline code, > without branching (or exceptions, or volatile (to keep signal handlers > correct)). The compiler is totally free to replace that with: > someMutex.lock(); > <blah1> > <blah2> > <blah3> > someMutex.unlock(); This is exactly what I don't understand -- I guess I have to look deeper into the different mutex/lock implementations and their guarantees. I was under the impression that through the lock/unlock pairs, it is guaranteed that <blah1> happens before <blah2>, and that <blah3> happens after it. > Now, back to my original much more controversial statement - can a > compiler simply remove a lock unlock pair? Ex: > mutex.lock(); > mutex.unlock(); > Maybe. I mentioned "clever ruse" with whole program optimization in > mind. (However, upon thinking about it, I just showed that you don't > even need whole program optimization.) Without whole program > optimization, I think no. Could someone please more educated weigh > in? > > Thus far, after 10 minutes of attempts just now to write a conforming > race-free program where you could tell the difference if a compiler > simply removed an "empty" mutex acquire release pair, the only > programs I can find are ones that would deadlock before the change, > and not deadlock after the change. A deadlock is observable behavior, > so a compiler cannot remove it for that reason. Which would mean that if the compiler wants to remove it, it has to be able to prove that it can't deadlock. This would be a useful diagnostic output :) Thanks, Gerhard
[toc] | [prev] | [next] | [standalone]
| From | Joshua Maurice <joshuamaurice@gmail.com> |
|---|---|
| Date | 2011-05-18 14:53 -0700 |
| Message-ID | <97ebe8ae-0731-4947-9084-7b62d360e65a@z15g2000prn.googlegroups.com> |
| In reply to | #5144 |
On May 18, 2:12 pm, Gerhard Fiedler <geli...@gmail.com> wrote: > Joshua Maurice wrote: > > With that, it sees: > > someMutex.lock(); > > <blah1> > > someMutex.unlock(); > > <blah2> > > someMutex.lock(); > > <blah3> > > someMutex.unlock(); > > > The compiler sees a lock, unlock, lock, unlock, in straightline code, > > without branching (or exceptions, or volatile (to keep signal handlers > > correct)). The compiler is totally free to replace that with: > > someMutex.lock(); > > <blah1> > > <blah2> > > <blah3> > > someMutex.unlock(); > > This is exactly what I don't understand -- I guess I have to look deeper > into the different mutex/lock implementations and their guarantees. I > was under the impression that through the lock/unlock pairs, it is > guaranteed that <blah1> happens before <blah2>, and that <blah3> happens > after it. Unless I'm totally off, this is a quite straightforward transformation. It may be bad from a QoI perspective depending on the particulars of the code, and it may be a bad optimization depending on the particulars, but it is definitely allowed by the C++0x memory model (and Java 1.5+ memory model) AFAIK. You said that "I was under the impression that [...] it is guaranteed that <blah1> happens before <blah2>, [...]". Why is that? More particularly, what does that mean? Let me try to explain. Taking a step back, consider: int a = 1; int b = 2; global_c = a + b; global_d = a - b; We have no guarantees that the assignment to global_c happens before global_d. Why? This is equivalent to the phrasing that no conforming program could tell the difference if in fact the compiler "switched" the two. It's the basic "as-if" rule. Going back to: someMutex.lock(); <blah1> someMutex.unlock(); <blah2> someMutex.lock(); <blah3> someMutex.unlock(); Using the "as-if" rule, the compiler is totally free to transform that to the following, under some rather specific rules. someMutex.lock(); <blah1> <blah2> <blah3> someMutex.unlock(); It can only do so under some very specific situations. The compiler has to be able to guarantee that no control flow path goes out of that block except the normal flow past the end. (Also volatile and signal handlers might be annoying, so let's ignore that for now.) Now, think of a program that could possibly tell the difference between those two code fragments? AFAIK, you could not. Because of the vagueries of scheduling, no conforming program would have its behavior changed in a way that would produce results not allowed by the memory model, and thus the compiler is entirely free to do that under the "as-if" rule. To put that another way, the difference between the two programs is that another thread might acquire the mutex when the first thread is in <blah2>, and thus see <blah1> without being guaranteed to see <blah2> nor <blah3>. That is, it is allowed for the compiler and scheduler to do so. It is not required. It is perfectly allowed for a "malicious" scheduler and implementation to guarantee that it will / never/ allow another thread to acquire that mutex, forcing it to wait, while a thread is in <blah2>. Your conforming program could not tell the difference because you're not guaranteed that some other thread will be scheduled at just the right time to acquire the mutex when the first thread is in <blah2>. As I said earlier, there are somewhat severe restrictions on when a compiler could do this. If exceptions can be thrown, or early returns, and such, the compiler has to be very careful to guarantee the same "as-if" behavior. Also, gotos or loops could mess this up, especially badly in light of QoI. IIRC, there's some fluff in the C++0x standard that things must become visible to other threads "eventually", or even "in a reasonable amount of time", and probably (?) something similar along the lines that a thread waiting on a mutex will "eventually" acquire it, or even "in a reasonable amount of time". If you combine those two sections like that, you might have just broken the "eventually" or "in a reasonable amount of time" guarantees. Thus, as I mentioned earlier, as <blah1> and <blah2> were empty in the earlier example, none of that can happen - it's safe to combine the critical sections. And it's a clear win to combine the empty critical section with the non-empty one. After that, the compiler can apply another variation of the "as-if" rule to move the assignment to the local variable from outside the critical section to inside the critical section. Finally, it's free to use the normal single-thread rules to remove the unnecessarily local variable.
[toc] | [prev] | [next] | [standalone]
| From | Gerhard Fiedler <gelists@gmail.com> |
|---|---|
| Date | 2011-05-19 13:46 -0300 |
| Message-ID | <1lge2hm64clye$.dlg@gelists.gmail.com> |
| In reply to | #5148 |
Joshua Maurice wrote: > On May 18, 2:12 pm, Gerhard Fiedler <geli...@gmail.com> wrote: >> Joshua Maurice wrote: >>> With that, it sees: >>> someMutex.lock(); >>> <blah1> >>> someMutex.unlock(); >>> <blah2> >>> someMutex.lock(); >>> <blah3> >>> someMutex.unlock(); >> >>> The compiler sees a lock, unlock, lock, unlock, in straightline code, >>> without branching (or exceptions, or volatile (to keep signal handlers >>> correct)). The compiler is totally free to replace that with: >>> someMutex.lock(); >>> <blah1> >>> <blah2> >>> <blah3> >>> someMutex.unlock(); >> >> This is exactly what I don't understand -- I guess I have to look >> deeper into the different mutex/lock implementations and their >> guarantees. I was under the impression that through the lock/unlock >> pairs, it is guaranteed that <blah1> happens before <blah2>, and >> that <blah3> happens after it. > > Unless I'm totally off, this is a quite straightforward > transformation. It may be bad from a QoI perspective depending on the > particulars of the code, and it may be a bad optimization depending > on the particulars, but it is definitely allowed by the C++0x memory > model (and Java 1.5+ memory model) AFAIK. FWIW, I was thinking of C++03 with pthread (GCC) or Windows (VC++) threading models. > You said that "I was under the impression that [...] it is guaranteed > that <blah1> happens before <blah2>, [...]". Why is that? More > particularly, what does that mean? > > Let me try to explain. Taking a step back, consider: > int a = 1; > int b = 2; > global_c = a + b; > global_d = a - b; > > We have no guarantees that the assignment to global_c happens before > global_d. Why? This is equivalent to the phrasing that no conforming > program could tell the difference if in fact the compiler "switched" > the two. It's the basic "as-if" rule. Understood and agreed -- without locks. I was under the impression that locks change this, by adding "observable" points to the program. > Going back to: > someMutex.lock(); > <blah1> > someMutex.unlock(); > <blah2> > someMutex.lock(); > <blah3> > someMutex.unlock(); > > Using the "as-if" rule, the compiler is totally free to transform that > to the following, under some rather specific rules. > someMutex.lock(); > <blah1> > <blah2> > <blah3> > someMutex.unlock(); > > It can only do so under some very specific situations. The compiler > has to be able to guarantee that no control flow path goes out of that > block except the normal flow past the end. (Also volatile and signal > handlers might be annoying, so let's ignore that for now.) Now, think > of a program that could possibly tell the difference between those two > code fragments? AFAIK, you could not. How about the typical situation: short-running, locked synchronizing code with long-running, unlocked working code? blah1 (retrieving tasks out of a shared queue) and blah3 (placing results into a shared queue) each a few milliseconds, blah2 (calculating the results) a few hours or days, and a few threads doing this in parallel. I'm sure I can tell the difference between the two versions. > Because of the vagueries of scheduling, no conforming program would > have its behavior changed in a way that would produce results not > allowed by the memory model, and thus the compiler is entirely free > to do that under the "as-if" rule. I understand that this may be a QoI issue, but then again, if I can't guarantee that the two days that blah2 runs are in fact unlocked, I really can't use this system for this task. QoI of the compiler here becomes QoI of the application, and if the compiler is free to do this, it doesn't allow me to guarantee my application spec -- which I assume to be a typical threaded application. > To put that another way, the difference between the two programs is > that another thread might acquire the mutex when the first thread is > in <blah2>, and thus see <blah1> without being guaranteed to see > <blah2> nor <blah3>. That is, it is allowed for the compiler and > scheduler to do so. It is not required. In my example, which is nothing strange, it is required, and to implement it, I need to use a compiler with a threading/memory model that guarantees me that blah2 runs unlocked, so that it can run in parallel. This of course if the scheduler allows it to do so -- but since the compiler doesn't control the scheduler, it can't really assume much about it and IMO needs to unlock after blah1, even though it doesn't know what the scheduler does with this. Whether it makes sense to unlock and relock around blah2 /I/ need to decide when writing the program, considering the particularities of the scheduler I'm intending to run it on. > IIRC, there's some fluff in the C++0x standard that things must become > visible to other threads "eventually", or even "in a reasonable > amount of time", and probably (?) something similar along the lines > that a thread waiting on a mutex will "eventually" acquire it, or > even "in a reasonable amount of time". If you combine those two > sections like that, you might have just broken the "eventually" or > "in a reasonable amount of time" guarantees. This doesn't sound good :) This relates to my earlier comment that I think I need to look into the guarantees that the individual implementations provide -- if this here reflects the C++03 guarantees, IMO they alone are not very useful in practice. To go back to the example, I need to have the guarantee that blah2 runs unlocked (within the overall scheduling granularity of the target system, of course). Thanks, Gerhard
[toc] | [prev] | [next] | [standalone]
| From | Joshua Maurice <joshuamaurice@gmail.com> |
|---|---|
| Date | 2011-05-19 15:30 -0700 |
| Message-ID | <d8b0fd3c-59b6-470d-bcce-0f07070c118a@34g2000pru.googlegroups.com> |
| In reply to | #5167 |
On May 19, 9:46 am, Gerhard Fiedler <geli...@gmail.com> wrote: > Joshua Maurice wrote: > > Unless I'm totally off, this is a quite straightforward > > transformation. It may be bad from a QoI perspective depending on the > > particulars of the code, and it may be a bad optimization depending > > on the particulars, but it is definitely allowed by the C++0x memory > > model (and Java 1.5+ memory model) AFAIK. > > FWIW, I was thinking of C++03 with pthread (GCC) or Windows (VC++) > threading models. For most intents and purposes, the memory models of C++0x, POSIX pthreads, win32, and Java 1.5 and higher, are the same. There's some important differences between them (like Java "final" keyword, and the Java "out of thin air" guarantee), but for the basics like visibility guarantees w.r.t. mutexes, it's the same all around. > > You said that "I was under the impression that [...] it is guaranteed > > that <blah1> happens before <blah2>, [...]". Why is that? More > > particularly, what does that mean? > > > Let me try to explain. Taking a step back, consider: > > int a = 1; > > int b = 2; > > global_c = a + b; > > global_d = a - b; > > > We have no guarantees that the assignment to global_c happens before > > global_d. Why? This is equivalent to the phrasing that no conforming > > program could tell the difference if in fact the compiler "switched" > > the two. It's the basic "as-if" rule. > > Understood and agreed -- without locks. I was under the impression that > locks change this, by adding "observable" points to the program. With multi-threading, the program no longer has exactly one possible allowed execution. (In practice, I would guess most single threaded programs have more than one allowed execution given things like implementation defined behavior, and various QoI and system specs like stack size, heap space, and so on.) With multi-threading, this changes. No longer are you guaranteed a particular execution. Instead, there is a set of allowed executions. The set of allowed executions is quite annoying and complex to define formally, but you are right that locks restrict this set. Some executions are not allowed because they violate some of the ordering rules of the memory model, such as the ordering rules required by locks. However, you are still left with a rather large set of allowed executions, and a conforming implementation is free to choose to do whichever it wants. > > Going back to: > > someMutex.lock(); > > <blah1> > > someMutex.unlock(); > > <blah2> > > someMutex.lock(); > > <blah3> > > someMutex.unlock(); > > > Using the "as-if" rule, the compiler is totally free to transform that > > to the following, under some rather specific rules. > > someMutex.lock(); > > <blah1> > > <blah2> > > <blah3> > > someMutex.unlock(); > > > It can only do so under some very specific situations. The compiler > > has to be able to guarantee that no control flow path goes out of that > > block except the normal flow past the end. (Also volatile and signal > > handlers might be annoying, so let's ignore that for now.) Now, think > > of a program that could possibly tell the difference between those two > > code fragments? AFAIK, you could not. > > How about the typical situation: short-running, locked synchronizing > code with long-running, unlocked working code? blah1 (retrieving tasks > out of a shared queue) and blah3 (placing results into a shared queue) > each a few milliseconds, blah2 (calculating the results) a few hours or > days, and a few threads doing this in parallel. I'm sure I can tell the > difference between the two versions. I was using an idiomatic meaning of "tell the difference". Of course one could tell the difference in practice between if a compiler does tail-call recursion optimization and not, but a conforming program would not have its formal observable behavior changed. That's what I meant by "You couldn't tell the difference". I meant that a conforming program with or without tail call recursion would still have its formal observable behavior be one of the allowed behaviors. Similarly, if the compiler does something "bad" like combine critical sections when it shouldn't, you could tell the difference in practice, just like you could tell the difference in practice between tail-call recursion optimization and no tail-call recursion optimization. Whether the compiler does or does not do those things is QoI. The formal behavior with or without combining the critical sections is one of the allowed behaviors. > > Because of the vagueries of scheduling, no conforming program would > > have its behavior changed in a way that would produce results not > > allowed by the memory model, and thus the compiler is entirely free > > to do that under the "as-if" rule. > > I understand that this may be a QoI issue, but then again, if I can't > guarantee that the two days that blah2 runs are in fact unlocked, I > really can't use this system for this task. Then you have a QoI issue. By the same token, you would have QoI issues if the compiler otherwise did something else stupid. However, combining or not combining those sections results in a formally allowed observable behavior, and thus the compiler is allowed to do it. > QoI of the compiler here > becomes QoI of the application, and if the compiler is free to do this, > it doesn't allow me to guarantee my application spec -- which I assume > to be a typical threaded application. Then harass your compiler writer to fix it, or the scheduler writer to fix it, or use a different compiler or scheduler. There are so many ways that the implementation can be malicious. This is just one of them. > > To put that another way, the difference between the two programs is > > that another thread might acquire the mutex when the first thread is > > in <blah2>, and thus see <blah1> without being guaranteed to see > > <blah2> nor <blah3>. That is, it is allowed for the compiler and > > scheduler to do so. It is not required. > > In my example, which is nothing strange, it is required, and to > implement it, I need to use a compiler with a threading/memory model > that guarantees me that blah2 runs unlocked, so that it can run in > parallel. You also need a compiler that properly does data flow analysis optimizations, constant folding, constant propagation, common sub- expression elimination, and so on. You also need a compiler that doesn't simply produce a maliciously convoluted object file. It's basically impossible for any general purpose portable memory model to do as you want. How could you possibly specify that to work with different kinds of scheduler, unknown schedulers that aren't even written yet? It depends too much on the vagaries of the scheduler and other such things. I wouldn't worry about it. At least, I would worry about it as much as I would worry about stack sizes in programs. Most of the time you don't need to. Sometimes you do, and you need to check the outputted assembly. Also, if you did do that, completely disallow combining nearby critical sections, then you just removed some optimization opportunities for IMHO no particularly good reason. I want the compiler to be able to optimize the OP's problem by combining an empty critical section with a nearby non-empty one. > This of course if the scheduler allows it to do so -- but > since the compiler doesn't control the scheduler, it can't really assume > much about it and IMO needs to unlock after blah1, even though it > doesn't know what the scheduler does with this. Whether it makes sense > to unlock and relock around blah2 /I/ need to decide when writing the > program, considering the particularities of the scheduler I'm intending > to run it on. Again, perhaps it's bad QoI, and the compiler ought to fix it, but it would result in an allowed formal observable behavior, an allowed execution, and thus the compiler would be free to do it under the "as if" rule. > > IIRC, there's some fluff in the C++0x standard that things must become > > visible to other threads "eventually", or even "in a reasonable > > amount of time", and probably (?) something similar along the lines > > that a thread waiting on a mutex will "eventually" acquire it, or > > even "in a reasonable amount of time". If you combine those two > > sections like that, you might have just broken the "eventually" or > > "in a reasonable amount of time" guarantees. > > This doesn't sound good :) This relates to my earlier comment that I > think I need to look into the guarantees that the individual > implementations provide -- if this here reflects the C++03 guarantees, > IMO they alone are not very useful in practice. C++03 doesn't have any relevant guarantees. AFAIK, this here reflects all of the usual memory models (C++0x, POSIX pthreads, Java 1.5+, win32). > To go back to the > example, I need to have the guarantee that blah2 runs unlocked (within > the overall scheduling granularity of the target system, of course). As I said earlier, good luck attempting to write a portable specification to that end, and you don't really "need" it. You need it as much as you need more than 10 bytes of stack space, and you need it as much as you need a compiler that won't maliciously pessimize your programs. If you need more than that, you need something more than what a portable standard can give you, and you need to check / write the assembly yourself or maybe use some realtime POSIX extensions.
[toc] | [prev] | [next] | [standalone]
| From | Gerhard Fiedler <gelists@gmail.com> |
|---|---|
| Date | 2011-05-21 11:55 -0300 |
| Message-ID | <q108ekvnu6e0.dlg@gelists.gmail.com> |
| In reply to | #5190 |
Joshua Maurice wrote: > On May 19, 9:46 am, Gerhard Fiedler <geli...@gmail.com> wrote: >> To go back to the example, I need to have the guarantee that blah2 >> runs unlocked (within the overall scheduling granularity of the >> target system, of course). > > As I said earlier, good luck attempting to write a portable > specification to that end, and you don't really "need" it. You need > it as much as you need more than 10 bytes of stack space, and you > need it as much as you need a compiler that won't maliciously > pessimize your programs. If you need more than that, you need > something more than what a portable standard can give you, and you > need to check / write the assembly yourself or maybe use some > realtime POSIX extensions. Check out this document <http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.9908> about the pthread guarantees. I haven't yet read (and understood) all of it, but it seems to me that in proves that both lock and unlock require memory fences (albeit different ones), and that it is not allowed to reorder memory operations across locks (and only in a specific direction across unlocks). I interpret this such that a compiler (that claims to implement the pthread spec) is not allowed to remove a lock operation -- even if there's nothing between the lock and the following unlock, and even if it is to combine two lock/unlock pairs --, as this would remove a memory fence and allow operations to be reordered across a code position across which they weren't allowed to be reordered before removing the lock operation. Gerhard
[toc] | [prev] | [next] | [standalone]
| From | Joshua Maurice <joshuamaurice@gmail.com> |
|---|---|
| Date | 2011-05-22 01:08 -0700 |
| Message-ID | <b52e52b8-6388-4e02-8f1e-6afa919c372f@22g2000prx.googlegroups.com> |
| In reply to | #5272 |
On May 21, 7:55 am, Gerhard Fiedler <geli...@gmail.com> wrote: > Joshua Maurice wrote: > > On May 19, 9:46 am, Gerhard Fiedler <geli...@gmail.com> wrote: > >> To go back to the example, I need to have the guarantee that blah2 > >> runs unlocked (within the overall scheduling granularity of the > >> target system, of course). > > > As I said earlier, good luck attempting to write a portable > > specification to that end, and you don't really "need" it. You need > > it as much as you need more than 10 bytes of stack space, and you > > need it as much as you need a compiler that won't maliciously > > pessimize your programs. If you need more than that, you need > > something more than what a portable standard can give you, and you > > need to check / write the assembly yourself or maybe use some > > realtime POSIX extensions. > > Check out this document > <http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.9908> about > the pthread guarantees. I haven't yet read (and understood) all of it, > but it seems to me that in proves that both lock and unlock require > memory fences (albeit different ones), and that it is not allowed to > reorder memory operations across locks (and only in a specific direction > across unlocks). > > I interpret this such that a compiler (that claims to implement the > pthread spec) is not allowed to remove a lock operation -- even if > there's nothing between the lock and the following unlock, and even if > it is to combine two lock/unlock pairs --, as this would remove a memory > fence and allow operations to be reordered across a code position across > which they weren't allowed to be reordered before removing the lock > operation. At first I was slightly concerned that I didn't know what was going on, but then I found this in the paper: [quote] Finally, we show that the rules for reordering memory op- erations across locking primitives are in fact different from what we believe most implementors would have expected. In particular, a memory operation immediately following a pthread mutex unlock operation may always be moved to just before the pthread mutex unlock. But the corresponding re- ordering of a pthread mutex lock with a preceding memory op- eration is not generally safe. More general movement into critical sections is possible in the absence of the pthread mutex trylock call. As we point out in section 3, many current implementations ei- ther do not follow these rules, or add extra fences. Nor is it com- pletely apparent that they should, since these semantics appear to have been largely accidental. On the other hand, an understanding of these issues does seem to be essential to any revision of the cur- rent rules. [/quote] Ah yes, I knew about this. Reading further confirms that the problem is the one I'm thinking of. It's basically a bug in the POSIX pthreads standard. The POSIX pthreads standard as written requires that a try_lock succeed if no one else is holding the lock. This has some rather interesting implications which the paper lays out. The implications are that locks require double the "usually required" fences, or other contortions such as the paper describes (specifically disallow code motion past a lock). Also, as the paper notes, almost all of the pthreads implementers overlooked this bug, and if you write a conforming program that abuses try_lock in interesting ways, it'll produce an unallowed execution on nearly all pthreads implementations (colloquially "break"). C++0x fixes this problem by allowing try_lock to spuriously fail, that is fail to acquire the lock even though no one else holds it. (AFAIK), it is not the intent that implementers will make try_lock spuriously fail, but programs must assume that it can spuriously fail. This restricts the set of formally race free programs to ones that allow the intended implementation of locks, and allow code motion past a lock. Thus, C++0x avoids this nasty little problem. Note that their "Theorem 6.6" proves that you can indeed move operations past a lock /if/ there's no (broken) trylocks. The problems only arise with (broken) trylocks, which C++0x fixes. ---- However, that is immaterial to the problem at hand. Even if you couldn't do the code transformations that I was discussing due to the particulars of the situation, you seem to have a more fundamental problem. The fundamental problem is you don't believe in the "as if" rule. C++ spells it out quite clearly with "C++03, 1.9 Program execution / 1". In a C++0x draft n3126, it's still "1.9 Program execution / 1", reproduced here: [quote] 1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below. (5) [/quote] Footnote 5: [quote] 5) This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced. [/quote] Also, see C++0x draft n3126, "1.9 Program execution / 5": [quote] 5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation). [/quote] I am unable to find something as explicitly clear as that in the POSIX standards offhand, but I assume it's in there somewhere. At the very least, I can find plenty of occurrences of the phrase "as if" in the specs for the individual threading functions, such as in: http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html [quote] The implementation shall behave as if at all times there is at most one owner of any mutex. [/quote] Also, I'm pretty sure that's how all the C, C++, Java, etc., implementers interpret it. This is the general spirit of all of the C++-relevant memory models, including win32, pthreads, and C++0x. (This is also true for the Java memory model.) It's also true for C and C++ programs in all ways - not just threading. An implementation is free to run your program however it wants as long as it produces one of the allowed executions. It doesn't even need to be the same execution each time. For non-trivial threaded programs, it's highly likely that it will be a different execution each time. Again, I understand your concerns, but compilers, linkers, schedulers, hardware etc., will continue to optimize, and the rule that lets them optimize is the "as if" rule. They optimize to make your program run better. That's their job. Their job is to take your program, transform the assembly of it (or whatever code they work with) to make it better. Threads are no exception. Sometimes the optimization is broken and produces an unallowed execution, but that's different than what we're talking about here. We're talking about an optimization that produces an allowed execution, but the execution is undesirable or unacceptable to the programmer. Such is life writing real C and C++ programs. If your implementation ever does such a broken optimization, thus I suggest you complain to your implementers, switch implementations, or write the relevant parts in assembly (and hope that the problem is in the compiler, as the linker, scheduler, hardware, etc., can still get you).
[toc] | [prev] | [next] | [standalone]
| From | James Kanze <james.kanze@gmail.com> |
|---|---|
| Date | 2011-04-30 15:54 -0700 |
| Message-ID | <6b36bf8e-37b5-4e8c-bf34-a9711c3c4183@b19g2000yqg.googlegroups.com> |
| In reply to | #4373 |
On Apr 26, 5:58 pm, jl_p...@hotmail.com wrote:
> Recently I've been reading up on "Double-Checked Locking" in
> C++ and how it's often implemented imperfectly.
You mean, how it is impossible to implement in standard C++
(03).
> The Article "C++ and the Perils of Double-Checked Locking" by
> Scott Meyers and Andrei Alexandrescu
> (http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf)
> provides a good overview of how it's usually done and why it's
> often inadequate.
The standard solution for implementing a singleton works just
fine:
Singleton* Singleton::instance()
{
ScopedLock lock(mutex);
if ( pInstance == NULL )
pInstance = new Singleton;
return pInstance;
}
If the lock is a performance bottleneck (a case I've yet to
encounter), then you can simply insure that the singleton is
constructed before threads are started, and skip the lock.
Anything else is simply added complexity, for no reason.
--
James Kanze
[toc] | [prev] | [next] | [standalone]
Page 1 of 2 [1] 2 Next page →
Back to top | Article view | comp.lang.c++
csiph-web