Groups > comp.lang.c++ > #4373 > unrolled thread

Please disprove this Double-Checked Locking "fix"

Started by	jl_post@hotmail.com
First post	2011-04-26 09:58 -0700
Last post	2011-05-01 14:47 -0700
Articles	20 on this page of 33 — 9 participants

Back to article view | Back to comp.lang.c++

  Please disprove this Double-Checked Locking "fix" jl_post@hotmail.com - 2011-04-26 09:58 -0700
    Re: Please disprove this Double-Checked Locking "fix" Leigh Johnston <leigh@i42.co.uk> - 2011-04-26 18:17 +0100
    Re: Please disprove this Double-Checked Locking "fix" Pete Becker <pete@versatilecoding.com> - 2011-04-26 13:50 -0400
      Re: Please disprove this Double-Checked Locking "fix" Scott Meyers <NeverRead@aristeia.com> - 2011-05-01 17:14 -0700
        Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-02 15:59 -0700
          Re: Please disprove this Double-Checked Locking "fix" Pete Becker <pete@versatilecoding.com> - 2011-05-03 08:39 -0400
    Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-04-26 11:16 -0700
      Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-04-26 11:19 -0700
      Re: Please disprove this Double-Checked Locking "fix" Pete Becker <pete@versatilecoding.com> - 2011-04-26 14:30 -0400
        Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-04-26 11:50 -0700
      Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-11 19:55 -0700
        Re: Please disprove this Double-Checked Locking "fix" Gerhard Fiedler <gelists@gmail.com> - 2011-05-13 19:56 -0300
          Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-13 16:59 -0700
            Re: Please disprove this Double-Checked Locking "fix" Gerhard Fiedler <gelists@gmail.com> - 2011-05-18 18:12 -0300
              Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-18 14:53 -0700
                Re: Please disprove this Double-Checked Locking "fix" Gerhard Fiedler <gelists@gmail.com> - 2011-05-19 13:46 -0300
                  Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-19 15:30 -0700
                    Re: Please disprove this Double-Checked Locking "fix" Gerhard Fiedler <gelists@gmail.com> - 2011-05-21 11:55 -0300
                      Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-22 01:08 -0700
    Re: Please disprove this Double-Checked Locking "fix" James Kanze <james.kanze@gmail.com> - 2011-04-30 15:54 -0700
      Re: Please disprove this Double-Checked Locking "fix" Leigh Johnston <leigh@i42.co.uk> - 2011-05-01 21:49 +0100
        Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-01 17:26 -0400
          Re: Please disprove this Double-Checked Locking "fix" Leigh Johnston <leigh@i42.co.uk> - 2011-05-01 22:44 +0100
            Re: Please disprove this Double-Checked Locking "fix" Leigh Johnston <leigh@i42.co.uk> - 2011-05-02 01:01 +0100
              Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-01 22:04 -0400
              Re: Please disprove this Double-Checked Locking "fix" "Chris M. Thomasson" <cristom@charter.net> - 2011-05-04 11:49 -0700
                Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-06 00:16 -0400
            Re: Please disprove this Double-Checked Locking "fix" Joshua Maurice <joshuamaurice@gmail.com> - 2011-05-02 15:43 -0700
          Re: Please disprove this Double-Checked Locking "fix" James Kanze <james.kanze@gmail.com> - 2011-05-01 14:53 -0700
            Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-01 19:23 -0400
              Re: Please disprove this Double-Checked Locking "fix" James Kanze <james.kanze@gmail.com> - 2011-05-02 09:02 -0700
                Re: Please disprove this Double-Checked Locking "fix" Pavel <pauldontspamtolk@removeyourself.dontspam.yahoo> - 2011-05-05 23:46 -0400
        Re: Please disprove this Double-Checked Locking "fix" James Kanze <james.kanze@gmail.com> - 2011-05-01 14:47 -0700

Page 1 of 2 [1] 2 Next page →

#4373 — Please disprove this Double-Checked Locking "fix"

From	jl_post@hotmail.com
Date	2011-04-26 09:58 -0700
Subject	Please disprove this Double-Checked Locking "fix"
Message-ID	<78f3178b-efdc-4af5-8f84-7ff6fa995af7@e25g2000prf.googlegroups.com>

Hi,

   Recently I've been reading up on "Double-Checked Locking" in C++
and how it's often implemented imperfectly.  The Article "C++ and the
Perils of Double-Checked Locking" by Scott Meyers and Andrei
Alexandrescu ( http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
) provides a good overview of how it's usually done and why it's often
inadequate.

   I won't go into details here, but the article states how this code:

Singleton* Singleton::instance() {
  if (pInstance == 0) {
    Lock lock;
    if (pInstance == 0) {
      pInstance = new Singleton;
    }
  }
  return pInstance;
}

and its variants aren't completely thread safe.

   Now, I've been thinking about how to make the code thread-safe, and
a few days ago I came up with a couple of solutions.  As I implemented
them, I realized that these variants of mine also had problems.

   One of the "solutions" I came up with was this:

// Assign the singleton object to a temp pointer:
Singleton* Singleton::instance() {
  if (pInstance == 0) {
    Lock lock;
    if (pInstance == 0) {
      Singleton* temp = new Singleton;  // initialize to temp
      pInstance = temp;  // assign temp to pInstance
    }
  }
  return pInstance;
}

Now, this approach isn't new (and is in fact covered in Meyer's and
Alexandrescu's article).  The immediate problem is that the compiler
can optimize away the temporary variable, so there's no guarantee that
pInstance won't get the memory before it is properly initialized.

   So I thought about making sure pInstance was unset in the
constructor, like this:

Singleton::Singleton() {
  // initializing member data
  assert( pInstance == 0 );
}

This way, the assert() statement ensures that pInstance will stay NULL
until the constructor is finished.

   BUT!  I realized that this solution was flawed, because the
compiler can inline the constructor into the ::instance() method, like
this:

Singleton* Singleton::instance() {
  if (pInstance == 0) {
    Lock lock;
    if (pInstance == 0) {
      Singleton* temp = operator new(sizeof(Singleton));  //
initialize to temp
      new (temp) Singleton;
      // initializing member data
      assert( pInstance == 0 );
      pInstance = temp;  // assign temp to pInstance
    }
  }
  return pInstance;
}

and further rearrange the lines like this:


Singleton* Singleton::instance() {
  if (pInstance == 0) {
    Lock lock;
    if (pInstance == 0) {
      Singleton* temp = operator new(sizeof(Singleton));  //
initialize to temp
      new (temp) Singleton;
      assert( pInstance == 0 );
      pInstance = temp;  // assign temp to pInstance
      // initializing member data
    }
  }
  return pInstance;
}

(The compiler is allowed to do this, I think, because the compiler can
reorder code as long as it's not detectable in a single-threaded
program.)

So I realized pInstance still has the possibility of being assigned
memory before it is properly initialized.

   Then I thought:  What if I included a second lock to make sure that
the constuctor is finished before pInstance is set?  So I tried this:

Singleton* Singleton::instance() {
  if (pInstance == 0) {
    Lock lock;
    if (pInstance == 0) {
      Singleton* temp = new Singleton;  // initialize to temp
      secondLock.lock();
      pInstance = temp;  // assign temp to pInstance
      secondLock.unlock();
      }
    }
  }
  return pInstance;
}

Singleton::Singleton() {
  secondLock.lock();
  // initializing member data
  secondLock.unlock();
}

But eventually I realized that the order of the locks is not
guaranteed, that pInstance could still get assigned before the
constructor gets entered!

   So I shot down my own two solutions.  However, I kept thinking of
variants, and I started to wonder:  What if I combined the two
variants?  In other words, I'd take advantage of both assert() and a
second lock, like this:


Singleton* Singleton::instance() {
  if (pInstance == 0) {
    Lock lock;
    if (pInstance == 0) {
      Singleton* temp = new Singleton;  // initialize to temp
      secondLock.lock();
      pInstance = temp;  // assign temp to pInstance
      secondLock.unlock();
      }
    }
  }
  return pInstance;
}

Singleton::Singleton() {
  secondLock.lock();
  // initializing member data
  assert( pInstance == 0 );
  secondLock.unlock();
}

In this way, the second lock guarantees that pInstance is not assigned
to inside the constructor code, and the assert() statement ensures
that the constructor code executes before the code that assigns to
pInstance.

   This looks good as far as I can tell, but I'm thinking it's too
good to be true.

   Since I'm a bit skeptical about this last solution, could someone
poke holes in it?  (I'm eager to see if I really did find a solid
solution, or if it's just another pipe dream.)

   Thanks!

[toc] | [next] | [standalone]

#4374

From	Leigh Johnston <leigh@i42.co.uk>
Date	2011-04-26 18:17 +0100
Message-ID	<0tGdnY61aIy7YyvQnZ2dnUVZ7o-dnZ2d@giganews.com>
In reply to	#4373

On 26/04/2011 17:58, jl_post@hotmail.com wrote:
> Hi,
>
>     Recently I've been reading up on "Double-Checked Locking" in C++
> and how it's often implemented imperfectly.  The Article "C++ and the
> Perils of Double-Checked Locking" by Scott Meyers and Andrei
> Alexandrescu ( http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> ) provides a good overview of how it's usually done and why it's often
> inadequate.
>
>     I won't go into details here, but the article states how this code:
>
> Singleton* Singleton::instance() {
>    if (pInstance == 0) {
>      Lock lock;
>      if (pInstance == 0) {
>        pInstance = new Singleton;
>      }
>    }
>    return pInstance;
> }
>
> and its variants aren't completely thread safe.
>
>     Now, I've been thinking about how to make the code thread-safe, and
> a few days ago I came up with a couple of solutions.  As I implemented
> them, I realized that these variants of mine also had problems.
>
>     One of the "solutions" I came up with was this:
>
> // Assign the singleton object to a temp pointer:
> Singleton* Singleton::instance() {
>    if (pInstance == 0) {
>      Lock lock;
>      if (pInstance == 0) {
>        Singleton* temp = new Singleton;  // initialize to temp
>        pInstance = temp;  // assign temp to pInstance
>      }
>    }
>    return pInstance;
> }
>
> Now, this approach isn't new (and is in fact covered in Meyer's and
> Alexandrescu's article).  The immediate problem is that the compiler
> can optimize away the temporary variable, so there's no guarantee that
> pInstance won't get the memory before it is properly initialized.
>
>     So I thought about making sure pInstance was unset in the
> constructor, like this:
>
> Singleton::Singleton() {
>    // initializing member data
>    assert( pInstance == 0 );
> }
>
> This way, the assert() statement ensures that pInstance will stay NULL
> until the constructor is finished.
>
>     BUT!  I realized that this solution was flawed, because the
> compiler can inline the constructor into the ::instance() method, like
> this:
>
> Singleton* Singleton::instance() {
>    if (pInstance == 0) {
>      Lock lock;
>      if (pInstance == 0) {
>        Singleton* temp = operator new(sizeof(Singleton));  //
> initialize to temp
>        new (temp) Singleton;
>        // initializing member data
>        assert( pInstance == 0 );
>        pInstance = temp;  // assign temp to pInstance
>      }
>    }
>    return pInstance;
> }
>
> and further rearrange the lines like this:
>
>
> Singleton* Singleton::instance() {
>    if (pInstance == 0) {
>      Lock lock;
>      if (pInstance == 0) {
>        Singleton* temp = operator new(sizeof(Singleton));  //
> initialize to temp
>        new (temp) Singleton;
>        assert( pInstance == 0 );
>        pInstance = temp;  // assign temp to pInstance
>        // initializing member data
>      }
>    }
>    return pInstance;
> }
>
> (The compiler is allowed to do this, I think, because the compiler can
> reorder code as long as it's not detectable in a single-threaded
> program.)
>
> So I realized pInstance still has the possibility of being assigned
> memory before it is properly initialized.
>
>     Then I thought:  What if I included a second lock to make sure that
> the constuctor is finished before pInstance is set?  So I tried this:
>
> Singleton* Singleton::instance() {
>    if (pInstance == 0) {
>      Lock lock;
>      if (pInstance == 0) {
>        Singleton* temp = new Singleton;  // initialize to temp
>        secondLock.lock();
>        pInstance = temp;  // assign temp to pInstance
>        secondLock.unlock();
>        }
>      }
>    }
>    return pInstance;
> }
>
> Singleton::Singleton() {
>    secondLock.lock();
>    // initializing member data
>    secondLock.unlock();
> }
>
> But eventually I realized that the order of the locks is not
> guaranteed, that pInstance could still get assigned before the
> constructor gets entered!
>
>     So I shot down my own two solutions.  However, I kept thinking of
> variants, and I started to wonder:  What if I combined the two
> variants?  In other words, I'd take advantage of both assert() and a
> second lock, like this:
>
>
> Singleton* Singleton::instance() {
>    if (pInstance == 0) {
>      Lock lock;
>      if (pInstance == 0) {
>        Singleton* temp = new Singleton;  // initialize to temp
>        secondLock.lock();
>        pInstance = temp;  // assign temp to pInstance
>        secondLock.unlock();
>        }
>      }
>    }
>    return pInstance;
> }
>
> Singleton::Singleton() {
>    secondLock.lock();
>    // initializing member data
>    assert( pInstance == 0 );
>    secondLock.unlock();
> }
>
> In this way, the second lock guarantees that pInstance is not assigned
> to inside the constructor code, and the assert() statement ensures
> that the constructor code executes before the code that assigns to
> pInstance.
>
>     This looks good as far as I can tell, but I'm thinking it's too
> good to be true.
>
>     Since I'm a bit skeptical about this last solution, could someone
> poke holes in it?  (I'm eager to see if I really did find a solid
> solution, or if it's just another pipe dream.)
>

A general solution to this problem requires the use of memory barriers 
as that article by Meyers and Alexandrescu intimates; you need to 
gaurantee that the write to the pInstance pointer occurs after the 
initialization writes of the object it points to and I don't see how 
your final "solution" achieves this.

Here is my singleton template FWIW:

	template <typename T>
	class singleton
	{
	public:
		static T& instance()
		{
			T* ret = sInstancePtr;
			memory_barrier_acquire_dependant();
			if (ret == 0)
			{
				lock theLock(sLock);
				static T sInstance;
				memory_barrier_release();
				sInstancePtr = &sInstance;
				ret = sInstancePtr;
			}
			return *ret;
		}
	private:
		static lockable sLock;
		static T* sInstancePtr;
	};

	template <typename T>
	lockable singleton<T>::sLock;
	template <typename T>
	T* singleton<T>::sInstancePtr;


/Leigh

[toc] | [prev] | [next] | [standalone]

#4376

From	Pete Becker <pete@versatilecoding.com>
Date	2011-04-26 13:50 -0400
Message-ID	<2011042613500411162-pete@versatilecodingcom>
In reply to	#4373

On 2011-04-26 12:58:34 -0400, jl_post@hotmail.com said:

> Hi,
> 
>    Recently I've been reading up on "Double-Checked Locking" in C++
> and how it's often implemented imperfectly.  The Article "C++ and the
> Perils of Double-Checked Locking" by Scott Meyers and Andrei
> Alexandrescu ( http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> ) provides a good overview of how it's usually done and why it's often
> inadequate.
> 
>    I won't go into details here, but the article states how this code:
> 
> Singleton* Singleton::instance() {
>   if (pInstance == 0) {
>     Lock lock;
>     if (pInstance == 0) {
>       pInstance = new Singleton;
>     }
>   }
>   return pInstance;
> }
> 
> and its variants aren't completely thread safe.
> 

As Leigh said, the issue is not so much compiler optimizations (as 
detailed in the snipped portion of this message) as memory visibility. 
There are systems where writes that occur in one order in the compiled 
code can become visible to other threads in a different order. (read 
the previous sentence several times; it's counterintiutive, but true) 
So even if you can persuade the compiler to generate exactly the code 
that you want, a different thread may see a non-zero value of pinstance 
before it sees the changes made to the thing that the new value of 
pinstance points to. The C++0x solution is:

#include <atomic>
std::atomic<Singleton*> pinstance;

if (pinstance == 0) {
	Lock lock;
	if (pinstance == 0)
		pinstance = new Singleton;
}
return pinstance;

Making pinstance an atomic variable ensures that all writes that 
"happen before" the assignment to pinstance in one thread are seen 
before the result of that assignment is seen in any other thread. In 
essence, it provides a portable wrapper around the memory barrier 
operations in Leigh's suggested solution. (Yes, if you care, his 
solution uses acquire/release semantics, and the solution I've given 
enforces sequential consistency, which is a stronger guarantee; if you 
need to, you can make the atomic variable use acquire/release)

-- 
  Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The 
Standard C++ Library Extensions: a Tutorial and Reference 
(www.petebecker.com/tr1book)

[toc] | [prev] | [next] | [standalone]

#4566

From	Scott Meyers <NeverRead@aristeia.com>
Date	2011-05-01 17:14 -0700
Message-ID	<ipkt23$2sj$1@news.albasani.net>
In reply to	#4376

On 4/26/2011 10:50 AM, Pete Becker wrote:
> pinstance points to. The C++0x solution is:
>
> #include <atomic>
> std::atomic<Singleton*> pinstance;
>
> if (pinstance == 0) {
> Lock lock;
> if (pinstance == 0)
> pinstance = new Singleton;
> }
> return pinstance;
>
> Making pinstance an atomic variable ensures that all writes that "happen
> before" the assignment to pinstance in one thread are seen before the
> result of that assignment is seen in any other thread.

True, but that's not enough to make this code work.  You also need to 
know that assignment to pinstance can't take place until the Singleton 
constructor has run to completion.  You have that guarantee in C++0x, 
too, so the code is correct, but it's not just the use of atomics that 
gives you that correctness.  You also need the additional guarantees 
that C++0x offers regarding the behavior of new expressions (compared to 
the weaker guarantees that C++03 gives you).

Scott

-- 
* C++ and Beyond: Meyers, Sutter, & Alexandrescu, Aug 7-10 in Banff 
(http://cppandbeyond.com/)

[toc] | [prev] | [next] | [standalone]

#4596

From	Joshua Maurice <joshuamaurice@gmail.com>
Date	2011-05-02 15:59 -0700
Message-ID	<428ddf1c-7074-4117-ae13-6d427cbe2ec4@34g2000pru.googlegroups.com>
In reply to	#4566

On May 1, 5:14 pm, Scott Meyers <NeverR...@aristeia.com> wrote:
> On 4/26/2011 10:50 AM, Pete Becker wrote:
>
> > pinstance points to. The C++0x solution is:
>
> > #include <atomic>
> > std::atomic<Singleton*> pinstance;
>
> > if (pinstance == 0) {
> > Lock lock;
> > if (pinstance == 0)
> > pinstance = new Singleton;
> > }
> > return pinstance;
>
> > Making pinstance an atomic variable ensures that all writes that "happen
> > before" the assignment to pinstance in one thread are seen before the
> > result of that assignment is seen in any other thread.
>
> True, but that's not enough to make this code work.  You also need to
> know that assignment to pinstance can't take place until the Singleton
> constructor has run to completion.  You have that guarantee in C++0x,
> too, so the code is correct, but it's not just the use of atomics that
> gives you that correctness.  You also need the additional guarantees
> that C++0x offers regarding the behavior of new expressions (compared to
> the weaker guarantees that C++03 gives you).

I don't think I'd describe it that way.

Consider:
  pinstance = new Singleton;

For a simple pointer in C++03, there are no threading guarantees
given, so of course it can implement it however it wants, which is
apparently frequently enough something like the following pseudo-code
  pinstance = malloc(sizeof(Singleton));
  new(pinstance) Singleton();
Even under POSIX, to detect the difference between the above and the
following pseudo-code
  temp = malloc(sizeof(Singleton));
  new(temp) Singleton();
  pinstance = temp;
would require code that has a race condition, which means undefined
behavior, which means that the compiler can do whatever it wants.

In C++0x, the situation is unchanged for a simple pointer. There are
no guarantees specific to "new expressions" which change this. The
compiler is allowed to implement it in exactly the same way. If you
have some code that could detect this, then you still have a race
condition, and thus undefined behavior, and thus all bets are off, so
the compiler free to do as it wants.

What does change is when we throw std::atomic into the mix. When
pinstance is of type std::atomic, the following:
  pinstance = new Singleton;
is equivalent to the following ala operator overloading pseudo-code
(forgive me for not knowing the specific function name offhand):
  pinstance.set(new Singleton);
And C++0x does have very specific guarantees concerning std::atomic.
It doesn't matter that the argument is a new expression. It could be
any sort of expression or function call. There is no guarantee
specific to new expressions. There are specific guarantees relating to
std::atomic::set and std::atomic::get, and there are guarantees that
describe all expressions, but not new expressions in particular.

Perhaps you were talking about the guarantees specific to function
local static variables? C++0x does guarantee that their initialization
is properly synchronized so that the first access will initialize it,
and all other threads will wait until it's done, and all other threads
will properly see the initialized function local static variable.
However, Pete Becker's code does not use a function local static, and
even then the guarantees about function local statics are not specific
to new expressions. You could initialize one with a normal function
call or some other expression which isn't a new expression, and the
same guarantees apply.

[toc] | [prev] | [next] | [standalone]

#4614

From	Pete Becker <pete@versatilecoding.com>
Date	2011-05-03 08:39 -0400
Message-ID	<2011050308393971813-pete@versatilecodingcom>
In reply to	#4596

On 2011-05-02 18:59:49 -0400, Joshua Maurice said:

> 
> What does change is when we throw std::atomic into the mix. When
> pinstance is of type std::atomic, the following:
>   pinstance = new Singleton;
> is equivalent to the following ala operator overloading pseudo-code
> (forgive me for not knowing the specific function name offhand):
>   pinstance.set(new Singleton);

Let me say that more strongly. std::atomic<Singleton*> is a template 
instantiation. It has an assignment operator that takes an argument of 
type Singleton*. So in this code:

std::atomic<Singleton*> pinstance;
pinstance = new Singleton;

the assignment is implemented as a function call. All of the side 
effects of evaluating the argument to the function call happen before 
the call to the function, so the memory barrier that the assignment 
establishes ensures that all of the side effects will be visible to 
other threads before the assignment is visible.

That guarantee, though, depends on both the definition of the atomic 
template and the new guarantees that compilers can't reorder 
instructions across atomic operations.

-- 
  Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The 
Standard C++ Library Extensions: a Tutorial and Reference 
(www.petebecker.com/tr1book)

[toc] | [prev] | [next] | [standalone]

#4378

From	Joshua Maurice <joshuamaurice@gmail.com>
Date	2011-04-26 11:16 -0700
Message-ID	<d6774102-147b-4e3d-a457-a6bfd4c98a13@f15g2000pro.googlegroups.com>
In reply to	#4373

On Apr 26, 9:58 am, jl_p...@hotmail.com wrote:
> Hi,
>
>    Recently I've been reading up on "Double-Checked Locking" in C++
> and how it's often implemented imperfectly.  The Article "C++ and the
> Perils of Double-Checked Locking" by Scott Meyers and Andrei
> Alexandrescu (http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> ) provides a good overview of how it's usually done and why it's often
> inadequate.
[Snipped discussion of double checked locking]
>    Since I'm a bit skeptical about this last solution, could someone
> poke holes in it?  (I'm eager to see if I really did find a solid
> solution, or if it's just another pipe dream.)

I'm sorry for being so gruff in the following (not really), but it's
evident that you have not even fully read the paper which you cited,
the only paper, and that is highly irritating.

Look, follow this, and understand just how royally screwed your
approach is. Given the initial conditions:
  int x = 0;
  int y = 0;
Thread 1 executing:
  x = 1;
  y = 2;
And threads 2-5 executing the following all concurrently with threads
1-5:
  cout << x << ' ' << y << endl;
On non-obscure hardware and compilers, that can print /on a single
execution/, all 4 combinations:
  00
  02
  10
  12
Repeat, /on a single execution/. Different threads can see some writes
done by different threads in different orders! Thread 2 could see the
write to x without seeing the write to y, and thread 3 can see the
write to y without seeing the write to x.

Technically, the standard guarantees even worse - undefined behavior,
but the above is an example that /really happens/ on /real/ hardware
and /real/ compilers. This is largely due to hardware "reorderings",
but let me quote the paper for the relevant bit:

[quote]
Nothing you do can alter the fundamental problem: you need to be able
to specify a constraint on instruction ordering, and your language
gives you no way to do it.
[/quote]

To emphasize, it might be the compiler reordering it, it might be the
hardware reordering it, and it could even be some new kind of thing
which hasn't been invented yet! In practice the compiler writers and
hardware makers give guarantees for C++03 which mean that your code is
fundamentally broken. There is no standard or given guarantee of any
kind that any code written without proper synchronization will work.
None. Your code, give it to any compiler writer, and they will say
"won't work", which means while it might incidentally work today, but
tomorrow they might put in a new optimization (software or hardware),
and your code breaks. **This is the fundamental problem which you
cannot dodge!!**

Here's how you can break your code. Note again the general problem
that there are no guarantees that could even make it work, so I want
you to focus on the above basic problem, and do not spend too much
time on this, but I present it completeness. For example:

    Singleton* Singleton::instance() {
      if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
          Singleton* temp = new Singleton;  // initialize to temp
          secondLock.lock();
          pInstance = temp;  // assign temp to pInstance
          secondLock.unlock();
        }
      }
      return pInstance;
    }

A sufficiently smart compiler is allowed to move things from before a
lock to after the lock, and from after an unlock to before a lock. It
can transform the above to:

    Singleton* Singleton::instance() {
      if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
          secondLock.lock();
          Singleton* temp = new Singleton;  // initialize to temp
          pInstance = temp;  // assign temp to pInstance
          secondLock.unlock();
        }
      }
      return pInstance;
    }

And once we get that, it's trivial to change it to:

    Singleton* Singleton::instance() {
      if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
          secondLock.lock();
          pInstance = new Singleton;
          secondLock.unlock();
        }
      }
      return pInstance;
    }

Which means we're back to screwed for the reasons known to you. I
didn't even need to resort to the wonderful DEC Alpha and its split
cache, but I could have.

You need to reread the paper which you cited. Here's the link again
for your benefit.
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
And pay attention this time.

[toc] | [prev] | [next] | [standalone]

#4379

From	Joshua Maurice <joshuamaurice@gmail.com>
Date	2011-04-26 11:19 -0700
Message-ID	<26785e91-f1d2-43b2-b128-a521a190cfa6@r35g2000prj.googlegroups.com>
In reply to	#4378

On Apr 26, 11:16 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> A sufficiently smart compiler is allowed to move things from before a
> lock to after the lock, and from after an unlock to before a lock.

Ack, I meant:

> A sufficiently smart compiler is allowed to move things from before a
> lock to after the lock, and from after an unlock to before
> **an unlock**.

Hopefully it's clear from context. My mistake. My apologies.

[toc] | [prev] | [next] | [standalone]

#4380

From	Pete Becker <pete@versatilecoding.com>
Date	2011-04-26 14:30 -0400
Message-ID	<2011042614300497308-pete@versatilecodingcom>
In reply to	#4378

On 2011-04-26 14:16:33 -0400, Joshua Maurice said:

> This is largely due to hardware "reorderings",
> but let me quote the paper for the relevant bit:
> 
> [quote]
> Nothing you do can alter the fundamental problem: you need to be able
> to specify a constraint on instruction ordering, and your language
> gives you no way to do it.
> [/quote]
> 
> To emphasize, it might be the compiler reordering it, it might be the
> hardware reordering it,

Let me underscore the problem, as the quotation above misses part of 
the issue. Even when the instructions are executed in exactly the order 
that you want them to be executed, different threads can see results in 
a different order from the order in which the thread doing the stores 
actually did them.

When data is shared between threads and at least one thread is writing 
that data, all accesses to that data must be synchronized.

C++0x says that the behavior of a program that does not synchronize 
such accesses is undefined.

-- 
  Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The 
Standard C++ Library Extensions: a Tutorial and Reference 
(www.petebecker.com/tr1book)

[toc] | [prev] | [next] | [standalone]

#4383

From	Joshua Maurice <joshuamaurice@gmail.com>
Date	2011-04-26 11:50 -0700
Message-ID	<8a01f4c1-d9de-40fa-b8d7-acb20e50847b@z27g2000prz.googlegroups.com>
In reply to	#4380

On Apr 26, 11:30 am, Pete Becker <p...@versatilecoding.com> wrote:
> On 2011-04-26 14:16:33 -0400, Joshua Maurice said:
>
> > This is largely due to hardware "reorderings",
> > but let me quote the paper for the relevant bit:
>
> > [quote]
> > Nothing you do can alter the fundamental problem: you need to be able
> > to specify a constraint on instruction ordering, and your language
> > gives you no way to do it.
> > [/quote]
>
> > To emphasize, it might be the compiler reordering it, it might be the
> > hardware reordering it,
>
> Let me underscore the problem, as the quotation above misses part of
> the issue. Even when the instructions are executed in exactly the order
> that you want them to be executed, different threads can see results in
> a different order from the order in which the thread doing the stores
> actually did them.
>
> When data is shared between threads and at least one thread is writing
> that data, all accesses to that data must be synchronized.
>
> C++0x says that the behavior of a program that does not synchronize
> such accesses is undefined.

I fully agree. I also stated that /very explicitly/ just before the
snipped quote with my 1+4 thread example. I did intend to clearly
state exactly that. At least I think I did. For future reference, was
this unclear? Or did you merely mean to emphasize this point? That is
a good point to emphasize.

I'm just a little confused.

[toc] | [prev] | [next] | [standalone]

#4901

From	Joshua Maurice <joshuamaurice@gmail.com>
Date	2011-05-11 19:55 -0700
Message-ID	<5b81d426-bc25-46b2-b97d-0ea29355dcb1@z13g2000prk.googlegroups.com>
In reply to	#4378

On Apr 26, 11:16 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> Here's how you can break your code. Note again the general problem
> that there are no guarantees that could even make it work, so I want
> you to focus on the above basic problem, and do not spend too much
> time on this, but I present it completeness. For example:
>
>     Singleton* Singleton::instance() {
>       if (pInstance == 0) {
>         Lock lock;
>         if (pInstance == 0) {
>           Singleton* temp = new Singleton;  // initialize to temp
>           secondLock.lock();
>           pInstance = temp;  // assign temp to pInstance
>           secondLock.unlock();
>         }
>       }
>       return pInstance;
>     }
>
> A sufficiently smart compiler is allowed to move things from before a
> lock to after the lock, and from after an unlock to before a lock. It
> can transform the above to:
>
>     Singleton* Singleton::instance() {
>       if (pInstance == 0) {
>         Lock lock;
>         if (pInstance == 0) {
>           secondLock.lock();
>           Singleton* temp = new Singleton;  // initialize to temp
>           pInstance = temp;  // assign temp to pInstance
>           secondLock.unlock();
>         }
>       }
>       return pInstance;
>     }
>
> And once we get that, it's trivial to change it to:
>
>     Singleton* Singleton::instance() {
>       if (pInstance == 0) {
>         Lock lock;
>         if (pInstance == 0) {
>           secondLock.lock();
>           pInstance = new Singleton;
>           secondLock.unlock();
>         }
>       }
>       return pInstance;
>     }
>
> Which means we're back to screwed for the reasons known to you. I
> didn't even need to resort to the wonderful DEC Alpha and its split
> cache, but I could have.
>
> You need to reread the paper which you cited. Here's the link again
> for your benefit.http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
> And pay attention this time.

I was looking over this while searching for interview questions, and I
realized I made a mistake. You cannot make the first reordering so
easily due to the lock nonsense inside the constructor. However, a
sufficiently smart compiler could notice your clever ruse, optimize
away the assert as always true, see a lock and unlock pair guarding
nothing, optimize that away, and then move the assignment to temp past
the mutex acquire, as demonstrated above.

Of course, as I tried to emphasize, and as I will emphasize again now,
that's not the only kind of thing which can break it. Cache coherency
problems can lead to quite bizarre situations, especially with a
processor with a weak memory model. Of course, the most important
detail is that you are not given guarantees that it will work. Thus
implementers, hardware and compiler and linker, and anything new which
hasn't been invented yet, are free to break your code, because it has
undefined behavior, because your program breaks the rules.

[toc] | [prev] | [next] | [standalone]

#4951

From	Gerhard Fiedler <gelists@gmail.com>
Date	2011-05-13 19:56 -0300
Message-ID	<1bzq6ymjbt91c$.dlg@gelists.gmail.com>
In reply to	#4901

Joshua Maurice wrote:

> However, a sufficiently smart compiler could notice your clever ruse,
> optimize away the assert as always true, see a lock and unlock pair
> guarding nothing, optimize that away, and then move the assignment to
> temp past the mutex acquire, as demonstrated above.

Regarding the compiler optimizing away a lock/unlock pair guarding
"nothing": AIUI both lock and unlock need to provide certain fences.
Therefore, again AIUI, they can't be optimized away by the compiler even
if there's nothing in between, because that would remove the fences and
alter the behavior. 

Am I missing something here?

Gerhard

[toc] | [prev] | [next] | [standalone]

#4952

From	Joshua Maurice <joshuamaurice@gmail.com>
Date	2011-05-13 16:59 -0700
Message-ID	<4676ab04-fd5a-4c87-acea-63a951daf1b4@r35g2000prj.googlegroups.com>
In reply to	#4951

On May 13, 3:56 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
> Joshua Maurice wrote:
> > However, a sufficiently smart compiler could notice your clever ruse,
> > optimize away the assert as always true, see a lock and unlock pair
> > guarding nothing, optimize that away, and then move the assignment to
> > temp past the mutex acquire, as demonstrated above.
>
> Regarding the compiler optimizing away a lock/unlock pair guarding
> "nothing": AIUI both lock and unlock need to provide certain fences.
> Therefore, again AIUI, they can't be optimized away by the compiler even
> if there's nothing in between, because that would remove the fences and
> alter the behavior.
>
> Am I missing something here?

That may be how they're commonly implemented, but that's not the
guaranteed semantics. Two different mutexes may as a matter of fact on
a given implementation give "happens-before" effects between the two
different mutexes, but there's nothing guaranteed about it.

I was ambiguously describing the situation, possibly to the extent of
being wrong. Allow me to correct myself.

Remember the code:

  Singleton* Singleton::instance() {
    if (pInstance == 0) {
      Lock lock;
      if (pInstance == 0) {
        Singleton* temp = new Singleton;  // initialize to temp
        secondLock.lock();
        pInstance = temp;  // assign temp to pInstance
        secondLock.unlock();
      }
    }
    return pInstance;
  }

  Singleton::Singleton() {
    secondLock.lock();
    assert( pInstance == 0 );
    secondLock.unlock();
  }

The compiler can expand inline as follows (pseudo-code):

  Singleton* Singleton::instance() {
    if (pInstance == 0) {
      Lock lock;
      if (pInstance == 0) {
        Singleton* temp = (Singleton*) ::operator
new(sizeof(Singleton));

        secondLock.lock();
        assert( pInstance == 0 );
        secondLock.unlock();

        secondLock.lock();
        pInstance = temp;  // assign temp to pInstance
        secondLock.unlock();
      }
    }
    return pInstance;
  }

With that, it sees:
  someMutex.lock();
  <blah1>
  someMutex.unlock();
  <blah2>
  someMutex.lock();
  <blah3>
  someMutex.unlock();

The compiler sees a lock, unlock, lock, unlock, in straightline code,
without branching (or exceptions, or volatile (to keep signal handlers
correct)). The compiler is totally free to replace that with:
  someMutex.lock();
  <blah1>
  <blah2>
  <blah3>
  someMutex.unlock();

It may be bad from a QoI to do that. It depends. It depends heavily on
the situation. I see it as quite reasonable that the compiler could
employ some heuristics to deduce when it's a savings to combine
critical sections. In this case, <blah1> and <blah2> are actually
empty, no-ops, so it's always a savings to combine the two adjacent
critical sections as such.

So, it can transform it to:

  Singleton* Singleton::instance() {
    if (pInstance == 0) {
      Lock lock;
      if (pInstance == 0) {
        Singleton* temp = (Singleton*) ::operator
new(sizeof(Singleton));

        secondLock.lock();
        assert( pInstance == 0 );
        pInstance = temp;  // assign temp to pInstance
        secondLock.unlock();
      }
    }
    return pInstance;
  }

With some simple data flow analysis, and allowed motions past locks,
it can transform it to:
  Singleton* Singleton::instance() {
    if (pInstance == 0) {
      Lock lock;
      if (pInstance == 0) {
        secondLock.lock();
        pInstance = (Singleton*) ::operator new(sizeof(Singleton));
        secondLock.unlock();
      }
    }
    return pInstance;
  }

----

Now, back to my original much more controversial statement - can a
compiler simply remove a lock unlock pair? Ex:
  mutex.lock();
  mutex.unlock();
Maybe. I mentioned "clever ruse" with whole program optimization in
mind. (However, upon thinking about it, I just showed that you don't
even need whole program optimization.) Without whole program
optimization, I think no. Could someone please more educated weigh
in?

Thus far, after 10 minutes of attempts just now to write a conforming
race-free program where you could tell the difference if a compiler
simply removed an "empty" mutex acquire release pair, the only
programs I can find are ones that would deadlock before the change,
and not deadlock after the change. A deadlock is observable behavior,
so a compiler cannot remove it for that reason.

[toc] | [prev] | [next] | [standalone]

#5144

From	Gerhard Fiedler <gelists@gmail.com>
Date	2011-05-18 18:12 -0300
Message-ID	<qqe5nc4lm7rr.dlg@gelists.gmail.com>
In reply to	#4952

Joshua Maurice wrote:

> On May 13, 3:56 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
>> Joshua Maurice wrote:
>>> However, a sufficiently smart compiler could notice your clever ruse,
>>> optimize away the assert as always true, see a lock and unlock pair
>>> guarding nothing, optimize that away, and then move the assignment to
>>> temp past the mutex acquire, as demonstrated above.
>>
>> Regarding the compiler optimizing away a lock/unlock pair guarding
>> "nothing": AIUI both lock and unlock need to provide certain fences.
>> Therefore, again AIUI, they can't be optimized away by the compiler even
>> if there's nothing in between, because that would remove the fences and
>> alter the behavior.
>>
>> Am I missing something here?
> 
> That may be how they're commonly implemented, but that's not the
> guaranteed semantics. Two different mutexes may as a matter of fact on
> a given implementation give "happens-before" effects between the two
> different mutexes, but there's nothing guaranteed about it.
> 
> ...
> 
> With that, it sees:
>   someMutex.lock();
>   <blah1>
>   someMutex.unlock();
>   <blah2>
>   someMutex.lock();
>   <blah3>
>   someMutex.unlock();
> 
> The compiler sees a lock, unlock, lock, unlock, in straightline code,
> without branching (or exceptions, or volatile (to keep signal handlers
> correct)). The compiler is totally free to replace that with:
>   someMutex.lock();
>   <blah1>
>   <blah2>
>   <blah3>
>   someMutex.unlock();

This is exactly what I don't understand -- I guess I have to look deeper
into the different mutex/lock implementations and their guarantees. I
was under the impression that through the lock/unlock pairs, it is
guaranteed that <blah1> happens before <blah2>, and that <blah3> happens
after it. 

> Now, back to my original much more controversial statement - can a
> compiler simply remove a lock unlock pair? Ex:
>   mutex.lock();
>   mutex.unlock();
> Maybe. I mentioned "clever ruse" with whole program optimization in
> mind. (However, upon thinking about it, I just showed that you don't
> even need whole program optimization.) Without whole program
> optimization, I think no. Could someone please more educated weigh
> in?
> 
> Thus far, after 10 minutes of attempts just now to write a conforming
> race-free program where you could tell the difference if a compiler
> simply removed an "empty" mutex acquire release pair, the only
> programs I can find are ones that would deadlock before the change,
> and not deadlock after the change. A deadlock is observable behavior,
> so a compiler cannot remove it for that reason.

Which would mean that if the compiler wants to remove it, it has to be
able to prove that it can't deadlock. This would be a useful diagnostic
output :)

Thanks,
Gerhard

[toc] | [prev] | [next] | [standalone]

#5148

From	Joshua Maurice <joshuamaurice@gmail.com>
Date	2011-05-18 14:53 -0700
Message-ID	<97ebe8ae-0731-4947-9084-7b62d360e65a@z15g2000prn.googlegroups.com>
In reply to	#5144

On May 18, 2:12 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
> Joshua Maurice wrote:
> > With that, it sees:
> >   someMutex.lock();
> >   <blah1>
> >   someMutex.unlock();
> >   <blah2>
> >   someMutex.lock();
> >   <blah3>
> >   someMutex.unlock();
>
> > The compiler sees a lock, unlock, lock, unlock, in straightline code,
> > without branching (or exceptions, or volatile (to keep signal handlers
> > correct)). The compiler is totally free to replace that with:
> >   someMutex.lock();
> >   <blah1>
> >   <blah2>
> >   <blah3>
> >   someMutex.unlock();
>
> This is exactly what I don't understand -- I guess I have to look deeper
> into the different mutex/lock implementations and their guarantees. I
> was under the impression that through the lock/unlock pairs, it is
> guaranteed that <blah1> happens before <blah2>, and that <blah3> happens
> after it.

Unless I'm totally off, this is a quite straightforward
transformation. It may be bad from a QoI perspective depending on the
particulars of the code, and it may be a bad optimization depending on
the particulars, but it is definitely allowed by the C++0x memory
model (and Java 1.5+ memory model) AFAIK.

You said that "I was under the impression that [...] it is guaranteed
that <blah1> happens before <blah2>, [...]". Why is that? More
particularly, what does that mean?

Let me try to explain. Taking a step back, consider:
  int a = 1;
  int b = 2;
  global_c = a + b;
  global_d = a - b;

We have no guarantees that the assignment to global_c happens before
global_d. Why? This is equivalent to the phrasing that no conforming
program could tell the difference if in fact the compiler "switched"
the two. It's the basic "as-if" rule.

Going back to:
  someMutex.lock();
  <blah1>
  someMutex.unlock();
  <blah2>
  someMutex.lock();
  <blah3>
  someMutex.unlock();

Using the "as-if" rule, the compiler is totally free to transform that
to the following, under some rather specific rules.
  someMutex.lock();
  <blah1>
  <blah2>
  <blah3>
  someMutex.unlock();

It can only do so under some very specific situations. The compiler
has to be able to guarantee that no control flow path goes out of that
block except the normal flow past the end. (Also volatile and signal
handlers might be annoying, so let's ignore that for now.) Now, think
of a program that could possibly tell the difference between those two
code fragments? AFAIK, you could not. Because of the vagueries of
scheduling, no conforming program would have its behavior changed in a
way that would produce results not allowed by the memory model, and
thus the compiler is entirely free to do that under the "as-if" rule.

To put that another way, the difference between the two programs is
that another thread might acquire the mutex when the first thread is
in <blah2>, and thus see <blah1> without being guaranteed to see
<blah2> nor <blah3>. That is, it is allowed for the compiler and
scheduler to do so. It is not required. It is perfectly allowed for a
"malicious" scheduler and implementation to guarantee that it will /
never/ allow another thread to acquire that mutex, forcing it to wait,
while a thread is in <blah2>. Your conforming program could not tell
the difference because you're not guaranteed that some other thread
will be scheduled at just the right time to acquire the mutex when the
first thread is in <blah2>.

As I said earlier, there are somewhat severe restrictions on when a
compiler could do this. If exceptions can be thrown, or early returns,
and such, the compiler has to be very careful to guarantee the same
"as-if" behavior. Also, gotos or loops could mess this up, especially
badly in light of QoI. IIRC, there's some fluff in the C++0x standard
that things must become visible to other threads "eventually", or even
"in a reasonable amount of time", and probably (?) something similar
along the lines that a thread waiting on a mutex will "eventually"
acquire it, or even "in a reasonable amount of time". If you combine
those two sections like that, you might have just broken the
"eventually" or "in a reasonable amount of time" guarantees.

Thus, as I mentioned earlier, as <blah1> and <blah2> were empty in the
earlier example, none of that can happen - it's safe to combine the
critical sections. And it's a clear win to combine the empty critical
section with the non-empty one. After that, the compiler can apply
another variation of the "as-if" rule to move the assignment to the
local variable from outside the critical section to inside the
critical section. Finally, it's free to use the normal single-thread
rules to remove the unnecessarily local variable.

[toc] | [prev] | [next] | [standalone]

#5167

From	Gerhard Fiedler <gelists@gmail.com>
Date	2011-05-19 13:46 -0300
Message-ID	<1lge2hm64clye$.dlg@gelists.gmail.com>
In reply to	#5148

Joshua Maurice wrote:

> On May 18, 2:12 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
>> Joshua Maurice wrote:
>>> With that, it sees:
>>>   someMutex.lock();
>>>   <blah1>
>>>   someMutex.unlock();
>>>   <blah2>
>>>   someMutex.lock();
>>>   <blah3>
>>>   someMutex.unlock();
>>
>>> The compiler sees a lock, unlock, lock, unlock, in straightline code,
>>> without branching (or exceptions, or volatile (to keep signal handlers
>>> correct)). The compiler is totally free to replace that with:
>>>   someMutex.lock();
>>>   <blah1>
>>>   <blah2>
>>>   <blah3>
>>>   someMutex.unlock();
>> 
>> This is exactly what I don't understand -- I guess I have to look
>> deeper into the different mutex/lock implementations and their
>> guarantees. I was under the impression that through the lock/unlock
>> pairs, it is guaranteed that <blah1> happens before <blah2>, and
>> that <blah3> happens after it.
> 
> Unless I'm totally off, this is a quite straightforward
> transformation. It may be bad from a QoI perspective depending on the
> particulars of the code, and it may be a bad optimization depending
> on the particulars, but it is definitely allowed by the C++0x memory
> model (and Java 1.5+ memory model) AFAIK.

FWIW, I was thinking of C++03 with pthread (GCC) or Windows (VC++)
threading models.

> You said that "I was under the impression that [...] it is guaranteed
> that <blah1> happens before <blah2>, [...]". Why is that? More
> particularly, what does that mean?
> 
> Let me try to explain. Taking a step back, consider:
>   int a = 1;
>   int b = 2;
>   global_c = a + b;
>   global_d = a - b;
> 
> We have no guarantees that the assignment to global_c happens before
> global_d. Why? This is equivalent to the phrasing that no conforming
> program could tell the difference if in fact the compiler "switched"
> the two. It's the basic "as-if" rule.

Understood and agreed -- without locks. I was under the impression that
locks change this, by adding "observable" points to the program.

> Going back to:
>   someMutex.lock();
>   <blah1>
>   someMutex.unlock();
>   <blah2>
>   someMutex.lock();
>   <blah3>
>   someMutex.unlock();
> 
> Using the "as-if" rule, the compiler is totally free to transform that
> to the following, under some rather specific rules.
>   someMutex.lock();
>   <blah1>
>   <blah2>
>   <blah3>
>   someMutex.unlock();
> 
> It can only do so under some very specific situations. The compiler
> has to be able to guarantee that no control flow path goes out of that
> block except the normal flow past the end. (Also volatile and signal
> handlers might be annoying, so let's ignore that for now.) Now, think
> of a program that could possibly tell the difference between those two
> code fragments? AFAIK, you could not. 

How about the typical situation: short-running, locked synchronizing
code with long-running, unlocked working code? blah1 (retrieving tasks
out of a shared queue) and blah3 (placing results into a shared queue)
each a few milliseconds, blah2 (calculating the results) a few hours or
days, and a few threads doing this in parallel. I'm sure I can tell the
difference between the two versions. 

> Because of the vagueries of scheduling, no conforming program would
> have its behavior changed in a way that would produce results not
> allowed by the memory model, and thus the compiler is entirely free
> to do that under the "as-if" rule.

I understand that this may be a QoI issue, but then again, if I can't
guarantee that the two days that blah2 runs are in fact unlocked, I
really can't use this system for this task. QoI of the compiler here
becomes QoI of the application, and if the compiler is free to do this,
it doesn't allow me to guarantee my application spec -- which I assume
to be a typical threaded application.

> To put that another way, the difference between the two programs is
> that another thread might acquire the mutex when the first thread is
> in <blah2>, and thus see <blah1> without being guaranteed to see
> <blah2> nor <blah3>. That is, it is allowed for the compiler and
> scheduler to do so. It is not required. 

In my example, which is nothing strange, it is required, and to
implement it, I need to use a compiler with a threading/memory model
that guarantees me that blah2 runs unlocked, so that it can run in
parallel. This of course if the scheduler allows it to do so -- but
since the compiler doesn't control the scheduler, it can't really assume
much about it and IMO needs to unlock after blah1, even though it
doesn't know what the scheduler does with this. Whether it makes sense
to unlock and relock around blah2 /I/ need to decide when writing the
program, considering the particularities of the scheduler I'm intending
to run it on.

> IIRC, there's some fluff in the C++0x standard that things must become
> visible to other threads "eventually", or even "in a reasonable
> amount of time", and probably (?) something similar along the lines
> that a thread waiting on a mutex will "eventually" acquire it, or
> even "in a reasonable amount of time". If you combine those two
> sections like that, you might have just broken the "eventually" or
> "in a reasonable amount of time" guarantees.

This doesn't sound good :)  This relates to my earlier comment that I
think I need to look into the guarantees that the individual
implementations provide -- if this here reflects the C++03 guarantees,
IMO they alone are not very useful in practice. To go back to the
example, I need to have the guarantee that blah2 runs unlocked (within
the overall scheduling granularity of the target system, of course).

Thanks,
Gerhard

[toc] | [prev] | [next] | [standalone]

#5190

From	Joshua Maurice <joshuamaurice@gmail.com>
Date	2011-05-19 15:30 -0700
Message-ID	<d8b0fd3c-59b6-470d-bcce-0f07070c118a@34g2000pru.googlegroups.com>
In reply to	#5167

On May 19, 9:46 am, Gerhard Fiedler <geli...@gmail.com> wrote:
> Joshua Maurice wrote:
> > Unless I'm totally off, this is a quite straightforward
> > transformation. It may be bad from a QoI perspective depending on the
> > particulars of the code, and it may be a bad optimization depending
> > on the particulars, but it is definitely allowed by the C++0x memory
> > model (and Java 1.5+ memory model) AFAIK.
>
> FWIW, I was thinking of C++03 with pthread (GCC) or Windows (VC++)
> threading models.

For most intents and purposes, the memory models of C++0x, POSIX
pthreads, win32, and Java 1.5 and higher, are the same. There's some
important differences between them (like Java "final" keyword, and the
Java "out of thin air" guarantee), but for the basics like visibility
guarantees w.r.t. mutexes, it's the same all around.

> > You said that "I was under the impression that [...] it is guaranteed
> > that <blah1> happens before <blah2>, [...]". Why is that? More
> > particularly, what does that mean?
>
> > Let me try to explain. Taking a step back, consider:
> >   int a = 1;
> >   int b = 2;
> >   global_c = a + b;
> >   global_d = a - b;
>
> > We have no guarantees that the assignment to global_c happens before
> > global_d. Why? This is equivalent to the phrasing that no conforming
> > program could tell the difference if in fact the compiler "switched"
> > the two. It's the basic "as-if" rule.
>
> Understood and agreed -- without locks. I was under the impression that
> locks change this, by adding "observable" points to the program.

With multi-threading, the program no longer has exactly one possible
allowed execution. (In practice, I would guess most single threaded
programs have more than one allowed execution given things like
implementation defined behavior, and various QoI and system specs like
stack size, heap space, and so on.)

With multi-threading, this changes. No longer are you guaranteed a
particular execution. Instead, there is a set of allowed executions.
The set of allowed executions is quite annoying and complex to define
formally, but you are right that locks restrict this set. Some
executions are not allowed because they violate some of the ordering
rules of the memory model, such as the ordering rules required by
locks. However, you are still left with a rather large set of allowed
executions, and a conforming implementation is free to choose to do
whichever it wants.

> > Going back to:
> >   someMutex.lock();
> >   <blah1>
> >   someMutex.unlock();
> >   <blah2>
> >   someMutex.lock();
> >   <blah3>
> >   someMutex.unlock();
>
> > Using the "as-if" rule, the compiler is totally free to transform that
> > to the following, under some rather specific rules.
> >   someMutex.lock();
> >   <blah1>
> >   <blah2>
> >   <blah3>
> >   someMutex.unlock();
>
> > It can only do so under some very specific situations. The compiler
> > has to be able to guarantee that no control flow path goes out of that
> > block except the normal flow past the end. (Also volatile and signal
> > handlers might be annoying, so let's ignore that for now.) Now, think
> > of a program that could possibly tell the difference between those two
> > code fragments? AFAIK, you could not.
>
> How about the typical situation: short-running, locked synchronizing
> code with long-running, unlocked working code? blah1 (retrieving tasks
> out of a shared queue) and blah3 (placing results into a shared queue)
> each a few milliseconds, blah2 (calculating the results) a few hours or
> days, and a few threads doing this in parallel. I'm sure I can tell the
> difference between the two versions.

I was using an idiomatic meaning of "tell the difference". Of course
one could tell the difference in practice between if a compiler does
tail-call recursion optimization and not, but a conforming program
would not have its formal observable behavior changed. That's what I
meant by "You couldn't tell the difference". I meant that a conforming
program with or without tail call recursion would still have its
formal observable behavior be one of the allowed behaviors.

Similarly, if the compiler does something "bad" like combine critical
sections when it shouldn't, you could tell the difference in practice,
just like you could tell the difference in practice between tail-call
recursion optimization and no tail-call recursion optimization.
Whether the compiler does or does not do those things is QoI. The
formal behavior with or without combining the critical sections is one
of the allowed behaviors.

> > Because of the vagueries of scheduling, no conforming program would
> > have its behavior changed in a way that would produce results not
> > allowed by the memory model, and thus the compiler is entirely free
> > to do that under the "as-if" rule.
>
> I understand that this may be a QoI issue, but then again, if I can't
> guarantee that the two days that blah2 runs are in fact unlocked, I
> really can't use this system for this task.

Then you have a QoI issue. By the same token, you would have QoI
issues if the compiler otherwise did something else stupid. However,
combining or not combining those sections results in a formally
allowed observable behavior, and thus the compiler is allowed to do
it.

> QoI of the compiler here
> becomes QoI of the application, and if the compiler is free to do this,
> it doesn't allow me to guarantee my application spec -- which I assume
> to be a typical threaded application.

Then harass your compiler writer to fix it, or the scheduler writer to
fix it, or use a different compiler or scheduler. There are so many
ways that the implementation can be malicious. This is just one of
them.

> > To put that another way, the difference between the two programs is
> > that another thread might acquire the mutex when the first thread is
> > in <blah2>, and thus see <blah1> without being guaranteed to see
> > <blah2> nor <blah3>. That is, it is allowed for the compiler and
> > scheduler to do so. It is not required.
>
> In my example, which is nothing strange, it is required, and to
> implement it, I need to use a compiler with a threading/memory model
> that guarantees me that blah2 runs unlocked, so that it can run in
> parallel.

You also need a compiler that properly does data flow analysis
optimizations, constant folding, constant propagation, common sub-
expression elimination, and so on. You also need a compiler that
doesn't simply produce a maliciously convoluted object file.

It's basically impossible for any general purpose portable memory
model to do as you want. How could you possibly specify that to work
with different kinds of scheduler, unknown schedulers that aren't even
written yet? It depends too much on the vagaries of the scheduler and
other such things. I wouldn't worry about it. At least, I would worry
about it as much as I would worry about stack sizes in programs. Most
of the time you don't need to. Sometimes you do, and you need to check
the outputted assembly.

Also, if you did do that, completely disallow combining nearby
critical sections, then you just removed some optimization
opportunities for IMHO no particularly good reason. I want the
compiler to be able to optimize the OP's problem by combining an empty
critical section with a nearby non-empty one.

> This of course if the scheduler allows it to do so -- but
> since the compiler doesn't control the scheduler, it can't really assume
> much about it and IMO needs to unlock after blah1, even though it
> doesn't know what the scheduler does with this. Whether it makes sense
> to unlock and relock around blah2 /I/ need to decide when writing the
> program, considering the particularities of the scheduler I'm intending
> to run it on.

Again, perhaps it's bad QoI, and the compiler ought to fix it, but it
would result in an allowed formal observable behavior, an allowed
execution, and thus the compiler would be free to do it under the "as
if" rule.

> > IIRC, there's some fluff in the C++0x standard that things must become
> > visible to other threads "eventually", or even "in a reasonable
> > amount of time", and probably (?) something similar along the lines
> > that a thread waiting on a mutex will "eventually" acquire it, or
> > even "in a reasonable amount of time". If you combine those two
> > sections like that, you might have just broken the "eventually" or
> > "in a reasonable amount of time" guarantees.
>
> This doesn't sound good :)  This relates to my earlier comment that I
> think I need to look into the guarantees that the individual
> implementations provide -- if this here reflects the C++03 guarantees,
> IMO they alone are not very useful in practice.

C++03 doesn't have any relevant guarantees. AFAIK, this here reflects
all of the usual memory models (C++0x, POSIX pthreads, Java 1.5+,
win32).

> To go back to the
> example, I need to have the guarantee that blah2 runs unlocked (within
> the overall scheduling granularity of the target system, of course).

As I said earlier, good luck attempting to write a portable
specification to that end, and you don't really "need" it. You need it
as much as you need more than 10 bytes of stack space, and you need it
as much as you need a compiler that won't maliciously pessimize your
programs. If you need more than that, you need something more than
what a portable standard can give you, and you need to check / write
the assembly yourself or maybe use some realtime POSIX extensions.

[toc] | [prev] | [next] | [standalone]

#5272

From	Gerhard Fiedler <gelists@gmail.com>
Date	2011-05-21 11:55 -0300
Message-ID	<q108ekvnu6e0.dlg@gelists.gmail.com>
In reply to	#5190

Joshua Maurice wrote:

> On May 19, 9:46 am, Gerhard Fiedler <geli...@gmail.com> wrote:
>> To go back to the example, I need to have the guarantee that blah2
>> runs unlocked (within the overall scheduling granularity of the
>> target system, of course).
> 
> As I said earlier, good luck attempting to write a portable
> specification to that end, and you don't really "need" it. You need
> it as much as you need more than 10 bytes of stack space, and you
> need it as much as you need a compiler that won't maliciously
> pessimize your programs. If you need more than that, you need
> something more than what a portable standard can give you, and you
> need to check / write the assembly yourself or maybe use some
> realtime POSIX extensions.

Check out this document
<http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.9908> about
the pthread guarantees. I haven't yet read (and understood) all of it,
but it seems to me that in proves that both lock and unlock require
memory fences (albeit different ones), and that it is not allowed to
reorder memory operations across locks (and only in a specific direction
across unlocks). 

I interpret this such that a compiler (that claims to implement the
pthread spec) is not allowed to remove a lock operation -- even if
there's nothing between the lock and the following unlock, and even if
it is to combine two lock/unlock pairs --, as this would remove a memory
fence and allow operations to be reordered across a code position across
which they weren't allowed to be reordered before removing the lock
operation.

Gerhard

[toc] | [prev] | [next] | [standalone]

#5322

From	Joshua Maurice <joshuamaurice@gmail.com>
Date	2011-05-22 01:08 -0700
Message-ID	<b52e52b8-6388-4e02-8f1e-6afa919c372f@22g2000prx.googlegroups.com>
In reply to	#5272

On May 21, 7:55 am, Gerhard Fiedler <geli...@gmail.com> wrote:
> Joshua Maurice wrote:
> > On May 19, 9:46 am, Gerhard Fiedler <geli...@gmail.com> wrote:
> >> To go back to the example, I need to have the guarantee that blah2
> >> runs unlocked (within the overall scheduling granularity of the
> >> target system, of course).
>
> > As I said earlier, good luck attempting to write a portable
> > specification to that end, and you don't really "need" it. You need
> > it as much as you need more than 10 bytes of stack space, and you
> > need it as much as you need a compiler that won't maliciously
> > pessimize your programs. If you need more than that, you need
> > something more than what a portable standard can give you, and you
> > need to check / write the assembly yourself or maybe use some
> > realtime POSIX extensions.
>
> Check out this document
> <http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.9908> about
> the pthread guarantees. I haven't yet read (and understood) all of it,
> but it seems to me that in proves that both lock and unlock require
> memory fences (albeit different ones), and that it is not allowed to
> reorder memory operations across locks (and only in a specific direction
> across unlocks).
>
> I interpret this such that a compiler (that claims to implement the
> pthread spec) is not allowed to remove a lock operation -- even if
> there's nothing between the lock and the following unlock, and even if
> it is to combine two lock/unlock pairs --, as this would remove a memory
> fence and allow operations to be reordered across a code position across
> which they weren't allowed to be reordered before removing the lock
> operation.

At first I was slightly concerned that I didn't know what was going
on, but then I found this in the paper:

[quote]
Finally, we show that the rules for reordering memory op-
erations across locking primitives are in fact different from
what we believe most implementors would have expected.
In particular, a memory operation immediately following a
pthread mutex unlock operation may always be moved to just
before the pthread mutex unlock. But the corresponding re-
ordering of a pthread mutex lock with a preceding memory op-
eration is not generally safe. More general movement into critical
sections is possible in the absence of the pthread mutex trylock
call.
As we point out in section 3, many current implementations ei-
ther do not follow these rules, or add extra fences. Nor is it com-
pletely apparent that they should, since these semantics appear to
have been largely accidental. On the other hand, an understanding
of these issues does seem to be essential to any revision of the cur-
rent rules.
[/quote]

Ah yes, I knew about this. Reading further confirms that the problem
is the one I'm thinking of. It's basically a bug in the POSIX pthreads
standard. The POSIX pthreads standard as written requires that a
try_lock succeed if no one else is holding the lock. This has some
rather interesting implications which the paper lays out. The
implications are that locks require double the "usually required"
fences, or other contortions such as the paper describes (specifically
disallow code motion past a lock).

Also, as the paper notes, almost all of the pthreads implementers
overlooked this bug, and if you write a conforming program that abuses
try_lock in interesting ways, it'll produce an unallowed execution on
nearly all pthreads implementations (colloquially "break").

C++0x fixes this problem by allowing try_lock to spuriously fail, that
is fail to acquire the lock even though no one else holds it. (AFAIK),
it is not the intent that implementers will make try_lock spuriously
fail, but programs must assume that it can spuriously fail. This
restricts the set of formally race free programs to ones that allow
the intended implementation of locks, and allow code motion past a
lock. Thus, C++0x avoids this nasty little problem.

Note that their "Theorem 6.6" proves that you can indeed move
operations past a lock /if/ there's no (broken) trylocks. The problems
only arise with (broken) trylocks, which C++0x fixes.

----

However, that is immaterial to the problem at hand. Even if you
couldn't do the code transformations that I was discussing due to the
particulars of the situation, you seem to have a more fundamental
problem. The fundamental problem is you don't believe in the "as if"
rule. C++ spells it out quite clearly with "C++03, 1.9 Program
execution / 1". In a C++0x draft n3126, it's still "1.9 Program
execution / 1", reproduced here:

[quote]
1 The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract
machine. This International Standard places no requirement on the
structure of conforming implementations.
In particular, they need not copy or emulate the structure of the
abstract machine. Rather, conforming
implementations are required to emulate (only) the observable behavior
of the abstract machine as explained
below. (5)
[/quote]

Footnote 5:
[quote]
5) This provision is sometimes called the “as-if” rule, because an
implementation is free to disregard any requirement of this
International Standard as long as the result is as if the requirement
had been obeyed, as far as can be determined from the
observable behavior of the program. For instance, an actual
implementation need not evaluate part of an expression if it can
deduce that its value is not used and that no side eﬀects aﬀecting the
observable behavior of the program are produced.
[/quote]

Also, see C++0x draft n3126, "1.9 Program execution / 5":
[quote]
5 A conforming implementation executing a well-formed program shall
produce the same observable behavior
as one of the possible executions of the corresponding instance of the
abstract machine with the same program
and the same input. However, if any such execution contains an
undefined operation, this International
Standard places no requirement on the implementation executing that
program with that input (not even
with regard to operations preceding the ﬁrst undefined operation).
[/quote]

I am unable to find something as explicitly clear as that in the POSIX
standards offhand, but I assume it's in there somewhere. At the very
least, I can find plenty of occurrences of the phrase "as if" in the
specs for the individual threading functions, such as in:
http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html
[quote]
The implementation shall behave as if at all times there is at most
one owner of any mutex.
[/quote]
Also, I'm pretty sure that's how all the C, C++, Java, etc.,
implementers interpret it.

This is the general spirit of all of the C++-relevant memory models,
including win32, pthreads, and C++0x. (This is also true for the Java
memory model.) It's also true for C and C++ programs in all ways - not
just threading. An implementation is free to run your program however
it wants as long as it produces one of the allowed executions. It
doesn't even need to be the same execution each time. For non-trivial
threaded programs, it's highly likely that it will be a different
execution each time.

Again, I understand your concerns, but compilers, linkers, schedulers,
hardware etc., will continue to optimize, and the rule that lets them
optimize is the "as if" rule. They optimize to make your program run
better. That's their job. Their job is to take your program, transform
the assembly of it (or whatever code they work with) to make it
better. Threads are no exception.

Sometimes the optimization is broken and produces an unallowed
execution, but that's different than what we're talking about here.
We're talking about an optimization that produces an allowed
execution, but the execution is undesirable or unacceptable to the
programmer. Such is life writing real C and C++ programs. If your
implementation ever does such a broken optimization, thus I suggest
you complain to your implementers, switch implementations, or write
the relevant parts in assembly (and hope that the problem is in the
compiler, as the linker, scheduler, hardware, etc., can still get
you).

[toc] | [prev] | [next] | [standalone]

#4534

From	James Kanze <james.kanze@gmail.com>
Date	2011-04-30 15:54 -0700
Message-ID	<6b36bf8e-37b5-4e8c-bf34-a9711c3c4183@b19g2000yqg.googlegroups.com>
In reply to	#4373

On Apr 26, 5:58 pm, jl_p...@hotmail.com wrote:

> Recently I've been reading up on "Double-Checked Locking" in
> C++ and how it's often implemented imperfectly.

You mean, how it is impossible to implement in standard C++
(03).

> The Article "C++ and the Perils of Double-Checked Locking" by
> Scott Meyers and Andrei Alexandrescu
> (http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf)
> provides a good overview of how it's usually done and why it's
> often inadequate.

The standard solution for implementing a singleton works just
fine:

    Singleton* Singleton::instance()
    {
	ScopedLock lock(mutex);
	if ( pInstance == NULL )
	    pInstance = new Singleton;
	return pInstance;
    }

If the lock is a performance bottleneck (a case I've yet to
encounter), then you can simply insure that the singleton is
constructed before threads are started, and skip the lock.

Anything else is simply added complexity, for no reason.

--
James Kanze

[toc] | [prev] | [next] | [standalone]

Page 1 of 2 [1] 2 Next page →

csiph-web

Please disprove this Double-Checked Locking "fix"

Contents

#4373 — Please disprove this Double-Checked Locking "fix"

#4374

#4376

#4566

#4596

#4614

#4378

#4379

#4380

#4383

#4901

#4951

#4952

#5144

#5148

#5167

#5190

#5272

#5322

#4534