Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.programming.threads > #1327

condvars of ACE framwork - possible bug

Newsgroups comp.programming.threads
Date 2013-02-04 22:38 -0800
Message-ID <9c4fabd6-a4e1-4ef1-9301-5fe7ef5df016@googlegroups.com> (permalink)
Subject condvars of ACE framwork - possible bug
From Michael Podolsky <michael.podolsky.69@gmail.com>

Show all headers | View raw


Hello All,

There is a condvar implementation in ACE framework
made for systems which lack them.
This implementation, I suppose, may have a bug. 

The ACE implementation, first of all, has a known and 
documented limitation: 'signal'
and 'broadcast' functions must be called only when the mutex associated 
with the condvar is locked. Still even with this limitation
there may be a bug which I am going to demonstrate:

Consider the signal function (file OS_NS_Thread.cpp)
looking like (I removed error handling):

int ACE_OS::cond_signal (ACE_cond_t *cv)
{
  ACE_OS::thread_mutex_lock (&cv->waiters_lock_);
  bool const have_waiters = cv->waiters_ > 0;
  ACE_OS::thread_mutex_unlock (&cv->waiters_lock_);
 
  if (have_waiters)
       return ACE_OS::sema_post (&cv->sema_);
  else
      return 0; // No-op
} 

The supposed problem with it is:
the signaled semaphore cv->sema_  may be "consumed"
by any thread, not especially by a thread which is 
supposed to currently wait on it. It may very well be consumed
by the thread which called  'cond_signal' function and 
then decided to 'wait' for condvar by itself, that
is, the thread may be waken up by its own signal,
and the other thread which called the wait
function before the condvar was signaled, would
not be woken up at all!

Now the code of 'wait' function is a bit complicated
(error handling and some unimportant logic is removed):

int ACE_OS::cond_wait (ACE_cond_t *cv,  ACE_mutex_t *external_mutex)
{
  ACE_OS::thread_mutex_lock (&cv->waiters_lock_) ;
  ++cv->waiters_;
  ACE_OS::thread_mutex_unlock (&cv->waiters_lock_);
  int result = 0;
  if (external_mutex->type_ == USYNC_PROCESS)
    {
      // This call will automatically release the mutex and wait on the semaphore.
/* LINE A */      ::SignalObjectAndWait (external_mutex->proc_mutex_, cv->sema_, INFINITE, FALSE),
    }
  else
    {
      // We keep the lock held just long enough to increment the count of
       ACE_OS::mutex_unlock (external_mutex);
      /* LINE B */ result = ACE_OS::sema_wait (&cv->sema_);
    }

  ACE_OS::thread_mutex_lock (&cv->waiters_lock_);
  --cv->waiters_;
  bool const last_waiter = cv->was_broadcast_ && cv->waiters_ == 0;
  ACE_OS::thread_mutex_unlock (&cv->waiters_lock_);

    if(external_mutex->type_ == USYNC_PROCESS)
    {
      if (last_waiter)
           ::SignalObjectAndWait (cv->waiters_done_,
                                                                external_mutex->proc_mutex_,
                                                                INFINITE, FALSE);
      else
            ACE_OS::mutex_lock (external_mutex);
      return result;
    }
  else if (last_waiter)
        ACE_OS::event_signal (&cv->waiters_done_);

  ACE_OS::mutex_lock (external_mutex);

  return result;
}

So what I wanted to say is that on 'LINE B' (in the prev code) we have the external mutex unlocked and only then approach the wait on
the semaphore, but that means that any other thread may gain the external mutex, call cond_wait function on the same condvar
and inside it succeed with wait on the same semaphore before the first thread even called ACE_OS::sema_wait (&cv->sema_).
It may very well be a scenario in which: Thread A decides to wait on a codvar, releases the mutex, gets preempted
by Thread B which gains mutex, signals condvar, then waits on it and consumes its own signal!

As for 'LINE A' which uses some elaborated SignalObjectAndWait of Win API, this function is not considered to be atomic
anymore as for current MSDN documentation. So all arguments for LINE B stay true for LINE A.

The proposed fix (if the bug actually exists) is simple: 'broadcast' function (which I do not show here) has a special
functionality to wait until last waiter completes on its wait on semaphore (this particularly
works as broadcast is made while holding the external mutex). The same should be
done for the case of 'signal': the cond_signal function should wait until one thread completes
its wait.

Thanks for your attention and comments )

Back to comp.programming.threads | Previous | NextNext in thread | Find similar


Thread

condvars of ACE framwork - possible bug Michael Podolsky <michael.podolsky.69@gmail.com> - 2013-02-04 22:38 -0800
  Re: condvars of ACE framwork - possible bug Codeplug <graham.greene@charter.net> - 2013-02-06 13:42 -0800
    Re: condvars of ACE framwork - possible bug Michael Podolsky <michael.podolsky.69@gmail.com> - 2013-02-06 18:36 -0800
      Re: condvars of ACE framwork - possible bug Codeplug <graham.greene@charter.net> - 2013-02-06 19:39 -0800
        Re: condvars of ACE framwork - possible bug Michael Podolsky <michael.podolsky.69@gmail.com> - 2013-02-06 20:01 -0800
          Re: condvars of ACE framwork - possible bug Codeplug <graham.greene@charter.net> - 2013-02-06 20:16 -0800
            Re: condvars of ACE framwork - possible bug Michael Podolsky <michael.podolsky.69@gmail.com> - 2013-02-06 20:30 -0800
              Re: condvars of ACE framwork - possible bug Codeplug <graham.greene@charter.net> - 2013-02-07 09:05 -0800
                Re: condvars of ACE framwork - possible bug Michael Podolsky <michael.podolsky.69@gmail.com> - 2013-02-07 19:27 -0800
  Re: condvars of ACE framwork - possible bug Noob <root@127.0.0.1> - 2013-02-07 11:54 +0100
    Re: condvars of ACE framwork - possible bug Michael Podolsky <michael.podolsky.69@gmail.com> - 2013-02-07 19:24 -0800

csiph-web