X-Received: by 10.224.189.78 with SMTP id dd14mr12826305qab.0.1360046292332; Mon, 04 Feb 2013 22:38:12 -0800 (PST) X-Received: by 10.49.71.135 with SMTP id v7mr2066765qeu.28.1360046292135; Mon, 04 Feb 2013 22:38:12 -0800 (PST) Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder2.hal-mli.net!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!p13no13713632qai.0!news-out.google.com!k2ni8440qap.0!nntp.google.com!p13no12031803qai.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail Newsgroups: comp.programming.threads Date: Mon, 4 Feb 2013 22:38:11 -0800 (PST) Complaints-To: groups-abuse@google.com Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=98.142.248.153; posting-account=8nUMzAoAAACehj29GCRJJmNfA9OMAqRd NNTP-Posting-Host: 98.142.248.153 User-Agent: G2/1.0 MIME-Version: 1.0 Message-ID: <9c4fabd6-a4e1-4ef1-9301-5fe7ef5df016@googlegroups.com> Subject: condvars of ACE framwork - possible bug From: Michael Podolsky Injection-Date: Tue, 05 Feb 2013 06:38:12 +0000 Content-Type: text/plain; charset=ISO-8859-1 X-Received-Bytes: 5109 Xref: csiph.com comp.programming.threads:1327 Hello All, There is a condvar implementation in ACE framework made for systems which lack them. This implementation, I suppose, may have a bug. The ACE implementation, first of all, has a known and documented limitation: 'signal' and 'broadcast' functions must be called only when the mutex associated with the condvar is locked. Still even with this limitation there may be a bug which I am going to demonstrate: Consider the signal function (file OS_NS_Thread.cpp) looking like (I removed error handling): int ACE_OS::cond_signal (ACE_cond_t *cv) { ACE_OS::thread_mutex_lock (&cv->waiters_lock_); bool const have_waiters = cv->waiters_ > 0; ACE_OS::thread_mutex_unlock (&cv->waiters_lock_); if (have_waiters) return ACE_OS::sema_post (&cv->sema_); else return 0; // No-op } The supposed problem with it is: the signaled semaphore cv->sema_ may be "consumed" by any thread, not especially by a thread which is supposed to currently wait on it. It may very well be consumed by the thread which called 'cond_signal' function and then decided to 'wait' for condvar by itself, that is, the thread may be waken up by its own signal, and the other thread which called the wait function before the condvar was signaled, would not be woken up at all! Now the code of 'wait' function is a bit complicated (error handling and some unimportant logic is removed): int ACE_OS::cond_wait (ACE_cond_t *cv, ACE_mutex_t *external_mutex) { ACE_OS::thread_mutex_lock (&cv->waiters_lock_) ; ++cv->waiters_; ACE_OS::thread_mutex_unlock (&cv->waiters_lock_); int result = 0; if (external_mutex->type_ == USYNC_PROCESS) { // This call will automatically release the mutex and wait on the semaphore. /* LINE A */ ::SignalObjectAndWait (external_mutex->proc_mutex_, cv->sema_, INFINITE, FALSE), } else { // We keep the lock held just long enough to increment the count of ACE_OS::mutex_unlock (external_mutex); /* LINE B */ result = ACE_OS::sema_wait (&cv->sema_); } ACE_OS::thread_mutex_lock (&cv->waiters_lock_); --cv->waiters_; bool const last_waiter = cv->was_broadcast_ && cv->waiters_ == 0; ACE_OS::thread_mutex_unlock (&cv->waiters_lock_); if(external_mutex->type_ == USYNC_PROCESS) { if (last_waiter) ::SignalObjectAndWait (cv->waiters_done_, external_mutex->proc_mutex_, INFINITE, FALSE); else ACE_OS::mutex_lock (external_mutex); return result; } else if (last_waiter) ACE_OS::event_signal (&cv->waiters_done_); ACE_OS::mutex_lock (external_mutex); return result; } So what I wanted to say is that on 'LINE B' (in the prev code) we have the external mutex unlocked and only then approach the wait on the semaphore, but that means that any other thread may gain the external mutex, call cond_wait function on the same condvar and inside it succeed with wait on the same semaphore before the first thread even called ACE_OS::sema_wait (&cv->sema_). It may very well be a scenario in which: Thread A decides to wait on a codvar, releases the mutex, gets preempted by Thread B which gains mutex, signals condvar, then waits on it and consumes its own signal! As for 'LINE A' which uses some elaborated SignalObjectAndWait of Win API, this function is not considered to be atomic anymore as for current MSDN documentation. So all arguments for LINE B stay true for LINE A. The proposed fix (if the bug actually exists) is simple: 'broadcast' function (which I do not show here) has a special functionality to wait until last waiter completes on its wait on semaphore (this particularly works as broadcast is made while holding the external mutex). The same should be done for the case of 'signal': the cond_signal function should wait until one thread completes its wait. Thanks for your attention and comments )