Path: csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!de-l.enfer-du-nord.net!feeder1.enfer-du-nord.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail From: Rainer Weikusat Newsgroups: comp.os.linux.development.apps Subject: Re: Blocking queue race condition? Supersedes: <87y5tlnawi.fsf@sapphire.mobileactivedefense.com> Date: Thu, 05 Jan 2012 20:39:00 +0000 Lines: 38 Message-ID: <87vcopnau3.fsf@sapphire.mobileactivedefense.com> References: <74ec83e6-eafc-440e-bb60-8a1ed6412811@h13g2000vbn.googlegroups.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: individual.net TQcGaPc39ZKwIc6gMPxAYAHWOo+BoPJFeXyvfT+2h9kmdObeQ= Cancel-Lock: sha1:X3MIscPRKHdiYz4yKGIf70+xiT4= sha1:cJ/5ysJhsA290uR4vgMnEo+2AHA= Cancel-Key: sha1:Ff2ZMCkqpRcYSA5pH5nZNnZANiE= sha1:5ZCtsL7ddv9hab/x5S1cmgOs4y8= User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux) Xref: x330-a1.tempe.blueboxinc.net comp.os.linux.development.apps:345 > I'm trying to implement a high performance blocking queue backed by a > circular buffer on top of pthreads, semaphore.h and gcc atomic > builtins. The queue needs to handle multiple simulataneous readers > and writers from different threads. > > I've isolated some sort of race condition, and I'm not sure if it's a > faulty assumption about the behavior of some of the atomic operations > and semaphores, or whether my design is fundamentally flawed. I've had a look at this and did some experiments with it (specially, I ran it with one thread get and one set thread which seemed to work and two set and one get thread, corruption occuring in get, and two get and one set thread, corruption in set) and - based on the observation that there is no ordering imposed in which two threads of the same kind perform their operations, I think what's happening here is as follows: Let's assume that there are two get threads and one set thread. Initially, head == tail and there are at least two occupants in the queue. Then, the following scenario is possible: 1. get0 returns from sem_wait, does sync_fetch_and for position 0. 2. get1 returns from sem_post, does sync_fetch... for position 1. 3. get1 executes CAS, then posts vacancies. 4. set returns from sem_wait, does sync_and_fetch for position 0. 5. set executes CAS before get0, detecting 'corruption'. 6. some time after 5) get0 does CAS. NB: This is a hypothesis/ theory which seems plausible to me. Corrections etc appreciated.