Re: FIFO race condition on SunOS kernels

From Vladimir Marek <Vladimir.Marek@oracle.com>
Newsgroups gnu.bash.bug
Subject Re: FIFO race condition on SunOS kernels
Date Tue, 1 Jan 2019 23:47:50 +0100
Lines 71
Approved bug-bash@gnu.org
Message-ID <mailman.6666.1546382888.1284.bug-bash@gnu.org> (permalink)
References <eda3efb7-98d4-6a2d-a9d1-cb82f43a5f02@inlv.org>
Cc Bug reports for the GNU Bourne Again SHell <bug-bash@gnu.org>
To Martijn Dekker <martijn@inlv.org>
In-Reply-To <eda3efb7-98d4-6a2d-a9d1-cb82f43a5f02@inlv.org>

Hi,

> You'd think that establishing a pipe between two processes is a very basic
> UNIX feature that should work reliably on all UNIX variants.

One would think that _opening_ a file is a very basic UNIX feature ...
:)

> But the following script seems to break consistently on Solaris and variants
> (SunOS kernels) when executed by bash, ksh93, or dash. All it does is make
> 100 FIFOs and read a line from each -- it should be trivial.
> 
> And it does work fine on (recent versions of) Linux, macOS, and all the
> BSDs, on all shells.

Horrified, I quickly tested on S11.3 SRU34 x86, S11.3 SRU18 sparc,
S11.4 FCS x86, S11.4 FCS sparc, S10 mostly FCS, and even a development
version of Solaris. I also tested S11.3 SRU 35 in an LDOM. In all cases
it prints the lines

this is FIFO 1
...
this is FIFO 100

I haven't seen a single issue. Then I tried VirtualBox, and it printed
only two lines. Trussing it shows that it gets stuck at

1546/1:         open("/tmp/FIFOs1546/FIFO4", O_RDONLY|O_XPG4OPEN) (sleeping...)

That said, I use VirtualBox 5.1.22r115126, which is pretty old.

Putting a 0.5s delay anywhere in the loop makes the problem disappear.
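
For reference, here is a minimal sketch of the kind of loop being tested.
It is not the script from the original report: the function name
fifo_test, its optional delay argument, and the mktemp-based directory
name are made up for illustration, and fractional sleep values are an
assumption that does not hold for every sleep implementation.

```shell
#!/bin/sh
# Hypothetical reproducer sketch; NOT the script from the original report.
# fifo_test COUNT [DELAY]: create COUNT FIFOs, write one line to each from
# a background job, and read that line back in the foreground.
fifo_test() {
    count=$1
    delay=${2:-0}
    dir=$(mktemp -d "${TMPDIR:-/tmp}/FIFOs.XXXXXX") || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        fifo=$dir/FIFO$i
        mkfifo "$fifo" || return 1
        # Writer: its open() blocks until a reader opens the other end.
        echo "this is FIFO $i" > "$fifo" &
        # Reader: an O_RDONLY open() of the FIFO, as in the truss output.
        IFS= read -r line < "$fifo" || return 1
        printf '%s\n' "$line"
        # Optional pause inside the loop (assumed to accept fractions).
        [ "$delay" = 0 ] || sleep "$delay"
        i=$((i + 1))
    done
    wait            # reap the background writers before cleaning up
    rm -rf "$dir"
}

fifo_test 100
```

Calling it as "fifo_test 100 0.5" adds the delay mentioned above to
every iteration.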

Looking more closely at the truss output, I can see

5030/1:          4.142629    open("/var/tmp/FIFOs5030/FIFO7", O_RDONLY|O_XPG4OPEN) Err#91 ERESTART

and shortly after it gets stuck. Process 5030 is bash.

I don't have debug symbols right now, but I guess from the disassembly
that we are in the redir_open function. I tried protecting the loop
there against both EINTR and ERESTART, but that didn't help.

Let's try to look into the kernel.

$ mdb -k
> 0t1775::pid2proc|::whatthread|::findstack
stack pointer for thread ffffa1002910c840: ffffe33001d95990
[ ffffe33001d95990 _resume_from_idle+0x192() ]
  ffffe33001d959c0 swtch+0x19d()
  ffffe33001d95a30 cv_wait_sig_swap_core+0x19c()
  ffffe33001d95a50 cv_wait_sig_swap+0x18()
  ffffe33001d95ae0 fifo_open+0x43e()
  ffffe33001d95b60 fop_open+0x18f()
  ffffe33001d95d50 vn_openat+0x974()
  ffffe33001d95ec0 copen+0x4fd()
  ffffe33001d95ef0 openat+0x31()
  ffffe33001d95f00 sys_syscall+0x247()

Hmm, so we are waiting on a condition variable in fifo_open. That needs
to be investigated more deeply. It would be great if you could open a
case for this, as we have to prioritize our work ...

At the moment it looks like a bug in Solaris to me, but it shows up only
on VirtualBox. I'll try to look at it more. Or rather, to find someone
who knows the filesystem code.

Cheers
-- 
	Vlad
