From: "Jason A. Donenfeld"
Newsgroups: gnu.bash.bug
Subject: process substitution fd lifetime race condition
Date: Sun, 19 Apr 2020 23:15:08 -0600
To: bug-bash@gnu.org
X-Mailman-Original-Message-ID: <20200420051508.GA2359844@zx2c4.com>

Hi,

I've uncovered a very unusual race condition when using process
substitution and developed as minimal a reproducer as I could create:

    set -e
    private="$(wg genkey)"
    public="$(wg genkey | wg pubkey)"
    preshared="$(wg genpsk)"
    ip link del wg0 type wireguard || true
    ip link add wg0 type wireguard
    pretty() { echo -e "\x1b[0m"; }
    doit() { pretty; "$@"; }
    while true; do
        doit wg set wg0 private-key <(echo "$private") peer $public preshared-key <(echo "$preshared")
    done

(That reproducer needs wg(8) from the wireguard-tools package and a
linux kernel with wireguard in it, usually >= 5.6.)

Eventually, a loop iteration gets called and the fd for one of those
process substitutions is closed before the wg(8) process is forked and
execed, resulting in "fopen: No such file or directory", and a non-zero
exit code.

I've managed to strace this behavior:

    2359208 1587358773.021705 pipe([3, 4]) = 0
    2359208 1587358773.021824 dup2(3, 63) = 63
    2359208 1587358773.021876 close(3) = 0
    2359208 1587358773.022078 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f7ab3071e50) = 2359221
    2359221 1587358773.022437 close(255) = 0
    2359208 1587358773.022559 close(4) = 0
    2359208 1587358773.023216 pipe([3, 4]) = 0
    2359208 1587358773.023811 dup2(3, 62) = 62
    2359208 1587358773.023868 close(3) = 0
    2359221 1587358773.023963 dup2(4, 1) = 1
    2359221 1587358773.024003 close(4) = 0
    2359208 1587358773.024033 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f7ab3071e50) = 2359222
    2359221 1587358773.024053 close(63) = 0
    2359221 1587358773.024427 write(1, "iAQ/NSN8+UD8LVQvk+8fRx7nWj+pjPR8"..., 45) = 45
    2359222 1587358773.024480 close(255) = 0
    2359208 1587358773.024746 close(4) = 0
    2359221 1587358773.024911 +++ exited with 0 +++
    2359208 1587358773.024943 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2359221, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
    2359208 1587358773.025253 write(1, "\33[0m\n", 5) = 5
    2359208 1587358773.025337 close(63) = 0 <<<<<<<< WHY IS BASH DOING THIS >>>>>>>>
    2359222 1587358773.025358 dup2(4, 1) = 1
    2359222 1587358773.025406 close(4) = 0
    2359222 1587358773.025459 close(62) = 0
    2359222 1587358773.025849 write(1, "0E7my4hwPYQf/rrjBw8CX0mrj3nfkEQQ"..., 45) = 45
    2359222 1587358773.026285 +++ exited with 0 +++
    2359208 1587358773.026871 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2359222, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
    2359208 1587358773.027573 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f7ab3071e50) = 2359224
    2359224 1587358773.027917 close(255) = 0
    2359224 1587358773.028376 execve("/usr/bin/wg", ["wg", "set", "wg0", "private-key", "/dev/fd/63", "peer", "behGV5Vt43Xt2XWqBLDiJrKFVhQGRb31"..., "preshared-key", "/dev/fd/62"], 0x5614d508c190 /* 61 vars */) = 0
    2359224 1587358773.030316 openat(AT_FDCWD, "/dev/fd/63", O_RDONLY) = -1 ENOENT (No such file or directory)
    2359224 1587358773.030558 write(2, "fopen: No such file or directory"..., 33) = 33
    2359224 1587358773.030849 +++ exited with 1 +++
    2359208 1587358773.031027 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2359224, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
    2359208 1587358773.031146 close(62) = 0
    2359208 1587358773.031899 +++ exited with 1 +++

Pay close attention to the line marked as "WHY IS BASH DOING THIS".
Indeed, why is bash doing that? Is there some weird process substitution
lifetime bug, in which returning from a function nested at depth 2
(pretty) causes the process substitution parameter to be returned,
instead of doing that from the function nested at depth 1? But in that
case, why only one and not both? Or is something else happening here?

Thanks,
Jason
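
For readers without wg(8) or a WireGuard-capable kernel, the shape of the reproducer can be sketched with cat(1) as a stand-in. This is only a sketch under assumptions: the cat substitution, the bounded loop, and the failure counter are not part of the original report, and it is not confirmed that this variant hits the same race. Whether any iteration fails is expected to depend on the bash version and on timing.

```shell
#!/usr/bin/env bash
# Hypothetical wg-free variant of the reproducer above.
# Assumption: cat(1) stands in for wg(8); like wg, it opens the /dev/fd/N
# paths that bash passes for each process substitution only after exec.

pretty() { echo -e "\x1b[0m"; }
doit() { pretty; "$@"; }

failures=0
for ((i = 0; i < 200; i++)); do
    # If bash closes one of the substitution fds before cat opens it,
    # cat exits non-zero and the iteration is counted as a failure.
    doit cat <(echo first) <(echo second) >/dev/null 2>&1 \
        || failures=$((failures + 1))
done

echo "iterations=$i failures=$failures"
```

A non-zero failure count would point to the same fd being closed too early; a count of zero does not rule the race out, since the window is narrow.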