From: Robert Elz
Newsgroups: gnu.bash.bug
Subject: Re: "wait" loses signals
Date: Mon, 24 Feb 2020 15:59:43 +0700
To: chet.ramey@case.edu
Cc: Denys Vlasenko, bug-bash@gnu.org, Harald van Dijk

    Date:        Fri, 21 Feb 2020 10:07:25 -0500
    From:        Chet Ramey

  | That's just not reasonable. You're saying signals that are received before
  | the wait builtin begins executing (say, while the command is being parsed,
  | or the shell is doing some other bookkeeping task) should be considered
  | to have arrived while the wait builtin is executing. I'm pretty sure that's
  | not consistent with the letter or the spirit of the standard.

It quite clearly isn't consistent. What the standard says is:

	When the shell is waiting, by means of the wait utility, for
	asynchronous commands to complete, the reception of a signal for
	which a trap has been set shall cause the wait utility to return
	immediately with an exit status >128, immediately after which the
	trap associated with that signal shall be taken.

Note: "when the shell is waiting for an asynchronous command to complete"
(when that happens as a result of the user/script executing the wait
utility) then ...

What Denys is failing to realise is that the standard describes what
shells do (or more accurately perhaps, did, in the late 1980s or early
1990s), not what someone might want them to do.

And that is: when the wait/waitpid/wait3/wait4/waitid/wait6 (whatever the
shell uses) system call returns EINTR, the wait utility exits with a
status indicating it was interrupted by that signal (status > 128 means
128+SIGno) and the trap is run. Because that is what shells actually
did - the alternative being to simply restart the wait on EINTR, the way
many other system calls that are interrupted by signals are
conventionally restarted.

Like it or not, that's what shells did, what most still do, and what the
standard says must be done.

Apart from that, and not interrupting a wait for a foreground process,
the standard says very little about when traps should be run, and sorry
Harald, but your "as soon as" from ...

harald@gigawatt.nl said:
  | In the same way, I think that except when overridden by 2.11, the "when"
  | in "Otherwise, the argument action shall be read and executed by the
  | shell when one of the corresponding conditions arises." should be
  | interpreted as "as soon as".

The only way to do that literally would be to run the trap from the
signal handler, as that is "as soon as" the condition arises. But I
think we all know that is simply not possible. So let's read that as
"as soon as possible after" instead.
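
As a small illustration of the wait rule quoted above, here is a sketch
(not taken from the standard or from any shell's test suite). It assumes
a POSIX-like sh, that USR1 is otherwise unused, and that one second is
long enough for the shell to be blocked inside its wait system call
before the signal arrives. The variable names (self, child, got, status,
name) are made up for the example.

```shell
#!/bin/sh
# PID of the shell running this code; same as $$ at the top level of a
# script, but also correct if this is pasted inside a subshell.
self=$(exec sh -c 'echo "$PPID"')

got=0
trap 'got=1' USR1                    # a trap is set for USR1

sleep 10 &                           # the asynchronous command
child=$!

( sleep 1; kill -USR1 "$self" ) &    # deliver USR1 while we sit in wait

wait "$child"                        # returns early, status = 128 + signal number
status=$?

name=$(kill -l "$status")            # map the exit status back to a signal name
echo "wait returned $status ($name), trap ran: $got"

kill "$child" 2>/dev/null            # clean up the still-running sleep
wait
```

Note that this only behaves as described because the signal is sent well
after wait has started. It says nothing about a signal that arrives just
before the system call begins, which is the case under discussion.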

That's getting more reasonable, but someone needs to decide just what is
possible - will running the trap handler mess up the shell's internal
state while a new command is parsed and executed? Eg: what if we had

	VAR=$(grep -c some_string file*.c)

and a (trapped) signal arrives while grep is running (more correctly,
while the process running the command substitution, which runs grep, is
running). We know we cannot interrupt the wait for that foreground
process to run the trap handler, so we don't - but do we execute the
trap handler before we assign the answer to VAR?

This kind of thing is why shells in general normally only look to see
whether there is a trap handler waiting to run after they have finished
executing a command, not in the middle of one.

The relevance of this is that if a signal arrives while the wait command
is executing (or as Chet suggested, while doing whatever housekeeping is
needed to prepare to run it, like looking to see what command comes next)
but before the relevant wait*() system call is running, the trap won't be
run until after the wait command completes.

That's the way shells have always worked, and the way the standard (for
that very reason) says can be relied upon by scripts - which is much of
its purpose, to tell script writers what they can expect will work, and
what will not necessarily work.

Now the standard doesn't preclude a shell from looking for pending traps
as frequently as it wants to; every second line of C code in the shell
could be

	if (traps_pending) run_trap_handler();

But most shell authors (I believe) wouldn't consider that reasonable.

The standard also doesn't preclude a shell from taking extra measures to
push the arrival of a signal in the wait utility down to occur in the
wait system call (or whatever replaces it).
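
The foreground case mentioned above (the trap is held back until the
running command completes) can be observed with a sketch like the
following. It assumes a POSIX-like sh without any option that changes
trap timing, and a date that supports +%s (widespread, but not required
by older versions of the standard). The names self, start, trap_time and
delay are invented for the example.

```shell
#!/bin/sh
# PID of the shell running this code (same as $$ at script top level).
self=$(exec sh -c 'echo "$PPID"')

trap_time=
trap 'trap_time=$(date +%s)' USR1    # record when the trap action runs

start=$(date +%s)
( sleep 1; kill -USR1 "$self" ) &    # signal arrives about 1s in

sleep 2                              # foreground command: not interrupted

# The pending trap is run only once the foreground sleep has finished,
# so it is recorded about 2s after start, not about 1s.
delay=$((trap_time - start))
echo "trap ran ${delay}s after start"
wait
```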

Old shells didn't do that, as there simply was no mechanism for that,
and using SIGCHLD was always problematic because of its quite different
implementation on different (now ancient) systems, hence we have what we
have.

The standard is not a legislature, and does not change the rules just
because what is there doesn't look reasonable, or you don't like it.

If you want things changed, convince the major shell maintainers that
this race condition is something they should make their shell go slower
to fix (because that's really all it takes on modern systems) and wait
for them to comply. When most major shells (perhaps all major shells,
and some of the others) have implemented what you want, then you can
suggest to the standards body that this is something that ought to be
made available as a feature that scripts can rely on. After that, expect
to wait 10-15 years for enough time to pass for a new version of the
standard to be due (it won't happen in a correction update) before
anything happens.

I'm not a "major" shell maintainer by any means - but you would have
trouble convincing me of that - I simply don't believe that the
trap/kill combination is or ever was intended to be an IPC mechanism for
the shell - rather traps allow 2 main features ... they allow cleanup
after various errors (deleting temp files, etc) and they allow the script
to report what it is doing when requested (kind of like SIGINFO, using
any available signal, but giving script provided information).

And for the latter, consider something like

	step=0
	trap 'printf "At step %s\\n" "${step}"' USR1
	: $((step = step + 1))
	# do something
	: $((step = step + 1))
	# do something
	: $((step = step + 1))
	make world

If a USR1 signal arrives while "make world" is running, we know that we
are not allowed to run the trap handler (that's because no shells ever
did - I know the FreeBSD shell has an option to alter that, but portable
scripts cannot assume any such thing).
Hence the "At step 3" message would not appear until after the make has
finished (perhaps hours after the signal arrived).

That might be rewritten as

	make world &
	while :
	do
		wait $!
		if [ $? -ne 158 ]	# 158 means SIGUSR1 for me
					# a portable script would need to
					# determine it dynamically
		then
			break
		fi
	done

which most of the time will improve things, but because of the race
reported there's a (very) small chance that every now and then it will
end up waiting even though a signal was sent just before it started its
system call. In that situation your average user will just send another
signal, which this time will interrupt the sys call.

What we have now works well enough for both these scenarios: the trap is
eventually run, and if it seems to be taking too long, another signal can
simply be sent. The trap command will be run eventually.

That's of no use for IPC purposes, but I don't care.

kre