From: Rainer Weikusat
Newsgroups: comp.os.linux.development.apps
Subject: Re: Linux O_NONBLOCK bug/ quirk
Date: Sun, 30 Mar 2014 19:42:01 +0100
Message-ID: <87ha6fr0jq.fsf@sable.mobileactivedefense.com>
References: <878urvu0gx.fsf@sable.mobileactivedefense.com>

Lusotec writes:
> Rainer Weikusat wrote:
>> As part of one of the usual 'pleasant exchanges' with the people whose
>> ability to make a living depends on controlling access to the Linux code
>> base,
>
> That's nonsense!
>
>> it came to light that a receive operation on a socket in non-blocking mode
>> can actually be blocked forever on Linux, example code:
>>
>> ---------
>> #include <fcntl.h>
>> #include <string.h>
>> #include <sys/socket.h>
>> #include <sys/un.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>>     struct sockaddr_un sun;
>>     int fd;
>>
>>     fd = socket(AF_UNIX, SOCK_DGRAM, 0);
>>     sun.sun_family = AF_UNIX;
>>     strncpy(sun.sun_path, "/tmp/bla", sizeof(sun.sun_path));
>>     bind(fd, (struct sockaddr *)&sun, sizeof(sun));
>>
>>     if (fork() == 0) read(fd, &fd, sizeof(fd));
>>
>>     sleep(1);
>>
>>     fcntl(fd, F_SETFL, O_NONBLOCK);
>>     read(fd, &fd, sizeof(fd));
>>
>>     return 0;
>> }
>> --------
>>
>> Killing the forked process results in the other aborting the read call
>> with EAGAIN, as can be determined with strace.
>>
>> I don't think this is of much practical relevance but it is something
>> worth knowing about.
>
> In the above code, both child and parent processes are reading from the same
> file descriptor.
>
> Reads from file descriptors are queued and served in a FIFO fashion.
> This is true for blocking and non-blocking reads. Even non-blocking reads
> still have to wait for any previous reads to complete, even if they are
> going to just return EAGAIN.

This is not true: In the given case, there's a single mutex in the
recv function and all readers except the first will block on this mutex
and will afterwards be served in whatever order they actually acquire
the mutex. This may turn out to be FIFO but may well be different, e.g.,
based on priorities.

> The issue with your code is that the file descriptor is set to non-blocking
> while the first read, a blocking read, is active. When a second read, which
> will be non-blocking, is made, the first read is still blocking and thus the
> second non-blocking read has to wait for the first to finish.

Yes. Because the first read has acquired the mutex and is a blocking
read, all subsequent reads are effectively blocked until a message is
received on this socket, regardless of whether they were supposed to be
non-blocking or not. But the definition of "non-blocking read" is "it
won't wait indefinitely until a message is received".

> Currently, in Linux fcntl affects future operations but not previous or
> current operations. As such, if a read is blocking a file descriptor, future
> reads, even if non-blocking, will have to wait for the current blocking read
> to complete.
>
> Now, for the code to work as you expect it (or at least as I understood your
> expectation), an fcntl must affect already running operations. I think this
> is very problematic.

Aborting the blocking read with EAGAIN would indeed be wrong since it is
supposed to block until data is received (or it is interrupted by a
signal). But the second read isn't: It is supposed to return immediately
with either a message or an EAGAIN error.
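The locking pattern described above can be sketched as a small user-space model. This is not kernel code: `struct model_sock`, `blocking_reader`, `nonblocking_read` and `run_model` are names made up for this illustration, a pthread mutex stands in for the per-socket receive mutex, and a semaphore stands in for the message queue. The "non-blocking" reader uses a timed lock only so that the demonstration terminates; the point is that it cannot even look at the (empty) queue while the blocking reader sleeps with the mutex held.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Illustrative model only; all names here are hypothetical. */
struct model_sock {
    pthread_mutex_t readlock;  /* stands in for the receive mutex */
    sem_t messages;            /* number of queued messages */
    sem_t reader_in;           /* posted once the blocking reader holds readlock */
};

/* Blocking reader: takes the mutex, then sleeps while still holding it. */
static void *blocking_reader(void *arg)
{
    struct model_sock *s = arg;

    pthread_mutex_lock(&s->readlock);
    sem_post(&s->reader_in);
    sem_wait(&s->messages);
    pthread_mutex_unlock(&s->readlock);
    return NULL;
}

/*
 * "Non-blocking" reader: returns 0 if a message was taken, EAGAIN if the
 * queue was empty, ETIMEDOUT if the mutex could not be acquired within
 * patience_ms (the model's stand-in for "blocked indefinitely").
 */
static int nonblocking_read(struct model_sock *s, long patience_ms)
{
    struct timespec deadline;
    int rc;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += patience_ms / 1000;
    deadline.tv_nsec += (patience_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    rc = pthread_mutex_timedlock(&s->readlock, &deadline);
    if (rc != 0)
        return rc;

    rc = (sem_trywait(&s->messages) == 0) ? 0 : EAGAIN;
    pthread_mutex_unlock(&s->readlock);
    return rc;
}

/* Runs one attempt while the blocking reader sleeps and one after it left. */
int run_model(int *while_reader_blocked, int *after_reader_done)
{
    struct model_sock s;
    pthread_t t;

    pthread_mutex_init(&s.readlock, NULL);
    sem_init(&s.messages, 0, 0);
    sem_init(&s.reader_in, 0, 0);

    if (pthread_create(&t, NULL, blocking_reader, &s) != 0)
        return -1;
    sem_wait(&s.reader_in);            /* reader now holds the mutex */

    *while_reader_blocked = nonblocking_read(&s, 200);

    sem_post(&s.messages);             /* "send" one message to free the reader */
    pthread_join(t, NULL);
    *after_reader_done = nonblocking_read(&s, 200);

    sem_destroy(&s.reader_in);
    sem_destroy(&s.messages);
    pthread_mutex_destroy(&s.readlock);
    return 0;
}
```

Calling `run_model(&a, &b)` from a small main (built with `-pthread`) should leave `a` at ETIMEDOUT and `b` at EAGAIN: the second value is what a non-blocking read on an empty queue is supposed to produce, the first one is what it gets while a blocking reader owns the mutex.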
As I wrote in the other posting: While I'd rate the code above as a
'contrived example showing a theoretical problem', the issue is different
for the 'recv with MSG_DONTWAIT' case: Since 'blocking' or
'non-blocking' semantics can be demanded with a 'granularity' of
individual recv calls, it is perfectly reasonable to expect that the
blocking ones will potentially block and the non-blocking ones won't. As
it stands, the actual behaviour of an individual call with MSG_DONTWAIT
set is effectively unpredictable unless it is certain that only one
thread of execution attempts receive operations on the socket or that
only non-blocking receives are attempted. This could be documented as
the usual 'in case ..., the behaviour is undefined', the usual fig leaf
for "the implementation doesn't handle this case sensibly", but it
isn't.

This is also specifically a 'feature' of the AF_UNIX socket
implementation (and reportedly, AF_INET, too). Other 'things' capable of
supporting non-blocking I/O, e.g., pipes (tested), behave as expected[*]:
The blocking call blocks, the other doesn't. The reason for this is that
the pipe mutex is released prior to blocking a blocking reader,
something which cannot 'easily' be done for AF_UNIX sockets because the
lock exists in the 'AF_UNIX layer' and the blocking wait is done with a
general 'datagram socket function' blissfully unaware of that.

[*] The pipe_read implementation in pipe.c (3.2.54) actually contains
the following comment:

	if (!pipe->waiting_writers) {
		/* syscall merging: Usually we must not sleep
		 * if O_NONBLOCK is set, or if we got some data.
		 * But if a writer sleeps in kernel space, then
		 * we can wait for that data without violating POSIX.

The kernel seems to disagree with itself on that (or, more likely, the
guy who wrote the pipe code was a little more far-sighted [or
experienced] than the guy who wrote the AF_UNIX code and thus didn't
have to have the issue pointed out to him in a 'politically unwelcome
way', namely, by me).
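The pipe versus AF_UNIX comparison can be turned into a small probe. This is a sketch under assumptions, not the code that was actually tested: `struct probe_result`, `probe`, `probe_socket` and `probe_pipe` are names made up here, a socketpair is used instead of a bound socket to avoid a filesystem path, and the child arms alarm(2) so that it dies on its own after two seconds. What the socket probe reports depends on the running kernel, so the code only measures and does not assume a result.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result type for this illustration. */
struct probe_result {
    int err;          /* errno of the non-blocking attempt, 0 if data arrived */
    long elapsed_ms;  /* how long the "non-blocking" call actually took */
};

/*
 * fds[0] is read by both processes; fds[1] is only kept open so that a
 * pipe never reports end-of-file. Nothing is ever written.
 */
static int probe(int fds[2], int is_socket, struct probe_result *out)
{
    struct timespec t0, t1;
    char buf[16];
    ssize_t n;
    pid_t pid;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        alarm(2);                    /* default SIGALRM action ends the child */
        if (read(fds[0], buf, sizeof(buf)) < 0)   /* blocking read */
            _exit(1);
        _exit(0);
    }

    sleep(1);                        /* give the child time to block */

    if (!is_socket)                  /* pipes have no per-call flag */
        fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (is_socket)
        n = recv(fds[0], buf, sizeof(buf), MSG_DONTWAIT);
    else
        n = read(fds[0], buf, sizeof(buf));
    out->err = (n == -1) ? errno : 0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    out->elapsed_ms = (t1.tv_sec - t0.tv_sec) * 1000L
                    + (t1.tv_nsec - t0.tv_nsec) / 1000000L;

    kill(pid, SIGKILL);              /* harmless if the alarm already fired */
    waitpid(pid, NULL, 0);
    close(fds[0]);
    close(fds[1]);
    return 0;
}

/* Blocking reader plus recv(..., MSG_DONTWAIT) on an AF_UNIX datagram socket. */
int probe_socket(struct probe_result *out)
{
    int fds[2];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) == -1)
        return -1;
    return probe(fds, 1, out);
}

/* Blocking reader plus O_NONBLOCK read on a pipe. */
int probe_pipe(struct probe_result *out)
{
    int fds[2];

    if (pipe(fds) == -1)
        return -1;
    return probe(fds, 0, out);
}
```

Both probes should end with EAGAIN. For the pipe, `elapsed_ms` should be close to zero. For the socket, a value near 1000 would mean the MSG_DONTWAIT call sat behind the blocking reader until the child's alarm went off, which is the behaviour described above; a value near zero would mean the kernel the probe ran on does not show it.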
> What are you trying to do by reading from the same socket in two processes,
> especially when you change the file descriptor status in the middle of the
> operations? Both are very unusual.

In this case, nothing. The code was supposed to demonstrate a property
of the implementation I'd consider to be not in line with the documented
behaviour of said implementation.