From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: Buffer contents well-defined after fgets() reaches EOF ?
Date: Sat, 15 Feb 2025 08:37:20 -0800
Message-ID: <864j0vyxj3.fsf@linuxsc.com>

Michael S writes:

> On Thu, 13 Feb 2025 07:14:28 -0800
> Tim Rentsch wrote:
>
>> Michael S writes:
>>
>>> On Sun, 9 Feb 2025 17:22:43 -0800
>>> Andrey Tarasevich wrote:
>>>
>>>> On Sun 2/9/2025 5:06 PM, Andrey Tarasevich wrote:
>>>>
>>>>> On Sun 2/9/2025 3:52 PM, Lawrence D'Oliveiro wrote:
>>>>>
>>>>>> On Sat, 8 Feb 2025 23:12:44 -0800, Andrey Tarasevich wrote:
>>>>>>
>>>>>>> If `fgets` reads nothing (instant end-of-file), the entire
>>>>>>> buffer remains untouched.
>>>>>>
>>>>>> You mean, only a single null byte gets written.
>>>>>
>>>>> No. The buffer is not changed at all in such case.
>>>>
>>>> ... which actually raises an interesting quiz/puzzle/question:
>>>>
>>>> Under what circumstances `fgets` is expected to return an empty
>>>> string? (I.e. set the [0] entry of the buffer to '\0' and return
>>>> non-null)?
>>>>
>>>> The only answer I can see right away is:
>>>>
>>>> When one calls it as `fgets(buffer, 1, file)`, i.e. asks it to
>>>> read 0 characters.
>>>>
>>>> This is under assumption that asking `fgets` to read 0 characters
>>>> is supposed to prevent it from detecting end-of-file condition or
>>>> I/O error condition. One can probably do some nitpicking at the
>>>> current wording... but I believe the above is the intent.
>>>
>>> fgets() is one of many poorly defined standard library functions
>>> inherited from early UNIX days. [...]
>>
>> What about the fgets() function do you think is poorly defined?
>>
>> Second question: by "poorly defined" do you mean "defined
>> wrongly" or "defined ambiguously" (or both)?
>
> For starter, it looks like designers of fgets() did not believe in
> their own motto about files being just streams of bytes.
> I don't know the history, so, may be, the function was defined this way
> for portability with systems where text files have special record-based
> structure?
>
> Then, everything about it feels inelegant.
> A return value carries just 1 bit of information, success or failure.
> So why did they encode this information in baroque way instead of
> something obvious, 0 and 1?
> Appending zero at the end also feels like a hack, but it is necessary
> because of the main problem. And the main problem is: how the user is
> supposed to figure out how many bytes were read?
> In well-designed API this question should be answered in O(1) time.
> With fgets(), it can be answered in O(N) time when input is trusted to
> contain no zeros. When input is arbitrary, finding out the answer is
> even harder and requires quirks.

If I understand you correctly your complaint is that the existing
semantics are not as useful as you would like them to be, even though
the current definition does make the behavior well defined. Is that
right?

Clearly using fgets() is problematic when the input stream might
contain null characters.
To me it seems obvious that the original implementors expected
that fgets() would not be used in such cases, perhaps with the
less severe restriction that the presence of embedded nulls could
be detected and simply rejected as bad input, much the same as
overly long lines or a final line without a terminating newline
character.
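
To illustrate the "detect and reject" idea, here is a minimal sketch, not a definitive implementation. The names read_checked_line and enum line_status are hypothetical, invented for this example. It relies only on the fact that fgets() stops reading after a newline, so a newline can only be the last character read. If the byte just before the first null is not a newline, the line is too long, is an unterminated final line, or contains an embedded null, and all three cases are rejected alike.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not part of any standard API. */
enum line_status { LINE_OK, LINE_EOF, LINE_BAD };

/*
 * Read one line with fgets() and accept it only if the string it
 * produced ends in '\n'.  On return other than LINE_EOF, *len holds
 * strlen(buf); for LINE_OK that is the number of bytes read,
 * newline included.
 */
static enum line_status
read_checked_line(char *buf, int size, FILE *fp, size_t *len)
{
    assert(buf != NULL && len != NULL && fp != NULL && size >= 2);

    if (fgets(buf, size, fp) == NULL)
        return LINE_EOF;   /* end of file or read error; see ferror() */

    /* strlen() stops at the first null, embedded or terminating. */
    *len = strlen(buf);

    /*
     * A newline is always the last character fgets() reads, so it can
     * sit just before the first null only when that null is the real
     * terminator.  Anything else is rejected: an overly long line, a
     * final line without a newline, or a line with an embedded null.
     */
    if (*len > 0 && buf[*len - 1] == '\n')
        return LINE_OK;
    return LINE_BAD;
}
```

A caller would loop until LINE_EOF and treat LINE_BAD as malformed input. Note that this sketch makes no assumption about what fgets() leaves in the part of the buffer past the terminating null.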