Inconsistent line counts from 3 methods

Started by	DFS <nospam@dfs.com>
First post	2020-10-10 22:37 -0400
Last post	2020-10-20 15:48 +0100
Articles	20 on this page of 47 — 14 participants

  Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-10 22:37 -0400
    Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-10 22:06 -0700
      Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 10:38 -0400
        Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-11 15:36 +0000
          Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 13:51 -0400
            Re: Inconsistent line counts from 3 methods Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 18:33 +0000
              Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:20 -0400
                Re: Inconsistent line counts from 3 methods Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 19:40 +0000
                  Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:47 -0400
                    Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 16:35 -0400
                    Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 21:13 +0000
                      Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) DFS <nospam@dfs.com> - 2020-10-11 18:45 -0400
                        Re: NNTP message requirements Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:11 -0700
                Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 16:27 -0400
                  Re: Inconsistent line counts from 3 methods Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-10-11 23:30 +0100
                    Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 23:56 -0400
            Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 14:53 -0400
              Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:15 -0400
              Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-14 20:08 +0000
                Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-14 16:58 -0400
                  Re: Inconsistent line counts from 3 methods Eli the Bearded <*@eli.users.panix.com> - 2020-10-14 23:37 +0000
                    Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-14 17:25 -0700
                      Re: Inconsistent line counts from 3 methods Eli the Bearded <*@eli.users.panix.com> - 2020-10-15 01:55 +0000
                    Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 19:19 +0000
                  Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 19:10 +0000
                    Re: Inconsistent line counts from 3 methods Kaz Kylheku <793-849-0957@kylheku.com> - 2020-10-17 19:36 +0000
            Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-14 20:16 +0000
          Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-11 11:36 -0700
            Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 15:12 -0400
      Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 12:16 -0400
    Re: Inconsistent line counts from 3 methods Johann Klammer <klammerj@NOSPAM.a1.net> - 2020-10-11 15:18 +0200
      Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-11 14:31 +0000
      Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-11 11:31 -0700
      Re: Inconsistent line counts from 3 methods Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-10-11 23:15 +0100
    Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 14:00 -0700
      Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 17:47 -0400
        Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:26 -0700
          Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-12 13:11 -0400
            Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-12 10:56 -0700
              Re: Inconsistent line counts from 3 methods Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-11-29 00:21 -0800
            Re: Inconsistent line counts from 3 methods scott@slp53.sl.home (Scott Lurndal) - 2020-10-12 19:19 +0000
              Re: Inconsistent line counts from 3 methods dfs <nospam@dfs.com> - 2020-10-12 18:53 -0400
                Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 23:09 +0000
                  Re: Inconsistent line counts from 3 methods Bart <bc@freeuk.com> - 2020-10-18 00:24 +0100
                    Re: Inconsistent line counts from 3 methods Kaz Kylheku <793-849-0957@kylheku.com> - 2020-10-18 16:56 +0000
                    Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-20 09:17 -0400
                      Re: Inconsistent line counts from 3 methods Bart <bc@freeuk.com> - 2020-10-20 15:48 +0100

#155671

From	Eli the Bearded <*@eli.users.panix.com>
Date	2020-10-14 23:37 +0000
Message-ID	<eli$2010141937@qaz.wtf>
In reply to	#155668

In comp.lang.c, James Kuyper  <jameskuyper@alumni.caltech.edu> wrote:
> On 10/14/20 4:08 PM, Jorgen Grahn wrote:
>> On Sun, 2020-10-11, James Kuyper wrote:
>>> As a result, if you do put one in, you'll often end
>>> up with two newlines at the end of the file.
>> /That/ is something I have never seen.  Which tools do that? Sounds
>> like a bug to me -- and one that's easily fixed.

As I read the exchange there I imagined James describing an editor
that adds an extra blank line to all files without one. But the demo
here shows something slightly different:

> I opened a new file with vi, and hit the following keys:
> 
> 	i 1 Enter Esc : x
> 
> Here's what I see in the resulting file:
> 
> ~(48) od -a linetest
> 0000000   1  nl  nl
> 0000003

This seems like a misdescription of what the old Unix standard editors
do. For better or worse, ed, vi, and (I think) emacs all treat a new
empty file as being just a new line. What you did is add a second line
there (that "Enter") and so you now have a two line file. The editor
(vi) doesn't let you explicitly modify the final newline. It just _is_.

With vim, you can disable the implicit final newline with
"set nofixendofline", but there are Unix tools that might not like such
files. As one example, I seem to recall no final newline causing a final
"line" to be silently discarded in /etc/sudoers (and files included by
that).

Elijah
------
though sudoers syntax is a right mess from beginning to end

#155672

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-10-14 17:25 -0700
Message-ID	<87mu0o49o7.fsf@nosuchdomain.example.com>
In reply to	#155671

Eli the Bearded <*@eli.users.panix.com> writes:
> In comp.lang.c, James Kuyper  <jameskuyper@alumni.caltech.edu> wrote:
>> On 10/14/20 4:08 PM, Jorgen Grahn wrote:
>>> On Sun, 2020-10-11, James Kuyper wrote:
>>>> As a result, if you do put one in, you'll often end
>>>> up with two newlines at the end of the file.
>>> /That/ is something I have never seen.  Which tools do that? Sounds
>>> like a bug to me -- and one that's easily fixed.
>
> As I read the exchange there I imagined James describing an editor
> that adds an extra blank line to all files without one. But the demo
> here shows something slightly different:
>
>> I opened a new file with vi, and hit the following keys:
>> 
>> 	i 1 Enter Esc : x
>> 
>> Here's what I see in the resulting file:
>> 
>> ~(48) od -a linetest
>> 0000000   1  nl  nl
>> 0000003
>
> This seems like a misdescription of what the old Unix standard editors
> do. For better or worse, ed, vi, and (I think) emacs all treat a new
> empty file as being just a new line. What you did is add a second line
> there (that "Enter") and so you now have a two line file. The editor
> (vi) doesn't let you explicitly modify the final newline. It just _is_.

If I open a new file with vim and save it without entering anything, I
get an empty file, which is a valid text file with 0 lines.

If I do the same thing with busybox vi, I get a file with a single
newline, which seems wrong.

As for the behavior James observed, if I type

    i 1 Esc

I get a '1' character followed by a single newline, because vim assumes
that I meant to have a newline-terminated line.  If I type

    i 1 Enter Esc

I get a '1' character followed by two newlines -- i.e., a 2-line text
file whose second line is empty.

vim has options to handle non-empty files without a trailing newline,
but it doesn't make it easy to create them.  (It may have an option to
do so, but I haven't bothered to find it.)

> With vim, you can disable the implicit final newline with
> "set nofixendofline", but there are Unix tools that might not like such
> files. As one example, I seem to recall no final newline causing a final
> "line" to be silently discarded in /etc/sudoers (and files included by
> that).
>
> Elijah
> ------
> though sudoers syntax is a right mess from beginning to end

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#155674

From	Eli the Bearded <*@eli.users.panix.com>
Date	2020-10-15 01:55 +0000
Message-ID	<eli$2010142128@qaz.wtf>
In reply to	#155672

In comp.lang.c, Keith Thompson  <Keith.S.Thompson+u@gmail.com> wrote:
> If I open a new file with vim and save it without entering anything, I
> get an empty file, which is a valid text file with 0 lines.

Same happens in nvi and ed (tried with /bin/ed on NetBSD).

> If I do the same thing with busybox vi, I get a file with a single
> newline, which seems wrong.

That does seem wrong.

> As for the behavior James observed, if I type
> 
>     i 1 Esc
> 
> I get a '1' character followed by a single newline, because vim assumes
> that I meant to have a newline-terminated line.  If I type

So I compiled and tested ex-1.1 from https://github.com/n-t-roff/ex-1.1
As ex, saving an empty file creates a zero-length file. As vi, *you
cannot edit a zero-length file*.

$ HOME=/var/tmp/ ./a.out /var/tmp/exds
"/var/tmp/exds" No such file or directory
:vi

No lines in the buffer
:a
.
:vi

No lines in the buffer
:a
1
.
:vi
	(succeeds)
q:w
:q
"/var/tmp/exds" 1 line, 2 characters
$

(ex-1.1 has what is recognizable as vi, but it has significant
shortcomings like "no colon commands in visual mode" and "can only access
visual mode after starting the editor".)

>     i 1 Enter Esc
> 
> I get a '1' character followed by two newlines -- i.e., a 2-line text
> file whose second line is empty.

Yes. This is the implicit newline I spoke of. vi starts with a single
empty line; saving a file with nothing on that single empty line
is a special case for vi. For ed and ex you can't enter a partial line.
/bin/ed gives an error if you try to <ctrl-d> EOF a partial line.
ex (nex Version 1.81.6-2013-11-20nb4) and ex-1.1 ignored <ctrl-d> EOF
until I got tired of testing (more than ten, but less than 26, which I
recall being the special count of them in csh when ignoreeof is set).

> vim has options to handle non-empty files without a trailing newline,
> but it doesn't make it easy to create them.  (It may have an option to
> do so, but I haven't bothered to find it.)

Yeah, it's antithetical to vi, and vim follows vi. But it can be done:

	i 1 Esc :set noendofline nofixendofline | x Enter

Elijah
------
to be honest, had cloned and compiled ex-1.1 several weeks ago

#155733

From	Jorgen Grahn <grahn+nntp@snipabacken.se>
Date	2020-10-17 19:19 +0000
Message-ID	<slrnromgt5.1hpq.grahn+nntp@frailea.sa.invalid>
In reply to	#155671

On Wed, 2020-10-14, Eli the Bearded wrote:
...
> With vim, you can disable the implicit final newline with
> "set nofixendofline", but there are Unix tools that might not like such
> files. As one example, I seem to recall no final newline causing a final
> "line" to be silently discarded in /etc/sudoers (and files included by
> that).

Sounds like a bug to be fixed.

There is also the example I gave upthread, which cannot be fixed:
cat foo bar.  cat(1) itself has no business trying to interpret its
input files as lines of text, but as a user you don't expect it to jam
the last line of foo together with the first line of bar -- if you
think of these as text files.
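
A minimal C sketch of the effect described above. The helper names
count_lines and concat_bytes are hypothetical, the two string arguments
stand in for the contents of foo and bar, and nothing here is taken from
cat(1) itself; it only assumes that cat copies bytes without looking at
them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of lines in buf[0..n), counting a final
   fragment that lacks a terminating '\n' as one more line. */
static size_t count_lines(const char *buf, size_t n)
{
    size_t lines = 0;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\n')
            lines++;
    if (n > 0 && buf[n - 1] != '\n')
        lines++;
    return lines;
}

/* Byte-wise concatenation of two strings into out, which must have room
   for strlen(a) + strlen(b) + 1 bytes.  Returns the combined length. */
static size_t concat_bytes(char *out, const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);

    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);   /* the + 1 copies b's terminating '\0' */
    return la + lb;
}
```

With "one\ntwo" standing in for foo (2 lines by this count) and
"three\n" for bar (1 line), the concatenation is "one\ntwothree\n",
which counts as 2 lines rather than 3.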

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .

#155732

From	Jorgen Grahn <grahn+nntp@snipabacken.se>
Date	2020-10-17 19:10 +0000
Message-ID	<slrnromgd7.1hpq.grahn+nntp@frailea.sa.invalid>
In reply to	#155668

On Wed, 2020-10-14, James Kuyper wrote:
> On 10/14/20 4:08 PM, Jorgen Grahn wrote:
>> On Sun, 2020-10-11, James Kuyper wrote:
>>> On 10/11/20 1:51 PM, DFS wrote:
>>> ...
>>>> I'm not talking about IDEs.  I'm talking about writing in general.  I 
>>>> would guess many people write their last line and end it with a period, 
>>>> not a period and return.
>>>
>>> Text file editors (including some IDEs) often put in a final return,
>>> even if you don't.
>> 
>> E.g. Emacs and vi in default configurations.
>> 
>>> As a result, if you do put one in, you'll often end
>>> up with two newlines at the end of the file.
>> 
>> /That/ is something I have never seen.  Which tools do that? Sounds
>> like a bug to me -- and one that's easily fixed.
>
> I opened a new file with vi, and hit the following keys:
>
> 	i 1 Enter Esc : x
>
> Here's what I see in the resulting file:
>
> ~(48) od -a linetest
> 0000000   1  nl  nl
> 0000003

Wow, you're right! I get that behavior with both nvi and the one in
OpenBSD; perhaps it's mandated by POSIX?

I use vi frequently for smaller tasks, but never noticed. Partly
because I tend to edit existing files, and maybe partly because I
unconsciously see that final newline and don't add one.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .

#155735

From	Kaz Kylheku <793-849-0957@kylheku.com>
Date	2020-10-17 19:36 +0000
Message-ID	<20201017122145.940@kylheku.com>
In reply to	#155732

On 2020-10-17, Jorgen Grahn <grahn+nntp@snipabacken.se> wrote:
> On Wed, 2020-10-14, James Kuyper wrote:
>> On 10/14/20 4:08 PM, Jorgen Grahn wrote:
>>> On Sun, 2020-10-11, James Kuyper wrote:
>>>> On 10/11/20 1:51 PM, DFS wrote:
>>>> ...
>>>>> I'm not talking about IDEs.  I'm talking about writing in general.  I 
>>>>> would guess many people write their last line and end it with a period, 
>>>>> not a period and return.
>>>>
>>>> Text file editors (including some IDEs) often put in a final return,
>>>> even if you don't.
>>> 
>>> E.g. Emacs and vi in default configurations.
>>> 
>>>> As a result, if you do put one in, you'll often end
>>>> up with two newlines at the end of the file.
>>> 
>>> /That/ is something I have never seen.  Which tools do that? Sounds
>>> like a bug to me -- and one that's easily fixed.
>>
>> I opened a new file with vi, and hit the following keys:
>>
>> 	i 1 Enter Esc : x
>>
>> Here's what I see in the resulting file:
>>
>> ~(48) od -a linetest
>> 0000000   1  nl  nl
>> 0000003
>
> Wow, you're right! I get that behavior with both nvi and the one in
> OpenBSD; perhaps it's mandated by POSIX?

No no no.

It's mandated by the fact that you created two lines.

If you want to create a file which contains just one line, you do this:

  i1Esc:x<Enter>

By entering material into the previously empty file, you have already
created a line.  That is to say, the moment you hit i to begin
inserting into the empty file, line 1 is allocated for you to type into.
If you then hit Enter while still in insert mode, you're creating an
additional line.

This is perfectly clear from the traditional ~ (tilde) marks that run
down the left column, indicating the nonexistent lines past the end of
the buffer.

If you type in the line as I show above, you will see that there is a
tilde just below the 1.  If you hit Enter before hitting Esc, then
you see a blank line between the 1 and the first tilde.

This is all reasonably intuitive.

The slightly counter-intuitive part is that a line is created when you
hit i. That's obviously a special case for the empty buffer situation;
that command will not create a line if the cursor is already on a line,
which is always the case if the buffer is not empty.

POSIX has these words, interestingly:

    Initialization in ex and vi

    Historically, vi always had a line in the edit buffer, even if the edit
    buffer was "empty". For example:

    1. The ex command = executed from visual mode wrote "1" when the buffer
       was empty.

    2. Writes from visual mode of an empty edit buffer wrote files of a
       single character (a <newline>), while writes from ex mode of an
       empty edit buffer wrote empty files.

    3. Put and read commands into an empty edit buffer left an empty line
       at the top of the edit buffer.

    For consistency, POSIX.1-2017 does not permit any of these behaviors.

Those are just interesting historic remarks I found; I tried looking for
a concrete specification of behavior for i (insert before cursor) when
the buffer is empty. I didn't find anything and don't have time at the
moment to go through it in more detail. The observed behavior in various
vi implementations makes a lot of sense, though.

-- 
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List:  http://www.kylheku.com/diy
ADA MP-1 Mailing List:   http://www.kylheku.com/mp1

#155666

From	Jorgen Grahn <grahn+nntp@snipabacken.se>
Date	2020-10-14 20:16 +0000
Message-ID	<slrnroen59.1hpq.grahn+nntp@frailea.sa.invalid>
In reply to	#155521

On Sun, 2020-10-11, DFS wrote:
> On 10/11/2020 11:36 AM, Jorgen Grahn wrote:
>> On Sun, 2020-10-11, DFS wrote:
>> ...
>>> I see fgets() is more "reliable" than f/getc in case your final line is
>>> missing a newline (which I would bet happens frequently):
>> 
>> In the past, on Unix, it used to happen very infrequently, but it
>> seems recent IDEs generate "endless" text files by default.
>> 
>> I have yet to figure out why they do this.  The only effect is a
>> saving of one byte, and that a lot of traditional tools break in subtle
>> ways[1] ... but a conspiracy against Unix seems unlikely.
>
>
> I'm not talking about IDEs.  I'm talking about writing in general.

Which may involve an IDE.  But I didn't mean to talk about IDEs
either. My main message was, on Unix I would expect few text files to
have a missing final newline.

This may of course not be useful information to anyone, and I don't
claim to know what traditions, if any, exist in other communities
which use text files.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .

#155524

From	Barry Schwarz <schwarzb@delq.com>
Date	2020-10-11 11:36 -0700
Message-ID	<ktj6oft4i3mmrn4beslolt3vi6ncqfqpi1@4ax.com>
In reply to	#155516

On 11 Oct 2020 15:36:18 GMT, Jorgen Grahn <grahn+nntp@snipabacken.se>
wrote:

>On Sun, 2020-10-11, DFS wrote:
>...
>> I see fgets() is more "reliable" than f/getc in case your final line is 
>> missing a newline (which I would bet happens frequently):
>
>In the past, on Unix, it used to happen very infrequently, but it
>seems recent IDEs generate "endless" text files by default.
>
>I have yet to figure out why they do this.  The only effect is a
>saving of one byte, and that a lot of traditional tools break in subtle
>ways[1] ... but a conspiracy against Unix seems unlikely.
>
>> "fgets() stops when either (n-1) characters are read, the newline 
>> character is read, or the end-of-file is reached, whichever comes first."
>
>What do you mean?  All the functions you list give you all information
>available; they all seem reliable.
>
>(fgets() can't handle '\0' characters, but that's a separate thing.)

There is nothing in the description of fgets that mentions any issue
with the \0 character.  fgets will stop reading after the appropriate
number of characters or reading the \n, whichever comes first.  \0
characters will be stored in the array just like any other character. 
If someone were to process the array as a string, the \0 would mess
things up but that is not an fgets issue and not relevant to the
problem under discussion.
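
A minimal sketch of that distinction, under stated assumptions: the
function name read_line_with_nul is hypothetical, and the stream comes
from tmpfile(), which opens a binary stream, so the null character is
expected to survive the round trip. fgets() stores every byte up to and
including the '\n'; only string functions applied afterwards stop early.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the six bytes 'a' 'b' '\0' 'c' 'd' '\n' to a temporary file and
   reads them back with fgets().  Returns 0 on success, -1 on any I/O
   failure. */
static int read_line_with_nul(char *buf, int size)
{
    static const char data[] = { 'a', 'b', '\0', 'c', 'd', '\n' };
    FILE *fp = tmpfile();
    int ok;

    assert(buf != NULL && size > 0);
    if (fp == NULL)
        return -1;
    ok = fwrite(data, 1, sizeof data, fp) == sizeof data
         && fseek(fp, 0L, SEEK_SET) == 0
         && fgets(buf, size, fp) != NULL;
    fclose(fp);
    return ok ? 0 : -1;
}
```

After a successful call with a 16-byte buffer, strlen(buf) is 2, yet
buf[3], buf[4] and buf[5] hold 'c', 'd' and '\n': the data is all there,
and it is the string view of the array that is cut short.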

-- 
Remove del for email

#155527

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2020-10-11 15:12 -0400
Message-ID	<rlvlff$trg$1@dont-email.me>
In reply to	#155524

On 10/11/20 2:36 PM, Barry Schwarz wrote:
> On 11 Oct 2020 15:36:18 GMT, Jorgen Grahn <grahn+nntp@snipabacken.se>
> wrote:
> 
>> On Sun, 2020-10-11, DFS wrote:
>> ...
>>> I see fgets() is more "reliable" than f/getc in case your final line is 
>>> missing a newline (which I would bet happens frequently):
>>
>> In the past, on Unix, it used to happen very infrequently, but it
>> seems recent IDEs generate "endless" text files by default.
>>
>> I have yet to figure out why they do this.  The only effect is a
>> saving of one byte, and that a lot of traditional tools break in subtle
>> ways[1] ... but a conspiracy against Unix seems unlikely.
>>
>>> "fgets() stops when either (n-1) characters are read, the newline 
>>> character is read, or the end-of-file is reached, whichever comes first."
>>
>> What do you mean?  All the functions you list give you all information
>> available; they all seem reliable.
>>
>> (fgets() can't handle '\0' characters, but that's a separate thing.)
> 
> There is nothing in the description of fgets that mentions any issue
> with the \0 character.  fgets will stop reading after the appropriate
> number of characters or reading the \n, whichever comes first.  \0
> characters will be stored in the array just like any other character. 
> If someone were to process the array as a string, the \0 would mess
> things up but that is not an fgets issue and not relevant to the
> problem under discussion.

It's not specific to fgets(), it's a property of text streams.

"... Data read in from a text stream will necessarily compare equal to
the data that were earlier written out to that stream only if: the data
consist only of printing characters and the control characters
horizontal tab and new-line; no new-line character is immediately
preceded by space characters; and the last character is a new-line
character. ..." (7.21.2p2).

Data that contains a null character doesn't qualify for the above
guarantee, which gives implementations permission to (among many other
possibilities) drop such characters while either writing to or reading
from a text stream.

"Data read in from a binary stream shall compare equal to the data that
were earlier written out to that stream, under the same implementation."
(7.21.2p3).
Therefore, null characters should not be a problem for fgets() when
reading from a binary stream. However, on systems where lines are
indicated by methods other than a newline at the end of each line,
fgets() on a binary stream won't work the way you probably want it to.

#155518

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2020-10-11 12:16 -0400
Message-ID	<rlvb43$emn$1@dont-email.me>
In reply to	#155504

On 10/11/20 1:06 AM, Barry Schwarz wrote:
> On Sat, 10 Oct 2020 22:37:35 -0400, DFS <nospam@dfs.com> wrote:
...
>>  FILE *fin = fopen(argv[1],"r");
>>  fseek(fin, 0, SEEK_END);
>>  int buffer = ftell(fin);	
>>  char *myStr = malloc(sizeof(char) * (buffer + 1));
>>  rewind(fin);
>>  fread(myStr, sizeof(char), buffer, fin);
...
> Alternately, you could open the file in binary mode instead of text
> mode.  ftell should work then.

In text mode, whatever implementation-specific method is used to
identify lines is converted into a single '\n' at the end of each line.

In binary mode, no such conversion will be done. On platforms where lines
are terminated by a sequence such as '\n\r' or '\r\n', counting
'\n' characters without also checking for an '\r' in the correct
location may overcount the number of lines. On platforms where lines are
marked by methods that make no use of '\n' characters, it will
undercount them.
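
One way to count line terminators in data read from a binary stream can
be sketched as follows. This is an illustration under a stated
assumption, not a portable solution: it assumes the only conventions in
play are "\r\n", a lone '\n', and a lone '\r', and the helper name
count_line_ends is hypothetical. Platforms that mark lines by other
means (record lengths, for example) are out of its reach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts line terminators in buf[0..n), treating
   "\r\n" as a single terminator and a lone '\r' or lone '\n' as one
   terminator each. */
static size_t count_line_ends(const unsigned char *buf, size_t n)
{
    size_t lines = 0;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\r') {
            lines++;
            if (i + 1 < n && buf[i + 1] == '\n')
                i++;            /* skip the '\n' of a "\r\n" pair */
        } else if (buf[i] == '\n') {
            lines++;
        }
    }
    return lines;
}
```

The bytes "a\r\nb\r\n", "a\nb\n" and "a\rb\r" each count as 2 line
terminators with this helper.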

#155513

From	Johann Klammer <klammerj@NOSPAM.a1.net>
Date	2020-10-11 15:18 +0200
Message-ID	<rlv0mo$296$1@gioia.aioe.org>
In reply to	#155503

On 10/11/2020 04:37 AM, DFS wrote:
> $ countlines war_peace.txt
> fread-var: 66875  off by 1183
> fgetc    : 65692  correct
> fgets    : 65692  correct
> 
> 
> $ countlines bible.txt
> fread-var: 31255  off by 153
> fgetc    : 31101  off by 1
> fgets    : 31102  correct
> 
> 
> ======================================================
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main(int argc, char *argv[])
> {
>                 
>  char line[600] = "";
>  char c;
> 
>  // use fread to populate a variable
>  // open file, go to end, get size, allocate memory, back to
>  // beginning, read contents into variable
>  FILE *fin = fopen(argv[1],"r");
>  fseek(fin, 0, SEEK_END);
>  int buffer = ftell(fin);   
>  char *myStr = malloc(sizeof(char) * (buffer + 1));
>  rewind(fin);
>  fread(myStr, sizeof(char), buffer, fin);

Here, I don't see where you terminate the string.
malloc returns uninitialized memory, I think.
And fread just reads a chunk of data.
Use calloc or do a 
myStr[buffer]='\0';
before fread.

#155514

From	Jorgen Grahn <grahn+nntp@snipabacken.se>
Date	2020-10-11 14:31 +0000
Message-ID	<slrnro65qa.1hpq.grahn+nntp@frailea.sa.invalid>
In reply to	#155513

On Sun, 2020-10-11, Johann Klammer wrote:
> malloc returns uninitialized memory, I think.

True, it does.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .

#155522

From	Barry Schwarz <schwarzb@delq.com>
Date	2020-10-11 11:31 -0700
Message-ID	<ebj6ofhp65rko2a372a58qirpbmbk0utgo@4ax.com>
In reply to	#155513

On Sun, 11 Oct 2020 15:18:02 +0200, Johann Klammer
<klammerj@NOSPAM.a1.net> wrote:

>On 10/11/2020 04:37 AM, DFS wrote:
>> $ countlines war_peace.txt
>> fread-var: 66875  off by 1183
>> fgetc    : 65692  correct
>> fgets    : 65692  correct
>> 
>> 
>> $ countlines bible.txt
>> fread-var: 31255  off by 153
>> fgetc    : 31101  off by 1
>> fgets    : 31102  correct
>> 
>> 
>> ======================================================
>> #include <stdio.h>
>> #include <stdlib.h>
>> 
>> int main(int argc, char *argv[])
>> {
>>                 
>>  char line[600] = "";
>>  char c;
>> 
>>  // use fread to populate a variable
>>  // open file, go to end, get size, allocate memory, back to
>>  // beginning, read contents into variable
>>  FILE *fin = fopen(argv[1],"r");
>>  fseek(fin, 0, SEEK_END);
>>  int buffer = ftell(fin);   
>>  char *myStr = malloc(sizeof(char) * (buffer + 1));
>>  rewind(fin);
>>  fread(myStr, sizeof(char), buffer, fin);
>
>Here, I don't see where you terminate the string.
>malloc returns uninitialized memory, I think.
>And fread just reads a chunk of data.
>Use calloc or do a 
>myStr[buffer]='\0';
>before fread.

calloc would prevent him from seeing any residual \n characters in the
buffer.  This would eliminate the extra lines his program counted but
will do nothing to solve the undercount if the last line does not end
with a \n.

Setting myStr[buffer] does nothing to solve any of the issues
reported.  The fact that he does not terminate the array with a '\0'
is irrelevant since he is not processing strings or using any
functions that do.

-- 
Remove del for email

#155544

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2020-10-11 23:15 +0100
Message-ID	<87v9fg8l4u.fsf@bsb.me.uk>
In reply to	#155513

Johann Klammer <klammerj@NOSPAM.a1.net> writes:

> On 10/11/2020 04:37 AM, DFS wrote:
>> $ countlines war_peace.txt
>> fread-var: 66875  off by 1183
>> fgetc    : 65692  correct
>> fgets    : 65692  correct
>> 
>> 
>> $ countlines bible.txt
>> fread-var: 31255  off by 153
>> fgetc    : 31101  off by 1
>> fgets    : 31102  correct
>> 
>> 
>> ======================================================
>> #include <stdio.h>
>> #include <stdlib.h>
>> 
>> int main(int argc, char *argv[])
>> {
>>                 
>>  char line[600] = "";
>>  char c;
>> 
>>  // use fread to populate a variable
>>  // open file, go to end, get size, allocate memory, back to
>>  // beginning, read contents into variable
>>  FILE *fin = fopen(argv[1],"r");
>>  fseek(fin, 0, SEEK_END);
>>  int buffer = ftell(fin);   
>>  char *myStr = malloc(sizeof(char) * (buffer + 1));
>>  rewind(fin);
>>  fread(myStr, sizeof(char), buffer, fin);
>
> Here, I don't see where you terminate the string.
> malloc returns uninitialized memory, I think.
> And fread just reads a chunk of data.
> Use calloc or do a 
> myStr[buffer]='\0';
> before fread.

Not needed.  The + 1 makes the reader /think/ that a string is going to
be read in, but the data are accessed using an index bounded by the
count of bytes read.  So, if there is an error here, it is the
misleading + 1.
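
The point about bounding the scan by the number of bytes actually read
can be sketched like this. It is not the original poster's code: the
helper name count_newlines_fread is hypothetical, and the sketch
sidesteps ftell() and malloc() altogether by reading fixed-size chunks
and looking only at the bytes that fread() reports.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts '\n' characters from the current position
   of fp to end-of-file.  Only the `got` bytes returned by each fread()
   call are examined, so stale buffer contents are never counted and no
   terminating '\0' is needed.  Returns -1 on a read error. */
static long count_newlines_fread(FILE *fp)
{
    char chunk[4096];
    size_t got;
    long lines = 0;

    assert(fp != NULL);
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0)
        for (size_t i = 0; i < got; i++)
            if (chunk[i] == '\n')
                lines++;
    return ferror(fp) ? -1L : lines;
}
```

Like wc -l, this counts newline characters, so an unterminated final
line is not included in the result.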

-- 
Ben.

#155539

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-10-11 14:00 -0700
Message-ID	<87eem47a0q.fsf@nosuchdomain.example.com>
In reply to	#155503

DFS <nospam@dfs.com> writes:
[...]
>  //count lines from file with fgets
>  lines = 0;
>  rewind(fin);
>  while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
>  printf("fgets    : %d\n",lines);
[...]

The fgets() method overcounts if the input has lines longer than 600
characters.
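
A sketch of a count that does not depend on the buffer size, assuming
the input contains no null characters (strlen() is used to find the end
of each chunk). The helper name count_lines_fgets and the deliberately
small 16-byte buffer are illustrative choices, not part of the original
program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts lines with fgets().  A chunk completes a
   line only if it ends in '\n'; a long line arrives as several chunks
   and is still counted once.  An unterminated last line is counted too. */
static long count_lines_fgets(FILE *fp)
{
    char chunk[16];
    long lines = 0;
    int in_line = 0;    /* nonzero while a line's '\n' is still pending */

    assert(fp != NULL);
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t len = strlen(chunk);

        if (len > 0 && chunk[len - 1] == '\n') {
            lines++;
            in_line = 0;
        } else {
            in_line = 1;
        }
    }
    if (in_line)
        lines++;
    return lines;
}
```

For a stream holding a 40-character line, then "short", then an
unterminated "tail", this returns 3 even though the first line is read
in three chunks.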

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#155543

From	DFS <nospam@dfs.com>
Date	2020-10-11 17:47 -0400
Message-ID	<t8LgH.593522$eN2.578034@fx47.iad>
In reply to	#155539

On 10/11/2020 5:00 PM, Keith Thompson wrote:
> DFS <nospam@dfs.com> writes:
> [...]
>>   //count lines from file with fgets
>>   lines = 0;
>>   rewind(fin);
>>   while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
>>   printf("fgets    : %d\n",lines);
> [...]
> 
> The fgets() method overcounts if the input has lines longer than 600
> characters.


Maybe.

wc -L bible.txt = maxlinelength = 516
(wc.exe from GnuWin32 - year 2005)

http://www.truth.info/bigfiles/bible.txt.zip
(note: not trying to push religion - I just happened to use that file 
for testing, because it's big and organized)

---------------------------------------------------------
#include <stdio.h>

int main(int argc, char *argv[])
{
  FILE *fin = fopen(argv[1],"r");

  int lines = 0;
  char line[400] = "";
  while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
  printf("Line buffer = %d, fgets line count = %d\n",(int)sizeof line, lines);

  lines = 0;
  char line2[513] = "";
  rewind(fin);
  while (fgets(line2,sizeof line2, fin)!= NULL) {lines++;}
  printf("Line buffer = %d, fgets line count = %d\n",(int)sizeof line2, lines);

  lines = 0;
  char line3[600] = "";
  rewind(fin);
  while (fgets(line3,sizeof line3, fin)!= NULL) {lines++;}
  printf("Line buffer = %d, fgets line count = %d\n",(int)sizeof line3, lines);

  fclose(fin);
  return(0);
}

---------------------------------------------------------

$ tcc split_test.c -o split_test.exe
$ split_test bible.txt
Line buffer = 400, fgets line count = 31118
Line buffer = 513, fgets line count = 31102 (correct)
Line buffer = 600, fgets line count = 31102 (correct)


note: wc -l bible.txt = 31101 because the very last line is missing a \n

#155548

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-10-11 17:26 -0700
Message-ID	<871ri470i1.fsf@nosuchdomain.example.com>
In reply to	#155543

DFS <nospam@dfs.com> writes:
> On 10/11/2020 5:00 PM, Keith Thompson wrote:
>> DFS <nospam@dfs.com> writes:
>> [...]
>>>   //count lines from file with fgets
>>>   lines = 0;
>>>   rewind(fin);
>>>   while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
>>>   printf("fgets    : %d\n",lines);
>> [...]
>>
>> The fgets() method overcounts if the input has lines longer than 600
>> characters.
>
> Maybe.

> wc -L bible.txt = maxlinelength = 516
> (wc.exe from GnuWin32 - year 2005)
[...]

I think your point was that the line length that causes your fgets
code to overcount might not be exactly 600.  If that's your point,
you're right -- but I didn't take the time to be exact, or define
just what the length of a line is (e.g., whether it includes the
terminator, if any).

My point is that by calling fgets() with second argument of 600
(or any fixed value), your code fails to correctly handle text
with very long lines.  You could fix that by scanning the input
line for '\n' characters, but since fgets() doesn't tell you how
many characters it read that could be expensive (O(N) where N is
the length of the line).  And if you use fgets() to read from a
binary stream, you can't tell whether an embedded null character
is the end of the data read by fgets() or data read from the file.

If you only care about *how many* lines are in your input, there's
no point in using fgets().  Just read a character or a block at
a time and scan for '\n' characters (and *maybe* apply special
handling if the last character read isn't '\n').

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#155583

From	DFS <nospam@dfs.com>
Date	2020-10-12 13:11 -0400
Message-ID	<F80hH.340754$%p.222980@fx33.iad>
In reply to	#155548

On 10/11/2020 8:26 PM, Keith Thompson wrote:
> DFS <nospam@dfs.com> writes:
>> On 10/11/2020 5:00 PM, Keith Thompson wrote:
>>> DFS <nospam@dfs.com> writes:
>>> [...]
>>>>    //count lines from file with fgets
>>>>    lines = 0;
>>>>    rewind(fin);
>>>>    while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
>>>>    printf("fgets    : %d\n",lines);
>>> [...]
>>>
>>> The fgets() method overcounts if the input has lines longer than 600
>>> characters.
>>
>> Maybe.
> 
>> wc -L bible.txt = maxlinelength = 516
>> (wc.exe from GnuWin32 - year 2005)
> [...]
> 
> I think your point was that the line length that causes your fgets
> code to overcount might not be exactly 600.  If that's your point,
> you're right -- but I didn't take the time to be exact, or define
> just what the length of a line is (e.g., whether it includes the
> terminator, if any).

My point was it appeared that setting the buffer smaller than the 
maxlinelength didn't necessarily cause a line overcount.

from wc.exe in GnuWin32:
$ wc -L bible.txt
516 bible.txt

Line buffer = 400, fgets line count = 31118 (overcount)
Line buffer = 513, fgets line count = 31102 (correct)
Line buffer = 600, fgets line count = 31102

That didn't make sense if the max line length was truly 516, so I wrote 
code to check the max line length, and then change the buffer size and 
do line counts at each buffer size.

$ split_test bible.txt
New max line length = 67
New max line length = 155
New max line length = 157
New max line length = 190
New max line length = 211
New max line length = 265
New max line length = 273
New max line length = 278
New max line length = 291
New max line length = 334
New max line length = 362
New max line length = 375
New max line length = 391
New max line length = 439
New max line length = 458
New max line length = 512
Line buffer = 1024, fgets line count = 31102

Line buffer = 507, fgets line count = 31103
Line buffer = 508, fgets line count = 31103
Line buffer = 509, fgets line count = 31103
Line buffer = 510, fgets line count = 31103
Line buffer = 511, fgets line count = 31103
Line buffer = 512, fgets line count = 31103
Line buffer = 513, fgets line count = 31102 (correct)
Line buffer = 514, fgets line count = 31102
Line buffer = 515, fgets line count = 31102
Line buffer = 516, fgets line count = 31102
Line buffer = 517, fgets line count = 31102

I was misled by the incorrect maxlinelength value of 516, and by my 
inexperience.


> My point is that by calling fgets() with second argument of 600
> (or any fixed value), your code fails to correctly handle text
> with very long lines.  You could fix that by scanning the input
> line for '\n' characters, but since fgets() doesn't tell you how
> many characters it read that could be expensive (O(N) where N is
> the length of the line).  And if you use fgets() to read from a
> binary stream, you can't tell whether an embedded null character
> is the end of the data read by fgets() or data read from the file.

Gotcha.


> If you only care about *how many* lines are in your input, there's
> no point in using fgets().  Just read a character or a block at
> a time and scan for '\n' characters (and *maybe* apply special
> handling if the last character read isn't '\n').

Why maybe?  Shouldn't you test every time, and add one to your linecount 
if the last character before EOF isn't \n?

----------------------------------------------------
#include <stdio.h>
int main(int argc, char *argv[])
{
  //count newline with getc
  FILE *fin = fopen(argv[1],"r");
  int c;  /* int, not char: getc returns int so EOF can be told apart from data */
  int lines = 0;
  for (c=getc(fin);c!=EOF;c=getc(fin)) {if(c=='\n') {lines++;}}
  fseek(fin, ftell(fin)-1, SEEK_SET);
  c=getc(fin);
  if(c!='\n') {lines++;printf("Last character = '%c'\n",c);}
  printf("getc line count: %d\n",lines);
  fclose(fin);
  return(0);
}
----------------------------------------------------

I tested that code a few times and it worked.  Even though the pointer 
is at EOF after the for..loop, do you think it's potentially troublesome 
not to use an explicit fseek(fin, 0, SEEK_END); after the for..loop?

#155585

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-10-12 10:56 -0700
Message-ID	<877drv5nvv.fsf@nosuchdomain.example.com>
In reply to	#155583

DFS <nospam@dfs.com> writes:
> On 10/11/2020 8:26 PM, Keith Thompson wrote:
[...]
>> If you only care about *how many* lines are in your input, there's
>> no point in using fgets().  Just read a character or a block at
>> a time and scan for '\n' characters (and *maybe* apply special
>> handling if the last character read isn't '\n').
>
> Why maybe?  Shouldn't you test every time, and add one to your
> linecount if the last character before EOF isn't \n?

Because it depends on how you choose to define a "line".

Consider a 7-byte file containing "foo", a newline, and "bar" without a
newline.  Does it have 1 line or 2?

For a C implementation in which the last line does not require a
terminating new-line character (N1570 7.21.2p2), it has two lines.
For a C implementation that does require a new-line on the last line,
it's not a well formed text file.  The wc command says it has one
line (POSIX says "wc -l" counts <newline> characters).

I'm not going to argue which definition is correct, just pointing out
that there is no one clear definition.  If you want to define it so that
the file I described has 2 lines, that's fine -- but I suggest
documenting it.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#156801

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2020-11-29 00:21 -0800
Message-ID	<864kl8r2sl.fsf@linuxsc.com>
In reply to	#155585

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> DFS <nospam@dfs.com> writes:
>
>> On 10/11/2020 8:26 PM, Keith Thompson wrote:
>
> [...]
>
>>> If you only care about *how many* lines are in your input, there's
>>> no point in using fgets().  Just read a character or a block at
>>> a time and scan for '\n' characters (and *maybe* apply special
>>> handling if the last character read isn't '\n').
>>
>> Why maybe?  Shouldn't you test every time, and add one to your
>> linecount if the last character before EOF isn't \n?
>
> Because it depends on how you choose to define a "line".
>
> Consider a 7-byte file containing "foo", a newline, and "bar" without a
> newline.  Does it have 1 line or 2?
>
> For a C implementation in which the last line does not require a
> terminating new-line character (N1570 7.21.2p2), it has two lines.
> For a C implementation that does require a new-line on the last line,
> it's not a well formed text file.  [...]

It seems a reasonable guess that C implementations allow a last
line without a terminating newline character if and only if the
underlying operating system supports that.  If that is so then
there is no ambiguity as far as C implementations are concerned.
