Groups > comp.lang.c > #155503 > unrolled thread
| Started by | DFS <nospam@dfs.com> |
|---|---|
| First post | 2020-10-10 22:37 -0400 |
| Last post | 2020-10-20 15:48 +0100 |
| Articles | 20 on this page of 47 — 14 participants |
Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-10 22:37 -0400
Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-10 22:06 -0700
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 10:38 -0400
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-11 15:36 +0000
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 13:51 -0400
Re: Inconsistent line counts from 3 methods Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 18:33 +0000
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:20 -0400
Re: Inconsistent line counts from 3 methods Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 19:40 +0000
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:47 -0400
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 16:35 -0400
Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 21:13 +0000
Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) DFS <nospam@dfs.com> - 2020-10-11 18:45 -0400
Re: NNTP message requirements Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:11 -0700
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 16:27 -0400
Re: Inconsistent line counts from 3 methods Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-10-11 23:30 +0100
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 23:56 -0400
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 14:53 -0400
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:15 -0400
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-14 20:08 +0000
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-14 16:58 -0400
Re: Inconsistent line counts from 3 methods Eli the Bearded <*@eli.users.panix.com> - 2020-10-14 23:37 +0000
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-14 17:25 -0700
Re: Inconsistent line counts from 3 methods Eli the Bearded <*@eli.users.panix.com> - 2020-10-15 01:55 +0000
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 19:19 +0000
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 19:10 +0000
Re: Inconsistent line counts from 3 methods Kaz Kylheku <793-849-0957@kylheku.com> - 2020-10-17 19:36 +0000
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-14 20:16 +0000
Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-11 11:36 -0700
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 15:12 -0400
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 12:16 -0400
Re: Inconsistent line counts from 3 methods Johann Klammer <klammerj@NOSPAM.a1.net> - 2020-10-11 15:18 +0200
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-11 14:31 +0000
Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-11 11:31 -0700
Re: Inconsistent line counts from 3 methods Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-10-11 23:15 +0100
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 14:00 -0700
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 17:47 -0400
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:26 -0700
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-12 13:11 -0400
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-12 10:56 -0700
Re: Inconsistent line counts from 3 methods Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-11-29 00:21 -0800
Re: Inconsistent line counts from 3 methods scott@slp53.sl.home (Scott Lurndal) - 2020-10-12 19:19 +0000
Re: Inconsistent line counts from 3 methods dfs <nospam@dfs.com> - 2020-10-12 18:53 -0400
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 23:09 +0000
Re: Inconsistent line counts from 3 methods Bart <bc@freeuk.com> - 2020-10-18 00:24 +0100
Re: Inconsistent line counts from 3 methods Kaz Kylheku <793-849-0957@kylheku.com> - 2020-10-18 16:56 +0000
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-20 09:17 -0400
Re: Inconsistent line counts from 3 methods Bart <bc@freeuk.com> - 2020-10-20 15:48 +0100
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2020-10-10 22:37 -0400 |
| Subject | Inconsistent line counts from 3 methods |
| Message-ID | <WhugH.334023$Av7.244451@fx34.iad> |
$ countlines war_peace.txt
fread-var: 66875 off by 1183
fgetc : 65692 correct
fgets : 65692 correct
$ countlines bible.txt
fread-var: 31255 off by 153
fgetc : 31101 off by 1
fgets : 31102 correct
======================================================
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char line[600] = "";
    char c;

    // use fread to populate a variable
    // open file, go to end, get size, allocate memory, back to
    // beginning, read contents into variable
    FILE *fin = fopen(argv[1],"r");
    fseek(fin, 0, SEEK_END);
    int buffer = ftell(fin);
    char *myStr = malloc(sizeof(char) * (buffer + 1));
    rewind(fin);
    fread(myStr, sizeof(char), buffer, fin);

    //count newlines in variable
    int lines = 0;
    for (int i = 0; i < buffer; i++) {if(myStr[i]=='\n') {lines++;}}
    printf("fread-var: %d\n",lines);
    free(myStr);

    //count newline from file with getc
    lines = 0;
    rewind(fin);
    for (c=getc(fin);c!=EOF;c=getc(fin)) {if(c=='\n') {lines++;}}
    printf("fgetc : %d\n",lines);

    //count lines from file with fgets
    lines = 0;
    rewind(fin);
    while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
    printf("fgets : %d\n",lines);

    fclose(fin);
    return(0);
}
======================================================
See anything wrong with the fread-var section? It consistently
overcounts lines, especially on bigger files.
| From | Barry Schwarz <schwarzb@delq.com> |
|---|---|
| Date | 2020-10-10 22:06 -0700 |
| Message-ID | <qt35ofl3ced5r1kvtoaui0h2omse848j43@4ax.com> |
| In reply to | #155503 |
On Sat, 10 Oct 2020 22:37:35 -0400, DFS <nospam@dfs.com> wrote:
>$ countlines war_peace.txt
>fread-var: 66875 off by 1183
>fgetc : 65692 correct
>fgets : 65692 correct
>
>
>$ countlines bible.txt
>fread-var: 31255 off by 153
>fgetc : 31101 off by 1
>fgets : 31102 correct
>
>
>======================================================
>#include <stdio.h>
>#include <stdlib.h>
>
>int main(int argc, char *argv[])
>{
>
> char line[600] = "";
> char c;
>
> // use fread to populate a variable
> // open file, go to end, get size, allocate memory, back to
> // beginning, read contents into variable
> FILE *fin = fopen(argv[1],"r");
> fseek(fin, 0, SEEK_END);
> int buffer = ftell(fin);
> char *myStr = malloc(sizeof(char) * (buffer + 1));
> rewind(fin);
> fread(myStr, sizeof(char), buffer, fin);
>
> //count newlines in variable
> int lines = 0;
> for (int i = 0; i < buffer; i++) {if(myStr[i]=='\n') {lines++;}}
> printf("fread-var: %d\n",lines);
> free(myStr);
>
> //count newline from file with getc
> lines = 0;
> rewind(fin);
> for (c=getc(fin);c!=EOF;c=getc(fin)) {if(c=='\n') {lines++;}}
> printf("fgetc : %d\n",lines);
>
>
> //count lines from file with fgets
> lines = 0;
> rewind(fin);
> while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
> printf("fgets : %d\n",lines);
>
> fclose(fin);
> return(0);
>}
>======================================================
>
>See anything wrong with the fread-var section? It consistently
>overcounts lines, especially on bigger files.
Look at the description of ftell in the standard, particularly as it
relates to text files.
"The ftell function obtains the current value of the file position
indicator for the stream pointed to by stream. For a binary stream,
the value is the number of characters from the beginning of the file.
For a text stream, its file position indicator contains unspecified
information, usable by the fseek function for returning the file
position indicator for the stream to its position at the time of the
ftell call; the difference between two such return values is not
necessarily a meaningful measure of the number of characters written
or read."
Using ftell to determine buffer size probably results in an overly
large buffer. You should use the return value from fread to determine
how much data was actually read. You are probably examining residual
data in the buffer that is not part of the file.
Alternately, you could open the file in binary mode instead of text
mode. ftell should work then.
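
A minimal sketch of the two suggestions above, combined; it is not code from the thread. The function name count_newlines and the 4096-byte chunk size are illustrative assumptions. The caller is assumed to have opened the stream in binary mode ("rb"), and only the bytes that fread reports as read are examined, so no file size from ftell is needed and no leftover buffer contents are scanned.

```c
#include <assert.h>
#include <stdio.h>

/* Count '\n' bytes in a stream the caller opened in binary mode ("rb").
 * Hypothetical helper for illustration; reads in fixed-size chunks and
 * trusts only fread's return value for how many bytes are valid. */
long count_newlines(FILE *fp)
{
    assert(fp != NULL);

    char buf[4096];
    long lines = 0;
    size_t got;

    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < got; i++) {
            if (buf[i] == '\n') {
                lines++;
            }
        }
    }
    return lines;
}
```

Called on a stream holding "one\ntwo\nthree", this returns 2, because the last line has no terminating newline. Reading in chunks also avoids allocating a buffer as large as the file.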
You never call fgetc so I do not understand why you name the output
with that function.
Are you absolutely certain that bible.txt has a \n at the end of the
very last line? Use a hex editor to make sure.
--
Remove del for email
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2020-10-11 10:38 -0400 |
| Message-ID | <eREgH.343567$I15.298775@fx36.iad> |
| In reply to | #155504 |
On 10/11/2020 1:06 AM, Barry Schwarz wrote:
> On Sat, 10 Oct 2020 22:37:35 -0400, DFS <nospam@dfs.com> wrote:
>> See anything wrong with the fread-var section? It consistently
>> overcounts lines, especially on bigger files.
>
> Look at the description of ftell in the standard, particularly as it
> relates to text files.
>
> "The ftell function obtains the current value of the file position
> indicator for the stream pointed to by stream. For a binary stream,
> the value is the number of characters from the beginning of the file.
> For a text stream, its file position indicator contains unspecified
> information, usable by the fseek function for returning the file
> position indicator for the stream to its position at the time of the
> ftell call; the difference between two such return values is not
> necessarily a meaningful measure of the number of characters written
> or read."
>
> Using ftell to determine buffer size probably results in an overly
> large buffer. You should use the return value from fread to determine
> how much data was actually read. You are probably examining residual
> data in the buffer that is not part of the file.
>
> Alternately, you could open the file in binary mode instead of text
> mode. ftell should work then.

Thanks. Both your suggestions worked. I'll use the 'open in binary
mode' option.

> You never call fgetc so I do not understand why you name the output
> with that function.

typo

> Are you absolutely certain that bible.txt has a \n at the end of the
> very last line? Use a hex editor to make sure.

It didn't, as shown in Notepad++ | View | Symbol | Show End of Line

I see fgets() is more "reliable" than f/getc in case your final line is
missing a newline (which I would bet happens frequently):

"fgets() stops when either (n-1) characters are read, the newline
character is read, or the end-of-file is reached, whichever comes first."
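
For comparison, a character-at-a-time counter can be made to agree with the fgets count on a file whose last line lacks a newline. The sketch below is one way to do that, not code from the thread; the name count_lines_getc is an assumption. It also keeps getc's result in an int: the loop in the original post stores it in a char, which can make the comparison against EOF unreliable.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: every '\n' ends a line, and any characters left
 * after the last '\n' are counted as one more (unterminated) line. */
long count_lines_getc(FILE *fp)
{
    assert(fp != NULL);

    long lines = 0;
    int c;            /* int, not char, so EOF stays distinguishable */
    int last = '\n';  /* an empty file has no partial line */

    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            lines++;
        }
        last = c;
    }
    if (last != '\n') {
        lines++;      /* final line had no terminating newline */
    }
    return lines;
}
```

With this convention "a\nb\nc" counts as 3 lines and "a\nb\n" as 2. Whether the trailing fragment deserves to be called a line is a separate question; the function simply makes the choice explicit.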
| From | Jorgen Grahn <grahn+nntp@snipabacken.se> |
|---|---|
| Date | 2020-10-11 15:36 +0000 |
| Message-ID | <slrnro69ji.1hpq.grahn+nntp@frailea.sa.invalid> |
| In reply to | #155515 |
On Sun, 2020-10-11, DFS wrote:
...
> I see fgets() is more "reliable" than f/getc in case your final line is
> missing a newline (which I would bet happens frequently):

In the past, on Unix, it used to happen very infrequently, but it
seems recent IDEs generate "endless" text files by default.

I have yet to figure out why they do this. The only effect is a
saving of one byte, and that lot of traditional tools break in subtle
ways[1] ... but a conspiracy against Unix seems unlikely.

> "fgets() stops when either (n-1) characters are read, the newline
> character is read, or the end-of-file is reached, whichever comes first."

What do you mean? All the functions you list give you all information
available; they all seem reliable.

(fgets() can't handle '\0' characters, but that's a separate thing.)

/Jorgen

[1] cat foo bar

--
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2020-10-11 13:51 -0400 |
| Message-ID | <2FHgH.225012$d95.16295@fx06.iad> |
| In reply to | #155516 |
On 10/11/2020 11:36 AM, Jorgen Grahn wrote:
> On Sun, 2020-10-11, DFS wrote:
> ...
>> I see fgets() is more "reliable" than f/getc in case your final line is
>> missing a newline (which I would bet happens frequently):
>
> In the past, on Unix, it used to happen very infrequently, but it
> seems recent IDEs generate "endless" text files by default.
>
> I have yet to figure out why they do this. The only effect is a
> saving of one byte, and that lot of traditional tools break in subtle
> ways[1] ... but a conspiracy against Unix seems unlikely.

I'm not talking about IDEs. I'm talking about writing in general. I
would guess many people write their last line and end it with a period,
not a period and return.

ie, the bible.txt file I used doesn't have a terminating \n
http://www.truth.info/bigfiles/bible.txt.zip

>> "fgets() stops when either (n-1) characters are read, the newline
>> character is read, or the end-of-file is reached, whichever comes first."
>
> What do you mean? All the functions you list give you all information
> available; they all seem reliable.

I mean "reliable" in the sense that if you forget a terminating \n
fgets will still count the last line, whereas using f/getc you would
usually undercount the number of lines if there's no terminating \n.

> (fgets() can't handle '\0' characters, but that's a separate thing.)
>
> /Jorgen
>
> [1] cat foo bar
>
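
One caveat on the fgets approach: the loop in the original post counts fgets calls, and fgets returns at most (n-1) characters per call, so a line longer than 599 characters would be counted more than once with the 600-byte buffer. A hedged sketch of a variant that counts a line only when its chunk ends in a newline follows; the name count_lines_fgets and the deliberately small 64-byte buffer are assumptions chosen to make the effect easy to see.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count lines with fgets without over-counting
 * lines that are longer than the buffer. */
long count_lines_fgets(FILE *fp)
{
    assert(fp != NULL);

    char chunk[64];   /* small on purpose; long lines arrive in pieces */
    long lines = 0;
    int in_line = 0;  /* nonzero while a line's '\n' has not been seen yet */

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t len = strlen(chunk);
        if (len > 0 && chunk[len - 1] == '\n') {
            lines++;      /* this chunk completed a line */
            in_line = 0;
        } else {
            in_line = 1;  /* partial chunk: keep reading the same line */
        }
    }
    if (in_line) {
        lines++;          /* unterminated final line, counted once */
    }
    return lines;
}
```

A 200-character line followed by an unterminated "tail" is reported as 2 lines here, where a plain count of fgets calls with the same buffer would report 5. Embedded '\0' characters still confuse strlen, which is the separate limitation mentioned above.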
| From | Lew Pitcher <lew.pitcher@digitalfreehold.ca> |
|---|---|
| Date | 2020-10-11 18:33 +0000 |
| Message-ID | <rlvj4v$bdi$1@dont-email.me> |
| In reply to | #155521 |
On Sun, 11 Oct 2020 13:51:10 -0400, DFS wrote:

> On 10/11/2020 11:36 AM, Jorgen Grahn wrote:
>> On Sun, 2020-10-11, DFS wrote:
>> ...
>>> I see fgets() is more "reliable" than f/getc in case your final line
>>> is missing a newline (which I would bet happens frequently):
>>
>> In the past, on Unix, it used to happen very infrequently, but it seems
>> recent IDEs generate "endless" text files by default.
>>
>> I have yet to figure out why they do this. The only effect is a saving
>> of one byte, and that lot of traditional tools break in subtle ways[1]
>> ... but a conspiracy against Unix seems unlikely.
>
>
> I'm not talking about IDEs. I'm talking about writing in general. I
> would guess many people write their last line and end it with a period,
> not a period and return.

Maybe so.

But, by definition (both the "C" definition (C11 7.21.2 pgph 2), and the
Unix common definition) that last block of characters would not make up a
"line".

> ie, the bible.txt file I used doesn't have a terminating \n
> http://www.truth.info/bigfiles/bible.txt.zip
>
>
>
>>> "fgets() stops when either (n-1) characters are read, the newline
>>> character is read, or the end-of-file is reached, whichever comes
>>> first."
>>
>> What do you mean? All the functions you list give you all information
>> available; they all seem reliable.
>
> I mean "reliable" in the sense that if you forget a terminating \n fgets
> will still count the last line, whereas using f/getc you would usually
> undercount the number of lines if there's no terminating \n.

No, you wouldn't. A line is /specifically/ a sequence of characters
followed by a newline.

>
>
>> (fgets() can't handle '\0' characters, but that's a separate thing.)
>>
>> /Jorgen
>>
>> [1] cat foo bar
>>

--
Lew Pitcher
"In Skills, We Trust"
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2020-10-11 15:20 -0400 |
| Message-ID | <fZIgH.17219$je1.6227@fx22.iad> |
| In reply to | #155523 |
On 10/11/2020 2:33 PM, Lew Pitcher wrote:
> On Sun, 11 Oct 2020 13:51:10 -0400, DFS wrote:
>
>> On 10/11/2020 11:36 AM, Jorgen Grahn wrote:
>>> On Sun, 2020-10-11, DFS wrote:
>>> ...
>>>> I see fgets() is more "reliable" than f/getc in case your final line
>>>> is missing a newline (which I would bet happens frequently):
>>>
>>> In the past, on Unix, it used to happen very infrequently, but it seems
>>> recent IDEs generate "endless" text files by default.
>>>
>>> I have yet to figure out why they do this. The only effect is a saving
>>> of one byte, and that lot of traditional tools break in subtle ways[1]
>>> ... but a conspiracy against Unix seems unlikely.
>>
>>
>> I'm not talking about IDEs. I'm talking about writing in general. I
>> would guess many people write their last line and end it with a period,
>> not a period and return.
>
> Maybe so.
>
> But, by definition (both the "C" definition (C11 7.21.2 pgph 2), and the
> Unix common definition) that last block of characters would not make up a
> "line".

"Whether the last line requires a terminating new-line character is
implementation-defined."

>> ie, the bible.txt file I used doesn't have a terminating \n
>> http://www.truth.info/bigfiles/bible.txt.zip
>>
>>
>>
>>>> "fgets() stops when either (n-1) characters are read, the newline
>>>> character is read, or the end-of-file is reached, whichever comes
>>>> first."
>>>
>>> What do you mean? All the functions you list give you all information
>>> available; they all seem reliable.
>>
>> I mean "reliable" in the sense that if you forget a terminating \n fgets
>> will still count the last line, whereas using f/getc you would usually
>> undercount the number of lines if there's no terminating \n.
>
> No, you wouldn't. A line is /specifically/ a sequence of characters
> followed by a newline.

sez who?

By that bogus definition:

This is only 1
line of text.
| From | Lew Pitcher <lew.pitcher@digitalfreehold.ca> |
|---|---|
| Date | 2020-10-11 19:40 +0000 |
| Message-ID | <rlvn2i$bdi$2@dont-email.me> |
| In reply to | #155529 |
On Sun, 11 Oct 2020 15:20:11 -0400, DFS wrote:

> On 10/11/2020 2:33 PM, Lew Pitcher wrote:
>> On Sun, 11 Oct 2020 13:51:10 -0400, DFS wrote:
>>
>>> On 10/11/2020 11:36 AM, Jorgen Grahn wrote:
>>>> On Sun, 2020-10-11, DFS wrote:
>>>> ...
>>>>> I see fgets() is more "reliable" than f/getc in case your final line
>>>>> is missing a newline (which I would bet happens frequently):
>>>>
>>>> In the past, on Unix, it used to happen very infrequently, but it
>>>> seems recent IDEs generate "endless" text files by default.
>>>>
>>>> I have yet to figure out why they do this. The only effect is a
>>>> saving of one byte, and that lot of traditional tools break in subtle
>>>> ways[1] ... but a conspiracy against Unix seems unlikely.
>>>
>>>
>>> I'm not talking about IDEs. I'm talking about writing in general. I
>>> would guess many people write their last line and end it with a
>>> period, not a period and return.
>>
>> Maybe so.
>>
>> But, by definition (both the "C" definition (C11 7.21.2 pgph 2), and
>> the Unix common definition) that last block of characters would not
>> make up a "line".
>
>
> "Whether the last line requires a terminating new-line character is
> implementation-defined."

So, either your implementation requires a terminating new-line character
to count the last bit of a file as a line or it doesn't.

If it /does/, then any trailing data is not a line.
If it doesn't, then there is no trailing data.

ISTM that, when writing code for an implementation for which you have not
established whether or not "lines" require a terminating new-line, you
should assume that the implementation /does/ require a new-line. The
resulting code will handle either state.

>>> ie, the bible.txt file I used doesn't have a terminating \n
>>> http://www.truth.info/bigfiles/bible.txt.zip
>>>
>>>
>>>
>>>>> "fgets() stops when either (n-1) characters are read, the newline
>>>>> character is read, or the end-of-file is reached, whichever comes
>>>>> first."
>>>>
>>>> What do you mean? All the functions you list give you all
>>>> information available; they all seem reliable.
>>>
>>> I mean "reliable" in the sense that if you forget a terminating \n
>>> fgets will still count the last line, whereas using f/getc you would
>>> usually undercount the number of lines if there's no terminating \n.
>>
>> No, you wouldn't. A line is /specifically/ a sequence of characters
>> followed by a newline.
>
> sez who?
>
> By that bogus definition:
>
> This is only 1
> line of text.

Assuming that the post went to EOF immediatly after the ".", then yes;
that was only 1 line of text, followed by a block of text that cannot be
called a "line".

OTOH, in reality, You /posted/ two lines there, both terminated with
newlines.

15:38 $ hexdump -C fZIgH.17219\$je1.6227\@fx22.iad.msg | tail -10
00000bb0  69 6e 65 20 69 73 20 2f  73 70 65 63 69 66 69 63  |ine is /specific|
00000bc0  61 6c 6c 79 2f 20 61 20  73 65 71 75 65 6e 63 65  |ally/ a sequence|
00000bd0  20 6f 66 20 63 68 61 72  61 63 74 65 72 73 0a 3e  | of characters.>|
00000be0  20 66 6f 6c 6c 6f 77 65  64 20 62 79 20 61 20 6e  | followed by a n|
00000bf0  65 77 6c 69 6e 65 2e 0a  0a 73 65 7a 20 77 68 6f  |ewline...sez who|
00000c00  3f 0a 0a 42 79 20 74 68  61 74 20 62 6f 67 75 73  |?..By that bogus|
00000c10  20 64 65 66 69 6e 69 74  69 6f 6e 3a 0a 0a 54 68  | definition:..Th|
00000c20  69 73 20 69 73 20 6f 6e  6c 79 0a 31 20 6c 69 6e  |is is only.1 lin|
00000c30  65 20 6f 66 20 74 65 78  74 2e 0a                 |e of text..|
00000c3b

Note the newline (0x0a) at 0c3a)

--
Lew Pitcher
"In Skills, We Trust"
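
The same check the hexdump performs by eye can be done in code. The following is a rough sketch under stated assumptions, not something posted in the thread: the name ends_with_newline is made up, the stream is assumed to be opened in binary mode, and the C standard does not require binary streams to support SEEK_END meaningfully, although common hosted implementations do.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the last byte of the stream is '\n',
 * 0 if the stream is empty or ends with some other byte, and -1 if
 * seeking fails. Leaves the file position at the end of the stream. */
int ends_with_newline(FILE *fp)
{
    assert(fp != NULL);

    if (fseek(fp, 0L, SEEK_END) != 0) {
        return -1;
    }
    long size = ftell(fp);
    if (size < 0) {
        return -1;
    }
    if (size == 0) {
        return 0;                    /* empty file: nothing to terminate */
    }
    if (fseek(fp, -1L, SEEK_END) != 0) {
        return -1;
    }
    return getc(fp) == '\n';         /* inspect the final byte only */
}
```

A caller that wants the "assume a newline is required" behaviour can count '\n' bytes and use this result to report any trailing data separately, instead of silently folding it into the line count.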
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2020-10-11 15:47 -0400 |
| Message-ID | <XnJgH.109602$nI.52951@fx21.iad> |
| In reply to | #155531 |
On 10/11/2020 3:40 PM, Lew Pitcher wrote:
> On Sun, 11 Oct 2020 15:20:11 -0400, DFS wrote:
>
>> On 10/11/2020 2:33 PM, Lew Pitcher wrote:
>>> On Sun, 11 Oct 2020 13:51:10 -0400, DFS wrote:
>>>
>>>> On 10/11/2020 11:36 AM, Jorgen Grahn wrote:
>>>>> On Sun, 2020-10-11, DFS wrote:
>>>>> ...
>>>>>> I see fgets() is more "reliable" than f/getc in case your final line
>>>>>> is missing a newline (which I would bet happens frequently):
>>>>>
>>>>> In the past, on Unix, it used to happen very infrequently, but it
>>>>> seems recent IDEs generate "endless" text files by default.
>>>>>
>>>>> I have yet to figure out why they do this. The only effect is a
>>>>> saving of one byte, and that lot of traditional tools break in subtle
>>>>> ways[1] ... but a conspiracy against Unix seems unlikely.
>>>>
>>>>
>>>> I'm not talking about IDEs. I'm talking about writing in general. I
>>>> would guess many people write their last line and end it with a
>>>> period, not a period and return.
>>>
>>> Maybe so.
>>>
>>> But, by definition (both the "C" definition (C11 7.21.2 pgph 2), and
>>> the Unix common definition) that last block of characters would not
>>> make up a "line".
>>
>>
>> "Whether the last line requires a terminating new-line character is
>> implementation-defined."
>
> So, either your implementation requires a terminating new-line character
> to count the last bit of a file as a line or it doesn't.
>
> If it /does/, then any trailing data is not a line.
> If it doesn't, then there is no trailing data.
>
> ISTM that, when writing code for an implementation for which you have not
> established whether or not "lines" require a terminating new-line, you
> should assume that the implementation /does/ require a new-line. The
> resulting code will handle either state.
>
>
>>>> ie, the bible.txt file I used doesn't have a terminating \n
>>>> http://www.truth.info/bigfiles/bible.txt.zip
>>>>
>>>>
>>>>
>>>>>> "fgets() stops when either (n-1) characters are read, the newline
>>>>>> character is read, or the end-of-file is reached, whichever comes
>>>>>> first."
>>>>>
>>>>> What do you mean? All the functions you list give you all
>>>>> information available; they all seem reliable.
>>>>
>>>> I mean "reliable" in the sense that if you forget a terminating \n
>>>> fgets will still count the last line, whereas using f/getc you would
>>>> usually undercount the number of lines if there's no terminating \n.
>>>
>>> No, you wouldn't. A line is /specifically/ a sequence of characters
>>> followed by a newline.
>>
>> sez who?
>>
>> By that bogus definition:
>>
>> This is only 1
>> line of text.
>
> Assuming that the post went to EOF immediatly after the ".", then yes;
> that was only 1 line of text, followed by a block of text that cannot be
> called a "line".

It can and must be called a line, whether it terminated with \n or not.

> OTOH, in reality, You /posted/ two lines there, both terminated with
> newlines.

In reality I posted two lines, only one terminated with a newline.

https://imgur.com/a/GjG2E9W

as cut and pasted from Thunderbird, and viewed with Notepad++ on Windows.

> 15:38 $ hexdump -C fZIgH.17219\$je1.6227\@fx22.iad.msg | tail -10
> 00000bb0  69 6e 65 20 69 73 20 2f  73 70 65 63 69 66 69 63  |ine is /specific|
> 00000bc0  61 6c 6c 79 2f 20 61 20  73 65 71 75 65 6e 63 65  |ally/ a sequence|
> 00000bd0  20 6f 66 20 63 68 61 72  61 63 74 65 72 73 0a 3e  | of characters.>|
> 00000be0  20 66 6f 6c 6c 6f 77 65  64 20 62 79 20 61 20 6e  | followed by a n|
> 00000bf0  65 77 6c 69 6e 65 2e 0a  0a 73 65 7a 20 77 68 6f  |ewline...sez who|
> 00000c00  3f 0a 0a 42 79 20 74 68  61 74 20 62 6f 67 75 73  |?..By that bogus|
> 00000c10  20 64 65 66 69 6e 69 74  69 6f 6e 3a 0a 0a 54 68  | definition:..Th|
> 00000c20  69 73 20 69 73 20 6f 6e  6c 79 0a 31 20 6c 69 6e  |is is only.1 lin|
> 00000c30  65 20 6f 66 20 74 65 78  74 2e 0a                 |e of text..|
> 00000c3b
>
> Note the newline (0x0a) at 0c3a)

That last newline was added by another program. I didn't send it.

Stop your copy and paste at 00000c30 and see what you get.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2020-10-11 16:35 -0400 |
| Message-ID | <rlvqaa$ill$1@dont-email.me> |
| In reply to | #155532 |
On 10/11/20 3:47 PM, DFS wrote:
> On 10/11/2020 3:40 PM, Lew Pitcher wrote:
>> On Sun, 11 Oct 2020 15:20:11 -0400, DFS wrote:
>>
>>> On 10/11/2020 2:33 PM, Lew Pitcher wrote:
...
>>> "Whether the last line requires a terminating new-line character is
>>> implementation-defined."
...
>>>> No, you wouldn't. A line is /specifically/ a sequence of characters
>>>> followed by a newline.
>>>
>>> sez who?
>>>
>>> By that bogus definition:
>>>
>>> This is only 1
>>> line of text.
>> Assuming that the post went to EOF immediatly after the ".", then yes;
>> that was only 1 line of text, followed by a block of text that cannot
>> be called a "line".
>
>
> It can and must be called a line, whether it terminated with \n or not.

That might be the way you feel about it, but the C standard expresses a
conflicting and more authoritative point of view on the issue. You
yourself cited the text from the standard (quoted at the top of this
message) that explicitly authorizes each implementation decide for
itself whether such a sequence of characters qualifies as a line.
| From | Lew Pitcher <lew.pitcher@digitalfreehold.ca> |
|---|---|
| Date | 2020-10-11 21:13 +0000 |
| Subject | Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) |
| Message-ID | <rlvsgs$bdi$3@dont-email.me> |
| In reply to | #155532 |
On Sun, 11 Oct 2020 15:47:28 -0400, DFS wrote:

> On 10/11/2020 3:40 PM, Lew Pitcher wrote:
>> On Sun, 11 Oct 2020 15:20:11 -0400, DFS wrote:
[snip]
>>> This is only 1
>>> line of text.
>> Assuming that the post went to EOF immediatly after the ".", then yes;
>> that was only 1 line of text, followed by a block of text that cannot
>> be called a "line".
>
>
> It can and must be called a line, whether it terminated with \n or not.

>> OTOH, in reality, You /posted/ two lines there, both terminated with
>> newlines.
>
> In reality I posted two lines, only one terminated with a newline.
>
> https://imgur.com/a/GjG2E9W
>
> as cut and pasted from Thunderbird, and viewed with Notepad++ on
> Windows.
>
>> 15:38 $ hexdump -C fZIgH.17219\$je1.6227\@fx22.iad.msg | tail -10
[snip]
>> 00000c30  65 20 6f 66 20 74 65 78  74 2e 0a                 |e of text..|
>> 00000c3b
>>
>> Note the newline (0x0a) at 0c3a)
>
>
> That last newline was added by another program.

Yes, /your/ nntp posting application, as part of the requirements of NNTP
(the protocol that governs the population and management of Usenet
articles). See RFC 3977, section 3.6, where it defines the format of the
body of a posting. Ironically, it specifies that the body will contain
one or more /lines/, each terminated with a CARRIAGE-RETURN, LINEFEED
combination. There are /no/ unterminated lines in a posting body.

> I didn't send it.

You may not have intended to, but you did, indeed, "send it".

As for your complaint regarding linecounting, making such a complaint
here is more than useless. You are arguing against an interpretation that
has /operationally/ been in place for at least 50 years, and has been
codified in standards for at least 30 years. Moreover, you are making
this argument to an audience of amateurs and professionals in a public
forum, and /NOT/ to a standards body, or anyone who can take action to
ensure that /your/ interpretation overrides the current, standard
interpretation. (At least, not within the confines of this forum; some
participants may be in such a position, should you address your concerns
/formally/ to them in the appropriate forum.)

In other words, you are practicing an exercise in futility. You would
have better luck arguing with weathermen that the wind should not blow
your expectoration back on you when you spit into the wind.

--
Lew Pitcher
"In Skills, We Trust"
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2020-10-11 18:45 -0400 |
| Subject | Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) |
| Message-ID | <F_LgH.126376$MQ.9832@fx14.iad> |
| In reply to | #155542 |
On 10/11/2020 5:13 PM, Lew Pitcher wrote:
> On Sun, 11 Oct 2020 15:47:28 -0400, DFS wrote:
>
>> On 10/11/2020 3:40 PM, Lew Pitcher wrote:
>>> On Sun, 11 Oct 2020 15:20:11 -0400, DFS wrote:
> [snip]
>>>> This is only 1
>>>> line of text.
>>>
>>> Assuming that the post went to EOF immediately after the ".", then yes;
>>> that was only 1 line of text, followed by a block of text that cannot be
>>> called a "line".
>>
>> It can and must be called a line, whether it terminated with \n or not.
>
>>> OTOH, in reality, You /posted/ two lines there, both terminated with
>>> newlines.
>>
>> In reality I posted two lines, only one terminated with a newline.
>>
>> https://imgur.com/a/GjG2E9W
>>
>> as cut and pasted from Thunderbird, and viewed with Notepad++ on
>> Windows.
>>
>>> 15:38 $ hexdump -C fZIgH.17219\$je1.6227\@fx22.iad.msg | tail -10
> [snip]
>>> 00000c30 65 20 6f 66 20 74 65 78 74 2e 0a |e of text..|
>>> 00000c3b
>>>
>>> Note the newline (0x0a) at 0c3a
>>
>> That last newline was added by another program.
>
> Yes, /your/ nntp posting application, as part of the requirements of NNTP
> (the protocol that governs the population and management of Usenet
> articles). See RFC 3977, section 3.6, where it defines the format of the
> body of a posting. Ironically, it specifies that the body will contain
> one or more /lines/, each terminated with a CARRIAGE-RETURN, LINEFEED
> combination. There are /no/ unterminated lines in a posting body.

And what exactly forces Thunderbird or any Usenet app or server to
adhere to the NNTP protocol?

>> I didn't send it.
>
> You may not have intended to, but you did, indeed, "send it".

No #1 I submitted the text to blocknews - via Thunderbird, written with
whatever editor is the default - without a penultimate \n, and when the
post showed up in clc no CRLF was present at the end.

I already showed you what it looks like in Notepad++:
https://imgur.com/a/GjG2E9W

No #2 When I retrieve the post from blocknews (via the python
nntplib.body(articleID) call), it has no terminating \n.

<fZIgH.17219$je1.6227@fx22.iad>

But when I find the post at Howard Knight Usenet lookup it does contain
a terminating \n, indicating it was probably altered by other Usenet
server(s).

http://al.howardknight.net/?STYPE=msgid&MSGI=<fZIgH.17219%24je1.6227%40fx22.iad>

How about you send a post to clc with no terminating \n and we'll see
what happens.

> As for your complaint regarding linecounting, making such a complaint
> here is more than useless. You are arguing against an interpretation that
> has /operationally/ been in place for at least 50 years, and has been
> codified in standards for at least 30 years. Moreover, you are making
> this argument to an audience of amateurs and professionals in a public
> forum, and /NOT/ to a standards body, or anyone who can take action to
> ensure that /your/ interpretation overrides the current, standard
> interpretation. (At least, not within the confines of this forum; some
> participants may be in such a position, should you address your concerns
> /formally/ to them in the appropriate forum.)
>
> In other words, you are practicing an exercise in futility. You would
> have better luck arguing with weathermen that the wind should not blow
> your expectoration back on you when you spit into the wind.

If anyone doesn't count the last line of text because it hasn't a
terminating \n they should be flogged.
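Whether a message body ends with a line terminator can be checked on the raw bytes rather than by eye in an editor. The following is only a minimal sketch in C: the helper names `ends_with_crlf` and `ends_with_newline` are hypothetical, and it assumes the buffer holds the body exactly as received or saved, before any client library has split it into lines (a library that returns a list of lines may not expose the terminators at all).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the raw bytes end with CR LF, the
   terminator the quoted post attributes to RFC 3977 posting bodies. */
bool ends_with_crlf(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n';
}

/* Hypothetical helper: true if the last byte is LF, whether or not a
   CR precedes it. This is the byte a hexdump shows as 0a. */
bool ends_with_newline(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len >= 1 && buf[len - 1] == '\n';
}
```

For a buffer ending in the bytes `2e 0a`, as in the quoted hexdump, `ends_with_newline` would return true and `ends_with_crlf` false; for a buffer ending in `2e` alone, both would return false.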
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2020-10-11 17:11 -0700 |
| Subject | Re: NNTP message requirements |
| Message-ID | <875z7g7171.fsf@nosuchdomain.example.com> |
| In reply to | #155546 |
DFS <nospam@dfs.com> writes:
[...]
> If anyone doesn't count the last line of text because it hasn't a
> terminating \n they should be flogged.
Thank you for establishing that your opinions are not to be taken
seriously.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2020-10-11 16:27 -0400 |
| Message-ID | <rlvpqq$fgs$1@dont-email.me> |
| In reply to | #155529 |
On 10/11/20 3:20 PM, DFS wrote:
> On 10/11/2020 2:33 PM, Lew Pitcher wrote:
...
>> No, you wouldn't. A line is /specifically/ a sequence of characters
>> followed by a newline.
>
> sez who?

The most authoritative source possible in this context: the ISO C
standard.

"A text stream is an ordered sequence of characters composed into
_lines_, each line consisting of zero or more characters plus a
terminating new-line character. ..." (7.21.2p2)

The word "lines" is italicized, an ISO convention indicating that the
sentence containing that word is considered to officially define the
meaning of that word in the context of this document.

The immediately following line, which makes the end of the file a
special case, has already been quoted by Lew Pitcher.
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2020-10-11 23:30 +0100 |
| Message-ID | <87pn5o8kfw.fsf@bsb.me.uk> |
| In reply to | #155536 |
James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> On 10/11/20 3:20 PM, DFS wrote:
>> On 10/11/2020 2:33 PM, Lew Pitcher wrote:
> ...
>>> No, you wouldn't. A line is /specifically/ a sequence of characters
>>> followed by a newline.
>>
>> sez who?
>
> The most authoritative source possible in this context: the ISO C
> standard.

The answer to who says a line is defined as above is indeed "the ISO C
standard", but saying that this is "the most authoritative source
possible in this context" goes too far. If I write a program to count
lines, I get to say what a line is, not the C standard.

--
Ben.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2020-10-11 23:56 -0400 |
| Message-ID | <rm0k4h$9qe$1@dont-email.me> |
| In reply to | #155545 |
On 10/11/20 6:30 PM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>
>> On 10/11/20 3:20 PM, DFS wrote:
>>> On 10/11/2020 2:33 PM, Lew Pitcher wrote:
>> ...
>>>> No, you wouldn't. A line is /specifically/ a sequence of characters
>>>> followed by a newline.
>>>
>>> sez who?
>>
>> The most authoritative source possible in this context: the ISO C
>> standard.
>
> The answer to who says a line is defined as above is indeed "the ISO C
> standard", but saying that this is "the most authoritative source
> possible in this context" goes too far. If I write a program to count
> lines, I get to say what a line is, not the C standard.

I'm not disagreeing with that comment, but it's not relevant to what I
was saying. The fact that you consider it relevant implies that I didn't
say what I meant clearly enough. The standard uses the term line in (at
least?) two different contexts - a line of source code, or a line as
processed using the standard library's I/O routines. My comment was only
about the latter.

You are, of course, free to choose a definition of what a "line" is, and
to write code that counts how many of them there are in an input file.
However, it's the standard's definition of "lines" in 7.21.2p2 that
governs the interpretation of any sentence in the standard that uses
"line" or "lines" in connection with C standard library I/O routines;
the definition you choose has no bearing on the matter.

For instance, 7.21.2p9 says "An implementation shall support text files
with lines containing at least 254 characters,". The standard's
definition of "lines" is what determines whether or not a given
implementation meets that requirement, not your definition.

In general, a program that counts something called "lines" with a
different definition of that term than the one in 7.21.2p2 might have to
be carefully written to work around that difference.

When I said "in this context", I was referring specifically to a context
where it is the standard's definition that matters. It's implementation
defined whether or not a newline is needed at the end of the last line
of a text file, and it's the standard's definition of "lines", not
yours, that determines what constitutes the last line of a text file for
purposes of determining whether that requirement has been met.
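The two competing definitions in this subthread can be put side by side in code. This is a minimal sketch, not anyone's posted program: the function names are hypothetical, and it works on an in-memory buffer rather than a stream so that the difference does not depend on how a particular implementation treats an unterminated final line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts lines in the 7.21.2p2 sense, i.e. only
   sequences of zero or more characters that end in a new-line. */
size_t count_terminated_lines(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            n++;
    }
    return n;
}

/* Hypothetical helper: same count, plus one for any trailing
   characters that are not followed by a new-line. */
size_t count_lines_with_fragment(const char *buf, size_t len)
{
    size_t n = count_terminated_lines(buf, len);
    if (len > 0 && buf[len - 1] != '\n')
        n++;
    return n;
}
```

The two functions agree on any buffer that ends with a new-line and differ by exactly one otherwise, which is the kind of discrepancy the thread's subject line describes.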
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2020-10-11 14:53 -0400 |
| Message-ID | <rlvkb2$dbd$1@dont-email.me> |
| In reply to | #155521 |
On 10/11/20 1:51 PM, DFS wrote:
...
> I'm not talking about IDEs. I'm talking about writing in general. I
> would guess many people write their last line and end it with a period,
> not a period and return.

Text file editors (including some IDEs) often put in a final return,
even if you don't. As a result, if you do put one in, you'll often end
up with two newlines at the end of the file.
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2020-10-11 15:15 -0400 |
| Message-ID | <0VIgH.17218$je1.9703@fx22.iad> |
| In reply to | #155525 |
On 10/11/2020 2:53 PM, James Kuyper wrote:
> On 10/11/20 1:51 PM, DFS wrote:
> ...
>> I'm not talking about IDEs. I'm talking about writing in general. I
>> would guess many people write their last line and end it with a period,
>> not a period and return.
>
> Text file editors (including some IDEs) often put in a final return,
> even if you don't. As a result, if you do put one in, you'll often end
> up with two newlines at the end of the file.

I think the default Thunderbird editor does, since when I save a draft
it adds a terminating CRLF (Windows).

I'm going to send this post without saving it, and see if it adds a CRLF
to this last line - I'm not going to.
| From | Jorgen Grahn <grahn+nntp@snipabacken.se> |
|---|---|
| Date | 2020-10-14 20:08 +0000 |
| Message-ID | <slrnroemmf.1hpq.grahn+nntp@frailea.sa.invalid> |
| In reply to | #155525 |
On Sun, 2020-10-11, James Kuyper wrote:
> On 10/11/20 1:51 PM, DFS wrote:
> ...
>> I'm not talking about IDEs. I'm talking about writing in general. I
>> would guess many people write their last line and end it with a period,
>> not a period and return.
>
> Text file editors (including some IDEs) often put in a final return,
> even if you don't.

E.g. Emacs and vi in default configurations.

> As a result, if you do put one in, you'll often end
> up with two newlines at the end of the file.

/That/ is something I have never seen. Which tools do that? Sounds
like a bug to me -- and one that's easily fixed.

/Jorgen

--
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2020-10-14 16:58 -0400 |
| Message-ID | <rm7oq5$grv$1@dont-email.me> |
| In reply to | #155665 |
On 10/14/20 4:08 PM, Jorgen Grahn wrote:
> On Sun, 2020-10-11, James Kuyper wrote:
>> On 10/11/20 1:51 PM, DFS wrote:
>> ...
>>> I'm not talking about IDEs. I'm talking about writing in general. I
>>> would guess many people write their last line and end it with a period,
>>> not a period and return.
>>
>> Text file editors (including some IDEs) often put in a final return,
>> even if you don't.
>
> E.g. Emacs and vi in default configurations.
>
>> As a result, if you do put one in, you'll often end
>> up with two newlines at the end of the file.
>
> /That/ is something I have never seen. Which tools do that? Sounds
> like a bug to me -- and one that's easily fixed.

I opened a new file with vi, and hit the following keys:

i 1 Enter Esc : x

Here's what I see in the resulting file:

~(48) od -a linetest
0000000 1 nl nl
0000003
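The same observation can be made without od by counting the new-line bytes at the end of the data. This is a minimal sketch assuming the file contents have already been read into a buffer; the name `trailing_newlines` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of consecutive '\n' bytes at the very
   end of the buffer (0 if the data does not end with a new-line). */
size_t trailing_newlines(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t n = 0;
    while (len > 0 && buf[len - 1] == '\n') {
        n++;
        len--;
    }
    return n;
}
```

For the three bytes shown in the od output above (`1 nl nl`), this would return 2.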