| Started by | DFS <nospam@dfs.com> |
|---|---|
| First post | 2020-10-10 22:37 -0400 |
| Last post | 2020-10-20 15:48 +0100 |
| Articles | 20 on this page of 47 — 14 participants |
Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-10 22:37 -0400
Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-10 22:06 -0700
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 10:38 -0400
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-11 15:36 +0000
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 13:51 -0400
Re: Inconsistent line counts from 3 methods Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 18:33 +0000
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:20 -0400
Re: Inconsistent line counts from 3 methods Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 19:40 +0000
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:47 -0400
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 16:35 -0400
Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 21:13 +0000
Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) DFS <nospam@dfs.com> - 2020-10-11 18:45 -0400
Re: NNTP message requirements Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:11 -0700
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 16:27 -0400
Re: Inconsistent line counts from 3 methods Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-10-11 23:30 +0100
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 23:56 -0400
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 14:53 -0400
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:15 -0400
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-14 20:08 +0000
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-14 16:58 -0400
Re: Inconsistent line counts from 3 methods Eli the Bearded <*@eli.users.panix.com> - 2020-10-14 23:37 +0000
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-14 17:25 -0700
Re: Inconsistent line counts from 3 methods Eli the Bearded <*@eli.users.panix.com> - 2020-10-15 01:55 +0000
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 19:19 +0000
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 19:10 +0000
Re: Inconsistent line counts from 3 methods Kaz Kylheku <793-849-0957@kylheku.com> - 2020-10-17 19:36 +0000
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-14 20:16 +0000
Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-11 11:36 -0700
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 15:12 -0400
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 12:16 -0400
Re: Inconsistent line counts from 3 methods Johann Klammer <klammerj@NOSPAM.a1.net> - 2020-10-11 15:18 +0200
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-11 14:31 +0000
Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-11 11:31 -0700
Re: Inconsistent line counts from 3 methods Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-10-11 23:15 +0100
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 14:00 -0700
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 17:47 -0400
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:26 -0700
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-12 13:11 -0400
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-12 10:56 -0700
Re: Inconsistent line counts from 3 methods Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-11-29 00:21 -0800
Re: Inconsistent line counts from 3 methods scott@slp53.sl.home (Scott Lurndal) - 2020-10-12 19:19 +0000
Re: Inconsistent line counts from 3 methods dfs <nospam@dfs.com> - 2020-10-12 18:53 -0400
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 23:09 +0000
Re: Inconsistent line counts from 3 methods Bart <bc@freeuk.com> - 2020-10-18 00:24 +0100
Re: Inconsistent line counts from 3 methods Kaz Kylheku <793-849-0957@kylheku.com> - 2020-10-18 16:56 +0000
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-20 09:17 -0400
Re: Inconsistent line counts from 3 methods Bart <bc@freeuk.com> - 2020-10-20 15:48 +0100
| From | Eli the Bearded <*@eli.users.panix.com> |
|---|---|
| Date | 2020-10-14 23:37 +0000 |
| Message-ID | <eli$2010141937@qaz.wtf> |
| In reply to | #155668 |
In comp.lang.c, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> On 10/14/20 4:08 PM, Jorgen Grahn wrote:
>> On Sun, 2020-10-11, James Kuyper wrote:
>>> As a result, if you do put one in, you'll often end
>>> up with two newlines at the end of the file.
>> /That/ is something I have never seen. Which tools do that? Sounds
>> like a bug to me -- and one that's easily fixed.

As I read the exchange there I imagined James describing an editor
that adds an extra blank line to all files without one. But the demo
here shows something slightly different:

> I opened a new file with vi, and hit the following keys:
>
> i 1 Enter Esc : x
>
> Here's what I see in the resulting file:
>
> ~(48) od -a linetest
> 0000000 1 nl nl
> 0000003

This seems like a misdescription of what the old Unix standard editors
do. For better or worse, ed, vi, and (I think) emacs all treat a new
empty file as being just a new line. What you did is add a second line
there (that "Enter") and so you now have a two line file. The editor
(vi) doesn't let you explicitly modify the final newline. It just _is_.

With vim, you can disable the implicit final newline with
"set nofixendofline", but there are Unix tools that might not like such
files. As one example, I seem to recall no final newline causing a final
"line" to be silently discarded in /etc/sudoers (and files included by
that).

Elijah
------
though sudoers syntax is a right mess from beginning to end
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2020-10-14 17:25 -0700 |
| Message-ID | <87mu0o49o7.fsf@nosuchdomain.example.com> |
| In reply to | #155671 |
Eli the Bearded <*@eli.users.panix.com> writes:
> In comp.lang.c, James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
>> On 10/14/20 4:08 PM, Jorgen Grahn wrote:
>>> On Sun, 2020-10-11, James Kuyper wrote:
>>>> As a result, if you do put one in, you'll often end
>>>> up with two newlines at the end of the file.
>>> /That/ is something I have never seen. Which tools do that? Sounds
>>> like a bug to me -- and one that's easily fixed.
>
> As I read the exchange there I imagined James describing an editor
> that adds an extra blank line to all files without one. But the demo
> here shows something slightly different:
>
>> I opened a new file with vi, and hit the following keys:
>>
>> i 1 Enter Esc : x
>>
>> Here's what I see in the resulting file:
>>
>> ~(48) od -a linetest
>> 0000000 1 nl nl
>> 0000003
>
> This seems like a misdescription of what the old Unix standard editors
> do. For better or worse, ed, vi, and (I think) emacs all treat a new
> empty file as being just a new line. What you did is add a second line
> there (that "Enter") and so you now have a two line file. The editor
> (vi) doesn't let you explicitly modify the final newline. It just _is_.
If I open a new file with vim and save it without entering anything, I
get an empty file, which is a valid text file with 0 lines.
If I do the same thing with busybox vi, I get a file with a single
newline, which seems wrong.
As for the behavior James observed, if I type
i 1 Esc
I get a '1' character followed by a single newline, because vim assumes
that I meant to have a newline-terminated line. If I type
i 1 Enter Esc
I get a '1' character followed by two newlines -- i.e., a 2-line text
file whose second line is empty.
vim has options to handle non-empty files without a trailing newline,
but it doesn't make it easy to create them. (It may have an option to
do so, but I haven't bothered to find it.)
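A minimal C sketch of the counting rule described above, assuming the POSIX-style convention that every '\n' ends one line and that any bytes after the last '\n' form one incomplete line. The function name `count_lines_buf` is hypothetical, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count lines in a memory buffer.
   Each '\n' ends one line; a non-empty tail after the last
   '\n' is counted as one extra (incomplete) line. */
size_t count_lines_buf(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t lines = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            lines++;
    }
    /* Incomplete final line: data after the last newline. */
    if (len > 0 && buf[len - 1] != '\n')
        lines++;
    return lines;
}
```

Under this rule `count_lines_buf("1\n", 2)` is 1 and `count_lines_buf("1\n\n", 3)` is 2, which is why `i 1 Enter Esc` yields a two-line file. `wc -l` differs only in that it counts newline characters, so it ignores an unterminated last line.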
> With vim, you can disable the implicit final newline with
> "set nofixendofline", but there are Unix tools that might not like such
> files. As one example, I seem to recall no final newline causing a final
> "line" to be silently discarded in /etc/sudoers (and files included by
> that).
>
> Elijah
> ------
> though sudoers syntax is a right mess from beginning to end
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
| From | Eli the Bearded <*@eli.users.panix.com> |
|---|---|
| Date | 2020-10-15 01:55 +0000 |
| Message-ID | <eli$2010142128@qaz.wtf> |
| In reply to | #155672 |
In comp.lang.c, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> If I open a new file with vim and save it without entering anything, I
> get an empty file, which is a valid text file with 0 lines.

Same happens in nvi and ed (tried with /bin/ed on NetBSD).

> If I do the same thing with busybox vi, I get a file with a single
> newline, which seems wrong.

That does seem wrong.

> As for the behavior James observed, if I type
>
> i 1 Esc
>
> I get a '1' character followed by a single newline, because vim assumes
> that I meant to have a newline-terminated line. If I type

So I compiled and tested ex-1.1 from https://github.com/n-t-roff/ex-1.1

As ex, saving an empty file creates a zero-length file. As vi, *you
cannot edit a zero-length file*.

$ HOME=/var/tmp/ ./a.out /var/tmp/exds
"/var/tmp/exds" No such file or directory
:vi
No lines in the buffer
:a
.
:vi
No lines in the buffer
:a
1
.
:vi
(succeeds)
q:w
:q
"/var/tmp/exds" 1 line, 2 characters
$

(ex-1.1 has what is recognizable as vi, but it has significant
shortcomings like "no colon commands in visual mode" and "can only
access visual mode after starting the editor".)

> i 1 Enter Esc
>
> I get a '1' character followed by two newlines -- i.e., a 2-line text
> file whose second line is empty.

Yes. This is the implicit newline I spoke of. vi starts with a single
empty line; saving a file with nothing on that single empty line is a
special case for vi.

For ed and ex you can't enter a partial line. /bin/ed gives an error if
you try to <ctrl-d> EOF a partial line. ex (nex Version
1.81.6-2013-11-20nb4) and ex-1.1 ignored <ctrl-d> EOF until I got tired
of testing (more than ten, but less than 26, which I recall being the
special count of them in csh when ignoreeof is set).

> vim has options to handle non-empty files without a trailing newline,
> but it doesn't make it easy to create them. (It may have an option to
> do so, but I haven't bothered to find it.)

Yeah, it's antithetical to vi, and vim follows vi. But it can be done:

i 1 Esc :set noendofline nofixendofline | x Enter

Elijah
------
to be honest, had cloned and compiled ex-1.1 several weeks ago
| From | Jorgen Grahn <grahn+nntp@snipabacken.se> |
|---|---|
| Date | 2020-10-17 19:19 +0000 |
| Message-ID | <slrnromgt5.1hpq.grahn+nntp@frailea.sa.invalid> |
| In reply to | #155671 |
On Wed, 2020-10-14, Eli the Bearded wrote:
...
> With vim, you can disable the implicit final newline with
> "set nofixendofline", but there are Unix tools that might not like such
> files. As one example, I seem to recall no final newline causing a final
> "line" to be silently discarded in /etc/sudoers (and files included by
> that).

Sounds like a bug to be fixed.

There is also the example I gave upthread, which cannot be fixed:
cat foo bar. cat(1) itself has no business trying to interpret its
input files as lines of text, but as a user you don't expect it to jam
the last line of foo together with the first line of bar -- if you
think of these as text files.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
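To illustrate the `cat foo bar` point, here is a hedged sketch of a hypothetical `cat`-like helper, `copy_ensure_newline`, that copies one stream to another and appends a '\n' only when non-empty input lacked one. This is not what cat(1) does (it copies bytes unmodified, as Jorgen says it should); it only shows what a line-oriented tool would have to do to avoid gluing two files' lines together.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy `in` to `out` byte by byte and, if the
   input was non-empty and did not end in '\n', append one so the
   next file's first line is not glued onto this file's last line.
   Returns 0 on success, -1 on a read or write error. */
int copy_ensure_newline(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    int c;
    int last = '\n';   /* empty input counts as already terminated */
    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        last = c;
    }
    if (ferror(in))
        return -1;
    if (last != '\n' && putc('\n', out) == EOF)
        return -1;
    return 0;
}
```

Only the last byte seen is remembered, so memory use stays constant no matter how large the input is.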
| From | Jorgen Grahn <grahn+nntp@snipabacken.se> |
|---|---|
| Date | 2020-10-17 19:10 +0000 |
| Message-ID | <slrnromgd7.1hpq.grahn+nntp@frailea.sa.invalid> |
| In reply to | #155668 |
On Wed, 2020-10-14, James Kuyper wrote:
> On 10/14/20 4:08 PM, Jorgen Grahn wrote:
>> On Sun, 2020-10-11, James Kuyper wrote:
>>> On 10/11/20 1:51 PM, DFS wrote:
>>> ...
>>>> I'm not talking about IDEs. I'm talking about writing in general. I
>>>> would guess many people write their last line and end it with a period,
>>>> not a period and return.
>>>
>>> Text file editors (including some IDEs) often put in a final return,
>>> even if you don't.
>>
>> E.g. Emacs and vi in default configurations.
>>
>>> As a result, if you do put one in, you'll often end
>>> up with two newlines at the end of the file.
>>
>> /That/ is something I have never seen. Which tools do that? Sounds
>> like a bug to me -- and one that's easily fixed.
>
> I opened a new file with vi, and hit the following keys:
>
> i 1 Enter Esc : x
>
> Here's what I see in the resulting file:
>
> ~(48) od -a linetest
> 0000000 1 nl nl
> 0000003

Wow, you're right! I get that behavior with both nvi and the one in
OpenBSD; perhaps it's mandated by POSIX?

I use vi frequently for smaller tasks, but never noticed. Partly
because I tend to edit existing files, and maybe partly because I
unconsciously see that final newline and don't add one.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
| From | Kaz Kylheku <793-849-0957@kylheku.com> |
|---|---|
| Date | 2020-10-17 19:36 +0000 |
| Message-ID | <20201017122145.940@kylheku.com> |
| In reply to | #155732 |
On 2020-10-17, Jorgen Grahn <grahn+nntp@snipabacken.se> wrote:
> On Wed, 2020-10-14, James Kuyper wrote:
>> On 10/14/20 4:08 PM, Jorgen Grahn wrote:
>>> On Sun, 2020-10-11, James Kuyper wrote:
>>>> On 10/11/20 1:51 PM, DFS wrote:
>>>> ...
>>>>> I'm not talking about IDEs. I'm talking about writing in general. I
>>>>> would guess many people write their last line and end it with a period,
>>>>> not a period and return.
>>>>
>>>> Text file editors (including some IDEs) often put in a final return,
>>>> even if you don't.
>>>
>>> E.g. Emacs and vi in default configurations.
>>>
>>>> As a result, if you do put one in, you'll often end
>>>> up with two newlines at the end of the file.
>>>
>>> /That/ is something I have never seen. Which tools do that? Sounds
>>> like a bug to me -- and one that's easily fixed.
>>
>> I opened a new file with vi, and hit the following keys:
>>
>> i 1 Enter Esc : x
>>
>> Here's what I see in the resulting file:
>>
>> ~(48) od -a linetest
>> 0000000 1 nl nl
>> 0000003
>
> Wow, you're right! I get that behavior with both nvi and the one in
> OpenBSD; perhaps it's mandated by POSIX?
No no no.
It's mandated by the fact that you created two lines.
If you want to create a file which contains just one line, you do this:
i1Esc:x<Enter>
By entering material into the previously empty file, you have already
created a line. So that is to say, the moment you hit i to begin
inserting into the empty file, line 1 is allocated for you to type into.
If you then Enter while still in insert mode, you're creating an
additional line.
This is perfectly clear from the traditional ~ (tilde) marks that run
down the left column, indicating the nonexistent lines past the end of
the buffer.
If you type in the line as I show above, you will see that there is a
tilde just below the 1. If you hit Enter before hitting Esc, then
you see a blank line between the 1 and the first tilde.
This is all reasonably intuitive.
The slightly counter-intuitive part is that a line is created when you
hit i. That's obviously a special case for the empty buffer situation;
that command will not create a line if the cursor is already on a line,
which is always the case if the buffer is not empty.
POSIX has these words, interestingly:
Initialization in ex and vi
Historically, vi always had a line in the edit buffer, even if the edit
buffer was "empty". For example:
1. The ex command = executed from visual mode wrote "1" when the buffer
was empty.
2. Writes from visual mode of an empty edit buffer wrote files of a
single character (a <newline>), while writes from ex mode of an
empty edit buffer wrote empty files.
3. Put and read commands into an empty edit buffer left an empty line
at the top of the edit buffer.
For consistency, POSIX.1-2017 does not permit any of these behaviors.
Those are just interesting historic remarks I found; I tried looking for
a concrete specification of behavior for i (insert before cursor) when
the buffer is empty. I didn't find anything and don't have time at the
moment to go through it in more detail. The observed behavior in various
vi implementations makes a lot of sense, though.
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
| From | Jorgen Grahn <grahn+nntp@snipabacken.se> |
|---|---|
| Date | 2020-10-14 20:16 +0000 |
| Message-ID | <slrnroen59.1hpq.grahn+nntp@frailea.sa.invalid> |
| In reply to | #155521 |
On Sun, 2020-10-11, DFS wrote:
> On 10/11/2020 11:36 AM, Jorgen Grahn wrote:
>> On Sun, 2020-10-11, DFS wrote:
>> ...
>>> I see fgets() is more "reliable" than f/getc in case your final line is
>>> missing a newline (which I would bet happens frequently):
>>
>> In the past, on Unix, it used to happen very infrequently, but it
>> seems recent IDEs generate "endless" text files by default.
>>
>> I have yet to figure out why they do this. The only effect is a
>> saving of one byte, and that a lot of traditional tools break in subtle
>> ways[1] ... but a conspiracy against Unix seems unlikely.
>
> I'm not talking about IDEs. I'm talking about writing in general.

Which may involve an IDE. But I didn't mean to talk about IDEs either.
My main message was, on Unix I would expect few text files to have a
missing final newline.

This may of course not be useful information to anyone, and I don't
claim to know what traditions, if any, exist in other communities
which use text files.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
| From | Barry Schwarz <schwarzb@delq.com> |
|---|---|
| Date | 2020-10-11 11:36 -0700 |
| Message-ID | <ktj6oft4i3mmrn4beslolt3vi6ncqfqpi1@4ax.com> |
| In reply to | #155516 |
On 11 Oct 2020 15:36:18 GMT, Jorgen Grahn <grahn+nntp@snipabacken.se>
wrote:

>On Sun, 2020-10-11, DFS wrote:
>...
>> I see fgets() is more "reliable" than f/getc in case your final line is
>> missing a newline (which I would bet happens frequently):
>
>In the past, on Unix, it used to happen very infrequently, but it
>seems recent IDEs generate "endless" text files by default.
>
>I have yet to figure out why they do this. The only effect is a
>saving of one byte, and that a lot of traditional tools break in subtle
>ways[1] ... but a conspiracy against Unix seems unlikely.
>
>> "fgets() stops when either (n-1) characters are read, the newline
>> character is read, or the end-of-file is reached, whichever comes first."
>
>What do you mean? All the functions you list give you all information
>available; they all seem reliable.
>
>(fgets() can't handle '\0' characters, but that's a separate thing.)

There is nothing in the description of fgets that mentions any issue
with the \0 character. fgets will stop reading after the appropriate
number of characters or reading the \n, whichever comes first. \0
characters will be stored in the array just like any other character.
If someone were to process the array as a string, the \0 would mess
things up but that is not an fgets issue and not relevant to the
problem under discussion.

--
Remove del for email
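The following sketch, assuming a hosted implementation where `tmpfile()` succeeds, shows both points above at once: fgets() stores an embedded null byte in the array like any other character, but code that then treats the array as a string cannot see past it. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: write "ab\0cd\n" to a binary temporary file,
   read it back with fgets(), and return what strlen() reports.
   fgets() stores all six bytes plus a terminating '\0', but strlen()
   stops at the embedded null, so string code sees only "ab". */
size_t fgets_strlen_with_embedded_null(void)
{
    static const char data[] = {'a', 'b', '\0', 'c', 'd', '\n'};
    char buf[32];

    FILE *fp = tmpfile();          /* opened in "wb+" (binary) mode */
    assert(fp != NULL);
    fwrite(data, 1, sizeof data, fp);
    rewind(fp);

    char *r = fgets(buf, sizeof buf, fp);
    fclose(fp);
    assert(r != NULL);

    /* The '\n' really is in buf[5], but fgets()'s return value does
       not say how many bytes were read, and strlen() finds only 2. */
    assert(buf[5] == '\n');
    return strlen(buf);
}
```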
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2020-10-11 15:12 -0400 |
| Message-ID | <rlvlff$trg$1@dont-email.me> |
| In reply to | #155524 |
On 10/11/20 2:36 PM, Barry Schwarz wrote:
> On 11 Oct 2020 15:36:18 GMT, Jorgen Grahn <grahn+nntp@snipabacken.se>
> wrote:
>
>> On Sun, 2020-10-11, DFS wrote:
>> ...
>>> I see fgets() is more "reliable" than f/getc in case your final line is
>>> missing a newline (which I would bet happens frequently):
>>
>> In the past, on Unix, it used to happen very infrequently, but it
>> seems recent IDEs generate "endless" text files by default.
>>
>> I have yet to figure out why they do this. The only effect is a
>> saving of one byte, and that a lot of traditional tools break in subtle
>> ways[1] ... but a conspiracy against Unix seems unlikely.
>>
>>> "fgets() stops when either (n-1) characters are read, the newline
>>> character is read, or the end-of-file is reached, whichever comes first."
>>
>> What do you mean? All the functions you list give you all information
>> available; they all seem reliable.
>>
>> (fgets() can't handle '\0' characters, but that's a separate thing.)
>
> There is nothing in the description of fgets that mentions any issue
> with the \0 character. fgets will stop reading after the appropriate
> number of characters or reading the \n, whichever comes first. \0
> characters will be stored in the array just like any other character.
> If someone were to process the array as a string, the \0 would mess
> things up but that is not an fgets issue and not relevant to the
> problem under discussion.

It's not specific to fgets(), it's a property of text streams.

"... Data read in from a text stream will necessarily compare equal to
the data that were earlier written out to that stream only if: the data
consist only of printing characters and the control characters
horizontal tab and new-line; no new-line character is immediately
preceded by space characters; and the last character is a new-line
character. ..." (7.21.2p2).

Data that contains a null character doesn't qualify for the above
guarantee, which gives implementations permission to (among many other
possibilities) drop such characters while either writing to or reading
from a text stream.

"Data read in from a binary stream shall compare equal to the data that
were earlier written out to that stream, under the same
implementation." (7.21.2p3). Therefore, null characters should not be
a problem for fgets() when reading from a binary stream. However, on
systems where lines are indicated by methods other than a newline at
the end of each line, fgets() on a binary stream won't work the way you
probably want it to.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2020-10-11 12:16 -0400 |
| Message-ID | <rlvb43$emn$1@dont-email.me> |
| In reply to | #155504 |
On 10/11/20 1:06 AM, Barry Schwarz wrote:
> On Sat, 10 Oct 2020 22:37:35 -0400, DFS <nospam@dfs.com> wrote:
...
>> FILE *fin = fopen(argv[1],"r");
>> fseek(fin, 0, SEEK_END);
>> int buffer = ftell(fin);
>> char *myStr = malloc(sizeof(char) * (buffer + 1));
>> rewind(fin);
>> fread(myStr, sizeof(char), buffer, fin);
...
> Alternately, you could open the file in binary mode instead of text
> mode. ftell should work then.

In text mode, whatever implementation-specific method is used to
identify lines is converted into a single '\n' at the end of each line.
In binary mode, no such conversion will be done. On platforms where
lines are terminated by a two-character sequence such as "\n\r" or
"\r\n", counting '\n' characters without also checking for a '\r' in
the correct location may overcount the number of lines. On platforms
where lines are marked by methods that make no use of '\n' characters,
it will undercount them.
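A minimal sketch of counting lines in binary mode, where no translation happens, assuming the only conventions of interest are '\n', "\r\n" and a lone '\r'. Record-oriented systems that use no terminator characters at all, as mentioned above, are out of scope. `count_lines_any_eol` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count lines in a stream opened in binary
   mode ("rb"), accepting "\n", "\r\n" and a lone "\r" as one line
   terminator each. A final line with no terminator still counts. */
long count_lines_any_eol(FILE *fp)
{
    assert(fp != NULL);

    long lines = 0;
    int c;              /* int, not char, so EOF is distinguishable */
    int prev = EOF;
    int partial = 0;    /* 1 while inside an unterminated line */

    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (prev != '\r')   /* "\r\n" was already counted at '\r' */
                lines++;
            partial = 0;
        } else if (c == '\r') {
            lines++;
            partial = 0;
        } else {
            partial = 1;
        }
        prev = c;
    }
    if (partial)
        lines++;
    return lines;
}
```

Note that `c` is an `int`, not a `char` as in the quoted program, so that EOF can be told apart from a valid byte value.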
| From | Johann Klammer <klammerj@NOSPAM.a1.net> |
|---|---|
| Date | 2020-10-11 15:18 +0200 |
| Message-ID | <rlv0mo$296$1@gioia.aioe.org> |
| In reply to | #155503 |
On 10/11/2020 04:37 AM, DFS wrote:
> $ countlines war_peace.txt
> fread-var: 66875 off by 1183
> fgetc : 65692 correct
> fgets : 65692 correct
>
>
> $ countlines bible.txt
> fread-var: 31255 off by 153
> fgetc : 31101 off by 1
> fgets : 31102 correct
>
>
> ======================================================
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char *argv[])
> {
>
> char line[600] = "";
> char c;
>
> // use fread to populate a variable
> // open file, go to end, get size, allocate memory, back to
> // beginning, read contents into variable
> FILE *fin = fopen(argv[1],"r");
> fseek(fin, 0, SEEK_END);
> int buffer = ftell(fin);
> char *myStr = malloc(sizeof(char) * (buffer + 1));
> rewind(fin);
> fread(myStr, sizeof(char), buffer, fin);
Here, I don't see where you terminate the string.
malloc returns uninitialized memory, I think.
And fread just reads a chunk of data.
Use calloc or do a
myStr[buffer]='\0';
before fread.
| From | Jorgen Grahn <grahn+nntp@snipabacken.se> |
|---|---|
| Date | 2020-10-11 14:31 +0000 |
| Message-ID | <slrnro65qa.1hpq.grahn+nntp@frailea.sa.invalid> |
| In reply to | #155513 |
On Sun, 2020-10-11, Johann Klammer wrote:
> malloc returns uninitialized memory, I think.

True, it does.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
| From | Barry Schwarz <schwarzb@delq.com> |
|---|---|
| Date | 2020-10-11 11:31 -0700 |
| Message-ID | <ebj6ofhp65rko2a372a58qirpbmbk0utgo@4ax.com> |
| In reply to | #155513 |
On Sun, 11 Oct 2020 15:18:02 +0200, Johann Klammer
<klammerj@NOSPAM.a1.net> wrote:
>On 10/11/2020 04:37 AM, DFS wrote:
>> $ countlines war_peace.txt
>> fread-var: 66875 off by 1183
>> fgetc : 65692 correct
>> fgets : 65692 correct
>>
>>
>> $ countlines bible.txt
>> fread-var: 31255 off by 153
>> fgetc : 31101 off by 1
>> fgets : 31102 correct
>>
>>
>> ======================================================
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char *argv[])
>> {
>>
>> char line[600] = "";
>> char c;
>>
>> // use fread to populate a variable
>> // open file, go to end, get size, allocate memory, back to
>> // beginning, read contents into variable
>> FILE *fin = fopen(argv[1],"r");
>> fseek(fin, 0, SEEK_END);
>> int buffer = ftell(fin);
>> char *myStr = malloc(sizeof(char) * (buffer + 1));
>> rewind(fin);
>> fread(myStr, sizeof(char), buffer, fin);
>
>Here, I don't see where you terminate the string.
>malloc returns uninitialized memory, I think.
>And fread just reads a chunk of data.
>Use calloc or do a
>myStr[buffer]='\0';
>before fread.
calloc would prevent him from seeing any residual \n characters in the
buffer. This would eliminate the extra lines his program counted but
will do nothing to solve the undercount if the last line does not end
with a \n.
Setting myStr[buffer] does nothing to solve any of the issues
reported. The fact that he does not terminate the array with a '\0'
is irrelevant since he is not processing strings or using any
functions that do.
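As a sketch of what is being argued here (the function name is hypothetical), the whole-file approach can be made correct by scanning only the bytes that fread() reports reading, rather than the size ftell() returned. For a text stream, C does not promise that ftell()'s value is a byte count, so the sketch assumes the file was opened in binary mode ("rb"). Seeking to SEEK_END on a binary stream is itself not guaranteed by the standard, although it works on common POSIX and Windows systems.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: slurp a whole stream and count lines.
   Only the n bytes that fread() reports are scanned, so whatever
   sits in the rest of the buffer (malloc garbage, calloc zeros,
   or a '\0' terminator) never affects the count.
   Intended for a stream opened in binary mode ("rb").
   Returns -1 on error. */
long count_lines_slurp(FILE *fp)
{
    assert(fp != NULL);

    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);              /* ftell returns long, not int */
    if (size < 0)
        return -1;
    rewind(fp);

    char *data = malloc(size > 0 ? (size_t)size : 1);
    if (data == NULL)
        return -1;

    size_t n = fread(data, 1, (size_t)size, fp);
    if (n < (size_t)size && ferror(fp)) {
        free(data);
        return -1;
    }

    long lines = 0;
    for (size_t i = 0; i < n; i++)
        if (data[i] == '\n')
            lines++;
    if (n > 0 && data[n - 1] != '\n')
        lines++;                        /* unterminated last line */

    free(data);
    return lines;
}
```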
--
Remove del for email
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2020-10-11 23:15 +0100 |
| Message-ID | <87v9fg8l4u.fsf@bsb.me.uk> |
| In reply to | #155513 |
Johann Klammer <klammerj@NOSPAM.a1.net> writes:
> On 10/11/2020 04:37 AM, DFS wrote:
>> $ countlines war_peace.txt
>> fread-var: 66875 off by 1183
>> fgetc : 65692 correct
>> fgets : 65692 correct
>>
>>
>> $ countlines bible.txt
>> fread-var: 31255 off by 153
>> fgetc : 31101 off by 1
>> fgets : 31102 correct
>>
>>
>> ======================================================
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char *argv[])
>> {
>>
>> char line[600] = "";
>> char c;
>>
>> // use fread to populate a variable
>> // open file, go to end, get size, allocate memory, back to
>> // beginning, read contents into variable
>> FILE *fin = fopen(argv[1],"r");
>> fseek(fin, 0, SEEK_END);
>> int buffer = ftell(fin);
>> char *myStr = malloc(sizeof(char) * (buffer + 1));
>> rewind(fin);
>> fread(myStr, sizeof(char), buffer, fin);
>
> Here, I don't see where you terminate the string.
> malloc returns uninitialized memory, I think.
> And fread just reads a chunk of data.
> Use calloc or do a
> myStr[buffer]='\0';
> before fread.
Not needed. The + 1 makes the reader /think/ that a string is going to
be read in, but the data are accessed using an index bounded by the
count of bytes read. So, if there is an error here, it is the
misleading + 1.
--
Ben.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2020-10-11 14:00 -0700 |
| Message-ID | <87eem47a0q.fsf@nosuchdomain.example.com> |
| In reply to | #155503 |
DFS <nospam@dfs.com> writes:
[...]
> //count lines from file with fgets
> lines = 0;
> rewind(fin);
> while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
> printf("fgets : %d\n",lines);
[...]
The fgets() method overcounts if the input has lines longer than 600
characters.
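One way around the fixed-buffer problem, sketched here with a hypothetical function name, is to count a line only when the chunk fgets() returns actually contains the '\n', and to count a final chunk without one as an unterminated last line. The buffer is deliberately tiny to show that the count no longer depends on its size. As Keith notes later in the thread, this costs an extra scan of each chunk and is fooled by embedded null characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count lines with fgets() without depending on
   the buffer being longer than the longest line. A line is counted
   only when the chunk fgets() returned contains the '\n'; a final
   chunk without one is counted as an unterminated last line.
   Caveat: an embedded '\0' before the '\n' hides it from strchr(). */
long count_lines_fgets(FILE *fp)
{
    char buf[8];        /* deliberately tiny to exercise long lines */
    long lines = 0;
    int in_line = 0;    /* 1 if the current line has no '\n' yet */

    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strchr(buf, '\n') != NULL) {
            lines++;
            in_line = 0;
        } else {
            in_line = 1;
        }
    }
    if (in_line)
        lines++;
    return lines;
}
```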
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2020-10-11 17:47 -0400 |
| Message-ID | <t8LgH.593522$eN2.578034@fx47.iad> |
| In reply to | #155539 |
On 10/11/2020 5:00 PM, Keith Thompson wrote:
> DFS <nospam@dfs.com> writes:
> [...]
>> //count lines from file with fgets
>> lines = 0;
>> rewind(fin);
>> while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
>> printf("fgets : %d\n",lines);
> [...]
>
> The fgets() method overcounts if the input has lines longer than 600
> characters.
Maybe.
wc -L bible.txt = maxlinelength = 516
(wc.exe from GnuWin32 - year 2005)
http://www.truth.info/bigfiles/bible.txt.zip
(note: not trying to push religion - I just happened to use that file
for testing, because it's big and organized)
---------------------------------------------------------
#include <stdio.h>
int main(int argc, char *argv[])
{
if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
FILE *fin = fopen(argv[1],"r");
if (fin == NULL) { perror(argv[1]); return 1; }
int lines = 0;
char line[400] = "";
while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
// sizeof yields a size_t, so cast it to int to match %d
printf("Line buffer = %d, fgets line count = %d\n",(int)sizeof line, lines);
lines = 0;
char line2[513] = "";
rewind(fin);
while (fgets(line2,sizeof line2, fin)!= NULL) {lines++;}
printf("Line buffer = %d, fgets line count = %d\n",(int)sizeof line2, lines);
lines = 0;
char line3[600] = "";
rewind(fin);
while (fgets(line3,sizeof line3, fin)!= NULL) {lines++;}
printf("Line buffer = %d, fgets line count = %d\n",(int)sizeof line3, lines);
fclose(fin);
return(0);
}
---------------------------------------------------------
$ tcc split_test.c -o split_test.exe
$ split_test bible.txt
Line buffer = 400, fgets line count = 31118
Line buffer = 513, fgets line count = 31102 (correct)
Line buffer = 600, fgets line count = 31102 (correct)
note: wc -l bible.txt = 31101 because the very last line is missing a \n
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2020-10-11 17:26 -0700 |
| Message-ID | <871ri470i1.fsf@nosuchdomain.example.com> |
| In reply to | #155543 |
DFS <nospam@dfs.com> writes:
> On 10/11/2020 5:00 PM, Keith Thompson wrote:
>> DFS <nospam@dfs.com> writes:
>> [...]
>>> //count lines from file with fgets
>>> lines = 0;
>>> rewind(fin);
>>> while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
>>> printf("fgets : %d\n",lines);
>> [...]
>>
>> The fgets() method overcounts if the input has lines longer than 600
>> characters.
>
> Maybe.
> wc -L bible.txt = maxlinelength = 516
> (wc.exe from GnuWin32 - year 2005)
[...]
I think your point was that the line length that causes your fgets
code to overcount might not be exactly 600. If that's your point,
you're right -- but I didn't take the time to be exact, or define
just what the length of a line is (e.g., whether it includes the
terminator, if any).
My point is that by calling fgets() with a second argument of 600
(or any fixed value), your code fails to correctly handle text
with very long lines. You could fix that by scanning the input
line for '\n' characters, but since fgets() doesn't tell you how
many characters it read, that could be expensive (O(N) where N is
the length of the line). And if you use fgets() to read from a
binary stream, you can't tell whether an embedded null character
is the end of the data read by fgets() or data read from the file.
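A minimal sketch of the fix mentioned above, using a hypothetical helper name count_lines_fgets(): a line is counted only when the chunk returned by fgets() contains the '\n', so a line longer than the buffer is still counted once. It keeps the drawbacks Keith points out: strchr() rescans every chunk, and an embedded null byte would hide a '\n' that follows it.

```c
#include <stdio.h>
#include <string.h>

/* Counts lines with a fixed-size fgets() buffer, but only bumps the count
   when a chunk contains the '\n', so long lines are not overcounted.
   An unterminated last line is counted too, like the fgets loop above. */
long count_lines_fgets(FILE *fp)
{
    char buf[64];           /* deliberately small: any size >= 2 works */
    long lines = 0;
    int partial = 0;        /* 1 if the current line has no '\n' yet */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strchr(buf, '\n') != NULL) {   /* O(chunk length) rescan */
            lines++;
            partial = 0;
        } else {
            partial = 1;
        }
    }
    if (partial)            /* last line had no terminating '\n' */
        lines++;
    return lines;
}
```

With this approach the buffer size only affects speed, not the result.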
If you only care about *how many* lines are in your input, there's
no point in using fgets(). Just read a character or a block at
a time and scan for '\n' characters (and *maybe* apply special
handling if the last character read isn't '\n').
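A sketch of the block-at-a-time approach, assuming a hypothetical function count_newlines_block(): each fread() block is scanned with memchr(), so line length and embedded null bytes don't matter. It reports the last character read and leaves the "maybe" (whether to count an unterminated last line) to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Counts '\n' characters by reading fixed-size blocks with fread() and
   scanning each block with memchr(). *last_char receives the final
   character read, or EOF for an empty stream. */
long count_newlines_block(FILE *fp, int *last_char)
{
    unsigned char buf[4096];
    long newlines = 0;
    size_t n;

    *last_char = EOF;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        const unsigned char *p = buf;
        const unsigned char *end = buf + n;
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            newlines++;
            p++;            /* continue scanning after this newline */
        }
        *last_char = buf[n - 1];
    }
    return newlines;
}
```

A caller wanting to count an unterminated last line adds one when the returned last character is neither '\n' nor EOF.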
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2020-10-12 13:11 -0400 |
| Message-ID | <F80hH.340754$%p.222980@fx33.iad> |
| In reply to | #155548 |
On 10/11/2020 8:26 PM, Keith Thompson wrote:
> DFS <nospam@dfs.com> writes:
>> On 10/11/2020 5:00 PM, Keith Thompson wrote:
>>> DFS <nospam@dfs.com> writes:
>>> [...]
>>>> //count lines from file with fgets
>>>> lines = 0;
>>>> rewind(fin);
>>>> while (fgets(line,sizeof line, fin)!= NULL) {lines++;}
>>>> printf("fgets : %d\n",lines);
>>> [...]
>>>
>>> The fgets() method overcounts if the input has lines longer than 600
>>> characters.
>>
>> Maybe.
>
>> wc -L bible.txt = maxlinelength = 516
>> (wc.exe from GnuWin32 - year 2005)
> [...]
>
> I think your point was that the line length that causes your fgets
> code to overcount might not be exactly 600. If that's your point,
> you're right -- but I didn't take the time to be exact, or define
> just what the length of a line is (e.g., whether it includes the
> terminator, if any).
My point was that, apparently, setting the buffer smaller than the
maxlinelength didn't necessarily cause a line overcount.
From wc.exe in GnuWin32:
$ wc -L bible.txt
516 bible.txt
Line buffer = 400, fgets line count = 31118 (overcount)
Line buffer = 513, fgets line count = 31102 (correct)
Line buffer = 600, fgets line count = 31102
That didn't make sense if the max line length was truly 516, so I wrote
code to check the max line length, and then change the buffer size and
do line counts at each buffer size.
$ split_test bible.txt
New max line length = 67
New max line length = 155
New max line length = 157
New max line length = 190
New max line length = 211
New max line length = 265
New max line length = 273
New max line length = 278
New max line length = 291
New max line length = 334
New max line length = 362
New max line length = 375
New max line length = 391
New max line length = 439
New max line length = 458
New max line length = 512
Line buffer = 1024, fgets line count = 31102
Line buffer = 507, fgets line count = 31103
Line buffer = 508, fgets line count = 31103
Line buffer = 509, fgets line count = 31103
Line buffer = 510, fgets line count = 31103
Line buffer = 511, fgets line count = 31103
Line buffer = 512, fgets line count = 31103
Line buffer = 513, fgets line count = 31102 (correct)
Line buffer = 514, fgets line count = 31102
Line buffer = 515, fgets line count = 31102
Line buffer = 516, fgets line count = 31102
Line buffer = 517, fgets line count = 31102
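The program that printed the "New max line length" lines isn't shown. A minimal sketch of such a measurement, with a hypothetical function name max_line_length(), is below. It counts characters excluding the '\n'; whether to include it is a choice that matters when sizing an fgets() buffer. It counts a tab as one character, whereas GNU wc -L reports maximum display width with tabs expanded, which may be one reason the two numbers differ.

```c
#include <stdio.h>

/* Returns the length of the longest line in fp, counting characters
   but NOT the terminating '\n'. Tabs count as one character. */
long max_line_length(FILE *fp)
{
    long longest = 0, current = 0;
    int c;                      /* int so EOF stays distinguishable */

    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (current > longest)
                longest = current;
            current = 0;
        } else {
            current++;
        }
    }
    if (current > longest)      /* unterminated last line */
        longest = current;
    return longest;
}
```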
I was misled by the incorrect maxlinelength value of 516, and by my
inexperience.
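The step from 31103 to 31102 between the 512- and 513-byte buffers fits how fgets() uses its size argument: it stores at most n-1 characters plus a terminating null, so a line of L characters (counting its '\n') comes back in one call only if the buffer holds at least L+1 bytes. If the 512 reported above includes the newline (an assumption, since the measuring code isn't shown), that line needs 513 bytes, which matches the results. A hypothetical helper, fgets_calls(), can demonstrate the boundary:

```c
#include <stdio.h>
#include <stdlib.h>

/* Counts how many fgets() calls it takes to read fp with a buffer of
   bufsize bytes. fgets() stores at most bufsize-1 characters plus a
   null, so a line of len characters including its '\n' comes back in
   one piece only when bufsize >= len + 1. Returns -1 if malloc fails. */
long fgets_calls(FILE *fp, int bufsize)
{
    char *buf = malloc((size_t)bufsize);
    long calls = 0;

    if (buf == NULL)
        return -1;
    while (fgets(buf, bufsize, fp) != NULL)
        calls++;
    free(buf);
    return calls;
}
```

For a single line of 511 'x' characters plus '\n', a 512-byte buffer takes two calls (511 characters, then "\n" alone) and a 513-byte buffer takes one.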
> My point is that by calling fgets() with a second argument of 600
> (or any fixed value), your code fails to correctly handle text
> with very long lines. You could fix that by scanning the input
> line for '\n' characters, but since fgets() doesn't tell you how
> many characters it read, that could be expensive (O(N) where N is
> the length of the line). And if you use fgets() to read from a
> binary stream, you can't tell whether an embedded null character
> is the end of the data read by fgets() or data read from the file.
Gotcha.
> If you only care about *how many* lines are in your input, there's
> no point in using fgets(). Just read a character or a block at
> a time and scan for '\n' characters (and *maybe* apply special
> handling if the last character read isn't '\n').
Why maybe? Shouldn't you test every time, and add one to your linecount
if the last character before EOF isn't \n?
----------------------------------------------------
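#include <stdio.h>
int main(int argc, char *argv[])
{
    //count newlines with getc
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    //binary mode: on a text stream, fseek() only accepts offsets that
    //ftell() returned, so ftell(fin)-1 below needs "rb"
    FILE *fin = fopen(argv[1], "rb");
    if (fin == NULL) { perror(argv[1]); return 1; }
    int c;          //int, not char, so EOF stays distinct from every byte value
    int lines = 0;
    for (c = getc(fin); c != EOF; c = getc(fin)) { if (c == '\n') { lines++; } }
    long size = ftell(fin);     //at EOF, so this is the file length
    if (size > 0) {             //an empty file has no last character
        fseek(fin, size - 1, SEEK_SET);
        c = getc(fin);
        if (c != '\n') { lines++; printf("Last character = '%c'\n", c); }
    }
    printf("getc line count: %d\n", lines);
    fclose(fin);
    return(0);
}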
----------------------------------------------------
I tested that code a few times and it worked. Even though the pointer
is at EOF after the for..loop, do you think it's potentially troublesome
not to use an explicit fseek(fin, 0, SEEK_END); after the for..loop?
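One way to sidestep the seek question entirely is to remember the last character while reading, so no fseek() or ftell() is needed after the loop. A minimal sketch under that assumption, with a hypothetical function name count_lines_getc(), counting an unterminated last line the same way the program above does:

```c
#include <stdio.h>

/* Counts lines by counting '\n' with getc(), remembering the last
   character read so no seek is needed afterwards. A final line that
   lacks a '\n' is counted as a line. */
long count_lines_getc(FILE *fp)
{
    long lines = 0;
    int c;                  /* int, not char: EOF must stay distinct */
    int last = '\n';        /* '\n' so an empty file adds nothing */

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')       /* unterminated last line */
        lines++;
    return lines;
}
```

This also works on text streams and on input that can't seek, such as a pipe.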
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2020-10-12 10:56 -0700 |
| Message-ID | <877drv5nvv.fsf@nosuchdomain.example.com> |
| In reply to | #155583 |
DFS <nospam@dfs.com> writes:
> On 10/11/2020 8:26 PM, Keith Thompson wrote:
[...]
>> If you only care about *how many* lines are in your input, there's
>> no point in using fgets(). Just read a character or a block at
>> a time and scan for '\n' characters (and *maybe* apply special
>> handling if the last character read isn't '\n').
>
> Why maybe? Shouldn't you test every time, and add one to your
> linecount if the last character before EOF isn't \n?
Because it depends on how you choose to define a "line".
Consider a 7-byte file containing "foo", a newline, and "bar" without a
newline. Does it have 1 line or 2?
For a C implementation in which the last line does not require a
terminating new-line character (N1570 7.21.2p2), it has two lines.
For a C implementation that does require a new-line on the last line,
it's not a well formed text file. The wc command says it has one
line (POSIX says "wc -l" counts <newline> characters).
I'm not going to argue which definition is correct, just pointing out
that there is no one clear definition. If you want to define it so that
the file I described has 2 lines, that's fine -- but I suggest
documenting it.
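Following the suggestion to document the choice, a sketch that makes the definition an explicit parameter. The names line_policy, COUNT_NEWLINES, COUNT_UNTERMINATED_TOO and count_lines_mem() are hypothetical; the function works on an in-memory buffer with an explicit length so embedded null bytes are handled.

```c
#include <stddef.h>

/* Two defensible definitions of "line". */
enum line_policy {
    COUNT_NEWLINES,          /* like POSIX wc -l: count '\n' characters */
    COUNT_UNTERMINATED_TOO   /* also count a final line lacking '\n' */
};

/* Counts lines in the len bytes at buf under the chosen policy. */
size_t count_lines_mem(const char *buf, size_t len, enum line_policy policy)
{
    size_t lines = 0;

    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            lines++;
    if (policy == COUNT_UNTERMINATED_TOO && len > 0 && buf[len - 1] != '\n')
        lines++;
    return lines;
}
```

For the 7-byte file "foo\nbar", COUNT_NEWLINES gives 1 and COUNT_UNTERMINATED_TOO gives 2; for "foo\nbar\n" both give 2.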
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2020-11-29 00:21 -0800 |
| Message-ID | <864kl8r2sl.fsf@linuxsc.com> |
| In reply to | #155585 |
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> DFS <nospam@dfs.com> writes:
>
>> On 10/11/2020 8:26 PM, Keith Thompson wrote:
>
> [...]
>
>>> If you only care about *how many* lines are in your input, there's
>>> no point in using fgets(). Just read a character or a block at
>>> a time and scan for '\n' characters (and *maybe* apply special
>>> handling if the last character read isn't '\n').
>>
>> Why maybe? Shouldn't you test every time, and add one to your
>> linecount if the last character before EOF isn't \n?
>
> Because it depends on how you choose to define a "line".
>
> Consider a 7-byte file containing "foo", a newline, and "bar" without a
> newline. Does it have 1 line or 2?
>
> For a C implementation in which the last line does not require a
> terminating new-line character (N1570 7.21.2p2), it has two lines.
> For a C implementation that does require a new-line on the last line,
> it's not a well formed text file. [...]

It seems a reasonable guess that C implementations allow a last line
without a terminating newline character if and only if the underlying
operating system supports that. If that is so then there is no
ambiguity as far as C implementations are concerned.