Groups > comp.lang.c > #155503 > unrolled thread
| Started by | DFS <nospam@dfs.com> |
|---|---|
| First post | 2020-10-10 22:37 -0400 |
| Last post | 2020-10-20 15:48 +0100 |
| Articles | 7 on this page of 47 — 14 participants |
Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-10 22:37 -0400
Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-10 22:06 -0700
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 10:38 -0400
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-11 15:36 +0000
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 13:51 -0400
Re: Inconsistent line counts from 3 methods Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 18:33 +0000
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:20 -0400
Re: Inconsistent line counts from 3 methods Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 19:40 +0000
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:47 -0400
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 16:35 -0400
Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) Lew Pitcher <lew.pitcher@digitalfreehold.ca> - 2020-10-11 21:13 +0000
Re: NNTP message requirements (Was: Inconsistent line counts from 3 methods) DFS <nospam@dfs.com> - 2020-10-11 18:45 -0400
Re: NNTP message requirements Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:11 -0700
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 16:27 -0400
Re: Inconsistent line counts from 3 methods Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-10-11 23:30 +0100
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 23:56 -0400
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 14:53 -0400
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 15:15 -0400
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-14 20:08 +0000
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-14 16:58 -0400
Re: Inconsistent line counts from 3 methods Eli the Bearded <*@eli.users.panix.com> - 2020-10-14 23:37 +0000
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-14 17:25 -0700
Re: Inconsistent line counts from 3 methods Eli the Bearded <*@eli.users.panix.com> - 2020-10-15 01:55 +0000
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 19:19 +0000
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 19:10 +0000
Re: Inconsistent line counts from 3 methods Kaz Kylheku <793-849-0957@kylheku.com> - 2020-10-17 19:36 +0000
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-14 20:16 +0000
Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-11 11:36 -0700
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 15:12 -0400
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-11 12:16 -0400
Re: Inconsistent line counts from 3 methods Johann Klammer <klammerj@NOSPAM.a1.net> - 2020-10-11 15:18 +0200
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-11 14:31 +0000
Re: Inconsistent line counts from 3 methods Barry Schwarz <schwarzb@delq.com> - 2020-10-11 11:31 -0700
Re: Inconsistent line counts from 3 methods Ben Bacarisse <ben.usenet@bsb.me.uk> - 2020-10-11 23:15 +0100
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 14:00 -0700
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-11 17:47 -0400
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-11 17:26 -0700
Re: Inconsistent line counts from 3 methods DFS <nospam@dfs.com> - 2020-10-12 13:11 -0400
Re: Inconsistent line counts from 3 methods Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-10-12 10:56 -0700
Re: Inconsistent line counts from 3 methods Tim Rentsch <tr.17687@z991.linuxsc.com> - 2020-11-29 00:21 -0800
Re: Inconsistent line counts from 3 methods scott@slp53.sl.home (Scott Lurndal) - 2020-10-12 19:19 +0000
Re: Inconsistent line counts from 3 methods dfs <nospam@dfs.com> - 2020-10-12 18:53 -0400
Re: Inconsistent line counts from 3 methods Jorgen Grahn <grahn+nntp@snipabacken.se> - 2020-10-17 23:09 +0000
Re: Inconsistent line counts from 3 methods Bart <bc@freeuk.com> - 2020-10-18 00:24 +0100
Re: Inconsistent line counts from 3 methods Kaz Kylheku <793-849-0957@kylheku.com> - 2020-10-18 16:56 +0000
Re: Inconsistent line counts from 3 methods James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-10-20 09:17 -0400
Re: Inconsistent line counts from 3 methods Bart <bc@freeuk.com> - 2020-10-20 15:48 +0100
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2020-10-12 19:19 +0000 |
| Message-ID | <F_1hH.334040$Av7.7306@fx34.iad> |
| In reply to | #155583 |
DFS <nospam@dfs.com> writes:
>On 10/11/2020 8:26 PM, Keith Thompson wrote:
>> If you only care about *how many* lines are in your input, there's
>> no point in using fgets(). Just read a character or a block at
>> a time and scan for '\n' characters (and *maybe* apply special
>> handling if the last character read isn't '\n').
>
>Why maybe? Shouldn't you test every time, and add one to your linecount
>if the last character before EOF isn't \n?
>
>----------------------------------------------------
>#include <stdio.h>
>int main(int argc, char *argv[])
>{
> //count newline with getc
> FILE *fin = fopen(argv[1],"r");
> char c;
> int lines = 0;
> for (c=getc(fin);c!=EOF;c=getc(fin)) {if(c=='\n') {lines++;}}
> fseek(fin, ftell(fin)-1, SEEK_SET);
> c=getc(fin);
> if(c!='\n') {lines++;printf("Last character = '%c'\n",c);}
> printf("getc line count: %d\n",lines);
> fclose(fin);
> return(0);
>}
>----------------------------------------------------
>
>I tested that code a few times and it worked. Even though the pointer
>is at EOF after the for..loop, do you think it's potentially troublesome
>not to use an explicit fseek(fin, 0, SEEK_END); after the for..loop?
>
>
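On the fseek() question quoted above, a minimal sketch of one alternative: remember the last character read, so no seeking is needed at all. Two points here are general C facts rather than anything stated in the thread: getc() returns an int, so storing its result in a char can either never compare equal to EOF (unsigned char) or mistake a 0xFF byte for EOF (signed char); and arithmetic on ftell() values is not guaranteed to be meaningful for streams opened in text mode. The function name count_lines_getc is hypothetical.

```c
#include <stdio.h>

/* Count lines in an open stream. A final line that lacks a trailing
 * '\n' is counted as a line too. Returns -1 on a read error. */
long count_lines_getc(FILE *fin)
{
    long lines = 0;
    int c;              /* int, not char: must be able to hold EOF */
    int last = '\n';    /* so an empty file yields 0, not 1 */

    while ((c = getc(fin)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;       /* remember the most recent character */
    }
    if (ferror(fin))
        return -1;
    if (last != '\n')
        lines++;        /* unterminated final line */
    return lines;
}
```

A caller would fopen() the file, check the result for NULL, pass the stream in, and fclose() it afterwards.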
The fastest way to count lines:
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/mman.h>
#include <sys/stat.h>

int
main(int argc, const char **argv, const char **envp)
{
    int fd;
    uint8_t *cp;
    struct stat st;
    size_t linecount = 0ul;

    if (argc < 2) {
        fprintf(stderr, "%s: The file to scan must be supplied as an argument\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY, 0);
    if (fd == -1) {
        fprintf(stderr, "%s: Unable to open '%s': %s\n",
                argv[0], argv[1], strerror(errno));
        return 2;
    }
    fstat(fd, &st);
    cp = (uint8_t *)mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0ul);
    if ((void *)cp == MAP_FAILED) {
        fprintf(stderr, "%s: Unable to map '%s': %s\n",
                argv[0], argv[1], strerror(errno));
        return 3;
    }
    for(size_t s = st.st_size; s > 0; --s) {
        if (*cp++ == '\n') linecount++;
    }

    fprintf(stdout, "Line count is %zu\n", linecount);
    if (*(cp - 1) != '\n') fprintf(stdout, "Last byte of file was not a newline\n");
    return 0;
}

Yes, on x86 (32-bit), this may choke on files over 1GB (depending
on the virtual and physical address space resource limits). In which
case, mapping smaller portions works just fine.
With this approach, the data from the input file is loaded directly
into the application address space during the page fault process. There
are no intermediate kernel or library buffers involved unlike stdio.
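
"Mapping smaller portions" might look roughly like the sketch below. It is only an illustration under stated assumptions, not the code from the post: the names count_newlines_windowed and WINDOW_BYTES are hypothetical, the 64 MiB window is an arbitrary choice assumed to be a multiple of the system page size (mmap() requires page-aligned file offsets), and memchr() is swapped in for the byte-by-byte loop. On a 32-bit build, off_t would also need to be 64 bits wide (on glibc, for example, by defining _FILE_OFFSET_BITS=64) for files larger than 2GB.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window size. Assumed to be a multiple of the page
 * size so that every mmap() offset stays page-aligned. */
#define WINDOW_BYTES ((off_t)64 * 1024 * 1024)

/* Count '\n' bytes in the file at path, mapping one window at a time.
 * Returns 0 and stores the count in *out, or returns -1 on error. */
int count_newlines_windowed(const char *path, uint64_t *out)
{
    struct stat st;
    uint64_t count = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    for (off_t off = 0; off < st.st_size; off += WINDOW_BYTES) {
        off_t left = st.st_size - off;
        size_t len = (size_t)(left < WINDOW_BYTES ? left : WINDOW_BYTES);
        const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);

        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        /* memchr() jumps from one newline to the next inside the window. */
        const unsigned char *cur = p, *end = p + len, *nl;
        while (cur < end && (nl = memchr(cur, '\n', (size_t)(end - cur))) != NULL) {
            count++;
            cur = nl + 1;
        }
        munmap((void *)p, len);   /* release this window before mapping the next */
    }
    close(fd);
    *out = count;
    return 0;
}
```

Because only one window is mapped at any moment, the address space needed stays bounded by WINDOW_BYTES regardless of file size. An empty file never enters the loop, which also sidesteps mmap() rejecting a zero length.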
| From | dfs <nospam@dfs.com> |
|---|---|
| Date | 2020-10-12 18:53 -0400 |
| Message-ID | <s75hH.312992$575.308561@fx38.iad> |
| In reply to | #155588 |
On 10/12/20 3:19 PM, Scott Lurndal wrote:
> DFS <nospam@dfs.com> writes:
>> On 10/11/2020 8:26 PM, Keith Thompson wrote:
>
>>> If you only care about *how many* lines are in your input, there's
>>> no point in using fgets(). Just read a character or a block at
>>> a time and scan for '\n' characters (and *maybe* apply special
>>> handling if the last character read isn't '\n').
>>
>> Why maybe? Shouldn't you test every time, and add one to your linecount
>> if the last character before EOF isn't \n?
>>
>> ----------------------------------------------------
>> #include <stdio.h>
>> int main(int argc, char *argv[])
>> {
>> //count newline with getc
>> FILE *fin = fopen(argv[1],"r");
>> char c;
>> int lines = 0;
>> for (c=getc(fin);c!=EOF;c=getc(fin)) {if(c=='\n') {lines++;}}
>> fseek(fin, ftell(fin)-1, SEEK_SET);
>> c=getc(fin);
>> if(c!='\n') {lines++;printf("Last character = '%c'\n",c);}
>> printf("getc line count: %d\n",lines);
>> fclose(fin);
>> return(0);
>> }
>> ----------------------------------------------------
>>
>> I tested that code a few times and it worked. Even though the pointer
>> is at EOF after the for..loop, do you think it's potentially troublesome
>> not to use an explicit fseek(fin, 0, SEEK_END); after the for..loop?
>>
>>
>
>
> The fastest way to count lines:
>
> #include <errno.h>
> #include <fcntl.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> #include <sys/mman.h>
> #include <sys/stat.h>
>
> int
> main(int argc, const char **argv, const char **envp)
> {
> int fd;
> uint8_t *cp;
> struct stat st;
> size_t linecount = 0ul;
>
> if (argc < 2) {
> fprintf(stderr, "%s: The file to scan must be supplied as an argument\n", argv[0]);
> return 1;
> }
> fd = open(argv[1], O_RDONLY, 0);
> if (fd == -1) {
> fprintf(stderr, "%s: Unable to open '%s': %s\n",
> argv[0], argv[1], strerror(errno));
> return 2;
> }
> fstat(fd, &st);
> cp = (uint8_t *)mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0ul);
> if ((void *)cp == MAP_FAILED) {
> fprintf(stderr, "%s: Unable to map '%s': %s\n",
> argv[0], argv[1], strerror(errno));
> return 3;
> }
> for(size_t s = st.st_size; s > 0; --s) {
> if (*cp++ == '\n') linecount++;
> }
>
> fprintf(stdout, "Line count is %zu\n", linecount);
> if (*(cp - 1) != '\n') fprintf(stdout, "Last byte of file was not a newline\n");
> return 0;
> }
>
> Yes, on x86 (32-bit), this may choke on files over 1GB (depending
> on the virtual and physical address space resource limits). In which
> case, mapping smaller portions works just fine.
>
> With this approach, the data from the input file is loaded directly
> into the application address space during the page fault process. There
> are no intermediate kernel or library buffers involved unlike stdio.
$ gcc -Wall linecounter_lurndal.c -o linecounter_lurndal
$ time ./linecounter_lurndal bible.txt
Line count is 31101
Last byte of file was not a newline
real 0m0.023s
user 0m0.023s
sys 0m0.000s
$ time ./linecounter_lurndal bible4x.txt
Line count is 124408
real 0m0.086s
user 0m0.082s
sys 0m0.004s
$ time ./linecounter_DFS bible.txt
31102 lines
real 0m0.008s
user 0m0.008s
sys 0m0.000s
$ time ./linecounter_DFS bible4x.txt
124408 lines
real 0m0.029s
user 0m0.021s
sys 0m0.008s
--------------------------------------
mine is a 'standard' fgets routine
--------------------------------------
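#include <stdio.h>
int main(int argc, char *argv[])
{
    // usage: linecounter_DFS filename
    int lines = 0;
    char line[1024] = "";
    FILE *fin;

    if (argc < 2) {
        fprintf(stderr, "usage: %s filename\n", argv[0]);
        return 1;
    }
    fin = fopen(argv[1], "r");
    if (fin == NULL) {
        perror(argv[1]);
        return 1;
    }
    // one count per successful fgets() call
    while (fgets(line, sizeof line, fin) != NULL) {lines++;}
    fclose(fin);
    printf("%d lines\n", lines);
    return 0;
}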
--------------------------------------
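
The two programs disagree on bible.txt (31101 vs. 31102) because fgets() also returns the final, unterminated line, while the mmap version counts only '\n' bytes. A related caveat with a fixed 1024-byte buffer is that any line longer than 1023 characters is returned in several pieces and so counted more than once. A minimal sketch of an fgets-based counter that handles both cases follows; the name count_lines_fgets is hypothetical, and the code assumes the input contains no NUL bytes, since strlen() is used to find the end of each piece.

```c
#include <stdio.h>
#include <string.h>

/* Count lines with fgets(). A line is counted once, when its '\n'
 * arrives, no matter how many buffer-sized pieces it was read in.
 * An unterminated final line adds one more. */
long count_lines_fgets(FILE *fin)
{
    char buf[1024];
    long lines = 0;
    int partial = 0;   /* nonzero while inside a line with no '\n' seen yet */

    while (fgets(buf, sizeof buf, fin) != NULL) {
        size_t n = strlen(buf);
        if (n > 0 && buf[n - 1] == '\n') {
            lines++;
            partial = 0;
        } else {
            partial = 1;   /* buffer filled, or EOF reached mid-line */
        }
    }
    if (partial)
        lines++;
    return lines;
}
```

With this counting rule, a file whose last byte is not a newline still reports one more line than a pure '\n' count, so the difference between the two programs above would remain, but a very long line would no longer inflate the total.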
| From | Jorgen Grahn <grahn+nntp@snipabacken.se> |
|---|---|
| Date | 2020-10-17 23:09 +0000 |
| Message-ID | <slrnromuco.1hpq.grahn+nntp@frailea.sa.invalid> |
| In reply to | #155597 |
On Mon, 2020-10-12, dfs wrote:
...
> $ gcc -Wall linecounter_lurndal.c -o linecounter_lurndal
> $ time ./linecounter_lurndal bible.txt

Why do you time code that you built with optimization disabled? You
can argue (maybe) that it doesn't matter in this case, but you would
have saved yourself the trouble by typing four more characters.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
| From | Bart <bc@freeuk.com> |
|---|---|
| Date | 2020-10-18 00:24 +0100 |
| Message-ID | <P2LiH.2749107$1Eh.2063211@fx46.ams4> |
| In reply to | #155737 |
On 18/10/2020 00:09, Jorgen Grahn wrote:
> On Mon, 2020-10-12, dfs wrote:
> ...
>> $ gcc -Wall linecounter_lurndal.c -o linecounter_lurndal
>> $ time ./linecounter_lurndal bible.txt
>
> Why do you time code that you built with optimization disabled? You
> can argue (maybe) that it doesn't matter in this case, but you would
> have saved yourself the trouble by typing four more characters.
>

So, if one program was faster than another, was it because of the
approach and algorithm (eg. memory mapped files vs. calls to fread etc),
or because one was more amenable to be optimised?

Programs this tiny can be unfairly optimised (sometimes to nothing), in
a way that might not be practical in a real, sprawling application.

Comparisons can therefore be less meaningful optimised than unoptimised.
| From | Kaz Kylheku <793-849-0957@kylheku.com> |
|---|---|
| Date | 2020-10-18 16:56 +0000 |
| Message-ID | <20201018095536.418@kylheku.com> |
| In reply to | #155738 |
On 2020-10-17, Bart <bc@freeuk.com> wrote:
> On 18/10/2020 00:09, Jorgen Grahn wrote:
>> On Mon, 2020-10-12, dfs wrote:
>> ...
>>> $ gcc -Wall linecounter_lurndal.c -o linecounter_lurndal
>>> $ time ./linecounter_lurndal bible.txt
>>
>> Why do you time code that you built with optimization disabled? You
>> can argue (maybe) that it doesn't matter in this case, but you would
>> have saved yourself the trouble by typing four more characters.
>>
>
> So, if one program was faster than another, was it because of the
> approach and algorithm (eg. memory mapped files vs. calls to fread etc),
> or because one was more amenable to be optimised?
> Programs this tiny can be unfairly optimised (sometimes to nothing), in
> a way that might not be practical in a real, sprawling application.

Probably not if you just use -O for basic optimizations.

Without any optimizations at all, programs can be confounded by silly
coding that moves data from one register to another, only to move it
back again, and which jumps to unconditional jump instructions and such.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2020-10-20 09:17 -0400 |
| Message-ID | <rmmo24$aju$1@dont-email.me> |
| In reply to | #155738 |
On 2020-10-17, Bart <bc@freeuk.com> wrote:
> On 18/10/2020 00:09, Jorgen Grahn wrote:
...
>> Why do you time code that you built with optimization disabled? You
>> can argue (maybe) that it doesn't matter in this case, but you would
>> have saved yourself the trouble by typing four more characters.
>>
>
> So, if one program was faster than another, was it because of the
> approach and algorithm (eg. memory mapped files vs. calls to fread etc),
> or because one was more amenable to be optimised?
> Programs this tiny can be unfairly optimised (sometimes to nothing), in
> a way that might not be practical in a real, sprawling application.

If performance is an issue, turning on safe optimizations should be the
norm in real-world applications. In that case, testing without
optimization is essentially meaningless, unfairly failing to favor code
that optimizes easily over code that does not.
| From | Bart <bc@freeuk.com> |
|---|---|
| Date | 2020-10-20 15:48 +0100 |
| Message-ID | <UMCjH.242004$ZL3.70823@fx33.am4> |
| In reply to | #155806 |
On 20/10/2020 14:17, James Kuyper wrote:
> On 2020-10-17, Bart <bc@freeuk.com> wrote:
>> On 18/10/2020 00:09, Jorgen Grahn wrote:
> ...
>>> Why do you time code that you built with optimization disabled? You
>>> can argue (maybe) that it doesn't matter in this case, but you would
>>> have saved yourself the trouble by typing four more characters.
>>>
>>
>> So, if one program was faster than another, was it because of the
>> approach and algorithm (eg. memory mapped files vs. calls to fread etc),
>> or because one was more amenable to be optimised?
>
>
>> Programs this tiny can be unfairly optimised (sometimes to nothing), in
>> a way that might not be practical in a real, sprawling application.
>
> If performance is an issue, turning on safe optimizations should be the
> norm in real-world applications. In that case, testing without
> optimization is essentially meaningless, unfairly failing to favor code
> that optimizes easily over code that does not.
>

Failing to favour code that unfairly optimises easily.

For example, you are testing a function in the same module. It's called
from one place, and with arguments that might be all or partly constant
values. In this situation, a compiler might inline the function, replace
instances of the parameters in the body with the constants, and
everything reduces down to some compact expression.

Naturally, the results will be very good, and you might deduce that the
algorithm used in the function is highly performant. Until you try it in
a real program where the function is in another module, where it cannot
be inlined, and those constant reductions cannot be performed. Now you
might find that algorithm wasn't that great after all.

If I want to find out whether car A is faster than B over a circuit, you
can't have B taking short-cuts.
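
One way a benchmark can guard against the "short-cut" described above while still building with optimization is sketched below. This is an illustration only, not something proposed in the thread: it relies on the GCC/Clang extension __attribute__((noinline)) and on volatile locals so that the compiler cannot see the arguments as compile-time constants. The names count_newlines and run_benchmark are hypothetical, and other compilers would need a different mechanism (for example, placing the function in a separate source file and building without link-time optimization).

```c
#include <stddef.h>

/* GCC/Clang extension: keep the function out of line, roughly as if
 * it were defined in another translation unit. */
__attribute__((noinline))
size_t count_newlines(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    return n;
}

/* Hypothetical benchmark driver. The volatile locals force the
 * arguments to be re-read on every iteration, so the calls cannot be
 * folded into a constant or hoisted out of the loop. */
size_t run_benchmark(const char *buf, size_t len, int reps)
{
    const char *volatile vbuf = buf;
    volatile size_t vlen = len;
    size_t total = 0;

    for (int r = 0; r < reps; r++)
        total += count_newlines(vbuf, vlen);
    return total;
}
```

The body of count_newlines is still optimized normally, so the measurement reflects the algorithm as it would behave when called from elsewhere in a larger program.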