Groups > comp.lang.c > #84816 > unrolled thread

position in file when saving

Started by: fir <profesor.fir@gmail.com>
First post: 2016-03-24 08:29 -0700
Last post: 2016-03-24 22:15 -0500
Articles 20 on this page of 61 — 17 participants

Contents

  position in file when saving fir <profesor.fir@gmail.com> - 2016-03-24 08:29 -0700
    Re: position in file when saving Barry Schwarz <schwarzb@dqel.com> - 2016-03-24 10:11 -0700
      Re: position in file when saving fir <profesor.fir@gmail.com> - 2016-03-24 10:51 -0700
        Re: position in file when saving Robert Wessel <robertwessel2@yahoo.com> - 2016-03-24 13:07 -0500
          Re: position in file when saving Stephen Sprunk <stephen@sprunk.org> - 2016-03-24 14:22 -0500
            Re: position in file when saving supercat@casperkitty.com - 2016-03-24 12:40 -0700
              Re: position in file when saving Keith Thompson <kst-u@mib.org> - 2016-03-24 12:57 -0700
                Re: position in file when saving supercat@casperkitty.com - 2016-03-24 13:15 -0700
                  Re: position in file when saving Keith Thompson <kst-u@mib.org> - 2016-03-24 15:06 -0700
                    Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-24 20:29 -0400
                      Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 01:42 +0000
                        Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-24 23:07 -0400
                          Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-25 16:21 +1300
                            Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 09:00 -0400
                          Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 11:07 +0000
                            Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 09:06 -0400
                              Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 14:22 +0000
                                Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 10:57 -0400
                                  Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 16:31 +0000
                                    Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 12:36 -0400
                                      Re: position in file when saving luser droog <luser.droog@gmail.com> - 2016-03-25 09:49 -0700
                                        Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 13:00 -0400
                                      Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 20:35 +0000
                                        Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 16:59 -0400
                                    Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-26 10:46 +1300
                                      Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 20:22 -0400
                                        Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-26 13:42 +1300
                                          Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-26 09:17 -0400
                                        Re: position in file when saving supercat@casperkitty.com - 2016-03-26 07:50 -0700
                                          Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-26 21:37 -0400
                                            Re: position in file when saving supercat@casperkitty.com - 2016-03-27 08:14 -0700
                                              Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-27 12:45 -0400
                                                Re: position in file when saving supercat@casperkitty.com - 2016-03-27 12:20 -0700
                                                  Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-27 17:07 -0400
                                                    Re: position in file when saving supercat@casperkitty.com - 2016-03-27 14:15 -0700
                                                      Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-27 17:26 -0400
                                                      Re: position in file when saving Ken Brody <kenbrody@spamcop.net> - 2016-03-28 10:36 -0400
                                                        Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-28 11:01 -0400
                                                          Re: position in file when saving David Brown <david.brown@hesbynett.no> - 2016-03-29 10:18 +0200
                                                            Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 09:02 -0400
                                                              Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-30 08:24 +1300
                                                                Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 15:57 -0400
                                                                  Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-30 09:04 +1300
                                                                    Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 16:22 -0400
                                                                      Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-31 17:31 +1300
                                                                        Re: position in file when saving gazelle@shell.xmission.com (Kenny McCormack) - 2016-03-31 09:55 +0000
                                                                        Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-31 10:01 -0400
                                                                    Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 16:31 -0400
                                                                    Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-30 17:27 -0400
                                                                      Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-31 16:19 +1300
                                                                        Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-31 10:06 -0400
                                                Re: position in file when saving Malcolm McLean <malcolm.mclean5@btinternet.com> - 2016-03-29 05:21 -0700
                                                  Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 09:05 -0400
                  Re: position in file when saving Ken Brody <kenbrody@spamcop.net> - 2016-03-25 14:33 -0400
                    Re: position in file when saving Keith Thompson <kst-u@mib.org> - 2016-03-25 11:38 -0700
                    Re: position in file when saving supercat@casperkitty.com - 2016-03-25 14:05 -0700
                    Re: position in file when saving Nick Bowler <nbowler@draconx.ca> - 2016-03-28 16:14 +0000
                      Re: position in file when saving Tim Rentsch <txr@alumni.caltech.edu> - 2016-03-30 09:31 -0700
            Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-25 15:06 +1300
          Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-25 15:00 +1300
    Re: position in file when saving Les Cargill <lcargill99@comcast.com> - 2016-03-24 22:15 -0500



#85261

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-30 08:24 +1300
Message-ID: <dm033rFr730U1@mid.individual.net>
In reply to: #85231
On 03/30/16 02:02, Jerry Stuckle wrote:
> On 3/29/2016 4:18 AM, David Brown wrote:
>>
>> That depends on your definition of "better".  There are certainly build
>> utilities that work by checksums rather than modification times - they
>> will not be fooled by time sync problems.  But they have to read all the
>> files involved for every build - if you have a lot of files, and only a
>> few change, that can mean a lot longer build time.
>>
>
> Reading a file and computing the checksum is not that hard to do, nor is
> it very time consuming - much less than compiling a file, for instance.
>   And the utility has to check the file time also takes time.  And the
> more files you have, the more critical it is to ensure the source files
> are in sync.

In a Linux kernel tree:

Checking dates -
time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
    43499

real	0m0.340s

Checking checksums -
time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
    43499

real	1m56.131s

Longer than rebuilding most of the tree...  Considering most development
builds are for one or two files, longer than the entire
rebuild/reboot/reload cycle.

-- 
Ian Collins



#85266

From: Jerry Stuckle <jstucklex@attglobal.net>
Date: 2016-03-29 15:57 -0400
Message-ID: <ndemh3$ujq$1@jstuckle.eternal-september.org>
In reply to: #85261
On 3/29/2016 3:24 PM, Ian Collins wrote:
> On 03/30/16 02:02, Jerry Stuckle wrote:
>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>
>>> That depends on your definition of "better".  There are certainly build
>>> utilities that work by checksums rather than modification times - they
>>> will not be fooled by time sync problems.  But they have to read all the
>>> files involved for every build - if you have a lot of files, and only a
>>> few change, that can mean a lot longer build time.
>>>
>>
>> Reading a file and computing the checksum is not that hard to do, nor is
>> it very time consuming - much less than compiling a file, for instance.
>>   And the utility has to check the file time also takes time.  And the
>> more files you have, the more critical it is to ensure the source files
>> are in sync.
> 
> In a Linux kernel tree:
> 
> Checking dates -
> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
>    43499
> 
> real    0m0.340s
> 
> Checking checksums -
> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
>    43499
> 
> real    1m56.131s
> 
> Longer then rebuilding most of the tree...  Considering most development
> builds are for one or two files, longer than the entire
> rebuild/reboot/reload cycle.
> 

That all depends on the algorithm being used, doesn't it?  And MD5 is a
relatively slow algorithm - there are much faster ones (hint: you don't
need a 16 byte hash value!).

In addition, you're loading and executing md5sum for every file, which
does not need to be done with an integrated checksum algorithm.

IOW, your "test" is crap.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================



#85267

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-30 09:04 +1300
Message-ID: <dm05ebFr730U2@mid.individual.net>
In reply to: #85266
On 03/30/16 08:57, Jerry Stuckle wrote:
> On 3/29/2016 3:24 PM, Ian Collins wrote:
>> On 03/30/16 02:02, Jerry Stuckle wrote:
>>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>>
>>>> That depends on your definition of "better".  There are certainly build
>>>> utilities that work by checksums rather than modification times - they
>>>> will not be fooled by time sync problems.  But they have to read all the
>>>> files involved for every build - if you have a lot of files, and only a
>>>> few change, that can mean a lot longer build time.
>>>>
>>>
>>> Reading a file and computing the checksum is not that hard to do, nor is
>>> it very time consuming - much less than compiling a file, for instance.
>>>    And the utility has to check the file time also takes time.  And the
>>> more files you have, the more critical it is to ensure the source files
>>> are in sync.
>>
>> In a Linux kernel tree:
>>
>> Checking dates -
>> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
>>     43499
>>
>> real    0m0.340s
>>
>> Checking checksums -
>> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
>>     43499
>>
>> real    1m56.131s
>>
>> Longer then rebuilding most of the tree...  Considering most development
>> builds are for one or two files, longer than the entire
>> rebuild/reboot/reload cycle.
>>
>
> That all depends on the algorithm being used, doesn't it?  And MD5 is a
> relatively slow algorithm - there are much faster ones (hint: you don't
> need a 16 byte hash value!).
>
> In addition, you're loading and executing md5sum for every file, which
> does not need to be done with an integrated checksum algorithm.

It'll still take longer to check than make takes to run a partial rebuild.

-- 
Ian Collins



#85268

From: Jerry Stuckle <jstucklex@attglobal.net>
Date: 2016-03-29 16:22 -0400
Message-ID: <ndeo0b$49b$1@jstuckle.eternal-september.org>
In reply to: #85267
On 3/29/2016 4:04 PM, Ian Collins wrote:
> On 03/30/16 08:57, Jerry Stuckle wrote:
>> On 3/29/2016 3:24 PM, Ian Collins wrote:
>>> On 03/30/16 02:02, Jerry Stuckle wrote:
>>>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>>>
>>>>> That depends on your definition of "better".  There are certainly
>>>>> build
>>>>> utilities that work by checksums rather than modification times - they
>>>>> will not be fooled by time sync problems.  But they have to read
>>>>> all the
>>>>> files involved for every build - if you have a lot of files, and
>>>>> only a
>>>>> few change, that can mean a lot longer build time.
>>>>>
>>>>
>>>> Reading a file and computing the checksum is not that hard to do,
>>>> nor is
>>>> it very time consuming - much less than compiling a file, for instance.
>>>>    And the utility has to check the file time also takes time.  And the
>>>> more files you have, the more critical it is to ensure the source files
>>>> are in sync.
>>>
>>> In a Linux kernel tree:
>>>
>>> Checking dates -
>>> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
>>>     43499
>>>
>>> real    0m0.340s
>>>
>>> Checking checksums -
>>> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
>>>     43499
>>>
>>> real    1m56.131s
>>>
>>> Longer then rebuilding most of the tree...  Considering most development
>>> builds are for one or two files, longer than the entire
>>> rebuild/reboot/reload cycle.
>>>
>>
>> That all depends on the algorithm being used, doesn't it?  And MD5 is a
>> relatively slow algorithm - there are much faster ones (hint: you don't
>> need a 16 byte hash value!).
>>
>> In addition, you're loading and executing md5sum for every file, which
>> does not need to be done with an integrated checksum algorithm.
> 
> It'll still take longer to check than make takes to run a partial rebuild.
> 

Based on what criteria?  You have proven nothing, Ian.  Your "test" is crap.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================



#85451

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-31 17:31 +1300
Message-ID: <dm3ngdF92p2U2@mid.individual.net>
In reply to: #85268
On 03/30/16 09:22, Jerry Stuckle wrote:
> On 3/29/2016 4:04 PM, Ian Collins wrote:
>> On 03/30/16 08:57, Jerry Stuckle wrote:

>>> In addition, you're loading and executing md5sum for every file, which
>>> does not need to be done with an integrated checksum algorithm.
>>
>> It'll still take longer to check than make takes to run a partial rebuild.
>
> Based on what criteria?  You have proven nothing, Ian.  Your "test" is crap.

That on the container I use to build ARM kernels, a clean build takes 
just over 2 minutes and a small incremental (which stats all the files) 
takes around 20 seconds.

Claiming that having to checksum every file rather than check the
modification time isn't an unnecessary overhead is ludicrous.

-- 
Ian Collins



#85464

From: gazelle@shell.xmission.com (Kenny McCormack)
Date: 2016-03-31 09:55 +0000
Message-ID: <ndis65$308$2@news.xmission.com>
In reply to: #85451
In article <dm3ngdF92p2U2@mid.individual.net>,
Ian Collins  <ian-news@hotmail.com> wrote:
>On 03/30/16 09:22, Jerry Stuckle babbled, but said nothing of any importance (as usual):
>> Based on what criteria?  You have proven nothing, Ian.  Your "test" is crap.
...

Ian replied:
>That on the container I use to build ARM kernels, a clean build takes 
>just over 2 minutes and a small incremental (which stats all the files) 
>takes around 20 seconds.
>
>Claiming that having to checksum every file rather that check the 
>modification time isn't an unnecessary overhead is ludicrous.

"ludicrous" is Jerry's stock-in-trade.

-- 
If Jeb is  Charlie Brown kicking a football-pulled-away, Mitt  is a '50s
housewife with a  black eye who insists to her  friends the roast wasn't
dry.



#85471

From: Jerry Stuckle <jstucklex@attglobal.net>
Date: 2016-03-31 10:01 -0400
Message-ID: <ndjadr$n36$1@jstuckle.eternal-september.org>
In reply to: #85451
On 3/31/2016 12:31 AM, Ian Collins wrote:
> On 03/30/16 09:22, Jerry Stuckle wrote:
>> On 3/29/2016 4:04 PM, Ian Collins wrote:
>>> On 03/30/16 08:57, Jerry Stuckle wrote:
> 
>>>> In addition, you're loading and executing md5sum for every file, which
>>>> does not need to be done with an integrated checksum algorithm.
>>>
>>> It'll still take longer to check than make takes to run a partial
>>> rebuild.
>>
>> Based on what criteria?  You have proven nothing, Ian.  Your "test" is
>> crap.
> 
> That on the container I use to build ARM kernels, a clean build takes
> just over 2 minutes and a small incremental (which stats all the files)
> takes around 20 seconds.
> 
> Claiming that having to checksum every file rather that check the
> modification time isn't an unnecessary overhead is ludicrous.
> 

3.5 seconds to checksum *every* source file in *all* architectures?  And
when you limit the test to the applicable files, the result will be under a
second.  I don't consider that "unnecessary overhead".

Checking the date is "quick and dirty".  It is not accurate, as I have
already shown.  And the Linux kernel may be the biggest project you've
ever tried to compile - but it's not even a drop in the ocean.  I've
seen compiles which have tens of thousands of source files and take
overnight on a mainframe, for instance.  These projects may have hundreds of
programmers working on them, and something like an older file in the
build can easily sneak in and cause problems.  Sure, they use version
management - but that's not perfect, either.  It's why commercial
utilities don't just use dates.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================



#85269

From: Jerry Stuckle <jstucklex@attglobal.net>
Date: 2016-03-29 16:31 -0400
Message-ID: <ndeog1$6ah$1@jstuckle.eternal-september.org>
In reply to: #85267
On 3/29/2016 4:04 PM, Ian Collins wrote:
> On 03/30/16 08:57, Jerry Stuckle wrote:
>> On 3/29/2016 3:24 PM, Ian Collins wrote:
>>> On 03/30/16 02:02, Jerry Stuckle wrote:
>>>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>>>
>>>>> That depends on your definition of "better".  There are certainly
>>>>> build
>>>>> utilities that work by checksums rather than modification times - they
>>>>> will not be fooled by time sync problems.  But they have to read
>>>>> all the
>>>>> files involved for every build - if you have a lot of files, and
>>>>> only a
>>>>> few change, that can mean a lot longer build time.
>>>>>
>>>>
>>>> Reading a file and computing the checksum is not that hard to do,
>>>> nor is
>>>> it very time consuming - much less than compiling a file, for instance.
>>>>    And the utility has to check the file time also takes time.  And the
>>>> more files you have, the more critical it is to ensure the source files
>>>> are in sync.
>>>
>>> In a Linux kernel tree:
>>>
>>> Checking dates -
>>> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
>>>     43499
>>>
>>> real    0m0.340s
>>>
>>> Checking checksums -
>>> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
>>>     43499
>>>
>>> real    1m56.131s
>>>
>>> Longer then rebuilding most of the tree...  Considering most development
>>> builds are for one or two files, longer than the entire
>>> rebuild/reboot/reload cycle.
>>>
>>
>> That all depends on the algorithm being used, doesn't it?  And MD5 is a
>> relatively slow algorithm - there are much faster ones (hint: you don't
>> need a 16 byte hash value!).
>>
>> In addition, you're loading and executing md5sum for every file, which
>> does not need to be done with an integrated checksum algorithm.
> 
> It'll still take longer to check than make takes to run a partial rebuild.
> 

I should also add - your tree contains files for multiple architectures,
only one of which would be used during a build.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================



#85408

From: Jerry Stuckle <jstucklex@attglobal.net>
Date: 2016-03-30 17:27 -0400
Message-ID: <ndhg5t$fse$1@jstuckle.eternal-september.org>
In reply to: #85267
On 3/29/2016 4:04 PM, Ian Collins wrote:
> On 03/30/16 08:57, Jerry Stuckle wrote:
>> On 3/29/2016 3:24 PM, Ian Collins wrote:
>>> On 03/30/16 02:02, Jerry Stuckle wrote:
>>>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>>>
>>>>> That depends on your definition of "better".  There are certainly
>>>>> build
>>>>> utilities that work by checksums rather than modification times - they
>>>>> will not be fooled by time sync problems.  But they have to read
>>>>> all the
>>>>> files involved for every build - if you have a lot of files, and
>>>>> only a
>>>>> few change, that can mean a lot longer build time.
>>>>>
>>>>
>>>> Reading a file and computing the checksum is not that hard to do,
>>>> nor is
>>>> it very time consuming - much less than compiling a file, for instance.
>>>>    And the utility has to check the file time also takes time.  And the
>>>> more files you have, the more critical it is to ensure the source files
>>>> are in sync.
>>>
>>> In a Linux kernel tree:
>>>
>>> Checking dates -
>>> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
>>>     43499
>>>
>>> real    0m0.340s
>>>
>>> Checking checksums -
>>> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
>>>     43499
>>>
>>> real    1m56.131s
>>>
>>> Longer then rebuilding most of the tree...  Considering most development
>>> builds are for one or two files, longer than the entire
>>> rebuild/reboot/reload cycle.
>>>
>>
>> That all depends on the algorithm being used, doesn't it?  And MD5 is a
>> relatively slow algorithm - there are much faster ones (hint: you don't
>> need a 16 byte hash value!).
>>
>> In addition, you're loading and executing md5sum for every file, which
>> does not need to be done with an integrated checksum algorithm.
> 
> It'll still take longer to check than make takes to run a partial rebuild.
> 

Here are the results of a quick program I wrote to compute a CRC32 on all
.h, .c and .o files in a directory structure.  The test is checking
every source file for all architectures in the Linux source.

>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
38933 files computed, total byte count 534546283

real	1m52.269s
user	0m0.472s
sys	0m8.836s

However, this was in a 2GB virtual machine (Oracle VM VirtualBox)
running an Ubuntu guest under Windows 7 on a notebook with a relatively
slow hard disk.  The "disk" is actually an 8GB file on Windows, which
makes disk access even slower.  But it's a good test environment when
you aren't overly worried about performance.

When I rerun the test, all of the files are already in buffers and the
results are much different:

>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
38933 files computed, total byte count 534546283

real	0m3.485s
user	0m2.380s
sys	0m1.020s

About 3.5 seconds total to checksum 39K files and 500+ MB.

I would say that's not very time consuming.  I would expect that running
on a desktop with a reasonably fast disk would show pretty good performance.

And these tests are much more accurate than yours.
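
The source of the crctest program was not posted, so the following is only an illustration of the kind of per-file checksum being discussed. It is a minimal sketch of the standard bitwise CRC-32 (reflected polynomial 0xEDB88320); the names `crc32_update` and `crc32_file` are hypothetical, and a table-driven version would be faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fold `len` bytes into a running CRC-32. Start with crc = 0 and feed
 * successive chunks; the result after the last chunk is the checksum. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    crc = ~crc;                          /* undo the previous final inversion */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Checksum a whole file in 64 KB chunks.
 * Returns 0 and stores the CRC in *out on success, -1 on I/O error. */
int crc32_file(const char *path, uint32_t *out)
{
    unsigned char buf[65536];
    uint32_t crc = 0;
    size_t n;
    FILE *fp;
    int failed;

    assert(path != NULL && out != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        crc = crc32_update(crc, buf, n);
    failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;
    *out = crc;
    return 0;
}
```

Unlike a timestamp check, every byte of every file has to be read, so the cost depends heavily on whether the files are already cached, as the two runs above show.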

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================



#85449

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-31 16:19 +1300
Message-ID: <dm3jalF92p2U1@mid.individual.net>
In reply to: #85408
On 03/31/16 10:27, Jerry Stuckle wrote:
>
> Here's the results of a quick program I wrote to compute a CRC32 on all
> .h, .c and .o files in a directory structure.  The test is checking
> every source file for all architectures in the Linux source.
>
>>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
> 38933 files computed, total byte count 534546283
>
> real	1m52.269s
> user	0m0.472s
> sys	0m8.836s
>
> However, this was in a 2GB virtual machine (Oracle VM VirtualBox)
> running an Ubuntu guest under Windows 7 on a notebook with a relatively
> slow hard disk.  The "disk" is actually an 8GB file on Windows, which
> makes disk access even slower.  But it's a good test environment when
> you aren't overly worried about performance.
>
> When I rerun the test, all of the files will be in buffers and the
> results are much different:
>
>>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
> 38933 files computed, total byte count 534546283
>
> real	0m3.485s
> user	0m2.380s
> sys	0m1.020s
>
> About 3.5 seconds total to checksum 39K files and 500+Mb.
>
> I would say that's not very time consuming.  I would expect that running
> on a desktop with a reasonably fast disk would show pretty good performance.

150MB/sec for reading small files is *very* good performance.  Imagine 
how fast it would be if it only had to check modification times.

> And these tests are much more accurate than yours.

That I doubt.

-- 
Ian Collins



#85472

From: Jerry Stuckle <jstucklex@attglobal.net>
Date: 2016-03-31 10:06 -0400
Message-ID: <ndjamq$o6e$1@jstuckle.eternal-september.org>
In reply to: #85449
On 3/30/2016 11:19 PM, Ian Collins wrote:
> On 03/31/16 10:27, Jerry Stuckle wrote:
>>
>> Here's the results of a quick program I wrote to compute a CRC32 on all
>> .h, .c and .o files in a directory structure.  The test is checking
>> every source file for all architectures in the Linux source.
>>
>>>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
>> 38933 files computed, total byte count 534546283
>>
>> real    1m52.269s
>> user    0m0.472s
>> sys    0m8.836s
>>
>> However, this was in a 2GB virtual machine (Oracle VM VirtualBox)
>> running an Ubuntu guest under Windows 7 on a notebook with a relatively
>> slow hard disk.  The "disk" is actually an 8GB file on Windows, which
>> makes disk access even slower.  But it's a good test environment when
>> you aren't overly worried about performance.
>>
>> When I rerun the test, all of the files will be in buffers and the
>> results are much different:
>>
>>>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
>> 38933 files computed, total byte count 534546283
>>
>> real    0m3.485s
>> user    0m2.380s
>> sys    0m1.020s
>>
>> About 3.5 seconds total to checksum 39K files and 500+Mb.
>>
>> I would say that's not very time consuming.  I would expect that running
>> on a desktop with a reasonably fast disk would show pretty good
>> performance.
> 
> 150MB/sec for reading small files is *very* good performance.  Imagine
> how fast it would be if it only had to check modification times.
>

Not really, especially when you consider you have two file systems
(Linux and Windows) running on a slow disk.  The whole thing on an SSD
running Linux natively would be on the order of a few seconds.  Even on
a fast hard disk, I would expect < 20 seconds for the whole works.  And
remember, you're checking *every* architecture.  The necessary files for
a single architecture will be much smaller.

>> And these tests are much more accurate than yours.
> 
> That I doubt.
> 

Then you have once again shown you have no idea what you're talking
about.  As I previously proved.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================



#85229

From: Malcolm McLean <malcolm.mclean5@btinternet.com>
Date: 2016-03-29 05:21 -0700
Message-ID: <5468d979-6a0e-4c8d-acb6-5ee3d4f8c2cd@googlegroups.com>
In reply to: #85123
On Sunday, March 27, 2016 at 5:45:21 PM UTC+1, Jerry Stuckle wrote:
> On 3/27/2016 11:14 AM, supercat@casperkitty.com wrote:
>
> Which means absolutely nothing.  I can "touch" a file in Linux, for
> instance - which updates the last modification time but doesn't change
> the file at all.  And even if the file is changed, there is no way to
> know if the change was pertinent to this execution or not.
> 
> And restoring from a backup also restores the last modification
> date/time of the file.
> 
> Your "tests" are not at all reliable.
> 
That's always an issue with computing.
Which process is allowed to insert a layer of interception between
itself and another, and which is not?
The whole point of "touch" is to lie to other processes to pretend 
that a file has been modified when it has not. However if you take
a sha1 hash of a file, you'll detect the difference between real
and fake modifications. So should we write a "tweak" which harmlessly
inserts a blank space at the end of text files? Or a "sha1 interceptor"
which reports a false hash?
Then of course someone will write a "compare ignoring whitespace runs",
so we can add things like <span></span> to override that. And it
goes on. There's no set way of determining what gets to override 
what. 



#85232

From: Jerry Stuckle <jstucklex@attglobal.net>
Date: 2016-03-29 09:05 -0400
Message-ID: <nddubq$sbk$2@jstuckle.eternal-september.org>
In reply to: #85232
On 3/29/2016 8:21 AM, Malcolm McLean wrote:
> On Sunday, March 27, 2016 at 5:45:21 PM UTC+1, Jerry Stuckle wrote:
>> On 3/27/2016 11:14 AM, supercat@casperkitty.com wrote:
>>
>> Which means absolutely nothing.  I can "touch" a file in Linux, for
>> instance - which updates the last modification time but doesn't change
>> the file at all.  And even if the file is changed, there is no way to
>> know if the change was pertinent to this execution or not.
>>
>> And restoring from a backup also restores the last modification
>> date/time of the file.
>>
>> Your "tests" are not at all reliable.
>>
> That's always an issue with computing.
> Which process is allowed to insert a layer or interception between
> another and which is not?
> The whole point of "touch" is to lie to other processes to pretend 
> that a file has been modified when it has not. However if you take
> a sha1 hash of a file, you'll detect the difference between real
> and fake modifications. So should we write a "tweak" which harmlessly
> inserts a blank space at the end of text files? Or a "sha1 interceptor"
> which reports a false hash?
> Then of course someone will write a "compare ignoring whitespace runs",
> so we can add things like <span></span> to override that. And it
> goes on. There's no set way of determining what gets to override 
> what. 
> 

You can always get around a utility, if you try hard enough.  That isn't
the issue here.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================



#84957

From: Ken Brody <kenbrody@spamcop.net>
Date: 2016-03-25 14:33 -0400
Message-ID: <nd404d$emo$1@dont-email.me>
In reply to: #84859
On 3/24/2016 4:15 PM, supercat@casperkitty.com wrote:
> On Thursday, March 24, 2016 at 2:57:28 PM UTC-5, Keith Thompson wrote:
>> supercat writes:
>>> Is there any requirement that the previous call to ftell() must have
>>> been performed within the current execution of the program?  Also, by
>>> what means if any could a current-position-relative fseek() ever have
>>> meaningfully-defined behavior?
>>
>> Have you checked N1570 7.21.9.2?
>
> I should have omitted the follow-on question, but I think the first is
> valid and the cited section doesn't mention it.  The possibility that
> the call may have been made on a previous execution of the program isn't
> merely a theoretical exercise.  Some implementations of "fortune" have an
> index file that contains the offset to the start of each entry in the
> main text file; is such an approach guaranteed to work?

Well, "some implementations" may depend on POSIX behavior, which might
define things which are left undefined in ISO C.

Quoting 7.19.9.2p4 (n1124):

> For a text stream, either offset shall be zero, or offset shall be a
> value returned by an earlier successful call to the ftell function on a
> stream associated with the same file and whence shall be SEEK_SET.

So, I suppose, one could interpret that "a stream associated with the same 
file" could apply across executions.  (IANALL -- I am not a language lawyer.)

On the other hand, for a text file, what happens if the file is modified 
between the ftell() and subsequent fseek()?

For example:  fwrite some text, ftell the current position, fwrite more 
text, ftell the new position, fwrite some more.  Now, fseek to the first 
position, and fwrite more text than before, such that the second ftell's 
position is no longer on a record boundary, and then fseek to the second 
position.  Is that guaranteed to work?  (I'm thinking VMS which, if I 
remember correctly, has text files which are actually records of 
variable-length data.)
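
A minimal sketch of the sequence described above, assuming a freshly opened, empty update stream on a system where files are plain byte streams (POSIX-like). The function name and the literal strings are hypothetical, and fputs stands in for fwrite. On such a system the second saved position simply ends up in the middle of the rewritten line; nothing here shows what a record-oriented file system would do.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: replays the write/ftell/overwrite/fseek sequence.
   fp must be an empty stream opened for update.  On success, buf holds
   whatever fgets() reads starting at the second saved position. */
int overwrite_across_saved_position(FILE *fp, char *buf, int size)
{
    long first, second;

    assert(fp != NULL && buf != NULL && size > 0);

    if (fputs("aaaa\n", fp) == EOF) return -1;
    first = ftell(fp);                  /* position after the first line */
    if (fputs("bbbb\n", fp) == EOF) return -1;
    second = ftell(fp);                 /* position after the second line */
    if (fputs("cccc\n", fp) == EOF) return -1;
    if (first == -1L || second == -1L) return -1;

    /* Go back and write a longer line than the one that was there. */
    if (fseek(fp, first, SEEK_SET) != 0) return -1;
    if (fputs("XXXXXXX\n", fp) == EOF) return -1;

    /* The second saved position no longer sits on a line boundary. */
    if (fseek(fp, second, SEEK_SET) != 0) return -1;
    return fgets(buf, size, fp) != NULL ? 0 : -1;
}
```

Calling it on a stream from tmpfile() is one way to try this; the fseek calls between the writes and the final read satisfy the rule that a positioning call must separate output from input on an update stream.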

-- 
Kenneth Brody



#84959

From: Keith Thompson <kst-u@mib.org>
Date: 2016-03-25 11:38 -0700
Message-ID: <ln8u164d3y.fsf@kst-u.example.com>
In reply to: #84957
Ken Brody <kenbrody@spamcop.net> writes:
[...]
> Quoting 7.19.9.2p4 (n1124):
[...]

N1124 is rather old.  It's a draft of the C99 standard with the first
two Technical Corrigenda merged into it.  N1256 is a better C99 draft;
it includes all three TCs.  And N1570 is the newest publicly available
draft of C11.  (The clause you're quoting is, as far as I know, the same
in all three.)

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"



#84976

From: supercat@casperkitty.com
Date: 2016-03-25 14:05 -0700
Message-ID: <44b32e63-21f6-4d20-9188-0c0c44bc4c1d@googlegroups.com>
In reply to: #84957
On Friday, March 25, 2016 at 1:33:59 PM UTC-5, Ken Brody wrote:
> On 3/24/2016 4:15 PM, supercat wrote:
> > I should have omitted the follow-on question, but I think the first is
> > valid and the cited section doesn't mention it.  The possibility that
> > the call may have been made on a previous execution of the program isn't
> > merely a theoretical exercise.  Some implementations of "fortune" have an
> > index file that contains the offset to the start of each entry in the
> > main text file; is such an approach guaranteed to work?
> 
> Well, "some implementations" may depend on POSIX behavior, which might
> define things which are left undefined in ISO C.

The question is whether this would be considered one of those things.

> On the other hand, for a text file, what happens if the file is modified 
> between the ftell() and subsequent fseek()?
> 
> For example:  fwrite some text, ftell the current position, fwrite more 
> text, ftell the new position, fwrite some more.  Now, fseek to the first 
> position, and fwrite more text than before, such that the second ftell's 
> position is no longer on a record boundary, and then fseek to the second 
> position.  Is that guaranteed to work?  (I'm thinking VMS which, if I 
> remember correctly, has text files which are actually records of 
> variable-length data.)

The "fortune" program checked the modification dates of the text file
and the index file, and would regenerate the index if the former was
newer; since the text file seldom changed, having to read the whole
thing only when it changed was a useful optimization.  Writing to a
text file in such a fashion as to change existing line boundaries is
apt to cause problems which may vary from one system to another, but
the "fortune" program didn't do that.
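
A rough sketch of the indexing idea, not the actual "fortune" source: it records the ftell() value at the start of each line and later seeks back to one of them with SEEK_SET. The helper names are hypothetical, lines are assumed to fit in a 256-byte buffer, and everything happens within one execution; saving the offsets to an index file and reusing them in a later run is exactly the case the standard may or may not cover.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: stores the ftell() value at the start of each line.
   Returns the number of offsets stored (at most max).  Assumes every line,
   including its newline, is shorter than the local buffer. */
size_t build_line_index(FILE *fp, long *offsets, size_t max)
{
    char line[256];
    size_t n = 0;

    assert(fp != NULL && offsets != NULL);

    rewind(fp);
    while (n < max) {
        long pos = ftell(fp);           /* position before reading the line */
        if (pos == -1L || fgets(line, sizeof line, fp) == NULL)
            break;
        offsets[n++] = pos;
    }
    return n;
}

/* Hypothetical helper: reads the line starting at a recorded offset.
   Uses only the form permitted for text streams: a value obtained from
   ftell() together with SEEK_SET.  Returns 0 on success, -1 on failure. */
int read_line_at(FILE *fp, long offset, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 0);

    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fgets(buf, size, fp) != NULL ? 0 : -1;
}
```

A caller would rebuild the index whenever the text file is newer than the stored offsets, as described above; the timestamp comparison itself is outside ISO C and is not shown.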



#85191

From: Nick Bowler <nbowler@draconx.ca>
Date: 2016-03-28 16:14 +0000
Message-ID: <ndbl8o$9qc$1@dont-email.me>
In reply to: #84957
On Fri, 25 Mar 2016 14:33:47 -0400, Ken Brody wrote:
> For example:  fwrite some text, ftell the current position, fwrite more 
> text, ftell the new position, fwrite some more.  Now, fseek to the first 
> position, and fwrite more text than before, such that the second ftell's 
> position is no longer on a record boundary, and then fseek to the second 
> position.  Is that guaranteed to work?  (I'm thinking VMS which, if I 
> remember correctly, has text files which are actually records of 
> variable-length data.)

I believe the standard answers this question, although perhaps
the answer is not very satisfactory because it basically ends
with "the resulting file position when seeking on text streams
is unspecified".  Let's ignore issues of large files which
complicate real-world implementations of fseek and ftell.

The standard says this:

  - Every stream which supports seeking has an associated "file
    position indicator" (n1570 7.21.3p1).

  - For text streams, the file position indicator contains
    unspecified information (n1570 7.21.9.4p2).

  - The ftell function simply returns the current value of the
    file position indicator, if successful (n1570 7.21.9.4p2).

  - A call of fseek(stream, value, whence) does the following things,
    if successful (n1570 7.21.9.2p5):

     - it calculates a new value for the file position indicator,
     - it undoes the effect of all ungetc calls on the stream,
     - it clears the end-of-file indicator for the stream,
     - it sets the file position indicator to the calculated value,
     - for read/write streams, it puts the stream back into a state
       where the next operation can be either an input or an output
       operation.

There is a restriction for text streams that fseek calls have whence of
SEEK_SET and value either be 0 or come from an earlier call to ftell on
the same file (n1570 7.21.9.2p4).  Calls without that form therefore are
explicitly undefined.  But there is no description of the calculation of
the new value of the file position indicator (such calculation is only
defined for binary streams); I think the intent was that the supplied
value be taken literally as the new file position indicator (this is what
implementations actually do, same as for binary streams with SEEK_SET),
but the lack of specification here doesn't actually matter.

Regardless of whether or not the file is modified between the ftell and
fseek, I would say the implementation still has to do all those things
(assuming fseek is successful), including calculating a new value for
the file position indicator and setting the file position indicator to
that value.  If we immediately follow such an fseek with a call to ftell,
it must return (assuming it is successful) the new value of the file
position indicator.

But since the actual meaning of the indicator is unspecified (for
text streams), it might not mean the same thing as when we
originally called ftell.  So the part of the file accessed by a
subsequent read or write operation may not be the same.  This seems
likely if the file is modified, but conceivably it could change for
any other reason at the implementation's fancy.

Binary streams have no such issue because the file position indicator
has a specified meaning: "the number of characters from the beginning
of the file" (n1570 7.21.9.4p2).
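
To illustrate the binary-stream case, here is a small sketch with hypothetical helper names. It assumes fp is a binary stream opened for update (tmpfile() provides one), so ftell() is a character count from the start of the file and a SEEK_SET followed by ftell() gives back the offset that was passed in.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes len bytes and returns the resulting ftell()
   value, or -1L on failure.  On a binary stream this value is the number
   of characters from the beginning of the file. */
long position_after_write(FILE *fp, const void *data, size_t len)
{
    assert(fp != NULL && data != NULL);

    if (fwrite(data, 1, len, fp) != len)
        return -1L;
    return ftell(fp);
}

/* Hypothetical helper: seeks to pos with SEEK_SET and reports what ftell()
   returns immediately afterwards, or -1L if the seek fails. */
long seek_and_tell(FILE *fp, long pos)
{
    assert(fp != NULL);

    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1L;
    return ftell(fp);
}
```

On a text stream the same calls are still permitted when pos came from an earlier ftell(), but the returned number carries no such byte-count meaning.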



#85353

From: Tim Rentsch <txr@alumni.caltech.edu>
Date: 2016-03-30 09:31 -0700
Message-ID: <kfn4mbo9bc1.fsf@x-alumni2.alumni.caltech.edu>
In reply to: #85191
Nick Bowler <nbowler@draconx.ca> writes:

[...]

> There is a restriction for text streams that fseek calls have whence of
> SEEK_SET and value either be 0 or come from an earlier call to ftell on
> the same file (n1570 7.21.9.2p4).  [...]

Hmmm.  My reading is a little different.  That paragraph says:

    For a text stream, either offset shall be zero, or offset
    shall be a value returned by an earlier successful call to
    the ftell function on a stream associated with the same file
    and whence shall be SEEK_SET.

I take this to mean offset must be zero (and whence could be any
of the three SEEK_ values), OR whence must be SEEK_SET and offset
must be a value from an earlier (successful) ftell call.  In
particular it looks like

    fseek( fp, 0, SEEK_CUR );
    fseek( fp, 0, SEEK_END );
    fseek( fp, 0, SEEK_SET );

are all meant to be allowed.  Does this seem right to you
or is there something you think I've missed?
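
Under that reading, a helper like the following (hypothetical name, only a sketch) would use nothing but permitted forms on a text stream: a zero offset with SEEK_END, and a value obtained from ftell() with SEEK_SET to get back.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: stores the ftell() value at end of stream in *end
   and then restores the original position.  Returns 0 on success and a
   nonzero value on failure. */
int tell_end(FILE *fp, long *end)
{
    long here;

    assert(fp != NULL && end != NULL);

    here = ftell(fp);                   /* remember where we are */
    if (here == -1L)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0)   /* zero offset, whence SEEK_END */
        return -1;
    *end = ftell(fp);
    if (fseek(fp, here, SEEK_SET) != 0) /* ftell() value with SEEK_SET */
        return -1;
    return *end == -1L ? -1 : 0;
}
```

For a text stream the value left in *end is only good for passing back to fseek(); treating it as a size in bytes relies on implementation behaviour.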



#84880

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-25 15:06 +1300
Message-ID: <dljkpkFih2gU9@mid.individual.net>
In reply to: #84852
On 03/25/16 08:22, Stephen Sprunk wrote:
> On 24-Mar-16 13:07, Robert Wessel wrote:
>> fir <profesor.fir@gmail.com> wrote:
>>> I checked and    fseek( file, 0x200, SEEK_SET ); works
>>>
>>> it is good as it is much simpler than counting the bytes saved
>>
>> I don't believe there's any defined behavior according to the
>> standard for seeking past the end of a file and then writing.  On
>> some platforms it will pad the file with zeros to the point of the
>> write. On some systems that also triggers the creation of a sparse
>> file.
>>
>> But it's not something you can generally rely on.
>
> It's actually worse than that; fseek()'s behavior is only defined when
> passed the result of a previous call to ftell().

This condition only applies to a text stream and it is omitted from the 
Unix specification where line ending conversions aren't an issue.

-- 
Ian Collins



#84879

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-25 15:00 +1300
Message-ID: <dljkdkFih2gU8@mid.individual.net>
In reply to: #84840
On 03/25/16 07:07, Robert Wessel wrote:
>
> I don't believe there's any defined behavior according to the standard
> for seeking past the end of a file and then writing.  On some
> platforms it will pad the file with zeros to the point of the write.
> On some systems that also triggers the creation of a sparse file.
>
> But it's not something you can generally rely on.

The Unix specification extends the description of fseek() to include 
seeking and writing beyond the end of the file.
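
As a sketch of what that extension permits, assuming a POSIX system (none of this is promised by ISO C alone): seek past the end of an empty file, write there, and the bytes in the gap read back as zeros. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes len bytes at the given offset, which may lie
   beyond the current end of file.  Returns the position after the write,
   or -1L on failure.  Relies on POSIX semantics for the gap. */
long write_at(FILE *fp, long offset, const void *data, size_t len)
{
    assert(fp != NULL && data != NULL);

    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1L;
    if (fwrite(data, 1, len, fp) != len)
        return -1L;
    if (fflush(fp) != 0)
        return -1L;
    return ftell(fp);
}

/* Hypothetical helper: counts the nonzero bytes among the first n bytes of
   the stream.  Returns -1L if the stream ends before n bytes are read. */
long count_nonzero(FILE *fp, long n)
{
    long i, nonzero = 0;

    assert(fp != NULL);

    rewind(fp);
    for (i = 0; i < n; i++) {
        int c = getc(fp);
        if (c == EOF)
            return -1L;
        if (c != 0)
            nonzero++;
    }
    return nonzero;
}
```

With an empty stream from tmpfile(), write_at(fp, 0x200, "data", 4) followed by count_nonzero(fp, 0x200) is one way to check the gap on a given platform. Whether the file ends up sparse on disk is a file system matter and is not visible through these calls.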

-- 
Ian Collins
