Groups > comp.lang.c > #84816 > unrolled thread
| Started by | fir <profesor.fir@gmail.com> |
|---|---|
| First post | 2016-03-24 08:29 -0700 |
| Last post | 2016-03-24 22:15 -0500 |
| Articles | 20 on this page of 61 — 17 participants |
position in file when saving fir <profesor.fir@gmail.com> - 2016-03-24 08:29 -0700
Re: position in file when saving Barry Schwarz <schwarzb@dqel.com> - 2016-03-24 10:11 -0700
Re: position in file when saving fir <profesor.fir@gmail.com> - 2016-03-24 10:51 -0700
Re: position in file when saving Robert Wessel <robertwessel2@yahoo.com> - 2016-03-24 13:07 -0500
Re: position in file when saving Stephen Sprunk <stephen@sprunk.org> - 2016-03-24 14:22 -0500
Re: position in file when saving supercat@casperkitty.com - 2016-03-24 12:40 -0700
Re: position in file when saving Keith Thompson <kst-u@mib.org> - 2016-03-24 12:57 -0700
Re: position in file when saving supercat@casperkitty.com - 2016-03-24 13:15 -0700
Re: position in file when saving Keith Thompson <kst-u@mib.org> - 2016-03-24 15:06 -0700
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-24 20:29 -0400
Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 01:42 +0000
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-24 23:07 -0400
Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-25 16:21 +1300
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 09:00 -0400
Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 11:07 +0000
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 09:06 -0400
Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 14:22 +0000
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 10:57 -0400
Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 16:31 +0000
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 12:36 -0400
Re: position in file when saving luser droog <luser.droog@gmail.com> - 2016-03-25 09:49 -0700
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 13:00 -0400
Re: position in file when saving Ben Bacarisse <ben.usenet@bsb.me.uk> - 2016-03-25 20:35 +0000
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 16:59 -0400
Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-26 10:46 +1300
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-25 20:22 -0400
Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-26 13:42 +1300
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-26 09:17 -0400
Re: position in file when saving supercat@casperkitty.com - 2016-03-26 07:50 -0700
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-26 21:37 -0400
Re: position in file when saving supercat@casperkitty.com - 2016-03-27 08:14 -0700
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-27 12:45 -0400
Re: position in file when saving supercat@casperkitty.com - 2016-03-27 12:20 -0700
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-27 17:07 -0400
Re: position in file when saving supercat@casperkitty.com - 2016-03-27 14:15 -0700
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-27 17:26 -0400
Re: position in file when saving Ken Brody <kenbrody@spamcop.net> - 2016-03-28 10:36 -0400
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-28 11:01 -0400
Re: position in file when saving David Brown <david.brown@hesbynett.no> - 2016-03-29 10:18 +0200
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 09:02 -0400
Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-30 08:24 +1300
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 15:57 -0400
Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-30 09:04 +1300
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 16:22 -0400
Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-31 17:31 +1300
Re: position in file when saving gazelle@shell.xmission.com (Kenny McCormack) - 2016-03-31 09:55 +0000
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-31 10:01 -0400
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 16:31 -0400
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-30 17:27 -0400
Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-31 16:19 +1300
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-31 10:06 -0400
Re: position in file when saving Malcolm McLean <malcolm.mclean5@btinternet.com> - 2016-03-29 05:21 -0700
Re: position in file when saving Jerry Stuckle <jstucklex@attglobal.net> - 2016-03-29 09:05 -0400
Re: position in file when saving Ken Brody <kenbrody@spamcop.net> - 2016-03-25 14:33 -0400
Re: position in file when saving Keith Thompson <kst-u@mib.org> - 2016-03-25 11:38 -0700
Re: position in file when saving supercat@casperkitty.com - 2016-03-25 14:05 -0700
Re: position in file when saving Nick Bowler <nbowler@draconx.ca> - 2016-03-28 16:14 +0000
Re: position in file when saving Tim Rentsch <txr@alumni.caltech.edu> - 2016-03-30 09:31 -0700
Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-25 15:06 +1300
Re: position in file when saving Ian Collins <ian-news@hotmail.com> - 2016-03-25 15:00 +1300
Re: position in file when saving Les Cargill <lcargill99@comcast.com> - 2016-03-24 22:15 -0500
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2016-03-30 08:24 +1300 |
| Message-ID | <dm033rFr730U1@mid.individual.net> |
| In reply to | #85231 |
On 03/30/16 02:02, Jerry Stuckle wrote:
> On 3/29/2016 4:18 AM, David Brown wrote:
>>
>> That depends on your definition of "better". There are certainly build
>> utilities that work by checksums rather than modification times - they
>> will not be fooled by time sync problems. But they have to read all the
>> files involved for every build - if you have a lot of files, and only a
>> few change, that can mean a lot longer build time.
>>
>
> Reading a file and computing the checksum is not that hard to do, nor is
> it very time consuming - much less than compiling a file, for instance.
> And the utility has to check the file time also takes time. And the
> more files you have, the more critical it is to ensure the source files
> are in sync.
In a Linux kernel tree:
Checking dates -
time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
43499
real 0m0.340s
Checking checksums -
time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
43499
real 1m56.131s
Longer than rebuilding most of the tree... Considering most development
builds are for one or two files, longer than the entire
rebuild/reboot/reload cycle.
--
Ian Collins
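The date check being timed above boils down to one stat() call per file and a comparison of modification times. A minimal sketch of that test in C, assuming POSIX stat(); the function name needs_rebuild and its return convention are made up for illustration and are not taken from make or any other tool:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: decides whether `target` is out of date with
   respect to `source` by comparing modification times only.
   Returns 1 if a rebuild is needed, 0 if the target looks up to date,
   and -1 if the source itself cannot be examined. */
int needs_rebuild(const char *target, const char *source)
{
    struct stat t, s;

    if (stat(source, &s) != 0)
        return -1;              /* no source: nothing to build from */
    if (stat(target, &t) != 0)
        return 1;               /* no target yet: must build */
    /* st_mtime has one-second resolution; a source edited within the
       same second as the target was written is treated as unchanged. */
    return s.st_mtime > t.st_mtime;
}
```

No file content is read here, which is why this kind of check is cheap, and also why it can be fooled by clock problems or restored timestamps.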
| From | Jerry Stuckle <jstucklex@attglobal.net> |
|---|---|
| Date | 2016-03-29 15:57 -0400 |
| Message-ID | <ndemh3$ujq$1@jstuckle.eternal-september.org> |
| In reply to | #85261 |
On 3/29/2016 3:24 PM, Ian Collins wrote:
> On 03/30/16 02:02, Jerry Stuckle wrote:
>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>
>>> That depends on your definition of "better". There are certainly build
>>> utilities that work by checksums rather than modification times - they
>>> will not be fooled by time sync problems. But they have to read all the
>>> files involved for every build - if you have a lot of files, and only a
>>> few change, that can mean a lot longer build time.
>>>
>>
>> Reading a file and computing the checksum is not that hard to do, nor is
>> it very time consuming - much less than compiling a file, for instance.
>> And the utility has to check the file time also takes time. And the
>> more files you have, the more critical it is to ensure the source files
>> are in sync.
>
> In a Linux kernel tree:
>
> Checking dates -
> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
> 43499
>
> real 0m0.340s
>
> Checking checksums -
> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
> 43499
>
> real 1m56.131s
>
> Longer then rebuilding most of the tree... Considering most development
> builds are for one or two files, longer than the entire
> rebuild/reboot/reload cycle.
>
That all depends on the algorithm being used, doesn't it? And MD5 is a
relatively slow algorithm - there are much faster ones (hint: you don't
need a 16 byte hash value!).
In addition, you're loading and executing md5sum for every file, which
does not need to be done with an integrated checksum algorithm.
IOW, your "test" is crap.
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2016-03-30 09:04 +1300 |
| Message-ID | <dm05ebFr730U2@mid.individual.net> |
| In reply to | #85266 |
On 03/30/16 08:57, Jerry Stuckle wrote:
> On 3/29/2016 3:24 PM, Ian Collins wrote:
>> On 03/30/16 02:02, Jerry Stuckle wrote:
>>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>>
>>>> That depends on your definition of "better". There are certainly build
>>>> utilities that work by checksums rather than modification times - they
>>>> will not be fooled by time sync problems. But they have to read all the
>>>> files involved for every build - if you have a lot of files, and only a
>>>> few change, that can mean a lot longer build time.
>>>>
>>>
>>> Reading a file and computing the checksum is not that hard to do, nor is
>>> it very time consuming - much less than compiling a file, for instance.
>>> And the utility has to check the file time also takes time. And the
>>> more files you have, the more critical it is to ensure the source files
>>> are in sync.
>>
>> In a Linux kernel tree:
>>
>> Checking dates -
>> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
>> 43499
>>
>> real 0m0.340s
>>
>> Checking checksums -
>> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
>> 43499
>>
>> real 1m56.131s
>>
>> Longer then rebuilding most of the tree... Considering most development
>> builds are for one or two files, longer than the entire
>> rebuild/reboot/reload cycle.
>>
>
> That all depends on the algorithm being used, doesn't it? And MD5 is a
> relatively slow algorithm - there are much faster ones (hint: you don't
> need a 16 byte hash value!).
>
> In addition, you're loading and executing md5sum for every file, which
> does not need to be done with an integrated checksum algorithm.
It'll still take longer to check than make takes to run a partial rebuild.
--
Ian Collins
| From | Jerry Stuckle <jstucklex@attglobal.net> |
|---|---|
| Date | 2016-03-29 16:22 -0400 |
| Message-ID | <ndeo0b$49b$1@jstuckle.eternal-september.org> |
| In reply to | #85267 |
On 3/29/2016 4:04 PM, Ian Collins wrote:
> On 03/30/16 08:57, Jerry Stuckle wrote:
>> On 3/29/2016 3:24 PM, Ian Collins wrote:
>>> On 03/30/16 02:02, Jerry Stuckle wrote:
>>>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>>>
>>>>> That depends on your definition of "better". There are certainly
>>>>> build
>>>>> utilities that work by checksums rather than modification times - they
>>>>> will not be fooled by time sync problems. But they have to read
>>>>> all the
>>>>> files involved for every build - if you have a lot of files, and
>>>>> only a
>>>>> few change, that can mean a lot longer build time.
>>>>>
>>>>
>>>> Reading a file and computing the checksum is not that hard to do,
>>>> nor is
>>>> it very time consuming - much less than compiling a file, for instance.
>>>> And the utility has to check the file time also takes time. And the
>>>> more files you have, the more critical it is to ensure the source files
>>>> are in sync.
>>>
>>> In a Linux kernel tree:
>>>
>>> Checking dates -
>>> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
>>> 43499
>>>
>>> real 0m0.340s
>>>
>>> Checking checksums -
>>> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
>>> 43499
>>>
>>> real 1m56.131s
>>>
>>> Longer then rebuilding most of the tree... Considering most development
>>> builds are for one or two files, longer than the entire
>>> rebuild/reboot/reload cycle.
>>>
>>
>> That all depends on the algorithm being used, doesn't it? And MD5 is a
>> relatively slow algorithm - there are much faster ones (hint: you don't
>> need a 16 byte hash value!).
>>
>> In addition, you're loading and executing md5sum for every file, which
>> does not need to be done with an integrated checksum algorithm.
>
> It'll still take longer to check than make takes to run a partial rebuild.
>
Based on what criteria? You have proven nothing, Ian. Your "test" is crap.
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2016-03-31 17:31 +1300 |
| Message-ID | <dm3ngdF92p2U2@mid.individual.net> |
| In reply to | #85268 |
On 03/30/16 09:22, Jerry Stuckle wrote:
> On 3/29/2016 4:04 PM, Ian Collins wrote:
>> On 03/30/16 08:57, Jerry Stuckle wrote:
>>> In addition, you're loading and executing md5sum for every file, which
>>> does not need to be done with an integrated checksum algorithm.
>>
>> It'll still take longer to check than make takes to run a partial rebuild.
>
> Based on what criteria? You have proven nothing, Ian. Your "test" is crap.
That on the container I use to build ARM kernels, a clean build takes
just over 2 minutes and a small incremental (which stats all the files)
takes around 20 seconds.
Claiming that having to checksum every file rather than check the
modification time isn't an unnecessary overhead is ludicrous.
--
Ian Collins
| From | gazelle@shell.xmission.com (Kenny McCormack) |
|---|---|
| Date | 2016-03-31 09:55 +0000 |
| Message-ID | <ndis65$308$2@news.xmission.com> |
| In reply to | #85451 |
In article <dm3ngdF92p2U2@mid.individual.net>,
Ian Collins <ian-news@hotmail.com> wrote:
>On 03/30/16 09:22, Jerry Stuckle babbled, but said nothing of any importance (as usual):
>> Based on what criteria? You have proven nothing, Ian. Your "test" is crap.
...
Ian replied:
>That on the container I use to build ARM kernels, a clean build takes
>just over 2 minutes and a small incremental (which stats all the files)
>takes around 20 seconds.
>
>Claiming that having to checksum every file rather that check the
>modification time isn't an unnecessary overhead is ludicrous.
"ludicrous" is Jerry's stock-in-trade.
--
If Jeb is Charlie Brown kicking a football-pulled-away, Mitt is a '50s
housewife with a black eye who insists to her friends the roast wasn't dry.
| From | Jerry Stuckle <jstucklex@attglobal.net> |
|---|---|
| Date | 2016-03-31 10:01 -0400 |
| Message-ID | <ndjadr$n36$1@jstuckle.eternal-september.org> |
| In reply to | #85451 |
On 3/31/2016 12:31 AM, Ian Collins wrote:
> On 03/30/16 09:22, Jerry Stuckle wrote:
>> On 3/29/2016 4:04 PM, Ian Collins wrote:
>>> On 03/30/16 08:57, Jerry Stuckle wrote:
>
>>>> In addition, you're loading and executing md5sum for every file, which
>>>> does not need to be done with an integrated checksum algorithm.
>>>
>>> It'll still take longer to check than make takes to run a partial
>>> rebuild.
>>
>> Based on what criteria? You have proven nothing, Ian. Your "test" is
>> crap.
>
> That on the container I use to build ARM kernels, a clean build takes
> just over 2 minutes and a small incremental (which stats all the files)
> takes around 20 seconds.
>
> Claiming that having to checksum every file rather that check the
> modification time isn't an unnecessary overhead is ludicrous.
>
3.5 seconds to checksum *every* source file in *all* architectures? And
when you limit the test to the applicable files, the result will be under
a second. I don't consider that "unnecessary overhead".
Checking the date is "quick and dirty". It is not accurate, as I have
already shown.
And the Linux kernel may be the biggest project you've ever tried to
compile - but it's not even a drop in the ocean. I've seen compiles
which have tens of thousands of source files and take overnight on a
mainframe, for instance. These projects may have hundreds of
programmers working on them, and something like an older file in the
build can easily sneak in and cause problems. Sure, they use version
management - but that's not perfect, either.
It's why commercial utilities don't just use dates.
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
| From | Jerry Stuckle <jstucklex@attglobal.net> |
|---|---|
| Date | 2016-03-29 16:31 -0400 |
| Message-ID | <ndeog1$6ah$1@jstuckle.eternal-september.org> |
| In reply to | #85267 |
On 3/29/2016 4:04 PM, Ian Collins wrote:
> On 03/30/16 08:57, Jerry Stuckle wrote:
>> On 3/29/2016 3:24 PM, Ian Collins wrote:
>>> On 03/30/16 02:02, Jerry Stuckle wrote:
>>>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>>>
>>>>> That depends on your definition of "better". There are certainly
>>>>> build
>>>>> utilities that work by checksums rather than modification times - they
>>>>> will not be fooled by time sync problems. But they have to read
>>>>> all the
>>>>> files involved for every build - if you have a lot of files, and
>>>>> only a
>>>>> few change, that can mean a lot longer build time.
>>>>>
>>>>
>>>> Reading a file and computing the checksum is not that hard to do,
>>>> nor is
>>>> it very time consuming - much less than compiling a file, for instance.
>>>> And the utility has to check the file time also takes time. And the
>>>> more files you have, the more critical it is to ensure the source files
>>>> are in sync.
>>>
>>> In a Linux kernel tree:
>>>
>>> Checking dates -
>>> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
>>> 43499
>>>
>>> real 0m0.340s
>>>
>>> Checking checksums -
>>> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
>>> 43499
>>>
>>> real 1m56.131s
>>>
>>> Longer then rebuilding most of the tree... Considering most development
>>> builds are for one or two files, longer than the entire
>>> rebuild/reboot/reload cycle.
>>>
>>
>> That all depends on the algorithm being used, doesn't it? And MD5 is a
>> relatively slow algorithm - there are much faster ones (hint: you don't
>> need a 16 byte hash value!).
>>
>> In addition, you're loading and executing md5sum for every file, which
>> does not need to be done with an integrated checksum algorithm.
>
> It'll still take longer to check than make takes to run a partial rebuild.
>
I should also add - your tree contains files for multiple architectures,
only one of which would be used during a build.
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
| From | Jerry Stuckle <jstucklex@attglobal.net> |
|---|---|
| Date | 2016-03-30 17:27 -0400 |
| Message-ID | <ndhg5t$fse$1@jstuckle.eternal-september.org> |
| In reply to | #85267 |
On 3/29/2016 4:04 PM, Ian Collins wrote:
> On 03/30/16 08:57, Jerry Stuckle wrote:
>> On 3/29/2016 3:24 PM, Ian Collins wrote:
>>> On 03/30/16 02:02, Jerry Stuckle wrote:
>>>> On 3/29/2016 4:18 AM, David Brown wrote:
>>>>>
>>>>> That depends on your definition of "better". There are certainly
>>>>> build
>>>>> utilities that work by checksums rather than modification times - they
>>>>> will not be fooled by time sync problems. But they have to read
>>>>> all the
>>>>> files involved for every build - if you have a lot of files, and
>>>>> only a
>>>>> few change, that can mean a lot longer build time.
>>>>>
>>>>
>>>> Reading a file and computing the checksum is not that hard to do,
>>>> nor is
>>>> it very time consuming - much less than compiling a file, for instance.
>>>> And the utility has to check the file time also takes time. And the
>>>> more files you have, the more critical it is to ensure the source files
>>>> are in sync.
>>>
>>> In a Linux kernel tree:
>>>
>>> Checking dates -
>>> time find . \( -name "*.[coh]" -o -ctime +100 \) | wc -l
>>> 43499
>>>
>>> real 0m0.340s
>>>
>>> Checking checksums -
>>> time find . -name "*.[cho]" -exec md5sum {} \; | wc -l
>>> 43499
>>>
>>> real 1m56.131s
>>>
>>> Longer then rebuilding most of the tree... Considering most development
>>> builds are for one or two files, longer than the entire
>>> rebuild/reboot/reload cycle.
>>>
>>
>> That all depends on the algorithm being used, doesn't it? And MD5 is a
>> relatively slow algorithm - there are much faster ones (hint: you don't
>> need a 16 byte hash value!).
>>
>> In addition, you're loading and executing md5sum for every file, which
>> does not need to be done with an integrated checksum algorithm.
>
> It'll still take longer to check than make takes to run a partial rebuild.
>
Here are the results of a quick program I wrote to compute a CRC32 on all
.h, .c and .o files in a directory structure. The test is checking
every source file for all architectures in the Linux source.
>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
38933 files computed, total byte count 534546283
real 1m52.269s
user 0m0.472s
sys 0m8.836s
However, this was in a 2GB virtual machine (Oracle VM VirtualBox)
running an Ubuntu guest under Windows 7 on a notebook with a relatively
slow hard disk. The "disk" is actually an 8GB file on Windows, which
makes disk access even slower. But it's a good test environment when
you aren't overly worried about performance.
When I rerun the test, all of the files will be in buffers and the
results are much different:
>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
38933 files computed, total byte count 534546283
real 0m3.485s
user 0m2.380s
sys 0m1.020s
About 3.5 seconds total to checksum 39K files and 500+ MB.
I would say that's not very time consuming. I would expect that running
on a desktop with a reasonably fast disk would show pretty good performance.
And these tests are much more accurate than yours.
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
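The crctest program itself was not posted, so the following is only a sketch of what an in-process CRC32 over a file might look like in C. It uses the common reflected polynomial 0xEDB88320 with a bit-at-a-time loop; the names crc32_update and crc32_file are hypothetical, and a table-driven version would be faster:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Feeds `len` bytes into a running CRC-32 (reflected polynomial
   0xEDB88320). Start with crc = 0 and pass the previous result back in
   to continue over further chunks. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Computes the CRC-32 of a whole file, reading it in chunks.
   Returns 0 and stores the checksum in *out on success, -1 on error. */
int crc32_file(const char *path, uint32_t *out)
{
    unsigned char buf[16384];
    uint32_t crc = 0;
    size_t n;
    int failed;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        crc = crc32_update(crc, buf, n);
    failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;
    *out = crc;
    return 0;
}
```

Unlike a stat() call, crc32_file has to read every byte of the file, so its cost depends heavily on whether the data is already cached, which is the difference between the two timings above.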
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2016-03-31 16:19 +1300 |
| Message-ID | <dm3jalF92p2U1@mid.individual.net> |
| In reply to | #85408 |
On 03/31/16 10:27, Jerry Stuckle wrote:
>
> Here's the results of a quick program I wrote to compute a CRC32 on all
> .h, .c and .o files in a directory structure. The test is checking
> every source file for all architectures in the Linux source.
>
>>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
> 38933 files computed, total byte count 534546283
>
> real 1m52.269s
> user 0m0.472s
> sys 0m8.836s
>
> However, this was in a 2GB virtual machine (Oracle VM VirtualBox)
> running an Ubuntu guest under Windows 7 on a notebook with a relatively
> slow hard disk. The "disk" is actually an 8GB file on Windows, which
> makes disk access even slower. But it's a good test environment when
> you aren't overly worried about performance.
>
> When I rerun the test, all of the files will be in buffers and the
> results are much different:
>
>>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
> 38933 files computed, total byte count 534546283
>
> real 0m3.485s
> user 0m2.380s
> sys 0m1.020s
>
> About 3.5 seconds total to checksum 39K files and 500+Mb.
>
> I would say that's not very time consuming. I would expect that running
> on a desktop with a reasonably fast disk would show pretty good performance.
150MB/sec for reading small files is *very* good performance. Imagine
how fast it would be if it only had to check modification times.
> And these tests are much more accurate than yours.
That I doubt.
--
Ian Collins
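As a quick check, the 150MB/sec figure follows from the byte count and wall-clock time of the second (cached) run quoted in this post, taking 1 MB as 10^6 bytes (an assumption; the post does not say which megabyte is meant):

```latex
\[
\frac{534\,546\,283\ \text{bytes}}{3.485\ \text{s}}
  \approx 1.53 \times 10^{8}\ \text{bytes/s}
  \approx 153\ \text{MB/s},
\qquad
\frac{534\,546\,283\ \text{bytes}}{38\,933\ \text{files}}
  \approx 13.7\ \text{kB per file}
\]
```

Because that run was served from buffers, the rate reflects cache and CRC throughput more than disk speed; the first run moved the same bytes in 1m52s.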
| From | Jerry Stuckle <jstucklex@attglobal.net> |
|---|---|
| Date | 2016-03-31 10:06 -0400 |
| Message-ID | <ndjamq$o6e$1@jstuckle.eternal-september.org> |
| In reply to | #85449 |
On 3/30/2016 11:19 PM, Ian Collins wrote:
> On 03/31/16 10:27, Jerry Stuckle wrote:
>>
>> Here's the results of a quick program I wrote to compute a CRC32 on all
>> .h, .c and .o files in a directory structure. The test is checking
>> every source file for all architectures in the Linux source.
>>
>>>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
>> 38933 files computed, total byte count 534546283
>>
>> real 1m52.269s
>> user 0m0.472s
>> sys 0m8.836s
>>
>> However, this was in a 2GB virtual machine (Oracle VM VirtualBox)
>> running an Ubuntu guest under Windows 7 on a notebook with a relatively
>> slow hard disk. The "disk" is actually an 8GB file on Windows, which
>> makes disk access even slower. But it's a good test environment when
>> you aren't overly worried about performance.
>>
>> When I rerun the test, all of the files will be in buffers and the
>> results are much different:
>>
>>>> time ./crctest /usr/src/linux-source-4.2.0/linux-source-4.2.0
>> 38933 files computed, total byte count 534546283
>>
>> real 0m3.485s
>> user 0m2.380s
>> sys 0m1.020s
>>
>> About 3.5 seconds total to checksum 39K files and 500+Mb.
>>
>> I would say that's not very time consuming. I would expect that running
>> on a desktop with a reasonably fast disk would show pretty good
>> performance.
>
> 150MB/sec for reading small files is *very* good performance. Imagine
> how fast it would be if it only had to check modification times.
>
Not really, especially when you consider you have two file systems
(Linux and Windows) running on a slow disk. The whole thing on an SSD
running Linux natively would be on the order of a few seconds. Even on
a fast hard disk, I would expect < 20 seconds for the whole works.
And remember, you're checking *every* architecture. The necessary files
for a single architecture will be much smaller.
>> And these tests are much more accurate than yours.
>
> That I doubt.
>
Then you have once again shown you have no idea what you're talking
about. As I previously proved.
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
| From | Malcolm McLean <malcolm.mclean5@btinternet.com> |
|---|---|
| Date | 2016-03-29 05:21 -0700 |
| Message-ID | <5468d979-6a0e-4c8d-acb6-5ee3d4f8c2cd@googlegroups.com> |
| In reply to | #85123 |
On Sunday, March 27, 2016 at 5:45:21 PM UTC+1, Jerry Stuckle wrote:
> On 3/27/2016 11:14 AM, supercat@casperkitty.com wrote:
>
> Which means absolutely nothing. I can "touch" a file in Linux, for
> instance - which updates the last modification time but doesn't change
> the file at all. And even if the file is changed, there is no way to
> know if the change was pertinent to this execution or not.
>
> And restoring from a backup also restores the last modification
> date/time of the file.
>
> Your "tests" are not at all reliable.
>
That's always an issue with computing.
Which process is allowed to insert a layer or interception between
another and which is not?
The whole point of "touch" is to lie to other processes to pretend
that a file has been modified when it has not. However if you take
a sha1 hash of a file, you'll detect the difference between real
and fake modifications. So should we write a "tweak" which harmlessly
inserts a blank space at the end of text files? Or a "sha1 interceptor"
which reports a false hash?
Then of course someone will write a "compare ignoring whitespace runs",
so we can add things like <span></span> to override that. And it
goes on. There's no set way of determining what gets to override
what.
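The difference between a "touch" and a real edit can be shown with a small C sketch. It assumes POSIX stat() and utime(), and it uses a 64-bit FNV-1a hash purely as a lightweight stand-in for sha1; the helper names content_hash, set_mtime and get_mtime are invented for this illustration:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* FNV-1a (64-bit) over the file's bytes; a stand-in for sha1 here.
   Returns 0 on success, -1 if the file cannot be read. */
int content_hash(const char *path, uint64_t *out)
{
    uint64_t h = 14695981039346656037ULL;    /* FNV offset basis */
    int c, failed;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF) {
        h ^= (uint64_t)(unsigned char)c;
        h *= 1099511628211ULL;                /* FNV prime */
    }
    failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;
    *out = h;
    return 0;
}

/* Sets access and modification time to `when` without touching the
   content, roughly what touch does when given an explicit date. */
int set_mtime(const char *path, time_t when)
{
    struct utimbuf tb;

    tb.actime = when;
    tb.modtime = when;
    return utime(path, &tb);
}

/* Fetches the modification time; returns 0 on success, -1 on error. */
int get_mtime(const char *path, time_t *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *out = st.st_mtime;
    return 0;
}
```

Calling set_mtime changes what get_mtime reports while content_hash stays the same; appending a single space, the "tweak" mentioned above, changes the hash as well.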
| From | Jerry Stuckle <jstucklex@attglobal.net> |
|---|---|
| Date | 2016-03-29 09:05 -0400 |
| Message-ID | <nddubq$sbk$2@jstuckle.eternal-september.org> |
| In reply to | #85229 |
On 3/29/2016 8:21 AM, Malcolm McLean wrote:
> On Sunday, March 27, 2016 at 5:45:21 PM UTC+1, Jerry Stuckle wrote:
>> On 3/27/2016 11:14 AM, supercat@casperkitty.com wrote:
>>
>> Which means absolutely nothing. I can "touch" a file in Linux, for
>> instance - which updates the last modification time but doesn't change
>> the file at all. And even if the file is changed, there is no way to
>> know if the change was pertinent to this execution or not.
>>
>> And restoring from a backup also restores the last modification
>> date/time of the file.
>>
>> Your "tests" are not at all reliable.
>>
> That's always an issue with computing.
> Which process is allowed to insert a layer or interception between
> another and which is not?
> The whole point of "touch" is to lie to other processes to pretend
> that a file has been modified when it has not. However if you take
> a sha1 hash of a file, you'll detect the difference between real
> and fake modifications. So should we write a "tweak" which harmlessly
> inserts a blank space at the end of text files? Or a "sha1 interceptor"
> which reports a false hash?
> Then of course someone will write a "compare ignoring whitespace runs",
> so we can add things like <span></span> to override that. And it
> goes on. There's no set way of determining what gets to override
> what.
>
You can always get around a utility, if you try hard enough. That isn't
the issue here.
--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex@attglobal.net
==================
| From | Ken Brody <kenbrody@spamcop.net> |
|---|---|
| Date | 2016-03-25 14:33 -0400 |
| Message-ID | <nd404d$emo$1@dont-email.me> |
| In reply to | #84859 |
On 3/24/2016 4:15 PM, supercat@casperkitty.com wrote:
> On Thursday, March 24, 2016 at 2:57:28 PM UTC-5, Keith Thompson wrote:
>> supercat writes:
>>> Is there any requirement that the previous call to ftell() must have
>>> been performed within the current execution of the program? Also, by
>>> what means if any could a current-position-relative fseek() ever have
>>> meaningfully-defined behavior?
>>
>> Have you checked N1570 7.21.9.2?
>
> I should have omitted the follow-on question, but I think the first is
> valid and the cited section doesn't mention it. The possibility that
> the call may have been made on a previous execution of the program isn't
> merely a theoretical exercise. Some implementations of "fortune" have an
> index file that contains the offset to the start of each entry in the
> main text file; is such an approach guaranteed to work?
Well, "some implementations" may depend on POSIX behavior, which might
define things which are left undefined in ISO C.
Quoting 7.19.9.2p4 (n1124):
> For a text stream, either offset shall be zero, or offset shall be a
> value returned by an earlier successful call to the ftell function on a
> stream associated with the same file and whence shall be SEEK_SET.
So, I suppose, one could interpret that "a stream associated with the same
file" could apply across executions. (IANALL -- I am not a language
lawyer.)
On the other hand, for a text file, what happens if the file is modified
between the ftell() and subsequent fseek()?
For example: fwrite some text, ftell the current position, fwrite more
text, ftell the new position, fwrite some more. Now, fseek to the first
position, and fwrite more text than before, such that the second ftell's
position is no longer on a record boundary, and then fseek to the second
position. Is that guaranteed to work? (I'm thinking VMS which, if I
remember correctly, has text files which are actually records of
variable-length data.)
--
Kenneth Brody
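For reference, the uncontroversial case the quoted clause covers looks like the C sketch below: the offset comes from ftell() on the same text stream, in the same execution, and is passed back unchanged with SEEK_SET. The helper names write_marked and read_line_at are hypothetical. The sketch does not settle the cross-execution question, nor the case where the file is rewritten between the two calls:

```c
#include <stdio.h>

/* Writes `line` to a text stream and returns the position at which it
   starts, as reported by ftell(), or -1L on failure. */
long write_marked(FILE *fp, const char *line)
{
    long pos = ftell(fp);

    if (pos == -1L || fputs(line, fp) == EOF)
        return -1L;
    return pos;
}

/* Seeks back to `pos`, which must be a value returned by an earlier
   successful ftell() on a stream for the same file, and reads one line
   from there. Returns 0 on success, -1 on failure. */
int read_line_at(FILE *fp, long pos, char *buf, int size)
{
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;
    return fgets(buf, size, fp) != NULL ? 0 : -1;
}
```

On a stream opened with "w+", several write_marked calls followed by read_line_at on one of the returned positions reads back the line written there. The fseek() inside read_line_at also supplies the positioning call that is required between writing and reading on an update stream.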
[toc] | [prev] | [next] | [standalone]
| From | Keith Thompson <kst-u@mib.org> |
|---|---|
| Date | 2016-03-25 11:38 -0700 |
| Message-ID | <ln8u164d3y.fsf@kst-u.example.com> |
| In reply to | #84957 |
Ken Brody <kenbrody@spamcop.net> writes:
[...]
> Quoting 7.19.9.2p4 (n1124):
[...]
N1124 is rather old. It's a draft of the C99 standard with the first
two Technical Corrigenda merged into it. N1256 is a better C99 draft;
it includes all three TCs. And N1570 is the newest publicly available
draft of C11. (The clause you're quoting is, as far as I know, the same
in all three.)
--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
[toc] | [prev] | [next] | [standalone]
| From | supercat@casperkitty.com |
|---|---|
| Date | 2016-03-25 14:05 -0700 |
| Message-ID | <44b32e63-21f6-4d20-9188-0c0c44bc4c1d@googlegroups.com> |
| In reply to | #84957 |
On Friday, March 25, 2016 at 1:33:59 PM UTC-5, Ken Brody wrote:
> On 3/24/2016 4:15 PM, supercat wrote:
> > I should have omitted the follow-on question, but I think the first is
> > valid and the cited section doesn't mention it. The possibility that
> > the call may have been made on a previous execution of the program isn't
> > merely a theoretical exercise. Some implementations of "fortune" have an
> > index file that contains the offset to the start of each entry in the
> > main text file; is such an approach guaranteed to work?
>
> Well, "some implementations" may depend on POSIX behavior, which might
> define things which are left undefined in ISO C.

The question is whether this would be considered one of those things.

> On the other hand, for a text file, what happens if the file is modified
> between the ftell() and subsequent fseek()?
>
> For example: fwrite some text, ftell the current position, fwrite more
> text, ftell the new position, fwrite some more. Now, fseek to the first
> position, and fwrite more text than before, such that the second ftell's
> position is no longer on a record boundary, and then fseek to the second
> position. Is that guaranteed to work? (I'm thinking VMS which, if I
> remember correctly, has text files which are actually records of
> variable-length data.)

The "fortune" program checked the modification dates of the text file and
the index file, and would regenerate the index if the former was newer;
since the text file seldom changed, having to read the whole thing only
when it changed was a useful optimization. Writing to a text file in such
a fashion as to change existing line boundaries is apt to cause problems
which may vary from one system to another, but the "fortune" program didn't
do that.
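A minimal sketch of the index approach described above, assuming a single execution of the program: it records the ftell() value at the start of each line and later hands one of those values back to fseek() with SEEK_SET. The names build_line_index and read_line_at are hypothetical and are not taken from any real "fortune" implementation. Whether offsets saved to an index file remain valid in a later execution is the open question in this thread, and the sketch does not settle it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: store the ftell() value found at the start of
   each line. Returns the number of offsets stored (at most max), or
   -1 if ftell() fails. */
int build_line_index(FILE *fp, long *offsets, int max)
{
    char buf[256];
    int n = 0;

    assert(fp != NULL && offsets != NULL);
    rewind(fp);
    while (n < max) {
        long pos = ftell(fp);          /* opaque token on a text stream */
        if (pos == -1L)
            return -1;
        if (fgets(buf, sizeof buf, fp) == NULL)
            break;                     /* end of file (or read error) */
        offsets[n++] = pos;
        /* Skip the remainder of a line longer than the buffer. */
        while (strchr(buf, '\n') == NULL &&
               fgets(buf, sizeof buf, fp) != NULL)
            ;
    }
    return n;
}

/* Hypothetical helper: seek to a previously recorded offset and read
   one line into buf. Returns 0 on success, -1 on failure. */
int read_line_at(FILE *fp, long offset, char *buf, int size)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fgets(buf, size, fp) != NULL ? 0 : -1;
}
```

The offsets are only ever passed back unchanged with SEEK_SET, which is the one use of ftell() values that the quoted paragraph permits for text streams.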
[toc] | [prev] | [next] | [standalone]
| From | Nick Bowler <nbowler@draconx.ca> |
|---|---|
| Date | 2016-03-28 16:14 +0000 |
| Message-ID | <ndbl8o$9qc$1@dont-email.me> |
| In reply to | #84957 |
On Fri, 25 Mar 2016 14:33:47 -0400, Ken Brody wrote:
> For example: fwrite some text, ftell the current position, fwrite more
> text, ftell the new position, fwrite some more. Now, fseek to the first
> position, and fwrite more text than before, such that the second ftell's
> position is no longer on a record boundary, and then fseek to the second
> position. Is that guaranteed to work? (I'm thinking VMS which, if I
> remember correctly, has text files which are actually records of
> variable-length data.)
I believe the standard answers this question, although perhaps
the answer is not very satisfactory because it basically ends
with "the resulting file position when seeking on text streams
is unspecified". Let's ignore issues of large files which
complicate real-world implementations of fseek and ftell.
The standard says this:
- Every stream which supports seeking has an associated "file
  position indicator" (n1570 7.21.3p1).
- For text streams, the file position indicator contains
  unspecified information (n1570 7.21.9.4p2).
- The ftell function simply returns the current value of the
  file position indicator, if successful (n1570 7.21.9.4p2).
- A call of fseek(stream, value, whence) does the following things,
  if successful (n1570 7.21.9.2p5):
  - it calculates a new value for the file position indicator,
  - it undoes the effect of all ungetc calls on the stream,
  - it clears the end-of-file indicator for the stream,
  - it sets the file position indicator to the calculated value,
  - for read/write streams, it puts the stream back into a state
    where the next operation can be either an input or an output
    operation.
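Two of the listed effects can be observed directly. The following is only a sketch: observe_fseek_effects is a hypothetical name, and it assumes the stream is open for reading on a non-empty file. It checks that a successful fseek() clears the end-of-file indicator and discards a character pushed back with ungetc().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper. Returns 1 if both effects were observed, 0 if
   either was not, and -1 on an I/O failure. */
int observe_fseek_effects(FILE *fp)
{
    int first, pushed, ch;

    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    first = getc(fp);                 /* remember the real first character */
    if (first == EOF)
        return -1;

    /* Read to the end so the end-of-file indicator becomes set. */
    while ((ch = getc(fp)) != EOF)
        ;
    if (!feof(fp))
        return -1;
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    if (feof(fp))
        return 0;                     /* indicator should now be clear */

    /* Read one character, push back a different one, then seek again. */
    pushed = (first == '#') ? '!' : '#';
    if (getc(fp) == EOF || ungetc(pushed, fp) == EOF)
        return -1;
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    ch = getc(fp);                    /* pushed-back character is gone */
    return ch == first;
}
```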
There is a restriction for text streams that fseek calls have whence of
SEEK_SET and value either be 0 or come from an earlier call to ftell on
the same file (n1570 7.21.9.2p4). Calls without that form therefore are
explicitly undefined. But there is no description of the calculation of
the new value of the file position indicator (such calculation is only
defined for binary streams); I think the intent was that the supplied
value be taken literally as the new file position indicator (this is what
implementations actually do, same as for binary streams with SEEK_SET),
but the lack of specification here doesn't actually matter.
Regardless of whether or not the file is modified between the ftell and
fseek, I would say the implementation still has to do all those things
(assuming fseek is successful), including calculating a new value for
the file position indicator and setting the file position indicator to
that value. If we immediately follow such an fseek with a call to ftell,
it must return (assuming it is successful) the new value of the file
position indicator.
But since the actual meaning of the indicator is unspecified (for
text streams), it might not mean the same thing as when we
originally called ftell. So the part of the file accessed by a
subsequent read or write operation may not be the same. This seems
likely if the file is modified, but conceivably it could change for
any other reason at the implementation's whim.
Binary streams have no such issue because the file position indicator
has a specified meaning: "the number of characters from the beginning
of the file" (n1570 7.21.9.4p2).
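For the binary-stream case, a short sketch of what that specified meaning buys you, assuming a stream opened in a binary update mode such as "wb+". Both function names are hypothetical. After writing n bytes from the start, ftell() reports n, and a seek relative to the current position gives a predictable result.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write n bytes at the start of a binary stream and
   return the position reported by ftell() afterwards, or -1L on failure. */
long position_after_write(FILE *fp, const void *data, size_t n)
{
    assert(fp != NULL && data != NULL);
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1L;
    if (fwrite(data, 1, n, fp) != n)
        return -1L;
    return ftell(fp);                 /* byte count from start of file */
}

/* Hypothetical helper: move relative to the current position and return
   the resulting position, or -1L on failure. */
long position_after_relative_seek(FILE *fp, long delta)
{
    if (fseek(fp, delta, SEEK_CUR) != 0)
        return -1L;
    return ftell(fp);
}
```

None of this arithmetic is available for text streams, where the value returned by ftell() may only be passed back unchanged.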
[toc] | [prev] | [next] | [standalone]
| From | Tim Rentsch <txr@alumni.caltech.edu> |
|---|---|
| Date | 2016-03-30 09:31 -0700 |
| Message-ID | <kfn4mbo9bc1.fsf@x-alumni2.alumni.caltech.edu> |
| In reply to | #85191 |
Nick Bowler <nbowler@draconx.ca> writes:
[...]
> There is a restriction for text streams that fseek calls have whence of
> SEEK_SET and value either be 0 or come from an earlier call to ftell on
> the same file (n1570 7.21.9.2p4). [...]
Hmmm. My reading is a little different. That paragraph says:
For a text stream, either offset shall be zero, or offset
shall be a value returned by an earlier successful call to
the ftell function on a stream associated with the same file
and whence shall be SEEK_SET.
I take this to mean offset must be zero (and whence could be any
of the three SEEK_ values), OR whence must be SEEK_SET and offset
must be a value from an earlier (successful) ftell call. In
particular it looks like
fseek( fp, 0, SEEK_CUR );
fseek( fp, 0, SEEK_END );
fseek( fp, 0, SEEK_SET );
are all meant to be allowed. Does this seem right to you
or is there something you think I've missed?
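A small sketch of that reading, with the hypothetical name count_zero_offset_seeks: it issues the three zero-offset calls and reports how many succeeded. A successful return on one implementation does not show that the standard requires it, so this is a probe and nothing more.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: try fseek() with offset zero for each of the three
   whence values and return how many calls reported success (0 to 3). */
int count_zero_offset_seeks(FILE *fp)
{
    static const int whence[] = { SEEK_CUR, SEEK_END, SEEK_SET };
    int ok = 0;

    assert(fp != NULL);
    for (size_t i = 0; i < sizeof whence / sizeof whence[0]; i++) {
        if (fseek(fp, 0L, whence[i]) == 0)
            ok++;
    }
    return ok;
}
```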
[toc] | [prev] | [next] | [standalone]
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2016-03-25 15:06 +1300 |
| Message-ID | <dljkpkFih2gU9@mid.individual.net> |
| In reply to | #84852 |
On 03/25/16 08:22, Stephen Sprunk wrote:
> On 24-Mar-16 13:07, Robert Wessel wrote:
>> fir <profesor.fir@gmail.com> wrote:
>>> I checked and fseek( file, 0x200, SEEK_SET ); works
>>>
>>> it is good as it is much simpler than counting the bytes saved
>>
>> I don't believe there's any defined behavior according to the
>> standard for seeking past the end of a file and then writing. On
>> some platforms it will pad the file with zeros to the point of the
>> write. On some system that also triggers the creation of a sparse
>> file.
>>
>> But it's not something you can generally rely on.
>
> It's actually worse than that; fseek()'s behavior is only defined when
> passed the result of a previous call to ftell().

This condition only applies to a text stream and it is omitted from the
Unix specification where line ending conversions aren't an issue.

--
Ian Collins
[toc] | [prev] | [next] | [standalone]
| From | Ian Collins <ian-news@hotmail.com> |
|---|---|
| Date | 2016-03-25 15:00 +1300 |
| Message-ID | <dljkdkFih2gU8@mid.individual.net> |
| In reply to | #84840 |
On 03/25/16 07:07, Robert Wessel wrote:
>
> I don't believe there's any defined behavior according to the standard
> for seeking past the end of a file and then writing. On some
> platforms it will pad the file with zeros to the point of the write.
> On some system that also triggers the creation of a sparse file.
>
> But it's not something you can generally rely on.

The Unix specification extends the description of fseek() to include
seeking and writing beyond the end of the file.

--
Ian Collins
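A sketch of the seek-then-write pattern from the original question, assuming a POSIX system, where the gap left by seeking past the end reads back as zero bytes. The name write_beyond_end is hypothetical. On a platform that offers only ISO C guarantees, the outcome should not be relied on.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: seek to offset (which may lie past the current end
   of the file), write n bytes there, and return the resulting file length
   in bytes, or -1L on failure. Assumes POSIX semantics. */
long write_beyond_end(FILE *fp, long offset, const void *data, size_t n)
{
    assert(fp != NULL && data != NULL);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1L;
    if (fwrite(data, 1, n, fp) != n || fflush(fp) != 0)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    return ftell(fp);                 /* length of the file after the write */
}
```

With an empty file, an offset of 0x200 and four bytes of data, a POSIX system should report a length of 0x204, with the first 0x200 bytes reading as zeros.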
[toc] | [prev] | [next] | [standalone]