Re: filesystem content tracking

From Ivan Shmakov <oneingray@gmail.com>
Newsgroups comp.unix.misc, comp.software.testing, comp.sources.testers
Subject Re: filesystem content tracking
Date 2012-03-19 18:03 +0700
Organization Aioe.org NNTP Server
Message-ID <86fwd43lai.fsf@gray.siamics.net> (permalink)
References <86aa3j9quj.fsf@gray.siamics.net>

Cross-posted to 3 groups.


>>>>> Ivan Shmakov <oneingray@gmail.com> writes:

	[Cross-posting to news:comp.software.testing, too.]

[...]

 > Now, to the numbers.  For the test run, I've processed 47151 files
 > (41285 regular, 2.6 GiB in total), and it took only 6:21 to produce
 > the corresponding 251978 records (10 MiB in total.)

	More precisely, 251978 (or, rather, 251979) is the number of
	/object definition/ records only.

	The distribution of the record types is as follows:

 251979 objectDef
  48253     blob
  47192     filename
  47151     fileBind
  42883     solidStat
  42883     fileRecord
  23615     digest
      1     uuid
      1     digestKind
  47151 objectConfirm
   2559 filePadding
    381 timestamp
      1 sessionHeader
      1 sessionTrailer

	(Please also note that, for testing purposes, the block padding
	size was set to 4 KiB, which is a rather low value.  I guess
	that a reasonable default for the released version of the code
	would be around 128 KiB, in which case the number of filePadding
	records would be proportionally lower.)
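	A quick back-of-the-envelope check of the padding numbers.  It
	assumes (my guess, not stated above) that roughly one filePadding
	record is emitted per padding block of output.  Under that
	assumption, the 10 MiB output split into 4 KiB blocks gives 2560
	blocks, close to the 2559 filePadding records counted, and the
	same rule predicts about 80 at 128 KiB.  The helper name
	padding_blocks is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def padding_blocks(output_bytes: int, block_size: int) -> int:
    """Number of complete blocks of block_size in the output.

    Assumption (hypothetical): about one filePadding record per block.
    """
    return output_bytes // block_size


full_run = 10 * MIB  # "10 MiB in total", an approximate figure
print(padding_blocks(full_run, 4 * KIB))    # 2560, vs. 2559 filePadding observed
print(padding_blocks(full_run, 128 * KIB))  # 80 at the suggested default
# The incremental file (under 774 KiB) listed 193 filePadding records:
print(padding_blocks(774 * KIB, 4 * KIB))   # 193
```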

 > The time is comparable to raw sha1sum(1) (about 4:20), and the space
 > demands are roughly 222 bytes per file on average (compare to the
 > SQLite-based version's 360 above.)

 > I: file_bind=251978, file_rec=251977, digest=251976
 > 132.12user 13.32system 6:21.35elapsed 38%CPU (0avgtext+0avgdata 233008maxresident)k
 > 4096210inputs+8outputs (4major+14613minor)pagefaults 0swaps
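	Spelled out, the per-file figure is just the output size divided
	by the number of files processed.  This assumes "10 MiB" means
	10 * 2^20 bytes, since the size was only given approximately.

```python
MIB = 1024 * 1024

files_processed = 47151
output_bytes = 10 * MIB  # approximate total size of the records

per_file = output_bytes / files_processed
print(f"{per_file:.1f} bytes per file")  # 222.4 bytes per file
```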

 > Unfortunately, the use of the previously recorded sessions is not
 > implemented at this moment, so I don't have the space requirements
 > for the "incremental update" case at hand, but I expect them to be
 > several times lower than those for the SQLite-based version.
 > Combined with the ability to easily compress or move away the older
 > sessions, I hope this may finally get the tool to a usable and
 > useful state.

	And here they are:

I: loaded, id-start=0, id-next=251979
I: session
...
I: file_bind=251978, file_rec=251977, confirmed
128.16user 4.96system 2:37.34elapsed 84%CPU (0avgtext+0avgdata 2020096maxresident)k
35824inputs+0outputs (0major+127316minor)pagefaults 0swaps
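	GNU time(1) reports its %CPU figure as (user + system) / elapsed.
	Redoing that arithmetic for both runs reproduces the 38% and 84%
	shown.  My own reading (not the author's claim) is that the full
	run spent most of its wall time waiting on reads of the 2.6 GiB
	data set (4096210 inputs), while the incremental run (35824
	inputs) was mostly CPU-bound.  The helper names are hypothetical.

```python
def to_seconds(mmss: str) -> float:
    """Convert time(1)'s "M:SS.ss" elapsed field to seconds (M:SS form only)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


def cpu_percent(user: float, system: float, elapsed: float) -> int:
    """(user + system) / elapsed, truncated to a whole percentage."""
    return int(100 * (user + system) / elapsed)


full = cpu_percent(132.12, 13.32, to_seconds("6:21.35"))
incremental = cpu_percent(128.16, 4.96, to_seconds("2:37.34"))
print(full, incremental)  # 38 84
```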

	This time, only 207 digests were computed (presumably for the
	new and changed files in the set), and most of the time was
	seemingly spent reading the "previous" session.
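	The post doesn't say how the tool decides which files need a
	fresh digest.  One common approach, and purely an assumption
	here, is to reuse the previous session's digest whenever a cheap
	stat key (device, inode, size, mtime) is unchanged, and hash only
	the remaining files.  A minimal sketch using SHA-1 via hashlib;
	stat_key, sha1_file and update_digests are hypothetical names.

```python
import hashlib
import os
from typing import Dict, Iterable, Tuple

# Hypothetical change-detection key; the real tool's criteria may differ.
StatKey = Tuple[int, int, int, int]
Index = Dict[str, Tuple[StatKey, str]]


def stat_key(st: os.stat_result) -> StatKey:
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def sha1_file(path: str, chunk: int = 1 << 16) -> str:
    """SHA-1 of a file's contents, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def update_digests(paths: Iterable[str], previous: Index) -> Tuple[Index, int]:
    """Return the new index and how many digests had to be computed."""
    current: Index = {}
    computed = 0
    for path in paths:
        key = stat_key(os.stat(path))
        old = previous.get(path)
        if old is not None and old[0] == key:
            current[path] = old  # stat unchanged: reuse the old digest
        else:
            current[path] = (key, sha1_file(path))
            computed += 1
    return current, computed
```

	On a second pass over an unchanged tree, update_digests computes
	no digests at all.  Only files whose size or mtime changed are
	re-read, which matches the kind of saving described above.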

	The resulting file is less than 774 KiB and contains 61858
	records in total, 47393 of which are "objectConfirm" ones (their
	number should equal the number of files processed.)  The average
	space consumption is thus less than 17 bytes per file, which is
	a huge win compared to the 86 bytes per file per session of the
	previous, SQLite-based version.
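	The same arithmetic for the incremental file.  It takes 774 KiB
	as an upper bound on the file size and divides by the 47393
	objectConfirm records (one per file).  Because the size is an
	upper bound, the per-file figure is an upper bound too, and the
	improvement over the SQLite-based version's 86 bytes is a lower
	bound.

```python
KIB = 1024

upper_bound_bytes = 774 * KIB
files = 47393  # objectConfirm records, one per processed file

per_file = upper_bound_bytes / files
print(f"< {per_file:.1f} bytes per file")                # < 16.7 bytes per file
print(f"at least {86 / per_file:.1f}x smaller than 86")  # at least 5.1x smaller than 86
```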

	The distribution of the record types in this "incremental" file
	is as follows:

  47393 objectConfirm
  14218 objectDef
   6578     fileRecord
   6578     fileBind
    369     blob
    272     solidStat
    243     filename
    177     digest
      1     file
    193 filePadding
     51 timestamp
      1 objectMapDef
      1 sessionHeader
      1 sessionTrailer
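	As a consistency check on the two tables (the indented rows being
	the kinds of objectDef records), the indented counts should add
	up to the objectDef count in each table.  In the incremental
	table, the top-level counts should also add up to the 61858
	records total.

```python
full_objectdef = {
    "blob": 48253, "filename": 47192, "fileBind": 47151,
    "solidStat": 42883, "fileRecord": 42883, "digest": 23615,
    "uuid": 1, "digestKind": 1,
}
incr_objectdef = {
    "fileRecord": 6578, "fileBind": 6578, "blob": 369,
    "solidStat": 272, "filename": 243, "digest": 177, "file": 1,
}
incr_top_level = {
    "objectConfirm": 47393, "objectDef": sum(incr_objectdef.values()),
    "filePadding": 193, "timestamp": 51, "objectMapDef": 1,
    "sessionHeader": 1, "sessionTrailer": 1,
}

print(sum(full_objectdef.values()))  # 251979
print(sum(incr_objectdef.values()))  # 14218
print(sum(incr_top_level.values()))  # 61858
```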

	The objectDef records here should be mainly fileRecord and
	fileBind ones, due to the changes in the Unix access times
	(a-times) of the processed files.  Should a-time recording be
	disabled (or the filesystem be mounted with a-time updates
	disabled), I expect far fewer of these records.
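	To illustrate why a-times matter, here is a sketch built on my
	own assumption that a new fileRecord is emitted whenever any
	recorded stat field differs from the previous session.  Reading
	a file may bump its st_atime, so a record that includes the
	a-time changes, while one that leaves it out does not.  The
	fields chosen and the names file_record and needs_new_record are
	hypothetical.

```python
import os
from typing import Tuple


def file_record(path: str, record_atime: bool) -> Tuple[int, ...]:
    """Hypothetical fileRecord contents: a tuple of selected stat fields."""
    st = os.stat(path)
    fields = [st.st_mode, st.st_uid, st.st_gid, st.st_size, st.st_mtime_ns]
    if record_atime:
        fields.append(st.st_atime_ns)
    return tuple(fields)


def needs_new_record(path: str, previous: Tuple[int, ...],
                     record_atime: bool) -> bool:
    """True if the current stat differs from the previously recorded one."""
    return file_record(path, record_atime) != previous
```

	To simulate an a-time update deterministically (regardless of
	noatime or relatime mount options), one can call
	os.utime(path, ns=(new_atime_ns, old_mtime_ns)).  After that,
	needs_new_record() is true with record_atime=True and false with
	record_atime=False.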

 > (For anyone wishing to try this version, its sources could also be
 > found at [1, 2], under the fccs-2012-03-asn.1 branch.)

	... Though I have yet to upload the recent changes there.

[...]

 > [1] http://gray.am-1.org/~ivan/archives/git/gitweb.cgi?p=fc-2012.git
 > [2] http://gray.am-1.org/~ivan/archives/git/fc-2012.git/

-- 
FSF associate member #7257
