Re: variable-length strings

From Uno <Uno@example.invalid>
Newsgroups comp.unix.shell
Subject Re: variable-length strings
Date 2011-06-07 23:00 -0600
Message-ID <958dvqF57fU1@mid.individual.net>
References <94mfhcFngqU1@mid.individual.net> <slrniuf4u1.dhq.hjp-usenet2@hrunkner.hjp.at> <SJmdnS35VN9hm3TQnZ2dnUVZ_uadnZ2d@posted.nuvoxcommunications> <94u05hF6b1U1@mid.individual.net> <ise3um$1gr$1@speranza.aioe.org>


On 06/04/2011 02:18 PM, Alan Curry wrote:
> In article <94u05hF6b1U1@mid.individual.net>, Uno <Uno@example.invalid> wrote:
>> On 06/03/2011 10:08 AM, Robert Bonomi wrote:
>>
>>> [snip] see the -l option.
>>
>> $ cmp -l -n 50  before.avi after2.avi
>>   5 204 150
>>   6 251 364
>>   7 112  66
>>   8   0   2
>
> In an AVI file, the first 4 bytes are "RIFF" and the next 4 bytes are the
> size of the payload, which should be the rest of the file. It's a
> little-endian format. Combining the bytes:
>
> dc<<<'8i 204 251 112 0 Ai 256*+256*+256*+p'
> 4893060
> dc<<<'8i 150 364 66 2 Ai 256*+256*+256*+p'
> 37155944
>
> the first file has a declared payload size of 4893060, so it should have been
> a file of about 4893068 bytes. Sometimes they have a junk segment on the end
> that isn't counted in the RIFF header but those are small. The other file
> should be about 37155952 bytes long, more than 7 times the size of the first
> file.

$ cmp -l -n 50  before.avi after2.avi
  5 204 150
  6 251 364
  7 112  66
  8   0   2
17 314 166
18  22  42
33  65 126
37   0 351
38 300  73
39  22  10
46   0  11
47   1   0
49 170 301
50   0  15
$ ls -l

-rw-r--r--  1 dan dan  37155580 2011-05-30 16:54 after2.avi
-rw-r--r--  1 dan dan   4892982 2011-06-01 17:13 before.avi

Thanks for your generous response, Alan, and I don't want you to be mad 
because I have to reveal that your calculations were correct, and I was 
basically comparing two files that never had anything to do with each 
other except that they ended in .avi and together lay in a pool of my 
failures.
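
A rough sketch of that check, for anyone who wants to repeat it without doing the octal by hand. It assumes bash and a POSIX od; riff_size and riff_check are names made up here, not standard tools. The 8 added to the declared size is the "RIFF" tag plus the 4-byte size field, as Alan describes.

```shell
# riff_size FILE: declared RIFF payload size, read from bytes 5-8
# (od offset 4, length 4), little-endian.
riff_size() {
    # od prints the four bytes as unsigned decimals; set -- splits them
    # into $1..$4, least significant byte first.
    set -- $(od -An -tu1 -j4 -N4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# riff_check FILE: size implied by the header next to the real size.
riff_check() {
    declared=$(riff_size "$1")
    actual=$(wc -c < "$1")
    echo "$1: header says $((declared + 8)) bytes, file has $((actual)) bytes"
}
```

Run as riff_check before.avi, this should report 4893068 against the 4892982 that ls -l shows, if the header bytes above are what I think they are.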

Downthread I have a better isolation of this problem.  It's one where 
I'm going back and forth to Windows, and I hope you'll understand that 
this is precisely the type of f* up that results from this OS schizophrenia.
>
> The differences themselves don't look like any kind of simple corruption. The
> insertion or deletion of extra '\r' (13) bytes as in FTP ASCII mode wouldn't
> have done this. It's not just a couple of bits flipped either.
>
> After the RIFF size, the next 8 bytes would normally be "AVI ", signifying
> the type of the root node, and "LIST" indicating that the next node is a
> list. (In the RIFF format, there are LIST nodes which are like directories
> and non-LIST nodes are like files...)
>
> And the cmp shows no differences there. As expected, all AVI files have the
> same stuff there.
>
>> 17 314 166
>> 18  22  42
>
> After "LIST" will be the size of the list node. I expect this to be the
> "hdrl" (header list) node, which contains the movie metadata (frame rate,
> codecs, etc.) This is another 4-byte little endian number. I think we can
> assume that bytes 19 and 20 (the upper half of the hdrl size) are 0, since
> the metadata easily fits in less than 64K.
>
> The first file has hdrl size=4812, the second has hdrl size=8822.
>
> dc<<<'8i 314 22 0 0 Ai 256*+256*+256*+p'
> 4812
> dc<<<'8i 166 42 0 0 Ai 256*+256*+256*+p'
> 8822

Holy crap, I can do it, too:

$ dc<<<'8i 314 22 0 0 Ai 256*+256*+256*+p'
4812
$ dc<<<'8i 166 42 0 0 Ai 256*+256*+256*+p'
8822
$ man dc
$
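
For those who, like me, find dc's stack notation opaque, here is a minimal sketch of the same little-endian arithmetic in bash. The function name le32 is made up here, and it assumes the bytes are given in octal, in the order cmp -l prints them.

```shell
# le32 B0 B1 B2 B3: combine four bytes, given in octal as cmp -l prints
# them and least significant byte first, into one decimal number.
le32() {
    # 8#N makes bash arithmetic read N as base 8.
    echo $(( 8#$1 + (8#$2 << 8) + (8#$3 << 16) + (8#$4 << 24) ))
}

le32 204 251 112 0   # RIFF size field of before.avi
le32 150 364 66 2    # RIFF size field of after2.avi
```

The two calls should print the same 4893060 and 37155944 that Alan got from dc.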

>
> The next 4 bytes (21 through 24 in the cmp output) would be "hdrl". And they
> match. After that would come the name of the hdrl node's first child. That's
> probably going to be "avih", explaining why bytes 25 through 28 match. Bytes
> 29 through 32 would be the size of the avih node, which is always going to be
> 56 bytes, so they match too.
>
> Starting at byte 33 we'll get the main AVI header. Bytes 33-36 are the field
> dwMicroSecPerFrame. This time it's not reasonable to guess that all the bytes
> not shown by cmp were 0.
>
>> 33  65 126
>
> These bytes (in hex: 0x35 and 0x56) could reasonably be the low parts of
> 0x8235 and 0x8256, representing frame rates of:
>
> dc<<<'1000000 16i8235 2k/p'
> 30.00
> dc<<<'1000000 16i8256 2k/p'
> 29.97
>
> 30fps and 29.97fps, both of which are commonly occurring frame rates.
>
>> 37   0 351
>> 38 300  73
>> 39  22  10
>
> Bytes 37-40 are the field dwMaxBytesPerSec. Assuming the high byte is 0, the
> 2 files have values of 1228800 and 539625. I don't know if this field is
> useful, since it seems to be common for creators to put a 0 here and for
> players to ignore it.
>
> Bytes 41-44 are the field dwPaddingGranularity. All 0, I guess.
>
>> 46   0  11
>> 47   1   0
>
> Bytes 45-48 are dwFlags. The first file has AVIF_WASCAPTUREFILE (0x10000) and
> the second file has AVIF_ISINTERLEAVED|AVIF_TRUSTCKTYPE (0x900). These look
> reasonable (although I can't figure out what AVIF_WASCAPTUREFILE means even
> after reading Microsoft's explanation of it).
>
> Bytes 49-52 are dwTotalFrames. We only have the low 2 bytes here:
>
>> 49 170 301
>> 50   0  15
>> $
>
> But it's plausible that the upper bytes were 0 anyway. In that case, the
> first file has 120 frames and the second file has 3521 frames. In the first
> case, 120 frames matches up very well with a size of 4893060 bytes, a rate of
> 1228800 bytes per second, and 30 frames per second. 4893060/1228800*30=119.45
>
> The second one doesn't match up so easily. It contains audio (you can tell
> that from the AVIF_ISINTERLEAVED flag) which counts toward dwMaxBytesPerSec
> but not dwTotalFrames. The bytes per second of the video stream should be
> 37155944 bytes * 29.97 frames/sec / 3521 frames = 316263 bytes/sec. Its
> declared rate is 539625 bytes/sec. If the audio takes up the missing space,
> it would be about 41% of the file. That could be correct if the audio is
> uncompressed.
>
> Or maybe dwMaxBytesPerSec just wasn't useful for this purpose. It's not
> called "average bytes per sec" after all.
>
>>
>> It looks like it gets back on track for a bit.  Can you speculate what
>> happened to this download?
>
> It gets back on track because the AVI header has a lot of unused space after
> the interesting fields, so there's a big spread of 0's to match up. They
> probably have some more mismatches around byte 4840, where the first file's
> hdrl chunk ends.
>
> Since they both look like valid AVI files, with many differing fields but
> none that look invalid, I speculate that they're 2 different movies, and even
> though one or both of them may contain errors, neither one is a corrupted
> copy of the other. The differences between them are just too non-random to be
> any kind of accidental corruption.

Of course, you're right.
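
A sketch of reading those header fields straight from the file instead of reconstructing them from cmp output. It assumes bash, a POSIX od, and that the file is laid out the way Alan describes (hdrl list first, avih as its first child), so that the fields sit at fixed offsets. u32le and avih_fields are hypothetical names, not existing tools.

```shell
# u32le FILE OFFSET: 4-byte little-endian unsigned value at 0-based OFFSET.
u32le() {
    set -- $(od -An -tu1 -j"$2" -N4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# avih_fields FILE: print the main AVI header fields discussed above.
avih_fields() {
    # cmp counts bytes from 1 and od counts offsets from 0,
    # so cmp's byte 33 is od's offset 32.
    usec=$(u32le "$1" 32)     # dwMicroSecPerFrame, bytes 33-36
    rate=$(u32le "$1" 36)     # dwMaxBytesPerSec,   bytes 37-40
    flags=$(u32le "$1" 44)    # dwFlags,            bytes 45-48
    frames=$(u32le "$1" 48)   # dwTotalFrames,      bytes 49-52
    printf 'usec/frame=%d maxbytes/sec=%d flags=0x%x frames=%d\n' \
        "$usec" "$rate" "$flags" "$frames"
}
```

Running avih_fields on each file would confirm or refute the guesses about the bytes cmp did not show, since it reads all four bytes of every field.
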
>
> If you want to compare AVIs, better options than cmp are:
>
> 1. play them! side by side if necessary.

Windows is always telling me that the files have been corrupted.
>
> 2. /usr/bin/file file1.avi file2.avi
>
> 3. (like /usr/bin/file but better)
> alias mi='mplayer -noconsolecontrols -identify -frames 0 -vo null -ao null'
> mi file1.avi 2>&1 | grep '^ID' > dump1
> mi file2.avi 2>&1 | grep '^ID' > dump2
> diff -u dump1 dump2
>
> 4. xdelta delta file1.avi file2.avi deltafile ; ls -l deltafile
> xdelta records a complete set of instructions for constructing file2 from
> file1. If the delta file is very small, that means there's not much
> difference between them, and one could be a corrupted copy of the other. If
> the delta file is nearly as big as file2, the files just didn't have much in
> common.
>

I'll consider these when I have opportunity.  Cheers,
-- 
Uno
