| From | Uno <Uno@example.invalid> |
|---|---|
| Newsgroups | comp.unix.shell |
| Subject | Re: variable-length strings |
| Date | 2011-06-07 23:00 -0600 |
| Message-ID | <958dvqF57fU1@mid.individual.net> |
| References | <94mfhcFngqU1@mid.individual.net> <slrniuf4u1.dhq.hjp-usenet2@hrunkner.hjp.at> <SJmdnS35VN9hm3TQnZ2dnUVZ_uadnZ2d@posted.nuvoxcommunications> <94u05hF6b1U1@mid.individual.net> <ise3um$1gr$1@speranza.aioe.org> |
On 06/04/2011 02:18 PM, Alan Curry wrote:
> In article<94u05hF6b1U1@mid.individual.net>, Uno<Uno@example.invalid> wrote:
>> On 06/03/2011 10:08 AM, Robert Bonomi wrote:
>>
>>> [snip] see the -l option.
>>
>> $ cmp -l -n 50 before.avi after2.avi
>>  5 204 150
>>  6 251 364
>>  7 112  66
>>  8   0   2
>
> In an AVI file, the first 4 bytes are "RIFF" and the next 4 bytes are the
> size of the payload, which should be the rest of the file. It's a
> little-endian format. Combining the bytes:
>
> dc<<<'8i 204 251 112 0 Ai 256*+256*+256*+p'
> 4893060
> dc<<<'8i 150 364 66 2 Ai 256*+256*+256*+p'
> 37155944
>
> the first file has a declared payload size of 4893060, so it should have been
> a file of about 4893068 bytes. Sometimes they have a junk segment on the end
> that isn't counted in the RIFF header but those are small. The other file
> should be about 37155952 bytes long, more than 7 times the size of the first
> file.

$ cmp -l -n 50 before.avi after2.avi
 5 204 150
 6 251 364
 7 112  66
 8   0   2
17 314 166
18  22  42
33  65 126
37   0 351
38 300  73
39  22  10
46   0  11
47   1   0
49 170 301
50   0  15
$ ls -l
-rw-r--r-- 1 dan dan 37155580 2011-05-30 16:54 after2.avi
-rw-r--r-- 1 dan dan  4892982 2011-06-01 17:13 before.avi

Thanks for your generous response, Alan, and I don't want you to be mad
because I have to reveal that your calculations were correct, and I was
basically comparing two files that never had anything to do with each
other except that they ended in .avi and together lay in a pool of my
failures.

Downthread I have a better isolation of this problem. It's one where
I'm back and forth to Windows, and I hope you'll understand that this is
precisely the type of f* up that results from this OS schizophrenia.

>
> The differences themselves don't look like any kind of simple corruption. The
> insertion or deletion of extra '\r' (13) bytes as in FTP ASCII mode wouldn't
> have done this. It's not just a couple of bits flipped either.
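
For anyone without dc handy, the byte-combining step quoted above can also be sketched in plain shell arithmetic. This is only a sketch: the function name le32_from_octal is made up for illustration, and it assumes the four arguments are the octal byte values exactly as cmp -l prints them, low byte first.

```sh
# Combine four bytes, given as the octal values printed by `cmp -l`,
# into one little-endian 32-bit number (first argument is the low byte).
# The leading 0 makes POSIX shell arithmetic read each value as octal.
le32_from_octal() {
    echo $(( 0$1 + (0$2 << 8) + (0$3 << 16) + (0$4 << 24) ))
}

le32_from_octal 204 251 112 0   # bytes 5-8 of before.avi -> 4893060
le32_from_octal 150 364 66 2    # bytes 5-8 of after2.avi -> 37155944
```

Adding 8 (the "RIFF" tag plus the size field itself) gives the expected file sizes of 4893068 and 37155952 bytes mentioned in the quoted text.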
>
> After the RIFF size, the next 8 bytes would normally be "AVI ", signifying
> the type of the root node, and "LIST" indicating that the next node is a
> list. (In the RIFF format, there are LIST nodes which are like directories
> and non-LIST nodes are like files...)
>
> And the cmp shows no differences there. As expected, all AVI files have the
> same stuff there.
>
>> 17 314 166
>> 18  22  42
>
> After "LIST" will be the size of the list node. I expect this to be the
> "hdrl" (header list) node, which contains the movie metadata (frame rate,
> codecs, etc.) This is another 4-byte little endian number. I think we can
> assume that bytes 19 and 20 (the upper half of the hdrl size) are 0, since
> the metadata easily fits in less than 64K.
>
> The first file has hdrl size=4812, the second has hdrl size=8822.
>
> dc<<<'8i 314 22 0 0 Ai 256*+256*+256*+p'
> 4812
> dc<<<'8i 166 42 0 0 Ai 256*+256*+256*+p'
> 8822

Holy crap, I can do it, too:

$ dc<<<'8i 314 22 0 0 Ai 256*+256*+256*+p'
4812
$ dc<<<'8i 166 42 0 0 Ai 256*+256*+256*+p'
8822
$ man dc
$

>
> The next 4 bytes (21 through 24 in the cmp output) would be "hdrl". And they
> match. After that would come the name of the hdrl node's first child. That's
> probably going to be "avih", explaining why bytes 25 through 28 match. Bytes
> 29 through 32 would be the size of the avih node, which is always going to be
> 56 bytes, so they match too.
>
> Starting at byte 33 we'll get the main AVI header. Bytes 33-36 are the field
> dwMicroSecPerFrame. This time it's not reasonable to guess that all the bytes
> not shown by cmp were 0.
>
>> 33  65 126
>
> These bytes (in hex: 0x35 and 0x56) could reasonably be the low parts of
> 0x8235 and 0x8256, representing frame rates of:
>
> dc<<<'1000000 16i8235 2k/p'
> 30.00
> dc<<<'1000000 16i8256 2k/p'
> 29.97
>
> 30fps and 29.97fps, both of which are commonly occurring frame rates.
>
>> 37   0 351
>> 38 300  73
>> 39  22  10
>
> Bytes 37-40 are the field dwMaxBytesPerSec.
> Assuming the high byte is 0, the
> 2 files have values of 1228800 and 539625. I don't know if this field is
> useful, since it seems to be common for creators to put a 0 here and for
> players to ignore it.
>
> Bytes 41-44 are the field dwPaddingGranularity. All 0, I guess.
>
>> 46   0  11
>> 47   1   0
>
> Bytes 45-48 are dwFlags. The first file has AVIF_WASCAPTUREFILE (0x10000) and
> the second file has AVIF_ISINTERLEAVED|AVIF_TRUSTCKTYPE (0x900). These look
> reasonable (although I can't figure out what AVIF_WASCAPTUREFILE means even
> after reading Microsoft's explanation of it).
>
> Bytes 49-52 are dwTotalFrames. We only have the low 2 bytes here:
>
>> 49 170 301
>> 50   0  15
>> $
>
> But it's plausible that the upper bytes were 0 anyway. In that case, the
> first file has 120 frames and the second file has 3521 frames. In the first
> case, 120 frames matches up very well with a size of 4893060 bytes, a rate of
> 1228800 bytes per second, and 30 frames per second. 4893060/1228800*30=119.45
>
> The second one doesn't match up so easily. It contains audio (you can tell
> that from the AVIF_ISINTERLEAVED flag) which counts toward dwMaxBytesPerSec
> but not dwTotalFrames. The bytes per second of the video stream should be
> 37155944 bytes * 29.97 frames/sec / 3521 frames = 316263 bytes/sec. Its
> declared rate is 539625 bytes/sec. If the audio takes up the missing space,
> it would be about 42% of the file. That could be correct if the audio is
> uncompressed.
>
> Or maybe dwMaxBytesPerSec just wasn't useful for this purpose. It's not
> called "average bytes per sec" after all.
>
>>
>> It looks like it gets back on track for a bit. Can you speculate what
>> happened to this download?
>
> It gets back on track because the AVI header has a lot of unused space after
> the interesting fields, so there's a big spread of 0's to match up. They
> probably have some more mismatches around byte 4840, where the first file's
> hdrl chunk ends.
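
The header arithmetic in the quoted walkthrough (frame rate from dwMicroSecPerFrame, the dwFlags bits, and the video byte rate) can be re-checked with a few lines of shell and awk. Treat this as a rough sketch: the function names are hypothetical, only the three flag bits named in this thread are decoded, and the byte-rate figure is truncated to a whole number rather than rounded.

```sh
# Frames per second from dwMicroSecPerFrame (microseconds per frame).
fps_from_usec() {
    awk -v us="$1" 'BEGIN { printf "%.2f\n", 1000000 / us }'
}

# Name the dwFlags bits mentioned in this thread (not the full set).
avi_flags() (
    flags=$(( $1 ))
    names=""
    [ $(( flags & 0x100 )) -ne 0 ]   && names="$names AVIF_ISINTERLEAVED"
    [ $(( flags & 0x800 )) -ne 0 ]   && names="$names AVIF_TRUSTCKTYPE"
    [ $(( flags & 0x10000 )) -ne 0 ] && names="$names AVIF_WASCAPTUREFILE"
    echo "${names# }"
)

# Average bytes per second: total bytes * frames/sec / total frames.
bytes_per_sec() {
    awk -v b="$1" -v fps="$2" -v n="$3" 'BEGIN { printf "%d\n", b * fps / n }'
}

fps_from_usec $((0x8235))            # 30.00
fps_from_usec $((0x8256))            # 29.97
avi_flags 0x10000                    # AVIF_WASCAPTUREFILE
avi_flags 0x900                      # AVIF_ISINTERLEAVED AVIF_TRUSTCKTYPE
bytes_per_sec 37155944 29.97 3521    # 316263
```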
>
> Since they both look like valid AVI files, with many differing fields but
> none that look invalid, I speculate that they're 2 different movies, and even
> though one or both of them may contain errors, neither one is a corrupted
> copy of the other. The differences between them are just too non-random to be
> any kind of accidental corruption.

Of course, you're right.

>
> If you want to compare AVIs, better options than cmp are:
>
> 1. play them! side by side if necessary.

Windows is always telling me that the files have been corrupted.

>
> 2. /usr/bin/file file1.avi file2.avi
>
> 3. (like /usr/bin/file but better)
> alias mi='mplayer -noconsolecontrols -identify -frames 0 -vo null -ao null'
> mi file1.avi 2>&1 | grep '^ID'> dump1
> mi file2.avi 2>&1 | grep '^ID'> dump2
> diff -u dump1 dump2
>
> 4. xdelta delta file1.avi file2.avi deltafile ; ls -l deltafile
> xdelta records a complete set of instructions for constructing file2 from
> file1. If the delta file is very small, that means there's not much
> difference between them, and one could be a corrupted copy of the other. If
> the delta file is nearly as big as file2, the files just didn't have much in
> common.
>

I'll consider these when I have the opportunity.

Cheers,
--
Uno
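
Alongside the options listed above, the declared RIFF size can be read straight from the file and compared with what ls reports, which is the check Alan did by hand from the cmp output. A minimal sketch, assuming a POSIX od and wc; riff_declared_size and riff_check are made-up names, and nothing here validates that the file really is RIFF.

```sh
# Print the payload size declared in bytes 5-8 of a RIFF file.
riff_declared_size() (
    # od prints the four bytes as unsigned decimals; set -- splits them.
    set -- $(od -An -tu1 -j4 -N4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
)

# For each file, show the size the header implies (payload plus the
# 8-byte RIFF tag and size field) next to the size on disk.
riff_check() (
    for f in "$@"; do
        declared=$(riff_declared_size "$f")
        actual=$(( $(wc -c < "$f") ))
        echo "$f: header implies $(( declared + 8 )) bytes, actual $actual bytes"
    done
)

# Usage (file names taken from the thread):
#   riff_check before.avi after2.avi
```

A small gap between the two numbers is not necessarily damage; a large one suggests a truncated or padded file.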