Message-ID: <666baa01@news.ausics.net>
From: not@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Script to conditionally find and compress files recursively
Newsgroups: comp.os.linux.misc
References: <666b7b6c@news.ausics.net>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 14 Jun 2024 12:25:06 +1000
Organization: Ausics - https://newsgroups.ausics.net
Lines: 26
X-Complaints: abuse@ausics.net
Path: csiph.com!news.bbs.nz!news.ausics.net!not-for-mail
Xref: csiph.com comp.os.linux.misc:56569

Computer Nerd Kev wrote:
> Anssi Saari wrote:
>>
>> Well then, I believe the solution was already posted. Grab 5% of your
>> files with dd and see how it compresses.
>
> The solution that I see grabs the first 1MB, but it would make more
> sense to sample e.g. 1% of the file size in five places within the
> file. 100MB file = 1MB sample, 100MB/5 = 20MB, so use dd to grab
> one 1MB sample from the start of the file, then four more at an
> offset that increments by 20MB each time. Store these separately,
> compress them separately, then average the compression ratio of all
> the samples.

Also, for some types of data such as text (if it's not all video),
some more advanced compressors build a dictionary to compress larger
files better. But this needs a minimum amount of data, so the small
samples might not represent the compression ratio that the whole file
would get with a dictionary included. One solution is to pre-generate
a dictionary from a collection of files of the same type as the ones
you're compressing; then you could compress the small samples using
that dictionary and get a more accurate result.

-- 
__ __
#_ < |\| |< _#
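The sampling scheme from the quoted text (five samples of 1% of the file size, spaced file_size/5 apart, each compressed separately, ratios averaged) could be sketched in Python like this. It is a minimal sketch under a few assumptions: zlib from the standard library stands in for whatever compressor you actually plan to use, the sample size is a whole percentage of the file size, and `sample_offsets` and `estimate_ratio` are hypothetical helper names. Instead of running dd and storing each sample as a file, it seeks to each offset and reads the chunk directly, which amounts to the same thing.

```python
import os
import sys
import zlib


def sample_offsets(file_size, n_samples=5, percent=1):
    """Return (offset, length) pairs for n_samples chunks, each `percent`% of
    the file, starting at offset 0 and spaced file_size // n_samples apart.
    For a 100 MB file: 1 MB samples at 0, 20, 40, 60 and 80 MB."""
    sample_len = max(1, file_size * percent // 100)
    step = file_size // n_samples
    return [(i * step, sample_len) for i in range(n_samples)]


def estimate_ratio(path, n_samples=5, percent=1, level=6):
    """Average compressed/original size over the samples.
    Values near (or above) 1.0 mean the data barely compresses."""
    size = os.path.getsize(path)
    ratios = []
    with open(path, "rb") as f:
        for offset, length in sample_offsets(size, n_samples, percent):
            f.seek(offset)
            chunk = f.read(length)  # similar to one dd read with skip/count
            if not chunk:
                continue  # only happens for an empty file
            # Each sample is compressed on its own, as the post suggests.
            ratios.append(len(zlib.compress(chunk, level)) / len(chunk))
    return sum(ratios) / len(ratios) if ratios else 1.0


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{p}: estimated ratio {estimate_ratio(p):.3f}")
```

Run it with one or more file names as arguments; a ratio close to 1.0 suggests the file (for example already-compressed video) is probably not worth compressing again.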
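The preset-dictionary idea from the reply can be illustrated with Python's zlib, whose `compressobj` and `decompressobj` accept a `zdict` argument. This is a stand-in, not the dictionary training the post has in mind: the hypothetical helper `build_naive_dict` just concatenates data taken from files of the same type and keeps the last 32 KiB, since deflate can only reference that much of a preset dictionary. Compressors with real training, such as zstd's `--train` mode, should produce better dictionaries; the sample data in the demo is synthetic.

```python
import zlib

ZLIB_WINDOW = 32 * 1024  # deflate only uses the last 32 KiB of a preset dictionary


def build_naive_dict(reference_samples, max_size=ZLIB_WINDOW):
    """Hypothetical helper: concatenate data from files of the same type and
    keep the last max_size bytes. NOT real dictionary training; it only
    primes deflate with representative content."""
    return b"".join(reference_samples)[-max_size:]


def compress_with_dict(data, zdict, level=6):
    comp = zlib.compressobj(level, zdict=zdict) if zdict else zlib.compressobj(level)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob, zdict):
    # The decompressor must be given the exact same dictionary.
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


def sample_ratio(sample, zdict=b"", level=6):
    """Compressed/original size of one sample, optionally using zdict."""
    return len(compress_with_dict(sample, zdict, level)) / len(sample)


if __name__ == "__main__":
    # Synthetic stand-in data; in practice take samples from real files
    # of the same type as the ones being compressed.
    reference = [f"GET /page/{i} HTTP/1.1 200 user=alice\n".encode() for i in range(2000)]
    zdict = build_naive_dict(reference)
    sample = b"GET /page/123456 HTTP/1.1 404 user=bob\n"
    print("without dictionary:", round(sample_ratio(sample), 3))
    print("with dictionary:   ", round(sample_ratio(sample, zdict), 3))
```

Because decompression needs the identical dictionary, a real setup would have to keep the dictionary alongside the compressed files.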