Message-ID: <666baa01@news.ausics.net>
From: not@telling.you.invalid (Computer Nerd Kev)
Subject: Re: Script to conditionally find and compress files recursively
Newsgroups: comp.os.linux.misc
References: <v48s96$u6fg$1@dont-email.me> <v4b46s$7dh$1@tncsrv09.home.tnetconsulting.net> <v4dtdt$23kjq$1@dont-email.me> <sm05xudwc1b.fsf@lakka.kapsi.fi> <666b7b6c@news.ausics.net>
User-Agent: tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i586))
NNTP-Posting-Host: news.ausics.net
Date: 14 Jun 2024 12:25:06 +1000
Organization: Ausics - https://newsgroups.ausics.net
Lines: 26
X-Complaints: abuse@ausics.net
Path: csiph.com!news.bbs.nz!news.ausics.net!not-for-mail
Xref: csiph.com comp.os.linux.misc:56569

Computer Nerd Kev <not@telling.you.invalid> wrote:
> Anssi Saari <anssi.saari@usenet.mail.kapsi.fi> wrote:
>> 
>> Well then, I believe the solution was already posted. Grab 5% of your
>> files with dd and see how it compresses. 
> 
> The solution that I see grabs the first 1MB, but it would make more
> sense to sample eg. 1% of the file size in five places within the
> file. 100MB file = 1MB sample, 100MB/5 = 20MB, so use dd to grab
> one 1MB sample from the start of the file then four more at an
> offset that increments by 20MB each time. Store these separately,
> compress them separately, then average the compression ratio of all
> the samples.
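
A rough Python sketch of that sampling idea, in case it helps. The
function name and defaults are made up for illustration, and zlib is
only standing in for whatever compressor you'd really use, so treat
the numbers it gives as a rough guide:

```python
import os
import zlib

def sample_ratio(path, samples=5, fraction=0.01):
    """Average compressed/original size ratio over evenly spaced samples.

    Hypothetical helper: each sample is `fraction` of the file size,
    taken at offsets 0, size/samples, 2*size/samples, and so on.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 1.0  # nothing to gain from compressing an empty file
    chunk = max(1, int(size * fraction))  # e.g. 1MB for a 100MB file
    stride = size // samples              # e.g. 20MB for a 100MB file
    ratios = []
    with open(path, "rb") as f:
        for i in range(samples):
            f.seek(i * stride)
            data = f.read(chunk)
            if not data:
                break
            # Compress each sample separately, as described above.
            ratios.append(len(zlib.compress(data, 6)) / len(data))
    return sum(ratios) / len(ratios)
```

A result near 1.0 suggests the file is already compressed (video,
JPEG, etc.); where to put the cut-off for skipping a file is up to
you.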

Also, for some types of data (if it's not all video), such as text,
some more advanced compressors build a dictionary to better compress
larger files. But this only works above a minimum input size, so
small samples compressed on their own might not represent the
compression ratio of the whole file with a dictionary included. A
solution is to pre-generate a dictionary from a collection of files
of the same type as the ones you're compressing, then compress the
small samples using that dictionary to get a more accurate result.
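
To illustrate the dictionary idea, here's a minimal sketch using the
preset-dictionary support in Python's zlib module. This is a
stand-in, not necessarily the compressor you'd use: the helper names
are hypothetical, and the "dictionary" is just the tail of some
concatenated reference data rather than a properly trained one.

```python
import zlib

def build_dictionary(reference_blobs, max_size=32768):
    """Naive preset dictionary: concatenate reference data, keep the tail.

    Assumption: deflate can only look back about 32KB, so anything
    beyond that would not be used anyway.
    """
    return b"".join(reference_blobs)[-max_size:]

def ratio(data, zdict=None):
    """Compressed/original size ratio, optionally with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=6, zdict=zdict)
    else:
        comp = zlib.compressobj(level=6)
    out = comp.compress(data) + comp.flush()
    return len(out) / len(data)

# Small sample that shares structure with the reference data.
reference = [b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20]
sample = b"GET /about.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
zdict = build_dictionary(reference)
print("without dictionary:", ratio(sample))
print("with dictionary:   ", ratio(sample, zdict))
```

The second figure should come out noticeably lower, which is the
effect a small sample misses when compressed on its own.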

-- 
__          __
#_ < |\| |< _#