Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!goblin1!goblin2!goblin.stu.neva.ru!newsfeed1.swip.net!uio.no!news.tele.dk!news.tele.dk!small.news.tele.dk!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <205bfa4f-29de-43de-be5a-72a12d77d0c9@googlegroups.com>
References: <205bfa4f-29de-43de-be5a-72a12d77d0c9@googlegroups.com>
Date: Wed, 30 Oct 2013 13:46:57 -0600
Subject: Re: Algorithm that makes maximum compression of completly diffused data.
From: Modulok <modulok@gmail.com>
To: jonas.thornvall@gmail.com
Content-Type: multipart/alternative; boundary=047d7b86e3d890ddbb04e9fa9801
Cc: Python mailing list <python-list@python.org>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.1835.1383162420.18130.python-list@python.org>
Lines: 129
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:58101

--047d7b86e3d890ddbb04e9fa9801
Content-Type: text/plain; charset=ISO-8859-1

On Wed, Oct 30, 2013 at 12:21 PM, <jonas.thornvall@gmail.com> wrote:

> I am searching for the program or algorithm that makes the best possible
> of completly (diffused data/random noise) and wonder what the state of art
> compression is.
>
> I understand this is not the correct forum but since i think i have an
> algorithm that can do this very good, and do not know where to turn for
> such question i was thinking to start here.
>
> It is of course lossless compression i am speaking of.



>> I am searching for the program or algorithm that makes the best possible of
>> completly (diffused data/random noise) and wonder what the state of art
>> compression is.

None. If the data to be compressed is truly homogeneous random noise, as you
describe (for example, a 100 MB file read from a cryptographically secure
random bit generator such as /dev/random on *nix systems), the state of the
art in lossless compression achieves zero reduction, and will remain that way
for the foreseeable future.
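
This is easy to check for yourself. As a rough sketch, pull bytes from the
operating system's random source with os.urandom and run them through zlib at
its highest level. The 1 MiB sample size is an arbitrary choice of mine, not
something from the original question.

```python
import os
import zlib

# Assumption: 1 MiB is a large enough sample to make the point.
SAMPLE_SIZE = 1024 * 1024

random_data = os.urandom(SAMPLE_SIZE)        # bytes from the OS random source
compressed = zlib.compress(random_data, 9)   # 9 = maximum compression level

ratio = len(compressed) / len(random_data)
print(f"original:   {len(random_data)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.4f}")
```

On input like this the ratio should land at or slightly above 1.0, since the
zlib container adds a small amount of framing overhead.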

There is no lossless algorithm that will reduce truly random (high-entropy)
data by any significant margin. In classical information theory, such an
algorithm can never be invented. See: Kolmogorov complexity.
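
The usual justification is a counting (pigeonhole) argument: there are more
bit strings of a given length than there are strictly shorter ones, so a
lossless scheme cannot map every input to a shorter output. A small sketch
that only does the arithmetic (the function name is mine, for illustration):

```python
def count_bit_strings(n_bits: int) -> tuple[int, int]:
    """Return (strings of exactly n_bits, strings strictly shorter than n_bits)."""
    exact = 2 ** n_bits
    # Lengths 0 .. n_bits-1: 2**0 + 2**1 + ... + 2**(n_bits-1) == 2**n_bits - 1
    shorter = sum(2 ** k for k in range(n_bits))
    return exact, shorter


for n in (1, 8, 16, 64):
    exact, shorter = count_bit_strings(n)
    print(f"n={n}: {exact} inputs, {shorter} shorter outputs, "
          f"shortfall {exact - shorter}")
```

However large n gets, there is always at least one input with nowhere shorter
to go, and if most inputs are to shrink by more than a few bits, most of the
others have to grow.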

Real-world data is rarely completely random. You would have to test various
algorithms on the data set in question. Small things such as non-obvious
statistical clumping can make a big difference in the compression ratio from
one algorithm to another. Data that looks "random" might not actually be
random in the entropy sense of the word.
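
A minimal sketch of that kind of test, using the three general-purpose
compressors in the Python standard library. The two inputs are stand-ins I
made up for illustration: one is uniform random bytes, the other is
pseudo-random but only ever uses 16 of the 256 possible byte values, so it
looks like noise while carrying about 4 bits of entropy per byte.

```python
import bz2
import lzma
import os
import random
import zlib

N = 256 * 1024  # sample size in bytes (arbitrary)

# Hypothetical stand-ins for "the data set in question".
uniform = os.urandom(N)                        # all 256 byte values equally likely
rng = random.Random(0)                         # fixed seed, repeatable input
clumped = bytes(rng.choices(range(16), k=N))   # only byte values 0..15 occur

compressors = {
    "zlib": lambda d: zlib.compress(d, 9),
    "bz2": lambda d: bz2.compress(d, 9),
    "lzma": lambda d: lzma.compress(d),
}


def ratios(data: bytes) -> dict:
    """Compressed size divided by original size, per algorithm."""
    return {name: len(fn(data)) / len(data) for name, fn in compressors.items()}


for label, data in (("uniform", uniform), ("clumped", clumped)):
    print(label, {name: round(r, 3) for name, r in ratios(data).items()})
```

The uniform input should stay at roughly 1.0 for every algorithm, while the
restricted-alphabet input should shrink noticeably, by an amount that differs
from one algorithm to the next.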

>> I understand this is not the correct forum but since i think i have an
>> algorithm that can do this very good, and do not know where to turn for such
>> question i was thinking to start here.

Not to sound like a downer, but I would wager that the data you're testing
your algorithm on is not as truly random as you imply, or is not a large
enough body of test data to draw such conclusions from. It's akin to
inventing a perpetual motion machine, an inertial propulsion engine, or any
other classically impossible device. (This only applies to truly random
data.)
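
If you want to put an algorithm to that test, something along these lines
would do as a starting point. It is only a sketch: the evaluate function and
its parameters are my own invention, and zlib is standing in for the
algorithm under test. It feeds fresh random input on every trial, checks that
the round trip is lossless, and reports total output size over total input
size.

```python
import os
import zlib
from typing import Callable


def evaluate(compress: Callable[[bytes], bytes],
             decompress: Callable[[bytes], bytes],
             n_bytes: int = 4 * 1024 * 1024,
             trials: int = 3) -> float:
    """Return total compressed size / total original size over random inputs.

    Raises AssertionError if any round trip fails to reproduce the input.
    """
    total_in = total_out = 0
    for _ in range(trials):
        data = os.urandom(n_bytes)   # fresh input each trial, never reused
        packed = compress(data)
        assert decompress(packed) == data, "round trip is not lossless"
        total_in += len(data)
        total_out += len(packed)
    return total_out / total_in


# zlib is only a placeholder for the algorithm being evaluated.
print(evaluate(zlib.compress, zlib.decompress))
```

For the number to mean anything, the compressed output has to include
everything the decompressor needs (tables, dictionaries, seeds), and the
input has to be large and freshly generated rather than one small file.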

-Modulok-


--047d7b86e3d890ddbb04e9fa9801--