From: Modulok
To: jonas.thornvall@gmail.com
Cc: Python mailing list
Newsgroups: comp.lang.python
Date: Wed, 30 Oct 2013 13:46:57 -0600
Subject: Re: Algorithm that makes maximum compression of completly diffused data.
References: <205bfa4f-29de-43de-be5a-72a12d77d0c9@googlegroups.com>

On Wed, Oct 30, 2013 at 12:21 PM, <jonas.thornvall@gmail.com> wrote:
> I am searching for the program or algorithm that makes the best possible
> of completly (diffused data/random noise) and wonder what the state of art
> compression is.
>
> I understand this is not the correct forum but since i think i have an
> algorithm that can do this very good, and do not know where to turn for
> such question i was thinking to start here.
>
> It is of course lossless compression i am speaking of.

>> I am searching for the program or algorithm that makes the best possible of
>> completly (diffused data/random noise) and wonder what the state of art
>> compression is.

None. If the data to be compressed is truly homogeneous random noise, as you
describe (for example, a 100 MB file read from a cryptographically secure
random bit generator such as /dev/random on *nix systems), the state of the
art in lossless compression achieves zero size reduction, and that will
remain the case for the foreseeable future.

There is no lossless algorithm that will reduce truly random (high entropy)
data by any significant margin. In classical information theory, such an
algorithm can never be invented. See: Kolmogorov complexity.
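
One way to see why is a simple counting argument: a lossless compressor has
to map distinct inputs to distinct outputs, and there are fewer short bit
strings than long ones. The Python sketch below just does that arithmetic.
The function names are made up for this illustration and are not from any
library.

```python
def count_bitstrings(n):
    """Number of distinct bit strings that are exactly n bits long."""
    return 2 ** n


def count_shorter_bitstrings(n):
    """Number of distinct bit strings with length 0 .. n-1 bits."""
    return sum(2 ** k for k in range(n))  # always 2**n - 1


def max_fraction_saving(n, d):
    """Upper bound on the fraction of n-bit inputs that ANY lossless
    scheme can shorten by at least d bits (1 <= d <= n).

    Lossless means no two inputs share an output, so the number of
    inputs that shrink this much cannot exceed the number of outputs
    of length n - d bits or less.
    """
    short_outputs = sum(2 ** k for k in range(n - d + 1))
    return short_outputs / count_bitstrings(n)


for n in (8, 16, 32):
    print(n, count_bitstrings(n), count_shorter_bitstrings(n))

# Fraction of 32-bit inputs that could possibly be shortened by a byte or more
print(max_fraction_saving(32, 8))
```

There is always one fewer shorter string than there are n-bit inputs, so no
scheme can shrink every input, and the fraction that can be shrunk by d bits
or more is below 2**(1 - d) whatever the algorithm is.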

Real-world data is rarely completely random. You would have to test various
algorithms on the data set in question. Small things such as non-obvious
statistical clumping can make a big difference in the compression ratio from
one algorithm to another. Data that might look "random" might not actually be
random in the entropy sense of the word.
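
A minimal sketch of that kind of test, using only the compressors in the
Python standard library. Everything here is an assumption made for
illustration: os.urandom stands in for a file read from the system's random
source, the repeated sentence stands in for structured real-world data, and
SIZE and compression_ratios are made-up names. Swap in the bytes of your own
data set to test it.

```python
import bz2
import lzma
import os
import zlib


def compression_ratios(data):
    """Return {codec name: compressed size / original size} for data."""
    codecs = {
        "zlib": lambda b: zlib.compress(b, 9),
        "bz2": lambda b: bz2.compress(b, 9),
        "lzma": lzma.compress,
    }
    return {name: len(fn(data)) / len(data) for name, fn in codecs.items()}


SIZE = 200_000  # hypothetical sample size in bytes

# Bytes from the operating system's cryptographically secure generator.
random_data = os.urandom(SIZE)

# Highly structured stand-in for non-random data.
line = b"the quick brown fox jumps over the lazy dog\n"
structured_data = (line * (SIZE // len(line) + 1))[:SIZE]

random_ratios = compression_ratios(random_data)
structured_ratios = compression_ratios(structured_data)

for label, ratios in (("os.urandom", random_ratios),
                      ("repeated text", structured_ratios)):
    for name, ratio in ratios.items():
        print(f"{label:14s} {name:5s} ratio = {ratio:.4f}")
```

A ratio near or above 1.0 means the codec saved nothing; random bytes should
land there with all three codecs, because container overhead makes the output
slightly larger than the input. The repeated text shrinks to a small fraction
of its size. Data that only looks random will usually fall somewhere in
between, which is what makes this a useful check.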

>> I understand this is not the correct forum but since i think i have an
>> algorithm that can do this very good, and do not know where to turn for such
>> question i was thinking to start here.

Not to sound like a downer, but I would wager that the data you're testing
your algorithm on is not as truly random as you imply, or is not a large
enough body of test data to draw such conclusions from. It's akin to
inventing a perpetual motion machine, an inertial propulsion engine, or any
other classically impossible solution. (This only applies to truly random
data.)

-Modulok-