Path: csiph.com!usenet.pasdenom.info!news.redatomik.org!newsfeed.xs4all.nl!newsfeed4a.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.011 X-Spam-Evidence: '*H*': 0.98; '*S*': 0.00; 'encoding': 0.05; 'binary': 0.07; 'assuming': 0.09; 'bytes.': 0.09; 'integers': 0.09; 'integral': 0.09; 'mind,': 0.09; 'output,': 0.09; 'patent': 0.09; 'storage.': 0.09; 'worse': 0.09; 'suggest': 0.14; 'wrote': 0.14; 'thread': 0.14; '1990,': 0.16; '384': 0.16; 'assumptions': 0.16; 'bits.': 0.16; 'compression': 0.16; 'declared': 0.16; 'finite': 0.16; 'introduces': 0.16; 'python-list,': 0.16; 'specifying': 0.16; 'thread,': 0.16; 'subject:python': 0.16; 'size,': 0.16; 'wrote:': 0.18; 'discussion': 0.18; 'obviously': 0.18; 'bit': 0.19; 'previously': 0.22; 'header:User-Agent:1': 0.23; 'exists': 0.24; 'integer': 0.24; "haven't": 0.24; 'values': 0.27; 'header :In-Reply-To:1': 0.27; '[1]': 0.29; 'am,': 0.29; '[2]': 0.30; 'url:wiki': 0.31; 'another.': 0.31; 'assumes': 0.31; 'bunch': 0.31; 'probability': 0.31; 'schemes': 0.31; 'subject:skip:i 10': 0.31; 'url:wikipedia': 0.31; 'file': 0.32; 'probably': 0.32; 'minimal': 0.33; 'actual': 0.34; 'maybe': 0.34; 'anywhere': 0.35; 'convert': 0.35; 'building': 0.35; 'there': 0.35; 'scheme': 0.36; 'done': 0.36; 'next': 0.36; 'method': 0.36; 'thanks': 0.36; 'url:org': 0.36; 'should': 0.36; 'example,': 0.37; 'two': 0.37; 'clear': 0.37; 'subject:new': 0.38; 'others.': 0.38; 'to:addr :python-list': 0.38; 'that,': 0.38; 'does': 0.39; 'sure': 0.39; 'to:addr:python.org': 0.39; 'even': 0.60; 'dave': 0.60; 'is.': 0.60; 'tell': 0.60; 'length': 0.61; 'new': 0.61; "you're": 0.61; "you'll": 0.62; "you've": 0.63; 'email addr:gmail.com': 0.63; 'such': 0.63; 'url:blogspot': 0.65; 'talking': 0.65; 'charset:windows-1252': 0.65; 'between': 0.67; 'beat': 0.68; 'received:74.208': 0.68; 'study': 0.69; 'ranges': 0.74; 'goal': 0.75; 'low': 0.83; '2015': 0.84; 'concept.': 0.84; 'functions:': 0.84; 'received:74.208.4.194': 0.84; 'goals,': 0.91; 'postings.': 0.91 Date: Thu, 09 Apr 2015 08:03:51 -0400 From: Dave Angel User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: python-list@python.org Subject: Re: python implementation of a new integer encoding algorithm. References: <515047c1-a20d-430e-a029-1c2d77db2f1a@googlegroups.com> <2a717ffb-d61d-4407-9082-1c17cd7ee573@googlegroups.com> <993f64c3-fc85-48a7-9d02-a4f12ecb33c6@googlegroups.com> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Provags-ID: V03:K0:PIFmPvgpXsM4rQ9WQkY1594m/pVNi1DW5X8eJ4mZ22qyuhpe4WF ai/i95qzMNa1mFXkZVzNTbwiDZzDoB11rj5I2olQfEkHzxZ8e//AI6I0159Ry/2K2jgh3H5 H9F1y1fnE+jyiixZEEJb/+qiDco/4/iDs6yGpLxwk5Sv71F/i40+vRpAren8wnNFmXcZvHW U5/K/OIJCmwt5JlCl3RuQ== X-UI-Out-Filterresults: notjunk:1; X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 57 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1428581054 news.xs4all.nl 2898 [2001:888:2000:d::a6]:41438 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:88711 On 04/09/2015 05:33 AM, janhein.vanderburg@gmail.com wrote: > Op donderdag 19 februari 2015 19:25:14 UTC+1 schreef Dave Angel: >> I wrote the following pair of functions: >> >> Here's a couple of ranges of output, showing that the 7bit scheme does >> better for values between 384 and 16379. > Thanks for this test; I obviously should have done it myself. > Please have a look at http://optarbvalintenc.blogspot.nl/2015/04/inputs-from-complangpython.html and the next two postings. > I still don't see where you have anywhere declared what your goal is. Like building a recursive compression scheme [1], if you don't have a specific goal in mind, you'll never actually be sure you've achieved it, even though you might be able to fool the patent examiners. Any method of encoding will be worse for some values in order to be better for others. Without specifying a distribution, you cannot tell whether a "typical" set of integers is better with one method than another. For example, if you have uniform distribution of all integer values up to 256**n-1, you will not be able to beat a straight n-byte binary storage. Other than that, I make no claims that any of the schemes previously discussed in this thread is unbeatable. You also haven't made it clear whether you're assuming such a compressed bit stream is required to occupy an integral number of bytes. For example, if your goal is to store a bunch of these arbitrary length integers in a file of minimal size, then you're talking classic compression techniques. Or maybe you should be minimizing the time to convert such a bit stream to and from a conventional one. I suggest you study Huffman encoding[2], and see what makes it tick. It makes the assumptions that there are a finite set of symbols, and that there exists a probability distribution of the likelihood of each symbol, and that each takes an integral number of *bits*. Then study arithmetic-encoding[3], which no longer assumes that a single symbol occupy a whole number of bits. A mind-blowing concept. Incidentally, it introduces a "stop-symbol" which is given a very low probability. See the book "Text Compression", 1990, by Bell, Cleary, and Witten. [1] - http://gailly.net/05533051.html [2] - http://en.wikipedia.org/wiki/Huffman_coding [3] - http://en.wikipedia.org/wiki/Arithmetic_coding If you're going to continue the discussion on python-list, you probably should start a new thread, state your actual goals, and -- DaveA