Date: Wed, 18 Feb 2015 17:19:48 -0500
From: Dave Angel
To: python-list@python.org
Subject: Re: python implementation of a new integer encoding algorithm.
Newsgroups: comp.lang.python

On 02/18/2015 02:55 PM, janhein.vanderburg@gmail.com wrote:
> On Wednesday, 18 February 2015 at 17:47:49 UTC+1, Dave Angel wrote:
>> On 02/18/2015 03:59 AM, janhein.vanderburg@gmail.com wrote:
>>
>>
>>> encoding individual integers optimally without any assumptions about their values.
>>>
>>
>> Contradiction in terms.
>>
>> --
>> DaveA
>
> Not.
> Jan-Hein.
>

Then you had better define your new word "optimal" for us.

I decided to try your algorithm for all the values between 0 and 999999. That's one million values, and the 7bit encoding[1] beat yours for 950081 of them. The rest were tied. Yours was never shorter.
For a uniform distribution of those particular values, the average number of bytes used by yours was 3.933568, and by 7bit encoding was 2.983487.

For the second and third million, yours are all 4 bytes, while 7bit uses 3. From 2097152 on, 7bit uses 4 bytes, same as you. Between 16 and 17 million, you average 4.156865, while 7bit is a constant 4.0.

After that, I started spot-checking. I went up to 100 billion, and for none of the values I tried did your algorithm take fewer bytes than 7bit.

So how is yours optimal? Over what range of values? I'm not necessarily doubting it, just challenging you to provide a data sample that actually shows it.

And of course, I'm not claiming that 7bit is in any way optimal. You cannot define optimal without first defining the distribution.

[1] By 7bit, I'm referring to the one apparently used in MIDI encoding, where 7 bits of each byte hold the value, and the 8th bit is zero, except for the last byte, where the 8th bit is one. So 3 bytes can encode 21 bits, or up to 2**21 - 1.

-- 
DaveA
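
For anyone who wants to reproduce the 7bit byte counts, here is a minimal sketch of the scheme as footnote [1] describes it. The function names are made up for illustration, and the byte order (most significant 7-bit group first) is an assumption, since the footnote only fixes the 7 value bits per byte and the marker bit on the last byte. It does not implement the algorithm being compared against.

```python
def encode_7bit(n):
    """Encode a non-negative int with 7 value bits per byte.

    Bit 8 is zero on every byte except the last, where it is one,
    following footnote [1]. Group order (most significant first) is
    an assumption; the footnote does not specify it.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    groups = [n & 0x7F]          # least significant 7 bits
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()             # most significant group first
    groups[-1] |= 0x80           # flag the final byte
    return bytes(groups)


def decode_7bit(data):
    """Decode one value; stops at the first byte with bit 8 set."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return value
    raise ValueError("no terminating byte found")


def encoded_length(n):
    """Bytes needed for n, computed without building the encoding."""
    return max(1, -(-n.bit_length() // 7))   # ceiling division, 1 byte minimum


def average_length(start, stop):
    """Mean encoded length over range(start, stop), uniform weighting."""
    total = sum(encoded_length(n) for n in range(start, stop))
    return total / (stop - start)


if __name__ == "__main__":
    print(average_length(0, 1000000))          # first million, about 2.98
    print(average_length(1000000, 2000000))    # second million
    print(average_length(16000000, 17000000))  # 16 to 17 million
```

Swapping the uniform weighting in average_length for any other distribution changes the average, which is the point about needing a distribution before "optimal" means anything.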