From: Dave Angel
Date: Wed, 18 Feb 2015 08:54:29 -0500
Newsgroups: comp.lang.python
Subject: Re: python implementation of a new integer encoding algorithm.

On 02/18/2015 04:04 AM, janhein.vanderburg@gmail.com wrote:
> On Tuesday, February 17, 2015 at 3:35:16 PM UTC+1, Chris Angelico wrote:
>> Oh, incidentally: If you want a decent binary format for
>> variable-sized integer, check out the MIDI spec.
>
> I did some time ago, thanks, and it is indeed a decent format.
> I also looked at variations of that approach.
> None of them beats

Define "beats."  You might mean beats in simplicity, or in elegance, or in clarity of code.
But you probably mean in space efficiency, or "compression."  But that's meaningless without a target distribution of the values that you expect to encode.

For example, if 99.9% of your values are going to be less than 255, then the most efficient byte encoding would be one that simply stores a value less than 255 in a single byte, and starts with an FF for larger values.  It's almost irrelevant how it encodes those larger values.

On the other hand, if most values are going to be in the 10,000 to 20,000 bit size range, and a few will be much smaller, and a few will be very much larger, then it would be very practical to start with a size field, say 16 bits, followed by the raw packed data.  Naturally, the size field would need to have an escape value that indicates a larger field was needed.  In fact, the size field could itself be encoded in a 7-bits-per-byte manner, so it would encode an arbitrarily sized number as well.

> "my" concept of two counters that cooperatively specify field lengths and represented integer values.
>
>> I've tried to read through the original algorithm description, but I'm
>> not entirely sure: How many payload bits per transmitted byte does it
>> actually achieve?
>
> I don't think that payload bits per byte makes sense in this concept.
>

Correct.  Presumably one means average payload bits per byte.  First one would have to define what the "standard" unencoded variable-length integer format is.  Then one could call that size the payload size.  Then, in order to compute an average, one would have to specify an expected, or target, distribution of values.  One then compares the payload size for each typical value with its encoded size, and averages over the distribution.

-- 
DaveA
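
To make the point about distributions concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not anyone's actual algorithm: the function names are made up, the FF escape is followed by a 7-bits-per-byte (MIDI-style) number purely for convenience, and the sample distribution is invented.

```python
def encode_varint(n):
    """7 payload bits per byte, most significant group first.
    The high bit is set on every byte except the last (MIDI-style)."""
    if n < 0:
        raise ValueError("non-negative integers only")
    groups = [n & 0x7F]              # last byte: high bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)   # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(groups))


def encode_escape(n):
    """One byte for values below 255; otherwise FF plus a larger encoding.
    Using encode_varint after the FF is an assumption of this sketch."""
    if n < 0:
        raise ValueError("non-negative integers only")
    if n < 255:
        return bytes([n])
    return b"\xff" + encode_varint(n)


def average_bytes(encoder, distribution):
    """Expected encoded size, where distribution maps value -> probability."""
    return sum(p * len(encoder(v)) for v, p in distribution.items())


# Invented distribution: 99.9% of values are 200, 0.1% are 100000.
dist = {200: 0.999, 100000: 0.001}

for encoder in (encode_escape, encode_varint):
    print(encoder.__name__, average_bytes(encoder, dist))
```

With this distribution the escape scheme averages close to one byte per value, and the 7-bits-per-byte scheme close to two.  Shift the weight toward values of 255 and above, and the escape scheme costs one byte more per value than the plain 7-bits-per-byte one.  That is why "beats" needs a distribution attached.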