Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!goblin2!goblin.stu.neva.ru!newsfeed.xs4all.nl!newsfeed3a.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.006 X-Spam-Evidence: '*H*': 0.99; '*S*': 0.00; 'algorithm': 0.04; 'correct.': 0.07; 'encoded': 0.07; 'bits': 0.09; 'bytes,': 0.09; 'bytes.': 0.09; 'encode': 0.09; 'separately': 0.09; 'cc:addr :python-list': 0.11; 'bytes)': 0.16; 'compares': 0.16; 'compute': 0.16; 'expected,': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'subject:python': 0.16; 'followed': 0.16; 'wrote:': 0.18; 'variable': 0.18; 'have:': 0.19; 'thu,': 0.19; 'feb': 0.22; '>>>': 0.22; '(in': 0.22; 'cc:addr:python.org': 0.22; 'byte': 0.24; 'bytes': 0.24; 'instance,': 0.24; 'integer': 0.24; 'specify': 0.24; 'typical': 0.24; 'cc:2**0': 0.24; "i've": 0.25; 'define': 0.26; 'header:In- Reply-To:1': 0.27; 'tried': 0.27; 'am,': 0.29; 'message- id:@mail.gmail.com': 0.30; "i'm": 0.30; 'description,': 0.31; 'subject:skip:i 10': 0.31; 'values.': 0.31; 'entirely': 0.33; 'sense': 0.34; 'could': 0.34; 'etc': 0.35; 'but': 0.35; 'received:google.com': 0.35; 'subject:new': 0.38; 'does': 0.39; 'extremely': 0.39; 'sure': 0.39; 'how': 0.40; 'read': 0.60; 'easy': 0.60; 'dave': 0.60; 'then,': 0.60; 'identify': 0.61; 'length': 0.61; 'numbers': 0.61; 'first': 0.61; 'different': 0.65; 'size.': 0.65; 'ranges': 0.74; '2015': 0.84; 'concept.': 0.84; 'presumably': 0.84; 'angel': 0.91; 'to:none': 0.92; 'average': 0.93 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:cc :content-type; bh=X7fZVYgupDexCqzQ5x6omHE87bPnn9uxqBRHAIpLHBY=; b=pHm5vT7zZsfLbnq7vY8J3VGWrxzs8TJ4sxGwrdcbMgVq4L8cpw7G4k3qy08GRGtqEq +6nAxJCNs/DYHaWfmFR/vM4Wa8Wt0N+fpL5o93LwFUruHe9+n27o7JzfA+b6z0pohCm1 YHUr2/rT+uhMQr/rmEc4OivqMVlexSBw7QhN0/+fp+7gkebw6a3h40zD+wMxuNoa/PLT QO5Dzr7Sm5+wo7wnBqx5fTNj20rTF124Eeuod5W/pdVZ26A5PRp2fkBU5GRQ6AXpE2fX ryRydh7+GrcKLihX1cDvSDYCd7LrAkHqv/RfFH6e/qV9oYpQAg6DO4gbeK1eNXf6J/id 72AQ== MIME-Version: 1.0 X-Received: by 10.42.64.197 with SMTP id h5mr393229ici.12.1424268960780; Wed, 18 Feb 2015 06:16:00 -0800 (PST) In-Reply-To: <54E49995.6040403@davea.name> References: <54E34C68.6040700@davea.name> <65b209a0-33fc-4c1f-8af6-de3a4626623c@googlegroups.com> <54E49995.6040403@davea.name> Date: Thu, 19 Feb 2015 01:16:00 +1100 Subject: Re: python implementation of a new integer encoding algorithm. From: Chris Angelico Cc: "python-list@python.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 45 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1424268970 news.xs4all.nl 2902 [2001:888:2000:d::a6]:49488 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:85794 On Thu, Feb 19, 2015 at 12:54 AM, Dave Angel wrote: >>> I've tried to read through the original algorithm description, but I'm >>> not entirely sure: How many payload bits per transmitted byte does it >>> actually achieve? >> >> >> I don't think that payload bits per byte makes sense in this concept. >> > > Correct. Presumably one means average payload bits per byte. > > First one would have to define what the "standard" unencoded variable length > integer format was. Then one could call that size the payload size. Then, > in order to compute an average, one would have to specify an expected, or > target distribution of values. One then compares and averages the payload > size for each typical value with the encoded size. Average, or separately for different ranges. Alternatively, identify which ranges of numbers can be encoded with how many bytes. For instance, if you have a varlen length (in bytes) followed by that many packed bytes, you would have: 2: up to 1<<8-1 3: up to 1<<16-1 4: up to 1<<24-1 ... 128: up to 1<<1016-1 130: up to 1<<1024-1 131: up to 1<<1032-1 etc Using varlen directly gives: 1: up to 1<<7-1 2: up to 1<<14-1 3: up to 1<<21-1 etc So if your number is around about 1<<20, you know it's going to take 3 bytes to encode as varlen, or 4 to encode with the meta-length. Easy comparisons. I'm not sure how this algorithm compares, because it's extremely clever. ChrisA