Groups > comp.lang.python > #88702 > unrolled thread

Re: python implementation of a new integer encoding algorithm.

Started by	janhein.vanderburg@gmail.com
First post	2015-04-09 02:33 -0700
Last post	2015-04-12 13:50 -0700
Articles	3 — 2 participants

Back to article view | Back to comp.lang.python

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-04-09 02:33 -0700
    Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-04-09 08:03 -0400
      Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-04-12 13:50 -0700

#88702 — Re: python implementation of a new integer encoding algorithm.

From	janhein.vanderburg@gmail.com
Date	2015-04-09 02:33 -0700
Subject	Re: python implementation of a new integer encoding algorithm.
Message-ID	<c6007d78-7c21-4841-814f-4ca529889298@googlegroups.com>

Op donderdag 19 februari 2015 19:25:14 UTC+1 schreef Dave Angel:
> I wrote the following pair of functions:
> 
> def seven_code(n):
>      acc = bytearray()
>      if n == 0:
>          acc.append(0)
>      while n > 0:
>          quotient, remainder = divmod(n, 128)
>          acc.append(remainder)
>          n = quotient
>      acc[-1] |= 0x80            #add a stop bit to last byte
>      return acc
> 
> def seven_decode(sequ):
>      acc = 0
>      multiplier = 1
>      for by in sequ:
>          acc += (by & 0x7f) * multiplier
>          if by > 0x7f:
>              break
>          multiplier *= 128
>      return acc
> 
> Here's a couple of ranges of output, showing that the 7bit scheme does 
> better for values between 384 and 16379.
Thanks for this test; I obviously should have done it myself.
Please have a look at http://optarbvalintenc.blogspot.nl/2015/04/inputs-from-complangpython.html and the next two postings.

[toc] | [next] | [standalone]

#88711

From	Dave Angel <davea@davea.name>
Date	2015-04-09 08:03 -0400
Message-ID	<mailman.164.1428581054.12925.python-list@python.org>
In reply to	#88702

On 04/09/2015 05:33 AM, janhein.vanderburg@gmail.com wrote:
> Op donderdag 19 februari 2015 19:25:14 UTC+1 schreef Dave Angel:
>> I wrote the following pair of functions:

    <snip>
>>
>> Here's a couple of ranges of output, showing that the 7bit scheme does
>> better for values between 384 and 16379.
> Thanks for this test; I obviously should have done it myself.
> Please have a look at http://optarbvalintenc.blogspot.nl/2015/04/inputs-from-complangpython.html and the next two postings.
>

I still don't see where you have anywhere declared what your goal is. 
Like building a recursive compression scheme [1], if you don't have a 
specific goal in mind, you'll never actually be sure you've achieved it, 
even though you might be able to fool the patent examiners.

Any method of encoding will be worse for some values in order to be 
better for others.  Without specifying a distribution, you cannot tell 
whether a "typical" set of integers is better with one method than another.

For example, if you have uniform distribution of all integer values up 
to 256**n-1, you will not be able to beat a straight n-byte binary storage.

Other than that, I make no claims that any of the schemes previously 
discussed in this thread is unbeatable.

You also haven't made it clear whether you're assuming such a compressed 
bit stream is required to occupy an integral number of bytes.  For 
example, if your goal is to store a bunch of these arbitrary length 
integers in a file of minimal size, then you're talking classic 
compression techniques.  Or maybe you should be minimizing the time to 
convert such a bit stream to and from a conventional one.

I suggest you study Huffman encoding[2], and see what makes it tick.  It 
makes the assumptions that there are a finite set of symbols, and that 
there exists a probability distribution of the likelihood of each 
symbol, and that each takes an integral number of *bits*.

Then study arithmetic-encoding[3], which no longer assumes that a single 
symbol occupy a whole number of bits.  A mind-blowing concept. 
Incidentally, it introduces a "stop-symbol" which is given a very low 
probability.

See the book "Text Compression", 1990, by Bell, Cleary, and Witten.

[1] - http://gailly.net/05533051.html
[2] - http://en.wikipedia.org/wiki/Huffman_coding
[3] - http://en.wikipedia.org/wiki/Arithmetic_coding

If you're going to continue the discussion on python-list, you probably 
should start a new thread, state your actual goals, and

-- 
DaveA

[toc] | [prev] | [next] | [standalone]

#88877

From	janhein.vanderburg@gmail.com
Date	2015-04-12 13:50 -0700
Message-ID	<07f12c79-fe40-41f3-8573-d5142c8d05fb@googlegroups.com>
In reply to	#88711

Op donderdag 9 april 2015 14:04:25 UTC+2 schreef Dave Angel:
> I still don't see where you have anywhere declared what your goal is. 
Sorry that you didn't like the improvements to the stop bit encoding that I illustrated in http://optarbvalintenc.blogspot.nl/2015/04/the-new-proposal.html and the 8 bit prototype presented in http://optarbvalintenc.blogspot.nl/2015/04/the-8-bit-prototype.html

I agree that we should stop this thread and I invite everybody to continue the discussion on the blog itself.

[toc] | [prev] | [standalone]

csiph-web

Re: python implementation of a new integer encoding algorithm.

Contents

#88702 — Re: python implementation of a new integer encoding algorithm.

#88711

#88877