Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #85741
| Date | 2015-02-17 09:50 -0500 |
|---|---|
| From | Dave Angel <davea@davea.name> |
| Subject | Re: python implementation of a new integer encoding algorithm. |
| References | <e45a71b2-7ec0-4c7d-88ae-c48aebe154b7@googlegroups.com> <54E34C68.6040700@davea.name> <CAPTjJmpF2OkaK3LKS1Uf=Rsc-XEczb=pkuUoyWLvLfQpvJHKSA@mail.gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.18786.1424184618.18130.python-list@python.org> (permalink) |
On 02/17/2015 09:34 AM, Chris Angelico wrote: > On Wed, Feb 18, 2015 at 1:12 AM, Dave Angel <davea@davea.name> wrote: >> They had a field type called a "compressed integer." It could vary between >> one byte and I think about six. And the theory was that it took less space >> than the equivalent format of fixed size integers. > > Oh, incidentally: If you want a decent binary format for > variable-sized integer, check out the MIDI spec. Its varlen integer is > simple: each byte carries seven bits of payload, and the high bit is > set on all except the last byte. Anything under 128 is represented by > the byte with that value; after that, you get a series of continuation > bytes followed by a final byte. It could scale to infinity if you > wanted to, though I think the MIDI spec doesn't have anything that can > go quite that high, and it's a worst-case wastage of 12.5% (or, > looking the other way, worst-case expansion of 14.2%), compared to an > optimal eight-bit encoding. Given that MIDI often works with very > small numbers, but occasionally could have to cope with much larger > ones (eg "time to next event", which is often zero or very low, but > once in a while could be anything at all), this is a good trade-off. > If you know you're frequently going to work with numbers up to 2**31 > but only occasionally larger than that, it might be better to take > four bytes at a time, and use the high bit as a continuation bit, in > the same way. It's all trade-offs. I have used that 7 bits/byte encoding myself for variable length integers. It worked well enough for its purpose. > > Decimal digits give you roughly 10/3 bits per character (== per byte, > if you encode UTF-8). If you consider that your baseline, you can then > try to justify your algorithm on the basis of "it's twice as efficient > as decimal". Hexadecimal gives you 4 bits per character, at a > relatively small cost; if you can't get better than about 6 bits per > byte average throughput, I would say your algorithm is completely > valueless - the difference will so rarely even matter, and you lose > all the benefits I mentioned in my last email. > > I've tried to read through the original algorithm description, but I'm > not entirely sure: How many payload bits per transmitted byte does it > actually achieve? Somehow when I first read the message, I missed the fact that there was a link to a web page with source code. But the first thing I'd expect to see would be a target estimate of the anticipated distribution of number values/magnitudes. For example, if a typical integer is 1500 bits, plus/minus 200 bits, I'd probably try encoding in base 65535, and have a terminator character of 0xffff. Without such a target, it's not a compression scheme, but an encoding one. An interesting point of history. At the time of Huffman's paper, it was provable that it was the best possible lossless compression. But there were some unwritten assumptions, such as that each character would be encoded with a whole number of bits. Changing that assumption makes arithmetic coding possible. Next assumption was that probabilities of each character didn't depend on context. Changing that assumption enables LZ. And so on. -- DaveA
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-17 03:22 -0800
Re: python implementation of a new integer encoding algorithm. Chris Angelico <rosuav@gmail.com> - 2015-02-18 00:16 +1100
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-18 00:55 -0800
Re: python implementation of a new integer encoding algorithm. Chris Angelico <rosuav@gmail.com> - 2015-02-18 20:36 +1100
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-18 11:29 -0800
Re: python implementation of a new integer encoding algorithm. Laura Creighton <lac@openend.se> - 2015-02-18 11:32 +0100
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-18 11:48 -0800
People hated it for the same reasons I found them cool (was: python implementation of a new integer encoding algorithm.) Ben Finney <ben+python@benfinney.id.au> - 2015-02-18 21:57 +1100
Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-02-17 09:12 -0500
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-18 00:59 -0800
Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-02-18 11:46 -0500
Re: python implementation of a new integer encoding algorithm. Grant Edwards <invalid@invalid.invalid> - 2015-02-18 17:30 +0000
Re: python implementation of a new integer encoding algorithm. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-18 18:12 +0000
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-18 11:55 -0800
Re: python implementation of a new integer encoding algorithm. Marko Rauhamaa <marko@pacujo.net> - 2015-02-18 23:54 +0200
Re: python implementation of a new integer encoding algorithm. Marko Rauhamaa <marko@pacujo.net> - 2015-02-19 00:08 +0200
Re: python implementation of a new integer encoding algorithm. Grant Edwards <invalid@invalid.invalid> - 2015-02-18 22:58 +0000
Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-02-18 17:19 -0500
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-19 07:45 -0800
Re: python implementation of a new integer encoding algorithm. Ian Kelly <ian.g.kelly@gmail.com> - 2015-02-19 11:04 -0700
Re: python implementation of a new integer encoding algorithm. Ian Kelly <ian.g.kelly@gmail.com> - 2015-02-19 11:16 -0700
Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-02-19 13:24 -0500
Re: python implementation of a new integer encoding algorithm. Chris Angelico <rosuav@gmail.com> - 2015-02-20 05:34 +1100
Re: python implementation of a new integer encoding algorithm. Ian Kelly <ian.g.kelly@gmail.com> - 2015-02-19 11:32 -0700
Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-02-19 13:41 -0500
Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-02-19 13:46 -0500
Re: python implementation of a new integer encoding algorithm. Chris Angelico <rosuav@gmail.com> - 2015-02-20 05:49 +1100
Re: python implementation of a new integer encoding algorithm. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-18 17:00 +0000
Re: python implementation of a new integer encoding algorithm. Chris Angelico <rosuav@gmail.com> - 2015-02-18 01:34 +1100
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-18 01:04 -0800
Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-02-18 08:54 -0500
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-18 11:52 -0800
Re: python implementation of a new integer encoding algorithm. Chris Angelico <rosuav@gmail.com> - 2015-02-19 01:16 +1100
Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-02-17 09:50 -0500
Re: python implementation of a new integer encoding algorithm. Chris Angelico <rosuav@gmail.com> - 2015-02-18 01:58 +1100
Re: python implementation of a new integer encoding algorithm. Dave Angel <davea@davea.name> - 2015-02-17 10:18 -0500
Re: python implementation of a new integer encoding algorithm. Chris Angelico <rosuav@gmail.com> - 2015-02-18 02:25 +1100
Re: python implementation of a new integer encoding algorithm. Paul Rubin <no.email@nospam.invalid> - 2015-02-17 08:43 -0800
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-18 01:06 -0800
Re: python implementation of a new integer encoding algorithm. Mario Figueiredo <marfig@gmail.com> - 2015-02-19 08:44 +0100
Re: python implementation of a new integer encoding algorithm. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-19 08:06 +0000
Re: python implementation of a new integer encoding algorithm. Marko Rauhamaa <marko@pacujo.net> - 2015-02-19 10:36 +0200
Re: python implementation of a new integer encoding algorithm. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-19 09:33 +0000
Re: python implementation of a new integer encoding algorithm. Terry Reedy <tjreedy@udel.edu> - 2015-02-19 14:50 -0500
Re: python implementation of a new integer encoding algorithm. Marko Rauhamaa <marko@pacujo.net> - 2015-02-19 21:55 +0200
Re: python implementation of a new integer encoding algorithm. Chris Angelico <rosuav@gmail.com> - 2015-02-19 19:36 +1100
Re: python implementation of a new integer encoding algorithm. Mario Figueiredo <marfig@gmail.com> - 2015-02-19 10:42 +0100
Re: python implementation of a new integer encoding algorithm. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-19 10:28 +0000
Re: python implementation of a new integer encoding algorithm. Mario Figueiredo <marfig@gmail.com> - 2015-02-19 14:27 +0100
Re: python implementation of a new integer encoding algorithm. Jonas Wielicki <jonas@wielicki.name> - 2015-02-19 09:38 +0100
Re: python implementation of a new integer encoding algorithm. janhein.vanderburg@gmail.com - 2015-02-19 07:58 -0800
Re: python implementation of a new integer encoding algorithm. Denis McMahon <denismfmcmahon@gmail.com> - 2015-02-20 02:46 +0000
Re: python implementation of a new integer encoding algorithm. wxjmfauth@gmail.com - 2015-02-20 00:58 -0800
csiph-web