Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #2585 > unrolled thread
| Started by | "Rick C. Hodgin" <rick.c.hodgin@gmail.com> |
|---|---|
| First post | 2020-09-05 12:05 -0400 |
| Last post | 2020-09-10 17:13 -0400 |
| Articles | 12 — 9 participants |
Back to article view | Back to comp.compilers
Bit swizzling "Rick C. Hodgin" <rick.c.hodgin@gmail.com> - 2020-09-05 12:05 -0400
Re: Bit swizzling Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-09-05 19:38 +0200
Re: Bit Swizzling "John Levine" <johnl@taugh.com> - 2020-09-05 18:50 +0000
Re: Bit Swizzling Chris <xxx.syseng.yyy@gfsys.co.uk> - 2020-09-06 17:48 +0100
Re: Bit swizzling Kaz Kylheku <793-849-0957@kylheku.com> - 2020-09-05 20:15 +0000
Re: Bit swizzling "davidl...@gmail.com" <davidlovemore@gmail.com> - 2020-09-06 00:31 -0700
Re: Bit swizzling Martin Ward <martin@gkc.org.uk> - 2020-09-07 12:08 +0100
Re: Bit swizzling Tom Crick <tomcrick@gmail.com> - 2020-09-08 09:35 +0100
Re: Bit swizzling "Rick C. Hodgin" <rick.c.hodgin@gmail.com> - 2020-09-07 10:55 -0400
Re: Bit swizzling Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-09-08 01:45 +0200
Re: Bit swizzling gah4 <gah4@u.washington.edu> - 2020-09-10 10:34 -0700
Re: Bit swizzling "Rick C. Hodgin" <rick.c.hodgin@gmail.com> - 2020-09-10 17:13 -0400
| From | "Rick C. Hodgin" <rick.c.hodgin@gmail.com> |
|---|---|
| Date | 2020-09-05 12:05 -0400 |
| Subject | Bit swizzling |
| Message-ID | <20-09-014@comp.compilers> |
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?
For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:
Input: 07 06 05 04 03 02 01 00
Output: 05 04 07 02 01 03 00 06
I can easily swizzle the data using a brute force method:
v = get_value();
o = 0;
swizzle1(o, v, 0, 6);
swizzle1(o, v, 1, 0);
swizzle1(o, v, 2, 3);
swizzle1(o, v, 3, 1);
swizzle1(o, v, 4, 2);
swizzle1(o, v, 5, 7);
swizzle1(o, v, 6, 4);
swizzle1(o, v, 7, 5);
// Untested, off the top of my head
void swizzle(unsigned char& o, unsigned char v, int s, int d)
{
o |= (((v & (1 << s)) >> s) << d);
}
And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.
However, there also exists a minimal set of steps to complete that
swizzling because in the operation above, bits 5 and 4 are used in
sequence. It could be done using a swizzle2() operation, for example
and handle 2 bits at a time.
In addition, given the bit operator abilities that exist on various
CPUs there are potentially other combinations that exist behind an
operation, such as bitswap, where the order of bits flips or mirrors
across the center position.
Are there any existing algorithms which examine the operations that
must be conducted and then create an optimized / minimal sequence of
mechanical steps to conduct it given a constrained set of features
(such as those present on a given CPU)?
Thank you in advance.
--
Rick C. Hodgin
[toc] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-09-05 19:38 +0200 |
| Message-ID | <20-09-015@comp.compilers> |
| In reply to | #2585 |
Am 05.09.2020 um 18:05 schrieb Rick C. Hodgin: > Are there any algorithms which take a known-at-compile-time sequence > of bitwise operations on an 8-bit to 64-bit quantity, and optimize > them down to their minimal set of operations? > > For example, if I have an 8-bit byte and I want to swizzle the bits > thusly: > > Input: 07 06 05 04 03 02 01 00 > Output: 05 04 07 02 01 03 00 06 In your 8-bit case an array of outputs is sufficient, indexed by the input. > In addition, given the bit operator abilities that exist on various > CPUs there are potentially other combinations that exist behind an > operation, such as bitswap, where the order of bits flips or mirrors > across the center position. Somebody must build the tables for those operations, distinct for each CPU, then write code to make use of these tables. I don't think that it has been done yet, except perhaps for the basic operations (AND, OR...) > Are there any existing algorithms which examine the operations that > must be conducted and then create an optimized / minimal sequence of > mechanical steps to conduct it given a constrained set of features > (such as those present on a given CPU)? I often used Quine-McCluskey to minimize my logic circuits. In most applications a toggled bit in the outputs invalidates all prior computation optimization attempts, and everything has to be analyzed and constructed again. In general a hash code of the inputs could be used to index an array of outputs. That code is insensitive to changes of output bits, only the affected array element has to be uptdated then. DoDi
[toc] | [prev] | [next] | [standalone]
| From | "John Levine" <johnl@taugh.com> |
|---|---|
| Date | 2020-09-05 18:50 +0000 |
| Message-ID | <20-09-016@comp.compilers> |
| In reply to | #2585 |
In article <rivvah$1neg$2@gioia.aioe.org>, >> ----- >> Are there any algorithms which take a known-at-compile-time sequence >> of bitwise operations on an 8-bit to 64-bit quantity, and optimize >> them down to their minimal set of operations? >Why not just use a lookup table ?. Minimum ops and fast... Assuming you're looking for something you can implement in logic rather than by table lookup, it sounds like a set of Karnaugh maps. -- Regards, John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies", Please consider the environment before reading this e-mail. https://jl.ly
[toc] | [prev] | [next] | [standalone]
| From | Chris <xxx.syseng.yyy@gfsys.co.uk> |
|---|---|
| Date | 2020-09-06 17:48 +0100 |
| Message-ID | <20-09-019@comp.compilers> |
| In reply to | #2587 |
On 09/05/20 19:50, John Levine wrote: > In article<rivvah$1neg$2@gioia.aioe.org>, >>> ----- >>> Are there any algorithms which take a known-at-compile-time sequence >>> of bitwise operations on an 8-bit to 64-bit quantity, and optimize >>> them down to their minimal set of operations? > >> Why not just use a lookup table ?. Minimum ops and fast... > > Assuming you're looking for something you can implement in logic > rather than by table lookup, it sounds like a set of Karnaugh maps. Unless this is just an intellectual exercise for the fun of it, an engineer would choose the minimal design at lowest cost to satisfy the requirements. Table methods don't have to be in software as a single eprom or gate array could do it in hardware. 8 inputs to address lines, then 8 bits of output, scale as required, so why make life more difficult than necessary ?. Old project here, where a programmer spent nearly 5 pages of 6800 asm to translate an input connect pin layout to that required for the internal functions. Code was impenetrable, so substituted a 256 byte lookup table. Less space that the code and easily modified for new requirements... Chris [I think the question is whether there is a way to mechanically generate a version of the opaque assembler. -John]
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <793-849-0957@kylheku.com> |
|---|---|
| Date | 2020-09-05 20:15 +0000 |
| Message-ID | <20-09-017@comp.compilers> |
| In reply to | #2585 |
On 2020-09-05, Rick C. Hodgin <rick.c.hodgin@gmail.com> wrote: > Are there any existing algorithms which examine the operations that > must be conducted and then create an optimized / minimal sequence of > mechanical steps to conduct it given a constrained set of features > (such as those present on a given CPU)? For mapping 8 bit quantities to other 8 bit quantities, you can always use a 256 byte look up table. Of course, it's not practical for larger spaces. Still, it may be possibel to identify the possibility of involving multiple smaller look-up tables.
[toc] | [prev] | [next] | [standalone]
| From | "davidl...@gmail.com" <davidlovemore@gmail.com> |
|---|---|
| Date | 2020-09-06 00:31 -0700 |
| Message-ID | <20-09-018@comp.compilers> |
| In reply to | #2585 |
On Saturday, September 5, 2020 at 5:45:43 PM UTC+1, Rick C. Hodgin wrote:
> Are there any algorithms which take a known-at-compile-time sequence
> of bitwise operations on an 8-bit to 64-bit quantity, and optimize
> them down to their minimal set of operations?
There's some good resources on bit permutations including this online calculator here:
http://programming.sirrida.de/calcperm.php
For:
> Input: 07 06 05 04 03 02 01 00
> Output: 05 04 07 02 01 03 00 06
It gives:
x = bit_permute_step_simple(x, 0x55555555, 1); // Bit index complement 0
x = bit_permute_step_simple(x, 0x33333333, 2); // Bit index complement 1
x = bit_permute_step_simple(x, 0x0f0f0f0f, 4); // Bit index complement 2
where
t_bits bit_permute_step_simple(t_bits x, t_bits m, t_uint shift) {
return ((x & m) << shift) | ((x >> shift) & m);
}
Approx 12 cycles on superscalar processor.
Source code for a more sophisticated permutation code generator is also available linked from the page.
Some notes here on the online generator:
http://programming.sirrida.de/bit_perm.html#calculator
[toc] | [prev] | [next] | [standalone]
| From | Martin Ward <martin@gkc.org.uk> |
|---|---|
| Date | 2020-09-07 12:08 +0100 |
| Message-ID | <20-09-020@comp.compilers> |
| In reply to | #2585 |
On 05/09/2020 17:05, Rick C. Hodgin wrote: > Are there any existing algorithms which examine the operations that > must be conducted and then create an optimized / minimal sequence of > mechanical steps to conduct it given a constrained set of features > (such as those present on a given CPU)? The process you are describing is called "Superoptimization": finding the optimal code sequence for one loop-free sequence of instructions. https://en.wikipedia.org/wiki/Superoptimization The term superoptimization was first coined by Alexia Massalin in the 1987 paper "Superoptimizer: A Look at the Smallest Program": http://www.stanford.edu/class/cs343/resources/superoptimizer.pdf Back in 1980 there was a discussion in the magazine "Pratical Electronics" regarding the shortest sequence of instructions needed to reverse the bits in the accumulator for a 6800 system (an operation that is needed in some Fast Fourier Transform algorithms). One solution (given in June 1980, page 70) consisted of just two instructions and used no additional data. The instructions were: STAA PORTA write to output port LDAB PORTB read from input port with the output port being wired directly to the input port with the bits reversed! -- Martin Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4
[toc] | [prev] | [next] | [standalone]
| From | Tom Crick <tomcrick@gmail.com> |
|---|---|
| Date | 2020-09-08 09:35 +0100 |
| Message-ID | <20-09-023@comp.compilers> |
| In reply to | #2591 |
On Mon, 7 Sep 2020 at 18:57, Martin Ward <martin@gkc.org.uk> wrote: > > On 05/09/2020 17:05, Rick C. Hodgin wrote: > > Are there any existing algorithms which examine the operations that > > must be conducted and then create an optimized / minimal sequence of > > mechanical steps to conduct it given a constrained set of features > > (such as those present on a given CPU)? > > The process you are describing is called "Superoptimization": > finding the optimal code sequence for one loop-free sequence > of instructions. ... Back in the distant past (2009), I did my PhD on superoptimisation — provably optimal code generation using answer set programming: https://proftomcrick.com/2012/02/18/three-papers-on-superoptimisation/ Still an area with lots of potential (IMHO)... Best wishes, Tom
[toc] | [prev] | [next] | [standalone]
| From | "Rick C. Hodgin" <rick.c.hodgin@gmail.com> |
|---|---|
| Date | 2020-09-07 10:55 -0400 |
| Message-ID | <20-09-021@comp.compilers> |
| In reply to | #2585 |
Thank you, Martin. I saw a Google developer talk on C/C++ where they discussed generating optimum code for a given logic flow (as dictated by source code). I remember thinking, what if your base logic breakout isn't optimum? What if the way the developer was designing the algorithm wasn't optimum? What if the entire premise of what you're trying to do needed to be first broken down into fundamentals, and then re-assembled in a truly optimum form? There are cases where, if you know specific input ranges, you could sub- stitute the algorithm with something completely unlike the actual function, yet yield the same results. One such idea is writing to a port, and reading back from another port, where the external thing does the processing for you as a co-processor or custom ASIC. The actual 6800 algorithm as dictated by the developer may have been in C code or other lagugae, and would've used Nn asm opcodes to conduct the same workload with even an optimizing compiler. But an intel- ligent compiler, one that could break down the operations to fundamentals, understand what's trying to be accomplished, and then re-engineer the logic to accomodate the abilities which exist (such as those extra read/ write ports), is what's really quired. In some algorithm cases, the nature of what you're trying to do may be handled fully by a completely unexpected series of instructions. I've long desired for my CAlive language to support that type of opti- mization. Walter Banks and I used to discuss the possibility, and had he lived we would've continued down that line. It goes along with another fundamental idea I have for processing that I would like to explore someday ... if there's time. I appreciate your response. -- Rick C. Hodgin On 9/7/20 7:08 AM, Martin Ward wrote: > On 05/09/2020 17:05, Rick C. Hodgin wrote: >> Are there any existing algorithms which examine the operations that >> must be conducted and then create an optimized / minimal sequence of >> mechanical steps to conduct it given a constrained set of features >> (such as those present on a given CPU)? > > The process you are describing is called "Superoptimization": > finding the optimal code sequence for one loop-free sequence > of instructions. > > https://en.wikipedia.org/wiki/Superoptimization > > The term superoptimization was first coined by Alexia Massalin in the 1987 paper "Superoptimizer: A Look at the Smallest Program": > > http://www.stanford.edu/class/cs343/resources/superoptimizer.pdf > > Back in 1980 there was a discussion in the magazine > "Pratical Electronics" regarding the shortest sequence of > instructions needed to reverse the bits in the accumulator > for a 6800 system (an operation that is needed in some Fast Fourier > Transform algorithms). > > One solution (given in June 1980, page 70) consisted of just > two instructions and used no additional data. > The instructions were: > > STAA PORTA write to output port > LDAB PORTB read from input port > > with the output port being wired directly to the input port with > the bits reversed!
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-09-08 01:45 +0200 |
| Message-ID | <20-09-022@comp.compilers> |
| In reply to | #2592 |
Am 07.09.2020 um 16:55 schrieb Rick C. Hodgin: > One such idea is writing to a port, and reading back from another port, > where the external thing does the processing for you as a co-processor or > custom ASIC. In a more general approach you connect a ROM between the output and input port, for any possible mapping of n-into-n bits. Nowadays a table in flash memory is better applicable. DoDi
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2020-09-10 10:34 -0700 |
| Message-ID | <20-09-026@comp.compilers> |
| In reply to | #2585 |
On Saturday, September 5, 2020 at 9:45:43 AM UTC-7, Rick C. Hodgin wrote: > Are there any algorithms which take a known-at-compile-time sequence > of bitwise operations on an 8-bit to 64-bit quantity, and optimize > them down to their minimal set of operations? > > For example, if I have an 8-bit byte and I want to swizzle the bits > thusly: > > Input: 07 06 05 04 03 02 01 00 > Output: 05 04 07 02 01 03 00 06 There is a lot of work, and many algorithms, for logic optimization, or minimization, that, for example, will find the optimal combination of NAND and NOR gates to evaluate some logical operation. This is used to compile Verilog or VHDL into either FPGA bit files or ASIC masks. On the other hand, logic minimization tends to believe that you can wire anything to anything else. That is, bit swizzle is just wiring. The question you ask doesn't come up. (Later on, routing has to get all the wires to the appropriate place, but that is true in general, not just for this case.) Note that it isn't just adjacent bits, but bits with the same spacing in the input and output, such that you can mask with AND, shift, and combine with OR. I suspect that with the operations usually available: AND, OR, XOR, and shift, it wouldn't be hard to find an algorithm for the optimal case. If you add more operations, maybe allow also add and multiply, it gets interesting.
[toc] | [prev] | [next] | [standalone]
| From | "Rick C. Hodgin" <rick.c.hodgin@gmail.com> |
|---|---|
| Date | 2020-09-10 17:13 -0400 |
| Message-ID | <20-09-027@comp.compilers> |
| In reply to | #2597 |
On 9/10/20 1:34 PM, gah4 wrote: > On Saturday, September 5, 2020 at 9:45:43 AM UTC-7, Rick C. Hodgin wrote: >> Are there any algorithms which take a known-at-compile-time sequence >> of bitwise operations on an 8-bit to 64-bit quantity, and optimize >> them down to their minimal set of operations? ... > There is a lot of work, and many algorithms, for logic optimization, > or minimization, that, for example, will find the optimal combination > of NAND and NOR gates to evaluate some logical operation. ... I have a preliminary algorithm that works, but it's still not fully optimized and is a little clunky. I want a way to abstract the logic and work with it there. I haven't put in any additional time on this algorithm yet, due to some life things happening. But I plan to come back to it. -- Rick C. Hodgin
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web