Groups > comp.compilers > #301 > unrolled thread

How to eliminate redundant constant move instructions

Started by	"Amker.Cheng" <amker.cheng@gmail.com>
First post	2011-10-31 17:53 +0800
Last post	2011-11-01 20:58 -0700
Articles	19 — 8 participants

Back to article view | Back to comp.compilers

  How to eliminate redundant constant move instructions "Amker.Cheng" <amker.cheng@gmail.com> - 2011-10-31 17:53 +0800
    Re: How to eliminate redundant constant move instructions Kaz Kylheku <kaz@kylheku.com> - 2011-10-31 17:08 +0000
      Re: How to eliminate redundant constant move instructions amker <can.finner@gmail.com> - 2011-11-01 19:01 -0700
      Re: How to eliminate redundant constant move instructions mac <acolvin@efunct.com> - 2011-11-03 02:20 +0000
    Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-01 14:32 -0400
      Re: How to eliminate redundant constant move instructions glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-11-01 22:35 +0000
        Re: How to eliminate redundant constant move instructions amker <can.finner@gmail.com> - 2011-11-01 19:35 -0700
        Re: How to eliminate redundant constant move instructions amker <amker.cheng@gmail.com> - 2011-11-01 21:04 -0700
        Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-02 12:38 -0400
          Re: How to eliminate redundant constant move instructions Kaz Kylheku <kaz@kylheku.com> - 2011-11-03 03:20 +0000
            Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-04 13:27 -0400
              Re: How to eliminate redundant constant move instructions glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-11-04 21:19 +0000
          Re: How to eliminate redundant constant move instructions glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-11-03 03:32 +0000
      Re: How to eliminate redundant constant move instructions amker <can.finner@gmail.com> - 2011-11-01 19:21 -0700
        Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-04 17:26 -0400
          Re: How to eliminate redundant constant move instructions amker <amker.cheng@gmail.com> - 2011-11-07 17:33 -0800
            Re: How to eliminate redundant constant move instructions Wei-Jen Chen <chenwj@cs.NCTU.edu.tw> - 2011-11-10 08:04 +0000
            Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-10 18:18 -0500
      Re: How to eliminate redundant constant move instructions amker <amker.cheng@gmail.com> - 2011-11-01 20:58 -0700

#301 — How to eliminate redundant constant move instructions

From	"Amker.Cheng" <amker.cheng@gmail.com>
Date	2011-10-31 17:53 +0800
Subject	How to eliminate redundant constant move instructions
Message-ID	<11-10-019@comp.compilers>

Hi,
I found following intermediate codes are generated in gcc

rx <- 0
...
use rx
...
ry <- 0
use ry
...

It's normally a result of const propagation.
After register allocation, It is likely rx/ry get allocated into
different hard registers.
Thus in final codes, there would be a redundant "move 0" instruction.

The story even stands for Os compiling, so the question is:
Is there any optimization technique dedicates to this kind of case?
Or is it normally handled by other optimizations as sub task?

Thanks very much.
--
Best Regards.

[toc] | [next] | [standalone]

#304

From	Kaz Kylheku <kaz@kylheku.com>
Date	2011-10-31 17:08 +0000
Message-ID	<11-11-002@comp.compilers>
In reply to	#301

On 2011-10-31, Amker.Cheng <amker.cheng@gmail.com> wrote:
> I found following intermediate codes are generated in gcc
>
> rx <- 0
> ...
> use rx
> ...
> ry <- 0
> use ry
> ...
>
> It's normally a result of const propagation.
> After register allocation, It is likely rx/ry get allocated into
> different hard registers.
> Thus in final codes, there would be a redundant "move 0" instruction.

Surely you mean, if they get allocated into the SAME register, there
will be a redundant initialization?

> The story even stands for Os compiling, so the question is:
> Is there any optimization technique dedicates to this kind of case?

It's a variant of ``common subexpression elimination''.  This should
be done early on. You have two different instances of 0 in the code,
which should be recognized as a common subexpression, sharing the same
intermediate code, and virtual registers.

[toc] | [prev] | [next] | [standalone]

#309

From	amker <can.finner@gmail.com>
Date	2011-11-01 19:01 -0700
Message-ID	<11-11-007@comp.compilers>
In reply to	#304

On Nov 1, 1:08 am, Kaz Kylheku <k...@kylheku.com> wrote:
> On 2011-10-31, Amker.Cheng <amker.ch...@gmail.com> wrote:
>
> > I found following intermediate codes are generated in gcc
>
> > rx <- 0
> > ...
> > use rx
> > ...
> > ry <- 0
> > use ry
> > ...
>
> > It's normally a result of const propagation.
> > After register allocation, It is likely rx/ry get allocated into
> > different hard registers.
> > Thus in final codes, there would be a redundant "move 0" instruction.
>
> Surely you mean, if they get allocated into the SAME register, there
> will be a redundant initialization?

Something like that. In register allocation, if there is no register
pressure issue, the program should be converted into

 rx <- 0
 ...
 use rx
 ...
 rx <- 0   <----which is redundant and can be removed
 use rx


> > The story even stands for Os compiling, so the question is:
> > Is there any optimization technique dedicates to this kind of case?
>
> It's a variant of ``common subexpression elimination''.  This should
> be done early on. You have two different instances of 0 in the code,
> which should be recognized as a common subexpression, sharing the same
> intermediate code, and virtual registers.

Yes, I just found gcc's cse pass can handle such cases in extended
basic block, but not globally.  Another question is, if we do it
before register allocation, the live range of rx would be extended and
might cause spill in register allocation.

Thanks

[toc] | [prev] | [next] | [standalone]

#319

From	mac <acolvin@efunct.com>
Date	2011-11-03 02:20 +0000
Message-ID	<11-11-017@comp.compilers>
In reply to	#304

>> I found following intermediate codes are generated in gcc
>>
>> rx <- 0
>> ...
>> use rx
>> ...
>> ry <- 0
>> use ry
>> ...

This may be "rematerialization", which probably has other names.

Sometimes it's better to re-compute a common value than to tie up a
register. This may well be true for a constant 0 if there are few registers
(x86-32).

[toc] | [prev] | [next] | [standalone]

#306

From	George Neuner <gneuner2@comcast.net>
Date	2011-11-01 14:32 -0400
Message-ID	<11-11-004@comp.compilers>
In reply to	#301

On Mon, 31 Oct 2011 17:53:46 +0800, "Amker.Cheng"
<amker.cheng@gmail.com> wrote:

>I found following intermediate codes are generated in gcc
>
>rx <- 0
>...
>use rx
>...
>ry <- 0
>use ry
>...
>
>It's normally a result of const propagation.
>After register allocation, It is likely rx/ry get allocated into
>different hard registers.
>Thus in final codes, there would be a redundant "move 0" instruction.
>
>The story even stands for Os compiling, so the question is:
>Is there any optimization technique dedicates to this kind of case?
>Or is it normally handled by other optimizations as sub task?

It's very hard to tell anything without more context - we need to know
what CPU, what compiler, and we need to see the surrounding code.

Because you mention "Os" I'm assuming you are using GCC. Incidentally,
that really should be written as "-Os" so people understand you mean
an option rather than thinking you're compiling an operation system
8-)

GCC doesn't really perform a separate constant propagation
optimization ... instead it generically tracks use of values to (try
to) minimize redundant loads.  There is a separate optimization,
-fcprop-register, which is a peephole pass that eliminates redundant
register moves (introduced by other optimizations), but that is
performed after register allocation.

That said:

You might be asking "if the value already is in a register, why not
just use it rather than load a second register?"  The answer to that
likely is a scheduling issue which depends on the use of the first
register.  You have to remember that many CPUs can execute multiple
instructions in parallel, and those parallel instruction streams may
be executed out of order with respect to a program listing.

On most CPUs loading an immediate constant is as cheap as a register
move.  Also, loading a constant ties up only the target register
whereas a move ties up both target and source registers.

So a lot more information is needed to say whether the compiler is
doing something dumb or doing something clever.

George

[toc] | [prev] | [next] | [standalone]

#307

From	glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date	2011-11-01 22:35 +0000
Message-ID	<11-11-005@comp.compilers>
In reply to	#306

George Neuner <gneuner2@comcast.net> wrote:

(big snip on optimization and register loads)

> You might be asking "if the value already is in a register, why not
> just use it rather than load a second register?"  The answer to that
> likely is a scheduling issue which depends on the use of the first
> register.  You have to remember that many CPUs can execute multiple
> instructions in parallel, and those parallel instruction streams may
> be executed out of order with respect to a program listing.

That is what register renaming is for.  Usually using more than
the architecturally specified number of registers, the CPU
internally remaps the registers such that it can keep one value
in a register while an instruction is being executed out of order.

This is especially important for IA32, with so few registers,
and for S/360 and S/370 floating point, again with few registers.
(Sometime in ESA/390 it was increased to 16, as the instruction
bits were there.)

> On most CPUs loading an immediate constant is as cheap as a register
> move.  Also, loading a constant ties up only the target register
> whereas a move ties up both target and source registers.

Dynamic programming with the appropriate weights should choose
the optimal instruction sequence.  If register clear is faster
than load immediate it would be chosen.  Now, how to choose the
weights when one doesn't know the specific target processor?
That is a good question.

> So a lot more information is needed to say whether the compiler is
> doing something dumb or doing something clever.

-- glen

[toc] | [prev] | [next] | [standalone]

#310

From	amker <can.finner@gmail.com>
Date	2011-11-01 19:35 -0700
Message-ID	<11-11-008@comp.compilers>
In reply to	#307

On Nov 2, 6:35 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> George Neuner <gneun...@comcast.net> wrote:
>
> (big snip on optimization and register loads)
>

> That is what register renaming is for.  Usually using more than
> the architecturally specified number of registers, the CPU
> internally remaps the registers such that it can keep one value
> in a register while an instruction is being executed out of order.
>
> This is especially important for IA32, with so few registers,
> and for S/360 and S/370 floating point, again with few registers.

Well, for my original case, I mean optimize the codes

 rx <- 0
 ...
 use rx
 ...
 ry <- 0   <----which is redundant and can be removed
 use ry

into

 rx <- 0
 ...
 use rx
 ...
 use rx  <-----------------

In this case, the 2nd 'use rx' instruction introduces true dependency
on "rx <- 0",
So register renaming won't help here, right?

Thanks

[toc] | [prev] | [next] | [standalone]

#313

From	amker <amker.cheng@gmail.com>
Date	2011-11-01 21:04 -0700
Message-ID	<11-11-011@comp.compilers>
In reply to	#307

On Nov 2, 6:35 am, glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> That is what register renaming is for.  Usually using more than
> the architecturally specified number of registers, the CPU
> internally remaps the registers such that it can keep one value
> in a register while an instruction is being executed out of order.

For this specific case, I previous intention was optimizing codes

rx <- 0
...
use rx
...
ry <- 0
use ry

into

rx <- 0
...
use rx
...
use rx

In this manner, I guess the register renaming won't help,
since the transformation introduces true dependency, right?

> Now, how to choose the > weights when one doesn't know the specific
target processor?  > That is a good question.

How could this happen if middle end does not know the specific
processor? What I can image is there should be a way in which
back end provides information.

Thanks

[toc] | [prev] | [next] | [standalone]

#316

From	George Neuner <gneuner2@comcast.net>
Date	2011-11-02 12:38 -0400
Message-ID	<11-11-014@comp.compilers>
In reply to	#307

On Tue, 1 Nov 2011 22:35:04 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>George Neuner <gneuner2@comcast.net> wrote:
>
>(big snip on optimization and register loads)
>
>> You might be asking "if the value already is in a register, why not
>> just use it rather than load a second register?"  The answer to that
>> likely is a scheduling issue which depends on the use of the first
>> register.  You have to remember that many CPUs can execute multiple
>> instructions in parallel, and those parallel instruction streams may
>> be executed out of order with respect to a program listing.
>
>That is what register renaming is for.  Usually using more than
>the architecturally specified number of registers, the CPU
>internally remaps the registers such that it can keep one value
>in a register while an instruction is being executed out of order.

Yes.  But the compiler can't count on register renaming ... it can see
only the architectural named registers.  If the code in question had
copied Rx-> Ry then renaming would have been possible, but instead the
code performed a constant load to each register.  No possibility of
rename sharing there.

George

[toc] | [prev] | [next] | [standalone]

#320

From	Kaz Kylheku <kaz@kylheku.com>
Date	2011-11-03 03:20 +0000
Message-ID	<11-11-018@comp.compilers>
In reply to	#316

On 2011-11-02, George Neuner <gneuner2@comcast.net> wrote:
> Yes.  But the compiler can't count on register renaming ... it can see
> only the architectural named registers.  If the code in question had
> copied Rx-> Ry then renaming would have been possible, but instead the
> code performed a constant load to each register.  No possibility of
> rename sharing there.

If the register holds a value that is used, but not clobbered, then renaming is
moot. The two blocks of code both depend on rx being zero, and so nothing is
achieved by splitting the instruction set register into two physical registers.

[toc] | [prev] | [next] | [standalone]

#325

From	George Neuner <gneuner2@comcast.net>
Date	2011-11-04 13:27 -0400
Message-ID	<11-11-023@comp.compilers>
In reply to	#320

On Thu, 3 Nov 2011 03:20:32 +0000 (UTC), Kaz Kylheku <kaz@kylheku.com>
wrote:

>On 2011-11-02, George Neuner <gneuner2@comcast.net> wrote:
>> Yes.  But the compiler can't count on register renaming ... it can see
>> only the architectural named registers.  If the code in question had
>> copied Rx-> Ry then renaming would have been possible, but instead the
>> code performed a constant load to each register.  No possibility of
>> rename sharing there.
>
>If the register holds a value that is used, but not clobbered, then renaming is
>moot. The two blocks of code both depend on rx being zero, and so nothing is
>achieved by splitting the instruction set register into two physical registers.

Yes.  But the original question involve 2 registers both being loaded
with the same constant value.  The OP asked why use a 2nd register.  I
gave a couple of reasons why that might be, and then Glen brought up
renaming.

We haven't seen the actual code, but from the description it appears
that the compiler could have reused the original register but instead
chose to use a different one.  We also don't know what CPU, but
regardless, renaming only might have been done if the compiler had
used a copy instruction.  With the use of constant loads there was no
possibility of renaming being performed.

George

[toc] | [prev] | [next] | [standalone]

#326

From	glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date	2011-11-04 21:19 +0000
Message-ID	<11-11-024@comp.compilers>
In reply to	#325

George Neuner <gneuner2@comcast.net> wrote:

(snip)
> Yes.  But the original question involve 2 registers both being loaded
> with the same constant value.  The OP asked why use a 2nd register.  I
> gave a couple of reasons why that might be, and then Glen brought up
> renaming.

I brought up register renaming when someone else brought up
out-of-order execution.

> We haven't seen the actual code, but from the description it appears
> that the compiler could have reused the original register but instead
> chose to use a different one.

There are many possible reasons.  For IA32, some registers
have special uses, such that not all can be reused.

> We also don't know what CPU, but regardless, renaming only might
> have been done if the compiler had used a copy instruction.

I do hope that it doesn't load an actual zero from memory.

Also, some zeros should be optimized away.  If, for example,
something was then added to the zero.

> With the use of constant loads there was no possibility of renaming
> being performed.

I don't know about others, but the 360/91 can rename loads
from the same address, or store followed by load.  Mostly that
was needed with the goal of running unmodified S/360 code, and
the small number of floating point registers.

Is there any machine without a one instruction way to zero
a register?

-- glen

[toc] | [prev] | [next] | [standalone]

#321

From	glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date	2011-11-03 03:32 +0000
Message-ID	<11-11-019@comp.compilers>
In reply to	#316

George Neuner <gneuner2@comcast.net> wrote:

(snip, then I wrote)
>>That is what register renaming is for.  Usually using more than
>>the architecturally specified number of registers, the CPU
>>internally remaps the registers such that it can keep one value
>>in a register while an instruction is being executed out of order.

> Yes.  But the compiler can't count on register renaming ... it can see
> only the architectural named registers.

The comment came after a follow-up on out-of-order execution.

Yes, you can't count on out-of-order, or register renaming, but
for out-of-order to work you usually also need register renaming.

What usually happens, though, is that a dynamic programming algorithm
is used to select an optimal instruction sequence given the
appropriate weights (costs) for each instruction sequence.

On the other hand, if there is a tie then dynamic programming will
choose one, which may look less than optimal to a human.

There have been stories back to the first Fortran compiler on people
being surprised to see optimized generated code better than the
compiler writers might have written by hand.

I remember a friend being assigned to write optimal assembly language
code for a Fortran DO loop, and then I ran the loop through the
Fortran H compiler which generated one fewer instruction.  (It was a
very small loop, where it might have been four instead of five
instructions.)

If you are selling programs, you usually don't compile for the exact
processor, but some compromise.  For personal use, you might know the
exact processor, but even so, with modern processors and overlapped
(or out-of-order) it is hard to choose the optimal instruction
sequence.

> If the code in question had > copied Rx-> Ry then renaming would
have been possible, but instead the > code performed a constant load
to each register.  No possibility of > rename sharing there.

I know the IBM 360/91 would recognize loads from the same address, and
use register renaming to avoid the second load, or load following
store.  (That is, if the first load or store was still in the
appropriate buffer.)  But then you shouldn't clear registers by
loading zero from memory.

-- glen

[toc] | [prev] | [next] | [standalone]

#311

From	amker <can.finner@gmail.com>
Date	2011-11-01 19:21 -0700
Message-ID	<11-11-009@comp.compilers>
In reply to	#306

On Nov 2, 2:32 am, George Neuner <gneun...@comcast.net> wrote:
> On Mon, 31 Oct 2011 17:53:46 +0800, "Amker.Cheng" <amker.ch...@gmail.com> wrote:
> >I found following intermediate codes are generated in gcc
>
> >rx <- 0
> >...
> >use rx
> >...
> >ry <- 0
> >use ry
> >...
>
> >It's normally a result of const propagation.
> >After register allocation, It is likely rx/ry get allocated into
> >different hard registers.
> >Thus in final codes, there would be a redundant "move 0" instruction.
>
> >The story even stands for Os compiling, so the question is:
> >Is there any optimization technique dedicates to this kind of case?
> >Or is it normally handled by other optimizations as sub task?
>
> It's very hard to tell anything without more context - we need to know
> what CPU, what compiler, and we need to see the surrounding code.
>
> Because you mention "Os" I'm assuming you are using GCC. Incidentally,
> that really should be written as "-Os" so people understand you mean
> an option rather than thinking you're compiling an operation system

Sorry for the misleading term, the test case showed at:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44025
which is actually a reported gcc bug.

> GCC doesn't really perform a separate constant propagation
> optimization ... instead it generically tracks use of values to (try
> to) minimize redundant loads.  There is a separate optimization,
> -fcprop-register, which is a peephole pass that eliminates redundant
> register moves (introduced by other optimizations), but that is
> performed after register allocation.

Yes, I have noticed this pass. Seems it can solve the problem if I
can:
1, extend the pass in value numbering way, at least for const values.
2, extend the pass in global data analysis way.

> You might be asking "if the value already is in a register, why not
> just use it rather than load a second register?"  The answer to that
> likely is a scheduling issue which depends on the use of the first
> register.  You have to remember that many CPUs can execute multiple
> instructions in parallel, and those parallel instruction streams may
> be executed out of order with respect to a program listing.
>
> On most CPUs loading an immediate constant is as cheap as a register
> move.  Also, loading a constant ties up only the target register
> whereas a move ties up both target and source registers.

Yes, This is the point I missed. So even the codes are optimized into:
rx <- 0
...
use rx
...
use rx

It might be worse than original codes considering out-of-order and
parallel execution.
In this way, how should I know for sure which case is better?

> So a lot more information is needed to say whether the compiler is
> doing something dumb or doing something clever.

Thanks.

[toc] | [prev] | [next] | [standalone]

#327

From	George Neuner <gneuner2@comcast.net>
Date	2011-11-04 17:26 -0400
Message-ID	<11-11-025@comp.compilers>
In reply to	#311

On Tue, 1 Nov 2011 19:21:54 -0700 (PDT), amker <can.finner@gmail.com>
wrote:

>On Nov 2, 2:32 am, George Neuner <gneun...@comcast.net> wrote:
>> On Mon, 31 Oct 2011 17:53:46 +0800, "Amker.Cheng" <amker.ch...@gmail.com> wrote:
>> >I found following intermediate codes are generated in gcc
>>
>> >rx <- 0
>> >...
>> >use rx
>> >...
>> >ry <- 0
>> >use ry
>> >...
>>
>>
>... the test case showed at:
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44025
>which is actually a reported gcc bug.

Ok.  That bug was reported for ARM7 thumb code.  I'm not familiar with
ARM assembler (had to look up the instructions), but I do see that R0
already holds the proper value when R3 is loaded - which I presume is
what you are talking about.

But I don't necessarily agree that it is a bug.  It's definitely not
optimal code, but R0 and R3 represent different temporary variables,
so it's hard to see how any of value numbering, constant propagation
or register allocation is at fault.

>> GCC doesn't really perform a separate constant propagation
>> optimization ... instead it generically tracks use of values to (try
>> to) minimize redundant loads.  There is a separate optimization,
>> -fcprop-register, which is a peephole pass that eliminates redundant
>> register moves (introduced by other optimizations), but that is
>> performed after register allocation.
>
>Yes, I have noticed this pass. Seems it can solve the problem if I
>can:
>1, extend the pass in value numbering way, at least for const values.
>2, extend the pass in global data analysis way.

Yes.  But value numbering doesn't know or care what specific value is
in the register - it cares only whether the value in the register
might have been changed between read uses.

Variable value tracking is the related technique used for constant
propagation.  However what you are after is variable equivalence
testing - I don't think GCC even tries to do this.

>> You might be asking "if the value already is in a register, why not
>> just use it rather than load a second register?"  The answer to that
>> likely is a scheduling issue which depends on the use of the first
>> register.  You have to remember that many CPUs can execute multiple
>> instructions in parallel, and those parallel instruction streams may
>> be executed out of order with respect to a program listing.
>>
>> On most CPUs loading an immediate constant is as cheap as a register
>> move.  Also, loading a constant ties up only the target register
>> whereas a move ties up both target and source registers.
>
>Yes, This is the point I missed. So even if the codes are optimized into:
>
>rx <- 0
>...
>use rx
>...
>use rx
>
>It might be worse than original codes considering out-of-order and
>parallel execution.  In this way, how should I know for sure which case
>is better?

You can't know without trying it.  If the target CPU is known, the
compiler could simulate execution of the code sequence to try out
logically equivalent approaches.  But most CPU manufacturers don't
even publish instruction times any more - never mind providing enough
information to accurately simulate the chip (emulation is not
simulation).

George

[toc] | [prev] | [next] | [standalone]

#332

From	amker <amker.cheng@gmail.com>
Date	2011-11-07 17:33 -0800
Message-ID	<11-11-030@comp.compilers>
In reply to	#327

On Nov 5, 5:26 am, George Neuner <gneun...@comcast.net> wrote:
> On Tue, 1 Nov 2011 19:21:54 -0700 (PDT), amker <can.fin...@gmail.com>
> wrote:

> >Yes, I have noticed this pass. Seems it can solve the problem if I
> >can:
> >1, extend the pass in value numbering way, at least for const values.
> >2, extend the pass in global data analysis way.
>
> Yes.  But value numbering doesn't know or care what specific value is
> in the register - it cares only whether the value in the register
> might have been changed between read uses.
>
> Variable value tracking is the related technique used for constant
> propagation.  However what you are after is variable equivalence
> testing - I don't think GCC even tries to do this.

Yes, I guess it is the variable value tracking. But I think GCC does
do such work in some passes, at least cse.c. In this pass gcc records
the constant value if a quantity has a known constant value.  Then gcc
simplify expressions containing such constant value by calling
fold_rtx.

Unfortunately, cse only works on basis of extended basic block.
Nothing it can do for global cases as reported in mentioned bug.

Also, could you point me some papers of implementation of such
variable value tracking technique? I googled and found nothing
about it.

Thanks

[toc] | [prev] | [next] | [standalone]

#333

From	Wei-Jen Chen <chenwj@cs.NCTU.edu.tw>
Date	2011-11-10 08:04 +0000
Message-ID	<11-11-031@comp.compilers>
In reply to	#332

> Unfortunately, cse only works on basis of extended basic block.
> Nothing it can do for global cases as reported in mentioned bug.

  Can LTO help?

Regards,
chenwj
[That's gcc's link time optimizer -John]

[toc] | [prev] | [next] | [standalone]

#334

From	George Neuner <gneuner2@comcast.net>
Date	2011-11-10 18:18 -0500
Message-ID	<11-11-032@comp.compilers>
In reply to	#332

On Mon, 7 Nov 2011 17:33:29 -0800 (PST), amker <amker.cheng@gmail.com>
wrote:

>On Nov 5, 5:26 am, George Neuner <gneun...@comcast.net> wrote:
>
>> Variable value tracking is the related technique used for constant
>> propagation.  However what you are after is variable equivalence
>> testing - I don't think GCC even tries to do this.
>
>Yes, I guess it is the variable value tracking. But I think GCC does
>do such work in some passes, at least cse.c. In this pass gcc records
>the constant value if a quantity has a known constant value.

Common subexpression elimination would not help in the case in
question ... the temporaries containing the same values all belong to
different expressions.  There is nothing common to find.

>Unfortunately, cse only works on basis of extended basic block.
>Nothing it can do for global cases as reported in mentioned bug.

Yes.  But the code in question all was in a single function, so the
range of CSE was not the problem.

The only simple remedy I see - and it's not so simple - would be to
make a pass tracking register values by following constant load and
copy instructions.  You need to keep track of register update
locations so that in places where multiple registers contain the same
value, you can identify the "primary" copy - i.e. the register holding
the oldest copy of the value.  Then when you see a use of a younger
copy, you rewrite it to use the primary copy instead.

But you have to be careful - a local optimization can't make
assumptions about register contents on entry to a function.  You also
have to be careful with registers whose contents may change across
functions calls.

Ultimately, though, the value of such an optimization may be quite
limited.  If you successfully eliminate a particular use of a register
in some location, then you may also need to eliminate useless
saves/reload of that register across lower function calls.  You may be
able to avoid the useless saves/reloads in the first place by doing
the optimization on the RTL prior to register allocation ... but RTL
has an unlimited numbers of registers, so the bookkeeping is much
easier if you wait until after register allocation.  But then you need
to process the lower level functions as well (which may not be
possible if they are in a different compilation unit).

>Also, could you point me some papers of implementation of such
>variable value tracking technique? I googled and found nothing
>about it.
>
>Thanks

A lot of optimizations have fairly obvious implementations (at least a
brute force version) when you think about them and so it can be hard
to find papers that take you step by step.  Many discussions of
optimization simply assume you can figure out how to do whatever it is
from the description.

I've never seen any papers regarding this particular register level
optimization, nor have I seen it mentioned in books ... I don't recall
where it was that I learned about it.  However, a search on
"redundancy elimination" at http://www.codeopt.com/ turned up a few
interesting looking titles.

George

[toc] | [prev] | [next] | [standalone]

#312

From	amker <amker.cheng@gmail.com>
Date	2011-11-01 20:58 -0700
Message-ID	<11-11-010@comp.compilers>
In reply to	#306

On Nov 2, 2:32 am, George Neuner <gneuner2@comcast.net> wrote:
> It's very hard to tell anything without more context - we need to know
> what CPU, what compiler, and we need to see the surrounding code.

Sorry for the misleading message, the test case comes from a reported
gcc bug, at:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44025


> -fcprop-register, which is a peephole pass that eliminates redundant
> register moves (introduced by other optimizations), but that is

Yes, I found that pass, and seems it can solve the problem if I:
1, extend the pass in a value numbering way, at least for const
values;
2, extend the pass in a global data analysis way;

> performed after register allocation.

After register allocation also brings advantages, like no register
pressure issue.

> You have to remember that many CPUs can execute multiple
> instructions in parallel, and those parallel instruction streams may
> be executed out of order with respect to a program listing.

This is What I have missed. But in this manner I will never know which
codes is better since the performance depends on scheduling and
out-of- ordering...  right?

Thanks for your explanation.

[toc] | [prev] | [standalone]

csiph-web

How to eliminate redundant constant move instructions

Contents

#301 — How to eliminate redundant constant move instructions

#304

#309

#319

#306

#307

#310

#313

#316

#320

#325

#326

#321

#311

#327

#332

#333

#334

#312