Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #301 > unrolled thread
| Started by | "Amker.Cheng" <amker.cheng@gmail.com> |
|---|---|
| First post | 2011-10-31 17:53 +0800 |
| Last post | 2011-11-01 20:58 -0700 |
| Articles | 19 — 8 participants |
Back to article view | Back to comp.compilers
How to eliminate redundant constant move instructions "Amker.Cheng" <amker.cheng@gmail.com> - 2011-10-31 17:53 +0800
Re: How to eliminate redundant constant move instructions Kaz Kylheku <kaz@kylheku.com> - 2011-10-31 17:08 +0000
Re: How to eliminate redundant constant move instructions amker <can.finner@gmail.com> - 2011-11-01 19:01 -0700
Re: How to eliminate redundant constant move instructions mac <acolvin@efunct.com> - 2011-11-03 02:20 +0000
Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-01 14:32 -0400
Re: How to eliminate redundant constant move instructions glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-11-01 22:35 +0000
Re: How to eliminate redundant constant move instructions amker <can.finner@gmail.com> - 2011-11-01 19:35 -0700
Re: How to eliminate redundant constant move instructions amker <amker.cheng@gmail.com> - 2011-11-01 21:04 -0700
Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-02 12:38 -0400
Re: How to eliminate redundant constant move instructions Kaz Kylheku <kaz@kylheku.com> - 2011-11-03 03:20 +0000
Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-04 13:27 -0400
Re: How to eliminate redundant constant move instructions glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-11-04 21:19 +0000
Re: How to eliminate redundant constant move instructions glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-11-03 03:32 +0000
Re: How to eliminate redundant constant move instructions amker <can.finner@gmail.com> - 2011-11-01 19:21 -0700
Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-04 17:26 -0400
Re: How to eliminate redundant constant move instructions amker <amker.cheng@gmail.com> - 2011-11-07 17:33 -0800
Re: How to eliminate redundant constant move instructions Wei-Jen Chen <chenwj@cs.NCTU.edu.tw> - 2011-11-10 08:04 +0000
Re: How to eliminate redundant constant move instructions George Neuner <gneuner2@comcast.net> - 2011-11-10 18:18 -0500
Re: How to eliminate redundant constant move instructions amker <amker.cheng@gmail.com> - 2011-11-01 20:58 -0700
| From | "Amker.Cheng" <amker.cheng@gmail.com> |
|---|---|
| Date | 2011-10-31 17:53 +0800 |
| Subject | How to eliminate redundant constant move instructions |
| Message-ID | <11-10-019@comp.compilers> |
Hi, I found following intermediate codes are generated in gcc rx <- 0 ... use rx ... ry <- 0 use ry ... It's normally a result of const propagation. After register allocation, It is likely rx/ry get allocated into different hard registers. Thus in final codes, there would be a redundant "move 0" instruction. The story even stands for Os compiling, so the question is: Is there any optimization technique dedicates to this kind of case? Or is it normally handled by other optimizations as sub task? Thanks very much. -- Best Regards.
[toc] | [next] | [standalone]
| From | Kaz Kylheku <kaz@kylheku.com> |
|---|---|
| Date | 2011-10-31 17:08 +0000 |
| Message-ID | <11-11-002@comp.compilers> |
| In reply to | #301 |
On 2011-10-31, Amker.Cheng <amker.cheng@gmail.com> wrote: > I found following intermediate codes are generated in gcc > > rx <- 0 > ... > use rx > ... > ry <- 0 > use ry > ... > > It's normally a result of const propagation. > After register allocation, It is likely rx/ry get allocated into > different hard registers. > Thus in final codes, there would be a redundant "move 0" instruction. Surely you mean, if they get allocated into the SAME register, there will be a redundant initialization? > The story even stands for Os compiling, so the question is: > Is there any optimization technique dedicates to this kind of case? It's a variant of ``common subexpression elimination''. This should be done early on. You have two different instances of 0 in the code, which should be recognized as a common subexpression, sharing the same intermediate code, and virtual registers.
[toc] | [prev] | [next] | [standalone]
| From | amker <can.finner@gmail.com> |
|---|---|
| Date | 2011-11-01 19:01 -0700 |
| Message-ID | <11-11-007@comp.compilers> |
| In reply to | #304 |
On Nov 1, 1:08 am, Kaz Kylheku <k...@kylheku.com> wrote: > On 2011-10-31, Amker.Cheng <amker.ch...@gmail.com> wrote: > > > I found following intermediate codes are generated in gcc > > > rx <- 0 > > ... > > use rx > > ... > > ry <- 0 > > use ry > > ... > > > It's normally a result of const propagation. > > After register allocation, It is likely rx/ry get allocated into > > different hard registers. > > Thus in final codes, there would be a redundant "move 0" instruction. > > Surely you mean, if they get allocated into the SAME register, there > will be a redundant initialization? Something like that. In register allocation, if there is no register pressure issue, the program should be converted into rx <- 0 ... use rx ... rx <- 0 <----which is redundant and can be removed use rx > > The story even stands for Os compiling, so the question is: > > Is there any optimization technique dedicates to this kind of case? > > It's a variant of ``common subexpression elimination''. This should > be done early on. You have two different instances of 0 in the code, > which should be recognized as a common subexpression, sharing the same > intermediate code, and virtual registers. Yes, I just found gcc's cse pass can handle such cases in extended basic block, but not globally. Another question is, if we do it before register allocation, the live range of rx would be extended and might cause spill in register allocation. Thanks
[toc] | [prev] | [next] | [standalone]
| From | mac <acolvin@efunct.com> |
|---|---|
| Date | 2011-11-03 02:20 +0000 |
| Message-ID | <11-11-017@comp.compilers> |
| In reply to | #304 |
>> I found following intermediate codes are generated in gcc >> >> rx <- 0 >> ... >> use rx >> ... >> ry <- 0 >> use ry >> ... This may be "rematerialization", which probably has other names. Sometimes it's better to re-compute a common value than to tie up a register. This may well be true for a constant 0 if there are few registers (x86-32).
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2011-11-01 14:32 -0400 |
| Message-ID | <11-11-004@comp.compilers> |
| In reply to | #301 |
On Mon, 31 Oct 2011 17:53:46 +0800, "Amker.Cheng" <amker.cheng@gmail.com> wrote: >I found following intermediate codes are generated in gcc > >rx <- 0 >... >use rx >... >ry <- 0 >use ry >... > >It's normally a result of const propagation. >After register allocation, It is likely rx/ry get allocated into >different hard registers. >Thus in final codes, there would be a redundant "move 0" instruction. > >The story even stands for Os compiling, so the question is: >Is there any optimization technique dedicates to this kind of case? >Or is it normally handled by other optimizations as sub task? It's very hard to tell anything without more context - we need to know what CPU, what compiler, and we need to see the surrounding code. Because you mention "Os" I'm assuming you are using GCC. Incidentally, that really should be written as "-Os" so people understand you mean an option rather than thinking you're compiling an operation system 8-) GCC doesn't really perform a separate constant propagation optimization ... instead it generically tracks use of values to (try to) minimize redundant loads. There is a separate optimization, -fcprop-register, which is a peephole pass that eliminates redundant register moves (introduced by other optimizations), but that is performed after register allocation. That said: You might be asking "if the value already is in a register, why not just use it rather than load a second register?" The answer to that likely is a scheduling issue which depends on the use of the first register. You have to remember that many CPUs can execute multiple instructions in parallel, and those parallel instruction streams may be executed out of order with respect to a program listing. On most CPUs loading an immediate constant is as cheap as a register move. Also, loading a constant ties up only the target register whereas a move ties up both target and source registers. So a lot more information is needed to say whether the compiler is doing something dumb or doing something clever. George
[toc] | [prev] | [next] | [standalone]
| From | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| Date | 2011-11-01 22:35 +0000 |
| Message-ID | <11-11-005@comp.compilers> |
| In reply to | #306 |
George Neuner <gneuner2@comcast.net> wrote: (big snip on optimization and register loads) > You might be asking "if the value already is in a register, why not > just use it rather than load a second register?" The answer to that > likely is a scheduling issue which depends on the use of the first > register. You have to remember that many CPUs can execute multiple > instructions in parallel, and those parallel instruction streams may > be executed out of order with respect to a program listing. That is what register renaming is for. Usually using more than the architecturally specified number of registers, the CPU internally remaps the registers such that it can keep one value in a register while an instruction is being executed out of order. This is especially important for IA32, with so few registers, and for S/360 and S/370 floating point, again with few registers. (Sometime in ESA/390 it was increased to 16, as the instruction bits were there.) > On most CPUs loading an immediate constant is as cheap as a register > move. Also, loading a constant ties up only the target register > whereas a move ties up both target and source registers. Dynamic programming with the appropriate weights should choose the optimal instruction sequence. If register clear is faster than load immediate it would be chosen. Now, how to choose the weights when one doesn't know the specific target processor? That is a good question. > So a lot more information is needed to say whether the compiler is > doing something dumb or doing something clever. -- glen
[toc] | [prev] | [next] | [standalone]
| From | amker <can.finner@gmail.com> |
|---|---|
| Date | 2011-11-01 19:35 -0700 |
| Message-ID | <11-11-008@comp.compilers> |
| In reply to | #307 |
On Nov 2, 6:35 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote: > George Neuner <gneun...@comcast.net> wrote: > > (big snip on optimization and register loads) > > That is what register renaming is for. Usually using more than > the architecturally specified number of registers, the CPU > internally remaps the registers such that it can keep one value > in a register while an instruction is being executed out of order. > > This is especially important for IA32, with so few registers, > and for S/360 and S/370 floating point, again with few registers. Well, for my original case, I mean optimize the codes rx <- 0 ... use rx ... ry <- 0 <----which is redundant and can be removed use ry into rx <- 0 ... use rx ... use rx <----------------- In this case, the 2nd 'use rx' instruction introduces true dependency on "rx <- 0", So register renaming won't help here, right? Thanks
[toc] | [prev] | [next] | [standalone]
| From | amker <amker.cheng@gmail.com> |
|---|---|
| Date | 2011-11-01 21:04 -0700 |
| Message-ID | <11-11-011@comp.compilers> |
| In reply to | #307 |
On Nov 2, 6:35 am, glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote: > That is what register renaming is for. Usually using more than > the architecturally specified number of registers, the CPU > internally remaps the registers such that it can keep one value > in a register while an instruction is being executed out of order. For this specific case, I previous intention was optimizing codes rx <- 0 ... use rx ... ry <- 0 use ry into rx <- 0 ... use rx ... use rx In this manner, I guess the register renaming won't help, since the transformation introduces true dependency, right? > Now, how to choose the > weights when one doesn't know the specific target processor? > That is a good question. How could this happen if middle end does not know the specific processor? What I can image is there should be a way in which back end provides information. Thanks
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2011-11-02 12:38 -0400 |
| Message-ID | <11-11-014@comp.compilers> |
| In reply to | #307 |
On Tue, 1 Nov 2011 22:35:04 +0000 (UTC), glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote: >George Neuner <gneuner2@comcast.net> wrote: > >(big snip on optimization and register loads) > >> You might be asking "if the value already is in a register, why not >> just use it rather than load a second register?" The answer to that >> likely is a scheduling issue which depends on the use of the first >> register. You have to remember that many CPUs can execute multiple >> instructions in parallel, and those parallel instruction streams may >> be executed out of order with respect to a program listing. > >That is what register renaming is for. Usually using more than >the architecturally specified number of registers, the CPU >internally remaps the registers such that it can keep one value >in a register while an instruction is being executed out of order. Yes. But the compiler can't count on register renaming ... it can see only the architectural named registers. If the code in question had copied Rx-> Ry then renaming would have been possible, but instead the code performed a constant load to each register. No possibility of rename sharing there. George
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <kaz@kylheku.com> |
|---|---|
| Date | 2011-11-03 03:20 +0000 |
| Message-ID | <11-11-018@comp.compilers> |
| In reply to | #316 |
On 2011-11-02, George Neuner <gneuner2@comcast.net> wrote: > Yes. But the compiler can't count on register renaming ... it can see > only the architectural named registers. If the code in question had > copied Rx-> Ry then renaming would have been possible, but instead the > code performed a constant load to each register. No possibility of > rename sharing there. If the register holds a value that is used, but not clobbered, then renaming is moot. The two blocks of code both depend on rx being zero, and so nothing is achieved by splitting the instruction set register into two physical registers.
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2011-11-04 13:27 -0400 |
| Message-ID | <11-11-023@comp.compilers> |
| In reply to | #320 |
On Thu, 3 Nov 2011 03:20:32 +0000 (UTC), Kaz Kylheku <kaz@kylheku.com> wrote: >On 2011-11-02, George Neuner <gneuner2@comcast.net> wrote: >> Yes. But the compiler can't count on register renaming ... it can see >> only the architectural named registers. If the code in question had >> copied Rx-> Ry then renaming would have been possible, but instead the >> code performed a constant load to each register. No possibility of >> rename sharing there. > >If the register holds a value that is used, but not clobbered, then renaming is >moot. The two blocks of code both depend on rx being zero, and so nothing is >achieved by splitting the instruction set register into two physical registers. Yes. But the original question involve 2 registers both being loaded with the same constant value. The OP asked why use a 2nd register. I gave a couple of reasons why that might be, and then Glen brought up renaming. We haven't seen the actual code, but from the description it appears that the compiler could have reused the original register but instead chose to use a different one. We also don't know what CPU, but regardless, renaming only might have been done if the compiler had used a copy instruction. With the use of constant loads there was no possibility of renaming being performed. George
[toc] | [prev] | [next] | [standalone]
| From | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| Date | 2011-11-04 21:19 +0000 |
| Message-ID | <11-11-024@comp.compilers> |
| In reply to | #325 |
George Neuner <gneuner2@comcast.net> wrote: (snip) > Yes. But the original question involve 2 registers both being loaded > with the same constant value. The OP asked why use a 2nd register. I > gave a couple of reasons why that might be, and then Glen brought up > renaming. I brought up register renaming when someone else brought up out-of-order execution. > We haven't seen the actual code, but from the description it appears > that the compiler could have reused the original register but instead > chose to use a different one. There are many possible reasons. For IA32, some registers have special uses, such that not all can be reused. > We also don't know what CPU, but regardless, renaming only might > have been done if the compiler had used a copy instruction. I do hope that it doesn't load an actual zero from memory. Also, some zeros should be optimized away. If, for example, something was then added to the zero. > With the use of constant loads there was no possibility of renaming > being performed. I don't know about others, but the 360/91 can rename loads from the same address, or store followed by load. Mostly that was needed with the goal of running unmodified S/360 code, and the small number of floating point registers. Is there any machine without a one instruction way to zero a register? -- glen
[toc] | [prev] | [next] | [standalone]
| From | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| Date | 2011-11-03 03:32 +0000 |
| Message-ID | <11-11-019@comp.compilers> |
| In reply to | #316 |
George Neuner <gneuner2@comcast.net> wrote: (snip, then I wrote) >>That is what register renaming is for. Usually using more than >>the architecturally specified number of registers, the CPU >>internally remaps the registers such that it can keep one value >>in a register while an instruction is being executed out of order. > Yes. But the compiler can't count on register renaming ... it can see > only the architectural named registers. The comment came after a follow-up on out-of-order execution. Yes, you can't count on out-of-order, or register renaming, but for out-of-order to work you usually also need register renaming. What usually happens, though, is that a dynamic programming algorithm is used to select an optimal instruction sequence given the appropriate weights (costs) for each instruction sequence. On the other hand, if there is a tie then dynamic programming will choose one, which may look less than optimal to a human. There have been stories back to the first Fortran compiler on people being surprised to see optimized generated code better than the compiler writers might have written by hand. I remember a friend being assigned to write optimal assembly language code for a Fortran DO loop, and then I ran the loop through the Fortran H compiler which generated one fewer instruction. (It was a very small loop, where it might have been four instead of five instructions.) If you are selling programs, you usually don't compile for the exact processor, but some compromise. For personal use, you might know the exact processor, but even so, with modern processors and overlapped (or out-of-order) it is hard to choose the optimal instruction sequence. > If the code in question had > copied Rx-> Ry then renaming would have been possible, but instead the > code performed a constant load to each register. No possibility of > rename sharing there. I know the IBM 360/91 would recognize loads from the same address, and use register renaming to avoid the second load, or load following store. (That is, if the first load or store was still in the appropriate buffer.) But then you shouldn't clear registers by loading zero from memory. -- glen
[toc] | [prev] | [next] | [standalone]
| From | amker <can.finner@gmail.com> |
|---|---|
| Date | 2011-11-01 19:21 -0700 |
| Message-ID | <11-11-009@comp.compilers> |
| In reply to | #306 |
On Nov 2, 2:32 am, George Neuner <gneun...@comcast.net> wrote: > On Mon, 31 Oct 2011 17:53:46 +0800, "Amker.Cheng" <amker.ch...@gmail.com> wrote: > >I found following intermediate codes are generated in gcc > > >rx <- 0 > >... > >use rx > >... > >ry <- 0 > >use ry > >... > > >It's normally a result of const propagation. > >After register allocation, It is likely rx/ry get allocated into > >different hard registers. > >Thus in final codes, there would be a redundant "move 0" instruction. > > >The story even stands for Os compiling, so the question is: > >Is there any optimization technique dedicates to this kind of case? > >Or is it normally handled by other optimizations as sub task? > > It's very hard to tell anything without more context - we need to know > what CPU, what compiler, and we need to see the surrounding code. > > Because you mention "Os" I'm assuming you are using GCC. Incidentally, > that really should be written as "-Os" so people understand you mean > an option rather than thinking you're compiling an operation system Sorry for the misleading term, the test case showed at: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44025 which is actually a reported gcc bug. > GCC doesn't really perform a separate constant propagation > optimization ... instead it generically tracks use of values to (try > to) minimize redundant loads. There is a separate optimization, > -fcprop-register, which is a peephole pass that eliminates redundant > register moves (introduced by other optimizations), but that is > performed after register allocation. Yes, I have noticed this pass. Seems it can solve the problem if I can: 1, extend the pass in value numbering way, at least for const values. 2, extend the pass in global data analysis way. > You might be asking "if the value already is in a register, why not > just use it rather than load a second register?" The answer to that > likely is a scheduling issue which depends on the use of the first > register. You have to remember that many CPUs can execute multiple > instructions in parallel, and those parallel instruction streams may > be executed out of order with respect to a program listing. > > On most CPUs loading an immediate constant is as cheap as a register > move. Also, loading a constant ties up only the target register > whereas a move ties up both target and source registers. Yes, This is the point I missed. So even the codes are optimized into: rx <- 0 ... use rx ... use rx It might be worse than original codes considering out-of-order and parallel execution. In this way, how should I know for sure which case is better? > So a lot more information is needed to say whether the compiler is > doing something dumb or doing something clever. Thanks.
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2011-11-04 17:26 -0400 |
| Message-ID | <11-11-025@comp.compilers> |
| In reply to | #311 |
On Tue, 1 Nov 2011 19:21:54 -0700 (PDT), amker <can.finner@gmail.com> wrote: >On Nov 2, 2:32 am, George Neuner <gneun...@comcast.net> wrote: >> On Mon, 31 Oct 2011 17:53:46 +0800, "Amker.Cheng" <amker.ch...@gmail.com> wrote: >> >I found following intermediate codes are generated in gcc >> >> >rx <- 0 >> >... >> >use rx >> >... >> >ry <- 0 >> >use ry >> >... >> >> >... the test case showed at: >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44025 >which is actually a reported gcc bug. Ok. That bug was reported for ARM7 thumb code. I'm not familiar with ARM assembler (had to look up the instructions), but I do see that R0 already holds the proper value when R3 is loaded - which I presume is what you are talking about. But I don't necessarily agree that it is a bug. It's definitely not optimal code, but R0 and R3 represent different temporary variables, so it's hard to see how any of value numbering, constant propagation or register allocation is at fault. >> GCC doesn't really perform a separate constant propagation >> optimization ... instead it generically tracks use of values to (try >> to) minimize redundant loads. There is a separate optimization, >> -fcprop-register, which is a peephole pass that eliminates redundant >> register moves (introduced by other optimizations), but that is >> performed after register allocation. > >Yes, I have noticed this pass. Seems it can solve the problem if I >can: >1, extend the pass in value numbering way, at least for const values. >2, extend the pass in global data analysis way. Yes. But value numbering doesn't know or care what specific value is in the register - it cares only whether the value in the register might have been changed between read uses. Variable value tracking is the related technique used for constant propagation. However what you are after is variable equivalence testing - I don't think GCC even tries to do this. >> You might be asking "if the value already is in a register, why not >> just use it rather than load a second register?" The answer to that >> likely is a scheduling issue which depends on the use of the first >> register. You have to remember that many CPUs can execute multiple >> instructions in parallel, and those parallel instruction streams may >> be executed out of order with respect to a program listing. >> >> On most CPUs loading an immediate constant is as cheap as a register >> move. Also, loading a constant ties up only the target register >> whereas a move ties up both target and source registers. > >Yes, This is the point I missed. So even if the codes are optimized into: > >rx <- 0 >... >use rx >... >use rx > >It might be worse than original codes considering out-of-order and >parallel execution. In this way, how should I know for sure which case >is better? You can't know without trying it. If the target CPU is known, the compiler could simulate execution of the code sequence to try out logically equivalent approaches. But most CPU manufacturers don't even publish instruction times any more - never mind providing enough information to accurately simulate the chip (emulation is not simulation). George
[toc] | [prev] | [next] | [standalone]
| From | amker <amker.cheng@gmail.com> |
|---|---|
| Date | 2011-11-07 17:33 -0800 |
| Message-ID | <11-11-030@comp.compilers> |
| In reply to | #327 |
On Nov 5, 5:26 am, George Neuner <gneun...@comcast.net> wrote: > On Tue, 1 Nov 2011 19:21:54 -0700 (PDT), amker <can.fin...@gmail.com> > wrote: > >Yes, I have noticed this pass. Seems it can solve the problem if I > >can: > >1, extend the pass in value numbering way, at least for const values. > >2, extend the pass in global data analysis way. > > Yes. But value numbering doesn't know or care what specific value is > in the register - it cares only whether the value in the register > might have been changed between read uses. > > Variable value tracking is the related technique used for constant > propagation. However what you are after is variable equivalence > testing - I don't think GCC even tries to do this. Yes, I guess it is the variable value tracking. But I think GCC does do such work in some passes, at least cse.c. In this pass gcc records the constant value if a quantity has a known constant value. Then gcc simplify expressions containing such constant value by calling fold_rtx. Unfortunately, cse only works on basis of extended basic block. Nothing it can do for global cases as reported in mentioned bug. Also, could you point me some papers of implementation of such variable value tracking technique? I googled and found nothing about it. Thanks
[toc] | [prev] | [next] | [standalone]
| From | Wei-Jen Chen <chenwj@cs.NCTU.edu.tw> |
|---|---|
| Date | 2011-11-10 08:04 +0000 |
| Message-ID | <11-11-031@comp.compilers> |
| In reply to | #332 |
> Unfortunately, cse only works on basis of extended basic block. > Nothing it can do for global cases as reported in mentioned bug. Can LTO help? Regards, chenwj [That's gcc's link time optimizer -John]
[toc] | [prev] | [next] | [standalone]
| From | George Neuner <gneuner2@comcast.net> |
|---|---|
| Date | 2011-11-10 18:18 -0500 |
| Message-ID | <11-11-032@comp.compilers> |
| In reply to | #332 |
On Mon, 7 Nov 2011 17:33:29 -0800 (PST), amker <amker.cheng@gmail.com> wrote: >On Nov 5, 5:26 am, George Neuner <gneun...@comcast.net> wrote: > >> Variable value tracking is the related technique used for constant >> propagation. However what you are after is variable equivalence >> testing - I don't think GCC even tries to do this. > >Yes, I guess it is the variable value tracking. But I think GCC does >do such work in some passes, at least cse.c. In this pass gcc records >the constant value if a quantity has a known constant value. Common subexpression elimination would not help in the case in question ... the temporaries containing the same values all belong to different expressions. There is nothing common to find. >Unfortunately, cse only works on basis of extended basic block. >Nothing it can do for global cases as reported in mentioned bug. Yes. But the code in question all was in a single function, so the range of CSE was not the problem. The only simple remedy I see - and it's not so simple - would be to make a pass tracking register values by following constant load and copy instructions. You need to keep track of register update locations so that in places where multiple registers contain the same value, you can identify the "primary" copy - i.e. the register holding the oldest copy of the value. Then when you see a use of a younger copy, you rewrite it to use the primary copy instead. But you have to be careful - a local optimization can't make assumptions about register contents on entry to a function. You also have to be careful with registers whose contents may change across functions calls. Ultimately, though, the value of such an optimization may be quite limited. If you successfully eliminate a particular use of a register in some location, then you may also need to eliminate useless saves/reload of that register across lower function calls. You may be able to avoid the useless saves/reloads in the first place by doing the optimization on the RTL prior to register allocation ... but RTL has an unlimited numbers of registers, so the bookkeeping is much easier if you wait until after register allocation. But then you need to process the lower level functions as well (which may not be possible if they are in a different compilation unit). >Also, could you point me some papers of implementation of such >variable value tracking technique? I googled and found nothing >about it. > >Thanks A lot of optimizations have fairly obvious implementations (at least a brute force version) when you think about them and so it can be hard to find papers that take you step by step. Many discussions of optimization simply assume you can figure out how to do whatever it is from the description. I've never seen any papers regarding this particular register level optimization, nor have I seen it mentioned in books ... I don't recall where it was that I learned about it. However, a search on "redundancy elimination" at http://www.codeopt.com/ turned up a few interesting looking titles. George
[toc] | [prev] | [next] | [standalone]
| From | amker <amker.cheng@gmail.com> |
|---|---|
| Date | 2011-11-01 20:58 -0700 |
| Message-ID | <11-11-010@comp.compilers> |
| In reply to | #306 |
On Nov 2, 2:32 am, George Neuner <gneuner2@comcast.net> wrote: > It's very hard to tell anything without more context - we need to know > what CPU, what compiler, and we need to see the surrounding code. Sorry for the misleading message, the test case comes from a reported gcc bug, at: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44025 > -fcprop-register, which is a peephole pass that eliminates redundant > register moves (introduced by other optimizations), but that is Yes, I found that pass, and seems it can solve the problem if I: 1, extend the pass in a value numbering way, at least for const values; 2, extend the pass in a global data analysis way; > performed after register allocation. After register allocation also brings advantages, like no register pressure issue. > You have to remember that many CPUs can execute multiple > instructions in parallel, and those parallel instruction streams may > be executed out of order with respect to a program listing. This is What I have missed. But in this manner I will never know which codes is better since the performance depends on scheduling and out-of- ordering... right? Thanks for your explanation.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web