| Started by | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| First post | 2011-08-12 04:05 +0000 |
| Last post | 2011-08-15 09:39 +0200 |
| Articles | 8 — 6 participants |
optimizing glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-08-12 04:05 +0000
Re: optimizing anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2011-08-13 12:38 +0000
Re: optimizing Hans Aberg <haberg-news@telia.com> - 2011-08-13 18:12 +0200
Re: optimizing Volker Birk <bumens@dingens.org> - 2011-08-15 13:16 +0000
Re: optimizing Hans Aberg <haberg-news@telia.com> - 2011-08-15 18:55 +0200
Re: optimizing Walter Banks <walter@bytecraft.com> - 2011-08-14 06:29 -0400
Re: optimizing glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-08-15 18:03 +0000
Re: optimizing torbenm@diku.dk (Torben Ægidius Mogensen) - 2011-08-15 09:39 +0200
| From | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| Date | 2011-08-12 04:05 +0000 |
| Subject | optimizing |
| Message-ID | <11-08-015@comp.compilers> |
A recent post to comp.lang.fortran on optimization reminded me of something I thought about some time ago. Someone was wondering if any optimization was done at link time. In the case of Fortran, the answer is usually no.

It seems to me, though, that in the case of RISC, and even more in the case of VLIW processors like Itanium, delaying the final optimization and code generation pass would be useful.

For processors which depend on the compiler to generate optimal code sequences (that is, ones that don't do out-of-order execution) and assuming that you want to allow for improved versions of the processor, delaying code generation would be very useful.

It could be done at run time, or, more likely, at program install time. (The latter limits the ability to move a disk between systems, but that shouldn't be so much of a problem.)

One possibility would be to supply the optimization and code generation either as an on-chip ROM, or a separate ROM that comes with the processor. Also, the intermediate code format has to be standardized well enough to allow for it.

Though this requires compiler designers and processor designers to talk to each other, which might not happen as often as it should.

-- glen

[This is pretty standard in the toolchains for embedded processors. I gather that the ARM compilers generate intermediate code, and all the optimization and code generation happens in the linker. -John]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2011-08-13 12:38 +0000 |
| Message-ID | <11-08-018@comp.compilers> |
| In reply to | #232 |
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
>A recent post to comp.lang.fortran on optimization reminded me of
>something I thought about some time ago. Someone was wondering if any
>optimization was done at link time. In the case of Fortran, the
>answer is usually no.
>
>It seems to me, though, that in the case of RISC, and even more in the
>case of VLIW processors like Itanium, delaying the final optimization
>and code generation pass would be useful.

I would expect that most Fortran compilers can do this, not because it's useful in practice, but because they want to shine at SPEC CPU benchmarketing (which contains Fortran programs).

As for the usefulness in practice: Link-time optimization increases link time and therefore slows down development. Many projects use dynamically linked libraries even for modules specific to the project; one of the reasons is to avoid the slowness of static linking, and making that even slower is out of the question (my impression is that these are usually C++ projects, though).

A frequently encountered meme in discussions is: develop using fast compilation options, and use slower compilation that produces better code for production. However, because many compilers these days take the liberty to produce code that behaves differently when changing optimization options (or, as slipped from one compiler maintainer, "miscompiling" programs with higher optimization options (normally the compiler maintainers use some euphemism instead)), most developers soon give up on that plan. The other reason is that the intended production version turns out to need another change and thus becomes another development version; or the reverse: the program is worked on until the deadline is here; there was never any time to compile slowly and test the result (and if there was, the test would fail because of the miscompilation thing mentioned above, so the developer would revert to the development version).

For supercomputing applications, I can imagine that the disadvantages of link-time optimizations weigh relatively less heavily, but I also imagine that the advantages are smaller: They tend to spend a lot of time in inner loops, so the benefits of optimizing function call overhead are relatively small.

>It could be done at run time, or, more likely, at program install
>time.

Both approaches have been tried: the "run-time" approach in JIT compilers and dynamic binary translators; the "program install time" approach in ANDF. JIT compilers are quite successful, ANDF never took off.

- anton
--
M. Anton Ertl anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[ANDF had the usual UNCOL problems. See subsequent messages for more on successful whole program optimizers. -John]
| From | Hans Aberg <haberg-news@telia.com> |
|---|---|
| Date | 2011-08-13 18:12 +0200 |
| Message-ID | <11-08-019@comp.compilers> |
| In reply to | #232 |
On 2011/08/12 06:05, John Levine wrote:
> On 2011/08/12 06:05, glen herrmannsfeldt wrote:
>> ... Someone was wondering if any
>> optimization was done at link time.
> [This is pretty standard in the toolchains for embedded processors. I
> gather that the ARM compilers generate intermediate code, and all the
> optimization and code generation happens in the linker. -John]

I think that part of the reason that Apple switched to an LLVM based compiler is that it can generate intermediate code that can be optimized depending on what GPU is present. Perhaps it makes a switch from Intel to ARM simpler.

Hans
| From | Volker Birk <bumens@dingens.org> |
|---|---|
| Date | 2011-08-15 13:16 +0000 |
| Message-ID | <11-08-024@comp.compilers> |
| In reply to | #236 |
Hans Aberg <haberg-news@telia.com> wrote:
> I think that part of the reason that Apple switched to an LLVM based
> compiler is that it can generate intermediate code that can be optimized
> depending on what GPU is present. Perhaps it makes a switch from Intel
> to ARM simpler.

I think that the whole reason that Apple switched to an LLVM based compiler is the BSD license.

Yours,
VB.
| From | Hans Aberg <haberg-news@telia.com> |
|---|---|
| Date | 2011-08-15 18:55 +0200 |
| Message-ID | <11-08-025@comp.compilers> |
| In reply to | #241 |
On 2011/08/15 15:16, Volker Birk wrote:
> Hans Aberg <haberg-news@telia.com> wrote:
>> I think that part of the reason that Apple switched to an LLVM based
>> compiler is that it can generate intermediate code that can be optimized
>> depending on what GPU is present. Perhaps it makes a switch from Intel
>> to ARM simpler.
>
> I think that the whole reason that Apple switched to an LLVM based compiler is
> the BSD license.

There are some discussions of the technical issues here:
http://en.wikipedia.org/wiki/Clang#Background
http://gcc.gnu.org/ml/gcc/2005-11/msg00918.html

Hans
| From | Walter Banks <walter@bytecraft.com> |
|---|---|
| Date | 2011-08-14 06:29 -0400 |
| Message-ID | <11-08-022@comp.compilers> |
| In reply to | #232 |
glen herrmannsfeldt wrote:
> It seems to me, though, that in the case of RISC, and even more in the
> case of VLIW processors like Itanium, delaying the final optimization
> and code generation pass would be useful. ...
> [This is pretty standard in the toolchains for embedded processors. I
> gather that the ARM compilers generate intermediate code, and all the
> optimization and code generation happens in the linker. -John]

To add to John's comment. We have been writing and selling compilers for embedded systems for a long time. Since the early 90's we have been doing our code generation at link time.

Embedded systems are unique to make this attractive. The application code is almost never hosted, fast small code is highly desired and they are compile once run often systems.

Link time code generation offers many optimization possibilities. The biggest change is the mindset change to full application optimization.

There are some downsides. Reused code and libraries have the potential to behave differently within some applications because each link could generate different sequences for the same source module.

Regards
--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com
| From | glen herrmannsfeldt <gah@ugcs.caltech.edu> |
|---|---|
| Date | 2011-08-15 18:03 +0000 |
| Message-ID | <11-08-026@comp.compilers> |
| In reply to | #239 |
Walter Banks <walter@bytecraft.com> wrote:

(snip, I wrote)
>> It seems to me, though, that in the case of RISC, and even more in the
>> case of VLIW processors like Itanium, delaying the final optimization
>> and code generation pass would be useful. ...
>> [This is pretty standard in the toolchains for embedded processors. I
>> gather that the ARM compilers generate intermediate code, and all the
>> optimization and code generation happens in the linker. -John]

> To add to John's comment. We have been writing and selling compilers
> for embedded systems for a long time. Since the early 90's we have
> been doing our code generation at link time.

> Embedded systems are unique to make this attractive. The application
> code is almost never hosted, fast small code is highly desired and
> they are compile once run often systems.

In addition, embedded code is rarely expected to run on a newer generation processor without any code changes.

Consider that a current generation z/OS machine can run programs compiled and linked for OS/360 over 40 years ago. Maybe not as well as if optimized for z/Architecture, but not so bad.

Now, consider an Itanium-5 designed in 10 years (well, not likely), and designed to make optimal use of state-of-the-art technology. Even to execute the same instructions, the optimal packing of those instructions into words, or even the instruction word size, may change. The ability to overlap the execution of instructions depends very much on how they are ordered by the compiler.

Freeing the processor designer from bit compatibility, while requiring the intermediate code to stay compatible, should allow for more optimal processor design.

Maybe there won't be any new VLIW architectures. RISC isn't quite as dependent on the compiler as VLIW, but RISC does still depend on the compiler code generator.

Stories are that Office for Mac 2004 won't run at all on OS X Lion, and that Office 2008 has some problems on Lion.

Maybe the expectation that old programs will run on newer systems is passe. Do people now expect to buy all new software when they buy a new machine?

-- glen
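The point about instruction packing can be illustrated with a toy sketch (an illustration only, not code from the thread). It assumes a made-up intermediate form in which every register is written exactly once, so only true data dependences matter. The hypothetical `pack_bundles` greedily packs the same operation list into issue bundles of a given width, which is roughly the step an install-time code generator would redo for each processor generation. The `Op` type, the register names, and the one-bundle result latency are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str     # label for the operation (illustrative only)
    dest: str     # register written; assumed to be written exactly once
    srcs: tuple   # registers read


def pack_bundles(ops, width):
    """Greedy in-order packing of ops into bundles of at most `width` ops.

    Each op goes into the earliest bundle that comes after every bundle
    producing one of its inputs and that still has a free slot.
    Assumes single-assignment registers and a latency of one bundle.
    """
    bundles = []    # list of bundles, each a list of Op
    ready_at = {}   # register -> index of first bundle where it can be read
    for op in ops:
        slot = max((ready_at.get(r, 0) for r in op.srcs), default=0)
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        if slot == len(bundles):
            bundles.append([])
        bundles[slot].append(op)
        ready_at[op.dest] = slot + 1
    return bundles


# The same "intermediate code", computing (a + b) * (c + d).
ops = [
    Op("load a", "a", ()),
    Op("load b", "b", ()),
    Op("load c", "c", ()),
    Op("load d", "d", ()),
    Op("add", "t1", ("a", "b")),
    Op("add", "t2", ("c", "d")),
    Op("mul", "t3", ("t1", "t2")),
]

for width in (1, 2, 4):
    print(width, len(pack_bundles(ops, width)))
```

With these seven operations, a width of 1 needs seven bundles, a width of 2 needs four, and a width of 4 needs three, all generated from the same intermediate code. A real code generator would also have to model latencies, functional units, and register pressure, which this sketch ignores.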
| From | torbenm@diku.dk (Torben Ægidius Mogensen) |
|---|---|
| Date | 2011-08-15 09:39 +0200 |
| Message-ID | <11-08-023@comp.compilers> |
| In reply to | #232 |
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:

> A recent post to comp.lang.fortran on optimization reminded me of
> something I thought about some time ago. Someone was wondering if any
> optimization was done at link time. In the case of Fortran, the
> answer is usually no.
>
> It seems to me, though, that in the case of RISC, and even more in the
> case of VLIW processors like Itanium, delaying the final optimization
> and code generation pass would be useful.

A variant of this is whole-program compilation: Each module is, individually, just type-checked and, perhaps, simplified somewhat (expanding macros and syntactic sugar), but the real code generation does not happen until you combine the modules into a single executable. This allows, for example, inlining across modules and program-wide dataflow analysis (which can give much better results for, e.g., alias analysis).

The downside is, of course, much longer compilation times. But if you have a sufficiently strong type system, many bugs will be caught already when you compile individual modules, so you don't compile the whole program as often. This is why this approach is more common for strongly-typed functional languages than for C-like languages.

Torben
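The cross-module inlining described here can be sketched with a toy model (an illustration only, not code from the thread). Modules are plain dictionaries mapping function names to lists of steps, and the hypothetical `compile_module` inlines a call one level deep only when the callee's body is visible. The step format, the module names, and the function itself are assumptions made for the sketch.

```python
def compile_module(module, visible):
    """Toy 'code generation' for one module.

    `module` maps function names to bodies (lists of step tuples).
    A step ("call", f) is replaced by f's body only if f is in `visible`;
    otherwise the call is left for the linker to resolve.
    Inlining is one level deep; inlined bodies are not processed again.
    """
    out = {}
    for name, body in module.items():
        code = []
        for step in body:
            if step[0] == "call" and step[1] in visible:
                code.extend(visible[step[1]])   # inline the callee's body
            else:
                code.append(step)
        out[name] = code
    return out


# Two separately written modules (hypothetical contents).
math_mod = {"square": [("mul", "x", "x")]}
main_mod = {"main": [("load", "x"), ("call", "square"), ("store", "y")]}

# Separate compilation: main_mod cannot see square, so the call stays.
separate = compile_module(main_mod, visible=main_mod)

# Whole-program compilation: all bodies are visible when code is generated.
whole = {**math_mod, **main_mod}
combined = compile_module(whole, visible=whole)

print(separate["main"])
print(combined["main"])
```

In the separate build, `main` still contains the `("call", "square")` step, while in the combined build the call is replaced by the multiply. The cost Torben mentions shows up here too: the combined step has to reprocess every module whenever any one of them changes.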