optimizing

Started by	glen herrmannsfeldt <gah@ugcs.caltech.edu>
First post	2011-08-12 04:05 +0000
Last post	2011-08-15 09:39 +0200
Articles	8 — 6 participants

  optimizing glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-08-12 04:05 +0000
    Re: optimizing anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2011-08-13 12:38 +0000
    Re: optimizing Hans Aberg <haberg-news@telia.com> - 2011-08-13 18:12 +0200
      Re: optimizing Volker Birk <bumens@dingens.org> - 2011-08-15 13:16 +0000
        Re: optimizing Hans Aberg <haberg-news@telia.com> - 2011-08-15 18:55 +0200
    Re: optimizing Walter Banks <walter@bytecraft.com> - 2011-08-14 06:29 -0400
      Re: optimizing glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-08-15 18:03 +0000
    Re: optimizing torbenm@diku.dk (Torben Ægidius Mogensen) - 2011-08-15 09:39 +0200

#232 — optimizing

From	glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date	2011-08-12 04:05 +0000
Subject	optimizing
Message-ID	<11-08-015@comp.compilers>

A recent post to comp.lang.fortran on optimization reminded me of
something I thought about some time ago.  Someone was wondering if any
optimization was done at link time.  In the case of Fortran, the
answer is usually no.

It seems to me, though, that in the case of RISC, and even more in the
case of VLIW processors like Itanium, delaying the final optimization
and code generation pass would be useful.

For processors which depend on the compiler to generate optimal code
sequences (that is, ones that don't do out-of-order execution) and
assuming that you want to allow for improved versions of the
processor, delaying code generation would be very useful.

It could be done at run time, or, more likely, at program install
time.  (The latter limits the ability to move a disk between systems,
but that shouldn't be so much of a problem.)
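
A minimal sketch of the install-time variant, assuming the generated
code is cached under a key made of the program's intermediate code and
the CPU model.  The names `generate_native` and `InstallTimeCache` and
the CPU model strings are hypothetical; the point is only that a moved
disk shows up as a key mismatch and triggers regeneration.

```python
import hashlib


def generate_native(ir: str, cpu_model: str) -> str:
    """Stand-in for a real optimizer and code generator (hypothetical)."""
    return f"native[{cpu_model}]:{ir}"


class InstallTimeCache:
    """Keeps generated code keyed by (hash of program IR, CPU model)."""

    def __init__(self):
        self.store = {}        # key -> generated code
        self.generations = 0   # how many times code generation ran

    def _key(self, ir, cpu_model):
        digest = hashlib.sha256(ir.encode()).hexdigest()
        return (digest, cpu_model)

    def load(self, ir, cpu_model):
        key = self._key(ir, cpu_model)
        if key not in self.store:
            # Nothing cached for this CPU model (fresh install, or the
            # disk was moved to a different machine): generate again.
            self.store[key] = generate_native(ir, cpu_model)
            self.generations += 1
        return self.store[key]


cache = InstallTimeCache()
cache.load("add r1, r2", "cpu-gen1")   # install: generates code
cache.load("add r1, r2", "cpu-gen1")   # later run: reuses cached code
cache.load("add r1, r2", "cpu-gen2")   # disk moved: generates again
print(cache.generations)               # prints 2
```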

One possibility would be to supply the optimization and code
generation either as an on-chip ROM, or a separate ROM that comes
with the processor.  Also, the intermediate code format has to
be standardized well enough to allow for it.

This does require compiler designers and processor designers
to talk to each other, though, which might not happen as often as it
should.

-- glen
[This is pretty standard in the toolchains for embedded processors.  I
gather that the ARM compilers generate intermediate code, and all the
optimization and code generation happens in the linker. -John]

#235

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2011-08-13 12:38 +0000
Message-ID	<11-08-018@comp.compilers>
In reply to	#232

glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
>A recent post to comp.lang.fortran on optimization reminded me of
>something I thought about some time ago.  Someone was wondering if any
>optimization was done at link time.  In the case of Fortran, the
>answer is usually no.
>
>It seems to me, though, that in the case of RISC, and even more in the
>case of VLIW processors like Itanium, delaying the final optimization
>and code generation pass would be useful.

I would expect that most Fortran compilers can do this, not because
it's useful in practice, but because they want to shine at SPEC CPU
benchmarketing (which contains Fortran programs).

As for the usefulness in practice: Link-time optimization increases
link time and therefore slows down development.  Many projects use
dynamically linked libraries even for modules specific to the project;
one of the reasons is to avoid the slowness of static linking, and
making that even slower is out of the question (my impression is that
these are usually C++ projects, though).

A frequently encountered meme in discussions is: develop with fast
compilation options, then use slower compilation that produces better
code for production.  However, because many compilers these days take
the liberty of producing code that behaves differently when the
optimization options change (or, as one compiler maintainer let slip,
of "miscompiling" programs at higher optimization levels; normally the
compiler maintainers use some euphemism instead), most developers soon
give up on that plan.  The other reason is that the intended
production version turns out to need another change and thus becomes
another development version; or the reverse: the program is worked on
until the deadline arrives, and there was never any time to compile
slowly and test the result (and if there was, the test would fail
because of the miscompilation issue mentioned above, so the developer
would revert to the development version).

For supercomputing applications, I can imagine that the disadvantages
of link-time optimization weigh less heavily, but I imagine that the
advantages are also smaller: these applications tend to spend a lot of
time in inner loops, so the benefits of optimizing function call
overhead are relatively small.

>It could be done at run time, or, more likely, at program install
>time.

Both approaches have been tried: the "run-time" approach in JIT
compilers and dynamic binary translators; the "program install time"
approach in ANDF.  JIT compilers are quite successful, ANDF never took
off.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[ANDF had the usual UNCOL problems. See subsequent messages for more
on successful whole program optimizers. -John]

#236

From	Hans Aberg <haberg-news@telia.com>
Date	2011-08-13 18:12 +0200
Message-ID	<11-08-019@comp.compilers>
In reply to	#232

On 2011/08/12 06:05, John Levine wrote:

> On 2011/08/12 06:05, glen herrmannsfeldt wrote:
>> ... Someone was wondering if any
>> optimization was done at link time.

> [This is pretty standard in the toolchains for embedded processors.  I
> gather that the ARM compilers generate intermediate code, and all the
> optimization and code generation happens in the linker. -John]

I think that part of the reason that Apple switched to an LLVM based
compiler is that it can generate intermediate code that can be optimized
depending on what GPU is present. Perhaps it makes a switch from Intel
to ARM simpler.

Hans

#241

From	Volker Birk <bumens@dingens.org>
Date	2011-08-15 13:16 +0000
Message-ID	<11-08-024@comp.compilers>
In reply to	#236

Hans Aberg <haberg-news@telia.com> wrote:
> I think that part of the reason that Apple switched to an LLVM based
> compiler is that it can generate intermediate code that can be optimized
> depending on what GPU is present. Perhaps it makes a switch from Intel
> to ARM simpler.

I think the whole reason that Apple switched to an LLVM-based compiler
is the BSD license.

Yours,
VB.

#242

From	Hans Aberg <haberg-news@telia.com>
Date	2011-08-15 18:55 +0200
Message-ID	<11-08-025@comp.compilers>
In reply to	#241

On 2011/08/15 15:16, Volker Birk wrote:
> Hans Aberg <haberg-news@telia.com> wrote:
>> I think that part of the reason that Apple switched to an LLVM based
>> compiler is that it can generate intermediate code that can be optimized
>> depending on what GPU is present. Perhaps it makes a switch from Intel
>> to ARM simpler.
>
> I think the whole reason that Apple switched to an LLVM-based compiler
> is the BSD license.

There are some discussions of the technical issues here:
   http://en.wikipedia.org/wiki/Clang#Background
   http://gcc.gnu.org/ml/gcc/2005-11/msg00918.html

Hans

#239

From	Walter Banks <walter@bytecraft.com>
Date	2011-08-14 06:29 -0400
Message-ID	<11-08-022@comp.compilers>
In reply to	#232

glen herrmannsfeldt wrote:

> It seems to me, though, that in the case of RISC, and even more in the
> case of VLIW processors like Itanium, delaying the final optimization
> and code generation pass would be useful. ...

> [This is pretty standard in the toolchains for embedded processors.  I
> gather that the ARM compilers generate intermediate code, and all the
> optimization and code generation happens in the linker. -John]

To add to John's comment: we have been writing and selling compilers
for embedded systems for a long time.  Since the early '90s we have
been doing our code generation at link time.

Embedded systems have properties that make this attractive.  The
application code is almost never hosted, fast, small code is highly
desired, and they are compile-once, run-often systems.

Link-time code generation offers many optimization possibilities.  The
biggest change is the shift in mindset to whole-application
optimization.  There are some downsides.  Reused code and libraries
have the potential to behave differently within some applications
because each link could generate different sequences for the same
source module.
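
The last point can be illustrated with a toy sketch, assuming a
link-time code generator that folds an argument into a library routine
whenever every call site in the application passes the same constant.
The tuple-based expression format, the `specialize` helper and the
`scale` routine are hypothetical and not taken from any real
toolchain.

```python
def specialize(params, body, call_sites):
    """Fold arguments that are the same integer constant at every call
    site of this application into the function body."""
    fixed = {}
    for i, p in enumerate(params):
        values = {args[i] for args in call_sites}
        if len(values) == 1:
            (v,) = values
            if isinstance(v, int):      # strings stand for variables
                fixed[p] = v

    def subst(e):
        if isinstance(e, tuple):        # (op, operand, operand, ...)
            return (e[0],) + tuple(subst(x) for x in e[1:])
        return fixed.get(e, e)

    return [p for p in params if p not in fixed], subst(body)


# One library "source module": scale(x, k) = x * k
scale_params, scale_body = ["x", "k"], ("mul", "x", "k")

# Application A always calls scale(..., 4); application B does not.
app_a = specialize(scale_params, scale_body, [("a", 4), ("b", 4)])
app_b = specialize(scale_params, scale_body, [("a", 4), ("b", 8)])

print(app_a)   # (['x'], ('mul', 'x', 4))
print(app_b)   # (['x', 'k'], ('mul', 'x', 'k'))
```

The same library source yields two different bodies depending on what
it is linked with, so a sequence tested in one application is not
necessarily the sequence that runs in another.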

Regards

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com

#243

From	glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date	2011-08-15 18:03 +0000
Message-ID	<11-08-026@comp.compilers>
In reply to	#239

Walter Banks <walter@bytecraft.com> wrote:

(snip, I wrote)
>> It seems to me, though, that in the case of RISC, and even more in the
>> case of VLIW processors like Itanium, delaying the final optimization
>> and code generation pass would be useful. ...

>> [This is pretty standard in the toolchains for embedded processors.  I
>> gather that the ARM compilers generate intermediate code, and all the
>> optimization and code generation happens in the linker. -John]

> To add to John's comment. We have been writing and selling compilers
> for embedded systems for a long time. Since the early 90's we have
> been doing our code generation at link time.

> Embedded systems have properties that make this attractive.  The
> application code is almost never hosted, fast, small code is highly
> desired, and they are compile-once, run-often systems.

In addition, embedded code is rarely expected to run on a newer
generation processor without any code changes.

Consider that a current generation z/OS machine can run programs
compiled and linked for OS/360 over 40 years ago.  Maybe not as well
as if optimized for z/Architecture, but not so bad.

Now, consider an Itanium-5 designed in 10 years (well, not likely),
and designed to make optimal use of state-of-the-art technology.  Even
to execute the same instructions, the optimal packing of those
instructions into words, or even the instruction word size, may
change.  The ability to overlap the execution of instructions depends
very much on how they are ordered by the compiler.  Freeing the
processor designer from bit compatibility, while requiring the
intermediate code to stay compatible, should allow for more optimal
processor design.
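
As a rough illustration of why the packing depends on the machine and
not only on the instructions, here is a toy greedy scheduler.  The
instruction names, the unit latency, and the `pack_bundles` helper are
assumptions made for the sketch; it does not model Itanium or any
other real processor.

```python
def pack_bundles(instrs, width):
    """Greedy list scheduling.  `instrs` is a list of (name, deps) in
    program order; the result is a list of bundles (lists of names),
    each at most `width` wide.  Unit latency is assumed, so an
    instruction may issue one bundle after all of its dependencies."""
    done = set()
    remaining = list(instrs)
    bundles = []
    while remaining:
        bundle = []
        for name, deps in remaining:
            if len(bundle) < width and all(d in done for d in deps):
                bundle.append(name)
        if not bundle:
            raise ValueError("dependency cycle or unknown dependency")
        done.update(bundle)
        remaining = [(n, d) for n, d in remaining if n not in done]
        bundles.append(bundle)
    return bundles


# The same machine-independent intermediate code ...
program = [
    ("load_a", []),
    ("load_b", []),
    ("load_c", []),
    ("add_ab", ["load_a", "load_b"]),
    ("mul_c", ["load_c"]),
    ("store", ["add_ab", "mul_c"]),
]

# ... packed for two hypothetical processor generations.
print(len(pack_bundles(program, 2)))   # prints 4
print(len(pack_bundles(program, 3)))   # prints 3
```

A binary packed for the 2-wide machine fixes the 4-bundle schedule,
while repacking the intermediate code for the wider machine recovers
the shorter one.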

Maybe there won't be any new VLIW architectures.  RISC isn't quite as
dependent on the compiler as VLIW, but RISC does still depend on the
compiler code generator.

Stories are that Office for Mac 2004 won't run at all on OS X Lion,
and that Office 2008 has some problems on Lion.  Maybe the expectation
that old programs will run on newer systems is passé.  Do people now
expect to buy all new software when they buy a new machine?

-- glen

#240

From	torbenm@diku.dk (Torben Ægidius Mogensen)
Date	2011-08-15 09:39 +0200
Message-ID	<11-08-023@comp.compilers>
In reply to	#232

glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:

> A recent post to comp.lang.fortran on optimization reminded me of
> something I thought about some time ago.  Someone was wondering if any
> optimization was done at link time.  In the case of Fortran, the
> answer is usually no.
>
> It seems to me, though, that in the case of RISC, and even more in the
> case of VLIW processors like Itanium, delaying the final optimization
> and code generation pass would be useful.

A variant of this is whole-program compilation: Each module is,
individually, just type-checked and, perhaps, simplified somewhat
(expanding macros and syntactic sugar), but the real code generation
does not happen until you combine the modules into a single
executable.  This allows, for example, inlining across modules and
program-wide dataflow analysis (which can give much better results
for, e.g., alias analysis).
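
A minimal sketch of the cross-module inlining case, assuming each
module has already been checked separately and is handed to the final
step as a table of small expression trees.  The tuple format and the
`link_and_inline` helper are hypothetical, and recursion is assumed
not to occur.

```python
def link_and_inline(modules, entry):
    """Merge the per-module function tables, then inline every call
    reachable from `entry`.  Assumes there is no recursion."""
    funcs = {}
    for m in modules:
        funcs.update(m)

    def subst(e, env):
        # Replace parameter names by argument expressions.
        if isinstance(e, tuple):
            return (e[0],) + tuple(subst(x, env) for x in e[1:])
        return env.get(e, e)

    def inline(e):
        if not isinstance(e, tuple):
            return e
        if e[0] == "call":              # ("call", name, arg, ...)
            params, body = funcs[e[1]]
            args = [inline(a) for a in e[2:]]
            # Inline inside the callee first, then bind its parameters.
            return subst(inline(body), dict(zip(params, args)))
        return (e[0],) + tuple(inline(x) for x in e[1:])

    params, body = funcs[entry]
    return params, inline(body)


# Two separately checked "modules".
math_mod = {"square": (["x"], ("mul", "x", "x"))}
main_mod = {
    "main": (["a", "b"],
             ("add", ("call", "square", "a"), ("call", "square", "b"))),
}

print(link_and_inline([math_mod, main_mod], "main"))
# (['a', 'b'], ('add', ('mul', 'a', 'a'), ('mul', 'b', 'b')))
```

Neither module alone has enough information to remove the calls; only
the step that sees both tables can.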

The downside is, of course, much longer compilation times.  But if you
have a sufficiently strong type system, many bugs will be caught
already when you compile individual modules, so you don't compile the
whole program as often.  This is why this approach is more common for
strongly-typed functional languages than for C-like languages.

	Torben
