Groups > comp.compilers > #3346 > unrolled thread

Are there different programming languages that are compiled to the same intermediate language?

Started by	Roger L Costello <costello@mitre.org>
First post	2023-01-29 15:49 +0000
Last post	2023-02-08 01:04 -0800
Articles	13 — 7 participants

Back to article view | Back to comp.compilers

#3346 — Are there different programming languages that are compiled to the same intermediate language?

From	Roger L Costello <costello@mitre.org>
Date	2023-01-29 15:49 +0000
Subject	Are there different programming languages that are compiled to the same intermediate language?
Message-ID	<23-01-078@comp.compilers>

Hi Folks,

Programming_Language_1 has a compiler which generates Intermediate_Code_1.
Many backends are created for Intermediate_Code_1.

Time passes ....

Sally Smith creates Programming_Language_2. Sally wants to leverage the
compiler work done on Programming_Language_1. She considers two approaches:

1. Create a compiler front-end for Programming_Language_2 which compiles
instances to Intermediate_Code_1.
2. Create a translator which converts instances of Programming_Language_2 into
Programming_Language_1 instances.

Sally is concerned. She asks herself:

- With either approach, how do I prove that my mapping is correct?
- For approach 1, how do I prove that the Intermediate_Code_1 that I generate
is correct for my programming language?
- For approach 2, how do I prove that the instance of Programming_Language_1
that I generate is semantically equivalent to my Programming_Language_2
instance?"

What is the answer to Sally's questions?

/Roger
[I think the answer either way is that you don't, and you try to have a test
suite that is as comprehensive as possible. -John]

[toc] | [next] | [standalone]

#3347

From	Thomas Koenig <tkoenig@netcologne.de>
Date	2023-01-29 22:14 +0000
Message-ID	<23-01-080@comp.compilers>
In reply to	#3346

Roger L Costello <costello@mitre.org> schrieb:
> Hi Folks,
>
> Programming_Language_1 has a compiler which generates Intermediate_Code_1.
> Many backends are created for Intermediate_Code_1.
>
> Time passes ....
>
> Sally Smith creates Programming_Language_2. Sally wants to leverage the
> compiler work done on Programming_Language_1. She considers two approaches:
>
> 1. Create a compiler front-end for Programming_Language_2 which compiles
> instances to Intermediate_Code_1.

Compilers have been doing that for a very long time.  I remember
reading that the IBM PL.8 compiler used a target-independent
intermediate representation, and gcc and LLVM both adopt this.
Makes sense - most of the optimizations are target-indpendent.
Typically, SSA (single static assignment) is used these days
because it offers many optimization opportunites.

> 2. Create a translator which converts instances of Programming_Language_2 into
> Programming_Language_1 instances.

That is also an approach, which works if the Programming_Language_2
is sufficiently low-level, and yet sufficiently flexible, to allow
a reasonable representation.  In practice, this usually means C,
where things like pointer arithmetic can be used to implement
multi-dimensional arrays.

Fortran is a prime example.  Both the well-known f2c translator, based
on the Bell Labs Fortran 77 compiler, and the NAG Fortran comppiler use
C as intermediate language.

> Sally is concerned. She asks herself:
>
> - With either approach, how do I prove that my mapping is correct?

Usually, she can't.

Compiler front ends are well-known to have bugs, and it would be
very hard to prove their correctness, especially if the bugs are
in semantics, which are usually not described in a formal way.

> - For approach 1, how do I prove that the Intermediate_Code_1 that I generate
> is correct for my programming language?

See above.

> - For approach 2, how do I prove that the instance of Programming_Language_1
> that I generate is semantically equivalent to my Programming_Language_2
> instance?"

Same problem.

[Target-independent intermediate forms work OK when the source
languages are semantically similar, e.g. C and Fortran, and range of
target architectures is limited, e.g., two's complement machines with
8-bit byte flat addressing. Every few years someone comes up with the
bright idea to do an all purpose intermediate language, which never
works. See UNCOL around 1960 for the first time that idea failed.

By the way, f2c is based on f77 which compiled Fortran into the intermediate
language the PDP-11 C compiler used. -John]

[toc] | [prev] | [next] | [standalone]

#3350

From	arnold@freefriends.org (Aharon Robbins)
Date	2023-01-30 08:03 +0000
Message-ID	<23-01-084@comp.compilers>
In reply to	#3347

Our esteemed moderator writes:

>By the way, f2c is based on f77 which compiled Fortran into the intermediate
>language the PDP-11 C compiler used. -John]

Just to be clear, it was the intermediate language of Johnson's PCC,
which was first targeted at the Interdata, then the PDP-11 and Vax,
not Dennis Ritchie's original PDP-11 compiler.

Arnold
[Good point. I wrote a competing F77 compiler INfort which did use the
Ritchie intermediate language. Someone else later modified it to use
pcc. -John]

[toc] | [prev] | [next] | [standalone]

#3351

From	William Fahle <billfahle@gmail.com>
Date	2023-01-30 02:00 -0800
Message-ID	<23-01-086@comp.compilers>
In reply to	#3346

On Sunday, January 29, 2023 at 10:58:58 AM UTC-6, Roger L Costello wrote:
> Sally is concerned. She asks herself:
>
> - With either approach, how do I prove that my mapping is correct?
> - For approach 1, how do I prove that the Intermediate_Code_1 that I generate
> is correct for my programming language?
> - For approach 2, how do I prove that the instance of Programming_Language_1
> that I generate is semantically equivalent to my Programming_Language_2
> instance?"
>
> What is the answer to Sally's questions?
>
> /Roger
> [I think the answer either way is that you don't, and you try to have a test
> suite that is as comprehensive as possible. -John]

This is Post's Correspondence Problem, which has been proven to be
equivalent to the Halting Problem, thus uncomputable.
[Remember that while the halting problem is insoluble in general, it's
often soluble in specific cases, e.g. "halt" or "foo: goto foo".
Nevertheless, I would be quite surprised if anyone could prove a
non-toy translator correct. -John]

[toc] | [prev] | [next] | [standalone]

#3352

From	Roger L Costello <costello@mitre.org>
Date	2023-01-30 14:36 +0000
Message-ID	<23-01-087@comp.compilers>
In reply to	#3346

I wrote:

>> 2. Create a translator which converts instances of
>> Programming_Language_2 into
>> Programming_Language_1 instances.

Thomas Koenig responded:

> That is also an approach, which works if the
> Programming_Language_2 is sufficiently low-level

Is that a typo? Do you mean Programming_Language_1 is sufficiently low level?
Recall that the approach is to translate instances of Programming_Language_2
to instances of Programming_Language_1.

I will assume you meant Programming_Language_1. So you are saying that the
target language (Programming_Language_1) must be a lower level language than
the source language (Programming_Language_2), right?

John Levine wrote:

> Target-independent intermediate forms work OK
> when the source languages are semantically similar

How to determine if two source languages are semantically similar?

/Roger
[Not to snark too much, but you know when when you see it.  C and Fortran
are similar, COBOL and Lisp are not. -John]

[toc] | [prev] | [next] | [standalone]

#3353

From	Kaz Kylheku <864-117-4973@kylheku.com>
Date	2023-01-30 17:36 +0000
Message-ID	<23-01-090@comp.compilers>
In reply to	#3346

On 2023-01-29, Roger L Costello <costello@mitre.org> wrote:
> Hi Folks,
>
> Programming_Language_1 has a compiler which generates Intermediate_Code_1.
> Many backends are created for Intermediate_Code_1.
>
> Time passes ....
>
> Sally Smith creates Programming_Language_2. Sally wants to leverage the
> compiler work done on Programming_Language_1. She considers two approaches:
>
> 1. Create a compiler front-end for Programming_Language_2 which compiles
> instances to Intermediate_Code_1.
> 2. Create a translator which converts instances of Programming_Language_2 into
> Programming_Language_1 instances.
>
> Sally is concerned. She asks herself:
>
> - With either approach, how do I prove that my mapping is correct?
> - For approach 1, how do I prove that the Intermediate_Code_1 that I generate
> is correct for my programming language?
> - For approach 2, how do I prove that the instance of Programming_Language_1
> that I generate is semantically equivalent to my Programming_Language_2
> instance?"
>
> What is the answer to Sally's questions?

The answer is that proving a compiler correct is the same as proving
any other software correct. What can make it it easier than some
software is that there is no real-time, nondeterministic behavior. A
compiler should produce exactly the same result every time it is
invoked with the same inputs.

Also, compilers can be written using pure functional techniques, and
this is actually "easy" to do, compared to some other kinds of
tasks.  Recursion is natural for compiling because the input has a
nested, recursive structure, which the compiler follows.

Functional programs are easier to prove correct than imperative or
object-oriented programs which work by mutating the state in variables
and objects.

The compiler is expressed as a recursive function which handles all the
cases that occur (syntax shapes being translated).   You can then argue
that it is correct by induction: inspect that each case is performing a
correct transformation from source language to the intermediate language
for the construct which it is handling, and that it's correctly invoking
itself recursively for the sub-expressions that the input contains.
Also included must be an argument showing that the recursion terminates.

E.g. If you know that your compiler handles, say, an if/else statement
correctly and some kind of for loop also correctly, and it's all
clean recursion, you can confidently argue that if statements nested
in loops, and vice versa, or in each other, are also handled correctly.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
[I have bad news for you -- there are real compilers that produce different
results on different runs, due to things like it being loaded at different
memory addreses and hash tables of pointers end up in different orders. -John]

[toc] | [prev] | [next] | [standalone]

#3355

From	gah4 <gah4@u.washington.edu>
Date	2023-01-31 22:59 -0800
Message-ID	<23-02-001@comp.compilers>
In reply to	#3346

On Sunday, January 29, 2023 at 8:58:58 AM UTC-8, Roger L Costello wrote:

(snip)

> Sally Smith creates Programming_Language_2. Sally wants to leverage the
> compiler work done on Programming_Language_1. She considers two approaches:

> 1. Create a compiler front-end for Programming_Language_2 which compiles
> instances to Intermediate_Code_1.

> 2. Create a translator which converts instances of Programming_Language_2 into
> Programming_Language_1 instances.

The complication with this question is the definition of intermediate language.

Some compilers have a front end, middle end, and back end, with some form of
intermediate code between each. They might be more or less general.

Depending on the compiler writer (or writers), there may be thought and time
to make them more general, allowing for other languages.

But okay, choice 2, especially if the language 1 has a standard, can be useful.
(I was almost going to shorten "Programming_Language_1" to PL-1, but
then realized that some might read that wrong.)

That allows it to be used with other compilers, even ones not written yet.
On the other hand, it is restricted by the features of that language, which
sometimes make things complicated.

In the case of choice 1, you might be able to add new features to the
intermediate language, especially if you can join with its project.

Usually, though, there are some features that don't translate to
language 1.  Or translate in an extremely complicated way.

Usually the translated language is unreadable to most people who
actually know the language.

Some examples.  TeX was written in Pascal.  Well, not really.
It was written in Web, a language and preprocessor that generates
Pascal code to be compiled.  At some point, it was desired to have
it in C instead.  A program was written to translate much of
Pascal into C, but some things were not so easy.  So instead,
the features of the Web preprocessor were used, to convert to
a Pascal-like language, easier to convert to C.  This was made
possible by the macro facilities of Web, and with its change
file feature.

Another one is f2c, which translates Fortran to C, but uses
special library routines and otherwise not so readable output.

Could we have a standard intermediate language, with all
features needed, or that ever will be needed?
[The answer to that last question is no.  It's a very old bad idea,
with the first failed attempt UNCOL in 1958. -John]

[toc] | [prev] | [next] | [standalone]

#3358

From	gah4 <gah4@u.washington.edu>
Date	2023-02-02 16:24 -0800
Message-ID	<23-02-007@comp.compilers>
In reply to	#3355

(snip, I wrote)

> Could we have a standard intermediate language, with all
> features needed, or that ever will be needed?

> [The answer to that last question is no. It's a very old bad idea,
> with the first failed attempt UNCOL in 1958. -John]

I have wondered, though, about a standardized intermediate
for a processor family.  One could write compilers to generate it,
and then updated processors come along with updated code
generators.   Or even distribute intermediate code, and the
installer generates the right version for the processor.

This would have been especially useful for Itanium, which
(mostly) failed due to problems with code generation.
Since the whole idea is that the processor depends on the
code generator doing things in the right order.  That is, out
of order execution, but determined at compile time. Failure
to do that meant failure for the whole idea.

[Someone comes up with an intermediate language that works for a few
source languages and a few targets, and usually publishes a paper
about his breakthrough. Then people try to add more front ends and
back ends, the intermediate language gets impossibly complex and
buggy, and the project is quietly forgotten. I'd think the back end
problem is a lot easier now than it was in 1958 since everything is
twos complement arithmetic in 8-bit bytes, but the front end is still
daunting.
If you don't push it too far it's certainly possible to do a lot of
work in a largely machine-independent way as LLVM does. -John]

[toc] | [prev] | [next] | [standalone]

#3360

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2023-02-03 11:44 +0000
Message-ID	<23-02-011@comp.compilers>
In reply to	#3358

gah4 <gah4@u.washington.edu> writes:
>I have wondered, though, about a standardized intermediate
>for a processor family.  One could write compilers to generate it,
>and then updated processors come along with updated code
>generators.   Or even distribute intermediate code, and the
>installer generates the right version for the processor.
>
>This would have been especially useful for Itanium, which
>(mostly) failed due to problems with code generation.

I dispute the latter claim.  My take is that IA-64 failed because the
original assumption that in-order performance would exceed OoO
performance was wrong.  OoO processors surpassed in-order CPUs; they
managed to get higher clock rates (my guess is that this is due to
them having smaller feedback loops) and they benefit from better
branch prediction, which extends to 512-instruction reorder buffers on
recent Intel CPUs, far beyond what compilers can achieve on IA-64.
The death knell for IA-64 competetiveness was the introduction of SIMD
instruction set extensions which made OoO CPUs surpass IA-64 even in
those vectorizable codes where IA-64 had been competetive.

>Since the whole idea is that the processor depends on the
>code generator doing things in the right order.  That is, out
>of order execution, but determined at compile time. Failure
>to do that meant failure for the whole idea.

But essentially all sold IA-64 CPUs were variations of the McKinley
microarchitecture as far as performance characteristics were
concerned, especially during the time when IA-64 was still perceived
as relevant.  The next microarchitecture Poulson was only released in
2012 when IA-64 had already lost.

>[Someone comes up with an intermediate language that works for a few
>source languages and a few targets, and usually publishes a paper
>about his breakthrough. Then people try to add more front ends and
>back ends, the intermediate language gets impossibly complex and
>buggy, and the project is quietly forgotten. I'd think the back end
>problem is a lot easier now than it was in 1958 since everything is
>twos complement arithmetic in 8-bit bytes

Yes.  Computer architecture converges.  20 years ago we still had to
worry about alignment and byte ordering, nowadays alignment has been
settled in general-purpose CPUs (no alignment restrictions), and byte
ordering is mostly settled to little-endian (exception s390/s390x).

>If you don't push it too far it's certainly possible to do a lot of
>work in a largely machine-independent way as LLVM does. -John]

LLVM mostly supports what the hardware supports, but there is at least
one exception: LLVM IR divides the code into functions, that you can
only call, with LLVM handling the calling convention.  When someone
needs to deviate from the calling convention, like the Haskel/C--
people, LLVM provides some possibilities, but from what I read, they
had a hard time pulling it through.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[I'm with you on IA-64 and VLIW.  I knew the Multiflow people pretty
well and it is evident in retrospect that while it was a good fit
with what you could build in the 1980s, hardware soon reached the
point where it could do better what VLIW compilers had done. -John]

[toc] | [prev] | [next] | [standalone]

#3363 — Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?

From	gah4 <gah4@u.washington.edu>
Date	2023-02-03 19:13 -0800
Subject	Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?
Message-ID	<23-02-015@comp.compilers>
In reply to	#3360

On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote:

(snip,  I wrote)

> >This would have been especially useful for Itanium, which
> >(mostly) failed due to problems with code generation.

> I dispute the latter claim. My take is that IA-64 failed because the
> original assumption that in-order performance would exceed OoO
> performance was wrong. OoO processors surpassed in-order CPUs; they
> managed to get higher clock rates (my guess is that this is due to
> them having smaller feedback loops) and they benefit from better
> branch prediction, which extends to 512-instruction reorder buffers on
> recent Intel CPUs, far beyond what compilers can achieve on IA-64.
> The death knell for IA-64 competetiveness was the introduction of SIMD
> instruction set extensions which made OoO CPUs surpass IA-64 even in
> those vectorizable codes where IA-64 had been competetive.

I got interested in OoO in the days of the IBM 360/91, which is early
in the line of OoO processors.  Among others, the 91 has imprecise
interrupts, where the stored address is not the instruction after the
interrupt cause.

But okay, the biggest failure of Itanium is that it was two years or
so behind schedule when it came out.  And partly, as well as I
remember, is the need to implement x86 instructions, too.

> >Since the whole idea is that the processor depends on the
> >code generator doing things in the right order. That is, out
> >of order execution, but determined at compile time. Failure
> >to do that meant failure for the whole idea.

> But essentially all sold IA-64 CPUs were variations of the McKinley
> microarchitecture as far as performance characteristics were
> concerned, especially during the time when IA-64 was still perceived
> as relevant. The next microarchitecture Poulson was only released in
> 2012 when IA-64 had already lost.

But is it the whole idea of compile-time instruction scheduling the
cause of the failure, or just the way they did it?

The 360/91 had some interesting problems.  One is that it had 16 way
interleaved memory with a cycle time of 13 processor cycles, and the
goal of one instruction per clock cycle.  That means it is dependent on
memory address ordering, which is hard to know at compile time.

The slightly later 360/85, without the fancy OoO processor, but with
cache memory (the first machine with cache!) was about as fast
on real problems.

Otherwise, the 91 has a limited number of reservation stations,
limiting how for OoO it can go.  All OoO processors have a limit
to how far they can go. But the compiler does not have that limit.

Now, since transistors are cheap now, and one can throw a large
number into reorder buffers and such, one can build really deep
pipelines.

But the reason for bringing this up, is that if Intel had a defined
intermediate code, and supplied the back end that used it,
and even more, could update that back end later, that would have
been very convenient for compiler writers.

Even more, design for it could have been done in parallel with the
processor, making both work well together.

Reminds me of the 8087 virtual register stack.  The 8087 has
eight registers, but they were supposed to be a virtual stack.
They would be copied to/from memory on stack overflow
or underflow.  But no-one tried to write the interrupt routine
until after the hardware was made, and it turns out not to
be possible.  I never knew why that wasn't fixed for the 287
or 387, though.
[Multiflow found that VLIW compile-time instruction scheduling was
swell for code with predictable memory access patterns, much less so
for code with data-dependent access patterns. I doubt that has
changed.  And if the memory access is that predictable, you can
likely use SIMD instructions instead. -John]

[toc] | [prev] | [next] | [standalone]

#3364 — Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?

From	gah4 <gah4@u.washington.edu>
Date	2023-02-04 02:55 -0800
Subject	Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?
Message-ID	<23-02-017@comp.compilers>
In reply to	#3363

On Friday, February 3, 2023 at 7:51:41 PM UTC-8, gah4 wrote:

(snip on Itanium and OoO execution)

> The 360/91 had some interesting problems. One is that it had 16 way
> interleaved memory with a cycle time of 13 processor cycles, and the
> goal of one instruction per clock cycle. That means it is dependent on
> memory address ordering, which is hard to know at compile time.

(snip)

> But the reason for bringing this up, is that if Intel had a defined
> intermediate code, and supplied the back end that used it,
> and even more, could update that back end later, that would have
> been very convenient for compiler writers.

(snip, our moderator wrote)
> [Multiflow found that VLIW compile-time instruction scheduling was
> swell for code with predictable memory access patterns, much less so
> for code with data-dependent access patterns. I doubt that has
> changed. And if the memory access is that predictable, you can
> likely use SIMD instructions instead. -John]

So, 50 years ago we had the 360/91, and some CDC competition.

About 45 years ago, we had the Cray-1 and successor vector processors.

So, yes, SIMD processors.

Then 30 years ago, Fortran added the FORALL statement, to make writing
vector code easier.  Except that it doesn't.

FORALL, and array assignment statements, always operate such that
the effect is as if the whole right-hand side is evaluated before the left
side is changed.  Unless the whole thing fits in a vector register,
the compiler is still stuck.

And just about that time, vector processor go out of style.

(There is now DO CONCURRENT, which might be better.)

Much matrix code goes sequentially through memory, and programmers
know that.  (Or they are supposed to know that.)

Early Fortran had the FREQUENCY statement, such that one could
help the compiler know the likely iteration count for DO loops, and
branch probability for branch instructions.  Those would help with
static branch predictors, but are long gone now.

It would seem, though, that a statement telling the compiler the
expected memory access pattern would help.

And the Cray-1, like the 360/91, has 16-way interleaved
memory, 64 bits wide.
[Legend says that in one of the early Fortran compilers, the FREQUENCY
statement was accidentally implemented backward and nobody noticed. We
have found over and over that programmers' intuitions about the way
their programs perform are quite poor and we're much better off saying
what we mean and letting the compiler figure out how to do low level
optimizations. -John]

[toc] | [prev] | [next] | [standalone]

#3370 — Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2023-02-07 08:35 +0000
Subject	Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?
Message-ID	<23-02-027@comp.compilers>
In reply to	#3363

gah4 <gah4@u.washington.edu> writes:
>On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote:
>
>(snip,  I wrote)
>
>> >This would have been especially useful for Itanium, which
>> >(mostly) failed due to problems with code generation.
>
>> I dispute the latter claim. My take is that IA-64 failed because the
>> original assumption that in-order performance would exceed OoO
>> performance was wrong. OoO processors surpassed in-order CPUs; they
>> managed to get higher clock rates (my guess is that this is due to
>> them having smaller feedback loops) and they benefit from better
>> branch prediction, which extends to 512-instruction reorder buffers on
>> recent Intel CPUs, far beyond what compilers can achieve on IA-64.
>> The death knell for IA-64 competetiveness was the introduction of SIMD
>> instruction set extensions which made OoO CPUs surpass IA-64 even in
>> those vectorizable codes where IA-64 had been competetive.
...
>But okay, the biggest failure of Itanium is that it was two years or
>so behind schedule when it came out.

If it had had superior performance when McKinley came out in 2002,
that would have been just a minor speed bump on the way to success.

>And partly, as well as I
>remember, is the need to implement x86 instructions, too.

Certainly, backwards compatibility is paramount in the markets that it
was intended for.  And the 386 compatibility also had to be close to
competetive.  But it wasn't.

>But is it the whole idea of compile-time instruction scheduling the
>cause of the failure, or just the way they did it?

It seems to me that the idea that in-order execution combined with
architectural features for reducing dependencies and with compiler
scheduling turned out to be inferior to out-of-order execution.
Moreover, in those areas where the compiler works well (numerical
code), it also works ok for using SIMD instructions
(auto-vectorization) which could be added relatively cheaply to the
hardware.

>All OoO processors have a limit
>to how far they can go. But the compiler does not have that limit.

And the compiler can make more sophisticated scheduling decisions
based on the critical path, while the hardware scheduler picks the
oldest ready instruction.  These were the ideas that seduced Intel,
HP, and Transmeta to invest huge amounts of money into EPIC ideas.

But the compiler has other limits.  It cannot schedule across indirect
calls (used in object-oriented dispatch), across compilation unit
boundaries (in particular, calls to and returns from shared
libraries).  Another important limit is the predictability of
branches.  Static branch prediction using profiles has ~10%
mispredictions, while (hardware) dynamic branch prediction has a much
lower misprediction rate (I remember numbers like 3% (for the same
benchmarks that have 10% mispredictions with static branch prediction)
in papers from the last century; I expect that this has improved even
more in the meantime.  If the compiler mispredicts, it will schedule
instructions from the wrong path, instructions that will be useless
for execution.

In the end a compiler typically can schedule across a few dozen
instructions, while hardware can schedule across a few hundred.
Compiler scheduling works well for simple loops, and that's where
IA-64 shone, but only doing loops well is not good enough for
general-purpose software.

>Now, since transistors are cheap now, and one can throw a large
>number into reorder buffers and such, one can build really deep
>pipelines.

It's interesting that Intel managed to produce their first OoO CPU
(the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
first Itanium) at 25.4M transistors was too large for the 250nm and
they had to switch to 180nm (contributing to the delays).  So, while
the theory was that the EPIC principle would reduce the hardware
complexity (to allow adding more functional units for increased
performance), in Itanium practice the hardware was more complex, and
the performance advantages did not appear.

>But the reason for bringing this up, is that if Intel had a defined
>intermediate code, and supplied the back end that used it,
>and even more, could update that back end later, that would have
>been very convenient for compiler writers.

Something like this happened roughly at the same time with LLVM.
There were other initiatives, but LLVM was the one that succeeded.

There was the Open Research Compiler for IA-64 from Intel and the
Chinese Academy of Sciences.

SGI released their compiler targeted at IA-64 as Open64.

>Even more, design for it could have been done in parallel with the
>processor, making both work well together.

Intel, HP, and others worked on compilers in parallel to the hardware
work.  It's just that the result did not perform as well for
general-purpose code as OoO processors with conventional compilers.

>[Multiflow found that VLIW compile-time instruction scheduling was
>swell for code with predictable memory access patterns, much less so
>for code with data-dependent access patterns.

Multiflow and Cydrome built computers for numerical computations (aka
HPC aka (mini-)supercomputing).  These tend to spend a lot of time in
simple loops with high iteration counts and have statically
predictable branches.  They found compiler techniques like trace
scheduling (Multiflow) that work well for code with predictable
branches, and modulo schedling (Cydrome) that work well for simple
loops with high iteration counts.  The IA-64 and Transmeta architects
wanted to extend these successes to general-purpose computing, but it
did not work out.

Concerning memory access patterns: While they are not particularly
predictable in general-purpose code, mostd general-purpose code
benefits quite a lot from caches (more than numerical code), so I
don't think that this was a big problem for IA-64.

Some people mention varying latencies due to caches as a problem, but
the choice of a small L1 cache (16KB compared to 64KB for the earlier
21264 and K7 CPUs) for McKinley indicates that average latency was
more of a problem for IA-64 than varying latencies.

>And if the memory access is that predictable, you can
>likely use SIMD instructions instead. -John]

Yes, SIMD ate EPICs lunch on the numerical program side, leaving few
programs where IA-64 outdid the mainstream.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#3372 — Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?

From	gah4 <gah4@u.washington.edu>
Date	2023-02-08 01:04 -0800
Subject	Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?
Message-ID	<23-02-030@comp.compilers>
In reply to	#3370

On Tuesday, February 7, 2023 at 6:24:44 PM UTC-8, Anton Ertl wrote:

(snip)

> >All OoO processors have a limit
> >to how far they can go. But the compiler does not have that limit.

> And the compiler can make more sophisticated scheduling decisions
> based on the critical path, while the hardware scheduler picks the
> oldest ready instruction. These were the ideas that seduced Intel,
> HP, and Transmeta to invest huge amounts of money into EPIC ideas.

> But the compiler has other limits. It cannot schedule across indirect
> calls (used in object-oriented dispatch), across compilation unit
> boundaries (in particular, calls to and returns from shared
> libraries).

OK, maybe we are getting closer.
Many processors now have speculative execution.
If you don't know what else to do, execute some instructions
just in case, but don't change actual memory or registers yet.

And Itanium doesn't do that.  I don't see, though, that it isn't
possible to have EPIC and also speculative execution.


> Another important limit is the predictability of
> branches. Static branch prediction using profiles has ~10%
> mispredictions, while (hardware) dynamic branch prediction has a much
> lower misprediction rate (I remember numbers like 3% (for the same
> benchmarks that have 10% mispredictions with static branch prediction)
> in papers from the last century; I expect that this has improved even
> more in the meantime. If the compiler mispredicts, it will schedule
> instructions from the wrong path, instructions that will be useless
> for execution.

OK, but we just discussed speculative execution.  So you can execute
two paths or maybe three.


> In the end a compiler typically can schedule across a few dozen
> instructions, while hardware can schedule across a few hundred.
> Compiler scheduling works well for simple loops, and that's where
> IA-64 shone, but only doing loops well is not good enough for
> general-purpose software.

OK, loops were important for the 360/91, meant for floating point
number crunching.  (Only floating point is OoO.)
And small inner loops are usual in those programs.  Among those
is loop mode, where it keeps the instructions, and doesn't have
to keep fetching them.

But also, the 360/91 was designed to run code written for any
S/360 model. So, no special ordering and even self-modifying
code.  Even more, S/360 only has four floating point registers,
so register renaming was important.  Instructions had to be
ordered for use of registers, where out-of-order and register
renaming could fix that.

Okay, so I am not saying that EPIC has to get all the ordering,
but only close enough. So, ones now can keep hundreds of
instructions in execution at the same time?

> >Now, since transistors are cheap now, and one can throw a large
> >number into reorder buffers and such, one can build really deep
> >pipelines.

> It's interesting that Intel managed to produce their first OoO CPU
> (the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the
> first Itanium) at 25.4M transistors was too large for the 250nm and
> they had to switch to 180nm (contributing to the delays). So, while
> the theory was that the EPIC principle would reduce the hardware
> complexity (to allow adding more functional units for increased
> performance), in Itanium practice the hardware was more complex, and
> the performance advantages did not appear.

I remember lots of stories about how PPro didn't do well with the 16 bit
code still in much of DOS and Windows.  Enough that it was slower
than Pentium. I am not sure by now how much OO the PPro does.

> >But the reason for bringing this up, is that if Intel had a defined
> >intermediate code, and supplied the back end that used it,
> >and even more, could update that back end later, that would have
> >been very convenient for compiler writers.

> Something like this happened roughly at the same time with LLVM.
> There were other initiatives, but LLVM was the one that succeeded.

> There was the Open Research Compiler for IA-64 from Intel and the
> Chinese Academy of Sciences.

> SGI released their compiler targeted at IA-64 as Open64.

> >Even more, design for it could have been done in parallel with the
> >processor, making both work well together.

> Intel, HP, and others worked on compilers in parallel to the hardware
> work. It's just that the result did not perform as well for
> general-purpose code as OoO processors with conventional compilers.

> >[Multiflow found that VLIW compile-time instruction scheduling was
> >swell for code with predictable memory access patterns, much less so
> >for code with data-dependent access patterns.

> Multiflow and Cydrome built computers for numerical computations (aka
> HPC aka (mini-)supercomputing). These tend to spend a lot of time in
> simple loops with high iteration counts and have statically
> predictable branches. They found compiler techniques like trace
> scheduling (Multiflow) that work well for code with predictable
> branches, and modulo schedling (Cydrome) that work well for simple
> loops with high iteration counts. The IA-64 and Transmeta architects
> wanted to extend these successes to general-purpose computing, but it
> did not work out.

I sometimes work on computational physics problems. The ones with
tight loops of floating point operations.  I believe that some Itanium
systems went into a large cluster somewhere.  I got an RX2600
for $100 some years ago, when there were many on eBay.
(You could even buy three at a time.)

But yes, I suspect for web servers, that they aren't especially good.

> Concerning memory access patterns: While they are not particularly
> predictable in general-purpose code, mostd general-purpose code
> benefits quite a lot from caches (more than numerical code), so I
> don't think that this was a big problem for IA-64.

> Some people mention varying latencies due to caches as a problem, but
> the choice of a small L1 cache (16KB compared to 64KB for the earlier
> 21264 and K7 CPUs) for McKinley indicates that average latency was
> more of a problem for IA-64 than varying latencies.

> >And if the memory access is that predictable, you can
> >likely use SIMD instructions instead. -John]

> Yes, SIMD ate EPICs lunch on the numerical program side, leaving few
> programs where IA-64 outdid the mainstream.

The Cray series of vector processors seems long gone by now.

[toc] | [prev] | [standalone]

csiph-web

Are there different programming languages that are compiled to the same intermediate language?

Contents

#3346 — Are there different programming languages that are compiled to the same intermediate language?

#3347

#3350

#3351

#3352

#3353

#3355

#3358

#3360

#3363 — Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?

#3364 — Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?

#3370 — Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?

#3372 — Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language?