Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3346 > unrolled thread
| Started by | Roger L Costello <costello@mitre.org> |
|---|---|
| First post | 2023-01-29 15:49 +0000 |
| Last post | 2023-02-08 01:04 -0800 |
| Articles | 13 — 7 participants |
Back to article view | Back to comp.compilers
Are there different programming languages that are compiled to the same intermediate language? Roger L Costello <costello@mitre.org> - 2023-01-29 15:49 +0000
Re: Are there different programming languages that are compiled to the same intermediate language? Thomas Koenig <tkoenig@netcologne.de> - 2023-01-29 22:14 +0000
Re: Are there different programming languages that are compiled to the same intermediate language? arnold@freefriends.org (Aharon Robbins) - 2023-01-30 08:03 +0000
Re: Are there different programming languages that are compiled to the same intermediate language? William Fahle <billfahle@gmail.com> - 2023-01-30 02:00 -0800
Re: Are there different programming languages that are compiled to the same intermediate language? Roger L Costello <costello@mitre.org> - 2023-01-30 14:36 +0000
Re: Are there different programming languages that are compiled to the same intermediate language? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-01-30 17:36 +0000
Re: Are there different programming languages that are compiled to the same intermediate language? gah4 <gah4@u.washington.edu> - 2023-01-31 22:59 -0800
Re: Are there different programming languages that are compiled to the same intermediate language? gah4 <gah4@u.washington.edu> - 2023-02-02 16:24 -0800
Re: Are there different programming languages that are compiled to the same intermediate language? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2023-02-03 11:44 +0000
Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? gah4 <gah4@u.washington.edu> - 2023-02-03 19:13 -0800
Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? gah4 <gah4@u.washington.edu> - 2023-02-04 02:55 -0800
Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2023-02-07 08:35 +0000
Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? gah4 <gah4@u.washington.edu> - 2023-02-08 01:04 -0800
| From | Roger L Costello <costello@mitre.org> |
|---|---|
| Date | 2023-01-29 15:49 +0000 |
| Subject | Are there different programming languages that are compiled to the same intermediate language? |
| Message-ID | <23-01-078@comp.compilers> |
Hi Folks, Programming_Language_1 has a compiler which generates Intermediate_Code_1. Many backends are created for Intermediate_Code_1. Time passes .... Sally Smith creates Programming_Language_2. Sally wants to leverage the compiler work done on Programming_Language_1. She considers two approaches: 1. Create a compiler front-end for Programming_Language_2 which compiles instances to Intermediate_Code_1. 2. Create a translator which converts instances of Programming_Language_2 into Programming_Language_1 instances. Sally is concerned. She asks herself: - With either approach, how do I prove that my mapping is correct? - For approach 1, how do I prove that the Intermediate_Code_1 that I generate is correct for my programming language? - For approach 2, how do I prove that the instance of Programming_Language_1 that I generate is semantically equivalent to my Programming_Language_2 instance?" What is the answer to Sally's questions? /Roger [I think the answer either way is that you don't, and you try to have a test suite that is as comprehensive as possible. -John]
[toc] | [next] | [standalone]
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2023-01-29 22:14 +0000 |
| Message-ID | <23-01-080@comp.compilers> |
| In reply to | #3346 |
Roger L Costello <costello@mitre.org> schrieb: > Hi Folks, > > Programming_Language_1 has a compiler which generates Intermediate_Code_1. > Many backends are created for Intermediate_Code_1. > > Time passes .... > > Sally Smith creates Programming_Language_2. Sally wants to leverage the > compiler work done on Programming_Language_1. She considers two approaches: > > 1. Create a compiler front-end for Programming_Language_2 which compiles > instances to Intermediate_Code_1. Compilers have been doing that for a very long time. I remember reading that the IBM PL.8 compiler used a target-independent intermediate representation, and gcc and LLVM both adopt this. Makes sense - most of the optimizations are target-indpendent. Typically, SSA (single static assignment) is used these days because it offers many optimization opportunites. > 2. Create a translator which converts instances of Programming_Language_2 into > Programming_Language_1 instances. That is also an approach, which works if the Programming_Language_2 is sufficiently low-level, and yet sufficiently flexible, to allow a reasonable representation. In practice, this usually means C, where things like pointer arithmetic can be used to implement multi-dimensional arrays. Fortran is a prime example. Both the well-known f2c translator, based on the Bell Labs Fortran 77 compiler, and the NAG Fortran comppiler use C as intermediate language. > Sally is concerned. She asks herself: > > - With either approach, how do I prove that my mapping is correct? Usually, she can't. Compiler front ends are well-known to have bugs, and it would be very hard to prove their correctness, especially if the bugs are in semantics, which are usually not described in a formal way. > - For approach 1, how do I prove that the Intermediate_Code_1 that I generate > is correct for my programming language? See above. > - For approach 2, how do I prove that the instance of Programming_Language_1 > that I generate is semantically equivalent to my Programming_Language_2 > instance?" Same problem. [Target-independent intermediate forms work OK when the source languages are semantically similar, e.g. C and Fortran, and range of target architectures is limited, e.g., two's complement machines with 8-bit byte flat addressing. Every few years someone comes up with the bright idea to do an all purpose intermediate language, which never works. See UNCOL around 1960 for the first time that idea failed. By the way, f2c is based on f77 which compiled Fortran into the intermediate language the PDP-11 C compiler used. -John]
[toc] | [prev] | [next] | [standalone]
| From | arnold@freefriends.org (Aharon Robbins) |
|---|---|
| Date | 2023-01-30 08:03 +0000 |
| Message-ID | <23-01-084@comp.compilers> |
| In reply to | #3347 |
Our esteemed moderator writes: >By the way, f2c is based on f77 which compiled Fortran into the intermediate >language the PDP-11 C compiler used. -John] Just to be clear, it was the intermediate language of Johnson's PCC, which was first targeted at the Interdata, then the PDP-11 and Vax, not Dennis Ritchie's original PDP-11 compiler. Arnold [Good point. I wrote a competing F77 compiler INfort which did use the Ritchie intermediate language. Someone else later modified it to use pcc. -John]
[toc] | [prev] | [next] | [standalone]
| From | William Fahle <billfahle@gmail.com> |
|---|---|
| Date | 2023-01-30 02:00 -0800 |
| Message-ID | <23-01-086@comp.compilers> |
| In reply to | #3346 |
On Sunday, January 29, 2023 at 10:58:58 AM UTC-6, Roger L Costello wrote: > Sally is concerned. She asks herself: > > - With either approach, how do I prove that my mapping is correct? > - For approach 1, how do I prove that the Intermediate_Code_1 that I generate > is correct for my programming language? > - For approach 2, how do I prove that the instance of Programming_Language_1 > that I generate is semantically equivalent to my Programming_Language_2 > instance?" > > What is the answer to Sally's questions? > > /Roger > [I think the answer either way is that you don't, and you try to have a test > suite that is as comprehensive as possible. -John] This is Post's Correspondence Problem, which has been proven to be equivalent to the Halting Problem, thus uncomputable. [Remember that while the halting problem is insoluble in general, it's often soluble in specific cases, e.g. "halt" or "foo: goto foo". Nevertheless, I would be quite surprised if anyone could prove a non-toy translator correct. -John]
[toc] | [prev] | [next] | [standalone]
| From | Roger L Costello <costello@mitre.org> |
|---|---|
| Date | 2023-01-30 14:36 +0000 |
| Message-ID | <23-01-087@comp.compilers> |
| In reply to | #3346 |
I wrote: >> 2. Create a translator which converts instances of >> Programming_Language_2 into >> Programming_Language_1 instances. Thomas Koenig responded: > That is also an approach, which works if the > Programming_Language_2 is sufficiently low-level Is that a typo? Do you mean Programming_Language_1 is sufficiently low level? Recall that the approach is to translate instances of Programming_Language_2 to instances of Programming_Language_1. I will assume you meant Programming_Language_1. So you are saying that the target language (Programming_Language_1) must be a lower level language than the source language (Programming_Language_2), right? John Levine wrote: > Target-independent intermediate forms work OK > when the source languages are semantically similar How to determine if two source languages are semantically similar? /Roger [Not to snark too much, but you know when when you see it. C and Fortran are similar, COBOL and Lisp are not. -John]
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-01-30 17:36 +0000 |
| Message-ID | <23-01-090@comp.compilers> |
| In reply to | #3346 |
On 2023-01-29, Roger L Costello <costello@mitre.org> wrote: > Hi Folks, > > Programming_Language_1 has a compiler which generates Intermediate_Code_1. > Many backends are created for Intermediate_Code_1. > > Time passes .... > > Sally Smith creates Programming_Language_2. Sally wants to leverage the > compiler work done on Programming_Language_1. She considers two approaches: > > 1. Create a compiler front-end for Programming_Language_2 which compiles > instances to Intermediate_Code_1. > 2. Create a translator which converts instances of Programming_Language_2 into > Programming_Language_1 instances. > > Sally is concerned. She asks herself: > > - With either approach, how do I prove that my mapping is correct? > - For approach 1, how do I prove that the Intermediate_Code_1 that I generate > is correct for my programming language? > - For approach 2, how do I prove that the instance of Programming_Language_1 > that I generate is semantically equivalent to my Programming_Language_2 > instance?" > > What is the answer to Sally's questions? The answer is that proving a compiler correct is the same as proving any other software correct. What can make it it easier than some software is that there is no real-time, nondeterministic behavior. A compiler should produce exactly the same result every time it is invoked with the same inputs. Also, compilers can be written using pure functional techniques, and this is actually "easy" to do, compared to some other kinds of tasks. Recursion is natural for compiling because the input has a nested, recursive structure, which the compiler follows. Functional programs are easier to prove correct than imperative or object-oriented programs which work by mutating the state in variables and objects. The compiler is expressed as a recursive function which handles all the cases that occur (syntax shapes being translated). You can then argue that it is correct by induction: inspect that each case is performing a correct transformation from source language to the intermediate language for the construct which it is handling, and that it's correctly invoking itself recursively for the sub-expressions that the input contains. Also included must be an argument showing that the recursion terminates. E.g. If you know that your compiler handles, say, an if/else statement correctly and some kind of for loop also correctly, and it's all clean recursion, you can confidently argue that if statements nested in loops, and vice versa, or in each other, are also handled correctly. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca [I have bad news for you -- there are real compilers that produce different results on different runs, due to things like it being loaded at different memory addreses and hash tables of pointers end up in different orders. -John]
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2023-01-31 22:59 -0800 |
| Message-ID | <23-02-001@comp.compilers> |
| In reply to | #3346 |
On Sunday, January 29, 2023 at 8:58:58 AM UTC-8, Roger L Costello wrote: (snip) > Sally Smith creates Programming_Language_2. Sally wants to leverage the > compiler work done on Programming_Language_1. She considers two approaches: > 1. Create a compiler front-end for Programming_Language_2 which compiles > instances to Intermediate_Code_1. > 2. Create a translator which converts instances of Programming_Language_2 into > Programming_Language_1 instances. The complication with this question is the definition of intermediate language. Some compilers have a front end, middle end, and back end, with some form of intermediate code between each. They might be more or less general. Depending on the compiler writer (or writers), there may be thought and time to make them more general, allowing for other languages. But okay, choice 2, especially if the language 1 has a standard, can be useful. (I was almost going to shorten "Programming_Language_1" to PL-1, but then realized that some might read that wrong.) That allows it to be used with other compilers, even ones not written yet. On the other hand, it is restricted by the features of that language, which sometimes make things complicated. In the case of choice 1, you might be able to add new features to the intermediate language, especially if you can join with its project. Usually, though, there are some features that don't translate to language 1. Or translate in an extremely complicated way. Usually the translated language is unreadable to most people who actually know the language. Some examples. TeX was written in Pascal. Well, not really. It was written in Web, a language and preprocessor that generates Pascal code to be compiled. At some point, it was desired to have it in C instead. A program was written to translate much of Pascal into C, but some things were not so easy. So instead, the features of the Web preprocessor were used, to convert to a Pascal-like language, easier to convert to C. This was made possible by the macro facilities of Web, and with its change file feature. Another one is f2c, which translates Fortran to C, but uses special library routines and otherwise not so readable output. Could we have a standard intermediate language, with all features needed, or that ever will be needed? [The answer to that last question is no. It's a very old bad idea, with the first failed attempt UNCOL in 1958. -John]
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2023-02-02 16:24 -0800 |
| Message-ID | <23-02-007@comp.compilers> |
| In reply to | #3355 |
(snip, I wrote) > Could we have a standard intermediate language, with all > features needed, or that ever will be needed? > [The answer to that last question is no. It's a very old bad idea, > with the first failed attempt UNCOL in 1958. -John] I have wondered, though, about a standardized intermediate for a processor family. One could write compilers to generate it, and then updated processors come along with updated code generators. Or even distribute intermediate code, and the installer generates the right version for the processor. This would have been especially useful for Itanium, which (mostly) failed due to problems with code generation. Since the whole idea is that the processor depends on the code generator doing things in the right order. That is, out of order execution, but determined at compile time. Failure to do that meant failure for the whole idea. [Someone comes up with an intermediate language that works for a few source languages and a few targets, and usually publishes a paper about his breakthrough. Then people try to add more front ends and back ends, the intermediate language gets impossibly complex and buggy, and the project is quietly forgotten. I'd think the back end problem is a lot easier now than it was in 1958 since everything is twos complement arithmetic in 8-bit bytes, but the front end is still daunting. If you don't push it too far it's certainly possible to do a lot of work in a largely machine-independent way as LLVM does. -John]
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2023-02-03 11:44 +0000 |
| Message-ID | <23-02-011@comp.compilers> |
| In reply to | #3358 |
gah4 <gah4@u.washington.edu> writes: >I have wondered, though, about a standardized intermediate >for a processor family. One could write compilers to generate it, >and then updated processors come along with updated code >generators. Or even distribute intermediate code, and the >installer generates the right version for the processor. > >This would have been especially useful for Itanium, which >(mostly) failed due to problems with code generation. I dispute the latter claim. My take is that IA-64 failed because the original assumption that in-order performance would exceed OoO performance was wrong. OoO processors surpassed in-order CPUs; they managed to get higher clock rates (my guess is that this is due to them having smaller feedback loops) and they benefit from better branch prediction, which extends to 512-instruction reorder buffers on recent Intel CPUs, far beyond what compilers can achieve on IA-64. The death knell for IA-64 competetiveness was the introduction of SIMD instruction set extensions which made OoO CPUs surpass IA-64 even in those vectorizable codes where IA-64 had been competetive. >Since the whole idea is that the processor depends on the >code generator doing things in the right order. That is, out >of order execution, but determined at compile time. Failure >to do that meant failure for the whole idea. But essentially all sold IA-64 CPUs were variations of the McKinley microarchitecture as far as performance characteristics were concerned, especially during the time when IA-64 was still perceived as relevant. The next microarchitecture Poulson was only released in 2012 when IA-64 had already lost. >[Someone comes up with an intermediate language that works for a few >source languages and a few targets, and usually publishes a paper >about his breakthrough. Then people try to add more front ends and >back ends, the intermediate language gets impossibly complex and >buggy, and the project is quietly forgotten. I'd think the back end >problem is a lot easier now than it was in 1958 since everything is >twos complement arithmetic in 8-bit bytes Yes. Computer architecture converges. 20 years ago we still had to worry about alignment and byte ordering, nowadays alignment has been settled in general-purpose CPUs (no alignment restrictions), and byte ordering is mostly settled to little-endian (exception s390/s390x). >If you don't push it too far it's certainly possible to do a lot of >work in a largely machine-independent way as LLVM does. -John] LLVM mostly supports what the hardware supports, but there is at least one exception: LLVM IR divides the code into functions, that you can only call, with LLVM handling the calling convention. When someone needs to deviate from the calling convention, like the Haskel/C-- people, LLVM provides some possibilities, but from what I read, they had a hard time pulling it through. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/ [I'm with you on IA-64 and VLIW. I knew the Multiflow people pretty well and it is evident in retrospect that while it was a good fit with what you could build in the 1980s, hardware soon reached the point where it could do better what VLIW compilers had done. -John]
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2023-02-03 19:13 -0800 |
| Subject | Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? |
| Message-ID | <23-02-015@comp.compilers> |
| In reply to | #3360 |
On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote: (snip, I wrote) > >This would have been especially useful for Itanium, which > >(mostly) failed due to problems with code generation. > I dispute the latter claim. My take is that IA-64 failed because the > original assumption that in-order performance would exceed OoO > performance was wrong. OoO processors surpassed in-order CPUs; they > managed to get higher clock rates (my guess is that this is due to > them having smaller feedback loops) and they benefit from better > branch prediction, which extends to 512-instruction reorder buffers on > recent Intel CPUs, far beyond what compilers can achieve on IA-64. > The death knell for IA-64 competetiveness was the introduction of SIMD > instruction set extensions which made OoO CPUs surpass IA-64 even in > those vectorizable codes where IA-64 had been competetive. I got interested in OoO in the days of the IBM 360/91, which is early in the line of OoO processors. Among others, the 91 has imprecise interrupts, where the stored address is not the instruction after the interrupt cause. But okay, the biggest failure of Itanium is that it was two years or so behind schedule when it came out. And partly, as well as I remember, is the need to implement x86 instructions, too. > >Since the whole idea is that the processor depends on the > >code generator doing things in the right order. That is, out > >of order execution, but determined at compile time. Failure > >to do that meant failure for the whole idea. > But essentially all sold IA-64 CPUs were variations of the McKinley > microarchitecture as far as performance characteristics were > concerned, especially during the time when IA-64 was still perceived > as relevant. The next microarchitecture Poulson was only released in > 2012 when IA-64 had already lost. But is it the whole idea of compile-time instruction scheduling the cause of the failure, or just the way they did it? The 360/91 had some interesting problems. One is that it had 16 way interleaved memory with a cycle time of 13 processor cycles, and the goal of one instruction per clock cycle. That means it is dependent on memory address ordering, which is hard to know at compile time. The slightly later 360/85, without the fancy OoO processor, but with cache memory (the first machine with cache!) was about as fast on real problems. Otherwise, the 91 has a limited number of reservation stations, limiting how for OoO it can go. All OoO processors have a limit to how far they can go. But the compiler does not have that limit. Now, since transistors are cheap now, and one can throw a large number into reorder buffers and such, one can build really deep pipelines. But the reason for bringing this up, is that if Intel had a defined intermediate code, and supplied the back end that used it, and even more, could update that back end later, that would have been very convenient for compiler writers. Even more, design for it could have been done in parallel with the processor, making both work well together. Reminds me of the 8087 virtual register stack. The 8087 has eight registers, but they were supposed to be a virtual stack. They would be copied to/from memory on stack overflow or underflow. But no-one tried to write the interrupt routine until after the hardware was made, and it turns out not to be possible. I never knew why that wasn't fixed for the 287 or 387, though. [Multiflow found that VLIW compile-time instruction scheduling was swell for code with predictable memory access patterns, much less so for code with data-dependent access patterns. I doubt that has changed. And if the memory access is that predictable, you can likely use SIMD instructions instead. -John]
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2023-02-04 02:55 -0800 |
| Subject | Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? |
| Message-ID | <23-02-017@comp.compilers> |
| In reply to | #3363 |
On Friday, February 3, 2023 at 7:51:41 PM UTC-8, gah4 wrote: (snip on Itanium and OoO execution) > The 360/91 had some interesting problems. One is that it had 16 way > interleaved memory with a cycle time of 13 processor cycles, and the > goal of one instruction per clock cycle. That means it is dependent on > memory address ordering, which is hard to know at compile time. (snip) > But the reason for bringing this up, is that if Intel had a defined > intermediate code, and supplied the back end that used it, > and even more, could update that back end later, that would have > been very convenient for compiler writers. (snip, our moderator wrote) > [Multiflow found that VLIW compile-time instruction scheduling was > swell for code with predictable memory access patterns, much less so > for code with data-dependent access patterns. I doubt that has > changed. And if the memory access is that predictable, you can > likely use SIMD instructions instead. -John] So, 50 years ago we had the 360/91, and some CDC competition. About 45 years ago, we had the Cray-1 and successor vector processors. So, yes, SIMD processors. Then 30 years ago, Fortran added the FORALL statement, to make writing vector code easier. Except that it doesn't. FORALL, and array assignment statements, always operate such that the effect is as if the whole right-hand side is evaluated before the left side is changed. Unless the whole thing fits in a vector register, the compiler is still stuck. And just about that time, vector processor go out of style. (There is now DO CONCURRENT, which might be better.) Much matrix code goes sequentially through memory, and programmers know that. (Or they are supposed to know that.) Early Fortran had the FREQUENCY statement, such that one could help the compiler know the likely iteration count for DO loops, and branch probability for branch instructions. Those would help with static branch predictors, but are long gone now. It would seem, though, that a statement telling the compiler the expected memory access pattern would help. And the Cray-1, like the 360/91, has 16-way interleaved memory, 64 bits wide. [Legend says that in one of the early Fortran compilers, the FREQUENCY statement was accidentally implemented backward and nobody noticed. We have found over and over that programmers' intuitions about the way their programs perform are quite poor and we're much better off saying what we mean and letting the compiler figure out how to do low level optimizations. -John]
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2023-02-07 08:35 +0000 |
| Subject | Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? |
| Message-ID | <23-02-027@comp.compilers> |
| In reply to | #3363 |
gah4 <gah4@u.washington.edu> writes: >On Friday, February 3, 2023 at 10:17:06 AM UTC-8, Anton Ertl wrote: > >(snip, I wrote) > >> >This would have been especially useful for Itanium, which >> >(mostly) failed due to problems with code generation. > >> I dispute the latter claim. My take is that IA-64 failed because the >> original assumption that in-order performance would exceed OoO >> performance was wrong. OoO processors surpassed in-order CPUs; they >> managed to get higher clock rates (my guess is that this is due to >> them having smaller feedback loops) and they benefit from better >> branch prediction, which extends to 512-instruction reorder buffers on >> recent Intel CPUs, far beyond what compilers can achieve on IA-64. >> The death knell for IA-64 competetiveness was the introduction of SIMD >> instruction set extensions which made OoO CPUs surpass IA-64 even in >> those vectorizable codes where IA-64 had been competetive. ... >But okay, the biggest failure of Itanium is that it was two years or >so behind schedule when it came out. If it had had superior performance when McKinley came out in 2002, that would have been just a minor speed bump on the way to success. >And partly, as well as I >remember, is the need to implement x86 instructions, too. Certainly, backwards compatibility is paramount in the markets that it was intended for. And the 386 compatibility also had to be close to competetive. But it wasn't. >But is it the whole idea of compile-time instruction scheduling the >cause of the failure, or just the way they did it? It seems to me that the idea that in-order execution combined with architectural features for reducing dependencies and with compiler scheduling turned out to be inferior to out-of-order execution. Moreover, in those areas where the compiler works well (numerical code), it also works ok for using SIMD instructions (auto-vectorization) which could be added relatively cheaply to the hardware. >All OoO processors have a limit >to how far they can go. But the compiler does not have that limit. And the compiler can make more sophisticated scheduling decisions based on the critical path, while the hardware scheduler picks the oldest ready instruction. These were the ideas that seduced Intel, HP, and Transmeta to invest huge amounts of money into EPIC ideas. But the compiler has other limits. It cannot schedule across indirect calls (used in object-oriented dispatch), across compilation unit boundaries (in particular, calls to and returns from shared libraries). Another important limit is the predictability of branches. Static branch prediction using profiles has ~10% mispredictions, while (hardware) dynamic branch prediction has a much lower misprediction rate (I remember numbers like 3% (for the same benchmarks that have 10% mispredictions with static branch prediction) in papers from the last century; I expect that this has improved even more in the meantime. If the compiler mispredicts, it will schedule instructions from the wrong path, instructions that will be useless for execution. In the end a compiler typically can schedule across a few dozen instructions, while hardware can schedule across a few hundred. Compiler scheduling works well for simple loops, and that's where IA-64 shone, but only doing loops well is not good enough for general-purpose software. >Now, since transistors are cheap now, and one can throw a large >number into reorder buffers and such, one can build really deep >pipelines. It's interesting that Intel managed to produce their first OoO CPU (the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the first Itanium) at 25.4M transistors was too large for the 250nm and they had to switch to 180nm (contributing to the delays). So, while the theory was that the EPIC principle would reduce the hardware complexity (to allow adding more functional units for increased performance), in Itanium practice the hardware was more complex, and the performance advantages did not appear. >But the reason for bringing this up, is that if Intel had a defined >intermediate code, and supplied the back end that used it, >and even more, could update that back end later, that would have >been very convenient for compiler writers. Something like this happened roughly at the same time with LLVM. There were other initiatives, but LLVM was the one that succeeded. There was the Open Research Compiler for IA-64 from Intel and the Chinese Academy of Sciences. SGI released their compiler targeted at IA-64 as Open64. >Even more, design for it could have been done in parallel with the >processor, making both work well together. Intel, HP, and others worked on compilers in parallel to the hardware work. It's just that the result did not perform as well for general-purpose code as OoO processors with conventional compilers. >[Multiflow found that VLIW compile-time instruction scheduling was >swell for code with predictable memory access patterns, much less so >for code with data-dependent access patterns. Multiflow and Cydrome built computers for numerical computations (aka HPC aka (mini-)supercomputing). These tend to spend a lot of time in simple loops with high iteration counts and have statically predictable branches. They found compiler techniques like trace scheduling (Multiflow) that work well for code with predictable branches, and modulo schedling (Cydrome) that work well for simple loops with high iteration counts. The IA-64 and Transmeta architects wanted to extend these successes to general-purpose computing, but it did not work out. Concerning memory access patterns: While they are not particularly predictable in general-purpose code, mostd general-purpose code benefits quite a lot from caches (more than numerical code), so I don't think that this was a big problem for IA-64. Some people mention varying latencies due to caches as a problem, but the choice of a small L1 cache (16KB compared to 64KB for the earlier 21264 and K7 CPUs) for McKinley indicates that average latency was more of a problem for IA-64 than varying latencies. >And if the memory access is that predictable, you can >likely use SIMD instructions instead. -John] Yes, SIMD ate EPICs lunch on the numerical program side, leaving few programs where IA-64 outdid the mainstream. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2023-02-08 01:04 -0800 |
| Subject | Re: OoO, VLIW, Are there different programming languages that are compiled to the same intermediate language? |
| Message-ID | <23-02-030@comp.compilers> |
| In reply to | #3370 |
On Tuesday, February 7, 2023 at 6:24:44 PM UTC-8, Anton Ertl wrote: (snip) > >All OoO processors have a limit > >to how far they can go. But the compiler does not have that limit. > And the compiler can make more sophisticated scheduling decisions > based on the critical path, while the hardware scheduler picks the > oldest ready instruction. These were the ideas that seduced Intel, > HP, and Transmeta to invest huge amounts of money into EPIC ideas. > But the compiler has other limits. It cannot schedule across indirect > calls (used in object-oriented dispatch), across compilation unit > boundaries (in particular, calls to and returns from shared > libraries). OK, maybe we are getting closer. Many processors now have speculative execution. If you don't know what else to do, execute some instructions just in case, but don't change actual memory or registers yet. And Itanium doesn't do that. I don't see, though, that it isn't possible to have EPIC and also speculative execution. > Another important limit is the predictability of > branches. Static branch prediction using profiles has ~10% > mispredictions, while (hardware) dynamic branch prediction has a much > lower misprediction rate (I remember numbers like 3% (for the same > benchmarks that have 10% mispredictions with static branch prediction) > in papers from the last century; I expect that this has improved even > more in the meantime. If the compiler mispredicts, it will schedule > instructions from the wrong path, instructions that will be useless > for execution. OK, but we just discussed speculative execution. So you can execute two paths or maybe three. > In the end a compiler typically can schedule across a few dozen > instructions, while hardware can schedule across a few hundred. > Compiler scheduling works well for simple loops, and that's where > IA-64 shone, but only doing loops well is not good enough for > general-purpose software. OK, loops were important for the 360/91, meant for floating point number crunching. (Only floating point is OoO.) And small inner loops are usual in those programs. Among those is loop mode, where it keeps the instructions, and doesn't have to keep fetching them. But also, the 360/91 was designed to run code written for any S/360 model. So, no special ordering and even self-modifying code. Even more, S/360 only has four floating point registers, so register renaming was important. Instructions had to be ordered for use of registers, where out-of-order and register renaming could fix that. Okay, so I am not saying that EPIC has to get all the ordering, but only close enough. So, ones now can keep hundreds of instructions in execution at the same time? > >Now, since transistors are cheap now, and one can throw a large > >number into reorder buffers and such, one can build really deep > >pipelines. > It's interesting that Intel managed to produce their first OoO CPU > (the Pentium Pro with 5.5M transistors) in 350nm, while Merced (the > first Itanium) at 25.4M transistors was too large for the 250nm and > they had to switch to 180nm (contributing to the delays). So, while > the theory was that the EPIC principle would reduce the hardware > complexity (to allow adding more functional units for increased > performance), in Itanium practice the hardware was more complex, and > the performance advantages did not appear. I remember lots of stories about how PPro didn't do well with the 16 bit code still in much of DOS and Windows. Enough that it was slower than Pentium. I am not sure by now how much OO the PPro does. > >But the reason for bringing this up, is that if Intel had a defined > >intermediate code, and supplied the back end that used it, > >and even more, could update that back end later, that would have > >been very convenient for compiler writers. > Something like this happened roughly at the same time with LLVM. > There were other initiatives, but LLVM was the one that succeeded. > There was the Open Research Compiler for IA-64 from Intel and the > Chinese Academy of Sciences. > SGI released their compiler targeted at IA-64 as Open64. > >Even more, design for it could have been done in parallel with the > >processor, making both work well together. > Intel, HP, and others worked on compilers in parallel to the hardware > work. It's just that the result did not perform as well for > general-purpose code as OoO processors with conventional compilers. > >[Multiflow found that VLIW compile-time instruction scheduling was > >swell for code with predictable memory access patterns, much less so > >for code with data-dependent access patterns. > Multiflow and Cydrome built computers for numerical computations (aka > HPC aka (mini-)supercomputing). These tend to spend a lot of time in > simple loops with high iteration counts and have statically > predictable branches. They found compiler techniques like trace > scheduling (Multiflow) that work well for code with predictable > branches, and modulo schedling (Cydrome) that work well for simple > loops with high iteration counts. The IA-64 and Transmeta architects > wanted to extend these successes to general-purpose computing, but it > did not work out. I sometimes work on computational physics problems. The ones with tight loops of floating point operations. I believe that some Itanium systems went into a large cluster somewhere. I got an RX2600 for $100 some years ago, when there were many on eBay. (You could even buy three at a time.) But yes, I suspect for web servers, that they aren't especially good. > Concerning memory access patterns: While they are not particularly > predictable in general-purpose code, mostd general-purpose code > benefits quite a lot from caches (more than numerical code), so I > don't think that this was a big problem for IA-64. > Some people mention varying latencies due to caches as a problem, but > the choice of a small L1 cache (16KB compared to 64KB for the earlier > 21264 and K7 CPUs) for McKinley indicates that average latency was > more of a problem for IA-64 than varying latencies. > >And if the memory access is that predictable, you can > >likely use SIMD instructions instead. -John] > Yes, SIMD ate EPICs lunch on the numerical program side, leaving few > programs where IA-64 outdid the mainstream. The Cray series of vector processors seems long gone by now.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web