Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #2715 > unrolled thread
| Started by | "Luke A. Guest" <laguest@archeia.com> |
|---|---|
| First post | 2021-10-05 13:22 +0100 |
| Last post | 2021-10-06 16:20 +0100 |
| Articles | 16 — 10 participants |
Back to article view | Back to comp.compilers
Modern compilers for ye olde architectures "Luke A. Guest" <laguest@archeia.com> - 2021-10-05 13:22 +0100
Re: Modern compilers for ye olde architectures David Brown <david.brown@hesbynett.no> - 2021-10-05 19:59 +0200
Re: Modern compilers for ye olde architectures Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2021-10-05 22:12 +0200
Re: Modern compilers for ye olde architectures Derek Jones <derek@NOSPAM-knosof.co.uk> - 2021-10-06 01:26 +0100
Re: Modern compilers for ye olde architectures "Luke A. Guest" <laguest@archeia.com> - 2021-10-06 09:00 +0100
Re: Modern compilers for ye olde architectures anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2021-10-06 07:56 +0000
Re: Modern compilers for ye olde architectures Philipp Klaus Krause <pkk@spth.de> - 2021-10-06 18:20 +0200
Re: Modern compilers for ye olde architectures anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2021-10-15 07:37 +0000
Re: Modern compilers for ye olde architectures Philipp Klaus Krause <pkk@spth.de> - 2021-10-18 08:35 +0200
Re: Modern compilers for ye olde architectures Philipp Klaus Krause <pkk@spth.de> - 2021-10-18 08:56 +0200
Re: Modern compilers for ye olde architectures Philipp Klaus Krause <pkk@spth.de> - 2021-10-18 09:17 +0200
Re: Modern compilers for ye olde architectures gah4 <gah4@u.washington.edu> - 2021-10-21 21:53 -0700
Re: Modern compilers for ye olde architectures Kaz Kylheku <480-992-1380@kylheku.com> - 2021-10-22 17:28 +0000
Re: Modern compilers for ye olde architectures dave_thompson_2@comcast.net - 2021-11-14 15:04 -0500
Re: Modern compilers for ye olde architectures Theo <theom+news@chiark.greenend.org.uk> - 2021-10-06 10:36 +0100
Re: Modern compilers for ye olde architectures "Luke A. Guest" <laguest@archeia.com> - 2021-10-06 16:20 +0100
| From | "Luke A. Guest" <laguest@archeia.com> |
|---|---|
| Date | 2021-10-05 13:22 +0100 |
| Subject | Modern compilers for ye olde architectures |
| Message-ID | <21-10-007@comp.compilers> |
Hi, I have a Z80 project in mind and would like to build a compiler for a Z80. I was wondering if modern backend techniques can be applied successfully for these old CPU's, i.e. SSA. I know GCC has backends for some older architectures, but these do weird gymnastics such as implementing a virtual cpu in rtl and then lowering further. Thanks, Luke.
[toc] | [next] | [standalone]
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2021-10-05 19:59 +0200 |
| Message-ID | <21-10-008@comp.compilers> |
| In reply to | #2715 |
On 05/10/2021 14:22, Luke A. Guest wrote: > I have a Z80 project in mind and would like to build a compiler for a > Z80. I was wondering if modern backend techniques can be applied > successfully for these old CPU's, i.e. SSA. > > I know GCC has backends for some older architectures, but these do weird > gymnastics such as implementing a virtual cpu in rtl and then lowering > further. > Certainly modern techniques /can/ be applied to a Z80 compiler. And the Z80 is a lot more powerful than many 8-bit CISC microcontrollers. As far as I know, the only 8-bit target for gcc (in the mainline at least) is the AVR. This is a RISC processor, with 32 8-bit registers, and a much more orthogonal architecture. gcc does do some "weird gymnastics", however - it allocates registers in pairs so that it can pretend to be 16-bit, and then code gets simplified in peephole passes. gcc also supports some 16-bit CISC devices. For the Z80, however, there are a number of compiler options. There are some commercial toolchains, such as HiSoft (IIRC), and there is the "Rabbit" microcontroller which is a derivative of the Z80. It is supported by a compiler for a somewhat extended version of C. Such compilers are probably quite old, however, and might not be easily available. Your best bet is probably SDCC. This is a multi-target open source compiler that is in regular development, and aimed specifically for small CISC microcontroller cores. The Z80 is one of its targets.
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2021-10-05 22:12 +0200 |
| Message-ID | <21-10-009@comp.compilers> |
| In reply to | #2715 |
On 10/5/21 2:22 PM, Luke A. Guest wrote: > I have a Z80 project in mind and would like to build a compiler for a > Z80. I was wondering if modern backend techniques can be applied > successfully for these old CPU's, i.e. SSA. > > I know GCC has backends for some older architectures, but these do weird > gymnastics such as implementing a virtual cpu in rtl and then lowering > further. I wonder what's the purpose of such compilers. Let a Z80 emulate some supported CPU and be happy :-) DoDi [Maybe it's retrocomputing, maybe it's some device with an embedded Z80. If that's all the computing you need, why pay for more? -John]
[toc] | [prev] | [next] | [standalone]
| From | Derek Jones <derek@NOSPAM-knosof.co.uk> |
|---|---|
| Date | 2021-10-06 01:26 +0100 |
| Message-ID | <21-10-010@comp.compilers> |
| In reply to | #2715 |
Luke, > Z80. I was wondering if modern backend techniques can be applied > successfully for these old CPU's, i.e. SSA. Modern, as in invented in 1988. Modern, as in post-1985 techniques require lots of memory. So they are not of any use if you plan to host your compiler on a Z80, otherwise your next problem is mapping techniques that are designed to work well with orthogonal architectures (which the Z80 is certainly not). [I believe the plan is to cross-compile with Z80 as the target. -John]
[toc] | [prev] | [next] | [standalone]
| From | "Luke A. Guest" <laguest@archeia.com> |
|---|---|
| Date | 2021-10-06 09:00 +0100 |
| Message-ID | <21-10-011@comp.compilers> |
| In reply to | #2718 |
On 05/10/2021 18:59, David Brown wrote: > For the Z80, however, there are a number of compiler options. ... I'm not talking about using an existing compiler. > Your best bet is probably SDCC. This is a multi-target open source > compiler that is in regular development, and aimed specifically for > small CISC microcontroller cores. The Z80 is one of its targets. I know of SDCC. I'm not talking about using C. On 06/10/2021 01:26, Derek Jones wrote: > Modern, as in post-1985 techniques require lots of memory. > So they are not of any use if you plan to host your compiler > on a Z80, otherwise your next problem is mapping techniques > that are designed to work well with orthogonal architectures > (which the Z80 is certainly not). > [I believe the plan is to cross-compile with Z80 as the target. -John] I'm not talking about running a compiler on a Z80, just targetting one. [Perhaps you could give us more hints about what you're doing so we can offer more usefule answers. -John]
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2021-10-06 07:56 +0000 |
| Message-ID | <21-10-012@comp.compilers> |
| In reply to | #2715 |
"Luke A. Guest" <laguest@archeia.com> writes: >I have a Z80 project in mind and would like to build a compiler for a >Z80. I was wondering if modern backend techniques can be applied >successfully for these old CPU's, i.e. SSA. SSA can certainly be used in a compiler for the Z80, but SSA is not a back-end technique; it's a way to represent data flow in the intermediate representation. As for the back-end, it seems to me that the major problem with the Z80 is that it does not have general-purpose registers; instead, many instructions deal with specific registers. Many early architectures were like that, and assembly programmers could puzzle out good register assignments, but compilers were not particularly good at it. So eventually computer architects introduced machines with general-purpose registers like the PDP-11, the VAX, and the RISCs; and compiler writers developed techniques like graph colouring to make good use of these architectures. Maybe with the increased memory and processing power available now, one could do better, but given that special-purpose registers are mostly a thing of the past, there has not been much research into that, that I am aware of. Maybe the puzzle-solving approach to register allocation can help, but I am not familiar enough with that to say for sure. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| Date | 2021-10-06 18:20 +0200 |
| Message-ID | <21-10-015@comp.compilers> |
| In reply to | #2720 |
Am 06.10.21 um 09:56 schrieb Anton Ertl: > So eventually computer architects introduced machines with > general-purpose registers like the PDP-11, the VAX, and the RISCs; and > compiler writers developed techniques like graph colouring to make > good use of these architectures. > > Maybe with the increased memory and processing power available now, > one could do better, but given that special-purpose registers are > mostly a thing of the past, there has not been much research into > that, that I am aware of. See "Optimal Register Allocation in Polynomial Time". A graph-coloring approach that can handle irregularities well (as long as there are not too many registers). SDCC uses such a register allocator for some backends, including z80. Philipp
[toc] | [prev] | [next] | [standalone]
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2021-10-15 07:37 +0000 |
| Message-ID | <21-10-024@comp.compilers> |
| In reply to | #2723 |
Philipp Klaus Krause <pkk@spth.de> writes: >See "Optimal Register Allocation in Polynomial Time". A graph-coloring >approach that can handle irregularities well (as long as there are not >too many registers). SDCC uses such a register allocator for some >backends, including z80. Interesting paper. I had trouble following the theoretical part, but looking at the practical part and taking the part that I understood about the theory, you try out all possible assignments of all registers to the maximum clique (live ranges that are alive at the same time), and because the number of registers is a constant (for a given architecture), this is a polynomial. You discuss splitting live ranges at control-flow-graph boundaries, but it seems to me that in some cases you want to split live ranges between instructions within a control-flow node; this does not change the complexity-theoretic result, but of course the implementation. If only the maximum clique size plays a role, it's as if the assignments to the cliques are independent; but then you have to reconcile the assignments for different cliques by introducing reg-reg move instructions, and take these costs into account, and optimize them, too; so I don't see that the cliques can be treated as independent. And you probably don't, because there are additional factors in the complexity. In any case, I am missing something, and maybe you have an idea what it is. I am wondering about one thing in the empirical results in your paper: Why is the code size not monotonously falling with increased numbers of assignments? Are these independent runs with different (pseudo-random) assignments? I had some questions which were mostly answered by the paper, but maybe you can offer additional insights: * Am I right that earlier register allocators were bad for irregular register sets, and that's why general-purpose registers won once compilers became dominant? Why did general-purpose registers become dominant? The advantage of general-purpose registers is given in that paper as reducing the complexity of the algorithm by a factor of: (2*(tree-width(G)+1)*#registers)! E.g., for tree-width 2 and 9 registers, it's 54! = 2.30843697*10^71 Which poses the question: In your empirical work you stopped at 10^8 assignments (in some cases, less). How did you get provably optimal assignments on the Z80 with its 9 registers? * What are the key points why your work can deal with irregular register sets, and earlier approaches are pretty bad at that? It seems to me that you use the CPU power available now to try out many different assignments, while earlier work has balked at that. * Do you have any idea why no good approach for dealing with irregular instruction sets has been found in, say, the 1970s and 1980s when irregular register sets were more common (e.g. on the Z80 and the 8086). Your approach is an (ideally exhaustive) search that uses more CPU power (and memory?) than was available then. At the time, one would have resorted to heuristics, but apparently no general effective heuristics have been found. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/
[toc] | [prev] | [next] | [standalone]
| From | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| Date | 2021-10-18 08:35 +0200 |
| Message-ID | <21-10-032@comp.compilers> |
| In reply to | #2732 |
Am 15.10.21 um 09:37 schrieb Anton Ertl: > Philipp Klaus Krause <pkk@spth.de> writes: > > I am wondering about one thing in the empirical results in your paper: > Why is the code size not monotonously falling with increased numbers > of assignments? Are these independent runs with different > (pseudo-random) assignments? The basic idea is that for some subsets of nodes in the control-flow graph, we try all possible assignments to registers. The tree-decomposition tells us which subsets that are, and how we can assemble the locally optimal solutions into globally optimal ones. Since the cost function gives us the real costs from code generation, we are optimal wrt. to the optimization goal, e.g. code size. However, thre ar three aspects, why code size might not fall monotonously with an increased number of tried assignments: 1) If the peephole optimizer is active: The register allocation approach can only consider code size after code generation, it doesn't know what the peephole optimizer might do with the result. 2) The register allocator considers variables / parts of variables as assigned to certain registers, or spilt. It doesn't know where spilt variables will end up on the stack. It can thus consider the benefits of coalescing for variables assigned to registers, but not for spilt variables. 3) We know that the number of assignments we need to consider to get a provably optimal result is polynomial if the input is a structured program. But the polynomial bound might be too big for practical compilation, so we limit the number of assignments considered (via the --max-allocs-per-node parameter to SDCC). In that case the allocator can not give a provably optimal result. In some cases, it needs to make a decision, on which assignments to condier further (and does so based on incomplete information using a heuristic). Here, considering more assignments can lead to a certain assignment considered previously no lonjger being considered. E.g. when going from --max-allocs-per-node 10 to --max-allocs-per-node 100, we will now consider 100 assignments instead of 10. But not all of the previously considered 10 might make it into the 100 considered now. And in the end it might turn out that one of the 10 might have given us a better result globally. The decision which assignments to consider further us not random, but based on a heuristic that mostly takes local information into account. Philipp
[toc] | [prev] | [next] | [standalone]
| From | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| Date | 2021-10-18 08:56 +0200 |
| Message-ID | <21-10-033@comp.compilers> |
| In reply to | #2732 |
Am 15.10.21 um 09:37 schrieb Anton Ertl: > > I had some questions which were mostly answered by the paper, but > maybe you can offer additional insights: > > * Am I right that earlier register allocators were bad for irregular > register sets, and that's why general-purpose registers won once > compilers became dominant? Why did general-purpose registers become > dominant? Indeed earlier register allocators could not handle irregular architectures well. In particular, Chaitin-style graph coloring register allocators are simple, and efficient if we have enough general-purpose registers. Chaiting-style register allocators and RISC-style architectufres are a good match. > * What are the key points why your work can deal with irregular > register sets, and earlier approaches are pretty bad at that? > > It seems to me that you use the CPU power available now to try out > many different assignments, while earlier work has balked at that. > > * Do you have any idea why no good approach for dealing with irregular > instruction sets has been found in, say, the 1970s and 1980s when > irregular register sets were more common (e.g. on the Z80 and the > 8086). > > Your approach is an (ideally exhaustive) search that uses more CPU > power (and memory?) than was available then. At the time, one would > have resorted to heuristics, but apparently no general effective > heuristics have been found. Indeed my approach uses more CPU power and memory than was available then. There is another newer approach that claims to handle irregularitites well by Roberto Castañeda Lozano, but it also uses more CPU time and memory than was available in the past. Besides the CPU power and memory, both my and his approach also build on theoretical advances that simply weren't there back then. I use tree-decompositions. While the basic idea is there in Wagner's work on S-functions in the 1970s, it did not get applied to register allocation until Thorup's and Bodlaender's works in 1998. Those two works considered a quite abstract version of register allocation (graph-coloring with all variables the same size). Thorup also offered for the first time, a practical way to get tree-decompositions of control-flow graphs (there are flaws in what he did, but it was still good enough to build on for me, then). Philipp
[toc] | [prev] | [next] | [standalone]
| From | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| Date | 2021-10-18 09:17 +0200 |
| Message-ID | <21-10-034@comp.compilers> |
| In reply to | #2732 |
Am 15.10.21 um 09:37 schrieb Anton Ertl: > > Which poses the question: In your empirical work you stopped at 10^8 > assignments (in some cases, less). How did you get provably optimal > assignments on the Z80 with its 9 registers? > Castañeda Lozano On one hand, we have the theoretical bound on the number of assignments, which is useful for proving that we can be optimal in polynomial time. On the other hand, getting a provably optimal result when compiling an individual function is something that is easier to achieve, as the theoretical bound is a worst case. * In real programs, most bags in the tree-decomposition will be smaller than the tree-width allows. * In real programs, there will be parts of the control-flow graph, where the number of variables alive is less than (tw(G) + 1)*r. * In the implementation, we can throw away some assignments early, without sacrificing optimality, when we know that code generation for them would be impossible (e.g. in the z80 port, unlike the stm8 port, code generation cannot handle variables that have some of their bytes allocated to registers, but other bytes spilt). In ports of SDCC that use this register allocator (which now is most of them), when enabling extra comments in the generated assembly code via --fverbose-asm, SDCC will place a comment at the beginning of each function stating if the register allocation was provably optimal. Philipp
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2021-10-21 21:53 -0700 |
| Message-ID | <21-10-037@comp.compilers> |
| In reply to | #2742 |
On Thursday, October 21, 2021 at 5:08:46 PM UTC-7, Philipp Klaus Krause wrote: (snip) > On one hand, we have the theoretical bound on the number of assignments, > which is useful for proving that we can be optimal in polynomial time. > On the other hand, getting a provably optimal result when compiling an > individual function is something that is easier to achieve, as the > theoretical bound is a worst case. This is reminding me that early Fortran compilers had a FREQUENCY statement (optionally) telling the compiler the relative probability of branching for IF statements, and the estimated iterations for DO loops. It was removed before the first standard in 1966. The above methods might be fine on a small scale, but for more global optimization you need the relative probabilities. I suspect that everyone assumes equal probabilities for everything. I suspect that there is no interest in bringing FREQUENCY back to Fortran, or any other language, though. [Legend says that in at least one compiler, FREQUENCY was implemented backward and nobody noticed. -John]
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <480-992-1380@kylheku.com> |
|---|---|
| Date | 2021-10-22 17:28 +0000 |
| Message-ID | <21-10-038@comp.compilers> |
| In reply to | #2744 |
On 2021-10-22, gah4 <gah4@u.washington.edu> wrote:
> I suspect that there is no interest in bringing FREQUENCY back to Fortran,
> or any other language, though.
GNU C has built-in primitives for expressing the likelihood of branches
being taken: __builtin_expect and __builtin_expect_with_probability.
Plus other builtins related to optimization such as that you can request
cache prefetch with __builtin_prefetch.
Goto labels can have attributes which indicate the likelihood of their
use:
again:
/* we jump back here all the freakin' time */
__attribute__((hot));
...
err:
/* oh crap! this happens, though rarely */
__attribute__((cold));
These things are smartly out of the way in a protected namespace,
so you can easily use macros to write code that takes advantage of
these things yet remains portable.
> [Legend says that in at least one compiler, FREQUENCY was implemented
> backward and nobody noticed. -John]
The problem is that programmers sometimes optimize things just as a fun
chrome plating exercise, and not as a change to the code accompanied by
a unit test which somehow proves that the change had the required
effect.
It's hard to do and annoying; you can't test absolute speeds of anything
because machines change. The test case may have to locally duplicate and
preserve the old version of the entire module of code, and compare its
performance to the new version. Then if things change in that code, that
test is going to be annoying to maintain; its reference now has to be a
fictitious old version of the code that never existed which doesn't have
the optimization, so you can keep proving that the optimization provides
a testable benefit.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
[toc] | [prev] | [next] | [standalone]
| From | dave_thompson_2@comcast.net |
|---|---|
| Date | 2021-11-14 15:04 -0500 |
| Message-ID | <21-11-002@comp.compilers> |
| In reply to | #2720 |
On Wed, 06 Oct 2021 07:56:59 GMT, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote: ... > As for the back-end, it seems to me that the major problem with the > Z80 is that it does not have general-purpose registers; instead, many > instructions deal with specific registers. Many early architectures > were like that, and assembly programmers could puzzle out good > register assignments, but compilers were not particularly good at it. > So eventually computer architects introduced machines with > general-purpose registers like the PDP-11, the VAX, and the RISCs; ... Eventually? PDP-11 was 7 years before Z-80, and 5 years before that both PDP-6 and S/360 had 16 GPRs (& none 'wasted' as PC SP FP). S/360 and PDP-11 did have floating-point registers separate, and at least on the latter optional. (I believe there were 360 models listed without FP, but heard that actual instances were about as rare as PDP-6 without the 'option' for 0-17 to be registers instead of core.) [Floating point was optional on the low end 360/22, /25, /30, and /40. Considering what they were used for and how slow the FP was, e.g., on the /30 floating add was over 50us, multiply up to 400us, I expect a lot of them skipped the floating point. Larger models all had it. -John]
[toc] | [prev] | [next] | [standalone]
| From | Theo <theom+news@chiark.greenend.org.uk> |
|---|---|
| Date | 2021-10-06 10:36 +0100 |
| Message-ID | <21-10-013@comp.compilers> |
| In reply to | #2715 |
Luke A. Guest <laguest@archeia.com> wrote: > I have a Z80 project in mind and would like to build a compiler for a > Z80. I was wondering if modern backend techniques can be applied > successfully for these old CPU's, i.e. SSA. > > I know GCC has backends for some older architectures, but these do weird > gymnastics such as implementing a virtual cpu in rtl and then lowering > further. There is, it seems, an LLVM backend for Z80: https://github.com/jacobly0/llvm-project (see the 'z80' branch) It appears TI calculators are the main use case. I don't know the current status/functionality, but it would be fun to see what the various LLVM passes do to the generated code. It seems like there's been some work done on Rust for Z80 (and 6502): https://github.com/jacobly0/llvm-project/issues/15 Theo
[toc] | [prev] | [next] | [standalone]
| From | "Luke A. Guest" <laguest@archeia.com> |
|---|---|
| Date | 2021-10-06 16:20 +0100 |
| Message-ID | <21-10-014@comp.compilers> |
| In reply to | #2721 |
On 06/10/2021 10:36, Theo wrote: > Luke A. Guest <laguest@archeia.com> wrote: >> I have a Z80 project in mind and would like to build a compiler for a >> Z80. I was wondering if modern backend techniques can be applied >> successfully for these old CPU's, i.e. SSA. >> >> I know GCC has backends for some older architectures, but these do weird >> gymnastics such as implementing a virtual cpu in rtl and then lowering >> further. > > There is, it seems, an LLVM backend for Z80: > https://github.com/jacobly0/llvm-project > (see the 'z80' branch) > It appears TI calculators are the main use case. That is long dead and cannot be merged into a newer branch, they made a complete mess of the source tree there. That is pre llvm-3. > I don't know the current status/functionality, but it would be fun to see > what the various LLVM passes do to the generated code. > > It seems like there's been some work done on Rust for Z80 (and 6502): > https://github.com/jacobly0/llvm-project/issues/15 Ok, just looked and seems there has been something done recently, last I looked it was dead and ancient.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web