Path: csiph.com!xmission!news.alt.net!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: anton@mips.complang.tuwien.ac.at (Anton Ertl) Newsgroups: comp.compilers Subject: Re: Nitty-gritty aspects of register allocation Date: Fri, 11 Sep 2020 10:35:51 GMT Organization: Institut fuer Computersprachen, Technische Universitaet Wien Lines: 79 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <20-09-031@comp.compilers> References: <20-09-028@comp.compilers> Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="86722"; mail-complaints-to="abuse@iecc.com" Keywords: code, optimize Posted-Date: 11 Sep 2020 22:50:07 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2602 Elijah Stone writes: >There is a lot of research and a lot of resources on the high-level >aspects of register allocation--colouring interference graphs, lifetimes, >CFGs, etc. > >But there are a lot of low-level architecture-specific things that these >models don't account for on their own. Most notably, not all registers >can be used for the same things. Just on amd64: > >- Multiplication and division use the rdx:rax register pair. > >- Bit shifts always use cl. > >- When calling functions, their arguments have to go in specific registers. > >In all of these cases, you can spill whatever happens to already be in >those registers, but it would be nicer if you could arrange for the >appropriate values to already be in the right places. But how? A classical approach (at least for graph colouring) is to have a register coalescing phase before the actual register allocation. Register coalescing tries to eliminate move instructions by combining the two live ranges connected by the move into one live range (i.e., the same register is used for both). If you have that, you can insert move instructions before doing the coalescing and register allocation, and coalescing will eliminate them if possible, and will keep them if not. So, for the problem with fixed registers you create live ranges for the fixed registers, and insert a move from another live range A to the fixed live range X just before the value in A is needed in the fixed register, and a move from the fixed live range Y to another live range B just after a value is put into a fixed register (by an instruction or by the calling convention). Coalescing will eliminate as many of these moves as it can; if it can eliminate both of the moves in the examples, A is in the register X and B is in the register Y, with no moves necessary. > But that's not a general solution >(imagine if every register has a few pieces of unique functionality) The technique above is a workaround for relatively rare cases; it probably does not work so well for an instruction set like the 8080, where every register has a special purpose. That's why we have seen architectures with general-purpose registers ("register machines") whenever it was expected that most of the code is generated by compilers rather than assembly programmers. That includes IA-32 and AMD64, where the instructions with implicit registers (like XLAT or LODS) were superseded by faster implementations of general instructions (e.g., on the 486 LODS takes 5 cycles, while the general replacement takes 2 cycles and has no register requirements). >- Some registers need a special prefix to be used (REX prefix). These > registers are generally different from the special-purpose registers > (for e.g. multiplication). Is it better to put a non-multiplied value > in a REX-prefixed register, or keep it in an unprefixed register and > spill it later when you need to multiply? Coalescing should answer that. One other aspect of REX is that it is necessary to get 64-bit width for many instructions. So if you have a 64-bit data type, you can just as well put it into R8-R15, because you often need a REX anyway. I would do this with preferencing (i.e., in the register allocation phase, when you have to choose a register and one in the preferred class is available, choose it). >- The second-lowest 8 bits of some registers can be addressed > separately. When does it make sense to use them? Probably never. The 16 other 8-bit registers should be enough; and even if you want to use them, I recommend checking if using AH BH CH DH is not a performance pitfall. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/