Path: csiph.com!xmission!usenet.csail.mit.edu!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Elijah Stone Newsgroups: comp.compilers Subject: In defense of phi nodes Date: Mon, 26 Apr 2021 16:35:25 -0700 Organization: A noiseless patient Spider Lines: 131 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <21-04-012@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="74097"; mail-complaints-to="abuse@iecc.com" Keywords: optimize, design Posted-Date: 27 Apr 2021 14:01:06 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2651 A conflict, in compiler IL design: 1. We would like to use SSA, to enable fancy optimizations. 2. We would like to allow the value of a register to depend on a branch. The latter constraint necessitates the existence of multiple assignments, while the former necessitates that each register be assigned to only once. How to resolve this? (Note: the syntax a { b... } [c] creates a block named a, whose body comprises the instructions b, and which is terminated by the instruction c.) Solution the first: memory locations. Consider this (somewhat contrived) implementation of the logical NOT: first { result_location ← alloca 1 } [branch input=0, was_zero, was_one] was_zero { store result_location, 1 } [jmp end] was_one { store result_location, 0 } [jmp end] end { result ← load result_location } [ret result] This is the most horrible, naive possible implementation. An IR using SSA but requiring this implementation would probably be slower than one which simply allowed mutable registers. Not much more to say here. Solution the second: optionally mutable registers. I don't know of any compiler which actually does this, and it doesn't seem particularly worthwhile, but it is something I thought of. (Though I do think one of gcc and llvm were experimenting with mutable registers recently, maybe?) We could allow some registers to be mutable, and some to be declared statically immutable, as in: first { result ← mutable byte } [branch input=0, was_zero, was_one] was_zero { result ← 1 } [jmp end] was_one { result ← 0 } [jmp end] end {} [ret result] This solves _some_ of the problems with the first implementation (and indeed would likely produce optimal code for this example), but the possibility of a register's being mutable still inhibits optimizations. On to the solutions people actually use: Solution the third: block parameters. Here, we allow blocks to pass each other arguments, as when calling functions. The values of these arguments can be different in different invocations of the same block, but registers still cannot be mutated. Example: first {} [branch input=0, was_zero, was_one] was_zero { tmp0 ← 1 } [jmp end, tmp0] was_one { tmp1 ← 0 } [jmp end, tmp1] end(value) {} [ret value] Solution the fourth: phi nodes. A 'phi' is a special instruction whose value is that of _whichever of its operands was assigned to most recently_. An example is worth a thousand abstract pontifications, so: first {} [branch input=0, was_zero, was_one] was_zero { result0 ← 1 } [jmp end] was_one { result1 ← 0 } [jmp end] end { result ← phi result0, result1 } [ret result] If the input was 0, then control flow goes through was_zero and not was_one, so result0 will have been assigned to recently (and result1 not at all), so the final result will be result0--that is, 1. Block parameters and phi nodes are equivalent; it's always possible to losslessly translate between the two. Phi nodes were discovered first, in the 1980s, and remain in broader use, while block parameters are a newer (and somewhat more fashionable) innovation. The common arguments in favour of block parameters are that they are easier to reason about (I didn't need nearly so much explanation for the block parameter example than for the phi node example!) and that they improve locality. Block parameters do _increase_ locality, but I don't think they improve it so much as they obscure essential nonlocality. Both phi nodes and block parameters express a relationship between multiple registers. Phi nodes place the expression of that relationship at the genesis of the 'destination' register, while block parameters scatter the expression of relationship, not tying it clearly to either the 'destination' register or the 'source' registers. And this ties more ultimately into what phi nodes and block parameters enable, which is _principled mutability_. Pure static _single_ assignment is not sufficient to express all algorithms efficiently, so we need a limited way to assign to a register multiple times. (Perhaps we should rather call it Statically Enumerated Assignment?) In both schemes, there is a special class of register--either block parameters, or those registers whose geneses are phi nodes--which are assigned to multiple times. Phi nodes provide a focal point which enumerating those assignments (in the form of proxy registers). Thoughts?