Path: csiph.com!xmission!usenet.csail.mit.edu!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Elijah Stone <elronnd@elronnd.net>
Newsgroups: comp.compilers
Subject: In defense of phi nodes
Date: Mon, 26 Apr 2021 16:35:25 -0700
Organization: A noiseless patient Spider
Lines: 131
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <21-04-012@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="74097"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, design
Posted-Date: 27 Apr 2021 14:01:06 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:2651

A conflict, in compiler IL design:

1. We would like to use SSA, to enable fancy optimizations.
2. We would like to allow the value of a register to depend on a branch.

The latter constraint necessitates the existence of multiple assignments,
while the former necessitates that each register be assigned to only once.
How to resolve this?

(Note: the syntax  a { b... } [c]  creates a block named a, whose body
  comprises the instructions b, and which is terminated by the instruction
c.)


Solution the first: memory locations.

Consider this (somewhat contrived) implementation of the logical NOT:

first {
 	result_location ← alloca 1
} [branch input=0, was_zero, was_one]
was_zero {
 	store result_location, 1
} [jmp end]
was_one {
 	store result_location, 0
} [jmp end]
end {
 	result ← load result_location
} [ret result]

This is the most horrible, naive possible implementation.  An IR using SSA
but requiring this implementation would probably be slower than one which
simply allowed mutable registers.  Not much more to say here.


Solution the second: optionally mutable registers.

I don't know of any compiler which actually does this, and it doesn't seem
particularly worthwhile, but it is something I thought of.  (Though I do
think one of gcc and llvm were experimenting with mutable registers
recently, maybe?)

We could allow some registers to be mutable, and some to be declared
statically immutable, as in:

first {
 	result ← mutable byte
} [branch input=0, was_zero, was_one]
was_zero {
 	result ← 1
} [jmp end]
was_one {
 	result ← 0
} [jmp end]
end {} [ret result]

This solves _some_ of the problems with the first implementation (and
indeed would likely produce optimal code for this example), but the
possibility of a register's being mutable still inhibits optimizations.
On to the solutions people actually use:


Solution the third: block parameters.

Here, we allow blocks to pass each other arguments, as when calling
functions. The values of these arguments can be different in different
invocations of the same block, but registers still cannot be mutated.
Example:

first {} [branch input=0, was_zero, was_one]
was_zero {
 	tmp0 ← 1
} [jmp end, tmp0]
was_one {
 	tmp1 ← 0
} [jmp end, tmp1]
end(value) {} [ret value]


Solution the fourth: phi nodes.

A 'phi' is a special instruction whose value is that of _whichever of its
operands was assigned to most recently_.  An example is worth a thousand
abstract pontifications, so:

first {} [branch input=0, was_zero, was_one]
was_zero {
 	result0 ← 1
} [jmp end]
was_one {
 	result1 ← 0
} [jmp end]
end {
 	result ← phi result0, result1
} [ret result]

If the input was 0, then control flow goes through was_zero and not
was_one, so result0 will have been assigned to recently (and result1 not
at all), so the final result will be result0--that is, 1.


Block parameters and phi nodes are equivalent; it's always possible to
losslessly translate between the two.  Phi nodes were discovered first, in
the 1980s, and remain in broader use, while block parameters are a newer
(and somewhat more fashionable) innovation.

The common arguments in favour of block parameters are that they are
easier to reason about (I didn't need nearly so much explanation for the
block parameter example than for the phi node example!) and that they
improve locality.

Block parameters do _increase_ locality, but I don't think they improve it
so much as they obscure essential nonlocality.  Both phi nodes and block
parameters express a relationship between multiple registers.  Phi nodes
place the expression of that relationship at the genesis of the
'destination' register, while block parameters scatter the expression of
relationship, not tying it clearly to either the 'destination' register or
the 'source' registers.

And this ties more ultimately into what phi nodes and block parameters
enable, which is _principled mutability_.  Pure static _single_ assignment
is not sufficient to express all algorithms efficiently, so we need a
limited way to assign to a register multiple times.  (Perhaps we should
rather call it Statically Enumerated Assignment?)  In both schemes, there
is a special class of register--either block parameters, or those
registers whose geneses are phi nodes--which are assigned to multiple
times.  Phi nodes provide a focal point which enumerating those
assignments (in the form of proxy registers).

Thoughts?