Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!news.tele.dk!feed118.news.tele.dk!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
From: torbenm@diku.dk (Torben Ægidius Mogensen)
Newsgroups: comp.arch
Subject: Re: Architecting a return address stack
References: <37e7fa2b-2d98-428d-bdb8-369b995bcaad@d17g2000vba.googlegroups.com>
Date: Tue, 10 Apr 2012 12:23:19 +0200
Message-ID: <7zlim3lwbs.fsf@ask.diku.dk>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux)
Cancel-Lock: sha1:1w7cw6LO6ciIzhw717G60c98F4I=
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Lines: 37
Organization: SunSITE.dk - Supporting Open source
NNTP-Posting-Host: 130.225.96.225
X-Trace: news.sunsite.dk DXC=i3mMS3QV2<@CbPFlQZmVYDYSB=nbEKnkKQ?=6fSNTkQAeA0c^o`FO0DNW[cF@L;gDGF5\^6VX=jYD63c;UX1]PiF87Z9b@SmWPK
X-Complaints-To: staff@sunsite.dk
Xref: csiph.com comp.arch:6468

"Paul A. Clayton" <paaronclayton@gmail.com> writes:

> While an architected return address stack can provide a modest
> benefit from early return address determination (a predictive
> return address stack provides most of this benefit but is
> predictive, of limited size, and stores information redundantly)
> and protection from return address overwriting, such introduces
> some overhead which may make it less attractive at the very low
> end (the level of, e.g., ARM Cortex-M0+) and at the higher end
> (providing an additional OoO execution mechanism for such
> special purpose registers and memory accesses could be a
> nuisance, though being highly decoupled from ordinary
> execution should constrain complexity it seems).

One possible benefit from a hardware return stack is to execute the
continuation after the return concurrently with the call.  Instead of
saving registers on the return stack, the callee gets its own copy of
the registers (through renaming) and is executed in parallel with the
continuation, synchronising through the return-value register.  If there
are not enough registers (or threads) for the new call, a thread is
spilled to a stack to make room.
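
As a rough software analogy (not the hardware mechanism itself), this
behaves much like a future: the call is started, the continuation keeps
going, and it only waits when it first reads the return value.  The
names callee and caller below are made up for illustration, and a
thread pool stands in for the renamed register copies.

```python
from concurrent.futures import ThreadPoolExecutor


def callee(x):
    # Stands in for the called procedure, which works on its own
    # (renamed) copy of the registers.
    return x * x


def caller(x):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The "call": the callee starts, but the caller does not wait.
        ret = pool.submit(callee, x)

        # Continuation work that does not touch the return value
        # can proceed immediately.
        local = x + 1

        # First use of the "return-value register": the continuation
        # synchronises with the callee here.
        return local + ret.result()
```

Here caller(3) computes x + 1 while callee(3) may still be running, and
blocks only at ret.result().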

In a normal memory model, the continuation can continue until it
accesses either the return-value register or a memory location (as this might
potentially be overwritten by the called procedure), but you could have
a more "functional" memory model where it is assumed that the called
procedure does not modify memory reachable by the caller (except through
a pointer in the return-value register).  This would allow for more
parallelism.  You could have two kinds of memory-access instructions,
functional and imperative, where a thread would stop on an imperative
memory access until all calls higher in the stack are done, but where
functional memory accesses would not block.  Enforcing that functional
memory accesses do, indeed, behave in a write-once-read-many way may be
expensive, but you could leave that to the compiler to ensure and simply
say that behaviour is undefined if this is not done.
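
A toy software model of the two access kinds might look like the
sketch below.  Everything in it (the Continuation class, the names
load_functional and load_imperative, the dictionary used as memory) is
hypothetical and only meant to show the blocking rule: an imperative
load waits for all outstanding calls, a functional load does not.  As
in the undefined-behaviour variant, nothing checks that "functional"
locations really are write-once.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class Continuation:
    """Toy model of a continuation with calls still in flight."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = []  # futures for calls that have not returned yet

    def call(self, pool, fn, *args):
        # Start a call without waiting for it.
        fut = pool.submit(fn, *args)
        self.pending.append(fut)
        return fut

    def load_functional(self, addr):
        # Assumed never to be modified by a callee: no need to wait.
        return self.memory[addr]

    def load_imperative(self, addr):
        # A callee might overwrite this location, so wait until
        # every outstanding call is done before reading.
        wait(self.pending)
        self.pending.clear()
        return self.memory[addr]


def bump(memory):
    # A callee with a side effect on caller-visible memory.
    memory["counter"] += 1


def demo():
    memory = {"const": 10, "counter": 0}
    cont = Continuation(memory)
    with ThreadPoolExecutor(max_workers=1) as pool:
        cont.call(pool, bump, memory)
        a = cont.load_functional("const")    # does not block
        b = cont.load_imperative("counter")  # blocks until bump is done
    return a, b
```

In demo(), the imperative load always sees the incremented counter
because it waits for bump, while the functional load returns at once.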

	Torben