From: torbenm@diku.dk (Torben Ægidius Mogensen)
Newsgroups: comp.arch
Subject: Re: Architecting a return address stack
References: <37e7fa2b-2d98-428d-bdb8-369b995bcaad@d17g2000vba.googlegroups.com>
Date: Tue, 10 Apr 2012 12:23:19 +0200
Message-ID: <7zlim3lwbs.fsf@ask.diku.dk>

"Paul A. Clayton" writes:

> While an architected return address stack can provide a modest
> benefit from early return address determination (a predictive
> return address stack provides most of this benefit but is
> predictive, of limited size, and stores information redundantly)
> and protection from return address overwriting, such introduces
> some overhead which may make it less attractive at the very low
> end (the level of, e.g., ARM Cortex-M0+) and at the higher end
> (providing an additional OoO execution mechanism for such
> special purpose registers and memory accesses could be a
> nuisance, though being highly decoupled from ordinary
> execution should constrain complexity it seems).

One possible benefit from a hardware return stack is to execute the continuation after the return concurrently with the call. Instead of saving registers on the return stack, the callee gets its own copy of the registers (through renaming) and is executed in parallel with the continuation, synchronising through the return-value register.
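
As a rough software analogy only (this is not the hardware mechanism, and the names callee, caller and the thread pool are made up for illustration), the synchronisation can be sketched with a future standing in for the return-value register. The continuation runs on while the call is in flight and blocks only at the point where it first reads the result:

```python
from concurrent.futures import ThreadPoolExecutor
import time


def callee(x):
    """Hypothetical called procedure; the sleep models a long-running body."""
    time.sleep(0.05)
    return x * x


def caller(pool, x):
    """Hypothetical caller whose continuation overlaps with the call."""
    # "Call": the callee starts on its own thread, loosely modelling
    # the callee getting its own renamed copy of the registers.
    ret = pool.submit(callee, x)

    # Continuation: work that does not touch the return value
    # proceeds without waiting for the callee.
    independent = x + 1

    # First read of the "return-value register": this is the only
    # synchronisation point between the continuation and the call.
    result = ret.result()
    return independent, result


with ThreadPoolExecutor(max_workers=2) as pool:
    print(caller(pool, 7))  # prints (8, 49)
```

The sketch assumes the continuation touches no memory the callee might write, which is the "functional" assumption rather than the normal memory model.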
If there are not enough registers (or threads) for the new call, a thread is spilled to a stack to make room.

In a normal memory model, the continuation can continue until it accesses either the return-value register or a memory location (as the latter might be overwritten by the called procedure), but you could have a more "functional" memory model where it is assumed that the called procedure does not modify memory reachable by the caller (except through a pointer in the return-value register). This would allow for more parallelism.

You could have two kinds of memory-access instructions, functional and imperative: a thread would stop on an imperative memory access until all calls higher in the stack are done, but functional memory accesses would not block. Enforcing that functional memory accesses do, indeed, behave in a write-once-read-many way may be expensive, but you could leave that to the compiler to ensure and simply say that behaviour is undefined if this is not done.

	Torben