Groups > comp.arch > #5657 > unrolled thread
| Started by | Brett Davis <ggtgp@yahoo.com> |
|---|---|
| First post | 2012-02-02 02:16 -0600 |
| Last post | 2012-02-03 12:20 -0800 |
| Articles | 11 on this page of 31 — 15 participants |
Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-02 02:16 -0600
Re: Are rotating register files still a bad idea? "Paul A. Clayton" <paaronclayton@gmail.com> - 2012-02-02 07:41 -0800
Re: Are rotating register files still a bad idea? Nomen Nescio <nobody@dizum.com> - 2012-02-02 19:04 +0100
Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-02 23:16 -0600
Re: Are rotating register files still a bad idea? Fritz Wuehler <fritz@spamexpire-201202.rodent.frell.theremailer.net> - 2012-02-03 18:19 +0100
Re: Are rotating register files still a bad idea? Anne & Lynn Wheeler <lynn@garlic.com> - 2012-02-03 13:09 -0500
Re: Are rotating register files still a bad idea? Stephen Fuld <SFuld@alumni.cmu.edu.invalid> - 2012-02-03 12:28 -0800
Re: Are rotating register files still a bad idea? MitchAlsup <MitchAlsup@aol.com> - 2012-02-03 13:04 -0800
Re: Are rotating register files still a bad idea? Thomas Womack <twomack@chiark.greenend.org.uk> - 2012-02-03 19:15 +0000
Re: Are rotating register files still a bad idea? Mark Thorson <nospam@sonic.net> - 2012-02-03 13:08 -0800
Re: Are rotating register files still a bad idea? Thomas Womack <twomack@chiark.greenend.org.uk> - 2012-02-03 22:21 +0000
Re: Are rotating register files still a bad idea? Mark Thorson <nospam@sonic.net> - 2012-02-03 18:42 -0800
Re: Are rotating register files still a bad idea? Glen Overby <coreSPAMsample@charter.net> - 2012-03-01 18:18 -0600
Re: Are rotating register files still a bad idea? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-02-04 13:26 +0000
Re: Are rotating register files still a bad idea? Fritz Wuehler <fritz@spamexpire-201202.rodent.frell.theremailer.net> - 2012-02-05 00:58 +0100
Re: Are rotating register files still a bad idea? Nomen Nescio <nobody@dizum.com> - 2012-02-04 21:00 +0100
Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-05 00:35 -0600
Re: Are rotating register files still a bad idea? Quadibloc <jsavard@ecn.ab.ca> - 2012-02-03 10:11 -0800
Re: Are rotating register files still a bad idea? jgk@panix.com (Joe keane) - 2012-02-03 23:20 +0000
Re: Are rotating register files still a bad idea? "Andy (Super) Glew" <andy@SPAM.comp-arch.net> - 2012-02-03 22:54 -0800
Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-04 07:15 -0600
Re: Are rotating register files still a bad idea? "Andy (Super) Glew" <andy@SPAM.comp-arch.net> - 2012-02-05 13:49 -0800
Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-06 05:36 -0600
Re: Are rotating register files still a bad idea? "Andy (Super) Glew" <andy@SPAM.comp-arch.net> - 2012-02-06 18:46 -0800
Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-28 20:02 -0600
Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-03-07 22:35 -0600
Re: Are rotating register files still a bad idea? "Andy (Super) Glew" <andy@SPAM.comp-arch.net> - 2012-03-08 07:25 -0800
Re: Are rotating register files still a bad idea? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-02-06 12:49 +0000
Re: Are rotating register files still a bad idea? Quadibloc <jsavard@ecn.ab.ca> - 2012-02-04 11:14 -0800
Re: Are rotating register files still a bad idea? Michael S <already5chosen@yahoo.com> - 2012-02-03 06:04 -0800
Re: Are rotating register files still a bad idea? MitchAlsup <MitchAlsup@aol.com> - 2012-02-03 12:20 -0800
Page 2 of 2 — ← Prev page 1 [2]
| From | Brett Davis <ggtgp@yahoo.com> |
|---|---|
| Date | 2012-02-04 07:15 -0600 |
| Message-ID | <ggtgp-25AEB5.07151604022012@netnews.mchsi.com> |
| In reply to | #5697 |
In article <4F2CD63A.4050306@SPAM.comp-arch.net>,
"Andy (Super) Glew" <andy@SPAM.comp-arch.net> wrote:

> On 2/2/2012 12:16 AM, Brett Davis wrote:
> >
> > Do rotating register files buy you anything net after costs,
> > or is it just a fiasco in the age of modern OoO pipelines?
>
> When I first saw your post, I thought that you were talking about the
> rotating register file that Cydrome and Itanium had, that could be made
> to rotate on every loop iteration, and not SPARC-style overlapping
> register windows (which Itanium also had).

Oops, wrong terminology, meant register windows.

That Itanium had both types of register rotation made me wince.

> I actually think that Cydrome-style rotating register files may be a
> good idea, albeit neglected. They allow software pipelining to be done
> without requiring reg-reg moves and give many of the benefits of loop
> unrolling without increasing code size.

On actual rotating registers
http://www.cs.nmsu.edu/~rvinyard/itanium/register_rotation.htm

I do not see that rotating registers have a point when you are OoO
with 100 instructions in process. The hardware unrolls loops for you.

I work in embedded with limited OoO and only 6 ops in flight, which
means I unroll loops. Rotating registers could help, but there is
so much other baggage and setup that I do not see an embedded chip
going down that road. Compiler support is the killer.

You still have the prolog and epilogue plus some setup, all you
save is one or two register copies per loop, so ~5% fewer
instructions which means 5% faster. I can sneeze on the code and
get a 5% speedup.

I don't bother hand optimizing unless I can get 20%.

I would rather add LEA to a RISC chip, or ARM shifted compute,
or IBM's cool insert/extract, these will give a similar speedup
on more codes. << add your favorites here >>

With 32 registers that means half would rotate, sucky, you need
at least 64 registers. The Console guys might consider it, they
have 128 floats/vectors and are mostly in-order.

Now that Itanic is dead I do not have an axe to grind, I am willing
to give anything Itanium did a second look. Better yet Cydrome.

On my third look I noticed that you can sometimes fold the
prolog and epilogue into the loop, it makes my head hurt and you
do not want to do this if you end up canceling a large number of ops,
which is exactly what will happen most of the time. Defeats the purpose.
You save ~six instructions at the cost of burning ~two dozen in the loop.

Makes my axe grind.

Feel free to make corrections and give your own take on the tradeoffs,
and anything I missed/overlooked.

> Overlapping register windows for function calls I am not so sure about.
>
> One thing to think about: GPU style multiple threads in the same
> register file.
>
> a) COULD work with rotating register windows for loops
>
> b) would be much harder to make work with overlapping register windows
> for function calls.
| From | "Andy (Super) Glew" <andy@SPAM.comp-arch.net> |
|---|---|
| Date | 2012-02-05 13:49 -0800 |
| Message-ID | <4F2EF96A.1040107@SPAM.comp-arch.net> |
| In reply to | #5700 |
On 2/4/2012 5:15 AM, Brett Davis wrote:
> In article<4F2CD63A.4050306@SPAM.comp-arch.net>,
> "Andy (Super) Glew"<andy@SPAM.comp-arch.net> wrote:
>
>> On 2/2/2012 12:16 AM, Brett Davis wrote:
>>>
>>> Do rotating register files buy you anything net after costs,
>>> or is it just a fiasco in the age of modern OoO pipelines?
>>
>> When I first saw your post, I thought that you were talking about the
>> rotating register file that Cydrome and Itanium had, that could be made
>> to rotate on every loop iteration, and not SPARC-style overlapping
>> register windows (which Itanium also had).
>
> Oops, wrong terminology, meant register windows.
>
> That Itanium had both types of register rotation made me wince.
Me, too. And yet I can see nice aspects of both rotating register files
for loops and register windows for call/returns, etc.
Often, when there is more than one good way to extend an
architecture, as here, but the ways conflict, it is
better to do neither.
Although let me note one thing: rotating register files don't change the
number of registers that need to be renamed. They just mean that you
either need to rotate the renamings in the register alias table, or that
you do some arithmetic on the logical register number before it is renamed.
Register windows, on the other hand, increase the number of registers
that need to be renamed. This was a killer for Itanium. I tried playing
games such as renaming only the currently accessible registers (on
Itanium even that was a lot), or the top few procedure contexts,
stalling until retirement and renaming had settled down if you called
and returned too quickly. But this forces you into a copy at retirement
model.
>> I actually think that Cydrome-style rotating register files may be a
>> good idea, albeit neglected. They allow software pipelining to be done
>> without requiring reg-reg moves and give many of the benefits of loop
>> unrolling without increasing code size.
>
> On actual rotating registers
> http://www.cs.nmsu.edu/~rvinyard/itanium/register_rotation.htm
>
> I do not see that rotating registers have a point when you are OoO
> with 100 instructions in process. The hardware unrolls loops for you.
And yet people insist on unrolling loops for OOO processors. And it
often pays off.
E.g. I have been looking at a loop recently that is 9 instructions long,
of which only 5 are real work. 4 of 9 are overhead - loop induction
variables and branch. And it fills the machine, maximal IPC. So
any unrolling that can reduce the overhead per iteration, eliminating
any of those overhead instructions, very likely improves performance.
At the cost of code size.
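As a rough illustration of the arithmetic in the paragraph above, here is a small Python sketch. It is not from the original post. It assumes that all 4 overhead instructions are paid once per unrolled body, and that the machine stays at the same IPC, so run time scales with instruction count. Real loops rarely amortize this cleanly. The function name and defaults are made up for the example.

```python
def instructions_per_iteration(work=5, overhead=4, unroll=1):
    """Average instructions per original loop iteration after unrolling.

    Assumption: the overhead (induction variables, branch) is paid once
    per unrolled body; the real work is paid once per original iteration.
    """
    return work + overhead / unroll


for u in (1, 2, 4, 8):
    per_iter = instructions_per_iteration(unroll=u)
    # 9 is the instruction count of the loop before unrolling (5 work + 4 overhead)
    print(f"unroll x{u}: {per_iter:.2f} instr/iter, ideal speedup {9 / per_iter:.2f}x")
```

Under these assumptions, unrolling by 4 takes the loop from 9 to 6 instructions per original iteration, and the limit for large unroll factors is the 5 instructions of real work.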
Unfortunately, rotating registers does not help this aspect of loop
unrolling - reducing the induction variable overhead.
Rotating registers DOES reduce the code scheduling overhead of loop
unrolling.
Hmm...
>
> I work in embedded with limited OoO and only 6 ops in flight, which
> means I unroll loops. Rotating registers could help, but there is
> so much other baggage and setup that I do not see an embedded chip
> going down that road. Compiler support is the killer.
>
> You still have the prolog and epilogue plus some setup, all you
> save is one or two register copies per loop, so ~5% fewer
> instructions which means 5% faster. I can sneeze on the code and
> get a 5% speedup.
If by prologue and epilogue you mean the code that ramps up to the full,
maximally scheduled, loop body, and then ramps down, then, no, you don't
need that. One of the features of rotating register files is that the
maximally scheduled loop body has loop predicates, that allow the
prologue and epilogue to be folded in.
(However, I suspect that the ramp up/down code obtained in this way is
sub-optimal. Which means that you trade off the lower code size for
lower performance on small loop counts.)
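To make the folding concrete, here is a toy Python model of a kernel-only software-pipelined loop. It is a sketch under stated assumptions, not Itanium or Cydrome semantics: the three stages (load, add 1, store), the list-based rotating registers, and the name `pipelined_kernel` are all invented for illustration. Each pass of the kernel rotates the data registers and the stage predicates by one position, so the same loop body serves as ramp-up, steady state, and ramp-down.

```python
def pipelined_kernel(src):
    """Kernel-only 3-stage software pipeline: load -> add 1 -> store.

    Returns the output list and the number of operation slots that were
    issued but cancelled by a false stage predicate.
    """
    stages = 3
    n = len(src)
    dst = [None] * n
    regs = [None] * stages   # hypothetical rotating data registers
    pred = [False] * stages  # hypothetical rotating stage predicates
    squashed = 0

    for k in range(n + stages - 1):      # ramp-up and ramp-down reuse the kernel
        # Rotate: the value written as regs[0] last pass is now regs[1], etc.
        regs = [None] + regs[:-1]
        pred = [k < n] + pred[:-1]       # new iterations start only while k < n

        if pred[2]:                      # stage 2: store element k-2
            dst[k - 2] = regs[2]
        else:
            squashed += 1
        if pred[1]:                      # stage 1: compute on element k-1
            regs[1] = regs[1] + 1
        else:
            squashed += 1
        if pred[0]:                      # stage 0: load element k
            regs[0] = src[k]
        else:
            squashed += 1
    return dst, squashed


result, wasted = pipelined_kernel([10, 20, 30])
print(result, wasted)
```

In this model the number of cancelled slots is fixed by the pipeline depth, not by the trip count, which is one way to see why the folded form looks worst on small loop counts.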
> With 32 registers that means half would rotate, sucky, you need
> at least 64 registers. The Console guys might consider it, they
> have 128 floats/vectors and are mostly in-order.
?? Where do you get "half would rotate"?
That's a number TBD.
The circuit is basically
    post_rotation_regnum :=
        if ((pre_rotation_regnum & rotmask) == rotbase)
        then ((((pre_rotation_regnum & ~rotmask)
                + rotcount) & ~rotmask)
              | rotbase)
        else pre_rotation_regnum
which I have expressed suboptimally for clarity.
With the above you can make the rotating region any power of two size,
and any naturally aligned section of the register file.
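The circuit above can be sketched in Python as follows. This is an illustration only, not part of the original post. The names `rotmask`, `rotbase`, and `rotcount` come from the pseudocode; the `nregs` parameter is an added assumption that stands in for the fixed width of a hardware register number, since Python integers are unbounded.

```python
def rotate_regnum(regnum, rotmask, rotbase, rotcount, nregs=32):
    """Map a logical register number to its post-rotation number.

    rotmask selects the high bits that identify the rotating region,
    rotbase is the value those bits must have, and rotcount is the
    current rotation amount. nregs (assumed to be a power of two) bounds
    the register number width.
    """
    offset_mask = ~rotmask & (nregs - 1)   # low bits: position inside the region
    if (regnum & rotmask) == rotbase:      # register lies in the rotating region
        offset = (regnum & offset_mask) + rotcount
        return (offset & offset_mask) | rotbase   # wrap around inside the region
    return regnum                          # static registers pass through unchanged


# Upper half of a 32-entry file rotates (r16..r31):
print(rotate_regnum(31, rotmask=0b10000, rotbase=0b10000, rotcount=1))
# Only r8..r15 rotate, a quarter of the file:
print(rotate_regnum(15, rotmask=0b11000, rotbase=0b01000, rotcount=3))
```

Choosing a different `rotmask` and `rotbase` moves or resizes the rotating region without changing the logic, which is the point about power-of-two sizes and natural alignment.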
>
> Now that Itanic is dead I do not have an axe to grind, I am willing
> to give anything Itanium did a second look. Better yet Cydrome.
>
> On my third look I noticed that you can sometimes fold the
> prolog and epilogue into the loop, it makes my head hurt and you
> do not want to do this if you end up canceling a large number of ops,
> which is exactly what will happen most of the time. Defeats the purpose.
> You save ~six instructions at the cost of burning ~two dozen in the loop.
No, that's exactly my concern. I think they oversold the benefits of
rotating loop predicates. I would reexamine with and without.
| From | Brett Davis <ggtgp@yahoo.com> |
|---|---|
| Date | 2012-02-06 05:36 -0600 |
| Message-ID | <ggtgp-7BA6C0.05364706022012@netnews.mchsi.com> |
| In reply to | #5724 |
In article <4F2EF96A.1040107@SPAM.comp-arch.net>,
"Andy (Super) Glew" <andy@SPAM.comp-arch.net> wrote:

> On 2/4/2012 5:15 AM, Brett Davis wrote:
> > In article<4F2CD63A.4050306@SPAM.comp-arch.net>,
> > "Andy (Super) Glew"<andy@SPAM.comp-arch.net> wrote:
> >
> >> On 2/2/2012 12:16 AM, Brett Davis wrote:
> >
> > That Itanium had both types of register rotation made me wince.
>
> Me, too. And yet I can see nice aspects of both rotating register files
> for loops and register windows for call/returns, etc.
>
> Oftentimes, when there is more than one good way to extend an
> architecture, like here, but where they conflict - often, in such
> situations, it is better to do neither.

I agree, good design is not about saying yes to every kitchen sink
choice, it's about saying no, so you can be good at what you do well.

As soon as I saw the Itanium instruction set I knew they were DOOMED.

> Although let me note one thing: rotating register files don't change the
> number of registers that need to be renamed. They just mean that you
> either need to rotate the renamings in the register alias table, or that
> you do some arithmetic on the logical register number before it is renamed.

Is this cheap/free or will it cause issues above 4GHz?
(As much as I believe high clocks are not possible, times change.)

> Register windows, on the other hand, increase the number of registers
> that need to be renamed. This was a killer for Itanium. I tried playing
> games such as renaming only the currently accessible registers (on
> Itanium even that was a lot), or the top few procedure contexts,
> stalling until retirement and renaming had settled down if you called
> and returned too quickly. But this forces you into a copy at retirement
> model.

So register windows are a loser, but not for the rotation reason I thought.
Thanks, the world is full of misidentified reasons for success and failure,
and you cleared one up.

I had heard that one of the big reasons for the failure of the first
Itanium design had to do with one of the forms of register rotation,
rumor was late in the design Intel had to add two stages to the pipeline
to deal with rotation, and this killed all the performance advantage.
Any comment?

> >> I actually think that Cydrome-style rotating register files may be a
> >> good idea, albeit neglected. They allow software pipelining to be done
> >> without requiring reg-reg moves and give many of the benefits of loop
> >> unrolling without increasing code size.
> >
> > On actual rotating registers
> > http://www.cs.nmsu.edu/~rvinyard/itanium/register_rotation.htm
> >
> > I do not see that rotating registers have a point when you are OoO
> > with 100 instructions in process. The hardware unrolls loops for you.
>
> And yet people insist on unrolling loops for OOO processors. And it
> often pays off.
>
> E.g. I have been looking at a loop recently that is 9 instructions long,
> of which only 5 are real work. 4 of 9 are overhead - loop induction
> variables and branch. And it fills the machine, maximal IPC. So
> any unrolling that can reduce the overhead per iteration, eliminating
> any of those overhead instructions, very likely improves performance.
> At the cost of code size.

I do not optimize for x86 or high end and overlooked the other reasons
for unrolling. Loop induction overhead will force unrolling, you would
think that compilers would be able to detect this and auto-unroll.

The vast majority of optimizations I do have to do with C aliasing rules
that compilers have to obey, that cripple performance on low end mostly
in-order processors.

> Unfortunately, rotating registers does not help this aspect of loop
> unrolling - reducing the induction variable overhead.
>
> Rotating registers DOES reduce the code scheduling overhead of loop
> unrolling.
>
> Hmm...
>
> >
> > I work in embedded with limited OoO and only 6 ops in flight, which
> > means I unroll loops. Rotating registers could help, but there is
> > so much other baggage and setup that I do not see an embedded chip
> > going down that road. Compiler support is the killer.
> >
> > You still have the prolog and epilogue plus some setup, all you
> > save is one or two register copies per loop, so ~5% fewer
> > instructions which means 5% faster. I can sneeze on the code and
> > get a 5% speedup.

Whining about a 5% speedup is childish, I was a little unfair to
register rotation.

> If by prologue and epilogue you mean the code that ramps up to the full,
> maximally scheduled, loop body, and then ramps down, then, no, you don't
> need that. One of the features of rotating register files is that the
> maximally scheduled loop body has loop predicates, that allow the
> prologue and epilogue to be folded in.
>
> (However, I suspect that the ramp up/down code obtained in this way is
> sub-optimal. Which means that you trade off the lower code size for
> lower performance on small loop counts.)

That is a bad trade, and one I have never had to make.

On the other hand I am the one that talks about infinitely fast CPUs
that are choked by slow RAM. The DRAM fetch in the prologue is going
to stall the CPU anyway, the CPU can merrily run a bunch of canceled
math ops at little cost while waiting for the load.

So I can only complain about folding the epilogue being wasteful,
folding the prologue looks wasteful but is free, a bonus even.

On the second hand the false dependencies in the to-be-canceled ops
may also cause stalls, low end only has a 6 instruction OoO window.

The compiler has little hope in heck of deciding to fold prologues
and epilogues, by default I would not fold either.

Compilers do not know when to unroll, but you can use prologues and
epilogues without unrolling. The purpose would be to start the first
load for the loop early, this will give a small performance bump at
a small code increase. Again, compiler not smart enough to decide.

I am warming to register rotation, lots of small potential,
all the downside is in the hardware design...

> > With 32 registers that means half would rotate, sucky, you need
> > at least 64 registers. The Console guys might consider it, they
> > have 128 floats/vectors and are mostly in-order.
>
> ?? Where do you get "half would rotate"?
>
> That's a number TBD.
>
> The circuit is basically
>
>     post_rotation_regnum :=
>         if ((pre_rotation_regnum & rotmask) == rotbase)
>         then ((((pre_rotation_regnum & ~rotmask)
>                 + rotcount) & ~rotmask)
>               | rotbase)
>         else pre_rotation_regnum
>
> which I have expressed suboptimally for clarity.
>
> With the above you can make the rotating region any power of two size,
> and any naturally aligned section of the register file.

That info is important, and helps sell register rotation.

> > Now that Itanic is dead I do not have an axe to grind, I am willing
> > to give anything Itanium did a second look. Better yet Cydrome.
> >
> > On my third look I noticed that you can sometimes fold the
> > prolog and epilogue into the loop, it makes my head hurt and you
> > do not want to do this if you end up canceling a large number of ops,
> > which is exactly what will happen most of the time. Defeats the purpose.
> > You save ~six instructions at the cost of burning ~two dozen in the loop.
>
> No, that's exactly my concern. I think they oversold the benefits of
> rotating loop predicates. I would reexamine with and without.

Itanium was ultimately going for 12 issue of instructions, when you
are issuing so many instructions you can afford to throw away lots of
instructions with predication. Folding the prolog and epilogue into
the loop makes huge sense, saves code and costs nothing, in theory.

Real CPUs cannot be so wasteful.
| From | "Andy (Super) Glew" <andy@SPAM.comp-arch.net> |
|---|---|
| Date | 2012-02-06 18:46 -0800 |
| Message-ID | <4F30907E.4020806@SPAM.comp-arch.net> |
| In reply to | #5728 |
On 2/6/2012 3:36 AM, Brett Davis wrote:
> In article<4F2EF96A.1040107@SPAM.comp-arch.net>,
> "Andy (Super) Glew"<andy@SPAM.comp-arch.net> wrote:
>> Although let me note one thing: rotating register files don't change the
>> number of registers that need to be renamed. They just mean that you
>> either need to rotate the renamings in the register alias table, or that
>> you do some arithmetic on the logical register number before it is renamed.
>
> Is this cheap/free or will it cause issues above 4GHz?
> (As much as I believe high clocks are not possible, times change.)

It costs pipestages. Not frequency.

But as you note below, pipestages have a performance cost. The rule of
thumb in the P6 generation was 5% per pipestage. By Willamette, it was
1% per pipestage - deeper pipelines, and better branch prediction.

> I had heard that one of the big reasons for the failure of the first
> Itanium design had to do with one of the forms of register rotation,
> rumor was late in the design Intel had to add two stages to the pipeline
> to deal with rotation, and this killed all the performance advantage.

I don't know.
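For a feel of what the per-pipestage rule of thumb implies, here is a small Python sketch. It is an illustration only and not from the post. It assumes the per-stage loss compounds multiplicatively, which the rule of thumb does not specify, and the function name is made up. It says nothing about what actually happened on any Itanium design.

```python
def relative_performance(extra_stages, loss_per_stage):
    """Performance relative to the baseline after adding pipeline stages.

    Assumption: each added stage costs the same fraction of performance
    and the losses compound.
    """
    return (1.0 - loss_per_stage) ** extra_stages


for label, loss in (("P6-era rule, 5% per stage", 0.05),
                    ("Willamette-era rule, 1% per stage", 0.01)):
    perf = relative_performance(2, loss)
    print(f"{label}: two extra stages -> {100 * (1 - perf):.1f}% slower")
```

Under that assumption, two extra stages cost a little under 10% with the 5% rule and about 2% with the 1% rule.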
| From | Brett Davis <ggtgp@yahoo.com> |
|---|---|
| Date | 2012-02-28 20:02 -0600 |
| Message-ID | <ggtgp-F4CD07.20021128022012@netnews.mchsi.com> |
| In reply to | #5754 |
In article <4F30907E.4020806@SPAM.comp-arch.net>,
"Andy (Super) Glew" <andy@SPAM.comp-arch.net> wrote:

> On 2/6/2012 3:36 AM, Brett Davis wrote:
> > In article<4F2EF96A.1040107@SPAM.comp-arch.net>,
> > "Andy (Super) Glew"<andy@SPAM.comp-arch.net> wrote:
>
> >> Although let me note one thing: rotating register files don't change the
> >> number of registers that need to be renamed. They just mean that you
> >> either need to rotate the renamings in the register alias table, or that
> >> you do some arithmetic on the logical register number before it is renamed.
> >
> > Is this cheap/free or will it cause issues above 4GHz?
> > (As much as I believe high clocks are not possible, times change.)
>
> It costs pipestages. Not frequency.
>
> But as you note below, pipestages have a performance cost. The rule of
> thumb in the P6 generation was 5% per pipestage. By Willamette, it was
> 1% per pipestage - deeper pipelines, and better branch prediction.

I have been thinking hard on rotating register files, and I think a
better compromise is adding a rename register instruction.
This is cheaper than a register copy, and perhaps could be handled
by the front end, not needing to occupy any of the instruction pipes.
(Merging a tag onto a nearby instruction if need be for other reasons.)
You could recommend that the rename precede an instruction that uses
the rename.

You still have the decode cost using up one of your four decoders,
but the stated reason for using register rotation is for short loops
with a small number of instructions. This does not sound like a
situation where you are decode limited.

If you are decode limited you are generally in a big loop with a large
count, where register rotation does not help as much.

I do not know of any CPUs with a rename register instruction,
suspect I am missing something.
| From | Brett Davis <ggtgp@yahoo.com> |
|---|---|
| Date | 2012-03-07 22:35 -0600 |
| Message-ID | <ggtgp-5C3492.22351707032012@netnews.mchsi.com> |
| In reply to | #6157 |
In article <ggtgp-F4CD07.20021128022012@netnews.mchsi.com>,
Brett Davis <ggtgp@yahoo.com> wrote:

> In article <4F30907E.4020806@SPAM.comp-arch.net>,
> "Andy (Super) Glew" <andy@SPAM.comp-arch.net> wrote:
>
> > On 2/6/2012 3:36 AM, Brett Davis wrote:
> > > In article<4F2EF96A.1040107@SPAM.comp-arch.net>,
> > > "Andy (Super) Glew"<andy@SPAM.comp-arch.net> wrote:
> >
> > >> Although let me note one thing: rotating register files don't change the
> > >> number of registers that need to be renamed. They just mean that you
> > >> either need to rotate the renamings in the register alias table, or that
> > >> you do some arithmetic on the logical register number before it is renamed.
> > >
> > > Is this cheap/free or will it cause issues above 4GHz?
> > > (As much as I believe high clocks are not possible, times change.)
> >
> > It costs pipestages. Not frequency.
> >
> > But as you note below, pipestages have a performance cost. The rule of
> > thumb in the P6 generation was 5% per pipestage. By Willamette, it was
> > 1% per pipestage - deeper pipelines, and better branch prediction.
>
> I have been thinking hard on rotating register files, and I think a
> better compromise is adding a rename register instruction.
> This is cheaper than a register copy, and perhaps could be handled
> by the front end, not needing to occupy any of the instruction pipes.
> (Merging a tag onto a nearby instruction if need be for other reasons.)
> You could recommend that the rename precede an instruction that uses
> the rename.

On the new Ivy Bridge chips just announced the MOV instructions no longer
occupy an execution port. I think AMD has been doing this for years.

This would seem to negate much of the need/use for register rotation.

> You still have the decode cost using up one of your four decoders,
> but the stated reason for using register rotation is for short loops
> with a small number of instructions. This does not sound like a
> situation where you are decode limited.
>
> If you are decode limited you are generally in a big loop with a large
> count, where register rotation does not help as much.
>
> I do not know of any CPUs with a rename register instruction,
> suspect I am missing something.
| From | "Andy (Super) Glew" <andy@SPAM.comp-arch.net> |
|---|---|
| Date | 2012-03-08 07:25 -0800 |
| Message-ID | <4F58CF5B.8070201@SPAM.comp-arch.net> |
| In reply to | #6296 |
On 3/7/2012 8:35 PM, Brett Davis wrote:
> In article<ggtgp-F4CD07.20021128022012@netnews.mchsi.com>,
> Brett Davis<ggtgp@yahoo.com> wrote:
>
>> In article<4F30907E.4020806@SPAM.comp-arch.net>,
>> "Andy (Super) Glew"<andy@SPAM.comp-arch.net> wrote:
>>
>>> On 2/6/2012 3:36 AM, Brett Davis wrote:
>>>> In article<4F2EF96A.1040107@SPAM.comp-arch.net>,
>>>> "Andy (Super) Glew"<andy@SPAM.comp-arch.net> wrote:
>>>
> On the new Ivy Bridge chips just announced the MOV instructions no longer
> occupy an execution port. I think AMD has been doing this for years.
>
> This would seem to negate much of the need/use for register rotation.

This question was already partially answered: MOV elimination like Intel
does (I wasn't aware that AMD was doing it - they weren't when I was
there) consumes instruction fetch, decode, and retirement bandwidth.

>> You still have the decode cost using up one of your four decoders,
>> but the stated reason for using register rotation is for short loops
>> with a small number of instructions. This does not sound like a
>> situation where you are decode limited.
>>
>> If you are decode limited you are generally in a big loop with a large
>> count, where register rotation does not help as much.

Umm, quite the contrary: I've usually found it to be the smallest loops
that are the closest to being limited by decode or graduation bandwidth.

Perhaps I have spent too much time tweaking AXPY.
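The idea of handling a register move at rename time can be sketched with a toy register alias table in Python. This is a simplified model for illustration, not a description of Intel's or AMD's implementation: the class `RenameTable`, its sizes, and its reference counting scheme are assumptions, and physical registers are released immediately here, where real hardware would wait for retirement. The `executed` counter tracks only execution-port use; as the post above notes, an eliminated move still costs fetch, decode, and retirement bandwidth, which this model does not count.

```python
class RenameTable:
    """Toy register alias table with reference-counted physical registers."""

    def __init__(self, n_logical=4, n_physical=8):
        self.rat = list(range(n_logical))                  # logical -> physical
        self.refs = [1] * n_logical + [0] * (n_physical - n_logical)
        self.free = list(range(n_logical, n_physical))     # unallocated physical regs
        self.executed = 0                                  # ops sent to an execution port

    def _release(self, phys):
        # Drop one reference; recycle the register once nothing maps to it.
        self.refs[phys] -= 1
        if self.refs[phys] == 0:
            self.free.append(phys)

    def write(self, dst):
        """Ordinary instruction: allocate a fresh physical register for dst."""
        new = self.free.pop(0)
        self.refs[new] = 1
        self._release(self.rat[dst])
        self.rat[dst] = new
        self.executed += 1

    def move(self, dst, src):
        """Eliminated move: dst aliases src's physical register, nothing executes."""
        phys = self.rat[src]
        self.refs[phys] += 1
        self._release(self.rat[dst])
        self.rat[dst] = phys


table = RenameTable()
table.write(0)      # r0 = some result
table.move(1, 0)    # r1 = r0, handled purely by remapping
print(table.rat, table.executed)
```

After the move, both logical registers point at the same physical register and only the first instruction has used an execution port. A later write to r0 gets a new physical register and leaves r1's value intact.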
| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Date | 2012-02-06 12:49 +0000 |
| Message-ID | <2012Feb6.134944@mips.complang.tuwien.ac.at> |
| In reply to | #5724 |
"Andy (Super) Glew" <andy@SPAM.comp-arch.net> writes:
>Register windows, on the other hand, increase the number of registers
>that need to be renamed. This was a killer for Itanium. I tried playing
>games such as renaming only the currently accessible registers (on
>Itanium even that was a lot), or the top few procedure contexts,
>stalling until retirement and renaming had settled down if you called
>and returned too quickly. But this forces you into a copy at retirement
>model.

Sure, IA-64 allows the implementation to have more register names
through the register stack, but it does not force it, and I really
don't see that their absence is a "killer". IIRC there are 96 (or was
it 128?) register names visible for the register stack. This should
be plenty for lots of code. And for code where the call stacks are so
deep, with so many parameters and locals, that additional register
names would provide a significant benefit, even an implementation
without the additional names would still be at least as good as (and
probably better than) architectures without register stack/register
windows. Or am I missing something?

- anton
--
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
| From | Quadibloc <jsavard@ecn.ab.ca> |
|---|---|
| Date | 2012-02-04 11:14 -0800 |
| Message-ID | <29b5c840-a78d-47aa-a73a-a95a4897b276@ow3g2000pbc.googlegroups.com> |
| In reply to | #5700 |
On Feb 4, 6:15 am, Brett Davis <gg...@yahoo.com> wrote:
> Now that Itanic is dead

It's still being manufactured and shipped, and Intel is planning newer
versions. Unlike the 68k (if you don't count ColdFire), it still has a
little life in it.

John Savard
| From | Michael S <already5chosen@yahoo.com> |
|---|---|
| Date | 2012-02-03 06:04 -0800 |
| Message-ID | <d1aeaa6e-cf4f-4450-b8a1-58267ae936b0@hs8g2000vbb.googlegroups.com> |
| In reply to | #5657 |
On Feb 2, 10:16 am, Brett Davis <gg...@yahoo.com> wrote:
> Are rotating register files still a bad idea?
>
> Rotating register files used to mean slow clock speeds.
> SPARC has had competitive clocks for the past decade,
> largely because everyone hit a thermal brick wall...
>
> I assume that if that brick wall breaks, rotating register files
> are back to being doomed, but that looks unlikely.
>
> Do rotating register files buy you anything net after costs,
> or is it just a fiasco in the age of modern OoO pipelines?

Are you sure you understand the difference between rotating register
files (e.g. Itanium floating point RF) and stacked register files
(SPARC, Intel i960, just to list those made in highest volumes; Intel
Itanium general-purpose RF could be seen as a compromise between stacked
and flat RF)?
| From | MitchAlsup <MitchAlsup@aol.com> |
|---|---|
| Date | 2012-02-03 12:20 -0800 |
| Message-ID | <6444849.2224.1328300401859.JavaMail.geo-discussion-forums@yqoe12> |
| In reply to | #5657 |
On Thursday, February 2, 2012 2:16:31 AM UTC-6, Brett Davis wrote:
> Are rotating register files still a bad idea?

Yes, and so are stacked register files.

> Rotating register files used to mean slow clock speeds.
> SPARC has had competitive clocks for the past decade,
> largely because everyone hit a thermal brick wall...

The larger the register file, the slower the file can be accessed.
A long time ago, we used to 'access' the register file twice a cycle,
once for reading and once for writing. Wire delay and clock skew have
eliminated this as a proper design choice (at least for top GHz designs).

Register renaming has contributed to the size of register files (and/or
reorder buffers, future files, ...), making the problem of register file
access more difficult. The bigger the rename pool, the larger the total
amount of silicon area devoted to this function.

Neither SPARC nor Itanic has shown even moderate returns in performance
for the <fairly hideous> register files utilized.

> Do rotating register files buy you anything net after costs,
> or is it just a fiasco in the age of modern OoO pipelines?

Depends on who you believe, but in my opinion, no.

Over the entire history of computers, simpler machines were faster,
always in clock rate, mostly in delivered application performance.
Compare a CDC 7600 to an IBM 360/91.

Simpler designs take smaller design teams less time and get done with
fewer bugs.

Mitch