

Groups > comp.arch > #5657 > unrolled thread

Are rotating register files still a bad idea?

Started by: Brett Davis <ggtgp@yahoo.com>
First post: 2012-02-02 02:16 -0600
Last post: 2012-02-03 12:20 -0800
Articles 11 on this page of 31 — 15 participants



Contents

  Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-02 02:16 -0600
    Re: Are rotating register files still a bad idea? "Paul A. Clayton" <paaronclayton@gmail.com> - 2012-02-02 07:41 -0800
    Re: Are rotating register files still a bad idea? Nomen Nescio <nobody@dizum.com> - 2012-02-02 19:04 +0100
      Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-02 23:16 -0600
        Re: Are rotating register files still a bad idea? Fritz Wuehler <fritz@spamexpire-201202.rodent.frell.theremailer.net> - 2012-02-03 18:19 +0100
          Re: Are rotating register files still a bad idea? Anne & Lynn Wheeler <lynn@garlic.com> - 2012-02-03 13:09 -0500
            Re: Are rotating register files still a bad idea? Stephen Fuld <SFuld@alumni.cmu.edu.invalid> - 2012-02-03 12:28 -0800
            Re: Are rotating register files still a bad idea? MitchAlsup <MitchAlsup@aol.com> - 2012-02-03 13:04 -0800
          Re: Are rotating register files still a bad idea? Thomas Womack <twomack@chiark.greenend.org.uk> - 2012-02-03 19:15 +0000
            Re: Are rotating register files still a bad idea? Mark Thorson <nospam@sonic.net> - 2012-02-03 13:08 -0800
              Re: Are rotating register files still a bad idea? Thomas Womack <twomack@chiark.greenend.org.uk> - 2012-02-03 22:21 +0000
                Re: Are rotating register files still a bad idea? Mark Thorson <nospam@sonic.net> - 2012-02-03 18:42 -0800
                  Re: Are rotating register files still a bad idea? Glen Overby <coreSPAMsample@charter.net> - 2012-03-01 18:18 -0600
              Re: Are rotating register files still a bad idea? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-02-04 13:26 +0000
              Re: Are rotating register files still a bad idea? Fritz Wuehler <fritz@spamexpire-201202.rodent.frell.theremailer.net> - 2012-02-05 00:58 +0100
            Re: Are rotating register files still a bad idea? Nomen Nescio <nobody@dizum.com> - 2012-02-04 21:00 +0100
              Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-05 00:35 -0600
          Re: Are rotating register files still a bad idea? Quadibloc <jsavard@ecn.ab.ca> - 2012-02-03 10:11 -0800
    Re: Are rotating register files still a bad idea? jgk@panix.com (Joe keane) - 2012-02-03 23:20 +0000
    Re: Are rotating register files still a bad idea? "Andy (Super) Glew" <andy@SPAM.comp-arch.net> - 2012-02-03 22:54 -0800
      Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-04 07:15 -0600
        Re: Are rotating register files still a bad idea? "Andy (Super) Glew" <andy@SPAM.comp-arch.net> - 2012-02-05 13:49 -0800
          Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-06 05:36 -0600
            Re: Are rotating register files still a bad idea? "Andy (Super) Glew" <andy@SPAM.comp-arch.net> - 2012-02-06 18:46 -0800
              Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-02-28 20:02 -0600
                Re: Are rotating register files still a bad idea? Brett Davis <ggtgp@yahoo.com> - 2012-03-07 22:35 -0600
                  Re: Are rotating register files still a bad idea? "Andy (Super) Glew" <andy@SPAM.comp-arch.net> - 2012-03-08 07:25 -0800
          Re: Are rotating register files still a bad idea? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-02-06 12:49 +0000
        Re: Are rotating register files still a bad idea? Quadibloc <jsavard@ecn.ab.ca> - 2012-02-04 11:14 -0800
    Re: Are rotating register files still a bad idea? Michael S <already5chosen@yahoo.com> - 2012-02-03 06:04 -0800
    Re: Are rotating register files still a bad idea? MitchAlsup <MitchAlsup@aol.com> - 2012-02-03 12:20 -0800



#5700

From: Brett Davis <ggtgp@yahoo.com>
Date: 2012-02-04 07:15 -0600
Message-ID: <ggtgp-25AEB5.07151604022012@netnews.mchsi.com>
In reply to: #5697

In article <4F2CD63A.4050306@SPAM.comp-arch.net>,
 "Andy (Super) Glew" <andy@SPAM.comp-arch.net> wrote:

> On 2/2/2012 12:16 AM, Brett Davis wrote:
> >
> > Do rotating register files buy you anything net after costs,
> > or is it just a fiasco in the age of modern OoO pipelines?
> 
> When I first saw your post, I thought that you were talking about the 
> rotating register file that Cydrome and Itanium had, that could be made 
> to rotate on every loop iteration, and not SPARC-style overlapping 
> register windows (which Itanium also had).

Oops, wrong terminology; I meant register windows.

That Itanium had both types of register rotation made me wince.

> I actually think that Cydrome-style rotating register files may be a 
> good idea, albeit neglected.  They allow software pipelining to be done 
> without requiring reg-reg moves and give many of the benefits of loop 
> unrolling without increasing code size.

On actual rotating registers
http://www.cs.nmsu.edu/~rvinyard/itanium/register_rotation.htm

I do not see that rotating registers have a point when you are OoO
with 100 instructions in process. The hardware unrolls loops for you.

I work in embedded with limited OoO and only 6 ops in flight, which
means I unroll loops. Rotating registers could help, but there is
so much other baggage and setup that I do not see an embedded chip 
going down that road. Compiler support is the killer.

You still have the prolog and epilogue plus some setup, all you
save is one or two register copies per loop, so ~5% fewer 
instructions which means 5% faster. I can sneeze on the code and
get a 5% speedup.

I don't bother hand optimizing unless I can get 20%.

I would rather add LEA to a RISC chip, or ARM shifted compute,
or IBM's cool insert/extract, these will give a similar speedup
on more codes.  << add your favorites here >>

With 32 registers that means half would rotate, sucky, you need
at least 64 registers. The Console guys might consider it, they
have 128 floats/vectors and are mostly in-order.

Now that Itanic is dead I do not have an axe to grind, I am willing
to give anything Itanium did a second look. Better yet Cydrome.

On my third look I noticed that you can sometimes fold the 
prolog and epilogue into the loop, it makes my head hurt and you
do not want to do this if you end up canceling a large number of ops,
which is exactly what will happen most of the time. Defeats the purpose.
You save ~six instructions at the cost of burning ~two dozen in the loop.

Makes my axe grind.

Feel free to make corrections and give your own take of the tradeoffs,
and anything I missed/overlooked.

> Overlapping register windows for function calls I am not so sure about.
> 
> One thing to think about:  GPU style multiple threads in the same 
> register file.
> 
> a) COULD work with rotating register windows for loops
> 
> b) would be much harder to make work with overlapping register windows 
> for function calls.



#5724

From: "Andy (Super) Glew" <andy@SPAM.comp-arch.net>
Date: 2012-02-05 13:49 -0800
Message-ID: <4F2EF96A.1040107@SPAM.comp-arch.net>
In reply to: #5700

On 2/4/2012 5:15 AM, Brett Davis wrote:
> In article<4F2CD63A.4050306@SPAM.comp-arch.net>,
>   "Andy (Super) Glew"<andy@SPAM.comp-arch.net>  wrote:
>
>> On 2/2/2012 12:16 AM, Brett Davis wrote:
>>>
>>> Do rotating register files buy you anything net after costs,
>>> or is it just a fiasco in the age of modern OoO pipelines?
>>
>> When I first saw your post, I thought that you were talking about the
>> rotating register file that Cydrome and Itanium had, that could be made
>> to rotate on every loop iteration, and not SPARC-style overlapping
>> register windows (which Itanium also had).
>
> Oops, wrong terminology; I meant register windows.
>
> That Itanium had both types of register rotation made me wince.

Me, too.  And yet I can see nice aspects of both rotating register files 
for loops and register windows for call/returns, etc.

Oftentimes, when there is more than one good way to extend an 
architecture, as here, but the options conflict, it is better to do 
neither.

Although let me note one thing: rotating register files don't change the 
number of registers that need to be renamed.  They just mean that you 
either need to rotate the renamings in the register alias table, or that 
you do some arithmetic on the logical register number before it is renamed.

Register windows, on the other hand, increase the number of registers 
that need to be renamed. This was a killer for Itanium.  I tried playing 
games such as renaming only the currently accessible registers (on 
Itanium even that was a lot), or the top few procedure contexts, 
stalling until retirement and renaming had settled down if you called 
and returned too quickly.  But this forces you into a copy at retirement 
model.


>> I actually think that Cydrome-style rotating register files may be a
>> good idea, albeit neglected.  They allow software pipelining to be done
>> without requiring reg-reg moves and give many of the benefits of loop
>> unrolling without increasing code size.
>
> On actual rotating registers
> http://www.cs.nmsu.edu/~rvinyard/itanium/register_rotation.htm
>
> I do not see that rotating registers have a point when you are OoO
> with 100 instructions in process. The hardware unrolls loops for you.

And yet people insist on unrolling loops for OOO processors.  And it 
often pays off.

E.g. I have been looking at a loop recently that is 9 instructions long, 
of which only 5 are real work. 4 of 9 are overhead - loop induction 
variables and branch.  And it fills the machine, maximal IPC.  So 
any unrolling that can reduce the overhead per iteration, eliminating 
any of those overhead instructions, very likely improves performance. 
At the cost of code size.
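
As a rough sketch of that arithmetic: assume the 4 overhead instructions are shared by every unrolled copy of the body, and that the machine stays throughput-bound at the same IPC. Both are assumptions for illustration, not measurements, and the function names are made up here.

```python
def instrs_per_iteration(work, overhead, unroll):
    """Average instructions per original loop iteration after unrolling.

    Assumes the loop overhead (induction updates, branch) is paid once
    per unrolled body rather than once per original iteration.
    """
    return work + overhead / unroll


def speedup_at_fixed_ipc(work, overhead, unroll):
    """Speedup over the rolled loop if IPC is unchanged (an assumption)."""
    rolled = work + overhead
    return rolled / instrs_per_iteration(work, overhead, unroll)


# The 9-instruction loop from the post: 5 real work, 4 overhead.
for k in (1, 2, 4, 8):
    print(k, instrs_per_iteration(5, 4, k), round(speedup_at_fixed_ipc(5, 4, k), 3))
```

Under these assumptions, unrolling by 4 takes the loop from 9 to 6 instructions per iteration. The gain can never exceed 9/5, since the 5 work instructions remain.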

Unfortunately, rotating registers does not help this aspect of loop 
unrolling - reducing the induction variable overhead.

Rotating registers DOES reduce the code scheduling overhead of loop 
unrolling.

Hmm...

>
> I work in embedded with limited OoO and only 6 ops in flight, which
> means I unroll loops. Rotating registers could help, but there is
> so much other baggage and setup that I do not see an embedded chip
> going down that road. Compiler support is the killer.
>
> You still have the prolog and epilogue plus some setup, all you
> save is one or two register copies per loop, so ~5% fewer
> instructions which means 5% faster. I can sneeze on the code and
> get a 5% speedup.

If by prologue and epilogue you mean the code that ramps up to the full, 
maximally scheduled, loop body, and then ramps down, then, no, you don't 
need that.  One of the features of rotating register files is that the 
maximally scheduled loop body has loop predicates, that allow the 
prologue and epilogue to be folded in.

(However, I suspect that the ramp up/down code obtained in this way is 
sub-optimal, which means that you trade off lower code size against 
lower performance on small loop counts.)
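
To make the folding concrete, here is a minimal simulation of a kernel-only loop, assuming a three-stage pipeline (load, multiply, store), a four-entry rotating region, and one rotating predicate per stage. It is a simplified model and not Itanium's exact LC/EC and br.ctop semantics; every name in it is hypothetical.

```python
def kernel_only_loop(xs, scale=2):
    """Compute scale * x for each x with a kernel-only software pipeline.

    Returns (results, executed_ops, cancelled_ops). Stage s of source
    iteration i runs in kernel trip i + s, guarded by predicate p[s].
    """
    R = 4                                  # rotating region size (assumed)
    regs = [0] * R
    rrb = 0                                # rotating register base
    preds = [len(xs) > 0, False, False]    # stage predicates p0..p2
    loaded = 0
    out = []
    executed = cancelled = 0

    def phys(logical):
        # Logical-to-physical mapping inside the rotating region.
        return (logical + rrb) % R

    while any(preds):
        if preds[0]:                       # stage 0: load into logical r0
            regs[phys(0)] = xs[loaded]
            loaded += 1
        if preds[1]:                       # stage 1: last trip's r0 is now r1
            regs[phys(1)] *= scale
        if preds[2]:                       # stage 2: store from logical r2
            out.append(regs[phys(2)])
        executed += sum(preds)
        cancelled += len(preds) - sum(preds)
        # Loop branch: rotate registers and predicates by one position.
        rrb = (rrb - 1) % R
        preds = [loaded < len(xs), preds[0], preds[1]]
    return out, executed, cancelled


print(kernel_only_loop([1, 2, 3, 4]))
print(kernel_only_loop([5]))
```

In this model the number of cancelled (predicated-off) slots is fixed at 6 however long the loop runs, so a one-iteration loop executes 3 useful ops and cancels 6. That is the small-loop-count cost described above.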


> With 32 registers that means half would rotate, sucky, you need
> at least 64 registers. The Console guys might consider it, they
> have 128 floats/vectors and are mostly in-order.

??  Where do you get "half would rotate"?

That's a number TBD.

The circuit is basically

post_rotation_regnum :=
      if( (pre_rotation_regnum & rotmask) == rotbase )
      then ((((pre_rotation_regnum & ~rotmask)
                + rotcount) & ~rotmask)
               | rotbase)
      else pre_rotation_regnum

which I have expressed suboptimally for clarity.

With the above you can make the rotating region any power-of-two size 
and any naturally aligned section of the register file.
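
A runnable sketch of the same mapping, assuming a 32-entry register file (NUM_REGS is an assumption for illustration; the other names follow the pseudocode above). The low-bit mask is limited to the register-number width so that a large rotcount cannot spill into higher bits.

```python
NUM_REGS = 32  # assumed register file size, must be a power of two


def rotate_regnum(pre, rotmask, rotbase, rotcount):
    """Map a logical register number to its post-rotation number.

    rotmask selects the high bits that identify the rotating region,
    rotbase is the value of those bits for that region, and rotcount
    is the current rotation offset.
    """
    low = ~rotmask & (NUM_REGS - 1)        # bits that index within the region
    if (pre & rotmask) == rotbase:
        return (((pre & low) + rotcount) & low) | rotbase
    return pre                             # outside the region: unchanged


# Upper half rotates (r16..r31): rotmask = rotbase = 0b10000.
print(rotate_regnum(31, 0b10000, 0b10000, 1))   # wraps to r16
# Only r8..r15 rotate: rotmask = 0b11000, rotbase = 0b01000.
print(rotate_regnum(15, 0b11000, 0b01000, 3))   # r15 -> r10
print(rotate_regnum(16, 0b11000, 0b01000, 3))   # outside region, stays r16
```

The second configuration shows that the rotating region need not be half of the file: an 8-register region inside a 32-register file uses the same logic with a different mask and base.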


>
> Now that Itanic is dead I do not have an axe to grind, I am willing
> to give anything Itanium did a second look. Better yet Cydrome.
>
> On my third look I noticed that you can sometimes fold the
> prolog and epilogue into the loop, it makes my head hurt and you
> do not want to do this if you end up canceling a large number of ops,
> which is exactly what will happen most of the time. Defeats the purpose.
> You save ~six instructions at the cost of burning ~two dozen in the loop.

No, that's exactly my concern.  I think they oversold the benefits of 
rotating loop predicates.  I would reexamine it with and without.



#5728

From: Brett Davis <ggtgp@yahoo.com>
Date: 2012-02-06 05:36 -0600
Message-ID: <ggtgp-7BA6C0.05364706022012@netnews.mchsi.com>
In reply to: #5724

In article <4F2EF96A.1040107@SPAM.comp-arch.net>,
 "Andy (Super) Glew" <andy@SPAM.comp-arch.net> wrote:

> On 2/4/2012 5:15 AM, Brett Davis wrote:
> > In article<4F2CD63A.4050306@SPAM.comp-arch.net>,
> >   "Andy (Super) Glew"<andy@SPAM.comp-arch.net>  wrote:
> >
> >> On 2/2/2012 12:16 AM, Brett Davis wrote:
> >
> > That Itanium had both types of register rotation made me wince.
> 
> Me, too.  And yet I can see nice aspects of both rotating register files 
> for loops and register windows for call/returns, etc.
>
> Oftentimes, when there is more than one good way to extend an 
> architecture, like here, but where they conflict - often, in such 
> situations, it is better to do neither.

I agree, good design is not about saying yes to every kitchen sink choice,
it's about saying no, so you can be good at what you do well.
As soon as I saw the Itanium instruction set I knew they were DOOMED.

> Although let me note one thing: rotating register files don't change the 
> number of registers that need to be renamed.  They just mean that you 
> either need to rotate the renamings in the register alias table, or that 
> you do some arithmetic on the logical register number before it is renamed.

Is this cheap/free or will it cause issues above 4GHz?
(As much as I believe high clocks are not possible, times change.)

> Register windows, on the other hand, increase the number of registers 
> that need to be renamed. This was a killer for Itanium.  I tried playing 
> games such as renaming only the currently accessible registers (on 
> Itanium even that was a lot), or the top few procedure contexts, 
> stalling until retirement and renaming had settled down if you called 
> and returned too quickly.  But this forces you into a copy at retirement 
> model.

So register windows are a loser, but not for the rotation reason I thought.
Thanks, the world is full of misidentified reasons for success and failure,
and you cleared one up.

I had heard that one of the big reasons for the failure of the first 
Itanium design had to do with one of the forms of register rotation,
rumor was late in the design Intel had to add two stages to the pipeline
to deal with rotation, and this killed all the performance advantage.

Any comment?

> >> I actually think that Cydrome-style rotating register files may be a
> >> good idea, albeit neglected.  They allow software pipelining to be done
> >> without requiring reg-reg moves and give many of the benefits of loop
> >> unrolling without increasing code size.
> >
> > On actual rotating registers
> > http://www.cs.nmsu.edu/~rvinyard/itanium/register_rotation.htm
> >
> > I do not see that rotating registers have a point when you are OoO
> > with 100 instructions in process. The hardware unrolls loops for you.
> 
> And yet people insist on unrolling loops for OOO processors.  And it 
> often pays off.
> 
> E.g. I have been looking at a loop recently that is 9 instructions long, 
> of which only 5 are real work. 4 of 9 are overhead - loop induction 
> variables and branch.  And it fills the machine, maximal IPC.  So 
> any unrolling that can reduce the overhead per iteration, eliminating 
> any of those overhead instructions, very likely improves performance. 
> At the cost of code size.

I do not optimize for x86 or the high end, and I overlooked the other reasons
for unrolling. Loop induction overhead will force unrolling; you would
think that compilers would be able to detect this and auto-unroll.

The vast majority of the optimizations I do have to do with C aliasing rules
that compilers have to obey, which cripple performance on low-end, mostly
in-order processors.

> Unfortunately, rotating registers does not help this aspect of loop 
> unrolling - reducing the induction variable overhead.
> 
> Rotating registers DOES reduce the code scheduling overhead of loop 
> unrolling.
> 
> Hmm...
> 
> >
> > I work in embedded with limited OoO and only 6 ops in flight, which
> > means I unroll loops. Rotating registers could help, but there is
> > so much other baggage and setup that I do not see an embedded chip
> > going down that road. Compiler support is the killer.
> >
> > You still have the prolog and epilogue plus some setup, all you
> > save is one or two register copies per loop, so ~5% fewer
> > instructions which means 5% faster. I can sneeze on the code and
> > get a 5% speedup.

Whining about a 5% speedup is childish, I was a little unfair to register
rotation.

> If by prologue and epilogue you mean the code that ramps up to the full, 
> maximally scheduled, loop body, and then ramps down, then, no, you don't 
> need that.  One of the features of rotating register files is that the 
> maximally scheduled loop body has loop predicates, that allow the 
> prologue and epilogue to be folded in.
> 
> (However, I suspect that the ramp up/down code obtained in this way is 
> sub-optimal, which means that you trade off lower code size against 
> lower performance on small loop counts.)

That is a bad trade, and one I have never had to make.
On the other hand I am the one that talks about infinitely fast CPUs
that are choked by slow RAM. 

The DRAM fetch in the prologue is going to stall the CPU any way, 
the CPU can merrily run a bunch of canceled math ops at little cost 
while waiting for the load. 

So I can only complain about folding the epilogue being wasteful,
folding the prologue looks wasteful but is free, a bonus even.

On the second hand the false dependencies in the to-be-canceled ops may
also cause stalls; the low end only has a 6-instruction OoO window.

The compiler has little hope in heck of deciding to fold prologues
and epilogues, by default I would not fold either.

Compilers do not know when to unroll, but you can use prologues
and epilogues without unrolling. The purpose would be to start the first
load for the loop early; this will give a small performance bump for
a small code increase. Again, the compiler is not smart enough to decide.

I am warming to register rotation, lots of small potential, all the 
downside is in the hardware design...

> > With 32 registers that means half would rotate, sucky, you need
> > at least 64 registers. The Console guys might consider it, they
> > have 128 floats/vectors and are mostly in-order.
> 
> ??  Where do you get "half would rotate"?
> 
> That's a number TBD.
> 
> The circuit is basically
> 
> post_rotation_regnum :=
>       if( (pre_rotation_regnum & rotmask) == rotbase )
>       then ((((pre_rotation_regnum & ~rotmask)
>                 + rotcount) & ~rotmask)
>                | rotbase)
>       else pre_rotation_regnum
> 
> which I have expressed suboptimally for clarity.
> 
> With the above you can make the rotating region any power of two size, 
> and any naturally aligned section of the register file.

That info is important, and helps sell register rotation.

> > Now that Itanic is dead I do not have an axe to grind, I am willing
> > to give anything Itanium did a second look. Better yet Cydrome.
> >
> > On my third look I noticed that you can sometimes fold the
> > prolog and epilogue into the loop, it makes my head hurt and you
> > do not want to do this if you end up canceling a large number of ops,
> > which is exactly what will happen most of the time. Defeats the purpose.
> > You save ~six instructions at the cost of burning ~two dozen in the loop.
> 
> No, that's exactly my concern.  I think they oversold the benefits of 
> rotating loop predicates.  I would reexamine it with and without.

Itanium was ultimately going for 12 issue of instructions, when you are
issuing so many instructions you can afford to throw away lots of 
instructions with predication. Folding the prolog and epilogue into the
loop makes huge sense, saves code and costs nothing, in theory.

Real CPUs cannot be so wasteful.



#5754

From: "Andy (Super) Glew" <andy@SPAM.comp-arch.net>
Date: 2012-02-06 18:46 -0800
Message-ID: <4F30907E.4020806@SPAM.comp-arch.net>
In reply to: #5728

On 2/6/2012 3:36 AM, Brett Davis wrote:
> In article<4F2EF96A.1040107@SPAM.comp-arch.net>,
>   "Andy (Super) Glew"<andy@SPAM.comp-arch.net>  wrote:

>> Although let me note one thing: rotating register files don't change the
>> number of registers that need to be renamed.  They just mean that you
>> either need to rotate the renamings in the register alias table, or that
>> you do some arithmetic on the logical register number before it is renamed.
>
> Is this cheap/free or will it cause issues above 4GHz?
> (As much as I believe high clocks are not possible, times change.)

It costs pipestages. Not frequency.

But as you note below, pipestages have a performance cost.  The rule of 
thumb in the P6 generation was 5% per pipestage. By Willamette, it was 
1% per pipestage - deeper pipelines, and better branch prediction.
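
As a worked example of those rules of thumb, assume the per-stage loss compounds multiplicatively (a modelling assumption; the figures are only rules of thumb) and take two added pipestages, the number mentioned in the rumor quoted in this thread. The function name is hypothetical.

```python
def relative_performance(stages_added, loss_per_stage):
    """Performance relative to baseline after adding pipestages.

    Compounds the per-stage loss multiplicatively, which is an
    assumption made here for illustration.
    """
    return (1.0 - loss_per_stage) ** stages_added


p6_era = relative_performance(2, 0.05)          # about 0.90 of baseline
willamette_era = relative_performance(2, 0.01)  # about 0.98 of baseline
print(round(p6_era, 4), round(willamette_era, 4))
```

So the same two stages would cost roughly 10% under the P6-era rule and roughly 2% under the Willamette-era rule.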

> I had heard that one of the big reasons for the failure of the first
> Itanium design had to do with one of the forms of register rotation,
> rumor was late in the design Intel had to add two stages to the pipeline
> to deal with rotation, and this killed all the performance advantage.

I don't know.



#6157

From: Brett Davis <ggtgp@yahoo.com>
Date: 2012-02-28 20:02 -0600
Message-ID: <ggtgp-F4CD07.20021128022012@netnews.mchsi.com>
In reply to: #5754

In article <4F30907E.4020806@SPAM.comp-arch.net>,
 "Andy (Super) Glew" <andy@SPAM.comp-arch.net> wrote:

> On 2/6/2012 3:36 AM, Brett Davis wrote:
> > In article<4F2EF96A.1040107@SPAM.comp-arch.net>,
> >   "Andy (Super) Glew"<andy@SPAM.comp-arch.net>  wrote:
> 
> >> Although let me note one thing: rotating register files don't change the
> >> number of registers that need to be renamed.  They just mean that you
> >> either need to rotate the renamings in the register alias table, or that
> >> you do some arithmetic on the logical register number before it is renamed.
> >
> > Is this cheap/free or will it cause issues above 4GHz?
> > (As much as I believe high clocks are not possible, times change.)
> 
> It costs pipestages. Not frequency.
> 
> But as you note below, pipestages have a performance cost.  The rule of 
> thumb in the P6 generation was 5% per pipestage. By Willamette, it was 
> 1% per pipestage - deeper pipelines, and better branch prediction.

I have been thinking hard on rotating register files, and I think a 
better compromise is adding a rename register instruction.
This is cheaper than a register copy, and perhaps could be handled
by the front end, not needing to occupy any of the instruction pipes.
(Merging a tag onto a nearby instruction if need be for other reasons.)
You could recommend that the rename precede an instruction that uses
the rename.

You still have the decode cost using up one of your four decoders,
but the stated reason for using register rotation is for short loops
with a small number of instructions. This does not sound like a 
situation where you are decode limited.

If you are decode limited you are generally in a big loop with a large
count, where register rotation does not help as much.

I do not know of any CPUs with a rename register instruction,
so I suspect I am missing something.
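
A toy model of the idea, assuming a conventional register alias table with a free list. The class and method names are hypothetical, and the reference counting a real design would need to know when an aliased physical register can be freed is left out.

```python
class RenameTable:
    """Toy register alias table: logical register -> physical register."""

    def __init__(self, num_logical, num_physical):
        self.map = list(range(num_logical))
        self.free = list(range(num_logical, num_physical))
        self.exec_ops = 0          # ops that occupy an execution pipe

    def write(self, dst):
        """Ordinary instruction producing dst: allocate a fresh register."""
        self.map[dst] = self.free.pop(0)
        self.exec_ops += 1
        return self.map[dst]

    def copy(self, dst, src):
        """Conventional reg-reg move: fresh register plus an execution op."""
        return self.write(dst)

    def rename(self, dst, src):
        """Hypothetical rename instruction: alias dst to src's physical
        register in the front end, with no execution op."""
        self.map[dst] = self.map[src]
        return self.map[dst]


rat = RenameTable(num_logical=8, num_physical=16)
rat.write(1)        # r1 produced by some instruction
rat.rename(2, 1)    # r2 now names the same physical register as r1
rat.copy(3, 1)      # a real move costs a register and an execution op
print(rat.map[1], rat.map[2], rat.map[3], rat.exec_ops)
```

The rename still passes through fetch and decode in this model; only the execution op and the extra physical register are saved.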



#6296

From: Brett Davis <ggtgp@yahoo.com>
Date: 2012-03-07 22:35 -0600
Message-ID: <ggtgp-5C3492.22351707032012@netnews.mchsi.com>
In reply to: #6157

In article <ggtgp-F4CD07.20021128022012@netnews.mchsi.com>,
 Brett Davis <ggtgp@yahoo.com> wrote:

> In article <4F30907E.4020806@SPAM.comp-arch.net>,
>  "Andy (Super) Glew" <andy@SPAM.comp-arch.net> wrote:
> 
> > On 2/6/2012 3:36 AM, Brett Davis wrote:
> > > In article<4F2EF96A.1040107@SPAM.comp-arch.net>,
> > >   "Andy (Super) Glew"<andy@SPAM.comp-arch.net>  wrote:
> > 
> > >> Although let me note one thing: rotating register files don't change the
> > >> number of registers that need to be renamed.  They just mean that you
> > >> either need to rotate the renamings in the register alias table, or that
> > >> you do some arithmetic on the logical register number before it is renamed.
> > >
> > > Is this cheap/free or will it cause issues above 4GHz?
> > > (As much as I believe high clocks are not possible, times change.)
> > 
> > It costs pipestages. Not frequency.
> > 
> > But as you note below, pipestages have a performance cost.  The rule of 
> > thumb in the P6 generation was 5% per pipestage. By Willamette, it was 
> > 1% per pipestage - deeper pipelines, and better branch prediction.
> 
> I have been thinking hard on rotating register files, and I think a 
> better compromise is adding a rename register instruction.
> This is cheaper than a register copy, and perhaps could be handled
> by the front end, not needing to occupy any of the instruction pipes.
> (Merging a tag onto a nearby instruction if need be for other reasons.)
> You could recommend that the rename precede an instruction that uses
> the rename.

On the new Ivy Bridge chips just announced the MOV instructions no longer
occupy an execution port. I think AMD has been doing this for years.

This would seem to negate much of the need/use for register rotation.

> You still have the decode cost using up one of your four decoders,
> but the stated reason for using register rotation is for short loops
> with a small number of instructions. This does not sound like a 
> situation where you are decode limited.
> 
> If you are decode limited you are generally in a big loop with a large
> count, where register rotation does not help as much.
> 
> I do not know of any CPUs with a rename register instruction,
> so I suspect I am missing something.



#6301

From: "Andy (Super) Glew" <andy@SPAM.comp-arch.net>
Date: 2012-03-08 07:25 -0800
Message-ID: <4F58CF5B.8070201@SPAM.comp-arch.net>
In reply to: #6296

On 3/7/2012 8:35 PM, Brett Davis wrote:
> In article<ggtgp-F4CD07.20021128022012@netnews.mchsi.com>,
>   Brett Davis<ggtgp@yahoo.com>  wrote:
>
>> In article<4F30907E.4020806@SPAM.comp-arch.net>,
>>   "Andy (Super) Glew"<andy@SPAM.comp-arch.net>  wrote:
>>
>>> On 2/6/2012 3:36 AM, Brett Davis wrote:
>>>> In article<4F2EF96A.1040107@SPAM.comp-arch.net>,
>>>>    "Andy (Super) Glew"<andy@SPAM.comp-arch.net>   wrote:
>>>
> On the new Ivy Bridge chips just announced the MOV instructions no longer
> occupy an execution port. I think AMD has been doing this for years.
>
> This would seem to negate much of the need/use for register rotation.

This question was already partially answered: MOV elimination like Intel 
does (I wasn't aware that AMD was doing it - they weren't when I was 
there) consumes instruction fetch, decode, and retirement bandwidth.

>> You still have the decode cost using up one of your four decoders,
>> but the stated reason for using register rotation is for short loops
>> with a small number of instructions. This does not sound like a
>> situation where you are decode limited.
>>
>> If you are decode limited you are generally in a big loop with a large
>> count, where register rotation does not help as much.

Umm, quite the contrary: I've usually found it to be the smallest loops 
that are the closest to being limited by decode or graduation bandwidth. 
Perhaps I have spent too much time tweaking AXPY.



#5731

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date: 2012-02-06 12:49 +0000
Message-ID: <2012Feb6.134944@mips.complang.tuwien.ac.at>
In reply to: #5724

"Andy (Super) Glew" <andy@SPAM.comp-arch.net> writes:
>Register windows, on the other hand, increase the number of registers 
>that need to be renamed. This was a killer for Itanium.  I tried playing 
>games such as renaming only the currently accessible registers (on 
>Itanium even that was a lot), or the top few procedure contexts, 
>stalling until retirement and renaming had settled down if you called 
>and returned too quickly.  But this forces you into a copy at retirement 
>model.

Sure, IA-64 allows the implementation to have more register names
through the register stack, but it does not force it, and I really
don't see that their absence is a "killer".  IIRC there are 96 (or was
it 128?)  register names visible for the register stack.  This should
be plenty for lots of code.  And for code where the call stacks are so
deep, with so many parameters and locals, that additional register
names would provide a significant benefit, even an implementation
without the additional names would still be at least as good as (and
probably better than) architectures without register stack/register
windows.  Or am I missing something?

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html



#5769

From: Quadibloc <jsavard@ecn.ab.ca>
Date: 2012-02-04 11:14 -0800
Message-ID: <29b5c840-a78d-47aa-a73a-a95a4897b276@ow3g2000pbc.googlegroups.com>
In reply to: #5700

On Feb 4, 6:15 am, Brett Davis <gg...@yahoo.com> wrote:

> Now that Itanic is dead

It's still being manufactured and shipped, and Intel is planning newer
versions. Unlike the 68k (if you don't count ColdFire), it still has a
little life in it.

John Savard



#5757

From: Michael S <already5chosen@yahoo.com>
Date: 2012-02-03 06:04 -0800
Message-ID: <d1aeaa6e-cf4f-4450-b8a1-58267ae936b0@hs8g2000vbb.googlegroups.com>
In reply to: #5657

On Feb 2, 10:16 am, Brett Davis <gg...@yahoo.com> wrote:
> Are rotating register files still a bad idea?
>
> Rotating register files used to mean slow clock speeds.
> SPARC has had competitive clocks for the past decade,
> largely because everyone hit a thermal brick wall...
>
> I assume that if that brick wall breaks, rotating register files
> are back to being doomed, but that looks unlikely.
>
> Does rotating register files buy you anything net after costs,
> or is it just a fiasco in the age of modern OoO pipelines?

Are you sure you understand the difference between rotating register
files (e.g. Itanium floating point RF) and stacked register files
(SPARC, Intel i960, just to list those made in highest volumes; Intel
Itanium general-purpose RF could be seen as compromise between stacked
and flat RF)?



#5761

From: MitchAlsup <MitchAlsup@aol.com>
Date: 2012-02-03 12:20 -0800
Message-ID: <6444849.2224.1328300401859.JavaMail.geo-discussion-forums@yqoe12>
In reply to: #5657

On Thursday, February 2, 2012 2:16:31 AM UTC-6, Brett Davis wrote:
> Are rotating register files still a bad idea?

Yes, and so are stacked register files.
 
> Rotating register files used to mean slow clock speeds.
> SPARC has had competitive clocks for the past decade,
> largely because everyone hit a thermal brick wall...

The larger the register file, the slower the file can be accessed.

A long time ago, we used to 'access' the register file twice a cycle, 
once for reading and once for writing. Wire delay and clock skew have
eliminated this as a proper design choice (at least for top GHz
designs).

Register renaming has contributed to the size of register files
(and/or Reorder buffers/ future files,...) making the problem of 
register file access more difficult. The bigger the rename pool,
the larger the total amount of silicon area devoted to this 
function. 

Neither SPARC nor Itanic has shown even moderate returns in
performance for the <fairly hideous> register files utilized.
 
> Does rotating register files buy you anything net after costs,
> or is it just a fiasco in the age of modern OoO pipelines?

Depends on who you believe, but in my opinion, no. Over the entire 
history of computers, simpler machines were faster, always in clock
rate, mostly in delivered application performance. Compare a CDC
7600 to an IBM 360/91. Simpler designs take smaller design teams 
less time and get done with fewer bugs.

Mitch


