NNTP-Posting-Date: Sat, 04 Feb 2012 11:12:54 -0600
Message-ID: <4F2D6713.7090902@SPAM.comp-arch.net>
Date: Sat, 04 Feb 2012 09:12:51 -0800
From: "Andy (Super) Glew" <andy@SPAM.comp-arch.net>
Reply-To: andy@SPAM.comp-arch.net
Organization: comp-arch.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20111222 Thunderbird/9.0.1
MIME-Version: 1.0
Newsgroups: comp.arch
Subject: Re: M68k add to memory is not a mistake any more
References: <ggtgp-8D1AEA.03180231012012@netnews.mchsi.com> <jgeu27$9rg$1@dont-email.me> <a7csv8-emn1.ln1@ntp6.tmsw.no> <jgiibd$hkq$1@dont-email.me> <ggtgp-0DB112.04020104022012@netnews.mchsi.com>
In-Reply-To: <ggtgp-0DB112.04020104022012@netnews.mchsi.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 74

On 2/4/2012 2:02 AM, Brett Davis wrote:
> In article<jgiibd$hkq$1@dont-email.me>,
>   Stephen Sprunk<stephen@sprunk.org>  wrote:
>
>> On 03-Feb-12 00:42, Terje Mathisen wrote:
>>> Stephen Sprunk wrote:
>>> Re. # of uops: It seems like modern x86 cores tend to go in the opposite
>>> direction and instead combine very common instruction pairs into a
>>> single uop, i.e. stuff like CMP/Jcc.
>>
>> I haven't figured out yet why that's a win--or even how it's possible.
>> The second instruction obviously has a direct dependency on the first,
>> so how can they execute as a single uop?
>>
>> Fusing them together for scheduling purposes makes sense, since they'd
>> likely end up traveling together anyway, but it seems like they'd need
>> to execute sequentially.
>
> Branches are taken care of by the front end predictor, the pipelines
> never see a branch, only sequential code.

The following is probably triggered by too literal a reading of Brett's 
reply; I suspect that he meant something slightly different.

??  On every machine that I have worked on, the pipelines see almost 
every branch, if only to verify that the prediction was correct.

If the predictions are correct, the pipelines should not be changing 
control flow.  But they are still verifying control flow.

I.e.

* the pipelines check that the branch was taken or not taken as 
predicted, for conditional branches

* the pipelines verify the branch target was as predicted, for indirect 
jumps, indirect calls, and returns.

For some machines the branch target must also be verified for 
conditional jumps, and/or the virtual to physical translation of the 
branch target must be verified (for machines where the predictor 
predicts physical targets, not virtual targets, and where the mapping 
may change out from under you and/or the predictor is not fully tagged).
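
To make the checks above concrete, here is a minimal sketch in Python 
of what the verification amounts to.  It is an illustration only, not a 
description of any particular machine: the names (Kind, Prediction, 
Resolved, verify) are hypothetical, and the virtual to physical check is 
left out.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Kind(Enum):
    CONDITIONAL = auto()  # direct conditional branch
    INDIRECT = auto()     # indirect jump, indirect call, or return


@dataclass
class Prediction:
    taken: bool   # predicted direction
    target: int   # predicted target address


@dataclass
class Resolved:
    taken: bool   # direction computed by the pipeline
    target: int   # target computed by the pipeline


def verify(kind: Kind, pred: Prediction, actual: Resolved,
           fallthrough: int) -> Optional[int]:
    """Return None if the prediction was correct, else the address
    the front end should be redirected to."""
    if kind is Kind.CONDITIONAL:
        # Check taken / not taken against the prediction.
        if pred.taken != actual.taken:
            return actual.target if actual.taken else fallthrough
        # On some machines the target of a taken conditional branch
        # must be verified as well.
        if actual.taken and pred.target != actual.target:
            return actual.target
        return None
    # Indirect jumps, indirect calls, returns: verify the target.
    if pred.target != actual.target:
        return actual.target
    return None
```

In this sketch a correct prediction returns None, i.e. the branch 
changes nothing, but it has still been looked at.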

Unconditional direct jumps and calls might be elided from the pipeline 
(except for the virtual to physical issue).  But I've not seen anyone do 
so.  Calls typically flow down the pipeline, carrying the necessary info 
for register stack adjustment or link registers.

You could handle changes of control flow at different points in the 
pipeline.  Indeed, machines with my fingerprints on them do as much 
verification as possible early in the pipeline, i.e. at branch decode.  
At the other extreme, you could defer the verification to the retirement 
or graduation pipestages, i.e. you could have branches skip the 
out-of-order part of the pipeline.

But this usually loses performance, because it adds latency to detection 
of incorrect branch predictions.

I.e. you could have correctly predicted branches never be seen by the 
"pipeline", if by pipeline you mean only the OOO part, and not the 
in-order parts before and after it.  But the incorrectly predicted 
branches benefit from OOO execution, so that the misprediction is seen 
as early as possible.
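
A rough back-of-the-envelope version of the latency argument, as a 
Python sketch.  All the numbers are made up for illustration (they are 
not measurements of any machine), and the model assumes the cost of a 
misprediction is simply the number of cycles until it is detected.

```python
def mispredict_cpi(branch_fraction, mispredict_rate, detect_cycles):
    """Average cycles lost per instruction to branch mispredictions,
    assuming each misprediction costs detect_cycles."""
    return branch_fraction * mispredict_rate * detect_cycles


# Hypothetical values, for illustration only.
BRANCH_FRACTION = 0.20    # fraction of instructions that are branches
MISPREDICT_RATE = 0.05    # fraction of branches that are mispredicted
DETECT_AT_EXECUTE = 15    # cycles to detect in the OOO execution core
DETECT_AT_RETIRE = 40     # cycles to detect if deferred to retirement

cost_execute = mispredict_cpi(BRANCH_FRACTION, MISPREDICT_RATE,
                              DETECT_AT_EXECUTE)
cost_retire = mispredict_cpi(BRANCH_FRACTION, MISPREDICT_RATE,
                             DETECT_AT_RETIRE)

print(f"detect at execute:    {cost_execute:.2f} cycles/instruction")
print(f"detect at retirement: {cost_retire:.2f} cycles/instruction")
```

Whatever the real numbers are, the cost scales with the detection 
latency, which is why deferring verification to retirement hurts.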



> The CMP has a bit set that fires an interrupt if the branch prediction
> was wrong.

Not an interrupt.

I've spent 20 years teaching people that branch mispredictions should be 
handled faster than interrupts.