Message-ID: <4F2D6713.7090902@SPAM.comp-arch.net>
Date: Sat, 04 Feb 2012 09:12:51 -0800
From: "Andy (Super) Glew"
Reply-To: andy@SPAM.comp-arch.net
Organization: comp-arch.net
Newsgroups: comp.arch
Subject: Re: M68k add to memory is not a mistake any more

On 2/4/2012 2:02 AM, Brett Davis wrote:
> In article,
> Stephen Sprunk wrote:
>
>> On 03-Feb-12 00:42, Terje Mathisen wrote:
>>> Stephen Sprunk wrote:
>>> Re. # of uops: It seems like modern x86 cores tend to go in the opposite
>>> direction and instead combine very common instruction pairs into a
>>> single uop, i.e. stuff like CMP/Jcc.
>>
>> I haven't figured out yet why that's a win--or even how it's possible.
>> The second instruction obviously has a direct dependency on the first,
>> so how can they execute as a single uop?
>>
>> Fusing them together for scheduling purposes makes sense, since they'd
>> likely end up traveling together anyway, but it seems like they'd need
>> to execute sequentially.
>
> Branches are taken care of by the front end predictor, the pipelines
> never see a branch, only sequential code.

The following was probably triggered by a literal reading of the reply.
But I suspect that Brett meant something slightly different.

??

On every machine that I have worked on, the pipelines see almost every
branch, if only to verify that the prediction was correct.

If the predictors are correct, they should not be changing control flow.
But they are verifying control flow.

I.e.

* the pipelines check that the branch was taken or not taken as
  predicted, for conditional branches

* the pipelines verify the branch target was as predicted, for indirect
  jumps, indirect calls, and returns.

For some machines the branch target must also be verified for
conditional jumps, and/or the virtual to physical translation of the
branch target must be verified (for machines where the predictor
predicts physical targets, not virtual targets, and where the mapping
may change out from under you and/or the predictor is not fully tagged).

Unconditional direct jumps and calls might be elided from the pipeline
(except for the virtual to physical issue). But I've not seen anyone do
so. Calls typically flow down the pipeline, carrying the necessary info
for register stack adjustment or link registers.

You could handle changes of control flow at different points in the
pipeline. Indeed, machines with my fingerprints do as much verification
as possible early in the pipeline, i.e. at branch decode.

And you could defer the verification to the retirement or graduation
pipestages. I.e. you could have branches skip the out-of-order part of
the pipeline. But this usually loses performance, because it adds
latency to detection of incorrect branch predictions.

I.e.
you could have branches that are correctly predicted never be seen by
the pipeline, if you restrict "pipeline" to the OOO part, and not the
in-order parts of the pipeline before and after. But the incorrectly
predicted branches, those benefit from OOO execution so as to be seen as
early as possible.

> The CMP has a bit set that fires an interrupt if the branch prediction
> was wrong.

Not an interrupt.

I've spent 20 years teaching people that branch mispredictions should be
handled faster than interrupts.
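
To make the verification rules above concrete (direction for conditional branches, target for indirect jumps, indirect calls, and returns), here is a minimal sketch in Python. It is an illustration only, not a model of any particular machine: the names BranchKind, BranchInfo, and verify_prediction are made up for this example, and the check_conditional_target flag is an assumed knob standing in for the "some machines" case where conditional jump targets must also be checked. The virtual-to-physical check is left out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class BranchKind(Enum):
    CONDITIONAL = auto()  # conditional branches
    INDIRECT = auto()     # indirect jumps, indirect calls, returns
    DIRECT = auto()       # unconditional direct jumps and calls


@dataclass
class BranchInfo:
    """Either a prediction or a resolved outcome (hypothetical record)."""
    taken: bool
    target: int


def verify_prediction(kind: BranchKind,
                      predicted: BranchInfo,
                      resolved: BranchInfo,
                      check_conditional_target: bool = False) -> bool:
    """Return True if the pipeline's check agrees with the predictor."""
    if kind is BranchKind.CONDITIONAL:
        # Direction must match what the predictor said.
        if predicted.taken != resolved.taken:
            return False
        # Assumed option: some machines also verify the target
        # of a taken conditional branch.
        if check_conditional_target and resolved.taken:
            return predicted.target == resolved.target
        return True
    if kind is BranchKind.INDIRECT:
        # The target is only known once the branch resolves.
        return predicted.target == resolved.target
    # DIRECT: assumed here to need no check at this stage, e.g.
    # because the target was already verified at branch decode.
    return True
```

A False result is where a real machine would start misprediction recovery; the point of doing this check in the OOO part of the pipeline is to get that answer as early as possible.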