From: "Andy (Super) Glew"
Organization: comp-arch.net
Newsgroups: comp.arch
Subject: Re: M68k add to memory is not a mistake any more
Date: Sat, 11 Feb 2012 15:35:22 -0800
Message-ID: <4F36FB3A.3070304@SPAM.comp-arch.net>
References: <4F2CD4FA.4050004@SPAM.comp-arch.net> <4F341C7B.9070703@SPAM.comp-arch.net>

On 2/11/2012 10:29 AM, Tim McCaffrey wrote:
> In article, "terje.mathisenattmsw.no" says...
>>
>
>> The part of disliking anything totally fixed is borne out by the x86
>> usage of CL/CX/ECX/RCX for all variable shift counts: I have never seen
>> a compiler that understood that said shift(s) could be so
>> (latency-)critical that all the rest of the register allocations had to
>> be switched around, just so that those particular shift counts would end
>> up naturally in ECX.
>>
>
> Not to mention all the other instructions with dedicated registers (string
> instructions (ECX, EDI, ESI, EAX), I/O, loop & jcxz, Multiply, divide, etc).
> (BTW, why is jcxz expensive and CMP CX,0/JZ is not?)

http://download.intel.com/design/pentiumii/manuals/24281603.pdf gives
JCXZ a uop count of 2, versus 1 uop for the "normal" Jccs.

I don't think that JCXZ is intrinsically expensive. It could be decoded
into a single uop

    TargetIP := JumpIfReg0(CX,Offset)

if Intel's datapath supported that.

It's just that there is only so much room in the fast path decoders.
The "normal" Jccs pretty much have to be fast. They look something like

    TargetIP := Jcc(Flag,Offset)

JCXZ would require something verging on a new uop type. (I say "verging
on" since there is not that much difference between JumpIfReg0 and Jcc
on an out-of-order machine if the flags are attached to the high bits of
a physical register, e.g. bits >31 on a 32 bit machine, >63 on a 64 bit
machine, etc. But if the flags are a completely different type of
dataflow operand, then it is a new uop type.)

JCXZ isn't close to Jcc, so it would have required an extra pattern -
minterms - in the fast decoder to make it fast. But JCXZ is close to a
whole slew of other slow, deprecated instructions - the other LOOP
instructions, IN and OUT - so it probably naturally wants to be covered
by the minterms for slowness.

Back on earlier in-order machines, JCXZ was deprecated. E.g. on Pentium
it was NP, non-pairable.

JCXZ only comes with an 8 bit displacement, as do the LOOP instructions,
whereas the other Jccs can have 16 bit displacements. This is probably
the kiss of death: JCXZ and LOOP* might well have been made fast if they
had 16 bit displacements, displacements large enough to be useful. But
with only an 8 bit displacement, nobody uses them; and since nobody uses
them, nobody optimizes for them.

Compare-Jcc fusion, although equivalent in "complexity" to JCXZ, is much
more useful/much more widely used.
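
To make the "verging on a new uop type" remark concrete, here is a rough sketch of the two uop forms under the assumption that the flags ride in the bits above the data in a physical register. Everything in it is hypothetical: the names (write_result, jcc_zero, jump_if_reg0), the placement of the zero flag at bit 32, and the simplification that every register write also records a zero flag. It is a toy model of the dataflow, not a description of any real Intel datapath.

```python
# Toy model, not real hardware: a 32-bit physical register whose flags
# are stored in bits above bit 31. The layout below is an assumption.
WIDTH = 32
MASK = (1 << WIDTH) - 1
ZF_BIT = WIDTH  # assumed position of the zero flag, just above the data


def write_result(value):
    """Pack a result and its zero flag into one physical register value."""
    data = value & MASK
    zf = 1 if data == 0 else 0
    return data | (zf << ZF_BIT)


def jcc_zero(phys_reg, next_ip, offset):
    """Jcc-style uop: TargetIP := Jcc(Flag,Offset), testing the flag bit."""
    taken = (phys_reg >> ZF_BIT) & 1
    return (next_ip + offset) & MASK if taken else next_ip


def jump_if_reg0(phys_reg, next_ip, offset):
    """JumpIfReg0-style uop: same operand, but tests the data bits."""
    taken = (phys_reg & MASK) == 0
    return (next_ip + offset) & MASK if taken else next_ip
```

In this model both uops read the same single source operand and differ only in which bits they examine, which is the sense in which JumpIfReg0 is nearly a Jcc. If the flags were instead a separate kind of dataflow operand, jcc_zero would take a different input type altogether and the two would no longer share a shape.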