Path: csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!border3.nntp.dca.giganews.com!Xl.tags.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!local2.nntp.dca.giganews.com!news.giganews.com.POSTED!not-for-mail NNTP-Posting-Date: Wed, 15 Feb 2012 03:07:16 -0600 Message-ID: <4F3B75C3.40708@SPAM.comp-arch.net> Date: Wed, 15 Feb 2012 01:07:15 -0800 From: "Andy (Super) Glew" Reply-To: andy@SPAM.comp-arch.net Organization: comp-arch.net User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20111222 Thunderbird/9.0.1 MIME-Version: 1.0 Newsgroups: comp.arch Subject: Re: Single Thread Performance References: <1083982.844.1328506553065.JavaMail.geo-discussion-forums@vbtr6> <18985540.1742.1328921030856.JavaMail.geo-discussion-forums@yqks7> <4F38875A.2050407@SPAM.comp-arch.net> <12562993.824.1329107776365.JavaMail.geo-discussion-forums@ynnj2> <4F392235.1020804@SPAM.comp-arch.net> <16314332.739.1329152301273.JavaMail.geo-discussion-forums@ynnk21> <21210493.750.1329240371457.JavaMail.geo-discussion-forums@ynt13> <6c4s09-g6d1.ln1@ntp6.tmsw.no> In-Reply-To: <6c4s09-g6d1.ln1@ntp6.tmsw.no> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Lines: 36 X-Usenet-Provider: http://www.giganews.com X-Trace: sv3-q0uTFPX3mBvPKhY+hYdGiiZ6+3KfD5NemkYxR67VDwiqmlSZQiwzhfbsFMERnChZueR/eHV0TXZEG53!ifHpnQrXIY4nnPEzMWokM2Dx8ByvAWEdx/HU/A39KT5rF+DO7S8mdpiMtZ22 X-Complaints-To: abuse@giganews.com X-DMCA-Notifications: http://www.giganews.com/info/dmca.html X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly X-Postfilter: 1.3.40 X-Original-Bytes: 3619 Xref: x330-a1.tempe.blueboxinc.net comp.arch:5948 On 2/14/2012 11:44 PM, Terje Mathisen wrote: > MitchAlsup wrote: >> In particular, if one knows how many CPUs are contending for entries >> in a run queue, one can access different portions fo the run queue >> and end up interference free. So from the above example: The first >> access is thrown away detecting interference in the cache >> (avoiding the latency of the ARU). When interference has been >> detected, the CPU changes mode and the nest ASF prolog is done in a >> "slow and methodological" way directing its lock requests to the ARU. >> The third trip thruogh the logic remains in the "S&M" mode, but this >> time the SW takes the conflict number and marches down the run queue >> to an element that will not be under contention. So on the 3rd pass >> all 17 processors grab all 17 entries from the queue conflict free. >> Presto O(3) time. > > Mitch, how does this compare with an XADD-based approach? > > I.e. all 17 cpus tries to do a fetch&add on the queue index, one of them > wins immediately, the other 16 will do a hw retry, another one succeeds > etc. > > I guess the total number of bus attempts/transactions would be > 17+16+15..., i.e. O(n^2) instead of O(3)? (0) Yes, that O(N^2) pattern is what happens when you blindly implement any test-and-set. Including XADD. (1) However, Alan Kagi showed that by adding delays, and/or queues in the lock controller, you can make it O(n) - 1 request per winner. Alain's work descended from the QOLB, Queue on Lock Bit, work. But the key is, those queues don't need to be architectural - you can just do it in the lock controller. (2) For extra credit, you can imagine a fetch-and-op combining style thing, making it O(1) in cost.