| Path | csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!xlned.com!feeder7.xlned.com!news2.euro.net!postnews2.euro.net!news.wanadoo.nl!not-for-mail |
|---|---|
| From | mhx@iae.nl (Marcel Hendrix) |
| Subject | Re: SSE2 |
| Newsgroups | comp.lang.forth |
| Message-ID | <81078807968435@frunobulax.edu> (permalink) |
| Date | Sat, 28 Jul 2012 15:28:54 +0200 |
| References | <2012Jul27.165605@mips.complang.tuwien.ac.at> |
| X-Newsreader | iForth 2.0 console (October 21, 2006) |
| Lines | 40 |
| Organization | Wanadoo |
| NNTP-Posting-Date | 28 Jul 2012 13:28:27 GMT |
| NNTP-Posting-Host | s529d937f.adsl.wanadoo.nl |
| X-Trace | 1343482107 dr6.euro.net 221 82.157.147.127:64505 |
| X-Complaints-To | abuse@wanadoo.nl |
| Xref | csiph.com comp.lang.forth:14481 |
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes Re: SSE2

> mhx@iae.nl (Marcel Hendrix) writes:
>>What I did is derived from the MiniBLAS sources. As SSE2 operates on
>>4 floats (2 doubles) at a time, speedups of around 4 suggest themselves.
>>However, I can find no trace of this. An obvious reason could be that
>>memory throughput cannot keep up with the FP units. Strange, as one
>>would expect this hardware problem to be fixed by now.

> If the problem in your benchmark is that it is memory bandwidth
> limited, no, that problem is not "fixed".

See my answer to Paul. The problem was that the S/DDOT code needs large
vectors to become effective. For small sizes it gains almost nothing.

[..]

> Hmm, if the problem is memory bandwidth, I would expect all variants
> to have the same performance (unless you use additional indirection
> vectors or varying memory layouts).

That's a good argument.

[..]

> My _guess_ is that your compiler produces some stack memory stores for
> some of the variants, with some stack memory fetches soon after, and
> these cost quite a bit of performance. At least they used to.

I would be interested to read more about this. What is the problem here,
exactly? Is a stack frame better than push/pop?

> You
> don't have a data-near-code problem, do you?

No, I always check for these, but I have not come across anything like
it for quite a long time.

-marcel
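The speedup arithmetic above depends on how many elements one SSE2 instruction handles: a 128-bit XMM register holds 4 floats or 2 doubles. Below is a minimal sketch of a packed double-precision dot product, assuming C with the SSE2 intrinsics from `<emmintrin.h>`. It is not the iForth or MiniBLAS code discussed in the post, and the function name `ddot_sse2` is hypothetical. Each packed multiply and add processes 2 doubles. A scalar loop handles the leftover element and also serves as the whole computation when `__SSE2__` is not defined.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical SSE2 dot product of x and y (n doubles each).
   Each packed instruction works on 2 doubles, so the vector loop
   keeps two partial sums in the two lanes of one XMM register. */
double ddot_sse2(const double *x, const double *y, size_t n)
{
    size_t i = 0;
    double sum = 0.0;
#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();          /* two lane accumulators */
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);    /* unaligned load x[i], x[i+1] */
        __m128d vy = _mm_loadu_pd(y + i);    /* unaligned load y[i], y[i+1] */
        acc = _mm_add_pd(acc, _mm_mul_pd(vx, vy));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);               /* horizontal add of both lanes */
    sum = lanes[0] + lanes[1];
#endif
    /* Scalar tail for an odd element count; the whole loop if no SSE2. */
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Called as `ddot_sse2(x, y, 5)` with x = {1, 2, 3, 4, 5} and y = {1, 1, 1, 1, 2}, it returns 20. For short vectors the setup, the horizontal add and the scalar tail make up a large share of the work. That is consistent with the observation that DDOT-style code only pays off for large vectors. For very large vectors, each multiply-add needs two fresh loads, so the loop tends to be limited by memory bandwidth rather than by the FP units. A tuned kernel would typically also unroll with several accumulators to hide the latency of the dependent `_mm_add_pd` chain.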
SSE2 mhx@iae.nl (Marcel Hendrix) - 2012-07-27 16:20 +0200
Re: SSE2 Paul Rubin <no.email@nospam.invalid> - 2012-07-27 08:03 -0700
Re: SSE2 mhx@iae.nl (Marcel Hendrix) - 2012-07-28 15:28 +0200
Re: SSE2 anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-07-27 14:56 +0000
Re: SSE2 Paul Rubin <no.email@nospam.invalid> - 2012-07-27 08:56 -0700
Re: SSE2 anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-07-27 16:28 +0000
Re: SSE2 David Kuehling <dvdkhlng@gmx.de> - 2012-07-28 01:47 +0200
Re: SSE2 mhx@iae.nl (Marcel Hendrix) - 2012-07-31 00:28 +0200
Re: SSE2 David Kuehling <dvdkhlng@gmx.de> - 2012-08-01 11:27 +0200
Re: SSE2 mhx@iae.nl (Marcel Hendrix) - 2012-08-01 20:23 +0200
Re: SSE2 mhx@iae.nl (Marcel Hendrix) - 2012-08-05 21:00 +0200
Re: SSE2 mhx@iae.nl (Marcel Hendrix) - 2012-07-28 15:28 +0200
Re: SSE2 anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-07-31 14:29 +0000
Re: SSE2 albert@cherry.spenarnc.xs4all.nl (Albert van der Horst) - 2012-07-27 16:31 +0000