| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Newsgroups | comp.lang.forth |
| Subject | Re: SSE2 |
| References | <56151908968435@frunobulax.edu> |
| Date | 2012-07-27 08:03 -0700 |
| Message-ID | <7x394dnt48.fsf@ruckus.brouhaha.com> |
| Organization | Nightsong/Fort GNOX |
mhx@iae.nl (Marcel Hendrix) writes:

> What I did is derived from the MiniBLAS sources. As SSE2 operates on
> 4 doubles at a time, speedups of around 4 are suggesting themselves.

SSE2 packed operations work on 2 doubles at a time (a 128-bit register holds two 64-bit values), so that suggests a speedup of 2 rather than 4. But a parallel 64-bit multiplier is probably about the most expensive resource in the ALU, so there may be just one of them. In that case it would be the bottleneck, and DDOT would be about the same as 2 scalar multiplications. You could try a single-precision version, which would give more speedup.

Are you using the DPPD (double-precision dot product) instruction from SSE4.1? I'd expect that to be fastest. Later i7s (Sandy Bridge) have the 256-bit AVX instructions, which may be faster still, and the forthcoming Haswell will have fused MAC. For maximum speed, of course, you want to use a GPU.
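To make the 2-versus-4 point concrete, here is a minimal C sketch of a DDOT written with SSE2 intrinsics, next to a plain scalar loop. It is not the code from the thread or from MiniBLAS; the names `ddot_scalar`, `ddot_sse2` and `ddot_demo` are made up for illustration. Each `_mm_mul_pd` multiplies two doubles at once, so the best case is roughly half as many multiply instructions as the scalar loop. Whether that turns into a 2x speedup depends on the multiplier hardware and on memory bandwidth, which this sketch does not measure.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics: __m128d, _mm_mul_pd, ... */
#endif

/* Reference version: one multiply and one add per element. */
static double ddot_scalar(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* SSE2 version: a 128-bit register holds 2 doubles, so each loop
   iteration handles 2 elements. Falls back to the scalar loop when
   the compiler does not target SSE2. */
static double ddot_sse2(const double *x, const double *y, size_t n)
{
#ifdef __SSE2__
    __m128d acc = _mm_setzero_pd();        /* two running partial sums */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d a = _mm_loadu_pd(x + i);   /* unaligned load of x[i], x[i+1] */
        __m128d b = _mm_loadu_pd(y + i);   /* unaligned load of y[i], y[i+1] */
        acc = _mm_add_pd(acc, _mm_mul_pd(a, b));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);             /* spill both partial sums */
    double sum = lanes[0] + lanes[1];
    for (; i < n; i++)                     /* leftover element when n is odd */
        sum += x[i] * y[i];
    return sum;
#else
    return ddot_scalar(x, y, n);
#endif
}

/* Small self-check on integer-valued data, where both versions
   must agree exactly. */
static void ddot_demo(void)
{
    double x[5] = {1, 2, 3, 4, 5};
    double y[5] = {1, 2, 3, 4, 5};
    assert(ddot_sse2(x, y, 5) == ddot_scalar(x, y, 5));
}
```

Note that the SSE2 version adds the products in a different order than the scalar loop, so on general floating-point data the two results can differ in the last bits. A single-precision variant would use `__m128` and `_mm_mul_ps` and process 4 floats per instruction, which is where a factor of 4 could come from.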