Re: SSE2

From Paul Rubin <no.email@nospam.invalid>
Newsgroups comp.lang.forth
Subject Re: SSE2
References <56151908968435@frunobulax.edu>
Date 2012-07-27 08:03 -0700
Message-ID <7x394dnt48.fsf@ruckus.brouhaha.com>
Organization Nightsong/Fort GNOX


mhx@iae.nl (Marcel Hendrix) writes:
> What I did is derived from the MiniBLAS sources. As SSE2 operates on 
> 4 doubles at a time, speedups of around 4 are suggesting themselves. 

A scalar multiply already takes 2 doubles at a time, and a 128-bit SSE2
register holds only 2 doubles, so that suggests a speedup of 2 rather
than 4.  But a parallel 64-bit multiplier is probably about the most
expensive resource in the ALU, so there may be just one of them.  In
that case it would be the bottleneck, and a packed multiply in DDOT
would take about as long as 2 scalar multiplications.  You could try a
single-precision version, which might give more speedup since 4 floats
fit in one register.
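
To make the "2 at a time" point concrete, here is a rough C sketch of a dot product written both ways. This is not the MiniBLAS code or Marcel's Forth; the names ddot_scalar and ddot_sse2 are made up for illustration, and the packed version falls back to the scalar loop when the compiler does not target SSE2.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Plain scalar dot product: one multiply and one add per iteration. */
double ddot_scalar(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Packed dot product: a 128-bit XMM register holds 2 doubles,
   so each packed multiply/add handles two products at once. */
double ddot_sse2(const double *x, const double *y, size_t n)
{
#if defined(__SSE2__)
    assert(n == 0 || (x != NULL && y != NULL));
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);   /* unaligned load of x[i], x[i+1] */
        __m128d vy = _mm_loadu_pd(y + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(vx, vy));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);              /* spill the two partial sums */
    double sum = lanes[0] + lanes[1];
    if (i < n)                              /* odd length: one element left */
        sum += x[i] * y[i];
    return sum;
#else
    return ddot_scalar(x, y, n);            /* no SSE2 on this target */
#endif
}
```

The loop does half as many iterations as the scalar one, which is where the factor of 2 comes from. Whether the hardware delivers it depends on how many multipliers sit behind the packed instruction, so timing both functions on the same vectors is the only way to know.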

Are you using the DPPD (double precision dot product) instruction from
SSE4.1?  I'd expect that to be fastest.
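
For reference, a sketch of what using DPPD might look like from C, through the _mm_dp_pd intrinsic. The function name ddot_dppd is hypothetical, and the code only takes the DPPD path when the compiler targets SSE4.1 (for example with -msse4.1 on GCC or Clang); otherwise it runs the plain scalar loop. Whether this beats a multiply/add loop would have to be measured.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>   /* SSE4.1 intrinsics, including _mm_dp_pd (DPPD) */
#endif

double ddot_dppd(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double sum = 0.0;
    size_t i = 0;
#if defined(__SSE4_1__)
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        /* 0x31: multiply both lanes (bits 4-5), put the sum in lane 0 (bit 0) */
        acc = _mm_add_sd(acc, _mm_dp_pd(vx, vy, 0x31));
    }
    sum = _mm_cvtsd_f64(acc);   /* running total lives in the low lane */
#endif
    for (; i < n; i++)          /* leftover element, or all of it without SSE4.1 */
        sum += x[i] * y[i];
    return sum;
}
```

Each DPPD produces one 2-element dot product, so the loop still needs a separate add to accumulate across iterations.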

Later i7s (Sandy Bridge) have the 256-bit AVX instructions that may be
faster still, and the forthcoming Haswell will have fused multiply-add
(FMA).  For maximum speed of course you want to use a GPU.
