Re: SSE2

From: mhx@iae.nl (Marcel Hendrix)
Subject: Re: SSE2
Newsgroups: comp.lang.forth
Message-ID: <81078807968435@frunobulax.edu>
Date: 2012-07-28 15:28 +0200
References: <2012Jul27.165605@mips.complang.tuwien.ac.at>
Organization: Wanadoo


anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

> mhx@iae.nl (Marcel Hendrix) writes:
>>What I did is derived from the MiniBLAS sources. As SSE2 operates on 
>>4 doubles at a time, speedups of around 4 are suggesting themselves. 
>>However, I can find no trace of this. An obvious reason could be that
>>memory throughput can not keep up with the FP units. Strange, as one 
>>would expect this hardware problem to be fixed by now.

> If the problem in your benchmark is that it is memory bandwidth
> limited, no, that problem is not "fixed". 

See my answer to Paul. The problem was that the S/DDOT code needs
large vectors to become effective. For small sizes it gains almost
nothing. 
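
To make the small-vector point concrete, here is a minimal sketch in
portable C. It is not the MiniBLAS code and not the code I benchmarked;
the names ddot_scalar and ddot_2lane are made up for illustration. A
128-bit SSE2 register holds two doubles (or four single-precision
floats), so two independent accumulators stand in for the two lanes.
The final lane add and the odd-element tail are paid once per call,
whatever the vector length, so for very short vectors that fixed cost
can outweigh what the paired loop saves. That is only one plausible
reading of why small sizes show little gain.

```c
#include <stddef.h>

/* Reference dot product: one multiply-add per element. */
double ddot_scalar(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical 2-lane variant. The two accumulators mimic the two
 * double lanes of an SSE2 register; a real SSE2 version would use
 * packed loads and packed multiply/add instead. */
double ddot_2lane(const double *x, const double *y, size_t n)
{
    double lane0 = 0.0, lane1 = 0.0;
    size_t i = 0;

    /* Main loop: two elements per iteration. */
    for (; i + 1 < n; i += 2) {
        lane0 += x[i]     * y[i];
        lane1 += x[i + 1] * y[i + 1];
    }

    /* Fixed per-call cost: combine the lanes once. */
    double sum = lane0 + lane1;

    /* Tail: at most one leftover element when n is odd. */
    if (i < n)
        sum += x[i] * y[i];

    return sum;
}
```

For n = 1 the paired loop never runs and only the tail executes, so
the "vector" path cannot gain anything there.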

[..]

> Hmm, if the problem is memory bandwidth, I would expect all variants
> to have the same performance (unless you use additional indirection
> vectors or varying memory layouts).

That's a good argument.

[..]
> My _guess_ is that your compiler produces some stack memory stores for
> some of the variants, with some stack memory fetches soon after, and
> these cost quite a bit of performance.  At least they used to.

I would be interested to read more about this. What is the problem here,
exactly? Is a stackframe better than push/pop?

>                                                                You
> don't have a data-near-code problem, do you?

No, I always check for these, but I have not come across anything 
like it for quite a long time.

-marcel
