| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Newsgroups | comp.lang.forth |
| Subject | Re: SSE2 |
| Date | 2012-07-27 14:56 +0000 |
| Organization | Institut fuer Computersprachen, Technische Universitaet Wien |
| Message-ID | <2012Jul27.165605@mips.complang.tuwien.ac.at> |
| References | <56151908968435@frunobulax.edu> |
mhx@iae.nl (Marcel Hendrix) writes:
>What I did is derived from the MiniBLAS sources. As SSE2 operates on
>4 doubles at a time, speedups of around 4 are suggesting themselves.
>However, I can find no trace of this. An obvious reason could be that
>memory throughput can not keep up with the FP units. Strange, as one
>would expect this hardware problem to be fixed by now.
If your benchmark is limited by memory bandwidth, then no, that
problem is not "fixed". Much software run on PCs is not bandwidth
limited, so it would not be economical to add an expensive memory
system for more bandwidth, certainly not to every PC. There are "PCs"
with more memory bandwidth (typically servers), and supercomputers
often have more memory bandwidth relative to FLOPS, but even there the
trend is towards a lower ratio (because additional FPUs are cheaper
than additional bandwidth, because additional FPUs are useful for some
supercomputer software, and because many supercomputers are based on
PC CPUs nowadays). GPUs tend to have high memory bandwidth, but also
high FLOPS.
> FORTH> 100000000 TEST
> Using 64 bits floats.
> mul : 1.2400000000000000000e+0011 2.055 seconds elapsed.
> mul1 : 1.2400000000000000000e+0011 3.141 seconds elapsed.
> mul2 : 1.2400000000000000000e+0011 1.230 seconds elapsed.
> mul3 : 1.2400000000000000000e+0011 1.604 seconds elapsed.
> mmul : 4.6000000000000000000e+0010 2.376 seconds elapsed.
> mmul1 : 4.6000000000000000000e+0010 2.854 seconds elapsed.
> mmul3 : 4.6000000000000000000e+0010 3.785 seconds elapsed.
> mmul4 : 4.6000000000000000000e+0010 2.933 seconds elapsed. ok
>
>The fastest DDOT variant, MUL2, uses direct memory addressing. In
>general, this can only work when Forth compiles at runtime (with
>quotations :-).
Hmm, if the problem is memory bandwidth, I would expect all variants
to have the same performance (unless you use additional indirection
vectors or varying memory layouts).
>MUL1 is using indexed addressing. This is quite a lot slower (I hate
>this, because I have put a lot of effort in the iForth compiler to
>support it fully. After 10 years, it looks like Intel has still not
>found a way to improve the speed of this flexible addressing scheme.)
If the memory accesses are the same, then your problem is not memory
bandwidth.
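To put rough numbers on this argument, here is a minimal Python sketch.
It assumes (the post does not say so) that the argument 100000000 to
TEST is the number of element pairs and that each run streams both
input vectors from memory exactly once. The helper name
implied_bandwidth_gb_per_s is made up for illustration; the timings are
those of the quoted mul variants.

```python
# Hypothetical helper: bytes read per second if a dot product over
# n_elements pairs of 64-bit floats streams both operands once.
def implied_bandwidth_gb_per_s(n_elements, seconds, bytes_per_float=8):
    bytes_read = 2 * n_elements * bytes_per_float  # two input vectors
    return bytes_read / seconds / 1e9

# Timings in seconds, copied from the quoted benchmark output.
timings = {"mul": 2.055, "mul1": 3.141, "mul2": 1.230, "mul3": 1.604}

# ASSUMPTION: the argument to TEST is the number of element pairs.
N = 100_000_000

rates = {name: implied_bandwidth_gb_per_s(N, t) for name, t in timings.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} GB/s")

# Same bytes moved in every variant, so any spread in time is not
# explained by the amount of memory traffic alone.
spread = max(timings.values()) / min(timings.values())
print(f"slowest/fastest: {spread:.2f}x")
```

Under that assumption the variants move the same number of bytes yet
differ by a factor of roughly 2.5 in run time, which a pure bandwidth
limit would not explain.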
My _guess_ is that your compiler produces some stack memory stores for
some of the variants, with some stack memory fetches soon after, and
these cost quite a bit of performance. At least they used to. You
don't have a data-near-code problem, do you?
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/