| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Newsgroups | comp.lang.forth |
| Subject | Re: SSE2 |
| Date | 2012-07-27 16:28 +0000 |
| Organization | Institut fuer Computersprachen, Technische Universitaet Wien |
| Message-ID | <2012Jul27.182801@mips.complang.tuwien.ac.at> |
| References | <56151908968435@frunobulax.edu> <2012Jul27.165605@mips.complang.tuwien.ac.at> <7xy5m5mc4j.fsf@ruckus.brouhaha.com> |
Paul Rubin <no.email@nospam.invalid> writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>> If the problem in your benchmark is that it is memory bandwidth
>> limited, no, that problem is not "fixed".
>
>Yeah I wouldn't have expected memory bandwidth to be the problem either,
>in a sequential-access problem like dot product.
It can be, if your data does not fit into the cache.
E.g., with 2 channels of DDR3-1600 (PC3-12800) memory, you get at best
25600MB/s of memory bandwidth. If your CPU core can fetch two 8-byte
floats per cycle, can do one F+ and one F* per cycle, and runs at 3GHz,
it needs 2*8*3000=48000MB/s of bandwidth to keep busy; and that's just
one core.
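The arithmetic above can be rechecked with a small sketch. The function and variable names are made up for illustration, and the 12800MB/s per-channel figure is the peak rate assumed in the example, not a measured value.

```python
# Back-of-the-envelope check of the bandwidth example above.
# All names are illustrative; the figures are the ones from the post.

def peak_supply_mb_s(channels, mb_s_per_channel):
    """Best-case memory bandwidth across all channels, in MB/s."""
    return channels * mb_s_per_channel

def core_demand_mb_s(floats_per_cycle, bytes_per_float, clock_mhz):
    """Bandwidth one core needs to fetch operands every cycle, in MB/s."""
    return floats_per_cycle * bytes_per_float * clock_mhz

supply = peak_supply_mb_s(channels=2, mb_s_per_channel=12800)
demand = core_demand_mb_s(floats_per_cycle=2, bytes_per_float=8, clock_mhz=3000)

print("supply:", supply, "MB/s")
print("demand:", demand, "MB/s")
print("demand/supply:", demand / supply)
```

With these figures a single core asks for nearly twice what the memory can deliver at best, so a dot product over data that does not fit into the cache is limited by memory, not by the FP units.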
>One simple test is
>access all the data before beginning DDOT, to get it all into cache.
>There are also some cache prefetch instructions in the x86 that let you
>get the data into cache ahead of time, when you know what the access
>pattern is going to be.
For a fixed-stride access pattern the hardware prefetcher should be
able to cover that nicely, no software prefetching needed.
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/