Re: SSE2

From anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups comp.lang.forth
Subject Re: SSE2
Date 2012-07-27 16:28 +0000
Organization Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID <2012Jul27.182801@mips.complang.tuwien.ac.at>
References <56151908968435@frunobulax.edu> <2012Jul27.165605@mips.complang.tuwien.ac.at> <7xy5m5mc4j.fsf@ruckus.brouhaha.com>


Paul Rubin <no.email@nospam.invalid> writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>> If the problem in your benchmark is that it is memory bandwidth
>> limited, no, that problem is not "fixed". 
>
>Yeah I wouldn't have expected memory bandwidth to be the problem either,
>in a sequential-access problem like dot product.

It can be, if your data does not fit into the cache.
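
A rough way to check the "does not fit into the cache" condition for a dot product is to compare the working set of the two input vectors against the cache size. The sketch below is only an illustration: the function name `fits_in_cache` and the 8 MiB cache size are assumptions, not values taken from the benchmark under discussion.

```python
def fits_in_cache(n_elements, cache_bytes, element_bytes=8, n_vectors=2):
    """Return True if n_vectors arrays of n_elements floats fit in the cache.

    element_bytes=8 corresponds to 8-byte (double-precision) floats;
    n_vectors=2 corresponds to the two inputs of a dot product.
    """
    working_set = n_elements * element_bytes * n_vectors
    return working_set <= cache_bytes


# Assumed last-level cache size (hypothetical, for illustration only).
ASSUMED_CACHE_BYTES = 8 * 1024 * 1024

print(fits_in_cache(100_000, ASSUMED_CACHE_BYTES))     # 1.6 MB working set
print(fits_in_cache(10_000_000, ASSUMED_CACHE_BYTES))  # 160 MB working set
```

With these assumed numbers the first case stays in cache and the second does not, so only the second would be exposed to main-memory bandwidth.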

E.g., if you have 2 channels of DDR3-1600 (PC3-12800) memory, you get at
best 25600MB/s of memory bandwidth. If your CPU core is capable of
fetching 2 8-byte floats per cycle and of doing one F+ and one F* per
cycle, and has a 3GHz clock speed, that would be a bandwidth requirement
of 2*8*3000=48000MB/s; and that's just one core.
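
The same arithmetic as a small sketch, using only the numbers from the paragraph above. The function names are made up for illustration, and the peak figures ignore all real-world overheads.

```python
def available_mb_per_s(channels, mb_per_s_per_channel):
    """Peak memory bandwidth: channels times per-channel peak rate."""
    return channels * mb_per_s_per_channel


def required_mb_per_s(loads_per_cycle, bytes_per_load, clock_mhz):
    """Bandwidth one core needs to keep its load units busy every cycle."""
    return loads_per_cycle * bytes_per_load * clock_mhz


available = available_mb_per_s(2, 12800)   # 2 channels at 12800 MB/s each
required = required_mb_per_s(2, 8, 3000)   # 2 loads/cycle, 8 bytes, 3000 MHz

print(available)             # 25600
print(required)              # 48000
print(available < required)  # True: memory cannot keep up with one core
```

So under these assumptions the memory system can deliver only a bit more than half of what a single core could consume.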

>One simple test is
>access all the data before beginning DDOT, to get it all into cache.
>There are also some cache prefetch instructions in the x86 that let you
>get the data into cache ahead of time, when you know what the access
>pattern is going to be.

For a fixed-stride access pattern the hardware prefetcher should be
able to cover that nicely; no software prefetching is needed.

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: http://www.forth200x.org/forth200x.html
   EuroForth 2012: http://www.euroforth.org/ef12/
