| From | mhx@iae.nl (Marcel Hendrix) |
|---|---|
| Subject | Re: SSE2 |
| Newsgroups | comp.lang.forth |
| Message-ID | <01031498958435@frunobulax.edu> (permalink) |
| Date | 2012-08-05 21:00 +0200 |
| References | <87k3xj2c9m.fsf@mosquito.pool> |
| Organization | Wanadoo |
mhx@iae.nl (Marcel Hendrix) writes Re: SSE2
[..]
> Hmm, I see my matrix code does not yet work for odd rowsize (because
> then a row address may become a multiple of 8, not 16, as required)

I have now fixed this, and guess what: although I use movups (unaligned fetch) instead of movaps (aligned fetch) everywhere, the speed drops by less than 5 - 10%. Interesting. This is so close to optimal that I didn't special-case aligned access for the iForth kernel words (see the final results below).

For the new tests, I only compare general-purpose words -- the high-level Forth and SSE replacements are fully general (they work aligned / unaligned, for any size, and no buffer is used to keep transposed matrices).

The main conclusion is again that SSE2 has advantages, but for short double vectors you won't see it.

-marcel

=== ( singles ) ===

FORTH> TESTS
DOT/AXPY using 32 bits floats. Vector size = 16
mul0  (dot)       : 1.2000000000000000000e+0009 0.226 seconds elapsed.
mul1  (dot_sse2)  : 1.2000000000000000000e+0009 0.136 seconds elapsed.
mmul0 (axpy)      : 6.0000000000000000000e+0008 0.358 seconds elapsed.
mmul1 (axpy_sse2) : 6.0000000000000000000e+0008 0.341 seconds elapsed.
Note: SINGLE maxint == 16777217, printout may be wrong. ok

FORTH> TESTS
DOT/AXPY using 32 bits floats. Vector size = 1024
mul0  (dot)       : 5.2377600000000000000e+0011 1.088 seconds elapsed.
mul1  (dot_sse2)  : 5.2377600000000000000e+0011 0.189 seconds elapsed.
mmul0 (axpy)      : 2.0951040000000000000e+0012 5.648 seconds elapsed.
mmul1 (axpy_sse2) : 2.0951040000000000000e+0012 2.329 seconds elapsed.
Note: SINGLE maxint == 16777217, printout may be wrong. ok

=== ( doubles ) ===

FORTH> TESTS
DOT/AXPY using 64 bits floats. Vector size = 16
mul0  (dot)       : 1.2000000000000000000e+0009 0.237 seconds elapsed.
mul1  (dot_sse2)  : 1.2000000000000000000e+0009 0.146 seconds elapsed.
mmul0 (axpy)      : 6.0000000000000000000e+0008 0.435 seconds elapsed.
mmul1 (axpy_sse2) : 6.0000000000000000000e+0008 0.349 seconds elapsed. ok

FORTH> TESTS
DOT/AXPY using 64 bits floats. Vector size = 1024
mul0  (dot)       : 5.2377600000000000000e+0011 1.095 seconds elapsed.
mul1  (dot_sse2)  : 5.2377600000000000000e+0011 0.370 seconds elapsed.
mmul0 (axpy)      : 2.0951040000000000000e+0012 5.718 seconds elapsed.
mmul1 (axpy_sse2) : 2.0951040000000000000e+0012 3.080 seconds elapsed. ok

=== mm_old.frt ===

500x500 mm - normal algorithm                        0.290 secs.
500x500 mm - temporary variable in loop              0.439 secs.
500x500 mm - unrolled inner loop, factor of 4        0.321 secs.
500x500 mm - unrolled inner loop, factor of 8        0.294 secs.
500x500 mm - unrolled inner loop, factor of 16       0.280 secs.
500x500 mm - pointers used to access matrices        0.350 secs.
500x500 mm - pointers used, unrolled by 8            0.258 secs.
500x500 mm - transposed B matrix                     0.400 secs.
500x500 mm - interchanged inner loops                0.423 secs.
500x500 mm - blocking, step size of 20               0.457 secs.
500x500 mm - Robert's algorithm                      0.064 secs.
500x500 mm - T. Maeno's algorithm, subarray 20x20    0.372 secs.
500x500 mm - Generic Maeno, subarray 20x20           0.392 secs.
500x500 mm - D. Warner's algorithm, subarray 20x20   0.372 secs.
500x500 mm - SSE2                                    0.064 secs.
=========================================================
===== Total using no extensions and using no hackery 4.776 secs. ok
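
The odd-rowsize problem quoted at the top of the post (a row address becoming a multiple of 8 rather than 16) can be checked with a little address arithmetic. The following is only a minimal Python sketch, not the iForth code: it assumes 64-bit floats stored contiguously row by row and a 16-byte-aligned base address, and the helper names `row_addresses` and `misaligned_rows` are hypothetical.

```python
# Illustration only: byte addresses of matrix rows stored contiguously as
# 64-bit floats, starting from an assumed 16-byte-aligned base address.

DOUBLE_SIZE = 8      # bytes per 64-bit float
SSE_ALIGNMENT = 16   # movaps needs 16-byte-aligned memory operands


def row_addresses(base, rows, rowsize, elem_size=DOUBLE_SIZE):
    """Return the byte address of the first element of each row."""
    return [base + r * rowsize * elem_size for r in range(rows)]


def misaligned_rows(base, rows, rowsize, elem_size=DOUBLE_SIZE):
    """Return the indices of rows whose start is not 16-byte aligned."""
    addresses = row_addresses(base, rows, rowsize, elem_size)
    return [r for r, addr in enumerate(addresses)
            if addr % SSE_ALIGNMENT != 0]


if __name__ == "__main__":
    base = 0x1000  # hypothetical, 16-byte-aligned
    print(misaligned_rows(base, rows=6, rowsize=500))  # even row size
    print(misaligned_rows(base, rows=6, rowsize=501))  # odd row size
```

Under these assumptions an even row size keeps every row start on a 16-byte boundary, while an odd row size puts every second row on an 8-byte boundary only. An aligned load such as movaps cannot be used on those rows, which is why an unaligned load such as movups is the general-purpose choice.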