| From | Terje Mathisen <terje.mathisen@tmsw.no> |
|---|---|
| Newsgroups | comp.arch.arithmetic |
| Subject | Re: SSE "denormals are zeros" |
| Date | 2016-06-18 08:07 +0800 |
| Organization | Aioe.org NNTP Server |
| Message-ID | <nk23b9$1kr$1@gioia.aioe.org> |
| References | <nk17fh$qt$1@news.albasani.net> <hnc8mbdnomamn1mfavimjsp3u82d9pre66@4ax.com> <nk1dvd$dtj$1@news.albasani.net> |
Bonita Montero wrote:
>
> I printed the sums only to prevent the compiler from optimizing away
> the summation. The result is that on my Xeon E3-1240 (Skylake) each
> iteration takes four clock-cycles when "d" is non-denormal. When "d"
> is a denormal, each iteration takes about 150 clock cycles! I'd never
> believe denormals would have such a huge performance-impact if I
> wouldn't have seen the opposite.

Ouch!!!

That is really horrible. :-(

I have worked on implementing fp for the Mill cpu, and there is no way you should allow denormals (on input and/or output) to add more than a cycle or two to your processing time.

To get to 150 cycles you effectively need a trap & fixup.

According to Mitch Alsup you can handle denormals inline, in hw, with a total of 6 gate delays, which is a fraction of a cycle on any current process.

>
> And what about GPUs? I suppose they don't support denormals.
> Is this right?
>

Usually so, yeah.

The easiest is to treat denormals as zero, in which case you can do all your special-case handling with a very small lookup table based on the exponent field only:

  00.0         -> Zero
  00.1 to ff.e -> Normal
  ff.f         -> Inf or NaN

If you want/need to handle NaNs you still need to look at the mantissa for maximal exponents, but you can do that in parallel with the normal processing anyway, with plenty of time to spare.

Handling denorms, however, requires a scan for the first non-zero mantissa bit, a shift to normalize, and an adjustment of the (internal) exponent, so this could easily take several cycles unless you are smart.

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
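To make the two cases in the post concrete, here is a rough software sketch in C of (a) the exponent-only classification you get when denormals are treated as zero, and (b) the extra scan/shift/exponent-adjust work a real denormal needs. It assumes IEEE 754 binary32 and models the exponent as the plain 8-bit field (0x00, 0x01..0xfe, 0xff). The names `fp_class_daz`, `classify_daz`, `float_from_bits` and `normalize_denormal` are made up for illustration, and the loop only stands in for what hardware would do with a leading-zero counter and a shifter; it is not how any particular FPU implements it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum { CLS_ZERO, CLS_NORMAL, CLS_INF_OR_NAN } fp_class_daz;

/* Build a float from a raw binary32 bit pattern (handy for making denormals). */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Classify using the 8-bit exponent field only; the mantissa is never read.
   With "denormals are zero", exponent 0 always means zero. */
static fp_class_daz classify_daz(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t exp_field = (bits >> 23) & 0xFFu;

    if (exp_field == 0x00u) return CLS_ZERO;        /* zero and denormals */
    if (exp_field == 0xFFu) return CLS_INF_OR_NAN;  /* mantissa decides which */
    return CLS_NORMAL;
}

/* Full denormal handling: find the first non-zero mantissa bit, shift it up
   to the hidden-bit position (bit 23), and adjust the unbiased exponent.
   'frac' is the non-zero 23-bit fraction field of a denormal, whose value is
   frac * 2^-149. Returns the 24-bit significand m with bit 23 set, so that
   value == (m / 2^23) * 2^(*exp_out). */
static uint32_t normalize_denormal(uint32_t frac, int *exp_out)
{
    assert(frac != 0 && frac < 0x00800000u);

    int shift = 0;
    while ((frac & 0x00800000u) == 0) {  /* scan for the leading one */
        frac <<= 1;
        shift++;
    }
    *exp_out = -126 - shift;             /* denormals start at exponent -126 */
    return frac;
}
```

For example, `classify_daz(float_from_bits(0x00000001u))` gives `CLS_ZERO` even though the bit pattern is the smallest denormal, while `normalize_denormal(1u, &e)` returns `0x00800000` with `e == -149`, i.e. 1.0 * 2^-149. The first function is a comparison on eight bits; the second needs a data-dependent scan and shift, which is the extra work the post is talking about.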
SSE "denormals are zeros" Bonita Montero <Bonita.Montero@gmail.com> - 2016-06-17 18:11 +0200
Re: SSE "denormals are zeros" Robert Wessel <robertwessel2@yahoo.com> - 2016-06-17 12:33 -0500
Re: SSE "denormals are zeros" Robert Wessel <robertwessel2@yahoo.com> - 2016-06-17 12:36 -0500
Re: SSE "denormals are zeros" Bonita Montero <Bonita.Montero@gmail.com> - 2016-06-17 20:02 +0200
Re: SSE "denormals are zeros" Terje Mathisen <terje.mathisen@tmsw.no> - 2016-06-18 08:07 +0800