Re: SSE "denormals are zeros"

From Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups comp.arch.arithmetic
Subject Re: SSE "denormals are zeros"
Date 2016-06-18 08:07 +0800
Organization Aioe.org NNTP Server
Message-ID <nk23b9$1kr$1@gioia.aioe.org>
References <nk17fh$qt$1@news.albasani.net> <hnc8mbdnomamn1mfavimjsp3u82d9pre66@4ax.com> <nk1dvd$dtj$1@news.albasani.net>


Bonita Montero wrote:
>
> I printed the sums only to prevent the compiler from optimizing away
> the summation. The result is that on my Xeon E3-1240 (Skylake) each
> iteration takes four clock-cycles when "d" is non-denormal. When "d"
> is a denormal, each iteration takes about 150 clock cycles! I'd never
> believe denormals would have such a huge performance-impact if I
> wouldn't have seen the opposite.

Ouch!!!

That is really horrible. :-(

I have worked on implementing FP for the Mill CPU, and there is no way
you should allow denormals (on input and/or output) to add more than a
cycle or two to your processing time.

To get to 150 cycles you effectively need a trap & fixup.

According to Mitch Alsup you can handle denormals inline, in hw, with a
total of 6 gate delays, which is a fraction of a cycle on any current
process.
>
> And what about GPUs? I suppose they don't support denormals.
> Is this right?
>
Usually so, yeah.

The easiest approach is to treat denormals as zero, in which case you
can do all your special-case handling with a very small lookup table
based on the exponent field only:

00.0 -> Zero
00.1 to ff.e -> Normal
ff.f -> Inf or NaN

If you want/need to handle NaNs you still need to look at the mantissa 
for maximal exponents, but you can do that in parallel with the normal 
processing anyway, with plenty of time to spare.
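
As a rough software sketch of that classification, assuming IEEE 754 single precision (8-bit exponent, 23-bit stored mantissa): the names daz_class, classify_bits and classify_float are made up for this illustration, and the "table" is written as plain comparisons rather than a real lookup. Only the Inf/NaN split looks at the mantissa.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum { DAZ_ZERO, DAZ_NORMAL, DAZ_INF, DAZ_NAN } daz_class;

/* Classify a binary32 bit pattern under "denormals are zero". */
static daz_class classify_bits(uint32_t bits)
{
    uint32_t exp  = (bits >> 23) & 0xFFu;  /* 8-bit biased exponent field */
    uint32_t mant = bits & 0x7FFFFFu;      /* 23-bit stored mantissa */

    if (exp == 0x00u)
        return DAZ_ZERO;                   /* true zeros and denormals alike */
    if (exp != 0xFFu)
        return DAZ_NORMAL;                 /* 0x01 .. 0xFE */
    /* Maximal exponent: the only case that needs the mantissa. */
    return mant != 0 ? DAZ_NAN : DAZ_INF;
}

/* Convenience wrapper: reinterpret a float's bits without aliasing issues. */
static daz_class classify_float(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return classify_bits(bits);
}
```

With this, classify_bits(0x00000001u), the smallest denormal, comes out as DAZ_ZERO, exactly like 0x00000000u.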

Handling denorms, however, requires a scan for the first non-zero
mantissa bit, a shift to normalize, and an adjustment of the (internal)
exponent, so this could easily take several cycles unless you are smart.
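
Those three steps can be sketched in C for a single-precision denormal. This is only an illustration under assumptions of mine: the struct unpacked and the function unpack_denormal are hypothetical, the internal format is taken to be value = (sig / 2^23) * 2^exp with bit 23 of sig set, and the bit-by-bit loop stands in for what hardware would do with a leading-zero counter.

```c
#include <stdint.h>

/* Hypothetical internal format: value = (sig / 2^23) * 2^exp,
   with bit 23 of sig set for every nonzero value. */
typedef struct {
    uint32_t sig;  /* 24-bit significand, explicit leading 1 at bit 23 */
    int32_t  exp;  /* unbiased exponent */
} unpacked;

/* mant is the 23-bit mantissa of a float whose exponent field is 0. */
static unpacked unpack_denormal(uint32_t mant)
{
    unpacked u = { 0, 0 };
    int shift = 0;

    mant &= 0x7FFFFFu;
    if (mant == 0)
        return u;                          /* true zero: nothing to normalize */

    /* 1. Scan for the first non-zero mantissa bit. */
    while (((mant << shift) & 0x800000u) == 0)
        shift++;

    /* 2. Shift to normalize. */
    u.sig = mant << shift;

    /* 3. Adjust the exponent; denormals start out at 2^-126. */
    u.exp = -126 - shift;
    return u;
}
```

For the smallest denormal (mant = 1) the scan finds a shift of 23, giving sig = 0x800000 and exp = -149, i.e. 1.0 * 2^-149.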

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
