
Floating point vs fixed arithmetics (signed 64-bit)

Started by	kishor <kiishor@gmail.com>
First post	2012-03-26 02:22 -0700
Last post	2012-03-28 22:59 +0300
Articles	20 on this page of 56 — 20 participants


  Floating point vs fixed arithmetics (signed 64-bit) kishor <kiishor@gmail.com> - 2012-03-26 02:22 -0700
    Re: Floating point vs fixed arithmetics (signed 64-bit) "Boudewijn Dijkstra" <sp4mtr4p.boudewijn@indes.com> - 2012-03-26 12:08 +0200
    Re: Floating point vs fixed arithmetics (signed 64-bit) Arlet Ottens <usenet+5@c-scape.nl> - 2012-03-26 13:14 +0200
      Re: Floating point vs fixed arithmetics (signed 64-bit) David Brown <david@westcontrol.removethisbit.com> - 2012-03-26 13:24 +0200
        Re: Floating point vs fixed arithmetics (signed 64-bit) kishor <kiishor@gmail.com> - 2012-03-26 05:24 -0700
          Re: Floating point vs fixed arithmetics (signed 64-bit) Fredrik Östman <Fredrik_Oestman@work.invalid> - 2012-03-26 12:38 +0000
            Re: Floating point vs fixed arithmetics (signed 64-bit) kishor <kiishor@gmail.com> - 2012-03-26 06:33 -0700
              Re: Floating point vs fixed arithmetics (signed 64-bit) Arlet Ottens <usenet+5@c-scape.nl> - 2012-03-26 15:49 +0200
              Re: Floating point vs fixed arithmetics (signed 64-bit) David Brown <david@westcontrol.removethisbit.com> - 2012-03-26 15:45 +0200
              Re: Floating point vs fixed arithmetics (signed 64-bit) Fredrik Östman <Fredrik_Oestman@work.invalid> - 2012-03-26 14:34 +0000
          Re: Floating point vs fixed arithmetics (signed 64-bit) Arlet Ottens <usenet+5@c-scape.nl> - 2012-03-26 15:34 +0200
            Re: Floating point vs fixed arithmetics (signed 64-bit) Tim Wescott <tim@seemywebsite.com> - 2012-03-26 12:25 -0500
              Re: Floating point vs fixed arithmetics (signed 64-bit) Arlet Ottens <usenet+5@c-scape.nl> - 2012-03-26 20:19 +0200
                Re: Floating point vs fixed arithmetics (signed 64-bit) Rich Webb <bbew.ar@mapson.nozirev.ten> - 2012-03-26 16:45 -0400
                  Re: Floating point vs fixed arithmetics (signed 64-bit) Tim Wescott <tim@seemywebsite.com> - 2012-03-26 17:15 -0500
                    Re: Floating point vs fixed arithmetics (signed 64-bit) Rich Webb <bbew.ar@mapson.nozirev.ten> - 2012-03-26 19:09 -0400
                      Re: Floating point vs fixed arithmetics (signed 64-bit) kishor <kiishor@gmail.com> - 2012-03-27 04:59 -0700
                        Re: Floating point vs fixed arithmetics (signed 64-bit) David Brown <david@westcontrol.removethisbit.com> - 2012-03-27 15:25 +0200
                          Re: Floating point vs fixed arithmetics (signed 64-bit) David T. Ashley <dashley@gmail.com> - 2012-03-29 13:17 -0400
    Re: Floating point vs fixed arithmetics (signed 64-bit) "Paul E. Bennett" <Paul_E.Bennett@topmail.co.uk> - 2012-03-27 11:28 +0100
    Re: Floating point vs fixed arithmetics (signed 64-bit) David T. Ashley <dashley@gmail.com> - 2012-03-27 11:28 -0400
      Re: Floating point vs fixed arithmetics (signed 64-bit) upsidedown@downunder.com - 2012-03-27 18:52 +0300
        Re: Floating point vs fixed arithmetics (signed 64-bit) David T. Ashley <dashley@gmail.com> - 2012-03-27 13:02 -0400
          Re: Floating point vs fixed arithmetics (signed 64-bit) Walter Banks <walter@bytecraft.com> - 2012-03-27 13:56 -0500
            Re: Floating point vs fixed arithmetics (signed 64-bit) Tim Wescott <tim@seemywebsite.com> - 2012-03-27 14:17 -0500
              Re: Floating point vs fixed arithmetics (signed 64-bit) Walter Banks <walter@bytecraft.com> - 2012-03-27 15:35 -0500
                Re: Floating point vs fixed arithmetics (signed 64-bit) Tim Wescott <tim@seemywebsite.please> - 2012-03-27 22:36 -0500
            Re: Floating point vs fixed arithmetics (signed 64-bit) David Brown <david@westcontrol.removethisbit.com> - 2012-03-28 09:00 +0200
            Re: Floating point vs fixed arithmetics (signed 64-bit) j.m.granville@gmail.com - 2012-03-30 04:08 -0700
              Re: Floating point vs fixed arithmetics (signed 64-bit) Mark Borgerson <mborgerson@comcast.net> - 2012-04-02 22:52 -0700
                Re: Floating point vs fixed arithmetics (signed 64-bit) John Devereux <john@devereux.me.uk> - 2012-04-03 11:33 +0100
                  Re: Floating point vs fixed arithmetics (signed 64-bit) Anders.Montonen@kapsi.spam.stop.fi.invalid - 2012-04-03 12:05 +0000
                    Re: Floating point vs fixed arithmetics (signed 64-bit) John Devereux <john@devereux.me.uk> - 2012-04-03 16:34 +0100
                      Re: Floating point vs fixed arithmetics (signed 64-bit) Paul <paul@pcserviceselectronics.co.uk> - 2012-04-04 09:35 +0100
              Re: Floating point vs fixed arithmetics (signed 64-bit) Tim Wescott <tim@seemywebsite.com> - 2012-04-03 13:52 -0500
                Re: Floating point vs fixed arithmetics (signed 64-bit) Mark Borgerson <mborgerson@comcast.net> - 2012-04-04 16:50 -0700
                  Re: Floating point vs fixed arithmetics (signed 64-bit) John Devereux <john@devereux.me.uk> - 2012-04-05 11:48 +0100
          Re: Floating point vs fixed arithmetics (signed 64-bit) David Brown <david@westcontrol.removethisbit.com> - 2012-03-28 09:17 +0200
            Re: Floating point vs fixed arithmetics (signed 64-bit) Tim Wescott <tim@seemywebsite.com> - 2012-03-28 12:20 -0500
              Re: Floating point vs fixed arithmetics (signed 64-bit) Andrew Reilly <areilly---@bigpond.net.au> - 2012-03-28 22:44 +0000
                Re: Floating point vs fixed arithmetics (signed 64-bit) Tim Wescott <tim@seemywebsite.com> - 2012-03-28 18:35 -0500
                  Re: Floating point vs fixed arithmetics (signed 64-bit) David Brown <david@westcontrol.removethisbit.com> - 2012-03-29 10:58 +0200
                  Re: Floating point vs fixed arithmetics (signed 64-bit) Mark Borgerson <mborgerson@comcast.net> - 2012-03-29 07:56 -0700
                    Re: Floating point vs fixed arithmetics (signed 64-bit) Tim Wescott <tim@seemywebsite.com> - 2012-03-29 16:52 -0500
                      Re: Floating point vs fixed arithmetics (signed 64-bit) Mark Borgerson <mborgerson@comcast.net> - 2012-03-29 21:19 -0700
                        Re: Floating point vs fixed arithmetics (signed 64-bit) Tim Wescott <tim@seemywebsite.please> - 2012-03-30 00:42 -0500
                Re: Floating point vs fixed arithmetics (signed 64-bit) upsidedown@downunder.com - 2012-03-29 07:19 +0300
                  Re: Floating point vs fixed arithmetics (signed 64-bit) Andrew Reilly <areilly---@bigpond.net.au> - 2012-03-29 11:53 +0000
                    Re: Floating point vs fixed arithmetics (signed 64-bit) Walter Banks <walter@bytecraft.com> - 2012-03-29 09:40 -0500
                    Re: Floating point vs fixed arithmetics (signed 64-bit) upsidedown@downunder.com - 2012-03-29 23:46 +0300
                  Re: Floating point vs fixed arithmetics (signed 64-bit) Walter Banks <walter@bytecraft.com> - 2012-03-29 09:28 -0500
                    Re: Floating point vs fixed arithmetics (signed 64-bit) David Brown <david@westcontrol.removethisbit.com> - 2012-03-29 16:58 +0200
              Re: Floating point vs fixed arithmetics (signed 64-bit) David Brown <david@westcontrol.removethisbit.com> - 2012-03-29 10:09 +0200
              Re: Floating point vs fixed arithmetics (signed 64-bit) Clifford Heath <cjh@no.spam.please.net> - 2012-04-01 18:08 +1000
            Re: Floating point vs fixed arithmetics (signed 64-bit) dp <dp@tgi-sci.com> - 2012-03-28 02:38 -0700
        Re: Floating point vs fixed arithmetics (signed 64-bit) upsidedown@downunder.com - 2012-03-28 22:59 +0300


#7795

From	David T. Ashley <dashley@gmail.com>
Date	2012-03-27 11:28 -0400
Message-ID	<jdm3n7pd6aavn3cfbg5ugvl6fmbh738sdv@4ax.com>
In reply to	#7760

On Mon, 26 Mar 2012 02:22:21 -0700 (PDT), kishor <kiishor@gmail.com>
wrote:

>	Hi friends,
>	I am working on a Stellaris LM3S6965 (Cortex-M3) with Keil 4.20 for data
>	acquisition. The ADC is signed 24-bit.
>
>	To perform software Gain calibration I have two options,
>
>	1. 64-bit fixed width arithmetic
>		uint16_t Gain;		// 0x8000 means gain is 1
>		int32_t ADC_Reading;		// It contains 24-bit signed integer ADC reading
>
>		ADC_Reading = ((int64_t)ADC_Reading * Gain) / 0x8000;		// Gain calibration
>
>		// As multiplication of signed 24-bit & unsigned 16-bit will not fit
>		// into 32-bit variable, I typecast it to int64_t.
>
>	2. Single precision Float
>		float Gain;
>		int32_t ADC_Reading;		// It contains 24-bit signed integer ADC reading
>
>		ADC_Reading = ADC_Reading * Gain;		// Gain calibration
>
>		Which is better performance-wise?
>
>		Thanks,
>		Kishore.

Without FPU support, assuming that the processor has basic integer
multiplication instructions, integer operations are ALWAYS faster than
floating-point operations.  Usually _far_ faster.  And always more
precise.

The general nature of computers is that all data into the computer has
to be quantized in some way (the machine can only accept digital
data), and all data out has to be quantized in some way (again, the
machine can only output digital data).

There is already quantization error coming in because it is entering a
discrete system.  How much error depends on the quality of the
hardware, which usually depends on how much one was willing to spend
on it.

One measure of "goodness" of calculations is whether, for a given set
of inputs (all integers), one can prove analytically that one is able
to select the best outputs (again, all integers).  This confines any
error to the hardware rather than the software.

It ends up that for many types of calculations, using integer
operations, one can meet this measure of goodness.  However, one
usually requires larger integers than development tools support in a
native way.  Which means inline assembly or large integer libraries
which were written in assembly-language.  Preferably the latter.

In the specific case of linearly scaling by a factor, generally what
one wants to do is select a rational number h/k close to the real
number to be multiplied by.

There are two subcases.

k may be a power of two (k = 2^q), in which case it is an integer
multiplication followed by a shift or a "byte pluck".  It should be
obvious why this is extremely efficient.

k may be something other than a power of two, which is the general
case.  In that case, you may find this web page helpful:

http://www.dtashley.com/howtos/2007/01/best_rational_approximation/

Finding the best rational approximation when k is not a power of 2 is
a topic from number theory, and all the information you are likely to
need is at the page above.  Software is included.
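
A minimal sketch of the two subcases, assuming a 64-bit intermediate
product as in the original question.  The function names, the
round-to-nearest step and the 355/113 ratio used for checking are
illustrative assumptions, not taken from the page above.

```c
#include <stdint.h>

/* Subcase 1: k = 2^q.  Multiply by h, then shift right by q.
   Assumes 1 <= q <= 31 and that >> on a negative int64_t is an
   arithmetic shift (true for common compilers, but
   implementation-defined in C). */
static int32_t scale_pow2(int32_t x, uint16_t h, unsigned q)
{
    int64_t p = (int64_t)x * h;          /* 24-bit x 16-bit needs 40 bits */
    p += (int64_t)1 << (q - 1);          /* add half an LSB to round */
    return (int32_t)(p >> q);
}

/* Subcase 2: general k.  Multiply by h, then divide by k,
   rounding to nearest (ties away from zero).  Assumes k != 0. */
static int32_t scale_rational(int32_t x, uint32_t h, uint32_t k)
{
    int64_t p = (int64_t)x * (int64_t)h;
    int64_t half = (int64_t)(k / 2);
    p = (p >= 0) ? p + half : p - half;
    return (int32_t)(p / (int64_t)k);
}
```

With h = 0x8000 and q = 15, scale_pow2 leaves the reading unchanged,
which matches the original poster's convention that 0x8000 means a
gain of 1.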

You're welcome.

Dave Ashley


#7796

From	upsidedown@downunder.com
Date	2012-03-27 18:52 +0300
Message-ID	<u0o3n75ffv4gqng8qh16fdap2ugevln58q@4ax.com>
In reply to	#7795

On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
<dashley@gmail.com> wrote:

>
>Without FPU support, assuming that the processor has basic integer
>multiplication instructions, integer operations are ALWAYS faster than
>floating-point operations.  Usually _far_ faster.  And always more
>precise.

Floating point instructions MUL/DIV are trivial, just multiply/divide
the mantissa and add/sub the exponent.

With FP add/sub you have to denormalize one operand and then normalize
the result, which can be quite time consuming, without sufficient HW
support.

This can be really time consuming, if the HW is designed by an idiot.
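
The claim that MUL is just a mantissa multiply plus an exponent add
can be sketched roughly as follows for IEEE 754 single precision.
This is a simplified illustration under stated assumptions, not a
usable library routine: it handles only normalized, non-zero inputs,
ignores overflow, underflow, NaN and infinity, and truncates instead
of rounding.  The name soft_mul is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Software multiply of two normalized single-precision values.
   Assumptions: no zeros, subnormals, NaNs or infinities; the result
   exponent stays in range; the result is truncated, not rounded. */
static float soft_mul(float a, float b)
{
    uint32_t ua, ub, ur;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    uint32_t sign = (ua ^ ub) & 0x80000000u;            /* XOR the signs */
    int32_t  exp  = (int32_t)((ua >> 23) & 0xFF)
                  + (int32_t)((ub >> 23) & 0xFF) - 127; /* add exponents, remove one bias */

    uint64_t ma = (ua & 0x007FFFFFu) | 0x00800000u;     /* restore hidden 1 */
    uint64_t mb = (ub & 0x007FFFFFu) | 0x00800000u;
    uint64_t prod = ma * mb;                            /* 24 x 24 -> 48 bits */

    if (prod & ((uint64_t)1 << 47)) {                   /* product in [2, 4) */
        prod >>= 24;
        exp += 1;
    } else {                                            /* product in [1, 2) */
        prod >>= 23;
    }

    ur = sign | ((uint32_t)exp << 23) | ((uint32_t)prod & 0x007FFFFFu);
    float r;
    memcpy(&r, &ur, sizeof r);
    return r;
}
```

The only wide operation is the 24 x 24 bit mantissa product; the
normalization after a multiply is at most a one-bit shift, unlike
add/sub where the shift distance depends on the operands.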


#7800

From	David T. Ashley <dashley@gmail.com>
Date	2012-03-27 13:02 -0400
Message-ID	<qcs3n7ponjd09bq27qgb648fnmovco9mn1@4ax.com>
In reply to	#7796

On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:

>On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
><dashley@gmail.com> wrote:
>
>>
>>Without FPU support, assuming that the processor has basic integer
>>multiplication instructions, integer operations are ALWAYS faster than
>>floating-point operations.  Usually _far_ faster.  And always more
>>precise.
>
>Floating point instructions MUL/DIV are trivial, just multiply/divide
>the mantissa and add/sub the exponent.
>
>With FP add/sub you have to denormalize one operand and then normalize
>the result, which can be quite time consuming, without sufficient HW
>support.
>
>This can be really time consuming, if the HW is designed by an idiot.

Your observations are valid.  But I have yet to see a practical
example of something that can be done faster and with equal accuracy
in floating point vs. using integer operations.

I concur with your observations.  After reading your first paragraph
... yeah, floating-point multiplication is pretty simple so long as
the floating point format is sane.

Before reading your post, my mental model was that floating-point
operations might be 20 times as slow as integer operations.  Now I'm
thinking maybe 2-3 times.

DTA.


#7802

From	Walter Banks <walter@bytecraft.com>
Date	2012-03-27 13:56 -0500
Message-ID	<4F720D71.DFDA7F7F@bytecraft.com>
In reply to	#7800

"David T. Ashley" wrote:

> On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:
>
> >On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
> ><dashley@gmail.com> wrote:
> >
> >>
> >>Without FPU support, assuming that the processor has basic integer
> >>multiplication instructions, integer operations are ALWAYS faster than
> >>floating-point operations.  Usually _far_ faster.  And always more
> >>precise.
> >
> >Floating point instructions MUL/DIV are trivial, just multiply/divide
> >the mantissa and add/sub the exponent.
> >
> >With FP add/sub you have to denormalize one operand and then normalize
> >the result, which can be quite time consuming, without sufficient HW
> >support.
> >
> >This can be really time consuming, if the HW is designed by an idiot.
>
> Your observations are valid.  But I have yet to see a practical
> example of something that can be done faster and with equal accuracy
> in floating point vs. using integer operations.
>
> I concur with your observations.  After reading your first paragraph
> ... yeah, floating-point multiplication is pretty simple so long as
> the floating point format is sane.
>
> Before reading your post, my mental model was that floating-point
> operations might be 20 times as slow as integer operations.  Now I'm
> thinking maybe 2-3 times.

I did a fixed point support package for our 8 bit embedded systems
compilers and one interesting metric came out of the project.

Given a number of bits in a number and similar error checking fixed
or float took very similar amounts of execution time and code size
in applications.

For example 32 bit float and 32 bit fixed point. They are not exact
but they are close. In the end much to my surprise the choice is
dynamic range or resolution.

There are other factors: IEEE754 has potentially much more error
checking, but not all libraries are written to support it, and not
all applications need it.

Regards,

w..
--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com


#7803

From	Tim Wescott <tim@seemywebsite.com>
Date	2012-03-27 14:17 -0500
Message-ID	<68qdnRLw64LUj-_SnZ2dnUVZ_qSdnZ2d@web-ster.com>
In reply to	#7802

On Tue, 27 Mar 2012 13:56:49 -0500, Walter Banks wrote:

> "David T. Ashley" wrote:
> 
>> On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:
>>
>> >On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
>> ><dashley@gmail.com> wrote:
>> >
>> >
>> >>Without FPU support, assuming that the processor has basic integer
>> >>multiplication instructions, integer operations are ALWAYS faster
>> >>than floating-point operations.  Usually _far_ faster.  And always
>> >>more precise.
>> >
>> >Floating point instructions MUL/DIV are trivial, just multiply/divide
>> >the mantissa and add/sub the exponent.
>> >
>> >With FP add/sub you have to denormalize one operand and then normalize
>> >the result, which can be quite time consuming, without sufficient HW
>> >support.
>> >
>> >This can be really time consuming, if the HW is designed by an idiot.
>>
>> Your observations are valid.  But I have yet to see a practical example
>> of something that can be done faster and with equal accuracy in
>> floating point vs. using integer operations.
>>
>> I concur with your observations.  After reading your first paragraph ...
>> yeah, floating-point multiplication is pretty simple so long as the
>> floating point format is sane.
>>
>> Before reading your post, my mental model was that floating-point
>> operations might be 20 times as slow as integer operations.  Now I'm
>> thinking maybe 2-3 times.
> 
> I did a fixed point support package for our 8 bit embedded systems
> compilers and one interesting metric came out of the project.
> 
> Given a number of bits in a number and similar error checking fixed or
> float took very similar amounts of execution time and code size in
> applications.
> 
> For example 32 bit float and 32 bit fixed point. They are not exact but
> they are close. In the end much to my surprise the choice is dynamic
> range or resolution.
> 
> There are other factors: IEEE754 has potentially much more error checking,
> but not all libraries are written to support it, and not all applications
> need it.

That's interesting, because in my experience fixed-point fractional 
arithmetic (i.e., 0x7fffffff = 1 - 2^-31, 0x80000001 = -1 + 2^-31), with 
saturation-on-add, is significantly faster (3x to 10x) than floating 
point on all the machines I've tried it except for those with floating-
point hardware.

I have a portable version that works on just about anything that's ANSI-C 
compatible, and when I really need speed I rewrite the arithmetic 
routines in assembly for about a 2x increase.

The only processor that came close to matching it was the TMS320F2812, 
where we used the ANSI-C compatible version that was just about matched 
by the floating-point package that came with the tool set (and I _know_ 
that TI cut corners with that floating point package).  That's the _only_ 
processor in my experience where the floating point could keep up with 
the ANSI-C version, and I would expect that had I written an assembly 
version it would have been faster yet.
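
For readers unfamiliar with the format, here is a minimal sketch of
fractional (Q31) arithmetic with saturation-on-add in portable C.  It
is an illustration under assumptions, not the package described
above: the names q31_t, q31_sat, q31_add and q31_mul are made up, the
range is taken as the symmetric 0x80000001 to 0x7fffffff from the
description, and the multiply truncates toward zero.

```c
#include <stdint.h>

typedef int32_t q31_t;                  /* value = raw / 2^31 */

#define Q31_MAX ((q31_t)0x7fffffff)     /* 1 - 2^-31 */
#define Q31_MIN (-Q31_MAX)              /* -1 + 2^-31, bit pattern 0x80000001 */

/* Clamp a wide intermediate result into the Q31 range. */
static q31_t q31_sat(int64_t v)
{
    if (v > Q31_MAX) return Q31_MAX;
    if (v < Q31_MIN) return Q31_MIN;
    return (q31_t)v;
}

/* Saturating add: overflow clamps instead of wrapping around. */
static q31_t q31_add(q31_t a, q31_t b)
{
    return q31_sat((int64_t)a + (int64_t)b);
}

/* Fractional multiply: 32 x 32 -> 64 bit product, scaled back by 2^31. */
static q31_t q31_mul(q31_t a, q31_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return q31_sat(p / ((int64_t)1 << 31));
}
```

On a 32-bit core with a 32 x 32 -> 64 multiply instruction each of
these is a handful of instructions, which is where the speed advantage
over a software floating-point library would come from.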

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com


#7805

From	Walter Banks <walter@bytecraft.com>
Date	2012-03-27 15:35 -0500
Message-ID	<4F722482.AB146260@bytecraft.com>
In reply to	#7803


Tim Wescott wrote:

> On Tue, 27 Mar 2012 13:56:49 -0500, Walter Banks wrote:
>
> > "David T. Ashley" wrote:
> >
> >> On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:
> >>
> >> >On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
> >> ><dashley@gmail.com> wrote:
> >> >
> >> >
> >> >>Without FPU support, assuming that the processor has basic integer
> >> >>multiplication instructions, integer operations are ALWAYS faster
> >> >>than floating-point operations.  Usually _far_ faster.  And always
> >> >>more precise.
> >> >
> >> >Floating point instructions MUL/DIV are trivial, just multiply/divide
> >> >the mantissa and add/sub the exponent.
> >> >
> >> >With FP add/sub you have to denormalize one operand and then normalize
> >> >the result, which can be quite time consuming, without sufficient HW
> >> >support.
> >> >
> >> >This can be really time consuming, if the HW is designed by an idiot.
> >>
> >> Your observations are valid.  But I have yet to see a practical example
> >> of something that can be done faster and with equal accuracy in
> >> floating point vs. using integer operations.
> >>
> >> I concur with your observations.  After reading your first paragraph ...
> >> yeah, floating-point multiplication is pretty simple so long as the
> >> floating point format is sane.
> >>
> >> Before reading your post, my mental model was that floating-point
> >> operations might be 20 times as slow as integer operations.  Now I'm
> >> thinking maybe 2-3 times.
> >
> > I did a fixed point support package for our 8 bit embedded systems
> > compilers and one interesting metric came out of the project.
> >
> > Given a number of bits in a number and similar error checking fixed or
> > float took very similar amounts of execution time and code size in
> > applications.
> >
> > For example 32 bit float and 32 bit fixed point. They are not exact but
> > they are close. In the end much to my surprise the choice is dynamic
> > range or resolution.
> >
> > There are other factors: IEEE754 has potentially much more error checking,
> > but not all libraries are written to support it, and not all applications
> > need it.
>
> That's interesting, because in my experience fixed-point fractional
> arithmetic (i.e., 0x7fffffff = 1 - 2^-31, 0x80000001 = -1 + 2^-31), with
> saturation-on-add, is significantly faster (3x to 10x) than floating
> point on all the machines I've tried it except for those with floating-
> point hardware.
>
> I have a portable version that works on just about anything that's ANSI-C
> compatible, and when I really need speed I rewrite the arithmetic
> routines in assembly for about a 2x increase.
>
> The only processor that came close to matching it was the TMS320F2812,
> where we used the ANSI-C compatible version that was just about matched
> by the floating-point package that came with the tool set (and I _know_
> that TI cut corners with that floating point package).  That's the _only_
> processor in my experience where the floating point could keep up with
> the ANSI-C version, and I would expect that had I written an assembly
> version it would have been faster yet.

What you saw was what I was expecting. My point in the post was to be
careful in assuming that fixed is going to be dramatically better. At least for
8 bits the variable size in bits is a significant factor when all math is
multiprecision.

One of the keys in our metrics was the target was 8 bit processors
and there was an exchange between precision and dynamic range
but the bit sizes remained the same.

Real applications are probably dominated by scaling and precision,
reducing the number of bits used by fixed point for the same application.

It didn't make sense until I realized that it was 8 bit processors using
software mults and divides and 32 bit floating point uses for the most
part 24bit mults and divides and a few adds/subtracts for the exponents.
32 bit fixed point uses 32 bit mults/divides adding to the cycle count.

My experience with 32 bit processors is similar to yours although
I don't have metrics to back it up.


Walter..


#7810

From	Tim Wescott <tim@seemywebsite.please>
Date	2012-03-27 22:36 -0500
Message-ID	<uf2dnaefP47GGu_SnZ2dnUVZ_r6dnZ2d@web-ster.com>
In reply to	#7805

On Tue, 27 Mar 2012 15:35:14 -0500, Walter Banks wrote:

> Tim Wescott wrote:
> 
>> On Tue, 27 Mar 2012 13:56:49 -0500, Walter Banks wrote:
>>
>> > "David T. Ashley" wrote:
>> >
>> >> On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:
>> >>
>> >> >On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
>> >> ><dashley@gmail.com> wrote:
>> >> >
>> >> >
>> >> >>Without FPU support, assuming that the processor has basic integer
>> >> >>multiplication instructions, integer operations are ALWAYS faster
>> >> >>than floating-point operations.  Usually _far_ faster.  And always
>> >> >>more precise.
>> >> >
>> >> >Floating point instructions MUL/DIV are trivial, just
>> >> >multiply/divide the mantissa and add/sub the exponent.
>> >> >
>> >> >With FP add/sub you have to denormalize one operand and then
>> >> >normalize the result, which can be quite time consuming, without
>> >> >sufficient HW support.
>> >> >
>> >> >This can be really time consuming, if the HW is designed by an
>> >> >idiot.
>> >>
>> >> Your observations are valid.  But I have yet to see a practical
>> >> example of something that can be done faster and with equal accuracy
>> >> in floating point vs. using integer operations.
>> >>
>> >> I concur with your observations.  After reading your first paragraph
>> >> ... yeah, floating-point multiplication is pretty simple so long as
>> >> the floating point format is sane.
>> >>
>> >> Before reading your post, my mental model was that floating-point
>> >> operations might be 20 times as slow as integer operations.  Now I'm
>> >> thinking maybe 2-3 times.
>> >
>> > I did a fixed point support package for our 8 bit embedded systems
>> > compilers and one interesting metric came out of the project.
>> >
>> > Given a number of bits in a number and similar error checking fixed
>> > or float took very similar amounts of execution time and code size in
>> > applications.
>> >
>> > For example 32 bit float and 32 bit fixed point. They are not exact
>> > but they are close. In the end much to my surprise the choice is
>> > dynamic range or resolution.
>> >
>> > There are other factors: IEEE754 has potentially much more error
>> > checking, but not all libraries are written to support it, and not
>> > all applications need it.
>>
>> That's interesting, because in my experience fixed-point fractional
>> arithmetic (i.e., 0x7fffffff = 1 - 2^-31, 0x80000001 = -1 + 2^-31),
>> with saturation-on-add, is significantly faster (3x to 10x) than
>> floating point on all the machines I've tried it except for those with
>> floating- point hardware.
>>
>> I have a portable version that works on just about anything that's
>> ANSI-C compatible, and when I really need speed I rewrite the
>> arithmetic routines in assembly for about a 2x increase.
>>
>> The only processor that came close to matching it was the TMS320F2812,
>> where we used the ANSI-C compatible version that was just about matched
>> by the floating-point package that came with the tool set (and I _know_
>> that TI cut corners with that floating point package).  That's the
>> _only_ processor in my experience where the floating point could keep
>> up with the ANSI-C version, and I would expect that had I written an
>> assembly version it would have been faster yet.
> 
> What you saw was what I was expecting. My point in the post was to be
> careful in assuming that fixed is going to be dramatically better. At
> least for 8 bits the variable size in bits is a significant factor when
> all math is multiprecision.
> 
> One of the keys in our metrics was the target was 8 bit processors and
> there was an exchange between precision and dynamic range but the bit
> sizes remained the same.
> 
> Real applications are probably dominated by scaling and precision,
> reducing the number of bits used by fixed point for the same
> application.
> 
> It didn't make sense until I realized that it was 8 bit processors using
> software mults and divides and 32 bit floating point uses for the most
> part 24bit mults and divides and a few adds/subtracts for the exponents.
> 32 bit fixed point uses 32 bit mults/divides adding to the cycle count.
> 
> My experience with 32 bit processors is similar to yours although I
> don't have metrics to back it up.

Ah.  I see your point.  9 multiplies and some shifting during addition 
vs. 16 multiplies might well turn out to be a wash.
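
The 9 vs. 16 count comes from schoolbook multiplication with an 8 x 8
multiply: an n-byte by n-byte product needs n*n partial products, so 3
bytes (24-bit mantissa) gives 9 and 4 bytes (32-bit fixed point) gives
16.  A rough sketch that counts them, assuming little-endian byte
arrays; the function name mul_bytes is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Schoolbook multiply of two n-byte little-endian numbers using only
   8 x 8 -> 16 bit multiplies.  Writes the 2n-byte product to out and
   returns the number of 8 x 8 multiplies performed. */
static unsigned mul_bytes(const uint8_t *a, const uint8_t *b,
                          size_t n, uint8_t *out)
{
    unsigned count = 0;

    for (size_t i = 0; i < 2 * n; i++)
        out[i] = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned carry = 0;
        for (size_t j = 0; j < n; j++) {
            /* 255*255 + 255 + 255 = 65535, so t always fits in 16 bits */
            unsigned t = (unsigned)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint8_t)(t & 0xFFu);
            carry = t >> 8;
            count++;
        }
        out[i + n] = (uint8_t)carry;
    }
    return count;
}
```

Calling it with n = 3 returns 9 and with n = 4 returns 16; the
floating-point case then adds the exponent handling and normalization
on top of its 9 multiplies.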

The first serious control loop I did was quite starved for clock cycles, 
and used a 24-bit accumulator, but with an 8 x 16 (or 8 x 8) multiply, 
and had 16-bit data paths other than that.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com


#7811

From	David Brown <david@westcontrol.removethisbit.com>
Date	2012-03-28 09:00 +0200
Message-ID	<u7OdnXv23q9MJe_SnZ2dnUVZ8i2dnZ2d@lyse.net>
In reply to	#7802

On 27/03/2012 20:56, Walter Banks wrote:
>
>
> "David T. Ashley" wrote:
>
>> On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:
>>
>>> On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
>>> <dashley@gmail.com>  wrote:
>>>
>>>>
>>>> Without FPU support, assuming that the processor has basic integer
>>>> multiplication instructions, integer operations are ALWAYS faster than
>>>> floating-point operations.  Usually _far_ faster.  And always more
>>>> precise.
>>>
>>> Floating point instructions MUL/DIV are trivial, just multiply/divide
>>> the mantissa and add/sub the exponent.
>>>
>>> With FP add/sub you have to denormalize one operand and then normalize
>>> the result, which can be quite time consuming, without sufficient HW
>>> support.
>>>
>>> This can be really time consuming, if the HW is designed by an idiot.
>>
>> Your observations are valid.  But I have yet to see a practical
>> example of something that can be done faster and with equal accuracy
>> in floating point vs. using integer operations.
>>
>> I concur with your observations.  After reading your first paragraph
>> ... yeah, floating-point multiplication is pretty simple so long as
>> the floating point format is sane.
>>
>> Before reading your post, my mental model was that floating-point
>> operations might be 20 times as slow as integer operations.  Now I'm
>> thinking maybe 2-3 times.
>
> I did a fixed point support package for our 8 bit embedded systems
> compilers and one interesting metric came out of the project.
>
> Given a number of bits in a number and similar error checking fixed
> or float took very similar amounts of execution time and code size
> in applications.

That's not a big surprise - with floating point, the actual arithmetic 
is 24-bit, which will be quite a lot faster than 32-bit on a small 8-bit 
machine (especially if it doesn't have enough registers or data pointers).

>
> For example 32 bit float and 32 bit fixed point. They are not exact
> but they are close. In the end much to my surprise the choice is
> dynamic range or resolution.
>
> There are other factors: IEEE754 has potentially much more error
> checking, but not all libraries are written to support it, and not
> all applications need it.
>
>
> Regards,
>
>
> w..
> --
> Walter Banks
> Byte Craft Limited
> http://www.bytecraft.com
>
>
>


#7904

From	j.m.granville@gmail.com
Date	2012-03-30 04:08 -0700
Message-ID	<18231389.1481.1333105718864.JavaMail.geo-discussion-forums@yneo2>
In reply to	#7802

On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> I did a fixed point support package for our 8 bit embedded systems
> compilers and one interesting metric came out of the project.
> 
> Given a number of bits in a number and similar error checking fixed
> or float took very similar amounts of execution time and code size
> in applications.
> 
> For example 32 bit float and 32 bit fixed point. They are not exact
> but they are close. In the end much to my surprise the choice is
> dynamic range or resolution.

 That makes sense for 8 bit cores, but there is another issue besides speed the OP may need to consider and that is granularity.

 We had one application where floating point was more convenient, but gave lower precision than a 32*32:64/32 because the float uses 23+1 bits to store the number. The other bits are exponent, and give dynamic range, but NOT precision.

 With 24b ADCs that may start to matter and certainly with 32 bit ADCs, you would need to watch it very carefully.

 Compiler suppliers for 32 bit cores, really should provide optimised libraries for Gain/Scale type calibrates, that use a 64 bit result in the intermediate steps.
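
 A small sketch of the granularity point, assuming IEEE 754 single precision for float. The function names are made up for illustration; scale_muldiv is the 32*32:64/32 form and scale_float is the float form from the original question.

```c
#include <stdint.h>

/* 32*32 -> 64-bit product, then divide by a 32-bit value.
   Every bit of x survives as long as the result fits in 32 bits.
   Assumes den != 0. */
static int32_t scale_muldiv(int32_t x, int32_t num, int32_t den)
{
    return (int32_t)(((int64_t)x * num) / den);
}

/* Same scaling through single-precision float.  The conversion of x
   to float keeps only 24 significant bits, so larger readings are
   rounded before the gain is even applied. */
static int32_t scale_float(int32_t x, float gain)
{
    return (int32_t)((float)x * gain);
}
```

 With a gain of exactly 1, scale_muldiv(16777217, 1, 1) returns 16777217, while scale_float(16777217, 1.0f) returns 16777216, because 2^24 + 1 has no exact float representation.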


#7911

From	Mark Borgerson <mborgerson@comcast.net>
Date	2012-04-02 22:52 -0700
Message-ID	<MPG.29e42fee860c48669898d7@news.eternal-september.org>
In reply to	#7904

In article <18231389.1481.1333105718864.JavaMail.geo-discussion-
forums@yneo2>, j.m.granville@gmail.com says...
> 
> On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> > I did a fixed point support package for our 8 bit embedded systems
> > compilers and one interesting metric came out of the project.
> > 
> > Given a number of bits in a number and similar error checking fixed
> > or float took very similar amounts of execution time and code size
> > in applications.
> > 
> > For example 32 bit float and 32 bit fixed point. They are not exact
> > but they are close. In the end much to my surprise the choice is
> > dynamic range or resolution.
> 
>  That makes sense for 8 bit cores, but there is another issue besides speed the OP may need to consider and that is granularity.
> 
>  We had one application where floating point was more convenient, but gave lower precision than a 32*32:64/32 because the float uses 23+1 bits to store the number. The other bits are exponent, and give dynamic range, but NOT precision.
> 
>  With 24b ADCs that may start to matter and certainly with 32 bit ADCs, you would need to watch it very carefully.
>
Have you actually found and used a 32-bit ADC?   For an ADC with a 5V 
range, that would mean just a few nanovolts per LSB!!! 
>  Compiler suppliers for 32 bit cores, really should provide optimised libraries for Gain/Scale type calibrates, that use a 64 bit result in the intermediate steps.

My experience is that I'm lucky to get 20 noise-free bits on any system 
actually connected to an MPU (for a single conversion).  Still, that 
would push the limits on FP with only 24 bits in the  mantissa if I were 
to do any significant oversampling.  I remember professors in
chemistry and physics warning me that my final result should carry
error limits corresponding to the precision of my inputs.  
Still, roundoff errors could eventually degrade the result past the 
limits of the input for some calculations. 

The reality of the oceanographic sensors I work with is that 16 bits 
gets you right into the noise level of the real world for most
experiments.

However, if you are doing long-term integrations of variable inputs,
roundoff error could come back to haunt  you.
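
As an illustration of the long-term integration problem, here is a small sketch that sums the same sample many times in a float accumulator and in a 64-bit integer accumulator. The function names are made up for this example, and it assumes IEEE 754 single precision with round-to-nearest.

```c
#include <stdint.h>

/* Sum n copies of the same 16-bit sample in a single-precision accumulator. */
static float integrate_float(int16_t sample, uint32_t n)
{
    float acc = 0.0f;
    for (uint32_t i = 0; i < n; i++)
        acc += (float)sample;    /* each add rounds to 24 significant bits */
    return acc;
}

/* The same sum in a 64-bit integer accumulator, which is exact. */
static int64_t integrate_int64(int16_t sample, uint32_t n)
{
    int64_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += sample;
    return acc;
}
```

Adding 1 twenty million times gives exactly 20000000 in the integer version, but the float version stops growing once it reaches 2^24 = 16777216, because adding 1 there is less than half a unit in the last place plus a tie that rounds back down.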

Mark Borgerson


#7914

From	John Devereux <john@devereux.me.uk>
Date	2012-04-03 11:33 +0100
Message-ID	<87iphhje94.fsf@devereux.me.uk>
In reply to	#7911

Mark Borgerson <mborgerson@comcast.net> writes:

> In article <18231389.1481.1333105718864.JavaMail.geo-discussion-
> forums@yneo2>, j.m.granville@gmail.com says...
>> 
>> On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
>> > I did a fixed point support package for our 8 bit embedded systems
>> > compilers and one interesting metric came out of the project.
>> > 
>> > Given a number of bits in a number and similar error checking fixed
>> > or float took very similar amounts of execution time and code size
>> > in applications.
>> > 
>> > For example 32 bit float and 32 bit fixed point. They are not exact
>> > but they are close. In the end much to my surprise the choice is
>> > dynamic range or resolution.
>> 
>>  That makes sense for 8 bit cores, but there is another issue besides speed the OP may need to consider and that is granularity.
>> 
>>  We had one application where floating point was more convenient, but gave lower precision than a 32*32:64/32 because the float uses 23+1 bits to store the number. The other bits are exponent, and give dynamic range, but NOT precision.
>> 
>>  With 24b ADCs that may start to matter and certainly with 32 bit ADCs, you would need to watch it very carefully.
>>
> Have you actually found and used a 32-bit ADC?   For an ADC with a 5V 
> range, that would mean just a few nanovolts per LSB!!! 

The only actual chip I have heard of is a sigma-delta from TI. Of course
8-10 of these bits are marketing. I would look it up for you but the
flash selection tool is still "initializing" for me on their site...

The best ADC I have seen is a HP 3458A meter, the equivalent of a 28 bit
chip ADC.

It might just be possible to make a 32 bit ADC using a Josephson
junction array, if you have a liquid helium supply handy :)

[...]


-- 

John Devereux


#7915

From	Anders.Montonen@kapsi.spam.stop.fi.invalid
Date	2012-04-03 12:05 +0000
Message-ID	<jlep1j$ljm$1@speranza.aioe.org>
In reply to	#7914

John Devereux <john@devereux.me.uk> wrote:

> The only actual chip I have heard of is a sigma-delta from TI. Of course
> 8-10 of these bits are marketing. I would look it up for you but the
> flash selection tool is still "initializing" for me on their site...

Off-topic, but as far as I can tell TI are not using Flash in any of 
their selection tools, only HTML5. Unfortunately their backend sometimes 
glitches out, usually when you need to look up one of their components.

Anyway, their ADS1281/1282 advertise a 31 bit resolution. The ADS1282-HT 
high-temperature variant is even available in DIP packaging for the low, 
low price of $218.75 ea.

-a


#7919

From	John Devereux <john@devereux.me.uk>
Date	2012-04-03 16:34 +0100
Message-ID	<87sjgkj0bs.fsf@devereux.me.uk>
In reply to	#7915

Anders.Montonen@kapsi.spam.stop.fi.invalid writes:

> John Devereux <john@devereux.me.uk> wrote:
>
>> The only actual chip I have heard of is a sigma-delta from TI. Of course
>> 8-10 of these bits are marketing. I would look it up for you but the
>> flash selection tool is still "initializing" for me on their site...
>
> Off-topic, but as far as I can tell TI are not using Flash in any of 
> their selection tools, only HTML5. Unfortunately their backend sometimes 
> glitches out, usually when you need to look up one of their
> components.

Oh really? Good for them. I apologise to TI, I admit I was using quite
an old browser. 

In fact it seems to work very well in a slightly more modern one. It is
one of the few such manufacturer "selection tools" that uses the whole
width of the browser window. Most are crippled to uselessness by some
stupid marketeer's desire to exactly control appearance.

> Anyway, their ADS1281/1282 advertise a 31 bit resolution. The ADS1282-HT 
> high-temperature variant is even available in DIP packaging for the low, 
> low price of $218.75 ea.
>
> -a

-- 

John Devereux


#7928

From	Paul <paul@pcserviceselectronics.co.uk>
Date	2012-04-04 09:35 +0100
Message-ID	<MPG.29e6182bf6c8f617989713@172.16.0.1>
In reply to	#7919

In article <87sjgkj0bs.fsf@devereux.me.uk>, john@devereux.me.uk says...
> 
> Anders.Montonen@kapsi.spam.stop.fi.invalid writes:
> 
> > John Devereux <john@devereux.me.uk> wrote:
> >
> >> The only actual chip I have heard of is a sigma-delta from TI. Of course
> >> 8-10 of these bits are marketing. I would look it up for you but the
> >> flash selection tool is still "initializing" for me on their site...
> >
> > Off-topic, but as far as I can tell TI are not using Flash in any of 
> > their selection tools, only HTML5. Unfortunately their backend sometimes 
> > glitches out, usually when you need to look up one of their
> > components.
> 
> Oh really? Good for them. I apologise to TI, I admit I was using quite
> an old browser. 
> 
> In fact it seems to work very well in a slightly more modern one. It is
> one of the few such manufacturer "selection tools" that uses the whole
> width of the browser window. Most are crippled to uselessness by some
> stupid marketeer's desire to exactly control appearance.

Because the marketeer or developer believes everyone has the same system 
and screen size as they do. Then it looks right when printed out on a piece 
of paper and handed to the board to look at. Don't even get me started on 
fonts specified in pixels :)

-- 
Paul Carpenter          | paul@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/>    PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/>  GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate


#7924

From	Tim Wescott <tim@seemywebsite.com>
Date	2012-04-03 13:52 -0500
Message-ID	<p72dndn_Y4-U2ubSnZ2dnUVZ_r6dnZ2d@web-ster.com>
In reply to	#7904

On Fri, 30 Mar 2012 04:08:38 -0700, j.m.granville wrote:

> On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
>> I did a fixed point support package for our 8 bit embedded systems
>> compilers and one interesting metric came out of the project.
>> 
>> Given a number of bits in a number and similar error checking fixed or
>> float took very similar amounts of execution time and code size in
>> applications.
>> 
>> For example 32 bit float and 32 bit fixed point. They are not exact but
>> they are close. In the end much to my surprise the choice is dynamic
>> range or resolution.
> 
>  That makes sense for 8 bit cores, but there is another issue besides
>  speed the OP may need to consider and that is granularity.
> 
>  We had one application where floating point was more convenient, but
>  gave lower precision than a 32*32:64/32 because the float uses 23+1
>  bits to store the number. The other bits are exponent, and give dynamic
>  range, but NOT precision.
> 
>  With 24b ADCs that may start to matter and certainly with 32 bit ADCs,
>  you would need to watch it very carefully.

If you do any filtering at all, the 25 bits of precision often matter 
with a _16_ bit ADC, when they aren't a show-stopper altogether.  It 
wouldn't be sensible to even _think_ about filtering the output of a 24-
bit ADC with single-precision floating point data paths unless the ADC 
had been exceedingly poorly chosen or applied, and had essentially 
useless content in the last several bits.
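
To make the filtering point concrete, here is a sketch of a first-order low-pass, y += a * (x - y), with state held once in single-precision float and once in 64-bit fixed point. The coefficient a = 2^-12 and the Q16 state format are assumptions chosen for illustration, not values from the thread.

```c
#include <stdint.h>

#define LP_DIV 4096   /* filter coefficient a = 1/4096 (assumed for illustration) */

/* One filter step with the state held in single-precision float. */
static float lp_float(float y, int16_t x)
{
    return y + (1.0f / LP_DIV) * ((float)x - y);
}

/* One filter step with the state held in Q16 (16 fractional bits)
 * inside a 64-bit integer. */
static int64_t lp_fixed(int64_t y_q16, int16_t x)
{
    int64_t err = (int64_t)x * 65536 - y_q16;   /* error in Q16 */
    return y_q16 + err / LP_DIV;                /* y += a * err */
}
```

With the state at 30000 and the input stepping to 30002, the float update of about 0.0005 is smaller than half the float spacing near 30000 (about 0.001), so the float state never moves; the fixed-point state advances by 32/65536 per step and keeps converging. That is a dead band of a few LSBs on a 16-bit signal.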

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com


#7936

From	Mark Borgerson <mborgerson@comcast.net>
Date	2012-04-04 16:50 -0700
Message-ID	<MPG.29e577bf82329c999898d9@news.eternal-september.org>
In reply to	#7924

In article <p72dndn_Y4-U2ubSnZ2dnUVZ_r6dnZ2d@web-ster.com>, 
tim@seemywebsite.com says...
> 
> On Fri, 30 Mar 2012 04:08:38 -0700, j.m.granville wrote:
> 
> > On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> >> I did a fixed point support package for our 8 bit embedded systems
> >> compilers and one interesting metric came out of the project.
> >> 
> >> Given a number of bits in a number and similar error checking fixed or
> >> float took very similar amounts of execution time and code size in
> >> applications.
> >> 
> >> For example 32 bit float and 32 bit fixed point. They are not exact but
> >> they are close. In the end much to my surprise the choice is dynamic
> >> range or resolution.
> > 
> >  That makes sense for 8 bit cores, but there is another issue besides
> >  speed the OP may need to consider and that is granularity.
> > 
> >  We had one application where floating point was more convenient, but
> >  gave lower precision than a 32*32:64/32 because the float uses 23+1
> >  bits to store the number. The other bits are exponent, and give dynamic
> >  range, but NOT precision.
> > 
> >  With 24b ADCs that may start to matter and certainly with 32 bit ADCs,
> >  you would need to watch it very carefully.
> 
> If you do any filtering at all, the 25 bits of precision often matter 
> with a _16_ bit ADC, when they aren't a show-stopper altogether.  It 
> wouldn't be sensible to even _think_ about filtering the output of a 24-
> bit ADC with single-precision floating point data paths unless the ADC 
> had been exceedingly poorly chosen or applied, and had essentially 
> useless content in the last several bits.

I agree with your point  about filtering with 16-bit ADCs.  I generally 
implement FIRs with about 20 taps, which is easily done
with a 16 x 16 -> 32-bit MAC.  There's no real advantage to floating
point there, and with 16-bit data inputs, dynamic range is not
a problem.
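
A sketch of what such a 20-tap FIR might look like in C, using Q15 coefficients and a 32-bit accumulator. The name `fir_q15` and the Q15 scaling are assumptions for illustration; the sketch relies on the sum of the absolute coefficient values being at most 32767 so that neither the accumulator nor the result can overflow.

```c
#include <stdint.h>

#define FIR_TAPS 20

/* Hypothetical FIR kernel: Q15 coefficients, 16-bit samples,
 * 32-bit accumulator. Assumes sum(|coeff[i]|) <= 32767, so the
 * accumulator stays within 32 bits and the result within 16 bits. */
static int16_t fir_q15(const int16_t coeff[FIR_TAPS],
                       const int16_t hist[FIR_TAPS])
{
    int32_t acc = 0;
    for (int i = 0; i < FIR_TAPS; i++)
        acc += (int32_t)coeff[i] * hist[i];   /* 16 x 16 -> 32 MAC */
    return (int16_t)(acc / 32768);            /* Q15 product back to sample scale */
}
```

On a core with a single-cycle multiply-accumulate the loop body usually maps to one instruction per tap, which is why floating point buys little here.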

I've usually found that getting the full 24 bits from a 24-bit ADC is 
next to impossible.  The CS5534 that I've used comes with a table that
lists the effective number of bits vs cycle time.  IIRC, you need to go to
7-1/2 conversions per second to get over 20 bits.  At  30 or 60 
conversions per second, you're down in the 18 bits range.  However, the
built-in 60Hz rejection is quite helpful for some applications.

Floating point does have its uses though, where dynamic range is high
and some of the numbers start out very large, as in chemistry 
calculations where you may start with constants like 6.02214x10^23.
32-bit floating point may not be suitable for exactly counting the
hydrogen ions in a beaker of analyte, but it can give you reasonable
results within the limits of chemical sensors you might use
(such as a pH meter with a 4-digit display).

Mark Borgerson


#7939

From	John Devereux <john@devereux.me.uk>
Date	2012-04-05 11:48 +0100
Message-ID	<87ty0y4fpz.fsf@devereux.me.uk>
In reply to	#7936

Mark Borgerson <mborgerson@comcast.net> writes:

> In article <p72dndn_Y4-U2ubSnZ2dnUVZ_r6dnZ2d@web-ster.com>, 
> tim@seemywebsite.com says...
>> 
>> On Fri, 30 Mar 2012 04:08:38 -0700, j.m.granville wrote:
>> 
>> > On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
>> >> I did a fixed point support package for our 8 bit embedded systems
>> >> compilers and one interesting metric came out of the project.
>> >> 
>> >> Given a number of bits in a number and similar error checking fixed or
>> >> float took very similar amounts of execution time and code size in
>> >> applications.
>> >> 
>> >> For example 32 bit float and 32 bit fixed point. They are not exact but
>> >> they are close. In the end much to my surprise the choice is dynamic
>> >> range or resolution.
>> > 
>> >  That makes sense for 8 bit cores, but there is another issue besides
>> >  speed the OP may need to consider and that is granularity.
>> > 
>> >  We had one application where floating point was more convenient, but
>> >  gave lower precision than a 32*32:64/32 because the float uses 23+1
>> >  bits to store the number. The other bits are exponent, and give dynamic
>> >  range, but NOT precision.
>> > 
>> >  With 24b ADCs that may start to matter and certainly with 32 bit ADCs,
>> >  you would need to watch it very carefully.
>> 
>> If you do any filtering at all, the 25 bits of precision often matter 
>> with a _16_ bit ADC, when they aren't a show-stopper altogether.  It 
>> wouldn't be sensible to even _think_ about filtering the output of a 24-
>> bit ADC with single-precision floating point data paths unless the ADC 
>> had been exceedingly poorly chosen or applied, and had essentially 
>> useless content in the last several bits.
>
> I agree with your point  about filtering with 16-bit ADCs.  I generally 
> implement FIRs with about 20 taps, which is easily done
> with a 16 x 16 -> 32-bit MAC.  There's no real advantage to floating
> point there, and with 16-bit data inputs, dynamic range is not
> a problem.
>
> I've usually found that getting the full 24 bits from a 24-bit ADC is 
> next to impossible.  The CS5534 that I've used comes with a table that
> lists the effective number of bits vs cycle time.  IIRC, you need to go to
> 7-1/2 conversions per second to get over 20 bits.  At  30 or 60 
> conversions per second, you're down in the 18 bits range.  However, the
> built-in 60Hz rejection is quite helpful for some applications.
>
> Floating point does have its uses though, where dynamic range is high
> and some of the numbers start out very large, as in chemistry 
> calculations where you may start with constants like 6.02214x10^23.
> 32-bit floating point may not be suitable for exactly counting the
> hydrogen ions in a beaker of analyte, but it can give you reasonable
> results within the limits of chemical sensors you might use
> (such as a pH meter with a 4-digit display).

I find it can be nice for generating the final "result" when a
complicated formula is involved. Or even if it is not that complicated
but there is some horrible mixture of units involved: convert everything
to floating point SI units and just do the calculation, instead of
carefully scaling everything and checking for loss of precision and
overflows at every sub-step.
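
A small sketch of the "convert to SI, then just calculate" approach: turning a gauge pressure in psi into a water depth in metres. The function name and the nominal seawater density are assumptions for illustration only.

```c
/* Constants used for this illustration. */
#define PA_PER_PSI    6894.757f   /* pascals per psi */
#define G_STD         9.80665f    /* standard gravity, m/s^2 */
#define RHO_SEAWATER  1025.0f     /* kg/m^3, nominal value (assumed) */

/* Hypothetical final-result calculation: gauge pressure in psi
 * to depth in metres, via h = P / (rho * g). */
static float depth_m_from_psi(float gauge_psi)
{
    float pa = gauge_psi * PA_PER_PSI;     /* psi -> Pa (SI) */
    return pa / (RHO_SEAWATER * G_STD);    /* metres */
}
```

There is no scale factor to track and no intermediate overflow to check; the price is that the answer carries only about seven significant digits, which is usually fine for a displayed result.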



-- 

John Devereux


#7813

From	David Brown <david@westcontrol.removethisbit.com>
Date	2012-03-28 09:17 +0200
Message-ID	<P6idndrAmPpHIe_SnZ2dnUVZ7vOdnZ2d@lyse.net>
In reply to	#7800

On 27/03/2012 19:02, David T. Ashley wrote:
> On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:
>
>> On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
>> <dashley@gmail.com>  wrote:
>>
>>>
>>> Without FPU support, assuming that the processor has basic integer
>>> multiplication instructions, integer operations are ALWAYS faster than
>>> floating-point operations.  Usually _far_ faster.  And always more
>>> precise.
>>
>> Floating point instructions MUL/DIV are trivial, just multiply/divide
>> the mantissa and add/sub the exponent.
>>
>> With FP add/sub you have to denormalize one operand and then normalize
>> the result, which can be quite time consuming, without sufficient HW
>> support.
>>
>> This can be really time consuming, if the HW is designed by an idiot.
>
> Your observations are valid.  But I have yet to see a practical
> example of something that can be done faster and with equal accuracy
> in floating point vs. using integer operations.
>

It depends on the chip, the type of floating point hardware it has, the 
operations you need, the compiler, and the code quality.  For a lot of 
heavy calculations done with integer arithmetic, you need a number of 
"extra" instructions as well as the basic add, subtract, multiply and 
divides.  You might need shifts for scaling, mask operations, extra code 
to get the signs right, etc.  And the paths for these are likely to be 
highly serialised, with each depending directly on the results of the 
previous operation, which slows down pipelining.  With hardware floating 
point, you have a much simpler instruction stream, which can result in 
faster throughput even if the actual latency for the calculations is the 
same.
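
To show the kind of "extra" instructions meant here, this sketch puts a Q16.16 fixed-point multiply next to the equivalent float multiply. The Q16.16 format and the function names are assumptions chosen for the example.

```c
#include <stdint.h>

/* Q16.16 multiply: widen, multiply, round, rescale, saturate.
 * Each step depends on the result of the previous one. */
static int32_t q16_mul(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;           /* Q32.32 product */
    p += (p >= 0) ? 0x8000 : -0x8000;     /* round half away from zero */
    p /= 65536;                           /* rescale to Q16.16 */
    if (p > INT32_MAX) return INT32_MAX;  /* saturate on overflow */
    if (p < INT32_MIN) return INT32_MIN;
    return (int32_t)p;
}

/* With hardware floating point the same operation is one multiply. */
static float f_mul(float a, float b)
{
    return a * b;
}
```

Whether the longer dependent chain actually costs more than the FPU multiply depends on the core, so this is a picture of the instruction stream rather than a performance claim.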

This effect increases with the size and complexity of the processor. 
Obviously it is dependent on the processor having floating point 
hardware for the precision needed (single or double), but once you have 
any sort of hardware floating point you should re-check all your 
assumptions about speed differences.  You could be wrong in either 
direction.

> I concur with your observations.  After reading your first paragraph
> ... yeah, floating-point multiplication is pretty simple so long as
> the floating point format is sane.
>
> Before reading your post, my mental model was that floating-point
> operations might be 20 times as slow as integer operations.  Now I'm
> thinking maybe 2-3 times.
>
> DTA.


#7815

From	Tim Wescott <tim@seemywebsite.com>
Date	2012-03-28 12:20 -0500
Message-ID	<--adnfOVp7zu1e7SnZ2dnUVZ_rWdnZ2d@web-ster.com>
In reply to	#7813

On Wed, 28 Mar 2012 09:17:14 +0200, David Brown wrote:

> On 27/03/2012 19:02, David T. Ashley wrote:
>> On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:
>>
>>> On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
>>> <dashley@gmail.com>  wrote:
>>>
>>>
>>>> Without FPU support, assuming that the processor has basic integer
>>>> multiplication instructions, integer operations are ALWAYS faster
>>>> than floating-point operations.  Usually _far_ faster.  And always
>>>> more precise.
>>>
>>> Floating point instructions MUL/DIV are trivial, just multiply/divide
>>> the mantissa and add/sub the exponent.
>>>
>>> With FP add/sub you have to denormalize one operand and then normalize
>>> the result, which can be quite time consuming, without sufficient HW
>>> support.
>>>
>>> This can be really time consuming, if the HW is designed by an idiot.
>>
>> Your observations are valid.  But I have yet to see a practical example
>> of something that can be done faster and with equal accuracy in
>> floating point vs. using integer operations.
>>
>>
> It depends on the chip, the type of floating point hardware it has, the
> operations you need, the compiler, and the code quality.  For a lot of
> heavy calculations done with integer arithmetic, you need a number of
> "extra" instructions as well as the basic add, subtract, multiply and
> divides.  You might need shifts for scaling, mask operations, extra code
> to get the signs right, etc.  And the paths for these are likely to be
> highly serialised, with each depending directly on the results of the
> previous operation, which slows down pipelining.  With hardware floating
> point, you have a much simpler instruction stream, which can result in
> faster throughput even if the actual latency for the calculations is the
> same.
> 
> This effect increases with the size and complexity of the processor.
> Obviously it is dependent on the processor having floating point
> hardware for the precision needed (single or double), but once you have
> any sort of hardware floating point you should re-check all your
> assumptions about speed differences.  You could be wrong in either
> direction.

The key point is "it is dependent on the processor having floating point 
hardware for the precision needed".  And, I might add, on other things -- 
see Walter Banks's comments in another sub-thread about 32-bit floating 
point vs. 32-bit integer math.  

In my experience with signal processing and control loops, having a 
library that implements fixed-point, fractional arithmetic with 
saturation on addition and shift-up is often faster than floating point 
_or_ "pure" integer math, and sidesteps a number of problems with both.  
It's at the cost of a learning curve with anyone using the package, but 
it works well.
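
A portable-C sketch of what such a fractional package might contain: Q31 values in the range [-1, 1) with saturation on addition and on shift-up. This is not the poster's library; the type and function names are hypothetical, and a production version would normally use the target's saturating instructions or hand-written assembly.

```c
#include <stdint.h>

typedef int32_t q31_t;   /* fractional value in [-1, 1), 31 fractional bits */

/* Clamp a 64-bit intermediate to the Q31 range. */
static q31_t q31_sat(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (q31_t)v;
}

/* Saturating addition. */
static q31_t q31_add_sat(q31_t a, q31_t b)
{
    return q31_sat((int64_t)a + b);
}

/* Saturating shift-up by n bits, n in 0..31. */
static q31_t q31_shl_sat(q31_t a, unsigned n)
{
    return q31_sat((int64_t)a * ((int64_t)1 << n));
}

/* Fractional multiply: Q31 x Q31 -> Q31 (truncating). */
static q31_t q31_mul(q31_t a, q31_t b)
{
    return q31_sat(((int64_t)a * b) / ((int64_t)1 << 31));
}
```

Saturation means an overflowing control signal clips at full scale instead of wrapping to the opposite sign, which is one of the problems with "pure" integer math that this style avoids.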

On all the processors I've tried it on, except for x86 processors, there's 
been a 3-20x speedup once I've hand-written the assembly code to do the 
computation (and that's without understanding or trying to accommodate 
any pipelines that may exist).

But on the x86 -- which is the _only_ processor that I've tried it on that 
had floating point -- 32-bit fractional arithmetic is slower than 64-bit 
floating point.

So, yes -- whether integer (or fixed point) arithmetic is going to be 
faster than floating point depends _a lot_ on the processor.  So instead 
of automatically deciding to do everything "the hard way" and feeling 
clever and virtuous thereby, you should _benchmark_ the performance of a 
code sample with floating point vs. whatever fixed-point poison you 
choose.
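
A minimal benchmarking harness along these lines, assuming a hosted C library with a working `clock()`; on a bare-metal target a hardware timer or cycle counter would take its place. The two workloads (the same one-pole filter in float and in Q16.16) and all names are made up for illustration.

```c
#include <stdint.h>
#include <time.h>

#define N_ITER 1000000u

/* Volatile sinks stop the compiler from discarding the loops. */
static volatile float   sink_f;
static volatile int32_t sink_i;

/* Candidate 1: y = 0.5*y + 0.25*x in single-precision float. */
static void work_float(void)
{
    float y = 0.0f;
    for (uint32_t i = 0; i < N_ITER; i++)
        y = 0.5f * y + 0.25f * (float)(i & 0xFF);
    sink_f = y;
}

/* Candidate 2: the same recurrence in Q16.16 fixed point. */
static void work_fixed(void)
{
    int32_t y = 0;
    for (uint32_t i = 0; i < N_ITER; i++)
        y = y / 2 + (int32_t)((i & 0xFF) << 16) / 4;
    sink_i = y;
}

/* Return the CPU time, in seconds, consumed by one call of fn. */
static double seconds_for(void (*fn)(void))
{
    clock_t t0 = clock();
    fn();
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling `seconds_for(work_float)` and `seconds_for(work_fixed)` on the actual target, with the actual compiler flags, gives the comparison; it is also worth checking that both candidates produce the same answer to within the accuracy needed.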

Then, even if fixed point is significantly faster, you should look at the 
time consumed by floating point and ask if it's really necessary to save 
that time: even cheapo 8-bit processors run pretty fast these days, and 
can implement fairly complex control laws at 10 or even 100Hz using 
double-precision floating point arithmetic.  If floating point will do, 
fixed point is a waste of effort.  And if floating point is _faster_, 
fixed point is just plain stupid.

So, benchmark, think, make an informed decision, and then that virtuous 
glow that surrounds you after you make your decision will be earned.

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com


#7820

From	Andrew Reilly <areilly---@bigpond.net.au>
Date	2012-03-28 22:44 +0000
Message-ID	<9thiigFdvbU2@mid.individual.net>
In reply to	#7815

On Wed, 28 Mar 2012 12:20:51 -0500, Tim Wescott wrote:

> On Wed, 28 Mar 2012 09:17:14 +0200, David Brown wrote:
> 
>> On 27/03/2012 19:02, David T. Ashley wrote:
>>> On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:
>>>
>>>> On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
>>>> <dashley@gmail.com>  wrote:
>>>>
>>>>
>>>>> Without FPU support, assuming that the processor has basic integer
>>>>> multiplication instructions, integer operations are ALWAYS faster
>>>>> than floating-point operations.  Usually _far_ faster.  And always
>>>>> more precise.
>>>>
>>>> Floating point instructions MUL/DIV are trivial, just multiply/divide
>>>> the mantissa and add/sub the exponent.
>>>>
>>>> With FP add/sub you have to denormalize one operand and then
>>>> normalize the result, which can be quite time consuming, without
>>>> sufficient HW support.
>>>>
>>>> This can be really time consuming, if the HW is designed by an idiot.
>>>
>>> Your observations are valid.  But I have yet to see a practical
>>> example of something that can be done faster and with equal accuracy
>>> in floating point vs. using integer operations.
>>>
>>>
>> It depends on the chip, the type of floating point hardware it has, the
>> operations you need, the compiler, and the code quality.  For a lot of
>> heavy calculations done with integer arithmetic, you need a number of
>> "extra" instructions as well as the basic add, subtract, multiply and
>> divides.  You might need shifts for scaling, mask operations, extra
>> code to get the signs right, etc.  And the paths for these are likely
>> to be highly serialised, with each depending directly on the results of
>> the previous operation, which slows down pipelining.  With hardware
>> floating point, you have a much simpler instruction stream, which can
>> result in faster throughput even if the actual latency for the
>> calculations is the same.
>> 
>> This effect increases with the size and complexity of the processor.
>> Obviously it is dependent on the processor having floating point
>> hardware for the precision needed (single or double), but once you have
>> any sort of hardware floating point you should re-check all your
>> assumptions about speed differences.  You could be wrong in either
>> direction.
> 
> The key point is "it is dependent on the processor having floating point
> hardware for the precision needed".  And, I might add, on other things
> --
> see Walter Banks's comments in another sub-thread about 32-bit floating
> point vs. 32-bit integer math.
> 
> In my experience with signal processing and control loops, having a
> library that implements fixed-point, fractional arithmetic with
> saturation on addition and shift-up is often faster than floating point
> _or_ "pure" integer math, and sidesteps a number of problems with both.
> It's at the cost of a learning curve with anyone using the package, but
> it works well.
> 
> On all the processors I've tried it on, except for x86 processors, there's
> been a 3-20x speedup once I've hand-written the assembly code to do the
> computation (and that's without understanding or trying to accommodate
> any pipelines that may exist).

Weren't you the one that said that your (tuned) ARM C code was generally 
only a factor of 1.2 worse than the best hand-tweaked assembly code?  
Maybe not, but I've seen it said in these parts.  Certainly, my 
experience is that that is quite a good rule of thumb, and it is very 
difficult to get more than a factor of two between assembler and C unless 
the platform in question has a very poor C compiler or the assembly code 
is actually implementing a different algorithm (which is sometimes 
possible, but much rarer in these days of well-supplied intrinsic 
function libraries.)

> But on the x86 -- which is the _only_ processor that I've tried it on that
> had floating point -- 32-bit fractional arithmetic is slower than 64-bit
> floating point.

One thing that gives float a particular edge on the x86(32) (but which 
can also apply to other processors) is that using floating point means 
that you don't have to use the precious integer register set for data: it 
can be used for pointers, counters and other control paraphernalia, leaving 
the working "data state" in the FPU registers.  Modern SIMD units can do 
integer operations as well as floating point, so the "extra state" 
argument might seem weaker, but I've never seen a compiler use SIMD 
registers for integer calculations (unless forced to with intrinsic 
functions).

> So, yes -- whether integer (or fixed point) arithmetic is going to be
> faster than floating point depends _a lot_ on the processor.  So instead
> of automatically deciding to do everything "the hard way" and feeling
> clever and virtuous thereby, you should _benchmark_ the performance of a
> code sample with floating point vs. whatever fixed-point poison you
> choose.

Fast isn't always the only consideration, though.  Floating point is 
*always* going to be more power-hungry than fixed point, simply because 
it is doing a bunch of extra work at run-time that fixed-point forces you 
to hoist to compile-time.

The advice to benchmark is excellent, of course.  Particularly because 
the results won't necessarily be what you expect.

Cheers,

-- 
Andrew
