From: Andrew Reilly <areilly---@bigpond.net.au>
Newsgroups: comp.arch.embedded
Subject: Re: Floating point vs fixed arithmetics (signed 64-bit)
Date: 28 Mar 2012 22:44:32 GMT
Message-ID: <9thiigFdvbU2@mid.individual.net>
References: <f492a14e-767b-4ed1-b889-653f2c8dc12d@pd5g2000pbc.googlegroups.com> <jdm3n7pd6aavn3cfbg5ugvl6fmbh738sdv@4ax.com> <u0o3n75ffv4gqng8qh16fdap2ugevln58q@4ax.com> <qcs3n7ponjd09bq27qgb648fnmovco9mn1@4ax.com> <P6idndrAmPpHIe_SnZ2dnUVZ7vOdnZ2d@lyse.net> <--adnfOVp7zu1e7SnZ2dnUVZ_rWdnZ2d@web-ster.com>

On Wed, 28 Mar 2012 12:20:51 -0500, Tim Wescott wrote:

> On Wed, 28 Mar 2012 09:17:14 +0200, David Brown wrote:
> 
>> On 27/03/2012 19:02, David T. Ashley wrote:
>>> On Tue, 27 Mar 2012 18:52:09 +0300, upsidedown@downunder.com wrote:
>>>
>>>> On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
>>>> <dashley@gmail.com>  wrote:
>>>>
>>>>
>>>>> Without FPU support, assuming that the processor has basic integer
>>>>> multiplication instructions, integer operations are ALWAYS faster
>>>>> than floating-point operations.  Usually _far_ faster.  And always
>>>>> more precise.
>>>>
>>>> Floating point instructions MUL/DIV are trivial, just multiply/divide
>>>> the mantissa and add/sub the exponent.
>>>>
>>>> With FP add/sub you have to denormalize one operand and then
>>>> normalize the result, which can be quite time consuming, without
>>>> sufficient HW support.
>>>>
>>>> This can be really time consuming, if the HW is designed by an idiot.
>>>
>>> Your observations are valid.  But I have yet to see a practical
>>> example of something that can be done faster and with equal accuracy
>>> in floating point vs. using integer operations.
>>>
>>>
>> It depends on the chip, the type of floating point hardware it has, the
>> operations you need, the compiler, and the code quality.  For a lot of
>> heavy calculations done with integer arithmetic, you need a number of
>> "extra" instructions as well as the basic add, subtract, multiply and
>> divides.  You might need shifts for scaling, mask operations, extra
>> code to get the signs right, etc.  And the paths for these are likely
>> to be highly serialised, with each depending directly on the results of
>> the previous operation, which slows down pipelining.  With hardware
>> floating point, you have a much simpler instruction stream, which can
>> result in faster throughput even if the actual latency for the
>> calculations is the same.
>> 
>> This effect increases with the size and complexity of the processor.
>> Obviously it is dependent on the processor having floating point
>> hardware for the precision needed (single or double), but once you have
>> any sort of hardware floating point you should re-check all your
>> assumptions about speed differences.  You could be wrong in either
>> direction.
> 
> The key point is "it is dependent on the processor having floating point
> hardware for the precision needed".  And, I might add, on other things --
> see Walter Banks's comments in another sub-thread about 32-bit floating
> point vs. 32-bit integer math.
> 
> In my experience with signal processing and control loops, having a
> library that implements fixed-point, fractional arithmetic with
> saturation on addition and shift-up is often faster than floating point
> _or_ "pure" integer math, and sidesteps a number of problems with both.
> It's at the cost of a learning curve with anyone using the package, but
> it works well.
> 
> On all the processors I've tried it except for x86 processors, there's
> been a 3-20x speedup once I've hand-written the assembly code to do the
> computation (and that's without understanding or trying to accommodate
> any pipelines that may exist).

Weren't you the one that said that your (tuned) ARM C code was generally 
only a factor of 1.2 worse than the best hand-tweaked assembly code?  
Maybe not, but I've seen it said in these parts.  Certainly, my 
experience is that that is quite a good rule of thumb, and it is very 
difficult to get more than a factor of two between assembler and C unless 
the platform in question has a very poor C compiler or the assembly code 
is actually implementing a different algorithm (which is sometimes 
possible, but much rarer in these days of well-supplied intrinsic 
function libraries.)

> But on the x86 -- which is the _only_ processor I've tried it on that
> had floating point -- 32-bit fractional arithmetic is slower than 64-bit
> floating point.

One thing that gives float a particular edge on the x86(32) (but which 
can also apply to other processors) is that using floating point means 
that you don't have to use the precious integer register set for data: it 
can be used for pointers, counters and other control paraphernalia, leaving 
the working "data state" in the FPU registers.  Modern SIMD units can do 
integer operations as well as floating point, so the "extra state" 
argument might seem weaker, but I've never seen a compiler use SIMD 
registers for integer calculations (unless forced to with intrinsic 
functions).

> So, yes -- whether integer (or fixed point) arithmetic is going to be
> faster than floating point depends _a lot_ on the processor.  So instead
> of automatically deciding to do everything "the hard way" and feeling
> clever and virtuous thereby, you should _benchmark_ the performance of a
> code sample with floating point vs. whatever fixed-point poison you
> choose.

Fast isn't always the only consideration, though.  Floating point is 
*always* going to be more power-hungry than fixed point, simply because 
it is doing a bunch of extra work at run-time that fixed-point forces you 
to hoist to compile-time.
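
To make the compile-time/run-time distinction concrete, here is a minimal
sketch of saturating fractional arithmetic in C.  The Q15 format and the
names q15_add_sat and q15_mul_sat are my own choices for illustration, not
anything from the library Tim describes, and the code assumes that right
shift of a negative signed value is arithmetic (true of every compiler I
know of, but implementation-defined in the C standard).

```c
#include <stdint.h>

/* Q15: a 16-bit signed integer x stands for the value x / 32768.
   The binary point is fixed when the code is written, so there is
   no exponent to align or normalise at run time. */

/* Saturating Q15 addition (hypothetical helper name). */
static int16_t q15_add_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* cannot overflow 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Saturating Q15 multiplication with rounding (hypothetical name).
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t q15_mul_sat(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;  /* Q30 intermediate */
    prod = (prod + (1 << 14)) >> 15;         /* round, rescale to Q15 */
    if (prod > INT16_MAX) return INT16_MAX;  /* only -1.0 * -1.0 hits this */
    return (int16_t)prod;
}
```

The shift count of 15 is a constant chosen by the programmer; a floating
point unit (or soft-float library) has to work out the equivalent
alignment from the exponents on every operation.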

The advice to benchmark is excellent, of course.  Particularly because 
the results won't necessarily be what you expect.
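
For what it's worth, a rough sketch of the sort of comparison I have in
mind follows.  Everything here is an assumption for illustration: the
dot-product kernel, the Q15 format, the function names, and the use of
clock() as the timer.  On a real target you would substitute your own
inner loop and a cycle counter or hardware timer.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Double-precision dot product: the floating-point candidate. */
static double dot_f64(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Q15 dot product: the fixed-point candidate.  Each product is Q30;
   the 64-bit accumulator gives ample headroom, so the sum is returned
   in Q30 and the caller rescales once at the end. */
static int64_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

/* CPU seconds spent in `reps` calls of each kernel.  The volatile sink
   stops the compiler from discarding the result outright; an aggressive
   optimiser may still hoist the loop-invariant call, so inspect the
   generated code before trusting the numbers. */
static double time_dot_f64(const double *a, const double *b,
                           size_t n, int reps)
{
    volatile double sink = 0.0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink = dot_f64(a, b, n);
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

static double time_dot_q15(const int16_t *a, const int16_t *b,
                           size_t n, int reps)
{
    volatile int64_t sink = 0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink = dot_q15(a, b, n);
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Feed both timing functions the same data (converted to each format), use
enough repetitions to swamp the timer resolution, and run it on the
actual target with the actual compiler flags.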

Cheers,

-- 
Andrew