Path: csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.compilers
Subject: Re: Optimization techniques
Date: Thu, 25 Apr 2019 21:58:06 +0200
Organization: Compilers Central
Lines: 220
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <19-04-021@comp.compilers>
References: <72d208c9-169f-155c-5e73-9ca74f78e390@gkc.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="40547"; mail-complaints-to="abuse@iecc.com"
Keywords: arithmetic, optimize
Posted-Date: 25 Apr 2019 16:17:16 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-GB
Xref: csiph.com comp.compilers:2205

On 25/04/2019 17:36, Martin Ward wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> And there are undefined behaviours which could be given definitions, but
>> doing so is not actually a positive thing.  The prime example is signed
>> integer overflow.  Since overflowing your integers is almost always an
>> error in the code anyway, there are no benefits in giving it defined
>> behaviour.
>
> This is completely backwards. If signed overflow was given a defined
> behaviour (such as the two's complement result), then compilers for
> CPUs which do not implement two's complement operations
> would have to generate less efficient code (but does anyone still
> make such a CPU?).

True - but as you say, such cpus are not realistic.

> Any program which guarantees not to overflow would be unaffected.

Incorrect.  Its correctness would be unaffected, but efficiency can be
quite a bit different.

The compiler knows all sorts of things when signed integer arithmetic
does not overflow, and can use this for optimisation.

It knows that "x * 2 / 2" is "x".  It knows that "x + 1 > x" is true.
It knows that "for (i = 0; i <= n; i++)" will always end, after n + 1
iterations - after checking for n being <= 0 at the start of the loop.
It knows that adding two non-negative numbers gives a non-negative
number.  It knows that abs(x) is not negative.  It knows that the
distributive and commutative laws of mathematical integers hold for C
signed integer types.  It knows that it can simplify multiplications and
divisions using normal mathematics.

These are significant.  With modern compilers doing all sorts of
inlining, function cloning, cross-module optimisation (link-time
optimisation), and with modern C++ code bases using templates, there is
a great deal of scope for this kind of manipulation and optimisation.
People don't write "x * 2 / 2" directly, but after inlining, constant
propagation, constants, macros, etc., these sorts of things turn up.
And knowing that the compiler will sort of the details to get the best
code here, means programmers can concentrate on writing their source in
the clearest and most maintainable manner.


The other big thing about making overflow undefined is that the compiler
and tools can help check for mistakes.  Compilers can spot cases where
there are overflows with compile-time constants, and tools like
sanitizers can add run-time checks to help debug code.  So when your
program has counted up 2147483647 apples, and you add another one to the
cart, your debugging tools can tell you you have made a mistake.  If you
are using wrapping integers, the tools have to believe you meant that
adding another apple gives you -2147483648.  Your code has a bug, but
the tools can't help you out.


It is a serious mistake to mix up "defined behaviour" and "correct
behaviour".  Only defined behaviour can be correct, but you can't fix
incorrect code by making it defined behaviour.


> Any program which expects a two's complement
> result could now be relied upon to work correctly, and not suddenly
> produce random results when the next version of the compuler comes out!

If your code relies on two's complement wrapping integers, then you need
to use a language or a compiler that guarantees that behaviour.
Standard C doesn't give you that - if your code relies on it, your code
is not correct standard C code.  Why should people who know the language
correctly have to pay a price for people who don't know what they are doing?

But in order to be as useful as possible to as many people as possible,
compilers such as gcc and clang offer command-line switches to help
here.  If you want to program in a language that is mostly like C except
that signed integer overflow is defined with wrapping semantics, then
these compilers support that language via the "-fwrapv" switch.  (Other
compilers may have similar switches - I only mention the ones I know.)

However, if you write your code in not-quite-C, you can't really expect
it to work on a standard C compiler.


It is certainly true that "two's complement wrapping" behaviour will not
turn a correct C program into an incorrect one, as any defined behaviour
satisfies "undefined behaviour" for correctness.  But it will reduce the
efficiency of the correct C code, and it will reduce the compiler and
tools' ability to help find bugs.  And that is a cost I don't want to pay.


> Any program which expected a different result (one's complement anyone?)
> would need additional code.

Yes, if there are any such systems still in use (I have only heard of one).

>
> With the current situation, anyone wanting to avoid
> undefined behaviour (and don't we all?) has to write code like
> this for any signed operation:
>
> signed int sum;
> if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
>      ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
>    /* Handle error */
> } else {
>    sum = si_a + si_b;
> }

No, they don't.

When I want to add two integers, I usually write "a + b".  But then, I
make a point of knowing what my code does, what my values are, what
their ranges are, and I don't write code that might not make sense.
(Barring bugs, of course - I am no more perfect than any other
programmer.  That's why I want to give my tools the best chance they
have at helping spot problems.)


It is rare that you do arithmetic on data without any idea of what that
data might be or mean - and thus that you have a solid idea of what
ranges are valid.  And when you are working with general functions that
handle raw values without meaning, you punt the problem to the client -
it is perfectly acceptable to write a function "sum" that is defined to
give the correct sum of two ints as long as the result is in the range
of an int - and thus the function has undefined behaviour outside that
range.  That's just normal function specification, preconditions and
design by contract - something we all do in all coding, even though not
everyone knows it by name.


Finally, if you have an unusual case where you need to spot and handle
overflow errors, there are many ways to do it.  In general it is best to
either be sure that the calculation is valid (such as by using larger
types), or to check ranges in advance, or (for code where efficiency is
more important than portability) use compiler features as necessary to
get the results you need.  What you /don't/ do is say "my compiler
should be using two's complement arithmetic on my two's complement cpu.
I don't really care if the result is meaningless - it's defined, and
that's good enough for me".


> See:
>
> https://wiki.sei.cmu.edu/confluence/display/c/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow
>
>
>>  But mathematical identities such as associativity and commutativity are
>> valid because signed integer overflow does not happen - thus "a * (b +
>> c)" can be changed to "(a * b) + (a * c)".
>
> The following slide set discusses some of the problems with undefined
> behaviour starting on slide 43:
>
> http://www0.cs.ucl.ac.uk/staff/B.Karp/3007/s2018/lectures/3007-lecture5-C-arith-undef-behav.pdf
>
>
> Examples of problems include privilege elevation exploits
> and denial of service attacks.

These are just program bugs, because programmers didn't do the right
thing.  It happens - few programmers or programs are perfect, totally
regardless of the language and how particular operations are defined.
Some of these examples are obvious, and should never have passed code
review - they demonstrate failures in the development methodology, not
issues with undefined behaviour.  You check first, then use the data -
not the other way round.  You don't drive your car straight out into a
junction, then check if you hit something and back up - you check first,
then drive when it is safe.  You don't dereference a pointer, and then
check if it is valid - you check first.

C is a language for engineers - for people who know what they are doing,
and want top efficiency.  If you want a language that holds your hand
and checks everything, giving you polite warnings and exceptions when
there is a problem, then another language would suit you better.  (This
is not a condemnation of other languages - I also write a fair bit of
Python, precisely because it manages lots of little things
automatically.)  C is designed on the assumption that people know the
language - and C compilers assume that.

Many of the potential problems discussed in that paper can be identified
at compile time by the compiler.  That does require that you use a good
compiler (or good static error checker), and that you know how to use
your tools - I don't have much patience for those that refuse to learn
their tools.

I am all in favour of ideas and methods that reduce the risk of bugs in
code.  But making signed integer overflow defined as two's complement
would be a step backwards, not a step forwards.  The same goes for
limiting optimisations that assume code is correct.  Progress will be
made through better education and more awareness of the potential
problems (such as the earlier slides in your link, about the dangers of
mixing signed and unsigned types), through better tools, and through
better defaults for tools.  Compilers like gcc can spot a good deal of
the problems discussed - but only if you enable the right warning flags.
  I'd prefer lots of warning flags enabled by default, and provide a "I
write very weird code - trust me and turn off the warnings" flag for
those that don't like them.


>
> There was a discussion about undefined behaviour on this list
> in March/April last year. Some examples:
>
> https://blog.regehr.org/archives/213
> https://blog.regehr.org/archives/759
>

I have read a number of John Regehr's articles.  He seems to be fond of
publishing unhelpful and condescending criticism.

> Gcc may optimize out tests for buffer overflows
> because of integer overflows:
>
> https://lwn.net/Articles/278137/
>

gcc has removed code that doesn't do anything useful.  That is a /good/
thing - and something programmers rely on.  Especially with C++
templates, removing dead code is an absolutely essential optimisation.

The trick is to write the tests correctly so that they make sense and do
what you want them to do.