Path: csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: David Brown Newsgroups: comp.compilers Subject: Re: Optimization techniques Date: Thu, 25 Apr 2019 21:58:06 +0200 Organization: Compilers Central Lines: 220 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <19-04-021@comp.compilers> References: <72d208c9-169f-155c-5e73-9ca74f78e390@gkc.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="40547"; mail-complaints-to="abuse@iecc.com" Keywords: arithmetic, optimize Posted-Date: 25 Apr 2019 16:17:16 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Content-Language: en-GB Xref: csiph.com comp.compilers:2205 On 25/04/2019 17:36, Martin Ward wrote: > David Brown wrote: >> And there are undefined behaviours which could be given definitions, but >> doing so is not actually a positive thing.  The prime example is signed >> integer overflow.  Since overflowing your integers is almost always an >> error in the code anyway, there are no benefits in giving it defined >> behaviour. > > This is completely backwards. If signed overflow was given a defined > behaviour (such as the two's complement result), then compilers for > CPUs which do not implement two's complement operations > would have to generate less efficient code (but does anyone still > make such a CPU?). True - but as you say, such cpus are not realistic. > Any program which guarantees not to overflow would be unaffected. Incorrect. Its correctness would be unaffected, but efficiency can be quite a bit different. The compiler knows all sorts of things when signed integer arithmetic does not overflow, and can use this for optimisation. It knows that "x * 2 / 2" is "x". It knows that "x + 1 > x" is true. It knows that "for (i = 0; i <= n; i++)" will always end, after n + 1 iterations - after checking for n being <= 0 at the start of the loop. It knows that adding two non-negative numbers gives a non-negative number. It knows that abs(x) is not negative. It knows that the distributive and commutative laws of mathematical integers hold for C signed integer types. It knows that it can simplify multiplications and divisions using normal mathematics. These are significant. With modern compilers doing all sorts of inlining, function cloning, cross-module optimisation (link-time optimisation), and with modern C++ code bases using templates, there is a great deal of scope for this kind of manipulation and optimisation. People don't write "x * 2 / 2" directly, but after inlining, constant propagation, constants, macros, etc., these sorts of things turn up. And knowing that the compiler will sort of the details to get the best code here, means programmers can concentrate on writing their source in the clearest and most maintainable manner. The other big thing about making overflow undefined is that the compiler and tools can help check for mistakes. Compilers can spot cases where there are overflows with compile-time constants, and tools like sanitizers can add run-time checks to help debug code. So when your program has counted up 2147483647 apples, and you add another one to the cart, your debugging tools can tell you you have made a mistake. If you are using wrapping integers, the tools have to believe you meant that adding another apple gives you -2147483648. Your code has a bug, but the tools can't help you out. It is a serious mistake to mix up "defined behaviour" and "correct behaviour". Only defined behaviour can be correct, but you can't fix incorrect code by making it defined behaviour. > Any program which expects a two's complement > result could now be relied upon to work correctly, and not suddenly > produce random results when the next version of the compuler comes out! If your code relies on two's complement wrapping integers, then you need to use a language or a compiler that guarantees that behaviour. Standard C doesn't give you that - if your code relies on it, your code is not correct standard C code. Why should people who know the language correctly have to pay a price for people who don't know what they are doing? But in order to be as useful as possible to as many people as possible, compilers such as gcc and clang offer command-line switches to help here. If you want to program in a language that is mostly like C except that signed integer overflow is defined with wrapping semantics, then these compilers support that language via the "-fwrapv" switch. (Other compilers may have similar switches - I only mention the ones I know.) However, if you write your code in not-quite-C, you can't really expect it to work on a standard C compiler. It is certainly true that "two's complement wrapping" behaviour will not turn a correct C program into an incorrect one, as any defined behaviour satisfies "undefined behaviour" for correctness. But it will reduce the efficiency of the correct C code, and it will reduce the compiler and tools' ability to help find bugs. And that is a cost I don't want to pay. > Any program which expected a different result (one's complement anyone?) > would need additional code. Yes, if there are any such systems still in use (I have only heard of one). > > With the current situation, anyone wanting to avoid > undefined behaviour (and don't we all?) has to write code like > this for any signed operation: > > signed int sum; > if (((si_b > 0) && (si_a > (INT_MAX - si_b))) || >     ((si_b < 0) && (si_a < (INT_MIN - si_b)))) { >   /* Handle error */ > } else { >   sum = si_a + si_b; > } No, they don't. When I want to add two integers, I usually write "a + b". But then, I make a point of knowing what my code does, what my values are, what their ranges are, and I don't write code that might not make sense. (Barring bugs, of course - I am no more perfect than any other programmer. That's why I want to give my tools the best chance they have at helping spot problems.) It is rare that you do arithmetic on data without any idea of what that data might be or mean - and thus that you have a solid idea of what ranges are valid. And when you are working with general functions that handle raw values without meaning, you punt the problem to the client - it is perfectly acceptable to write a function "sum" that is defined to give the correct sum of two ints as long as the result is in the range of an int - and thus the function has undefined behaviour outside that range. That's just normal function specification, preconditions and design by contract - something we all do in all coding, even though not everyone knows it by name. Finally, if you have an unusual case where you need to spot and handle overflow errors, there are many ways to do it. In general it is best to either be sure that the calculation is valid (such as by using larger types), or to check ranges in advance, or (for code where efficiency is more important than portability) use compiler features as necessary to get the results you need. What you /don't/ do is say "my compiler should be using two's complement arithmetic on my two's complement cpu. I don't really care if the result is meaningless - it's defined, and that's good enough for me". > See: > > https://wiki.sei.cmu.edu/confluence/display/c/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow > > >>  But mathematical identities such as associativity and commutativity are >> valid because signed integer overflow does not happen - thus "a * (b + >> c)" can be changed to "(a * b) + (a * c)". > > The following slide set discusses some of the problems with undefined > behaviour starting on slide 43: > > http://www0.cs.ucl.ac.uk/staff/B.Karp/3007/s2018/lectures/3007-lecture5-C-arith-undef-behav.pdf > > > Examples of problems include privilege elevation exploits > and denial of service attacks. These are just program bugs, because programmers didn't do the right thing. It happens - few programmers or programs are perfect, totally regardless of the language and how particular operations are defined. Some of these examples are obvious, and should never have passed code review - they demonstrate failures in the development methodology, not issues with undefined behaviour. You check first, then use the data - not the other way round. You don't drive your car straight out into a junction, then check if you hit something and back up - you check first, then drive when it is safe. You don't dereference a pointer, and then check if it is valid - you check first. C is a language for engineers - for people who know what they are doing, and want top efficiency. If you want a language that holds your hand and checks everything, giving you polite warnings and exceptions when there is a problem, then another language would suit you better. (This is not a condemnation of other languages - I also write a fair bit of Python, precisely because it manages lots of little things automatically.) C is designed on the assumption that people know the language - and C compilers assume that. Many of the potential problems discussed in that paper can be identified at compile time by the compiler. That does require that you use a good compiler (or good static error checker), and that you know how to use your tools - I don't have much patience for those that refuse to learn their tools. I am all in favour of ideas and methods that reduce the risk of bugs in code. But making signed integer overflow defined as two's complement would be a step backwards, not a step forwards. The same goes for limiting optimisations that assume code is correct. Progress will be made through better education and more awareness of the potential problems (such as the earlier slides in your link, about the dangers of mixing signed and unsigned types), through better tools, and through better defaults for tools. Compilers like gcc can spot a good deal of the problems discussed - but only if you enable the right warning flags. I'd prefer lots of warning flags enabled by default, and provide a "I write very weird code - trust me and turn off the warnings" flag for those that don't like them. > > There was a discussion about undefined behaviour on this list > in March/April last year. Some examples: > > https://blog.regehr.org/archives/213 > https://blog.regehr.org/archives/759 > I have read a number of John Regehr's articles. He seems to be fond of publishing unhelpful and condescending criticism. > Gcc may optimize out tests for buffer overflows > because of integer overflows: > > https://lwn.net/Articles/278137/ > gcc has removed code that doesn't do anything useful. That is a /good/ thing - and something programmers rely on. Especially with C++ templates, removing dead code is an absolutely essential optimisation. The trick is to write the tests correctly so that they make sense and do what you want them to do.