From: David Brown
Newsgroups: comp.compilers
Subject: Re: Undefined behaviour, was: for or against equality
Date: Fri, 7 Jan 2022 15:56:22 +0100
Message-ID: <22-01-029@comp.compilers>
Keywords: standards, semantics

On 07/01/2022 15:02, Martin Ward wrote:
> On 06/01/2022 08:11, David Brown wrote:
>> The trick is to memorize the /defined/ behaviours, and stick to them.
>
> Isn't the set of defined behaviours bigger than the set
> of undefined behaviours? How do you know what is defined
> if you don't know what is undefined?

You know what is "defined" because you can find the definition for it -
everything else is undefined. You could enumerate all defined behaviours
for a language - after all, the documentation (language standards,
compiler manual, library documentation, etc.) is finite. It doesn't
really make sense to try to find how many undefined behaviours there
are - it's like asking how many things there are that are not apples.

Language standards tell you the defined behaviour for a language.
Anything that is not there is undefined - that's simply what the word
"undefined" means.

Note that there are many other things besides language standards that
define behaviour of code in practice - compilers or interpreters can add
their own definitions to things that are not defined by the language
standards, as can additional standards such as POSIX.

If you write a function "foo" - perhaps written in the same language
(such as C), perhaps in a completely different language - then its
behaviour is not defined by the language standards. It is not mentioned
anywhere in those documents, so it is undefined. (That is different from
functions whose behaviour is specified in the standard, such as
"memcpy".)

Undefined behaviour, as far as language standards are concerned, is
omnipresent in programming - for all languages. The problem only comes
when you attempt to execute something that does not have its behaviour
defined /anywhere/. Then it is incorrect code - a bug.

When I learned to program (i.e., during my university education rather
than from books, magazines and trial and error before that), we were
very clear about how a function is specified. You have a pre-condition
and a post-condition. The function can assume the pre-condition is
logically "true", and it will guarantee that the post-condition is true
at the exit. (Typically you also have an "invariant" that is a clause in
both parts, but that is just for convenience.)

If the function is called when the pre-condition is false, the function
has no obligation to do anything - it can give an error, launch nasal
daemons, give the answer it thinks the programmer hoped for, or anything
else. The behaviour is undefined.

This concept has existed since the dawn of programming:

"""
On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into
the machine wrong figures, will the right answers come out?' I am not
able rightly to apprehend the kind of confusion of ideas that could
provoke such a question.
Charles Babbage
"""

The C standards contain a fair number of explicit undefined behaviours.
They do that for convenience and clarity, and often to encourage
compiler developers towards greater efficiency rather than run-time
checks, and to encourage programmers towards not assuming particular
behaviours even if one compiler happens to define the behaviour. So a
compiler writer knows that they can assume "a + b" never overflows (for
integer arithmetic), and a programmer knows that they can't assume
signed arithmetic is wrapping even if the compiler they are using at the
time /guarantees/ wrapping behaviour. (I have never seen a C compiler
that guarantees this without explicit flags.)

C is a language that expects the programmer to take responsibility for
his or her code, and ensure that it is correct. Fortunately, good
compiler developers know this is difficult and provide tools to help
people find their bugs. Thus you have a language that can give efficient
results, /and/ provide good debugging and run-time checking, as long as
you get good tools and understand how to use them.

> For example, a = b + c is precisely defined in C and C++ for
> floating point variables, but the result can be "undefined behaviour"
> for ordinary 32 bit signed integer values.

Actually, it is not precisely defined for floating point operations - if
there is an "exceptional condition" during the evaluation (the result is
not mathematically defined or not in the range of representable values
for its type), the behaviour is undefined. That applies to all
expressions - integer and floating point.

Now, it is very common (but certainly not universal) for C
implementations to use IEEE floating point formats and rules. These
provide the "mathematical definitions" for floating point operations,
including handling of calculations outside the normal ranges. But if you
are not using these, such calculations could result in undefined
behaviour.
(For example, if you use "gcc -ffast-math", the compiler will assume
that all expressions are normal finite numbers - that's perfectly valid
for C, and can be very much more efficient on a lot of targets.)

Signed integer overflow is undefined behaviour on most compilers (the
size is not necessarily 32-bit). The only one I know that defines the
behaviour is gcc (and compatibles, such as clang and icc) with the
"-fwrapv" flag enabled.

And of course that makes perfect sense. It is logical to assume that if
you add two positive numbers, you get a positive number - it is
illogical to suppose that sometimes the "correct" answer will be
negative. Some programming languages (such as Java) specifically define
signed integer arithmetic to be wrapping - the result is that sometimes
you get the wrong answer in Java, while in C you would get undefined
behaviour. Wrong answers are less helpful - leaving the behaviour
undefined means you get more efficient code and that you can use
debugging tools (such as gcc's -fsanitize=undefined) to help find the
errors in your code.

> If you want to stick to defined behaviours then you need
> to add extra code. For example, CERT recommends:
>
>   if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
>       ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
>     /* Handle error */
>   } else {
>     sum = si_a + si_b;
>   }
>

That is /not/ code to "stick to defined behaviours". It is code to
identify problems and perhaps find some way to handle them (depending on
what the "handle error" code is).

You can "stick to defined behaviour" much more simply:

    int sum = (unsigned int) si_a + (unsigned int) si_b;

The behaviour is fully defined, and the result will be wrong if there is
an overflow - just like when you use a language that has fully defined
signed integer arithmetic by wrapping.

The answer here is /not/ to worry about what happens when your
expressions overflow and you get undefined behaviour.
The answer is to think about the code you are writing, and make sure
that the types and expressions you write are appropriate for the values
you have. Check your values for validity when you get them in (from
files, user input, etc.), then write code that is correct for the full
range of values. Simple. (Well, as simple as any programming!)
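
To make the contrast concrete, here is a minimal sketch of the two
approaches discussed above. It is only an illustration under stated
assumptions: the names checked_add, valid_percent and add_percent are
made up for this example, and the 0..100 range is an arbitrary stand-in
for whatever range is valid for your data. The first function wraps the
CERT-style test in a helper that detects overflow at every addition. The
other two validate the values once at the boundary, so the addition has
a pre-condition under which it cannot overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the CERT-style test quoted above, wrapped in a
   function. Returns true and stores a + b in *sum when the sum fits in
   an int; returns false (leaving *sum untouched) when it would overflow. */
static bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *sum = a + b;
    return true;
}

/* Hypothetical input check, applied once where the data comes in.
   The 0..100 range is an assumption made for this example. */
static bool valid_percent(int p)
{
    return p >= 0 && p <= 100;
}

/* Pre-condition: both arguments satisfy valid_percent().
   Post-condition: the result lies in 0..200, so a + b cannot overflow
   and no per-operation check is needed. The assert documents the
   pre-condition and catches violations in debug builds. */
static int add_percent(int a, int b)
{
    assert(valid_percent(a) && valid_percent(b));
    return a + b;
}
```

With the second style, a caller would reject bad values as they arrive
(from a file, user input, etc.) using valid_percent(), and everything
downstream can use plain arithmetic. Calling add_percent() with values
outside the range breaks its pre-condition, and that is a bug in the
caller rather than something the function has to handle.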