Path: csiph.com!xmission!usenet.csail.mit.edu!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: David Brown
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Fri, 6 Jan 2023 16:12:25 +0100
Organization: A noiseless patient Spider
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-01-017@comp.compilers>
References: <23-01-009@comp.compilers> <23-01-011@comp.compilers> <23-01-012@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="30902"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, semantics
Posted-Date: 06 Jan 2023 12:03:27 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <23-01-012@comp.compilers>
Content-Language: en-GB
Xref: csiph.com comp.compilers:3286

On 06/01/2023 01:22, gah4 wrote:
> On Thursday, January 5, 2023 at 10:13:08 AM UTC-8, Spiros Bousbouras wrote:
>> On 5 Jan 2023 10:05:49 +0000
>> "Lucian Popescu" wrote:
>
>>> I'm currently working on an academic project that analyzes the speedup gain of
>>> Undefined Behavior Optimizations in C.
> (snip)
>
>>> To test the theory that the UB Optimizations introduce more risks than
>>> speedup gains,
>
>> Isn't this comparing apples and oranges ?
>
> Probably.
>
> You can quantify speed-up, but it is harder to quantify risk.
>
> You might be able to quantify debug time, and how much longer
> it takes to debug a program with such behavior.
>
> Most important when debugging, is that you can trust the compiler to
> do what you said. That they don't, has always been part of
> optimization, but these UB make it worse.

The trouble with undefined behaviour is that, in general, you cannot trust the compiler to "do what you say" because it cannot know what you have said.
A computer language like C is defined by its standard. This says what particular combinations of characters in the source code actually mean. If what you write does not fit the specified and documented patterns (or the pattern is explicitly labelled "undefined behaviour"), then it does not mean anything at all.

So if you write "give me some prime numbers" as your C code, it means nothing and the compiler can't help you. If you write "int * p = 0; int x = *p;", it means nothing and the compiler can't help you. (Well, the compiler might be able to give helpful error messages!)

When you write "x = *p;", you are saying to the compiler "It is a fact that p is a valid pointer to data of a type compatible with *p, all the constraints required for the assignment operation are met, there is no partial overlap between x and *p, there are no data races, and there are no range errors converting any floating point values. Given that, act as though the value of x is now equal to *p after any required conversions." You might /think/ you are saying "read the value at address p, and store it in the memory reserved for x".

When you write "x = (y * 30) / 15;" (for "int" x and y), you might /think/ you are asking the compiler to multiply by 30 and then divide by 15. But you are actually telling it that there would be no overflow if it were to multiply y by 30, and thus it can use simple mathematical equalities to reduce the expression to "x = y * 2;".

You can /always/ trust the compiler to do what you said, barring occasional bugs in the compiler. What you cannot always do is trust the programmer to know what he or she /actually/ said, or to write what he or she meant.

And outside of flags that change the language semantics, such as gcc's -fwrapv or -fno-strict-aliasing, enabling or disabling optimisations does not change the meaning of the code.
It might affect how code is generated (and therefore its speed and size), but if the behaviour is different then it is because your code did not say what you thought it said.

And yes, I know debugging optimised code can be difficult. Sometimes that means adding extra "volatile" qualifiers, or "asm volatile("" ::: "memory");" fences, in order to have good breakpoint spots or debugging information.
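
To make the "(y * 30) / 15" point above concrete, here is a minimal sketch in C. The function names (scale_as_written, scale_simplified, scale_wrapping, wrap_to_i32) are hypothetical, invented for this illustration, and the use of int32_t in place of plain "int" is an assumption to keep the numbers fixed. The sketch spells out the "no overflow" promise as an assert, and uses only well-defined arithmetic to emulate what wrapping semantics (the kind of thing the post associates with -fwrapv) would give instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reduce a 64-bit value modulo 2^32 into int32_t
   range, using only conversions whose results the C standard defines. */
static int32_t wrap_to_i32(int64_t v)
{
    uint32_t u = (uint32_t)v;            /* conversion is modulo 2^32 */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;               /* value fits, no change */
    return (int32_t)((int64_t)u - 4294967296LL);  /* map to negative range */
}

/* The expression as written in the post. The assert states the promise
   the programmer is implicitly making: y * 30 does not overflow. */
int32_t scale_as_written(int32_t y)
{
    assert(y >= INT32_MIN / 30 && y <= INT32_MAX / 30);
    return (y * 30) / 15;
}

/* What a compiler may reduce the expression to, given that promise.
   (This is itself only defined while y * 2 does not overflow.) */
int32_t scale_simplified(int32_t y)
{
    return y * 2;
}

/* Emulation of "multiply with wrap-around, then divide", computed in a
   wider type so that the example itself has no undefined behaviour. */
int32_t scale_wrapping(int32_t y)
{
    return wrap_to_i32((int64_t)y * 30) / 15;
}
```

For small inputs such as 7, all three functions return 14. For y = 100000000 the promise is broken: scale_simplified gives 200000000, scale_wrapping gives -86331153, and scale_as_written must not be called at all (the assert fires unless NDEBUG is defined). Neither result is "what the programmer said", because for that input the original expression said nothing.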