From: Spiros Bousbouras
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Wed, 18 Jan 2023 13:14:35 -0000 (UTC)
Message-ID: <23-01-062@comp.compilers>

On Wed, 11 Jan 2023 14:20:49 +0100 David Brown wrote:

> C was designed from day one to be a high-level language, not an
> assembler of any sort. Limitations of weaker earlier compilers does
> not mean the language was supposed to work that way.

For those who want an abstract or portable assembler, there exists
c9x.me/compile/ . I've never used it, but at least it aims to be that,
unlike C. I would be curious to know of other analogous projects. I
guess the "register transfer language" of GCC is somewhat analogous.

> I first used a C compiler that optimised on the assumption that UB
> didn't happen some 25 years ago. (In particular, it assumed signed
> integer arithmetic never overflowed.)

I have encountered the claim several times that compilers assume that
UB does not happen, and I don't understand it. Let's consider 2
examples:

    x + 1 > x

in C, where x is a signed integer.
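Written as C, the first example might look like the following minimal sketch (the function names are hypothetical). For every x other than INT_MAX the two functions agree. At INT_MAX the addition in the first one overflows, which is undefined behaviour, so an optimizing compiler may compile it as if it simply returned true; the second gives a defined answer for every int because it never performs the overflowing addition.

```c
#include <limits.h>
#include <stdbool.h>

/* The expression from the post. When x == INT_MAX, x + 1 overflows,
   which is undefined behaviour, so a compiler may translate the whole
   function as "return true". */
bool plus_one_greater(int x)
{
    return x + 1 > x;
}

/* Same question, but defined for every int: no addition is performed,
   so nothing can overflow. */
bool plus_one_greater_checked(int x)
{
    return x < INT_MAX;
}
```

Only the second function should be relied on to return false for INT_MAX.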
Compilers will often treat x + 1 > x as always true, with the following
reasoning:

- if x does not have the maximum value which fits in its type, then the
  meaning of the C expression is the same as its mathematical meaning,
  so the expression evaluates to true.
- if x has the maximum value which fits in its type, then evaluating
  x + 1 is undefined behaviour, so any translation (including treating
  the whole expression as true) is valid.

There's no assumption that UB (undefined behaviour) will not happen;
both possibilities are accounted for.

Another example is

    ... *some_pointer_object ...

    [ some_pointer_object does not get modified in this part of the code
      and has not been declared as volatile ]

    if (some_pointer_object == NULL) ...

If some_pointer_object is not NULL, then the test can be omitted; if it
is NULL, then the earlier dereference is UB, so any translation is
valid, including omitting the test. Again, there's no assumption that
UB will not happen.

So the request that C compilers should stop assuming that UB will not
happen seems to me completely misguided. I think what is really meant
is that, in reasoning about what a valid translation is, C compilers
(or the authors of the compilers) should not employ the notion of UB.
But then how should UB be translated? Again, there is the assumption or
claim that there is some intuitively obvious translation and compilers
should go for that. First, I'm not sure that such a common intuition
exists even among humans; second, even if it does, how does one go from
an intuition to an algorithm that C compilers can use for translation?
Lots of things are intuitively obvious, but creating an algorithm to
duplicate human intuition is a hard problem, one which has not been
solved in many cases and is perhaps even unsolvable in some cases.

I've seen the suggestion that compilers should describe their behaviour
in terms of the assembly generated (possibly some kind of abstract
assembly) as opposed to higher-level terms.
I'm not sure if this is possible and, even if it is, I would not find it
useful. I tend to think of what I want my code to do in higher-level
terms and then bring it down to the level of the language through
successive refinements. If parts of C were described in assembly terms,
that would potentially force me to do at least one more refinement step
with no benefit.

A more productive avenue is for people to give definitions, as precise
as possible, for the kinds of UB which have caused them problems, and
then try to convince compiler writers to implement such extensions if
they don't do so already.

In this area, even compiler documentation could perhaps improve. For
example, from the GCC man page:

    -fdelete-null-pointer-checks
        Use global dataflow analysis to identify and eliminate useless
        checks for null pointers. The compiler assumes that
        dereferencing a null pointer would have halted the program. If
        a pointer is checked after it has already been dereferenced, it
        cannot be null. In some environments, this assumption is not
        true, and programs can safely dereference null pointers. Use
        -fno-delete-null-pointer-checks to disable this optimization for
        programs which depend on that behavior.

The above still doesn't tell me what is supposed to happen when a NULL
pointer is dereferenced, even with the -fno-delete-null-pointer-checks
flag. I'm guessing it's impossible to give a general definition. One
can on specific systems but not in general, so perhaps the above
description does the best possible.

Another example:

    -fstrict-overflow
        Allow the compiler to assume strict signed overflow rules,
        depending on the language being compiled. For C (and C++) this
        means that overflow when doing arithmetic with signed numbers is
        undefined, which means that the compiler may assume that it will
        not happen.

This is poor phrasing; in particular, the part "which means that the
compiler may assume that it will not happen" is redundant.
There is no reason for the compiler to assume anything about which
execution paths will happen at runtime in order to conclude, for
example, that x + 1 > x can be translated as true. The above quote
gives unnecessarily circuitous reasoning as to why the expression can
be translated as true; I give more direct reasoning above.

> It annoys /me/ intensely that people complain about this sort of thing,
> and yet apparently haven't bothered to read the compiler manuals to see
> how to get the effects they want. Compile with "-fno-strict-aliasing",
> or (better, IMHO) add this to your code:
>
> #pragma GCC optimize ("-fno-strict-aliasing")
>
> Now, if you want to complain that the gcc documentation is not great,

Yeah, it would be good if there were a more precise specification of
what additional guarantees beyond the C standard -fno-strict-aliasing
gives. For translating other languages into C, this seems to be
important for implementing object allocation and garbage collection,
since relying on the native malloc() and related functions is generally
not adequate, at least not if your garbage collector is allowed to move
objects.
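To illustrate the kind of code -fno-strict-aliasing is about, here is a minimal sketch (the function name is hypothetical, and it assumes float is a 32-bit IEEE 754 type, which holds on mainstream platforms but is not required by the C standard). Reading a float's bits through a uint32_t pointer breaks C's effective-type ("strict aliasing") rules; GCC enables -fstrict-aliasing at -O2 and above, so code written that way is what the flag, or the pragma quoted above, is meant to keep working. Copying the bytes with memcpy gets the same result and is well-defined without any flag.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (IEEE 754 binary32 on common targets). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* The aliasing-dependent version would be
       uint32_t u = *(uint32_t *)&f;
   which accesses a float object through a uint32_t lvalue, violating the
   effective-type rules (C11 6.5p7); under -fstrict-aliasing the compiler
   may assume a float and a uint32_t never occupy the same storage.

   Well-defined alternative: memcpy copies the object representation. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under the stated IEEE 754 assumption, float_bits(1.0f) yields 0x3F800000.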