From: David Brown
Newsgroups: comp.compilers
Subject: Re: Undefined Behavior Optimizations in C
Date: Wed, 11 Jan 2023 14:20:49 +0100
Organization: A noiseless patient Spider
Message-ID: <23-01-041@comp.compilers>
References: <23-01-027@comp.compilers> <23-01-031@comp.compilers>
Keywords: C, standards

On 10/01/2023 11:46, Jon Chesterfield wrote:
>> So before we decide if UB optimizations are actually allowed by the
>> standard we need to decide what "ignoring the situation completely
>> with unpredictable results" actually means.
>>
>> [1] https://port70.net/~nsz/c/c89/rationale/
>>
>> Lucian
>
> WG14 are aware of UB optimising compilers and could have steered away from
> this path, but haven't. It's been decades now. The pointer provenance work
> seeks to apply aliasing rules even more aggressively. GCC and clang are
> both pursuing faster codegen via exploiting undefined behaviour.
>
> C, the WG14 ISO defined language, as implemented by the primary open source
> toolchains, is thus unfit for my purposes. I'm not clear what use that
> language has.

It seems to be very popular, so many people find it fit for their purposes.
(I certainly find it, along with C++, a good fit for my low-level small-systems embedded programming, and I am quite happy with "UB optimisations" as you call them.)

But some people don't like it, which is fair enough. And certainly no one thinks either the language or the tools are perfect.

Some people want a language that is mostly like C, except for certain features - and accessing objects in memory using different pointer types is a common request. This is why both gcc and clang (and a few other compilers) have a flag that gives you this behaviour: "-fno-strict-aliasing". I always find it ironic that the compilers that some people complain "don't do what I want" or "don't do what old compilers did" are precisely the compilers that give you these options.

>
> C, the typed assembler of ye olde times, is a profoundly useful language.

It's a myth. It never existed. There has simply been a steady improvement in the optimisation of correct code as compilers have got more sophisticated. There are compilers that document and define behaviour for certain things that are undefined behaviour in the C standards, but I have never heard of a compiler that claims to understand the programmers' intentions even when they write incorrect code.

C was designed from day one to be a high-level language, not an assembler of any sort. The limitations of weaker, earlier compilers do not mean the language was supposed to work that way.

I first used a C compiler that optimised on the assumption that UB didn't happen some 25 years ago. (In particular, it assumed signed integer arithmetic never overflowed.)

> One just can't use GCC or clang to build it reliably.

You mean newer tools treat your code bugs in different ways from older tools? There's a solution for that.
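
To make the signed-overflow point above concrete, here is a minimal sketch. The function names are mine, purely for illustration. The first function only "works" if signed arithmetic wraps, which the C standard does not promise; the other two ask the same question without ever overflowing. (If you really want wrapping signed arithmetic, gcc and clang both offer "-fwrapv".)

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical example: relies on x + 1 wrapping to a negative value
   when x == INT_MAX.  Signed overflow is undefined behaviour, so an
   optimising compiler may assume it never happens and reduce this
   function to "return 1". */
int next_is_larger(int x)
{
    return x + 1 > x;
}

/* Well-defined alternative: decide whether x + 1 would overflow
   without performing the addition. */
int increment_would_overflow(int x)
{
    return x == INT_MAX;
}

/* Well-defined check for a general addition a + b.  The comparisons
   are arranged so that no intermediate result can overflow. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return 0;
}
```

Calling next_is_larger(INT_MAX) is the bug; the compiler is not obliged to guess what answer you hoped for.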
> It annoys me intensely that the type aliasing rules capture something a
> whole program optimising compiler can usually work out for itself anyway,
> while preventing me from reading 128bit integers from the same memory I
> fetch_add 32bit integers into.
>

It annoys /me/ intensely that people complain about this sort of thing, and yet apparently haven't bothered to read the compiler manuals to see how to get the effects they want.

Compile with "-fno-strict-aliasing", or (better, IMHO) add this to your code:

    #pragma GCC optimize ("-fno-strict-aliasing")

Now, if you want to complain that the gcc documentation is not great, or that flags like this should be documented along with the standards flags rather than the optimisation flags, I'll happily agree. (I don't know if clang does better here.) But don't complain that the compiler is a problem.

And there are other ways to handle this in gcc. Use "may_alias" types. Or use volatile accesses. Or use memcpy(). Or use unions.

There are two /real/ problems here. One is that C is not, and never has been, the language that some people think it is - and thus they get frustrated when they find out their code is not as correct as they thought. A second is that there are weak compilers out there that on the one hand lull developers into a false understanding of the language due to their limited code optimisations, and on the other hand make safe alternatives such as "memcpy" highly inefficient on their tools.

What this means is that different compilers, including gcc and clang, are perfectly capable of generating code that efficiently mixes accesses of different kinds to the same object. But the details of the code you write to get the effects are different - C is not as portable here as it should be. For code that needs to work well on multiple toolchains, you quickly end up with a header that has conditional compilation and macros that vary depending on the compiler in use. That is ugly and awkward, but I know of no better way.
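
As a rough sketch of the alternatives listed above (memcpy, unions, "may_alias" types) and of the kind of conditional compilation such a header ends up with, something like the following would do. All the function and type names here are invented for illustration, and I have used 64-bit reads of two 32-bit words rather than 128-bit ones so that it only needs <stdint.h>. The "may_alias" branch assumes a gcc-compatible compiler and suitably aligned storage; other compilers fall back to memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* memcpy: defined behaviour whatever the types involved.  Good
   optimising compilers typically turn this into a single load. */
uint64_t load_u64_memcpy(const uint32_t *words)
{
    uint64_t v;
    memcpy(&v, words, sizeof v);
    return v;
}

/* Union: C (unlike C++) allows reading a member other than the one
   last written; the stored bytes are reinterpreted. */
union words_or_u64 {
    uint32_t words[2];
    uint64_t whole;
};

uint64_t load_u64_union(uint32_t first, uint32_t second)
{
    union words_or_u64 u;
    u.words[0] = first;
    u.words[1] = second;
    return u.whole;
}

/* Compiler-dependent part, as it might appear in a portability header. */
#if defined(__GNUC__)
/* gcc/clang extension: accesses through this type may alias anything. */
typedef uint64_t __attribute__((__may_alias__)) u64_alias;

/* The caller must make sure "words" is aligned for uint64_t. */
uint64_t load_u64(const uint32_t *words)
{
    return *(const u64_alias *)words;
}
#else
/* Fallback for other compilers: use the memcpy version. */
uint64_t load_u64(const uint32_t *words)
{
    return load_u64_memcpy(words);
}
#endif

/* All three methods read the same bytes, so they must agree.  The
   comparison does not depend on the endianness of the target. */
int aliasing_demo(void)
{
    _Alignas(uint64_t) uint32_t words[2] = { 0x11223344u, 0x55667788u };
    uint64_t a = load_u64_memcpy(words);
    uint64_t b = load_u64_union(words[0], words[1]);
    uint64_t c = load_u64(words);

    assert(a == b);
    assert(b == c);
    return 1;
}
```

None of these needs "-fno-strict-aliasing". Which one generates the best code depends on the toolchain, which is exactly why the #if ends up in the header.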