From: David Brown
Newsgroups: comp.compilers
Subject: Re: what is defined, was for or against equality
Date: Sun, 9 Jan 2022 23:00:46 +0100
Organization: A noiseless patient Spider
Message-ID: <22-01-038@comp.compilers>
References: <17d70d74-1cf1-cc41-6b38-c0b307aeb35a@gkc.org.uk> <22-01-016@comp.compilers> <22-01-018@comp.compilers> <22-01-020@comp.compilers> <22-01-027@comp.compilers> <22-01-032@comp.compilers>
Keywords: Fortran, C, optimize, standards, comment

On 08/01/2022 10:31, Thomas Koenig wrote:
> Spiros Bousbouras wrote:
>> This seems to me exactly like the C model. What difference do you see?
>
> First, I see a difference in result. Highly intelligent and
> knowledgeable people argue vehemently if a program should be able
> to use undefined behavior or not, and a lot of vitriol is directed
> against compiler writers who use the assumption that undefined
> behavior cannot happen in their compilers for optimization,
> especially if it turns out that existing code was broken and no
> longer works after a compiler upgrade (Just read a few of Linus
> Torvalds' comments on that matter).

People want compilers to do what the programmer meant, not what he or
she wrote.
And in particular, if a compiler did one thing once, they want it to
continue to do the same thing with the same code - as long as they got
what they wanted the first time round. This is, of course, entirely
natural for humans. But it is not natural for computer programs like
compilers.

Linus Torvalds is known for blowing his top on matters that he does not
understand, or where he has mixed his personal opinions with facts, or
where he is looking at only a small part of the big picture. (He is
also known as an incredible programmer, a world-class project leader,
and a charismatic visionary who revolutionised the software world - but
that's beside the point here!)

A key example of his complaints in this area revolves around a function
that was something equivalent to:

    int foo(int * p) {
        int x = *p;
        if (!p) return -1;
        return x;
    }

His complaint was that the compiler saw that "*p" was accessed, and
therefore assumed "p" could not be zero and optimised away the test.

The compiler did exactly what it was asked to do - the optimisation is
perfectly valid according to the C standards and additional definitions
given by the compiler. But it was not what the programmer wanted, and
not what older versions of the compiler had done.

Of course, when a new optimisation simply makes object code more
efficient, programmers want that - they don't /always/ want the compiler
to handle things the way older versions did. They want the compiler to
read their minds and see what they meant to write, and generate optimal
code for that.

None of this is helped by the fact that C code often has to work
efficiently on a variety of targets and compilers, and some compilers
give extra guarantees about how they interpret code beyond the
definitions given in the C standards.
Many more compilers can be relied upon in practice to work in particular
ways, though they don't guarantee or document it, and this means the
most efficient code that works in practice on one compiler may be wrong
and give incorrect results on another compiler. You can write C code
that is correct and widely portable, but you can't write C code that is
correct, optimally efficient, and widely portable.

The big question here is: why do you think Fortran is any different? In
theory, there isn't a difference - nothing you have said here convinces
me that there is any fundamental difference between Fortran and C with
regard to undefined behaviour. (And there's no difference in the
implementations - the most commonly used Fortran compilers also handle
C, C++, and perhaps other languages.)

I believe it is a matter of who writes Fortran programs, and what these
programs do. Now, I don't know or use Fortran myself, so I might be
wrong here. However, it seems to me that Fortran is typically used by
experienced professional programmers and for scientific or numerical
programming. C is used by a much wider range of programmers, for a much
wider range of programming tasks. I think it is inevitable that you'll
get more people programming in C when they are not fully sure of what
they are doing, more code where subtle mistakes can be made, more people
using C when other languages would have been better choices, and more C
programmers who are likely to blame their tools for their own mistakes.

>
> I see C conflating two separate concepts: Program errors and
> behavior that is outside the standard. "Undefined behavior is
> always a programming error" does not work; that would make
>
> #include <unistd.h>
> #include <string.h>
>
> int main()
> {
>     char a[] = "Hello, world!\n";
>     write (1, a, strlen(a));
>     return 0;
> }
>

C does not have a "write" function in the standard library. So the
behaviour of "write" is not defined by the C standards - but that does
not mean the behaviour is undefined.
It just means it is defined elsewhere, not in the C standards. If the
programmer doesn't know what the "write" function does or how it is
specified, then it might be undefined behaviour - certainly it is bad
programming.

> not more and not less erroneous than
>
> int main()
> {
>     int *p = 0;
>     *p = 42;
> }
>
> whereas I would argue that there is an important difference between
> the two.
>

There is no fundamental difference - if you know the behaviour is
defined, it is defined. (The program is then correct or incorrect
depending on how that definition matches your requirements.) If not,
it is undefined (and incorrect). In neither case is the behaviour
defined by the C standard, but the behaviour could be defined by
something else (library documentation or external definition of "write",
or a C compiler that specifically says it defines the behaviour of
dereferencing null pointers).

> If the C standard replaced "the behavior is undefined" with "the
> program is in error, and the subsequent behavior is undefined"
> or something along those lines, the discussion would be much
> muted.
>

That sounds like you dislike the "time travel" aspect of C's undefined
behaviour. Many would agree with that - they don't like the idea that
undefined behaviour later in the program can be used to change the
behaviour of code earlier on.

The C standard considers undefined behaviour to be program-wide - if you
execute something that has undefined behaviour (remembering that this
means there is no definition /anywhere/ of what will happen), the whole
program is wrong and you can't expect anything from it.

People often find this disturbing. They think perhaps it is fair enough
that dereferencing a null pointer can crash a program, but it shouldn't
affect things that came before it. However, there are two key points to
think about.
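
To make the null-pointer case concrete, here is a minimal sketch for illustration only (the name read_or_default is hypothetical and does not come from the thread). It tests the pointer first and dereferences it only when it is non-null. Written this way the undefined operation is never executed, so the C standard defines the result for every argument and the compiler has nothing to reason backwards from.

```c
#include <stddef.h>

/* Hypothetical helper, not from the thread: returns -1 for a null
 * pointer, otherwise the value pointed to. */
int read_or_default(const int *p)
{
    if (p == NULL)
        return -1;   /* checked before any dereference */
    return *p;       /* only reached when p is non-null */
}
```

Calling it with the address of an int holding 7 returns 7; calling it with NULL returns -1.
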
First, the standards' handling of undefined behaviour means that a
compiler /can/ use UB to change the object code generated for earlier
source code, not that it /must/ do so. A compiler always balances
efficient code generation with ease-of-use and ease-of-debugging. The
ideal balance point will depend on the programmer writing the code, so
compiler flags are used to tune it, but surprises can still happen.

The other point is to consider how the standards could say anything
else. If the standards required observable behaviour to be completed
before undefined behaviour occurred, the results would be terrible.
Dereferencing a null pointer or dividing by zero could cause a complete
crash (remember the "Windows for Warships" affair? A single divide by
zero brought the whole ship network down, leaving it dead in the water
for hours). That means the compiler would need to make sure any
volatile writes had hit main memory before reading a pointer. It would
have to ensure all file stream buffers were flushed to disk before doing
a division. You can be sure Linus Torvalds would have a thing or two to
say about such a compiler.

> (Somebody may point out to me that this is what the standard is
> actually saying. If so, that would sort of reinforce my argument
> that it should be clearer :-)

[Fortran has in principle historically allowed rather aggressive
optimization, e.g., A*B+A*C can turn into A*(B+C). On the other hand,
in the real world, when IBM improved their optimizing compiler Fortran H
into Fortran X, the developers said any new optimization had to produce
bit-identical results to what the old compiler did. So this is not a
new issue. -John]
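
As a footnote to the divide-by-zero point above: since the language does not order the failure relative to earlier output, a program that cares has to keep the undefined operation from running at all. The following is a minimal sketch, assuming plain int operands; checked_div is a hypothetical name, not a standard function. It rejects a zero divisor and also INT_MIN / -1, which overflows and is likewise undefined in C.

```c
#include <limits.h>

/* Hypothetical helper: stores num / den in *out and returns 0, or
 * returns -1 without dividing when the result would be undefined. */
int checked_div(int num, int den, int *out)
{
    if (den == 0)
        return -1;                      /* division by zero */
    if (num == INT_MIN && den == -1)
        return -1;                      /* quotient does not fit in int */
    *out = num / den;
    return 0;
}
```

The caller decides what to do on failure (report, flush, abort), which is exactly the policy choice the standards leave open.
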