From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.compilers
Subject: Re: what is defined, was for or against equality
Date: Sun, 9 Jan 2022 23:00:46 +0100
Organization: A noiseless patient Spider
Lines: 177
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-01-038@comp.compilers>
References: <17d70d74-1cf1-cc41-6b38-c0b307aeb35a@gkc.org.uk> <22-01-016@comp.compilers> <22-01-018@comp.compilers> <22-01-020@comp.compilers> <22-01-027@comp.compilers> <22-01-032@comp.compilers>
Keywords: Fortran, C, optimize, standards, comment
Posted-Date: 09 Jan 2022 17:44:26 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-01-032@comp.compilers>
Content-Language: en-GB

On 08/01/2022 10:31, Thomas Koenig wrote:
> Spiros Bousbouras <spibou@gmail.com> schrieb:

>> This seems to me exactly like the C model. What difference do you see ?
>
> First, I see a difference in result.  Highly intelligent and
> knowledgeable people argue vehemently if a program should be able
> to use undefined behavior or not, and a lot of vitriol is directed
> against compiler writers who use the assumption that undefined
> behavior cannot happen in their compilers for optimization,
> especially if it turns out that existing code was broken and no
> longer works after a compiler upgrade (Just read a few of Linus
> Torvalds' comments on that matter).

People want compilers to do what the programmer meant, not what he or
she wrote.  And in particular, if a compiler did one thing once, they
want it to continue to do the same thing with the same code - as long as
they got what they wanted the first time round.

This is, of course, entirely natural for humans.  But it is not natural
for computer programs like compilers.

Linus Torvalds is known for blowing his top on matters that he does not
understand, where he has mixed his personal opinions with facts, or
where he is only looking at a small part of the big picture.  (He is
also known as an incredible programmer, a world-class project leader,
and a charismatic visionary who revolutionised the software world - but
that's beside the point here!).

A key example of his complaints in this area revolves around a function
that was something equivalent to:

int foo(int * p) {
	int x = *p;
	if (!p) return -1;
	return x;
}

His complaint was that the compiler saw that "*p" was accessed, and
therefore assumed "p" could not be zero and optimised away the test.
The compiler did exactly what it was asked to do - the optimisation is
perfectly valid according to the C standards and additional definitions
given by the compiler.  But it was not what the programmer wanted, and
not what older versions of the compiler had done.
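
As a minimal sketch of what the programmer presumably meant (the name
"foo_checked" and the small demo function are my own, hypothetical
additions, not anything from the code in question), the test simply has
to come before the access:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical corrected variant: test the pointer before reading
   through it, so the compiler cannot treat the check as redundant. */
int foo_checked(const int *p) {
	if (!p) return -1;	/* the null test precedes any access to *p */
	return *p;
}

/* Small usage check covering both paths. */
void foo_checked_demo(void) {
	int v = 42;
	assert(foo_checked(&v) == 42);
	assert(foo_checked(NULL) == -1);
}
```

Written this way, there is no dereference ahead of the test from which
the compiler could infer that "p" is non-null.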

Of course, when a new optimisation simply makes object code more
efficient, programmers want that - they don't /always/ want the compiler
to handle things the way older versions did.  They want the compiler to
read their minds and see what they meant to write, and generate optimal
code for that.


None of this is helped by the fact that C code often has to work
efficiently on a variety of targets and compilers, and some compilers
give extra guarantees about how they interpret code beyond the
definitions given in the C standards.  Many more compilers can be relied
upon in practice to work in particular ways, though they don't guarantee
or document it, and this means the most efficient code that works in
practice on one compiler may be wrong and give incorrect results on
another compiler.  You can write C code that is correct and widely
portable, but you can't write C code that is correct, optimally
efficient, and widely portable.



The big question here is: why do you think Fortran is any different?  In
theory, there isn't a difference - nothing you have said here convinces
me that there is any fundamental difference between Fortran and C with
regard to undefined behaviour.  (And there's no difference in the
implementations - the most commonly used Fortran compilers also handle
C, C++, and perhaps other languages.)

I believe it is a matter of who writes Fortran programs, and what these
programs do.  Now, I don't know or use Fortran myself, so I might be
wrong here.  However, it seems to me that Fortran is typically used by
experienced professional programmers and for scientific or numerical
programming.  C is used by a much wider range of programmers, for a much
wider range of programming tasks.  I think it is inevitable that you'll
get more people programming in C when they are not fully sure of what
they are doing, more code where subtle mistakes can be made, more people
using C when other languages would have been better choices, and more C
programmers who are likely to blame their tools for their own mistakes.



>
> I see C conflating two separate concepts:  Program errors and
> behavior that is outside the standard.  "Undefined behavior is
> always a programming error" does not work; that would make
>
> #include <unistd.h>
> #include <string.h>
>
> int main()
> {
>   char a[] = "Hello, world!\n";
>   write (1, a, strlen(a));
>   return 0;
> }
>

C does not have a "write" function in the standard library.  So the
behaviour of "write" is not defined by the C standards - but that does
not mean the behaviour is undefined.  It just means it is defined
elsewhere, not in the C standards.  If the programmer doesn't know what
the "write" function does or how it is specified, then it might be
undefined behaviour - certainly it is bad programming.
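
For comparison, here is a rough sketch of the same output done with
functions whose behaviour /is/ defined by the C standards ("write" itself
is specified by POSIX).  The helper name "write_greeting" and the demo
function are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: emits the greeting using only ISO C library
   calls and returns the number of bytes fwrite reports as written. */
size_t write_greeting(FILE *stream) {
	const char a[] = "Hello, world!\n";
	return fwrite(a, 1, strlen(a), stream);
}

/* Usage check: the string is 14 bytes long, newline included. */
void write_greeting_demo(void) {
	assert(write_greeting(stdout) == 14);
}
```

Both versions are equally "defined" - the only difference is which
document you have to read to find the definition.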


> not more and not less erroneous than
>
> int main()
> {
>   int *p = 0;
>   *p = 42;
> }
>
> whereas I would argue that there is an important difference between
> the two.
>

There is no fundamental difference - if you know the behaviour is
defined, it is defined.  (The program is then correct or incorrect
depending on how that definition matches your requirements.)  If not, it
is undefined (and incorrect).  In neither case is the behaviour defined
by the C standard, but the behaviour could be defined by something else
(library documentation or external definition of "write", or a C
compiler that specifically says it defines the behaviour of
dereferencing null pointers).

> If the C standard replaced "the behavior is undefined" with "the
> program is in error, and the subsequent behavior is undefined"
> or something along those lines, the discussion would be much
> muted.
>

That sounds like you dislike the "time travel" aspect of C's undefined
behaviour.  Many would agree with that - they don't like the idea that
undefined behaviour later in the program can be used to change the
behaviour of code earlier on.  The C standard considers undefined
behaviour to be program-wide - if you execute something that has
undefined behaviour (remembering that this means there is no definition
/anywhere/ of what will happen), the whole program is wrong and you
can't expect anything from it.

People often find this disturbing.  They think perhaps it is fair enough
that dereferencing a null pointer can crash a program, but it shouldn't
affect things that came before it.

However, there are two key points to think about.  First, the standard's
handling of undefined behaviour means that a compiler /can/ use UB to
change the object code generated for earlier source code, not that it
/must/ do so.  A compiler always balances efficient code generation with
ease-of-use and ease-of-debugging.  The ideal balance point will depend
on the programmer writing the code, so compiler flags are used to tune
it, but surprises can still happen.

The other point is to consider how the standards could say anything
else.  If the standards required observable behaviour to be completed
before undefined behaviour occurred, the results would be terrible.
Dereferencing a null pointer or dividing by zero could cause a complete
crash (remember the "Windows for Warships" affair?  A single divide by
zero brought the whole ship network down, leaving it dead in the water
for hours).  Since such a crash could strike at any of these operations,
the compiler would need to make sure any volatile writes had hit main
memory before dereferencing a pointer.  It would have to ensure all file
stream buffers were flushed to disk before doing a division.  You can be
sure Linus Torvalds would have a thing or two to say about such a
compiler.
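
If the earlier behaviour matters to you, one way to sidestep the whole
question is to rule out the undefined case in the source before it can
occur.  A minimal sketch, assuming a hypothetical helper "checked_div"
of my own invention:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: performs a / b only when the result is defined.
   Returns false, leaving *quotient untouched, for the two cases where
   int division has undefined behaviour. */
bool checked_div(int a, int b, int *quotient) {
	if (b == 0) return false;			/* divide by zero */
	if (a == INT_MIN && b == -1) return false;	/* result overflows */
	*quotient = a / b;
	return true;
}

/* Usage check for the normal path and the divide-by-zero path. */
void checked_div_demo(void) {
	int q = 0;
	assert(checked_div(7, 2, &q) && q == 3);
	assert(!checked_div(7, 0, &q) && q == 3);
}
```

Since no execution ever reaches the undefined division, the compiler has
nothing it can use to alter the code that came before.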

> (Somebody may point out to me that this what the standard is
> actually saying.  If so, that would sort of reinforce my argument
> that it should be clearer :-)
[Fortran has in principle historically allowed rather aggressive optimization,
e.g., A*B+A*C can turn into A*(B+C).  On the other hand, in the real world,
when IBM improved their optimizing compiler Fortran H into Fortran X, the
developers said any new optimization had to produce bit identical results
to what the old compiler did.  So this is not a new issue. -John]
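
To see why that rewrite is not bit-identical, here is a small C sketch,
assuming IEEE 754 doubles; the function names are hypothetical, and the
volatile temporaries are only there to force each intermediate result to
be rounded to double:

```c
#include <assert.h>

/* A*B + A*C, evaluated term by term. */
double distributed(double a, double b, double c) {
	volatile double ab = a * b;
	volatile double ac = a * c;
	return ab + ac;
}

/* A*(B+C), the algebraically equivalent factored form. */
double factored(double a, double b, double c) {
	volatile double sum = b + c;
	return a * sum;
}

/* With A = B = 1e200 and C = -1e200 the products overflow to +inf and
   -inf, whose sum is NaN, while the factored form gives exactly 0. */
void reassociation_demo(void) {
	double d = distributed(1e200, 1e200, -1e200);
	double f = factored(1e200, 1e200, -1e200);
	assert(d != d);		/* NaN is the only value unequal to itself */
	assert(f == 0.0);
}
```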