Undefined behaviour in C23

Started by	Martin Ward <mwardgkc@gmail.com>
First post	2025-08-20 14:06 +0100
Last post	2025-09-06 17:15 +0000
Articles	16 — 7 participants

  Undefined behaviour in C23 Martin Ward <mwardgkc@gmail.com> - 2025-08-20 14:06 +0100
    Re: Undefined behaviour in C23 Kaz Kylheku <643-408-1753@kylheku.com> - 2025-08-20 18:33 +0000
      Re: Undefined behaviour in C23 David Brown <david.brown@hesbynett.no> - 2025-08-21 15:02 +0200
        Re: Undefined behaviour in C23 Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-08-21 12:53 -0700
          Re: Undefined behaviour in C23 David Brown <david.brown@hesbynett.no> - 2025-08-22 17:58 +0200
            Re: Undefined behaviour in C23 anton@mips.complang.tuwien.ac.at - 2025-08-22 17:16 +0000
            Re: Undefined behaviour in C23 Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-08-22 15:11 -0700
              Re: Undefined behaviour in C23 David Brown <david.brown@hesbynett.no> - 2025-08-23 16:55 +0200
                Re: Undefined behaviour in C23 Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2025-08-23 15:58 -0700
                Re: Undefined behaviour in C23 James Kuyper <jameskuyper@alumni.caltech.edu> - 2025-08-25 22:13 -0400
                  Re: Undefined behaviour in C23 James Kuyper <jameskuyper@alumni.caltech.edu> - 2025-08-26 13:41 -0400
                    Re: Undefined behaviour in C23 Michael S <already5chosen@yahoo.com.dmarc.email> - 2025-08-26 22:28 +0300
                      Re: Undefined behaviour in C23 James Kuyper <jameskuyper@alumni.caltech.edu> - 2025-08-26 16:53 -0400
    Re: Undefined behaviour in C23 anton@mips.complang.tuwien.ac.at - 2025-08-21 05:44 +0000
      Re: Undefined behaviour in C23 David Brown <david.brown@hesbynett.no> - 2025-08-22 18:42 +0200
        Re: Undefined behaviour in C23 anton@mips.complang.tuwien.ac.at - 2025-09-06 17:15 +0000

#3676 — Undefined behaviour in C23

From	Martin Ward <mwardgkc@gmail.com>
Date	2025-08-20 14:06 +0100
Subject	Undefined behaviour in C23
Message-ID	<25-08-002@comp.compilers>

In the SEI CERT C Coding Standards we read:

"According to the C Standard, Annex J, J.2 [ISO/IEC 9899:2024],
the behavior of a program is undefined in the circumstances outlined
in the following table."

The table has 221 numbered cases and can be found here:

<https://wiki.sei.cmu.edu/confluence/display/c/CC.%2BUndefined%2BBehavior>

According to the C Standard Committee (paraphrasing) "You may eat
from any tree in the garden of coding, except for any of the 221
trees of undefined behaviour. If you eat from any of the 221 trees
of undefined behaviour your program may die, either immediately or at
some unspecified time in the future, or may do absolutely anything at
any future time. You must study the Book of the Knowledge of Defined
and Undefined (the 758 page C23 standard document) to learn exactly
how to recognise each of the 221 trees of undefined behaviour.
Please pay the cashier $250.00 to purchase a copy of the Book
of the Knowledge of Defined and Undefined".


--
Martin

Dr Martin Ward | Email: martin@gkc.org.uk | <http://www.gkc.org.uk>
G.K.Chesterton site: <http://www.gkc.org.uk/gkc> | Erdos number: 4
[When a language is 50 years old and there is a mountain of legacy code that
they really don't want to break, it accumulates a lot of cruft. If we were
starting now we'd get something more like Go.

On the other hand, there's the python approach in which they deprecate and
remove little used and crufty features, but old python code doesn't work any
more unless you go back and update it every year or two. -John]

#3677

From	Kaz Kylheku <643-408-1753@kylheku.com>
Date	2025-08-20 18:33 +0000
Message-ID	<25-08-003@comp.compilers>
In reply to	#3676

On 2025-08-20, Martin Ward <mwardgkc@gmail.com> wrote:
> In the SEI CERT C Coding Standards we read:
>
> "According to the C Standard, Annex J, J.2 [ISO/IEC 9899:2024],
> the behavior of a program is undefined in the circumstances outlined
> in the following table."
>
> The table has 221 numbered cases and can be found here:
>
> <https://wiki.sei.cmu.edu/confluence/display/c/CC.%2BUndefined%2BBehavior>
>
> According to the C Standard Committee (paraphrasing) "You may eat
> from any tree in the garden of coding, except for any of the 221
> trees of undefined behaviour. If you eat from any of the 221 trees
> of undefined behaviour your program may die, either immediately or at
> some unspecified time in the future, or may do absolutely anything at
> any future time. You must study the Book of the Knowledge of Defined
> and Undefined (the 758 page C23 standard document) to learn exactly
> how to recognise each of the 221 trees of undefined behaviour.
> Please pay the cashier $250.00 to purchase a copy of the Book
> of the Knowledge of Defined and Undefined".

The list is incomplete.

For instance, the behavior is undefined when translation units are linked into a
program and contain the use of an external name that is neither defined by the
standard, nor by those translation units.

(Actual behavior may range from diagnosing an unresolved reference, to resolving
it to something in the implementation for which no requirements are given in the
standard.)

The behavior is undefined when an #include directive is processed that resolves
neither to a standard header, nor to any of the files presented for processing
by the implementation.

(Actual behavior may be that the header doesn't resolve to anything, which is a
constraint violation. Or it may resolve to something in the implementation which
replaces it with a sequence of tokens, for the nature of which ISO C imposes no
requirements.)

Any situation in the standard whereby we are not able to deduce the requirements
for a given program or construct is undefined behavior.

#include <windows.h> is undefined behavior.

fdopen(0, "r") is undefined behavior.

We can make a C implementation in which #include <windows.h> recursively deletes
the current directory, right there at compile time, and that implementation
couldn't be called nonconforming because of that. The #include <windows.h>
construct can have all the consequences that are given in the definition of
undefined behavior: terminating translation or execution with or without a
diagnostic message, behaving unpredictably, or in a documented manner
characteristic of the implementation.

All platform specific headers and functions are effectively documented
extensions replacing undefined behavior, which another implementation could
neglect to define, or define arbitrarily (including in evil ways).
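
A minimal sketch of how a program can at least detect whether such a header is
present, assuming a compiler that provides `__has_include` (standard in C23 and
a common extension before that). The names `HAVE_WINDOWS_H` and
`have_windows_h` are hypothetical, chosen for illustration. The check only
reports whether something named <windows.h> can be found; ISO C still imposes no
requirements on what that header contains.

```c
/* Detect the header only if the preprocessor supports __has_include.
   The nested #if keeps older preprocessors from choking on the operator. */
#if defined(__has_include)
#  if __has_include(<windows.h>)
#    define HAVE_WINDOWS_H 1
#  endif
#endif

#ifndef HAVE_WINDOWS_H
#  define HAVE_WINDOWS_H 0   /* absent, or presence could not be tested */
#endif

/* Returns 1 if a header named <windows.h> was found at compile time, else 0. */
int have_windows_h(void)
{
    return HAVE_WINDOWS_H;
}
```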

Once you grok the fact that almost all real work in C takes place via undefined
behavior (very few programs are maximally portable and strictly conforming) you
stop sweating it.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

#3679

From	David Brown <david.brown@hesbynett.no>
Date	2025-08-21 15:02 +0200
Message-ID	<25-08-005@comp.compilers>
In reply to	#3677

On 20/08/2025 20:33, Kaz Kylheku wrote:
> On 2025-08-20, Martin Ward <mwardgkc@gmail.com> wrote:
>> In the SEI CERT C Coding Standards we read:
>>
>> "According to the C Standard, Annex J, J.2 [ISO/IEC 9899:2024],
>> the behavior of a program is undefined in the circumstances outlined
>> in the following table."
>>
>> The table has 221 numbered cases and can be found here:
>>
>> <https://wiki.sei.cmu.edu/confluence/display/c/CC.%2BUndefined%2BBehavior>
>>
>> According to the C Standard Committee (paraphrasing) "You may eat
>> from any tree in the garden of coding, except for any of the 221
>> trees of undefined behaviour. If you eat from any of the 221 trees
>> of undefined behaviour your program may die, either immediately or at
>> some unspecified time in the future, or may do absolutely anything at
>> any future time. You must study the Book of the Knowledge of Defined
>> and Undefined (the 758 page C23 standard document) to learn exactly
>> how to recognise each of the 221 trees of undefined behaviour.
>> Please pay the cashier $250.00 to purchase a copy of the Book
>> of the Knowledge of Defined and Undefined".
>
> The list is incomplete.
>

Under "4. Conformance", the C standard says:

"""
If a "shall" or "shall not" requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is undefined.
Undefined behavior is otherwise indicated in this International
Standard by the words "undefined behavior" or by the omission of any
explicit definition of behavior.  There is no difference in emphasis
among these three; they all describe "behavior that is undefined".
"""

So no list could ever be complete here, since anything whose behaviour
is not defined in the C standards is undefined behaviour. I have
always found that slightly at odds with the definition under "3.
Terms, definitions, and symbols" of "behavior, upon use of a
nonportable or erroneous program construct or of erroneous data, for
which this International Standard imposes no requirements". In my
mind, things like externally defined functions (used correctly) could
be considered UB by the section 4 definitions but not by the section 3
definitions.

>
> All platform specific headers and functions are effectively documented
> extensions replacing undefined behavior, which another implementation could
> neglect to define, or define arbitrarily (including in evil ways).
>
> Once you grok the fact that almost all real work in C takes place via undefined
> behavior (very few programs are maximally portable and strictly conforming) you
> stop sweating it.
>

Yes, indeed - though that is the "section 4" type of UB rather than
the "section 3" definition - erroneous code and data should of course
be avoided at all times.

People who think UB is a bad thing should spend a little time thinking
about the phrase "garbage in, garbage out".  If you write nonsensical
code, or give sensible code nonsensical data, you can't expect sensible
results.  This has been known since the dawn of programmable computers
(and comes from mathematics and the domains of functions):

"""
On two occasions I have been asked, – "Pray, Mr. Babbage, if you put
into the machine wrong figures, will the right answers come out?" ... I
am not able rightly to apprehend the kind of confusion of ideas that
could provoke such a question
"""

UB (both definitions) is an essential part of all programming languages
- after all, if you have a bug in your code, you have UB, and no
programming language has made it impossible to write bugs in your code.
C just has some things that are undefined in C but defined in some other
languages, and it is a bit more open and honest about UB than many
language definitions.

#3681

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2025-08-21 12:53 -0700
Message-ID	<25-08-007@comp.compilers>
In reply to	#3679

David Brown <david.brown@hesbynett.no> writes:
[...]
> Under "4. Conformance", the C standard says:
>
> """
> If a "shall" or "shall not" requirement that appears outside of a
> constraint or runtime-constraint is violated, the behavior is undefined.
>   Undefined behaviour is otherwise indicated in this International
> Standard by the words "undefined behavior" or by the omission of any
> explicit definition of behavior.  There is no difference in emphasis
> among these three; they all describe "behavior that is undefined".
> """
>
> So no list could ever be complete here, since anything whose behaviour
> is not defined in the C standards is undefined behaviour. I have
> always found that slightly at odds with the definition under "3.
> Terms, definitions, and symbols" of "behavior, upon use of a
> nonportable or erroneous program construct or of erroneous data, for
> which this International Standard imposes no requirements". In my
> mind, things like externally defined functions (used correctly) could
> be considered UB by the section 4 definitions but not by the section 3
> definitions.

I don't see an inconsistency.

A C program that includes a non-standard header that's not part of
the program (e.g., `#include <windows.h>`) and calls a function
declared in that header has undefined behavior as far as the C
standard is concerned.  The program could be compiled in a conforming
environment that has its own <windows.h> header with a declaration
for a different implementation of the same name.

That's undefined behavior under both the section 3 definition (use
of a nonportable program construct) and the section 4 definition
(the omission of any explicit definition of behavior).

[...]

> UB (both definitions) is an essential part of all programming languages
> - after all, if you have a bug in your code, you have UB, and no
> programming language has made it impossible to write bugs in your code.
> C just has some things that are undefined in C but defined in some other
> languages, and it is a bit more open and honest about UB than many
> language definitions.

No, a bug in your code is not necessarily undefined behavior.  It could
easily be code whose behavior is well defined by the language standard,
but that behavior isn't what the programmer intended.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

#3682

From	David Brown <david.brown@hesbynett.no>
Date	2025-08-22 17:58 +0200
Message-ID	<25-08-008@comp.compilers>
In reply to	#3681

On 21/08/2025 21:53, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
> [...]
>> Under "4. Conformance", the C standard says:
>>
>> """
>> If a "shall" or "shall not" requirement that appears outside of a
>> constraint or runtime-constraint is violated, the behavior is undefined.
>>    Undefined behaviour is otherwise indicated in this International
>> Standard by the words "undefined behavior" or by the omission of any
>> explicit definition of behavior.  There is no difference in emphasis
>> among these three; they all describe "behavior that is undefined".
>> """
>>
>> So no list could ever be complete here, since anything whose behaviour
>> is not defined in the C standards is undefined behaviour. I have
>> always found that slightly at odds with the definition under "3.
>> Terms, definitions, and symbols" of "behavior, upon use of a
>> nonportable or erroneous program construct or of erroneous data, for
>> which this International Standard imposes no requirements". In my
>> mind, things like externally defined functions (used correctly) could
>> be considered UB by the section 4 definitions but not by the section 3
>> definitions.
>
> I don't see an inconsistency.
>
> A C program that includes a non-standard header that's not part of
> the program (e.g., `#include <windows.h>`) and calls a function
> declared in that header has undefined behavior as far as the C
> standard is concerned.  The program could be compiled in a conforming
> environment that has its own <windows.h> header with a declaration
> for a different implementation of the same name.
>
> That's undefined behavior under both the section 3 definition (use
> of a nonportable program construct) and the section 4 definition
> (the omission of any explicit definition of behavior).
>
> [...]

If you declare and call a function "foo" that is written in fully
portable C code, but not part of the current translation unit being
compiled (perhaps it has been separately compiled or included in a
library), then it would be UB by the section 4 definition (since the C
standards don't say anything about what "foo" does, nor does your code).

But the code that calls "foo" is portable and not erroneous, so it is
not UB by the section 3 definition.

Perhaps it is possible to argue that it is UB by both definitions,
or defined behaviour by both definitions - with enough pushing and
squeezing of how the words of the standard are interpreted.  But it is,
at best, unclear and poorly explained - even if you don't agree that it
is inconsistent.

Add to that, the C standard has a specific term for features that are
non-portable but not undefined behaviour - "implementation-defined
behaviour".  Code that relies on "int" being 32-bit is not portable, but
it is not UB when compiled on implementations for which "int" /is/ 32-bit.

As I see it, there are (at least) two significantly different types of
UB in C, by the definitions in the standard.  If you call an external
non-portable function "foo" in your code, the compiler assumes that
doing so does not result in the execution of UB - despite that being
"type 4" UB.  On the other hand, if your code dereferences a pointer,
the compiler can assume that the pointer is not invalid - because
dereferencing an invalid pointer would be "type 3" UB.
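
The second case can be sketched as follows; the function names are hypothetical.
In `read_then_check` the dereference comes first, so an optimizing compiler is
permitted to assume `p` is non-null and may drop the later test. Whether a given
compiler actually does so depends on the compiler and its options.

```c
#include <stddef.h>

/* Dereferences before testing: if p is null the behavior is already
   undefined at the load, so the test below may legally be removed. */
int read_then_check(int *p)
{
    int v = *p;
    if (p == NULL)
        return -1;
    return v;
}

/* Tests before dereferencing: defined for every argument that is either
   null or points to a valid int. */
int check_then_read(int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

Calling `check_then_read(NULL)` is well defined and returns -1; calling
`read_then_check(NULL)` is not defined by the C standard at all.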

>> UB (both definitions) is an essential part of all programming languages
>> - after all, if you have a bug in your code, you have UB, and no
>> programming language has made it impossible to write bugs in your code.
>> C just has some things that are undefined in C but defined in some other
>> languages, and it is a bit more open and honest about UB than many
>> language definitions.
>
> No, a bug in your code is not necessarily undefined behavior.  It could
> easily be code whose behavior is well defined by the language standard,
> but that behavior isn't what the programmer intended.

When I write code, /I/ define what the behaviour of the code should be.
A bug in the code means it is not acting according to my definitions -
it is UB.  It may still be acting according to the definitions of the C
abstract machine given in the C standards (you are correct there).  Even
if it has C-standard UB, it will still be acting according to the
definitions of the target machine's instruction set.  Behaviour is
defined on multiple levels, only one of which is the C standard.

#3684

From	anton@mips.complang.tuwien.ac.at
Date	2025-08-22 17:16 +0000
Message-ID	<25-08-010@comp.compilers>
In reply to	#3682

David Brown <david.brown@hesbynett.no> writes:
>On 21/08/2025 21:53, Keith Thompson wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>>> UB (both definitions) is an essential part of all programming languages
>>> - after all, if you have a bug in your code, you have UB, and no
>>> programming language has made it impossible to write bugs in your code.
>>> C just has some things that are undefined in C but defined in some other
>>> languages, and it is a bit more open and honest about UB than many
>>> language definitions.
>>
>> No, a bug in your code is not necessarily undefined behavior.  It could
>> easily be code whose behavior is well defined by the language standard,
>> but that behavior isn't what the programmer intended.
>
>
>When I write code, /I/ define what the behaviour of the code should be.
>A bug in the code means it is not acting according to my definitions -
>it is UB.

Yes, Humpty Dumpty.  Meanwhile, the rest of the world says that a
program exercises undefined behaviour when the *programming language*
specification does not define what the program does (or explicitly
undefines it), and calls programs incorrect (or, colloquially, buggy)
that do not behave as the *program* specification or requirements
demand; a buggy program may be defined according to the programming
language specification; e.g., I believe that the following program is
defined according to one of the C standards:

#include <stdio.h>

int main(void)
{
  puts("B");
}

If the specification of the program is to print "A" followed by a
newline, the program is incorrect, even though its behaviour is
defined.

A correct program may be undefined according to the programming
language specification (see Kaz Kylheku's examples).

Which brings us to your claim:
>>> UB (both definitions) is an essential part of all programming languages

This is nonsense.  If the programming language defines the behaviour
of all programs it accepts, and rejects all the others, it does not
have undefined behaviour.  As an example, Java tries to live up to an
even higher standard (no implementation-dependent behaviour aka "write
once, run everywhere"), but I am not sure if it succeeds in that.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

#3685

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2025-08-22 15:11 -0700
Message-ID	<25-08-011@comp.compilers>
In reply to	#3682

comp.lang.c would probably be a better place for this discussion,
but cross-posting between moderated and unmoderated newsgroups is
likely to cause problems.

David Brown <david.brown@hesbynett.no> writes:
> On 21/08/2025 21:53, Keith Thompson wrote:
[...]
> If you declare and call a function "foo" that is written in fully
> portable C code, but not part of the current translation unit being
> compiled (perhaps it has been separately compiled or included in a
> library), then it would be UB by the section 4 definition (since the C
> standards don't say anything about what "foo" does, nor does your code).

If the translation unit that defined "foo" is part of your program, then
your code *does* define its behavior.  Linking multiple translation
units into a program is specified by the C standard; it's translation
phase 8.

> But the code that calls "foo" is portable and not erroneous, so it is
> not UB by the section 3 definition.

If "foo" is defined by your program, either in the current
translation unit or in another one, the call is well defined
(assuming "foo" doesn't do something silly like dividing by zero).
If "foo" is defined outside your program, the C standard has nothing
to say about it.  It could even be implemented in a language other
than C.

The *behavior* of such a call is not portable.  (And the execution of
such a call is definitely undefined behavior if the visible declaration
is inconsistent with the definition.)

The section 3 definition of "undefined behavior" is a bit informal.
It's not clear what it means by "erroneous", for example.  Section
4 is more precise, and states that UB can be indicated "by the
omission of any explicit definition of behavior" (in the standard).
The standard omits any definition of the behavior of foo().

[...]

> Add to that, the C standard has a specific term for features that are
> non-portable but not undefined behaviour - "implementation-defined
> behaviour".  Code that relies on "int" being 32-bit is not portable, but
> it is not UB when compiled on implementations for which "int" /is/ 32-bit.

That's not what "implementation-defined behavior" means in C.
Cases of implementation-defined behavior are explicitly called out in
the standard, and an implementation must document how it treats each
instance of implementation-defined behavior.  Each implementation
must document the range of int.  There is no such requirement for
the behavior of "foo" defined in some non-standard header.
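
As an illustration of relying on a documented implementation-defined property
rather than silently assuming it, code can test the range of int through
<limits.h>. This is a minimal sketch; the function name is hypothetical.

```c
#include <limits.h>

/* Refuse to compile where int cannot hold 32-bit values.
   _Static_assert is C11; C23 also accepts the spelling static_assert. */
_Static_assert(INT_MAX >= 2147483647,
               "this code assumes int has at least 32 value bits");

/* Returns 1 if int has exactly the usual 32-bit two's complement range. */
int int_is_exactly_32_bits(void)
{
    return INT_MAX == 2147483647 && INT_MIN == -2147483647 - 1;
}
```

No comparable check exists for a function declared in a non-standard header,
because the standard requires no documentation of its behavior.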

>> No, a bug in your code is not necessarily undefined behavior.  It could
>> easily be code whose behavior is well defined by the language standard,
>> but that behavior isn't what the programmer intended.
>
> When I write code, /I/ define what the behaviour of the code should be.
> A bug in the code means it is not acting according to my definitions -
> it is UB.  It may still be acting according to the definitions of the C
> abstract machine given in the C standards (you are correct there).  Even
> if it has C-standard UB, it will still be acting according to the
> definitions of the target machine's instruction set.  Behaviour is
> defined on multiple levels, only one of which is the C standard.

"Undefined behavior" is a technical term defined by the C standard.
It's not just behavior that is not defined.  It is behavior that
is not defined *by the C standard*.  If I write printf("goodbye\n")
when I meant to write printf("hello\n"), that's incorrect behavior,
but it's not undefined behavior.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

#3686

From	David Brown <david.brown@hesbynett.no>
Date	2025-08-23 16:55 +0200
Message-ID	<25-08-012@comp.compilers>
In reply to	#3685

On 23/08/2025 00:11, Keith Thompson wrote:
> comp.lang.c would probably be a better place for this discussion,
> but cross-posting between moderated and unmoderated newsgroups is
> likely to cause problems.

Yes - but some comments have also wandered slightly from being just
applicable to C.  Still, it is not really a compiler discussion.

[FYI, cross-posting to comp.compilers and other groups works because
your moderator's scripts know how to handle it. -John]

> David Brown <david.brown@hesbynett.no> writes:
>> On 21/08/2025 21:53, Keith Thompson wrote:
> [...]
>> If you declare and call a function "foo" that is written in fully
>> portable C code, but not part of the current translation unit being
>> compiled (perhaps it has been separately compiled or included in a
>> library), then it would be UB by the section 4 definition (since the C
>> standards don't say anything about what "foo" does, nor does your code).
>
> If the translation unit that defined "foo" is part of your program, then
> your code *does* define its behavior.  Linking multiple translation
> units into a program is specified by the C standard; it's translation
> phase 8.

No.

The C standard does not define how this linking or combining is done - it
only covers certain specific aspects of the linking that relate directly
to C.  The behaviour of the function "foo" here is not defined in the C
standards, and if the source code is not available when translating a
different translation unit, the behaviour of "foo" is undefined.

>> But the code that calls "foo" is portable and not erroneous, so it is
>> not UB by the section 3 definition.
>
> If "foo" is defined by your program, either in the current
> translation unit or in another one, the call is well defined
> (assuming "foo" doesn't do something silly like dividing by zero).
> If "foo" is defined outside your program, the C standard has nothing
> to say about it.  It could even be implemented in a language other
> than C.
>
> The *behavior* of such a call is not portable.  (And the execution of
> such a call is definitely undefined behavior if the visible declaration
> is inconsistent with the definition.)

The C code being translated has code to call the function - the call is
defined (assuming declarations and definitions are consistent), but the
effect of the call is not defined - it is therefore UB.

>
> The section 3 definition of "undefined behavior" is a bit informal.

And yet it is in the section labelled "Terms, definitions and symbols".

> It's not clear what it means by "erroneous", for example.  Section
> 4 is more precise, and states that UB can be indicated "by the
> omission of any explicit definition of behavior" (in the standard).
> The standard omits any definition of the behavior of foo().
>

I agree that the definitions are somewhat vague and missing details, but
I also think they are somewhat inconsistent.

> [...]
>
>> Add to that, the C standard has a specific term for features that are
>> non-portable but not undefined behaviour - "implementation-defined
>> behaviour".  Code that relies on "int" being 32-bit is not portable, but
>> it is not UB when compiled on implementations for which "int" /is/ 32-bit.
>
> That's not what "implementation-defined behavior" means in C.
> Cases of implementation-defined behavior are explicitly called out in
> the standard, and an implementation must document how it treats each
> instance of implementation-defined behavior.  Each implementation
> must document the range of int.  There is no such requirement for
> the behavior of "foo" defined in some non-standard header.

Yes, exactly - implementation-defined behaviours are things that are not
portable, but are not undefined behaviour, because they must be defined
by the implementation.  (The C standard usually also gives some specific
options or minimum requirements for those definitions.)

>
>>> No, a bug in your code is not necessarily undefined behavior.  It could
>>> easily be code whose behavior is well defined by the language standard,
>>> but that behavior isn't what the programmer intended.
>>
>> When I write code, /I/ define what the behaviour of the code should be.
>> A bug in the code means it is not acting according to my definitions -
>> it is UB.  It may still be acting according to the definitions of the C
>> abstract machine given in the C standards (you are correct there).  Even
>> if it has C-standard UB, it will still be acting according to the
>> definitions of the target machine's instruction set.  Behaviour is
>> defined on multiple levels, only one of which is the C standard.
>
> "Undefined behavior" is a technical term defined by the C standard.
> It's not just behavior that is not defined.

Section 4 says precisely that behaviour that is not defined by the C
standard is "undefined behaviour" in exactly the same way as things
that are explicitly labelled "undefined behaviour" in the standard.

And that, I think, is the root of the problem - the C standard is on the
one hand trying to classify, define and describe things as "undefined
behaviour" as a technical term in the C standard, while on the other
hand it is also trying to say these are things that have no definition
or descriptions of their behaviours.

> It is behavior that
> is not defined *by the C standard*.  If I write printf("goodbye\n")
> when I meant to write printf("hello\n"), that's incorrect behavior,
> but it's not undefined behavior.
>

I agree that it is not C undefined behaviour, yes.  But it can be
undefined behaviour at a higher level in the design and specification of
the program.  As I see it, programming is the process of taking a
higher-level specification of a task down through layers until you have
something that is executable on a computer - a bug is when the code at a
layer is not following the defined behaviour it should be following.  I
was perhaps not as clear as I should have been that I was not talking
only about C-level "undefined behaviour", as the term is defined
(approximately) in the C standards.

#3688

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2025-08-23 15:58 -0700
Message-ID	<25-08-014@comp.compilers>
In reply to	#3686

This discussion started in comp.compilers (which is moderated).
I've cross-posted to comp.lang.c and set followups there.

I've snipped some context.

David Brown <david.brown@hesbynett.no> writes:
> On 23/08/2025 00:11, Keith Thompson wrote:
>> comp.lang.c would probably be a better place for this discussion,
>> but cross-posting between moderated and unmoderated newsgroups is
>> likely to cause problems.
>
> Yes - but some comments have also wandered slightly from being just
> applicable to C.  Still, it is not really a compiler discussion.
>
> [FYI, cross-posting to comp.compilers and other groups works because
> your moderator's scripts know how to handle it. -John]
>
>> David Brown <david.brown@hesbynett.no> writes:
>>> On 21/08/2025 21:53, Keith Thompson wrote:
>> [...]
>>> If you declare and call a function "foo" that is written in fully
>>> portable C code, but not part of the current translation unit being
>>> compiled (perhaps it has been separately compiled or included in a
>>> library), then it would be UB by the section 4 definition (since the C
>>> standards don't say anything about what "foo" does, nor does your code).
>>
>> If the translation unit that defined "foo" is part of your program, then
>> your code *does* define its behavior.  Linking multiple translation
>> units into a program is specified by the C standard; it's translation
>> phase 8.
>
> No.

Yes.

> The C standard does not define how this linking or combining is done - it
> only covers certain specific aspects of the linking that relate directly
> to C.  The behaviour of the function "foo" here is not defined in the C
> standards, and if the source code is not available when translating a
> different translation unit, the behaviour of "foo" is undefined.

It doesn't matter how linking is done, only that it follows the
semantics of C.  The function "foo" is *part of the program*.
The fact that it's in a different translation unit doesn't affect
its semantics.

A C program can be made up of multiple translation units.
The behavior of a call to foo() is the same whether it's defined
in the same translation unit or not.  Linking, translation phase 8,
ensures that that's the case.
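
A minimal sketch of that point, with both translation units shown in one
listing so it compiles as a single file.  The file names and the body of
foo are hypothetical; only the name "foo" comes from the discussion.

```c
/* Sketch: two translation units shown in one listing.
   File names and the body of foo are assumptions for illustration. */

/* ---- foo.h: the declaration every caller sees ---- */
int foo(int x);

/* ---- main.c: sees only the declaration above ---- */
int call_foo(void)
{
    /* The effect of this call is fixed by foo's definition,
       wherever in the program that definition lives. */
    return foo(20) + 1;
}

/* ---- foo.c: would normally be compiled separately and linked in ---- */
int foo(int x)
{
    return 2 * x;
}
```

Splitting the three parts into separate files and linking the results
should give the same value from call_foo(), since the definition of foo
is part of the program either way.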

[...]

> The C code being translated has code to call the function - the call is
> defined (assuming declarations and definitions are consistent), but the
> effect of the call is not defined - it is therefore UB.

Again, the effect of the call is specified by the definition of foo,
which is *part of your program*.

[...]

>>> Add to that, the C standard has a specific term for features that are
>>> non-portable but not undefined behaviour - "implementation-defined
>>> behaviour".  Code that relies on "int" being 32-bit is not portable, but
>>> it is not UB when compiled on implementations for which "int" /is/ 32-bit.
>>
>> That's not what "implementation-defined behavior" means in C.
>> Cases of implementation-defined behavior are explicitly called out in
>> the standard, and an implementation must document how it treats each
>> instance of implementation-defined behavior.  Each implementation
>> must document the range of int.  There is no such requirement for
>> the behavior of "foo" defined in some non-standard header.
>
> Yes, exactly - implementation-defined behaviours are things that are not
> portable, but are not undefined behaviour, because they must be defined
> by the implementation.  (The C standard usually also gives some specific
> options or minimum requirements for those definitions.)

You're ignoring two critical requirements for behavior to be
implementation-defined: it has to be specified by the standard
as implementation-defined (unlike undefined behavior, there is no
implicit implementation-defined behavior), and the implementation's
choice of behavior must be documented.  Neither of these applies
to a function defined outside your program (unless it's part of
the C standard library).
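
As a small illustration of the distinction, here is a sketch with two
results that the standard explicitly labels implementation-defined and
that an implementation therefore has to document.  The function names are
hypothetical.

```c
#include <limits.h>

/* Size of int in bits: the range of int is implementation-defined,
   and each implementation must document it. */
int int_size_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Largest int value, as published by the implementation in <limits.h>. */
int int_max_value(void)
{
    return INT_MAX;
}

/* Right-shifting a negative signed value gives an implementation-defined
   result; many compilers document an arithmetic shift here, but portable
   code should not assume a particular value. */
int shift_negative(void)
{
    return -8 >> 1;
}
```

None of these is undefined behaviour: each has a documented answer on a
given implementation, it is just not the same answer everywhere.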

[...]

> Section 4 says precisely that behaviour that is not defined by the C
> standard, is "undefined behaviour" in exactly the same way as things
> that are explicitly labelled "undefined behaviour" in the standard.

Yes.

> And that, I think, is the root of the problem - the C standard is on the
> one hand trying to classify, define and describe things as "undefined
> behaviour" as a technical term in the C standard, while on the other
> hand it is also trying to say these are things that have no definition
> or descriptions of their behaviours.

I see no problem.  For any construct whose behavior is undefined, the
standard says that it provides no definition of its behavior.  This is
equally true whether the standard explicitly says the behavior is
undefined or it just omits any definition of the behavior.

>> It is behavior that
>> is not defined *by the C standard*.  If I write printf("goodbye\n")
>> when I meant to write printf("hello\n"), that's incorrect behavior,
>> but it's not undefined behavior.
>>
>
> I agree that it is not C undefined behaviour, yes.
[...]

Excellent.

It's entirely possible for a construct that has undefined behavior
according to the C standard to have its behavior defined elsewhere
(e.g., by POSIX).  When discussing C, using the phrase "undefined
behavior" with a meaning other than the C standard definition will
only cause confusion.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

#3689

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2025-08-25 22:13 -0400
Message-ID	<25-08-015@comp.compilers>
In reply to	#3686

David Brown <david.brown@hesbynett.no> writes:
> On 23/08/2025 00:11, Keith Thompson wrote:
...
>> David Brown <david.brown@hesbynett.no> writes:
>>> On 21/08/2025 21:53, Keith Thompson wrote:
>> [...]
>>> If you declare and call a function "foo" that is written in fully
>>> portable C code, but not part of the current translation unit being
>>> compiled (perhaps it has been separately compiled or included in a
>>> library), then it would be UB by the section 4 definition (since the C
>>> standards don't say anything about what "foo" does, nor does your code).
...
> The C standard does not define how this linking or combining is done - it
> only covers certain specific aspects of the linking that relate directly
> to C.  The behaviour of the function "foo" here is not defined in the C
> standards, and if the source code is not available when translating a
> different translation unit, the behaviour of "foo" is undefined.

I remember having an immensely frustrating discussion on this issue a
couple of decades ago.
If foo was written in fully portable C code, then that C code enables
the C standard to define what the behavior of that code is. If you lose
your last copy of the source code, you cannot confirm what that defined
behavior should be, but the behavior remains defined by the code that
has since gone missing.
The absence of that source code will make it hard to determine whether
the module can be safely linked to other modules, or to determine what
the defined behavior of the linked program should be - but if the
missing code said the right things to give the combined program defined
behavior, the implementation is still required to generate that behavior.
Not being able to determine what the standard-defined behavior of a
program should be, is for practical purposes precisely as useless as if
the behavior were undefined - but that doesn't make the behavior undefined.
And for that reason, I don't see much point in continuing to debate this
point. The last time I debated it, the discussion went on for many
months, and was not at all illuminating.

#3690

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2025-08-26 13:41 -0400
Message-ID	<25-08-016@comp.compilers>
In reply to	#3689

On 2025-08-25 22:13, James Kuyper wrote:
...
> I remember having an immensely frustrating discussion on this issue a
> couple of decades ago.

The discussion was on comp.std.c, the Subject: was "clrsc and UB", and
my participation in the discussion started 2002-02-05.
[Yeah, it's not like this is a new topic. -John]

#3691

From	Michael S <already5chosen@yahoo.com.dmarc.email>
Date	2025-08-26 22:28 +0300
Message-ID	<25-08-017@comp.compilers>
In reply to	#3690

On Tue, 26 Aug 2025 13:41:14 -0400
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

> On 2025-08-25 22:13, James Kuyper wrote:
> ...
> > I remember having an immensely frustrating discussion on this issue
> > a couple of decades ago.
>
> The discussion was on comp.std.c, the Subject: was "clrsc and UB", and
> my participation in the discussion started 2002-02-05.
> [Yeah, it's not like this is a new topic. -John]

Don't you mean "clrscr and UB" ?

#3693

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2025-08-26 16:53 -0400
Message-ID	<25-08-019@comp.compilers>
In reply to	#3691

On 2025-08-26 15:28, Michael S wrote:
> On Tue, 26 Aug 2025 13:41:14 -0400
> James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
...
>> The discussion was on comp.std.c, the Subject: was "clrsc and UB", and
>> my participation in the discussion started 2002-02-05.
>> [Yeah, it's not like this is a new topic. -John]
>
> Don't you mean "clrscr and UB" ?

I didn't think I could mess up something that short, so I typed it by
hand. :-(

#3678

From	anton@mips.complang.tuwien.ac.at
Date	2025-08-21 05:44 +0000
Message-ID	<25-08-004@comp.compilers>
In reply to	#3676

Martin Ward <mwardgkc@gmail.com> writes:
[actually, John Levine writes:]
>[When a language is 50 years old and there is a mountain of legacy code that
>they really don't want to break, it accumulates a lot of cruft.

But there is a very vocal group of people who argue that programs that
exercise undefined behaviour are already broken (and they often use
stronger words than that) and that compilers are allowed to (and
should) compile them to code that behaves differently than earlier
compilers that the new compiler supposedly is just a new version of.

So according to this argument, when something that the legacy code
does is declared as undefined behaviour, this breaks this program.

And the practice is that the people in C compiler maintenance reject
bug reports as RESOLVED INVALID when the code exercises undefined
behaviour, even when the code works as intended in earlier versions of
the compiler and when the breakage could be easily fixed (e.g., for
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66804> and
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709> by using movdqu
instead of movdqa).

But they do not always do so: The SATD function from the SPEC benchmark
464.h264ref exercises undefined behaviour, and a pre-release version
of gcc-4.8 generated code that did not behave as intended.  The
release version of gcc-4.8 compiled 464.h264ref as intended (but later
a similar case that was not in a SPEC program
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66875> was rejected as
RESOLVED INVALID).  When I brought this up, the reactions ranged from
flat-out denial that it ever happened (despite it being widely
publicized <https://lwn.net/Articles/544123/>) through a claim that
the "optimization" turned out to have no benefit (and yet the similar
case mentioned above still was "optimized" in a later gcc version) to
a statement along the lines that 464.h264ref is a relevant benchmark.

The last reaction seems to be the most plausible to me.  The people
working on the optimizers tend to evaluate their performance on a
number of benchmarks, i.e., "relevant benchmarks", and of course these
benchmarks must be compiled as intended, so that's what happens.  My
guess is that "relevant benchmarks" are industry standard benchmarks
like SPEC, but also programs coming from paying customers.

They also have their test suites of programs for regression testing,
and any behavioural change in these programs that is visible in this
regression testing probably leads to applying the optimization in a
less aggressive way.

How do tests get added into the regression test suite?  Ideally, if
somebody reports a case where a program behaves in one way in an
earlier version of the same compiler and differently in a later
version, that program and its original behaviour should usually be
added to the test suite
<https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>, but in gcc
this does not happen (see the bug reports linked to above).
Apparently gcc has some other criteria for adding programs to the test
suite.

So, is C still usable when you do not maintain one of those programs
that are considered to be relevant by C compiler maintainers?  My
experience is that the amount of breakage for the code I maintain has
been almost non-existent in the last 15 years.  A big part of that is
that we use lots of flags to tell the compiler that certain behaviour
is defined even if the C standard does not define it.  Currently we
try the following flags with the versions of gcc or clang that support
them:

-fno-gcse -fcaller-saves -fno-defer-pop -fno-inline -fwrapv
-fchar-unsigned -fno-strict-aliasing -fno-cse-follow-jumps
-fno-reorder-blocks -fno-reorder-blocks-and-partition
-fno-toplevel-reorder -fno-trigraphs -falign-labels=1 -falign-loops=1
-falign-jumps=1 -fno-delete-null-pointer-checks -fcf-protection=none
-fno-tree-vectorize -mllvm=--tail-dup-indirect-size=0

Some of these flags just disable certain transformations; in those
cases there is no flag for defining the language in the way that our
program relies on, but only the optimization transforms it in a way
that is contrary to our intentions.  In other cases, in particular
-fno-tree-vectorize, using the flag just avoids slowdowns from the
"optimization".
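
To illustrate the kind of behaviour a flag such as -fwrapv defines: signed
integer overflow is undefined in standard C, while -fwrapv tells gcc and
clang to treat it as wrapping.  Code that should not depend on the flag
can test for overflow before adding.  A minimal sketch; the function name
is hypothetical and not taken from Gforth:

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int.  The test itself never
   overflows, so it is well defined with or without -fwrapv. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    if (b < 0)
        return a < INT_MIN - b;   /* a + b would fall below INT_MIN */
    return false;                 /* adding zero never overflows */
}
```

A check written the other way round, such as "a + b < a" for positive b,
relies on wrapping and is the sort of code that needs -fwrapv to behave
as intended.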

Another big part of the lack of breakage experience is probably the
code in the regression tests of the compiler, whatever criteria
are used for including this code.  I.e., our code rides in the
slipstream of this code.

>On the other hand, there's the python approach in which they deprecate and
>remove little used and crufty features, but old python code doesn't work any
>more unless you go back and update it every year or two. -John]

Is it so bad with Python?  From what I read, after the huge problems
that Python had with migrating the existing code base from Python2 to
Python3 (where Python3 was intentionally not backwards compatible with
Python2), they had decided not to make backwards-incompatible changes
to the language in the future.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[python may say they're not making backwards-incompatible changes
but 3.13 will delete some modules I use from the standard library.
It's not a huge problem to find copies of them and install them
locally, but if I don't, my scripts will break. -John]

#3683

From	David Brown <david.brown@hesbynett.no>
Date	2025-08-22 18:42 +0200
Message-ID	<25-08-009@comp.compilers>
In reply to	#3678

On 21/08/2025 07:44, anton@mips.complang.tuwien.ac.at wrote:
> Martin Ward <mwardgkc@gmail.com> writes:
> [actually, John Levine writes:]
>> [When a language is 50 years old and there is a mountain of legacy code that
>> they really don't want to break, it accumulates a lot of cruft.
>
> But there is a very vocal group of people who argue that programs that
> exercise undefined behaviour are already broken (and they often use
> stronger words than that) and that compilers are allowed to (and
> should) compile them to code that behaves differently than earlier
> compilers that the new compiler supposedly is just a new version of.

Yes.

It is good that compilers often support ways to get the "old" behaviour
if the user wants.  But new compiler versions should not be held back by
the limitations of old compilers - that would stifle progress.  Imagine
if car manufacturers had to limit the speeds of new cars to 10 miles per
hour, because some drivers a century ago assumed that they could safely
put their foot flat on the accelerator without hitting the horse and
cart in front of them.

And also remember that broken code is not necessarily useless code.  For
programs of reasonable size, very few are completely bug-free.  And yet
we still manage to use them if they are good enough, despite being
imperfect.

>
> So according to this argument, when something that the legacy code
> does is declared as undefined behaviour, this breaks this program.
>
> And the practice is that the people in C compiler maintenance reject
> bug reports as RESOLVED INVALID when the code exercises undefined
> behaviour, even when the code works as intended in earlier versions of
> the compiler and when the breakage could be easily fixed (e.g., for
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66804> and
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709> by using movdqu
> instead of movdqa).
>

I don't see any problem with these being marked as "resolved invalid".

There is definitely a challenge in writing C code that is maximally
efficient on a wide variety of compilers - old and new, powerfully
optimising and weakly optimising, and for different target
architectures.  C code can't always be written in an ideal and fully
portable way.  This can be handled by using abstractions, compiler
detection and conditional compilation for things like block copies - use
unaligned non-conforming large moves if you know it is safe on a
particular implementation, and fall back to safe but possibly slow
memcpy() (or memmove()) in general.  That kind of solution, of course,
has its own disadvantages in development time, code complexity, testing,
etc.  The C programming world is not perfect.
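
The "safe but possibly slow" fallback might look like the following
sketch.  The function name is hypothetical; the commented-out line shows
the kind of pointer cast that is undefined when the address is not
suitably aligned.

```c
#include <stdint.h>
#include <string.h>

/* Undefined if p is not suitably aligned for uint32_t:
 *     uint32_t v = *(const uint32_t *)p;
 */

/* Well defined for any alignment of p: copy the bytes into a properly
   aligned object.  Compilers commonly turn a small fixed-size memcpy
   into a single load where the target allows it. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Whether the memcpy version is actually slower depends on the compiler and
target, which is why the conditional-compilation approach described above
needs measuring rather than assuming.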

But the solution is certainly /not/ to say that people writing correct
C code and compiling with high optimisations should get slower results
because someone else previously wrote code that made unwarranted and
unchecked assumptions about particular compilers and particular target
processors.

> But they do not always do so: The SATD function from the SPEC benchmark
> 464.h264ref exercises undefined behaviour, and a pre-release version
> of gcc-4.8 generated code that did not behave as intended.  The
> release version of gcc-4.8 compiled 464.h264ref as intended (but later
> a similar case that was not in a SPEC program
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66875> was rejected as
> RESOLVED INVALID).

So the gcc developers made an exception for a particularly important,
useful and common case?  Again that doesn't sound unreasonable to me.
Sometimes there are trade-offs - some code is more important than other
code.  The C compiler development world is not perfect either.

> When I brought this up, the reactions ranged from
> flat-out denial that it ever happened (despite it being widely
> publicized <https://lwn.net/Articles/544123/>) through a claim that
> the "optimization" turned out to have no benefit (and yet the similar
> case mentioned above still was "optimized" in a later gcc version) to
> a statement along the lines that 464.h264ref is a relevant benchmark.
>

Maybe this particular case was handled badly, or at least the
communications involved were bad.  It was over a decade ago, in a
pre-release candidate compiler.  (Pre-release candidates are used
precisely to check if changes cause trouble with real-world code.)  How
long are you going to hold a grudge about this?

> The last reaction seems to be the most plausible to me.  The people
> working on the optimizers tend to evaluate their performance on a
> number of benchmarks, i.e., "relevant benchmarks", and of course these
> benchmarks must be compiled as intended, so that's what happens.  My
> guess is that "relevant benchmarks" are industry standard benchmarks
> like SPEC, but also programs coming from paying customers.
>

For gcc, a typical benchmark is the Linux kernel and a wide selection of
open source programs - people do full rebuilds of whole Linux
distributions as part of testing before full compiler releases.  SPEC is
a special case, because these programs are used on a wide variety of C
compilers for comparison between toolchains, not just for one compiler.

> They also have their test suites of programs for regression testing,
> and any behavioural change in these programs that is visible in this
> regression testing probably leads to applying the optimization in a
> less aggressive way.
>

I would assume that they try to avoid UB in their test suite code
(though of course gcc developers can have bugs and mistakes like anyone
else).  Sometimes test suite code is fixed when new bugs are found in it.

> How do tests get added into the regression test suite?  Ideally, if
> somebody reports a case where a program behaves in one way in an
> earlier version of the same compiler and differently in a later
> version, that program and its original behaviour should usually be
> added to the test suite
> <https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>, but in gcc
> this does not happen (see the bug reports linked to above).

In what bizarre world would that be "ideal" ?

If you want gcc 4.8 without tree optimisations, you can get it.  If you
want to use gcc 15 and not enable tree optimisations, that's fine too.
If you want to write code that can be highly optimised with automatic
generation of vector instructions that only work on aligned data, don't
faff around going out of your way to write bad C code that messes with
pointer types to create unaligned accesses.  It's your choice.  Let the
rest of us that get our data alignment correct (and you get that
naturally in C - you only get the UB if you've played silly buggers with
pointer casts to write "smart" code) get faster results.

Examples can be added to the test suite if they are useful.  They can be
added to test new features - there is no point in a test to see if the
compiler generates code that matches an old compiler version unless
there is a specific new feature flag to give defined semantics matching
the old behaviour.  (An example of that would be the
"-fno-delete-null-pointer-checks" flag.)
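
For readers unfamiliar with that flag, a sketch of the pattern it is
about.  By default gcc may assume that a pointer which has already been
dereferenced is not null, and so may remove a null check that comes after
the dereference; -fno-delete-null-pointer-checks turns that assumption
off.  The function name below is hypothetical.

```c
#include <stddef.h>

/* Risky order: the dereference comes first, so the compiler may assume
 * p != NULL and drop the later test:
 *
 *     int v = *p;
 *     if (p == NULL) return fallback;
 *     return v;
 */

/* Order that does not depend on the flag: test before dereferencing. */
int read_or_default(const int *p, int fallback)
{
    if (p == NULL)
        return fallback;
    return *p;
}
```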

> Apparently gcc has some other criteria for adding programs to the test
> suite.
>
> So, is C still usable when you do not maintain one of those programs
> that are considered to be relevant by C compiler maintainers?  My
> experience is that the amount of breakage for the code I maintain has
> been almost non-existent in the last 15 years.  A big part of that is
> that we use lots of flags to tell the compiler that certain behaviour
> is defined even if the C standard does not define it.

That sounds like you have a solution to your problem.

> Currently we
> try the following flags with the versions of gcc or clang that support
> them:
>
> -fno-gcse -fcaller-saves -fno-defer-pop -fno-inline -fwrapv
> -fchar-unsigned -fno-strict-aliasing -fno-cse-follow-jumps
> -fno-reorder-blocks -fno-reorder-blocks-and-partition
> -fno-toplevel-reorder -fno-trigraphs -falign-labels=1 -falign-loops=1
> -falign-jumps=1 -fno-delete-null-pointer-checks -fcf-protection=none
> -fno-tree-vectorize -mllvm=--tail-dup-indirect-size=0
>

That sounds like you have code that uses a great deal of UB and relies
on a wide range of very specific code generation and semantics that are
not defined anywhere, in the C standards or compiler documentation.
Maybe that's what you need for your projects, and if it works for you,
fine.  But it is a very unusual situation, and cannot be extrapolated to
a non-negligible amount of C code.  (Some flags, such as "-fwrapv" and
"-fno-strict-aliasing", are needed to counter unwarranted assumptions in
a more significant body of existing C code.)
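
As an example of the assumption that "-fno-strict-aliasing" protects:
reading an object through a pointer to an incompatible type is undefined
in standard C, but copying its bytes is not.  A minimal sketch, assuming
float is a 32-bit IEEE 754 type (which C does not guarantee); the function
name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Undefined under the aliasing rules (works only by luck or with
 * -fno-strict-aliasing):
 *     uint32_t u = *(uint32_t *)&f;
 */

/* Well defined: copy the object representation into a uint32_t. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```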

> Some of these flags just disable certain transformations; in those
> cases there is no flag for defining the language in the way that our
> program relies on, but only the optimization transforms it in a way
> that is contrary to our intentions.  In other cases, in particular
> -fno-tree-vectorize, using the flag just avoids slowdowns from the
> "optimization".

You know better than the solid majority of programmers that
"optimisation" is as much an art as a science, and that getting the best
from a combination of code, compiler and target processor is no simple
task.  Compilers enable optimisations in groups (-O1, -O2, -O3, etc.)
based on what usually gives better results for a range of code bases and
a range of target devices - there are no guarantees for any particular
combination.

>
> Another big part of the lack of breakage experience is probably the
> code in the regression tests of the compiler, whatever criteria
> are used for including this code.  I.e., our code rides in the
> slipstream of this code.
>
>> On the other hand, there's the python approach in which they deprecate and
>> remove little used and crufty features, but old python code doesn't work any
>> more unless you go back and update it every year or two. -John]
>
> Is it so bad with Python?  From what I read, after the huge problems
> that Python had with migrating the existing code base from Python2 to
> Python3 (where Python3 was intentionally not backwards compatible with
> Python2), they had decided not to make backwards-incompatible changes
> to the language in the future.
>

IME there are only sometimes issues with new Python versions, but the
Python 2 to 3 incompatibilities are still a widespread problem.

#3697

From	anton@mips.complang.tuwien.ac.at
Date	2025-09-06 17:15 +0000
Message-ID	<25-09-003@comp.compilers>
In reply to	#3683

David Brown <david.brown@hesbynett.no> writes:
>On 21/08/2025 07:44, anton@mips.complang.tuwien.ac.at wrote:
>> Martin Ward <mwardgkc@gmail.com> writes:
>Imagine
>if car manufacturers had to limit the speeds of new cars to 10 miles per
>hour, because some drivers a century ago assumed that they could safely
>put their foot flat on the accelerator without hitting the horse and
>cart in front of them.

The latter assumption is wrong even with 10mph. If cars had been limited to
10mph, that would hopefully have prevented the kind of "progress" that is taking
more than 1M lives per year, every year. But that's a different discussion.

>> And the practice is that the people in C compiler maintenance reject
>> bug reports as RESOLVED INVALID when the code exercises undefined
>> behaviour, even when the code works as intended in earlier versions of
>> the compiler and when the breakage could be easily fixed (e.g., for
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66804> and
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709> by using movdqu
>> instead of movdqa).
...
>But the solution is certainly /not/ to say that people writing correct
>C code and compiling with high optimisations should get slower results
>because someone else previously wrote code that made unwarranted and
>unchecked assumptions about particular compilers and particular target
>processors.

Ah, yes, that claim, as usual without empirical support. I actually measured it
for such a claim made in
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709#c10>, and found that there
is no performance advantage on K10 and Core 2 (CPUs that were claimed to benefit
from movdqa), nor on Sandy Bridge, Haswell, or Skylake from using movdqa instead
of movdqu. The biggest speed difference in favour of MOVDQA was a factor of 1.0014
on Core 2, but there it would have been better to just use scalar code. Read all
about it at <http://www.complang.tuwien.ac.at/anton/autovectors/>.

>> But they do not always do so: The SATD function from the SPEC benchmark
>> 464.h264ref exercises undefined behaviour, and a pre-release version
>> of gcc-4.8 generated code that did not behave as intended.  The
>> release version of gcc-4.8 compiled 464.h264ref as intended (but later
>> a similar case that was not in a SPEC program
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66875> was rejected as
>> RESOLVED INVALID).
>
>So the gcc developers made an exception for a particularly important,
>useful and common case?

No, they made an exception for a benchmark.

>> When I brought this up, the reactions ranged from
>> flat-out denial that it ever happened (despite it being widely
>> publicized <https://lwn.net/Articles/544123/>) through a claim that
>> the "optimization" turned out to have no benefit (and yet the similar
>> case mentioned above still was "optimized" in a later gcc version) to
>> a statement along the lines that 464.h264ref is a relevant benchmark.
>>
>
>Maybe this particular case was handled badly, or at least the
>communications involved were bad.  It was over a decade ago, in a
>pre-release candidate compiler.  (Pre-release candidates are used
>precisely to check if changes cause trouble with real-world code.)  How
>long are you going to hold a grudge about this?

Have they changed their attitude since then? If not, why should it matter
that this was over ten years ago?

>> They also have their test suites of programs for regression testing,
>> and any behavioural change in these programs that is visible in this
>> regression testing probably leads to applying the optimization in a
>> less aggressive way.
>>
>
>I would assume that they try to avoid UB in their test suite code
>(though of course gcc developers can have bugs and mistakes like anyone
>else).

Throwing out all programs with undefined behaviour from their test suite would
probably reduce the test suite a lot, and would ensure that regressions like not
compiling the Linux kernel as intended would reappear all the time.

>> How do tests get added into the regression test suite?  Ideally, if
>> somebody reports a case where a program behaves in one way in an
>> earlier version of the same compiler and differently in a later
>> version, that program and its original behaviour should usually be
>> added to the test suite
>> <https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>, but in gcc
>> this does not happen (see the bug reports linked to above).
>
>In what bizarre world would that be "ideal" ?

In a world where an existing program that works as intended on one version of
the compiler is expected to work on later versions of the compiler.

>> In other cases, in particular
>> -fno-tree-vectorize, using the flag just avoids slowdowns from the
>> "optimization".
>
>You know better than the solid majority of programmers that
>"optimisation" is as much an art as a science

I always thought optimization was engineering. Anyway, to actually back up my
claim with numbers (unlike the handwaving that usually goes along with claims of
speedups from assuming that C programs don't perform undefined behaviour),
here's some data.

The measurements were done with Gforth commit
4224ab5fafea970dade64b04493ef690da8b3c32 compiled and run on Debian 12
(gcc-12.2.0), and run on core 1 of a Ryzen 8700G (Zen4 ~5GHz).  Two
versions were measured:

gforth-fast-no-tree-vectorize is the gforth-fast built by default.

gforth-fast-tree-vectorize is built by removing "no-tree-vectorize"
from configure.ac and rebuilding from scratch.

Here are numbers from running "gforth-fast-... onebench.fs".  The
numbers are times in seconds.

 sieve bubble matrix   fib   fft
 0.020  0.021  0.011 0.029 0.014 gforth-fast-no-tree-vectorize
 0.365  0.369  0.348 0.435 0.184 gforth-fast-tree-vectorize

So that's slowdown factors of 13.1-31.6 from using tree-vectorize.
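
The range can be recomputed from the table with a few lines of C.  This
is only a sketch for checking the arithmetic; the array and function
names are made up, and the times are copied from the two rows above.

```c
#include <stddef.h>

#define N_BENCH 5  /* sieve, bubble, matrix, fib, fft */

/* Times in seconds, copied from the table. */
static const double t_no_vec[N_BENCH] = {0.020, 0.021, 0.011, 0.029, 0.014};
static const double t_vec[N_BENCH]    = {0.365, 0.369, 0.348, 0.435, 0.184};

/* Stores the smallest and largest per-benchmark ratio slow[i]/base[i]. */
void slowdown_range(const double *base, const double *slow, size_t n,
                    double *lo, double *hi)
{
    *lo = *hi = slow[0] / base[0];
    for (size_t i = 1; i < n; i++) {
        double r = slow[i] / base[i];
        if (r < *lo) *lo = r;
        if (r > *hi) *hi = r;
    }
}
```

The smallest ratio comes from fft (0.184/0.014) and the largest from
matrix (0.348/0.011).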

Where is that coming from?

The first thing I notice is that gforth-fast-tree-vectorize sanity checks the
code produced by gcc and decides to disable dynamic code generation and all the
optimizations that build on that. So let's disable that for
gforth-fast-no-tree-vectorize, too:

 sieve bubble matrix   fib   fft
 0.020  0.021  0.011 0.029 0.014 gforth-fast-no-tree-vectorize
 0.145  0.134  0.120 0.145 0.057 gforth-fast-no-tree-vectorize --no-dynamic
 0.365  0.369  0.348 0.435 0.184 gforth-fast-tree-vectorize

So -ftree-vectorize achieves a slowdown factor of 4.1-10.9 by disabling Gforth's
dynamic code generation, and a slowdown by a factor of 2.5-3.2 beyond that. Where
does the latter come from? Let's look at the Forth word "@", which loads a cell
(a machine word) from memory:

For gforth-fast-no-tree-vectorize --no-dynamic

' disasm-gdb is discode  ok
see @
Code @
   0x0000558ff440e50f <gforth_engine2+6927>:    add    $0x8,%rbx
   0x0000558ff440e513 <gforth_engine2+6931>:    mov    0x0(%r13),%r13
   0x0000558ff440e517 <gforth_engine2+6935>:    mov    (%rbx),%rax
   0x0000558ff440e51a <gforth_engine2+6938>:    jmp    *%rax
end-code

The second instruction does the actual work, the rest is threaded-code
dispatch (optimized away in typical code if dynamic code generation is
enabled).

Now with gforth-fast-tree-vectorize:

Code @
   0x000055aa501f75e6 <gforth_engine2+11238>:   add    $0x8,%rbx
   0x000055aa501f75ea <gforth_engine2+11242>:   mov    (%r8),%rcx
   0x000055aa501f75ed <gforth_engine2+11245>:   mov    (%rbx),%rax
   0x000055aa501f75f0 <gforth_engine2+11248>:   mov    %r14,0x8(%rsp)
   0x000055aa501f75f5 <gforth_engine2+11253>:   mov    %rax,%r11
   0x000055aa501f75f8 <gforth_engine2+11256>:   mov    %r15,%r9
   0x000055aa501f75fb <gforth_engine2+11259>:   mov    %rcx,0x10(%rsp)
   0x000055aa501f7600 <gforth_engine2+11264>:   jmp    0x55aa501f4a99 <gforth_engine2+153>
end-code
0x55aa501f4a99 56 discode
   0x000055aa501f4a99 <gforth_engine2+153>:     movq   0x8(%rsp),%xmm0
   0x000055aa501f4a9f <gforth_engine2+159>:     movq   %r9,%xmm1
   0x000055aa501f4aa4 <gforth_engine2+164>:     movhps 0x8(%rsp),%xmm1
   0x000055aa501f4aa9 <gforth_engine2+169>:     movhps 0x10(%rsp),%xmm0
   0x000055aa501f4aae <gforth_engine2+174>:     movhlps %xmm0,%xmm5
   0x000055aa501f4ab1 <gforth_engine2+177>:     movq   %xmm0,%r14
   0x000055aa501f4ab6 <gforth_engine2+182>:     movq   %xmm1,%r15
   0x000055aa501f4abb <gforth_engine2+187>:     movhps %xmm1,0x18(%rsp)
   0x000055aa501f4ac0 <gforth_engine2+192>:     movq   %xmm5,%r8
   0x000055aa501f4ac5 <gforth_engine2+197>:     mov    %r15,%rdi
   0x000055aa501f4ac8 <gforth_engine2+200>:     mov    %r14,%rsi
   0x000055aa501f4acb <gforth_engine2+203>:     mov    %r8,%rcx
   0x000055aa501f4ace <gforth_engine2+206>:     jmp    *%r11

GCC produced similar code in the 3.x timeframe without auto-vectorization, but
they eventually managed to fix that. My guess at what is happening here is that
the auto-vectorizer tries to vectorize accesses to adjacent memory locations
somewhere in gforth_engine2(), this reduces the precision of the liveness
tracking, resulting in all these register-register and register-memory moves,
and they migrate from the original places to the shared indirect jump that gcc
internally introduces for all the occurrences of "goto *" in the source code.
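
For readers who have not seen threaded code, here is a toy sketch of the
"goto *" dispatch technique.  It is not Gforth's engine: the three
instructions, the function name and the stack size are invented for
illustration, and it needs GNU C (gcc or clang) because labels as values
are an extension.

```c
#include <stdint.h>

/* Toy direct-threaded interpreter: computes a + b. */
long run_threaded(long a, long b)
{
    /* Threaded code: each cell holds either the address of the label
       that implements an instruction, or an inline operand. */
    void *prog[] = {
        &&do_lit, (void *)(intptr_t)a,
        &&do_lit, (void *)(intptr_t)b,
        &&do_add,
        &&do_halt
    };
    void **ip = prog;   /* instruction pointer into the threaded code */
    long stack[8];
    long *sp = stack;   /* points at the next free stack slot */

/* Dispatch: fetch the next cell and jump to the code it names. */
#define NEXT goto *(*ip++)

    NEXT;

do_lit:                 /* push the inline operand */
    *sp++ = (long)(intptr_t)*ip++;
    NEXT;
do_add:                 /* replace the top two items by their sum */
    sp--;
    sp[-1] += sp[0];
    NEXT;
do_halt:
    return sp[-1];
#undef NEXT
}
```

Each NEXT here is a separate indirect jump in the source.  The shared
indirect jump described above is what results when the compiler merges
all of them into one.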

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
