
Re: Undefined behaviour in C23

Path	csiph.com!weretis.net!feeder9.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From	anton@mips.complang.tuwien.ac.at
Newsgroups	comp.compilers
Subject	Re: Undefined behaviour in C23
Date	Sat, 06 Sep 2025 17:15:18 +0000
Organization	Compilers Central
Sender	johnl%iecc.com
Approved	comp.compilers@iecc.com
Message-ID	<25-09-003@comp.compilers>
References	<25-08-002@comp.compilers> <25-08-004@comp.compilers> <25-08-009@comp.compilers>
MIME-Version	1.0
Content-Type	text/plain; charset="UTF-8"
Injection-Info	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="65878"; mail-complaints-to="abuse@iecc.com"
Keywords	C, standards
Posted-Date	06 Sep 2025 18:53:08 EDT
X-submission-address	compilers@iecc.com
X-moderator-address	compilers-request@iecc.com
X-FAQ-and-archives	http://compilers.iecc.com
Xref	csiph.com comp.compilers:3697


David Brown <david.brown@hesbynett.no> writes:
>On 21/08/2025 07:44, anton@mips.complang.tuwien.ac.at wrote:
>> Martin Ward <mwardgkc@gmail.com> writes:
>Imagine
>if car manufacturers had to limit the speeds of new cars to 10 miles per
>hour, because some drivers a century ago assumed that they could safely
>put their foot flat on the accelerator without hitting the horse and
>cart in front of them.

The latter assumption is wrong even at 10mph. If cars had been limited to
10mph, that would hopefully have prevented the kind of "progress" that is taking
more than 1M lives per year, every year. But that's a different discussion.

>> And the practice is that the people in C compiler maintenance reject
>> bug reports as RESOLVED INVALID when the code exercises undefined
>> behaviour, even when the code works as intended in earlier versions of
>> the compiler and when the breakage could be easily fixed (e.g., for
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66804> and
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709> by using movdqu
>> instead of movdqa).
...
>But the solution is certainly /not/ to say that people writing correct
>C code and compiling with high optimisations should get slower results
>because someone else previously wrote code that made unwarranted and
>unchecked assumptions about particular compilers and particular target
>processors.

Ah, yes, that claim, as usual without empirical support. I actually measured it
for such a claim made in
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709#c10>, and found that there
is no performance advantage on K10 and Core 2 (CPUs that were claimed to benefit
from movdqa), nor on Sandy Bridge, Haswell, or Skylake from using movdqa instead
of movdqu. The biggest speed difference in favour of movdqa was a factor of 1.0014
on Core 2, but there it would have been better to just use scalar code. Read all
about it at <http://www.complang.tuwien.ac.at/anton/autovectors/>.
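
To make the two instructions concrete, here is a minimal C sketch using the SSE2 intrinsics that typically compile to movdqu and movdqa. It is not the code from the measurements linked above; the function names (load_any, load_aligned16, sum_lanes, sum4_unaligned, sum4_aligned) are made up for this illustration, and the non-SSE2 fallback is only there so that the sketch builds on other targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Load 16 bytes from any address; typically compiles to movdqu. */
static __m128i load_any(const void *p)
{
    return _mm_loadu_si128((const __m128i *)p);
}

/* Load 16 bytes from a 16-byte-aligned address; typically compiles to
   movdqa, which faults if the address is not aligned. */
static __m128i load_aligned16(const void *p)
{
    assert(((uintptr_t)p & 15) == 0);
    return _mm_load_si128((const __m128i *)p);
}

/* Add up the four 32-bit lanes of a vector. */
static uint32_t sum_lanes(__m128i v)
{
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

uint32_t sum4_unaligned(const void *p) { return sum_lanes(load_any(p)); }
uint32_t sum4_aligned(const void *p)   { return sum_lanes(load_aligned16(p)); }

#else  /* no SSE2: portable fallback so that the sketch still builds */

uint32_t sum4_unaligned(const void *p)
{
    uint32_t lanes[4];
    memcpy(lanes, p, sizeof lanes);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

uint32_t sum4_aligned(const void *p)
{
    assert(((uintptr_t)p & 15) == 0);
    return sum4_unaligned(p);
}

#endif
```

sum4_unaligned() accepts any address, while sum4_aligned() is only valid for 16-byte-aligned addresses. The dispute in the bug reports is about which of the two loads the compiler should emit when it vectorizes scalar code that accesses memory through a misaligned pointer.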

>> But they do not always do so: The SATD function from the SPEC benchmark
>> 464.h264ref exercises undefined behaviour, and a pre-release version
>> of gcc-4.8 generated code that did not behave as intended.  The
>> release version of gcc-4.8 compiled 464.h264ref as intended (but later
>> a similar case that was not in a SPEC program
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66875> was rejected as
>> RESOLVED INVALID).
>
>So the gcc developers made an exception for a particularly important,
>useful and common case?

No, they made an exception for a benchmark.

>> When I brought this up, the reactions ranged from
>> flat-out denial that it ever happened (despite it being widely
>> publicized <https://lwn.net/Articles/544123/>) through a claim that
>> the "optimization" turned out to have no benefit (and yet the similar
>> case mentioned above still was "optimized" in a later gcc version) to
>> a statement along the lines that 464.h264ref is a relevant benchmark.
>>
>
>Maybe this particular case was handled badly, or at least the
>communications involved were bad.  It was over a decade ago, in a
>pre-release candidate compiler.  (Pre-release candidates are used
>precisely to check if changes cause trouble with real-world code.)  How
>long are you going to hold a grudge about this?

Have they changed their attitude since then? If not, why should it matter
that this was over ten years ago?

>> They also have their test suites of programs for regression testing,
>> and any behavioural change in these programs that is visible in this
>> regression testing probably leads to applying the optimization in a
>> less aggressive way.
>>
>
>I would assume that they try to avoid UB in their test suite code
>(though of course gcc developers can have bugs and mistakes like anyone
>else).

Throwing out all programs with undefined behaviour from their test suite would
probably reduce the test suite a lot, and would ensure that regressions like not
compiling the Linux kernel as intended would reappear all the time.

>> How do tests get added into the regression test suite?  Ideally, if
>> somebody reports a case where a program behaves in one way in an
>> earlier version of the same compiler and differently in a later
>> version, that program and its original behaviour should usually be
>> added to the test suite
>> <https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>, but in gcc
>> this does not happen (see the bug reports linked to above).
>
>In what bizarre world would that be "ideal" ?

In a world where an existing program that works as intended on one version of
the compiler is expected to work on later versions of the compiler.

>> In other cases, in particular
>> -fno-tree-vectorize, using the flag just avoids slowdowns from the
>> "optimization".
>
>You know better than the solid majority of programmers that
>"optimisation" is as much an art as a science

I always thought optimization was engineering. Anyway, to actually back up my
claim with numbers (unlike the handwaving that usually goes along with claims of
speedups from assuming that C programs don't perform undefined behaviour),
here's some data.

The measurements were done with Gforth commit
4224ab5fafea970dade64b04493ef690da8b3c32, compiled and run on Debian 12
(gcc-12.2.0), on core 1 of a Ryzen 8700G (Zen4 ~5GHz).  Two
versions were measured:

gforth-fast-no-tree-vectorize is the gforth-fast built by default.

gforth-fast-tree-vectorize is built by removing "no-tree-vectorize"
from configure.ac and rebuilding from scratch.

Here are numbers from running "gforth-fast-... onebench.fs".  The
numbers are times in seconds.

 sieve bubble matrix   fib   fft
 0.020  0.021  0.011 0.029 0.014 gforth-fast-no-tree-vectorize
 0.365  0.369  0.348 0.435 0.184 gforth-fast-tree-vectorize

So that's slowdown factors of 13.1-31.6 from using tree-vectorize.
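
As a cross-check of the arithmetic, here is a small C sketch that recomputes the per-benchmark slowdown factors. The timings are copied from the table above; the array and function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { NBENCH = 5 };  /* sieve, bubble, matrix, fib, fft */

/* Times in seconds, copied from the table above. */
static const double t_no_vectorize[NBENCH] = {0.020, 0.021, 0.011, 0.029, 0.014};
static const double t_vectorize[NBENCH]    = {0.365, 0.369, 0.348, 0.435, 0.184};

/* Slowdown factor of the tree-vectorize build for benchmark i. */
double slowdown(size_t i)
{
    assert(i < NBENCH);
    return t_vectorize[i] / t_no_vectorize[i];
}

/* Smallest and largest slowdown factor over all benchmarks. */
void slowdown_range(double *min, double *max)
{
    *min = *max = slowdown(0);
    for (size_t i = 1; i < NBENCH; i++) {
        double s = slowdown(i);
        if (s < *min) *min = s;
        if (s > *max) *max = s;
    }
}
```

slowdown_range() yields roughly 13.1 (fft) and 31.6 (matrix), which are the two ends of the range quoted above.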

Where is that coming from?

The first thing I notice is that gforth-fast-tree-vectorize sanity checks the
code produced by gcc and decides to disable dynamic code generation and all the
optimizations that build on that. So let's disable that for
gforth-fast-no-tree-vectorize, too:

 sieve bubble matrix   fib   fft
 0.020  0.021  0.011 0.029 0.014 gforth-fast-no-tree-vectorize
 0.145  0.134  0.120 0.145 0.057 gforth-fast-no-tree-vectorize --no-dynamic
 0.365  0.369  0.348 0.435 0.184 gforth-fast-tree-vectorize

So -ftree-vectorize achieves a slowdown factor of 4.1-10.9 by disabling Gforth's
dynamic code generation, and a slowdown by a factor of 2.5-3.2 beyond that. Where
does the latter come from? Let's look at the Forth word "@", which loads a cell
(a machine word) from memory:

For gforth-fast-no-tree-vectorize --no-dynamic

' disasm-gdb is discode  ok
see @
Code @
   0x0000558ff440e50f <gforth_engine2+6927>:    add    $0x8,%rbx
   0x0000558ff440e513 <gforth_engine2+6931>:    mov    0x0(%r13),%r13
   0x0000558ff440e517 <gforth_engine2+6935>:    mov    (%rbx),%rax
   0x0000558ff440e51a <gforth_engine2+6938>:    jmp    *%rax
end-code

The second instruction does the actual work, the rest is threaded-code
dispatch (optimized away in typical code if dynamic code generation is
enabled).
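
For readers not familiar with threaded code, the dispatch pattern can be sketched in GNU C with labels as values (the "goto *" extension, supported by gcc and clang). This is a toy under stated assumptions, not Gforth's actual engine: the function name run_fetch_demo, the NEXT macro as written here, and the small fixed program are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t cell;

/* Runs a fixed threaded-code program equivalent to the Forth sequence
   "addr @ 1 +" and returns the resulting top of stack. */
cell run_fetch_demo(cell *addr)
{
    /* Threaded code: each slot is either the address of a primitive's
       label or an inline operand of the preceding "lit". */
    void *code[] = {
        &&lit, (void *)addr,     /* push addr     */
        &&fetch,                 /* @             */
        &&lit, (void *)(cell)1,  /* push 1        */
        &&add,                   /* +             */
        &&halt
    };
    void **ip = code;        /* instruction pointer into the threaded code */
    cell stack[8];
    cell *sp = stack + 8;    /* data stack, grows downward */
    cell tos = 0;            /* top of stack, kept in a local variable */

    assert(addr != NULL);

/* Dispatch: load the next primitive's address, advance ip, jump to it. */
#define NEXT goto **ip++

    NEXT;
lit:
    *--sp = tos;             /* spill the old top of stack */
    tos = (cell)*ip++;       /* fetch the inline operand */
    NEXT;
fetch:
    tos = *(cell *)tos;      /* the actual work of "@" */
    NEXT;
add:
    tos += *sp++;
    NEXT;
halt:
    return tos;
#undef NEXT
}
```

With a cell x holding 41, run_fetch_demo(&x) returns 42. In the disassembly above, the statement at the fetch label corresponds roughly to the single mov through %r13, and NEXT corresponds roughly to the remaining add/mov/jmp instructions.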

Now with gforth-fast-tree-vectorize:

Code @
   0x000055aa501f75e6 <gforth_engine2+11238>:   add    $0x8,%rbx
   0x000055aa501f75ea <gforth_engine2+11242>:   mov    (%r8),%rcx
   0x000055aa501f75ed <gforth_engine2+11245>:   mov    (%rbx),%rax
   0x000055aa501f75f0 <gforth_engine2+11248>:   mov    %r14,0x8(%rsp)
   0x000055aa501f75f5 <gforth_engine2+11253>:   mov    %rax,%r11
   0x000055aa501f75f8 <gforth_engine2+11256>:   mov    %r15,%r9
   0x000055aa501f75fb <gforth_engine2+11259>:   mov    %rcx,0x10(%rsp)
   0x000055aa501f7600 <gforth_engine2+11264>:   jmp    0x55aa501f4a99 <gforth_engine2+153>
end-code
0x55aa501f4a99 56 discode
   0x000055aa501f4a99 <gforth_engine2+153>:     movq   0x8(%rsp),%xmm0
   0x000055aa501f4a9f <gforth_engine2+159>:     movq   %r9,%xmm1
   0x000055aa501f4aa4 <gforth_engine2+164>:     movhps 0x8(%rsp),%xmm1
   0x000055aa501f4aa9 <gforth_engine2+169>:     movhps 0x10(%rsp),%xmm0
   0x000055aa501f4aae <gforth_engine2+174>:     movhlps %xmm0,%xmm5
   0x000055aa501f4ab1 <gforth_engine2+177>:     movq   %xmm0,%r14
   0x000055aa501f4ab6 <gforth_engine2+182>:     movq   %xmm1,%r15
   0x000055aa501f4abb <gforth_engine2+187>:     movhps %xmm1,0x18(%rsp)
   0x000055aa501f4ac0 <gforth_engine2+192>:     movq   %xmm5,%r8
   0x000055aa501f4ac5 <gforth_engine2+197>:     mov    %r15,%rdi
   0x000055aa501f4ac8 <gforth_engine2+200>:     mov    %r14,%rsi
   0x000055aa501f4acb <gforth_engine2+203>:     mov    %r8,%rcx
   0x000055aa501f4ace <gforth_engine2+206>:     jmp    *%r11

GCC produced similar code in the 3.x timeframe without auto-vectorization, but
they eventually managed to fix that. My guess at what is happening here is that
the auto-vectorizer tries to vectorize accesses to adjacent memory locations
somewhere in gforth_engine2(). This reduces the precision of the liveness
tracking, resulting in all these register-register and register-memory moves,
which then migrate from their original places to the shared indirect jump that gcc
internally introduces for all the occurrences of "goto *" in the source code.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/


