| Path | csiph.com!weretis.net!feeder9.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end |
|---|---|
| From | anton@mips.complang.tuwien.ac.at |
| Newsgroups | comp.compilers |
| Subject | Re: Undefined behaviour in C23 |
| Date | Sat, 06 Sep 2025 17:15:18 +0000 |
| Organization | Compilers Central |
| Sender | johnl%iecc.com |
| Approved | comp.compilers@iecc.com |
| Message-ID | <25-09-003@comp.compilers> (permalink) |
| References | <25-08-002@comp.compilers> <25-08-004@comp.compilers> <25-08-009@comp.compilers> |
| MIME-Version | 1.0 |
| Content-Type | text/plain; charset="UTF-8" |
| Injection-Info | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="65878"; mail-complaints-to="abuse@iecc.com" |
| Keywords | C, standards |
| Posted-Date | 06 Sep 2025 18:53:08 EDT |
| X-submission-address | compilers@iecc.com |
| X-moderator-address | compilers-request@iecc.com |
| X-FAQ-and-archives | http://compilers.iecc.com |
| Xref | csiph.com comp.compilers:3697 |
David Brown <david.brown@hesbynett.no> writes:
>On 21/08/2025 07:44, anton@mips.complang.tuwien.ac.at wrote:
>> Martin Ward <mwardgkc@gmail.com> writes:
>Imagine
>if car manufacturers had to limit the speeds of new cars to 10 miles per
>hour, because some drivers a century ago assumed that they could safely
>put their foot flat on the accelerator without hitting the horse and
>cart in front of them.

The latter assumption is wrong even with 10mph. If cars had been
limited to 10mph, that would hopefully have prevented the kind of
"progress" that is taking >1M lives per year, every year. But that's
a different discussion.

>> And the practice is that the people in C compiler maintenance reject
>> bug reports as RESOLVED INVALID when the code exercises undefined
>> behaviour, even when the code works as intended in earlier versions of
>> the compiler and when the breakage could be easily fixed (e.g., for
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66804> and
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709> by using movdqu
>> instead of movdqa).
...
>But the solution is certainly /not/ to say that people everyone correct
>C code and compiling with high optimisations should get slower results
>because someone else previously wrote code that made unwarranted and
>unchecked assumptions about particular compilers and particular target
>processors.

Ah, yes, that claim, as usual without empirical support. I actually
measured it for such a claim made in
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709#c10>, and found
that there is no performance advantage on K10 and Core 2 (CPUs that
were claimed to benefit from movdqa), nor on Sandy Bridge, Haswell, or
Skylake from using movdqa instead of movdqu. The biggest speed
difference in favour of MOVDQA was a factor 1.0014 on Core 2, but
there it would have been better to just use scalar code. Read all
about it at <http://www.complang.tuwien.ac.at/anton/autovectors/>.
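For readers unfamiliar with the distinction: movdqa requires its memory operand to be 16-byte aligned and faults otherwise, while movdqu accepts any address. At the C level, the way to express a possibly unaligned access with defined behaviour is memcpy. The following is only a minimal sketch of that idiom; the helper names are made up for illustration and are not taken from the bug reports or from the measurements on the linked page.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a 64-bit value from an address that need
   not be 8-byte aligned.  memcpy is defined for any alignment;
   compilers commonly turn it into a single load on x86-64, but that
   is an expectation, not a guarantee. */
static uint64_t load_u64_unaligned(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: sum n 64-bit values starting at an arbitrary
   byte address.  Every access goes through memcpy, so the compiler
   has no licence to assume that p is aligned when it vectorizes. */
static uint64_t sum_u64_unaligned(const unsigned char *p, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += load_u64_unaligned(p + i * sizeof(uint64_t));
    return sum;
}
```

Calling sum_u64_unaligned(buf + 1, 4) on a byte buffer is well defined, whereas casting buf + 1 to uint64_t * and dereferencing it is not.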
>> But they not always do so: The SATD function from the SPEC benchmark
>> 464.h264ref exercises undefined behaviour, and a pre-release version
>> of gcc-4.8 generated code that did not behave as intended. The
>> release version of gcc-4.8 compiled 464.h264ref as intended (but later
>> a similar case that was not in a SPEC program
>> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66875> was rejected as
>> RESOLVED INVALID).
>
>So the gcc developers made an exception for a particularly important,
>useful and common case?

No, they made an exception for a benchmark.

>> When I brought this up, the reactions reached from
>> flat-out denial that it ever happened (despite it being widely
>> publicized <https://lwn.net/Articles/544123/>) through a claim that
>> the "optimization" turned out to have no benefit (and yet the similar
>> case mentioned above still was "optimized" in a later gcc version) to
>> a statement along the lines that 464.h264ref is a relevant benchmark.
>>
>
>Maybe this particular case was handled badly, or at least the
>communications involved were bad. It was over a decade ago, in a
>pre-release candidate compiler. (Pre-release candidates are used
>precisely to check if changes cause trouble with real-world code.) How
>long are you going to hold a grudge about this?

Have they changed their attitude since then? If not, why should it
matter that this was over ten years ago?

>> They also have their test suites of programs for regression testing,
>> and any behavioural change in these programs that is visible in this
>> regression testing probably leads to applying the optimization in a
>> less aggressive way.
>>
>
>I would assume that they try to avoid UB in their test suite code
>(though of course gcc developers can have bugs and mistakes like anyone
>else).

Throwing out all programs with undefined behaviour from their test
suite would probably reduce the test suite a lot, and would ensure
that regressions like not compiling the Linux kernel as intended would
reappear all the time.

>> How do tests get added into the regression test suite? Ideally, if
>> somebody reports a case where a program behaves in one way in an
>> earlier version of the same compiler and differently in a later
>> version, that program and its original behaviour should usually be
>> added to the test suite
>> <https://www.complang.tuwien.ac.at/papers/ertl17kps.pdf>, but in gcc
>> this does not happen (see the bug reports linked to above).
>
>In what bizarre world would that be "ideal" ?

In a world where an existing program that works as intended on one
version of the compiler is expected to work on later versions of the
compiler.

>> In other cases, in particular
>> -fno-tree-vectorize, using the flag just avoids slowdowns from the
>> "optimization".
>
>You know better than the solid majority of programmers that
>"optimisation" is as much an art as a science

I always thought optimization was engineering.

Anyway, to actually back up my claim with numbers (unlike the
handwaving that usually goes along with claims of speedups from
assuming that C programs don't perform undefined behaviour), here's
some data. The measurements were done with Gforth commit
4224ab5fafea970dade64b04493ef690da8b3c32 compiled and run on Debian 12
(gcc-12.2.0), and run on core 1 of a Ryzen 8700G (Zen4 ~5GHz). Two
versions were measured:

gforth-fast-no-tree-vectorize is the gforth-fast built by default.

gforth-fast-tree-vectorize is built by removing "no-tree-vectorize"
from configure.ac and rebuilding from scratch.

Here are numbers from running "gforth-fast-... onebench.fs". The
numbers are times in seconds.

 sieve bubble matrix   fib   fft
 0.020  0.021  0.011 0.029 0.014 gforth-fast-no-tree-vectorize
 0.365  0.369  0.348 0.435 0.184 gforth-fast-tree-vectorize

So that's slowdown factors of 13.1-31.6 from using tree-vectorize.
Where is that coming from? The first thing I notice is that
gforth-fast-tree-vectorize sanity checks the code produced by gcc and
decides to disable dynamic code generation and all the optimizations
that build on that. So let's disable that for
gforth-fast-no-tree-vectorize, too:

 sieve bubble matrix   fib   fft
 0.020  0.021  0.011 0.029 0.014 gforth-fast-no-tree-vectorize
 0.145  0.134  0.120 0.145 0.057 gforth-fast-no-tree-vectorize --no-dynamic
 0.365  0.369  0.348 0.435 0.184 gforth-fast-tree-vectorize

So -ftree-vectorize achieves a slowdown factor of 4.1-10.9 by
disabling Gforth's dynamic code generation, and a slowdown by a factor
2.5-3.2 beyond that. Where does the latter come from? Let's look at
the Forth word "@", which loads a cell (a machine word) from memory:

For gforth-fast-no-tree-vectorize --no-dynamic

' disasm-gdb is discode ok
see @
Code @
0x0000558ff440e50f <gforth_engine2+6927>:    add    $0x8,%rbx
0x0000558ff440e513 <gforth_engine2+6931>:    mov    0x0(%r13),%r13
0x0000558ff440e517 <gforth_engine2+6935>:    mov    (%rbx),%rax
0x0000558ff440e51a <gforth_engine2+6938>:    jmp    *%rax
end-code

The second instruction does the actual work, the rest is threaded-code
dispatch (optimized away in typical code if dynamic code generation is
enabled).
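For readers who have not seen threaded code, here is a minimal sketch of direct-threaded dispatch using the GNU C labels-as-values extension ("goto *"). It is a toy engine written for illustration, not Gforth's engine; the names engine, OP_LIT, OP_FETCH, OP_ADD, OP_HALT and NEXT are made up, and it needs gcc or clang because labels as values are not standard C. The body of the fetch primitive plays the role of the single instruction that does the actual work in the listing above; NEXT is the dispatch.

```c
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Cell;               /* one machine word */

enum { OP_LIT, OP_FETCH, OP_ADD, OP_HALT, OP_COUNT };

/* Hypothetical miniature engine.  Called with prog == NULL it only
   returns its table of label addresses, so that a caller can build
   threaded code.  Otherwise it runs prog and stores the top of stack
   in *result when it reaches the halt primitive. */
static void **engine(void **prog, Cell *result)
{
    static void *labels[OP_COUNT] = { &&lit, &&fetch, &&add, &&halt };
    Cell stack[16];
    Cell *sp = stack;                /* next free stack slot */
    Cell tos = 0;                    /* top of stack, kept in a variable */
    void **ip = prog;                /* threaded-code instruction pointer */

    if (prog == NULL)
        return labels;

/* Dispatch: fetch the next code address and jump to it. */
#define NEXT goto **ip++

    NEXT;

lit:                                 /* push the inline operand */
    *sp++ = tos;
    tos = (Cell)*ip++;
    NEXT;
fetch:                               /* like Forth "@": address -> contents */
    tos = *(Cell *)tos;
    NEXT;
add:                                 /* add the two topmost cells */
    tos += *--sp;
    NEXT;
halt:
    *result = tos;
    return labels;
}
```

A program is an array of label addresses with inline operands, e.g. { L[OP_LIT], (void *)&x, L[OP_FETCH], L[OP_HALT] } with L = engine(NULL, NULL), which leaves the contents of x in *result. Every NEXT is a separate "goto *" in the source; whether the compiler keeps them separate in the generated code is up to the compiler.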

Now with gforth-fast-tree-vectorize:

Code @
0x000055aa501f75e6 <gforth_engine2+11238>:   add    $0x8,%rbx
0x000055aa501f75ea <gforth_engine2+11242>:   mov    (%r8),%rcx
0x000055aa501f75ed <gforth_engine2+11245>:   mov    (%rbx),%rax
0x000055aa501f75f0 <gforth_engine2+11248>:   mov    %r14,0x8(%rsp)
0x000055aa501f75f5 <gforth_engine2+11253>:   mov    %rax,%r11
0x000055aa501f75f8 <gforth_engine2+11256>:   mov    %r15,%r9
0x000055aa501f75fb <gforth_engine2+11259>:   mov    %rcx,0x10(%rsp)
0x000055aa501f7600 <gforth_engine2+11264>:   jmp    0x55aa501f4a99 <gforth_engine2+153>
end-code
0x55aa501f4a99 56 discode
0x000055aa501f4a99 <gforth_engine2+153>:     movq   0x8(%rsp),%xmm0
0x000055aa501f4a9f <gforth_engine2+159>:     movq   %r9,%xmm1
0x000055aa501f4aa4 <gforth_engine2+164>:     movhps 0x8(%rsp),%xmm1
0x000055aa501f4aa9 <gforth_engine2+169>:     movhps 0x10(%rsp),%xmm0
0x000055aa501f4aae <gforth_engine2+174>:     movhlps %xmm0,%xmm5
0x000055aa501f4ab1 <gforth_engine2+177>:     movq   %xmm0,%r14
0x000055aa501f4ab6 <gforth_engine2+182>:     movq   %xmm1,%r15
0x000055aa501f4abb <gforth_engine2+187>:     movhps %xmm1,0x18(%rsp)
0x000055aa501f4ac0 <gforth_engine2+192>:     movq   %xmm5,%r8
0x000055aa501f4ac5 <gforth_engine2+197>:     mov    %r15,%rdi
0x000055aa501f4ac8 <gforth_engine2+200>:     mov    %r14,%rsi
0x000055aa501f4acb <gforth_engine2+203>:     mov    %r8,%rcx
0x000055aa501f4ace <gforth_engine2+206>:     jmp    *%r11

GCC produced similar code in the 3.x timeframe without
auto-vectorization, but they eventually managed to fix that. My guess
at what is happening here is that the auto-vectorizer tries to
vectorize accesses to adjacent memory locations somewhere in
gforth_engine2(), this reduces the precision of the liveness tracking,
resulting in all these register-register and register-memory moves,
and they migrate from the original places to the shared indirect jump
that gcc internally introduces for all the occurrences of "goto *" in
the source code.

- anton
-- 
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/