| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Newsgroups | comp.lang.forth |
| Subject | C compiler optimization and Forth engines (was: EuroForth 2025 ...) |
| Date | 2026-01-24 11:28 +0000 |
| Organization | Institut fuer Computersprachen, Technische Universitaet Wien |
| Message-ID | <2026Jan24.122830@mips.complang.tuwien.ac.at> |
| References | <69688c01$1@news.ausics.net> <2026Jan15.130413@mips.complang.tuwien.ac.at> <nnd$3a148ef5$137ee4b5@b1e8191b89e23503> <2026Jan16.183803@mips.complang.tuwien.ac.at> <nnd$7cecfc2e$135c60e6@11ec9b68cac8aeb0> |
Hans Bezemer <the.beez.speaks@gmail.com> writes:
>I've done my thing, compiled 4tH with optimizations -O3 till -O0.
>I thought, let's make this simple and execute ALL benchmarks I got. Some
>of them have become useless, though for the simple reason hardware has
>become that much better.
>
>But still, here it is. Overall, the performance consistently
>deteriorates, aka -O3 gives the best performance.
Which compiler and which hardware?
For a random program, I would expect higher optimization levels to
produce faster code. For a Forth system and recent gccs, the
auto-vectorization of adjacent memory accesses may lead to problems
similar to those in the C bubble-sort benchmark. In Gforth, this actually
happens unless we disable vectorization (which we normally do), and,
moreover, with the vectorized code, gcc introduces additional
inefficiencies (see below).
Here's the output of ./gforth-fast onebench.fs compiled from the
current development version with gcc-12.2 and running on a Ryzen 5800X
(numbers are times, lower is better):
 sieve bubble matrix   fib   fft  gcc options
 0.025  0.023  0.013 0.033 0.016  -O2
 0.025  0.023  0.013 0.037 0.016  -O3 -fno-tree-vectorize (gforth default)
 0.404  0.418  0.377 0.472 0.244  -O3 (with auto vectorization)
 0.145  0.122  0.124 0.122 0.073  gforth default, using --no-dynamic
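To put numbers on the differences in the table, the following C sketch computes per-benchmark slowdown factors. The times are copied from the table; the array and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NBENCH 5  /* sieve, bubble, matrix, fib, fft */

/* Times copied from the table above (lower is better). */
static const double t_default[NBENCH]    = {0.025, 0.023, 0.013, 0.037, 0.016};
static const double t_vectorized[NBENCH] = {0.404, 0.418, 0.377, 0.472, 0.244};
static const double t_no_dynamic[NBENCH] = {0.145, 0.122, 0.124, 0.122, 0.073};

/* Elementwise slowdown of slow[] relative to fast[]. */
void slowdown(const double *slow, const double *fast, double *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        assert(fast[i] > 0.0);
        out[i] = slow[i] / fast[i];
    }
}
```

Calling slowdown(t_vectorized, t_default, r, NBENCH) gives factors between roughly 13 (fib) and 29 (matrix); calling it with t_no_dynamic as the baseline gives factors between roughly 2.8 and 3.9.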
So how is the code different? Here's the code for ROT:
-O3 (auto-vectorized)     -O3 -fno-tree-vec...    -O2
add $0x8,%rbx             add $0x8,%rbx           add $0x8,%rbx
movq 0x8(%r10),%xmm1      mov 0x8(%r10),%rdx      mov 0x8(%r10),%rdx
mov 0x10(%r10),%rcx       mov 0x10(%r10),%rax     mov 0x10(%r10),%rax
punpcklqdq %xmm1,%xmm1    mov %r13,0x8(%r10)      mov %r13,0x8(%r10)
punpckhqdq %xmm1,%xmm0    mov %rdx,0x10(%r10)     mov %rdx,0x10(%r10)
movups %xmm0,0x8(%r10)    mov %rax,%r13           mov %rax,%r13
mov (%rbx),%rax           mov (%rbx),%rax         mov (%rbx),%rax
mov %r14,0x8(%rsp)        jmp *%rax               jmp *%rax
mov %rax,%r11
mov %r15,%r9
mov %rcx,0x10(%rsp)
jmp 0x55bff2a58a99
So in this case -O3 without auto-vectorization generates the same code
as -O2. Auto-vectorization, OTOH, replaces
mov 0x8(%r10),%rdx
mov 0x10(%r10),%rax
with
movq 0x8(%r10),%xmm1
and then performs the rotation with the punpck instructions, finally
storing two cells into memory with movups. For some reason it also
separately loads 0x10(%r10) into %rcx (instead of extracting it from
%xmm1), and eventually stores it to 0x10(%rsp), which seems to be one
of the locations of the TOS.
I expect that gcc's auto-vectorization will do similar things to
primitives like ROT 2! 2SWAP (all of which are hit in gforth) in other
Forth systems with a C substrate, because they all tend to access two
(or more) adjacent cells.
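For readers who want to try this on their own system, here is a minimal C sketch of two such primitives. It is not Gforth's source: the Stack type, the field names and the layout (TOS cached in a variable, sp[1] the second item, as suggested by the -O2 listing above) are assumptions made for illustration. Each primitive loads and stores two or more adjacent cells, which is the pattern the vectorizer may pick up.

```c
#include <assert.h>
#include <stddef.h>

typedef long Cell;

/* Hypothetical stack state: top of stack cached in tos,
   sp[1] is the second item, sp[2] the third, and so on. */
typedef struct {
    Cell *sp;
    Cell tos;
} Stack;

/* ROT ( a b c -- b c a ): reads and writes sp[1] and sp[2]. */
void rot(Stack *s)
{
    assert(s != NULL && s->sp != NULL);
    Cell b = s->sp[1];
    Cell a = s->sp[2];
    s->sp[1] = s->tos;   /* c becomes the second item */
    s->sp[2] = b;        /* b becomes the third item */
    s->tos = a;          /* a becomes the new top of stack */
}

/* 2SWAP ( a b c d -- c d a b ): touches sp[1], sp[2] and sp[3]. */
void two_swap(Stack *s)
{
    assert(s != NULL && s->sp != NULL);
    Cell c = s->sp[1];
    Cell b = s->sp[2];
    Cell a = s->sp[3];
    s->sp[1] = a;
    s->sp[2] = s->tos;   /* d */
    s->sp[3] = c;
    s->tos = b;
}
```

Compiling a file like this once with -O3 and once with -O3 -fno-tree-vectorize and comparing the assembly output is one way to check whether a given gcc version vectorizes these accesses; the result may differ between gcc versions and targets.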
But the big hit with the auto-vectorized code is not these changes,
but what happens at the end of the primitive: without
auto-vectorization there is the indirect jump of the threaded-code
dispatch, but with auto-vectorization it jumps to 0x55bff2a58a99:
0x000055bff2a58a99 <gforth_engine2+153>: movq 0x8(%rsp),%xmm0
0x000055bff2a58a9f <gforth_engine2+159>: movq %r9,%xmm1
0x000055bff2a58aa4 <gforth_engine2+164>: movhps 0x8(%rsp),%xmm1
0x000055bff2a58aa9 <gforth_engine2+169>: movhps 0x10(%rsp),%xmm0
0x000055bff2a58aae <gforth_engine2+174>: movhlps %xmm0,%xmm5
0x000055bff2a58ab1 <gforth_engine2+177>: movq %xmm0,%r14
0x000055bff2a58ab6 <gforth_engine2+182>: movq %xmm1,%r15
0x000055bff2a58abb <gforth_engine2+187>: movhps %xmm1,0x18(%rsp)
0x000055bff2a58ac0 <gforth_engine2+192>: movq %xmm5,%r8
0x000055bff2a58ac5 <gforth_engine2+197>: mov %r15,%rdi
0x000055bff2a58ac8 <gforth_engine2+200>: mov %r14,%rsi
0x000055bff2a58acb <gforth_engine2+203>: mov %r8,%rcx
0x000055bff2a58ace <gforth_engine2+206>: jmp *%r11
We can see here that, among other things, 0x10(%rsp) (the TOS) is
loaded into %xmm0 and then moved through %xmm5 into %r8 and then %rcx,
as well as through %r14 into %rsi, so at the end the TOS resides in all
those places. And I see that other primitives expect the TOS in some
of those places, e.g. 1+:
-O3 (auto-vectorized)     -O3 -fno-tree-vec...
add $0x8,%rbx             add $0x8,%rbx
lea 0x1(%r8),%rcx         add $0x1,%r13
mov (%rbx),%rax           mov (%rbx),%rax
mov %r14,0x8(%rsp)        jmp *%rax
mov %rax,%r11
mov %r15,%r9
mov %rcx,0x10(%rsp)
jmp 0x55bff2a58a99
Jumping to 0x55bff2a58a99 instead of performing an indirect jump
disables dynamic native code generation in Gforth and all the
optimizations that are based on it. You can see in the --no-dynamic
line how much that costs. The remaining factor of 3 is probably due
to the large number of additional instructions that are performed in
the auto-vectorized engine.
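As an illustration of the dispatch at issue, here is a minimal sketch of a threaded-code engine using GNU C labels-as-values (a gcc/clang extension). It is a toy and not Gforth's engine: the opcode names, the engine function and the fixed code buffer are assumptions made for this example. The point is that every primitive ends with its own NEXT, i.e. its own indirect jump; whether the compiler keeps these jumps separate or routes them through a shared block, as in the auto-vectorized build above, is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_LIT, OP_ONE_PLUS, OP_HALT };  /* hypothetical opcodes */

#define MAX_CODE 64

/* NEXT: fetch the next threaded-code cell and jump to it.
   Written out at the end of every primitive. */
#define NEXT goto **ip++

/* Translate prog[] (opcodes, with a literal after each OP_LIT)
   into threaded code, run it, and return the TOS at OP_HALT. */
intptr_t engine(const intptr_t *prog, size_t len)
{
    static void *const label[] = { &&do_lit, &&do_one_plus, &&do_halt };
    void *code[MAX_CODE];
    void **ip = code;
    intptr_t tos = 0;
    size_t i;

    assert(len <= MAX_CODE);
    for (i = 0; i < len; i++) {
        assert(prog[i] >= OP_LIT && prog[i] <= OP_HALT);
        code[i] = label[prog[i]];
        if (prog[i] == OP_LIT && i + 1 < len) {
            i++;
            code[i] = (void *)prog[i];   /* inline literal cell */
        }
    }

    NEXT;

do_lit:
    tos = (intptr_t)*ip++;   /* load the inline literal into TOS */
    NEXT;
do_one_plus:
    tos += 1;                /* 1+ */
    NEXT;
do_halt:
    return tos;
}
```

For example, the program { OP_LIT, 41, OP_ONE_PLUS, OP_HALT } makes engine return 42.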
What is the 4tH code for ROT with -O2 and -O3?
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/