| From | peter <peter.noreply@tin.it> |
|---|---|
| Newsgroups | comp.lang.forth |
| Subject | Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...) |
| Date | 2026-01-25 23:31 +0100 |
| Organization | A noiseless patient Spider |
| Message-ID | <20260125233110.000034b4@tin.it> |
| References | (2 earlier) <nnd$3a148ef5$137ee4b5@b1e8191b89e23503> <2026Jan16.183803@mips.complang.tuwien.ac.at> <nnd$7cecfc2e$135c60e6@11ec9b68cac8aeb0> <2026Jan24.122830@mips.complang.tuwien.ac.at> <2026Jan24.174716@mips.complang.tuwien.ac.at> |
On Sat, 24 Jan 2026 16:47:16 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> >Hans Bezemer <the.beez.speaks@gmail.com> writes:
> >>I've done my thing, compiled 4tH with optimizations -O3 till -O0.
> >>I thought, let's make this simple and execute ALL benchmarks I got. Some
> >>of them have become useless, though for the simple reason hardware has
> >>become that much better.
> >>
> >>But still, here it is. Overall, the performance consistently
> >>deteriorates, aka -O3 gives the best performance.
> >
> >Which compiler and which hardware?
> >
> >For a random program, I would expect higher optimization levels to
> >produce faster code. For a Forth system and these recent gccs, the
> >auto-vectorization of adjacent memory accesses may lead to similar
> >problems as in the C bubble-sort benchmark. In Gforth, this actually
> >happens unless we disable vectorization (which we normally do), and,
> >moreover, with the vectorized code, gcc introduces additional
> >inefficiencies (see below).
> >
> >Here's the output of ./gforth-fast onebench.fs compiled from the
> >current development version with gcc-12.2 and running on a Ryzen 5800X
> >(numbers are times, lower is better):
> >
> > sieve bubble matrix   fib   fft  gcc options
> > 0.025  0.023  0.013 0.033 0.016  -O2
> > 0.025  0.023  0.013 0.037 0.016  -O3 -fno-tree-vectorize (gforth default)
> > 0.404  0.418  0.377 0.472 0.244  -O3 (with auto vectorization)
> > 0.145  0.122  0.124 0.122 0.073  gforth default, using --no-dynamic
>
> I have now also tried it with gcc-14.2, and that produces better code.
> Results from a Xeon E-2388G (Rocket Lake):
>
> sieve bubble matrix   fib   fft  gcc options
> 0.032  0.032  0.015 0.037 0.014  -O2
> 0.035  0.032  0.015 0.037 0.014  -O3 -fno-tree-vectorize (gforth default)
> 0.033  0.034  0.016 0.032 0.014  -O3 (with auto vectorization)
>
> The code for ROT and 2SWAP does not use auto-vectorization, and the
> code for 2! uses auto-vectorization in a way that reduces the
> instruction count:
>
> -O3 (auto-vectorized)      -O3 -fno-tree-vectorize
> add $0x8,%rbx              add $0x8,%rbx
> movq 0x8(%r13),%xmm0       mov 0x10(%r13),%rax
> add $0x18,%r13             mov 0x8(%r13),%rdx
> movhps -0x8(%r13),%xmm0    add $0x18,%r13
> movups %xmm0,(%r8)         mov %rdx,(%r8)
> mov 0x0(%r13),%r8          mov %rax,0x8(%r8)
> mov (%rbx),%rax            mov 0x0(%r13),%r8
> jmp *%rax                  mov (%rbx),%rax
>                            jmp *%rax
>
> And the common tail with all these move instructions is gone.
>
> - anton
What does your C code look like? I could not get clang or gcc to
auto-vectorize my existing code:

    UNS64 *tmp64 = (UNS64*)TOP;
    tmp64[0] = sp[0];
    tmp64[1] = sp[1];
    TOP = sp[2];
    sp += 3;

In the end I changed my code to tell the compiler that it is a vector, with

    typedef UNS64 v2u64 __attribute__((vector_size(16))) __attribute__((aligned(8)));

and

    *(v2u64*)TOP = *(v2u64*)sp;
    TOP=sp[2];
    sp=sp+3;

This produces:

    vmovups xmm0, xmmword ptr [rdx]
    vmovups xmmword ptr [r8], xmm0
    mov r8, qword ptr [rdx + 16]
    add rdx, 24
    movzx r9d, byte ptr [rcx] // nesting code
    inc rcx
    jmp qword ptr [rax + 8*r9]

But using memcpy((UNS64*)TOP, (UNS64*)sp,16); also gives the same code!
It looks like this also works on ARM64.
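
For anyone who wants to try the three variants of 2! side by side, here is a minimal sketch that compiles on its own with gcc or clang. The VM struct and the names two_store_scalar, two_store_vector, two_store_memcpy, variant_ok and variants_agree are made up for this illustration and are not from my engine, where TOP and sp are kept in locals. The may_alias attribute is also an addition that the typedef above does not have; it is there only so that the pointer cast stays safe under strict aliasing. Whether a given compiler version merges the two stores has to be checked in its assembly output (for example with -O3 -S); the sketch only checks that the three variants behave the same.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t UNS64;

/* 16-byte vector of two cells, only 8-byte aligned so that unaligned
   moves are used. may_alias is an assumption added for this sketch. */
typedef UNS64 v2u64 __attribute__((vector_size(16), aligned(8), may_alias));

/* Hypothetical stand-in for the engine state: sp points at the second
   stack item, top caches the top-of-stack cell. */
typedef struct {
    UNS64 *sp;
    UNS64 top;
} VM;

/* 2! ( x1 x2 addr -- ) written as two scalar stores. */
static void two_store_scalar(VM *vm) {
    UNS64 *dst = (UNS64 *)(uintptr_t)vm->top;
    dst[0] = vm->sp[0];
    dst[1] = vm->sp[1];
    vm->top = vm->sp[2];
    vm->sp += 3;
}

/* Same word, telling the compiler that the two cells form one vector. */
static void two_store_vector(VM *vm) {
    assert(sizeof(v2u64) == 2 * sizeof(UNS64));
    *(v2u64 *)(uintptr_t)vm->top = *(const v2u64 *)vm->sp;
    vm->top = vm->sp[2];
    vm->sp += 3;
}

/* Same word, using a fixed-size memcpy of two cells. */
static void two_store_memcpy(VM *vm) {
    memcpy((void *)(uintptr_t)vm->top, vm->sp, 2 * sizeof(UNS64));
    vm->top = vm->sp[2];
    vm->sp += 3;
}

/* Runs one variant on a small stack; returns 1 if memory, TOP and sp
   end up as expected. */
static int variant_ok(void (*two_store)(VM *)) {
    UNS64 cells[2] = {0, 0};
    UNS64 stack[3] = {0x1111, 0x2222, 0x3333};
    VM vm = { stack, (UNS64)(uintptr_t)cells };
    two_store(&vm);
    return cells[0] == 0x1111 && cells[1] == 0x2222 &&
           vm.top == 0x3333 && vm.sp == stack + 3;
}

/* Returns 1 if all three variants pass the same check. */
static int variants_agree(void) {
    return variant_ok(two_store_scalar) &&
           variant_ok(two_store_vector) &&
           variant_ok(two_store_memcpy);
}
```

Calling variants_agree() from a small main should return 1 for all three forms. The memcpy form is probably the most portable of the three, since it needs no compiler extension and no aliasing attribute.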
BR
Peter