Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...)

From peter <peter.noreply@tin.it>
Newsgroups comp.lang.forth
Subject Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...)
Date 2026-01-25 23:31 +0100
Organization A noiseless patient Spider
Message-ID <20260125233110.000034b4@tin.it>
References (2 earlier) <nnd$3a148ef5$137ee4b5@b1e8191b89e23503> <2026Jan16.183803@mips.complang.tuwien.ac.at> <nnd$7cecfc2e$135c60e6@11ec9b68cac8aeb0> <2026Jan24.122830@mips.complang.tuwien.ac.at> <2026Jan24.174716@mips.complang.tuwien.ac.at>


On Sat, 24 Jan 2026 16:47:16 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> >Hans Bezemer <the.beez.speaks@gmail.com> writes:
> >>I've done my thing, compiled 4tH with optimizations -O3 till -O0.
> >>I thought, let's make this simple and execute ALL benchmarks I got. Some 
> >>of them have become useless, though for the simple reason hardware has 
> >>become that much better.
> >>
> >>But still, here it is. Overall, the performance consistently 
> >>deteriorates, aka -O3 gives the best performance.
> >
> >Which compiler and which hardware?
> >
> >For a random program, I would expect higher optimization levels to
> >produce faster code.  For a Forth system and these recent gccs, the
> >auto-vectorization of adjacent memory accesses may lead to similar
> >problems as in the C bubble-sort benchmark.  In Gforth, this actually
> >happens unless we disable vectorization (which we normally do), and,
> >moreover, with the vectorized code, gcc introduces additional
> >inefficiencies (see below).
> >
> >Here's the output of ./gforth-fast onebench.fs compiled from the
> >current development version with gcc-12.2 and running on a Ryzen 5800X
> >(numbers are times, lower is better):
> >
> > sieve bubble matrix   fib   fft gcc options
> > 0.025  0.023  0.013 0.033 0.016 -O2
> > 0.025  0.023  0.013 0.037 0.016 -O3 -fno-tree-vectorize (gforth default)
> > 0.404  0.418  0.377 0.472 0.244 -O3 (with auto vectorization)
> > 0.145  0.122  0.124 0.122 0.073 gforth default, using --no-dynamic
> 
> I have now also tried it with gcc-14.2, and that produces better code.
> Results from a Xeon E-2388G (Rocket Lake):
> 
>  sieve bubble matrix   fib   fft gcc options
>  0.032  0.032  0.015 0.037 0.014 -O2 
>  0.035  0.032  0.015 0.037 0.014 -O3 -fno-tree-vectorize (gforth default)
>  0.033  0.034  0.016 0.032 0.014 -O3 (with auto vectorization)
> 
> The code for ROT and 2SWAP does not use auto-vectorization, and the
> code for 2! uses auto-vectorization in a way that reduces the
> instruction count:
> 
> -O3 (auto-vectorized)     -O3 -fno-tree-vectorize
> add    $0x8,%rbx          add $0x8,%rbx      
> movq   0x8(%r13),%xmm0    mov 0x10(%r13),%rax
> add    $0x18,%r13         mov 0x8(%r13),%rdx 
> movhps -0x8(%r13),%xmm0   add $0x18,%r13     
> movups %xmm0,(%r8)        mov %rdx,(%r8)     
> mov    0x0(%r13),%r8      mov %rax,0x8(%r8)  
> mov    (%rbx),%rax        mov 0x0(%r13),%r8  
> jmp    *%rax              mov (%rbx),%rax    
>                           jmp *%rax          
> 
> And the common tail with all these move instructions is gone.
> 
> - anton

What does your C code look like? I could not get clang or gcc to auto-vectorize
my existing code:

        UNS64 *tmp64 = (UNS64*)TOP;
        tmp64[0] = sp[0];
        tmp64[1] = sp[1];
        TOP = sp[2];
        sp += 3;


In the end I changed my code to tell the compiler that it is a vector with

typedef UNS64 v2u64 __attribute__((vector_size(16))) __attribute__((aligned(8)));

and

        *(v2u64*)TOP = *(v2u64*)sp;
        TOP = sp[2];
        sp = sp + 3;

This produces

	vmovups	xmm0, xmmword ptr [rdx]
	vmovups	xmmword ptr [r8], xmm0
	mov	r8, qword ptr [rdx + 16]
	add	rdx, 24

	movzx	r9d, byte ptr [rcx]	// nesting code
	inc	rcx
	jmp	qword ptr [rax + 8*r9]

But using memcpy((UNS64*)TOP, (UNS64*)sp, 16); also gives the same code!
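
For anyone who wants to compare the three variants side by side, here is a minimal self-contained sketch. The VM struct, the function names and the check helper are made up for illustration and are not taken from my actual engine; UNS64 is assumed to be a 64-bit unsigned cell, and may_alias is added to the typedef only as a precaution for a standalone file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t UNS64;  /* assumption: one Forth cell is 64 bits */

/* Two cells as one 16-byte vector that may sit on a cell (8-byte) boundary.
   may_alias is an addition for this sketch, not part of the original typedef. */
typedef UNS64 v2u64 __attribute__((vector_size(16), aligned(8), may_alias));

/* Hypothetical VM state: TOP is cached, sp points at the second stack item. */
typedef struct {
    UNS64 *sp;
    UNS64 top;
} VM;

/* Variant 1: two scalar stores, as in the existing code. */
static void store2_scalar(VM *vm) {
    UNS64 *tmp64 = (UNS64 *)(uintptr_t)vm->top;
    tmp64[0] = vm->sp[0];
    tmp64[1] = vm->sp[1];
    vm->top = vm->sp[2];
    vm->sp += 3;
}

/* Variant 2: one explicit 16-byte vector copy. */
static void store2_vector(VM *vm) {
    *(v2u64 *)(uintptr_t)vm->top = *(v2u64 *)vm->sp;
    vm->top = vm->sp[2];
    vm->sp += 3;
}

/* Variant 3: memcpy of two cells; the compiler is free to inline it. */
static void store2_memcpy(VM *vm) {
    memcpy((void *)(uintptr_t)vm->top, vm->sp, 2 * sizeof(UNS64));
    vm->top = vm->sp[2];
    vm->sp += 3;
}

typedef void (*store2_fn)(VM *);

/* Runs one variant on a tiny stack; returns 1 if memory and VM state match. */
static int check_store2(store2_fn fn) {
    UNS64 dest[2] = {0, 0};
    UNS64 stack[3] = {11, 22, 33};  /* sp[0], sp[1], value that becomes TOP */
    VM vm = { stack, (UNS64)(uintptr_t)dest };
    fn(&vm);
    return dest[0] == 11 && dest[1] == 22 && vm.top == 33 && vm.sp == stack + 3;
}

/* Asserts that all three variants behave identically on the sample stack. */
void run_all_checks(void) {
    assert(check_store2(store2_scalar));
    assert(check_store2(store2_vector));
    assert(check_store2(store2_memcpy));
}
```

Compiling this with -O3 -S and reading the assembly for the three functions is one way to see what a given compiler does; whether you get the single 16-byte move depends on the compiler version and target.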

It looks like this also works on ARM64.
BR
Peter
