| From | peter <peter.noreply@tin.it> |
|---|---|
| Newsgroups | comp.lang.forth |
| Subject | Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...) |
| Date | 2026-01-27 15:44 +0100 |
| Organization | A noiseless patient Spider |
| Message-ID | <20260127154455.00000f73@tin.it> |
| References | (4 earlier) <nnd$7cecfc2e$135c60e6@11ec9b68cac8aeb0> <2026Jan24.122830@mips.complang.tuwien.ac.at> <2026Jan24.174716@mips.complang.tuwien.ac.at> <20260125233110.000034b4@tin.it> <2026Jan26.202443@mips.complang.tuwien.ac.at> |
On Mon, 26 Jan 2026 19:24:43 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> peter <peter.noreply@tin.it> writes:
> >On Sat, 24 Jan 2026 16:47:16 GMT
> >anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> >> I have now also tried it with gcc-14.2, and that produces better code.
> >> Results from a Xeon E-2388G (Rocket Lake):
> >>
> >>  sieve bubble matrix   fib   fft  gcc options
> >>  0.032  0.032  0.015 0.037 0.014  -O2
> >>  0.035  0.032  0.015 0.037 0.014  -O3 -fno-tree-vectorize (gforth default)
> >>  0.033  0.034  0.016 0.032 0.014  -O3 (with auto vectorization)
> >>
> >> The code for ROT and 2SWAP does not use auto-vectorization, and the
> >> code for 2! uses auto-vectorization in a way that reduces the
> >> instruction count:
> >>
> >> -O3 (auto-vectorized)      -O3 -fno-tree-vectorize
> >> add    $0x8,%rbx           add    $0x8,%rbx
> >> movq   0x8(%r13),%xmm0     mov    0x10(%r13),%rax
> >> add    $0x18,%r13          mov    0x8(%r13),%rdx
> >> movhps -0x8(%r13),%xmm0    add    $0x18,%r13
> >> movups %xmm0,(%r8)         mov    %rdx,(%r8)
> >> mov    0x0(%r13),%r8       mov    %rax,0x8(%r8)
> >> mov    (%rbx),%rax         mov    0x0(%r13),%r8
> >> jmp    *%rax               mov    (%rbx),%rax
> >>                            jmp    *%rax
> >>
> >> And the common tail with all these move instructions is gone.
> >>
> >> - anton
> >
> >What does your C code look like? I could not get clang or gcc to auto-vectorize
> >with my existing code:
> >
> > UNS64 *tmp64 = (UNS64*)TOP;
> > tmp64[0] = sp[0];
> > tmp64[1] = sp[1];
> > TOP = sp[2];
> > sp += 3;
>
> Gforth's source code for 2! is:
>
> 2! ( w1 w2 a_addr -- ) core two_store
> ""Store @i{w2} into the cell at @i{c-addr} and @i{w1} into the next cell.""
> a_addr[0] = w2;
> a_addr[1] = w1;
>
> A generator produces the following from that, which is passed to gcc:
>
> LABEL(two_store) /* 2! ( w1 w2 a_addr -- ) S1 -- S1 */
> /* Store @i{w2} into the cell at @i{c-addr} and @i{w1} into the next cell. */
> NAME("2!")
> ip += 1;
> LABEL1(two_store)
> {
> DEF_CA
> MAYBE_UNUSED Cell w1;
> MAYBE_UNUSED Cell w2;
> MAYBE_UNUSED Cell * a_addr;
> NEXT_P0;
> vm_Cell2w(sp[2],w1);
> vm_Cell2w(sp[1],w2);
> vm_Cell2a_(spTOS,a_addr);
> #ifdef VM_DEBUG
> if (vm_debug) {
> fputs(" w1=", vm_out); printarg_w(w1);
> fputs(" w2=", vm_out); printarg_w(w2);
> fputs(" a_addr=", vm_out); printarg_a_(a_addr);
> }
> #endif
> sp += 3;
> {
> #line 1815 "prim"
> a_addr[0] = w2;
> a_addr[1] = w1;
> #line 10136 "prim-fast.i"
> }
>
> #ifdef VM_DEBUG
> if (vm_debug) {
> fputs(" -- ", vm_out); fputc('\n', vm_out);
> }
> #endif
> NEXT_P1;
> spTOS = sp[0];
> LABEL2(two_store)
> NAME1("l2-two_store")
> NEXT_P1_5;
> LABEL3(two_store)
> NAME1("l3-two_store")
> DO_GOTO;
> }
>
> There are a lot of macros in this code, and I fear that expanding them
> makes the code even less readable, but the essence for the
> auto-vectorized part is something like:
>
> w1 = sp[2];
> w2 = sp[1];
> a_addr = spTOS;
> sp += 3;
> a_addr[0] = w2;
> a_addr[1] = w1;
> spTOS = sp[0];
>
> My guess is that in your code the compiler expected that sp[1] might
> alias with tmp64[0], and therefore did not vectorize the loads and the
> stores, whereas in the Gforth code, the loads both happen first, and
> then the two stores, and gcc can vectorize that. I doubt that there
> is a big benefit from that, though.
Yes, that was it. Changing to:

    UNS64 *tmp64 = (UNS64*)TOP;
    UNS64 d0 = sp[0];
    UNS64 d1 = sp[1];
    tmp64[0] = d0;
    tmp64[1] = d1;
    TOP = sp[2];
    sp += 3;

made the compiler (clang-21 in this case) generate the expected code.
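To make the aliasing point concrete, here is a minimal standalone sketch. The names `cell`, `store2_interleaved` and `store2_loads_first` are made up for illustration and are not taken from either Forth system. When the destination overlaps the source, the two orderings give different results, which is why the compiler may not move the second load ahead of the first store on its own.

```c
#include <stdint.h>

typedef uint64_t cell;   /* hypothetical stand-in for UNS64 / Cell */

/* Interleaved form: the store to dst[0] may overwrite sp[1], so the
   compiler has to keep the second load after the first store. */
void store2_interleaved(cell *dst, const cell *sp)
{
    dst[0] = sp[0];
    dst[1] = sp[1];
}

/* Loads-first form: both values are read before anything is written,
   so the two loads and the two stores are adjacent and independent. */
void store2_loads_first(cell *dst, const cell *sp)
{
    cell d0 = sp[0];
    cell d1 = sp[1];
    dst[0] = d0;
    dst[1] = d1;
}
```

With a buffer holding 1, 2, 3 and `dst` pointing one cell past `sp`, the interleaved form leaves 1, 1, 1 and the loads-first form leaves 1, 1, 2. Whether a given compiler then merges the loads-first form into a single vector store depends on its version and options.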
>
> >typedef UNS64 v2u64 __attribute__((vector_size(16))) __attribute__((aligned(8)));
>
> I'll have to remember the aligned attribute for future games with gcc
> explicit vectorization.
Without that, it will generate the opcodes that need 16-byte alignment.
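As a rough sketch of how such a typedef can be used for an explicit two-cell copy with the GCC/clang vector extension: `store2_vec` is a hypothetical name, `uint64_t` stands in for UNS64, and the `may_alias` attribute is an addition for this example (it is not in the typedef quoted above) so that accessing cell storage through the vector type stays within the aliasing rules.

```c
#include <stdint.h>

/* Two 64-bit cells as one 16-byte vector, with the required alignment
   lowered to 8 bytes. may_alias is an assumption added for this sketch. */
typedef uint64_t v2u64
    __attribute__((vector_size(16), aligned(8), may_alias));

/* Copy two adjacent cells from sp to dst with one vector load and one
   vector store; both pointers only need to be 8-byte aligned. */
void store2_vec(uint64_t *dst, const uint64_t *sp)
{
    v2u64 v = *(const v2u64 *)sp;
    *(v2u64 *)dst = v;
}
```

Because the type only promises 8-byte alignment, the compiler has to pick instructions that tolerate unaligned addresses, such as the movups in the listing above.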
BR
Peter
> - anton