| From | anton@mips.complang.tuwien.ac.at (Anton Ertl) |
|---|---|
| Newsgroups | comp.lang.forth |
| Subject | Re: 3dup again |
| Date | 2025-10-04 08:04 +0000 |
| Organization | Institut fuer Computersprachen, Technische Universitaet Wien |
| Message-ID | <2025Oct4.100409@mips.complang.tuwien.ac.at> |
| References | (1 earlier) <mk5b0uFubm8U1@mid.individual.net> <68de4aaa@news.ausics.net> <2025Oct2.224440@mips.complang.tuwien.ac.at> <nnd$36c0a5f5$2a9d9179@1179128ec025831e> <mk9i5hFld65U1@mid.individual.net> |
minforth <minforth@gmx.net> writes:
>Am 03.10.2025 um 11:02 schrieb albert@spenarnc.xs4all.nl:
>> The problem with 3DUP is that it is actually used in context.
>> What is the data that is going to 3DUP ped? In my view this
>> amounts to double use of data that is in registers (32 in the riscv)
>> anyway, after an optimiser does his thing.
>>
>
>Code inlining will mend it.
Inlining is important for Forth, but it does not make what has been
called an "analytical optimizer" unnecessary; on the contrary,
inlining increases the benefit we get from the analytical optimizer.
E.g., let's consider
: 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;
: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;
: 3dup.3 {: a b c :} a b c a b c ;
: 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;
: foo.1 3dup.1 + ! ;
: foo.2 3dup.2 + ! ;
: foo.3 3dup.3 + ! ;
: foo.4 3dup.4 + ! ;
The result produced by VFX64 is:
foo.1               foo.2               foo.3               foo.4
PUSH EBX            MOV EDX, EBX        CALL 3DUP.3         MOV EDX, EBX
MOV EBX, [ESP]      ADD EBX, [EBP]      ADD EBX, [EBP]      ADD EBX, [EBP]
POP EDX             MOV ECX, [EBP+04]   MOV EDX, [EBP+04]   MOV ECX, [EBP+04]
ADD EDX, [EBP]      MOV 0 [EBX], ECX    MOV 0 [EBX], EDX    MOV 0 [EBX], ECX
MOV ECX, [EBP+04]   MOV EBX, EDX        MOV EBX, [EBP+08]   MOV EBX, EDX
MOV 0 [EDX], ECX    NEXT,               LEA EBP, [EBP+0C]   NEXT,
NEXT,                                   NEXT,
VFX is only analytical about the data stack, and as a consequence, the
implementations of 3dup that only use the data stack work best. When
the return stack is used, as in 3dup.1/foo.1, VFX produces
instructions (PUSH for >R, MOV ..., [ESP] for R@ and POP for R>) for
the return-stack operations. When locals are used, VFX actually
disables inlining and just calls 3DUP.3.
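The comparison only makes sense if the four 3dup variants really are interchangeable. Here is a minimal sketch of a check for that. The helper name CHECK-3DUP is made up for this illustration and is not from the post; the code assumes a system that provides Forth-2012 locals ({: ... :}) and the common but non-standard -ROT.

```forth
\ The four variants from the post, repeated so this block loads on its own.
: 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;
: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;
: 3dup.3 {: a b c :} a b c a b c ;
: 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;

\ CHECK-3DUP is a hypothetical helper, not part of the original post.
\ It runs xt on 1 2 3 and returns true if the result is 1 2 3 1 2 3.
\ Assumption: xt leaves exactly six cells; the depth is not checked.
: check-3dup ( xt -- flag )
  >r 1 2 3 r> execute           \ expected: 1 2 3 1 2 3
  3 = >r  2 = >r  1 = >r        \ compare the upper copy, top cell first
  3 = >r  2 = >r  1 =           \ compare the lower copy
  r> and r> and r> and r> and r> and ;

' 3dup.1 check-3dup .
' 3dup.2 check-3dup .
' 3dup.3 check-3dup .
' 3dup.4 check-3dup .
```

Each line should print a true flag (-1). With that, FOO.1 to FOO.4 all have the stack effect ( x n addr -- x n addr ) and store x at addr+n, so differences in the generated code come from the compilers alone.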
Other Forth systems make too little use of inlining, and I have to
resort to macros to simulate it. We cannot use proper macros for
3dup.3 (the locals-using variant), so I used EVALUATE-based macros;
this is just for experimental use, not for production, don't do this
at home:-)
Let's see what VFX64 produces for FOO.3 with this:
FOO.3
( 080C0C50 8BD4 ) MOV EDX, ESP
( 080C0C52 FF7504 ) PUSH [EBP+04]
( 080C0C55 FF7500 ) PUSH [EBP]
( 080C0C58 53 ) PUSH EBX
( 080C0C59 52 ) PUSH EDX
( 080C0C5A 57 ) PUSH EDI
( 080C0C5B 8BFC ) MOV EDI, ESP
( 080C0C5D 81EC00000000 ) SUB ESP, 00000000
( 080C0C63 8B5D08 ) MOV EBX, [EBP+08]
( 080C0C66 8D6D0C ) LEA EBP, [EBP+0C]
( 080C0C69 8B5708 ) MOV EDX, [EDI+08]
( 080C0C6C 03570C ) ADD EDX, [EDI+0C]
( 080C0C6F 8B4F08 ) MOV ECX, [EDI+08]
( 080C0C72 8B470C ) MOV EAX, [EDI+0C]
( 080C0C75 8D6DEC ) LEA EBP, [EBP+-14]
( 080C0C78 894D04 ) MOV [EBP+04], ECX
( 080C0C7B 894508 ) MOV [EBP+08], EAX
( 080C0C7E 8B4F10 ) MOV ECX, [EDI+10]
( 080C0C81 894D0C ) MOV [EBP+0C], ECX
( 080C0C84 895D10 ) MOV [EBP+10], EBX
( 080C0C87 8BDA ) MOV EBX, EDX
( 080C0C89 8B5710 ) MOV EDX, [EDI+10]
( 080C0C8C 895500 ) MOV [EBP], EDX
( 080C0C8F 8B5500 ) MOV EDX, [EBP]
( 080C0C92 8913 ) MOV 0 [EBX], EDX
( 080C0C94 8B5D04 ) MOV EBX, [EBP+04]
( 080C0C97 8D6D08 ) LEA EBP, [EBP+08]
( 080C0C9A 8B6704 ) MOV ESP, [EDI+04]
( 080C0C9D 8B3F ) MOV EDI, 0 [EDI]
( 080C0C9F C3 ) NEXT,
So inlining did not mend that.
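The macro definitions themselves are not shown in the post. The following is only a guess at their general shape, under the assumption that the stack-only variants were written as POSTPONE-based macros and the locals variant as an immediate word that EVALUATEs its source text. The buffer name BUF is made up for the usage example.

```forth
\ "Proper" macro: compiles DUP 2OVER ROT into the definition being built.
: 3dup.4 ( run-time: a b c -- a b c a b c )
  postpone dup postpone 2over postpone rot ; immediate

\ EVALUATE-based macro (assumed shape, experimental use only):
\ the text is compiled as if typed inside the caller, so the locals
\ a b c become locals of the caller and may clash with its own names.
: 3dup.3 ( run-time: a b c -- a b c a b c )
  s" {: a b c :} a b c a b c" evaluate ; immediate

: foo.3 ( x n addr -- x n addr ) 3dup.3 + ! ;
: foo.4 ( x n addr -- x n addr ) 3dup.4 + ! ;

\ Usage: store 42 into the second cell of BUF (hypothetical buffer).
create buf 4 cells allot
42 1 cells buf foo.3  drop drop drop
buf cell+ @ .
```

The POSTPONE form is not available for 3dup.3 because {: parses the local names from the input stream at compile time; evaluating the source text in the caller is a workaround for that, with the usual EVALUATE hazards (dependence on BASE, search order and name clashes).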
Here's what lxf produces:
foo.1                 foo.2                 foo.3                 foo.4
mov eax , ebx         mov eax , ebx         mov eax , ebx         mov eax , ebx
add eax , [ebp]       add eax , [ebp]       add eax , [ebp]       add eax , [ebp]
mov ecx , [ebp+4h]    mov ecx , [ebp+4h]    mov ecx , [ebp+4h]    mov ecx , [ebp+4h]
mov [eax] , ecx       mov [eax] , ecx       mov [eax] , ecx       mov [eax] , ecx
ret near              ret near              ret near              ret near
So, because lxf is analytical about the return stack (and, through
that, about locals), inlining produces the same very good code in all
these cases.
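To see why four instructions are enough, it helps to write out what 3DUP + ! amounts to once the copies are optimized away. The word FOO.REF below is a hypothetical hand-written equivalent, not something from the post. The comments map each step to the lxf instructions, assuming the top of stack is in ebx and ebp points at the second stack item.

```forth
\ FOO.REF is a made-up name; it leaves x n addr in place and stores
\ x at addr+n, which is the net effect of 3DUP + ! .
: foo.ref ( x n addr -- x n addr )
  2dup +      \ mov eax, ebx / add eax, [ebp]   eax = addr + n
  3 pick      \ mov ecx, [ebp+4h]               ecx = x
  swap ! ;    \ mov [eax], ecx                  store x at addr+n

\ Usage, with a hypothetical buffer BUF:
create buf 4 cells allot
42 1 cells buf foo.ref  drop drop drop
buf cell+ @ .
```

Nothing on the stack changes, so an optimizer that tracks all stack items only has to emit the address computation, one load and one store.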
You may notice that lxf produces a register-register move less than
VFX does for FOO.2/FOO.4. That's because VFX decided to modify the
TOS register (and has to restore it later), whereas lxf decided to
modify a copy of that register. One would have to make additional
observations to determine if lxf was just lucky here or if it
consistently makes the right decision in such cases.
And here's the code that gforth-fast (which does not have an
analytical optimizer) produces:
foo.1               foo.2               foo.3               foo.4
>r 1->0             third 1->1          >l >l 1->1          dup 1->1
mov -8[r14],r13     mov [r10],r13       >l 1->1             mov [r10],r13
sub r14,$08         sub r10,$08         mov -$08[rbp],r13   sub r10,$08
2dup 0->2           mov r13,$18[r10]    mov rdx,$08[r10]    2over 1->3
mov r13,$10[r10]    third 1->2          mov rax,rbp         mov r15,$18[r10]
mov r15,$08[r10]    mov r15,$10[r10]    add r10,$10         mov r9,$10[r10]
i 2->3              third 2->3          lea rbp,-$10[rbp]   rot 3->3
mov r9,[r14]        mov r9,$08[r10]     mov -$10[rax],rdx   mov rax,r13
-rot 3->2           + 3->2              mov r13,[r10]       mov r13,r15
mov [r10],r9        add r15,r9          >l @local0 1->1     mov r15,r9
sub r10,$08         ! 2->0              @local0 1->1        mov r9,rax
r> 2->3             mov [r15],r13       mov rax,rbp         + 3->2
mov r9,[r14]        ;s 0->1             lea rbp,-$08[rbp]   add r15,r9
add r14,$08         mov r13,$08[r10]    mov -$08[rax],r13   ! 2->0
+ 3->2              add r10,$08         @local1 1->2        mov [r15],r13
add r15,r9          mov rbx,[r14]       mov r15,$08[rbp]    ;s 0->1
! 2->0              add r14,$08         @local2 2->3        mov r13,$08[r10]
mov [r15],r13       mov rax,[rbx]       mov r9,$10[rbp]     add r10,$08
;s 0->1             jmp eax             @local0 3->1        mov rbx,[r14]
mov r13,$08[r10]                        mov -$10[r10],r9    add r14,$08
add r10,$08                             sub r10,$18         mov rax,[rbx]
mov rbx,[r14]                           mov $10[r10],r15    jmp eax
add r14,$08                             mov $18[r10],r13
mov rax,[rbx]                           mov r13,$00[rbp]
jmp eax                                 @local1 1->2
                                        mov r15,$08[rbp]
                                        @local2 2->3
                                        mov r9,$10[rbp]
                                        + 3->2
                                        add r15,r9
                                        ! 2->0
                                        mov [r15],r13
                                        lit 0->1
                                        #24
                                        mov r13,$60[rbx]
                                        lp+! 1->1
                                        add r10,$08
                                        add rbp,r13
                                        mov r13,[r10]
                                        ;s 1->1
                                        mov rbx,[r14]
                                        add r14,$08
                                        mov rax,[rbx]
                                        jmp eax
Here inlining helps a little, but the disadvantages of the approach
are still obvious. With less optimization (e.g., no stack caching),
inlining would have helped even less.
And while we are at it, here's SwiftForth:
foo.1                   foo.2                   foo.3                   foo.4
RBX PUSH                -8 [RBP] RBP LEA        -8 [RBP] RBP LEA        -8 [RBP] RBP LEA
0 [RBP] RBX MOV         RBX 0 [RBP] MOV         RBX 0 [RBP] MOV         RBX 0 [RBP] MOV
8 [RBP] RBP LEA         10 [RBP] RBX MOV        18 # EBX MOV            -10 [RBP] RBP LEA
-10 [RBP] RBP LEA       -8 [RBP] RBP LEA        LSPACE CALL             RBX 8 [RBP] MOV
RBX 8 [RBP] MOV         RBX 0 [RBP] MOV         RBX 10 [R13] MOV        20 [RBP] RAX MOV
10 [RBP] RAX MOV        10 [RBP] RBX MOV        0 [RBP] RBX MOV         RAX 0 [RBP] MOV
RAX 0 [RBP] MOV         -8 [RBP] RBP LEA        8 [RBP] RBP LEA         18 [RBP] RBX MOV
-8 [RBP] RBP LEA        RBX 0 [RBP] MOV         RBX 8 [R13] MOV         RBX RCX MOV
RBX 0 [RBP] MOV         10 [RBP] RBX MOV        0 [RBP] RBX MOV         8 [RBP] RBX MOV
0 [RSP] RBX MOV         0 [RBP] RAX MOV         8 [RBP] RBP LEA         0 [RBP] RAX MOV
RBX RCX MOV             8 [RBP] RCX MOV         RBX 0 [R13] MOV         RAX 8 [RBP] MOV
8 [RBP] RAX MOV         RCX 0 [RBX] [RAX] MOV   0 [RBP] RBX MOV         RCX 0 [RBP] MOV
0 [RBP] RBX MOV         10 [RBP] RBX MOV        8 [RBP] RBP LEA         0 [RBP] RAX MOV
RAX 0 [RBP] MOV         18 [RBP] RBP LEA        -8 [RBP] RBP LEA        8 [RBP] RCX MOV
RCX 8 [RBP] MOV         RET                     RBX 0 [RBP] MOV         RCX 0 [RBX] [RAX] MOV
RAX POP                                         0 [R13] RBX MOV         10 [RBP] RBX MOV
RAX RBX ADD                                     -8 [RBP] RBP LEA        18 [RBP] RBP LEA
0 [RBP] RAX MOV                                 RBX 0 [RBP] MOV         RET
RAX 0 [RBX] MOV                                 8 [R13] RBX MOV
8 [RBP] RBX MOV                                 -8 [RBP] RBP LEA
10 [RBP] RBP LEA                                RBX 0 [RBP] MOV
RET                                             10 [R13] RBX MOV
                                                -8 [RBP] RBP LEA
                                                RBX 0 [RBP] MOV
                                                0 [R13] RBX MOV
                                                -8 [RBP] RBP LEA
                                                RBX 0 [RBP] MOV
                                                8 [R13] RBX MOV
                                                -8 [RBP] RBP LEA
                                                RBX 0 [RBP] MOV
                                                10 [R13] RBX MOV
                                                0 [RBP] RAX MOV
                                                8 [RBP] RCX MOV
                                                RCX 0 [RBX] [RAX] MOV
                                                10 [RBP] RBX MOV
                                                18 [RBP] RBP LEA
                                                RET
And finally, iForth:
foo.1/foo.4                 foo.2                       foo.3
pop rbx                     mov rbx, [rsp #16 +] qword  pop rbx
pop rdi                     mov rcx, rbx                lea rsi, [rsi #-16 +] qword
pop rax                     mov rbx, [rsp 8 +] qword    mov [esi] dword, rbx
mov [edi ebx*1]dword,rax    push rcx                    pop rbx
push rax                    mov rcx, rbx                lea rsi, [rsi #-16 +] qword
push rdi                    mov rbx, [rsp 8 +] qword    mov [esi] dword, rbx
push rbx                    pop rdi                     pop rbx
;                           mov [ecx ebx*1] dword, rdi  lea rsi, [rsi #-16 +] qword
                            ;                           mov [esi] dword, rbx
                                                        mov rbx, [rsi #16 +] qword
                                                        add rbx, [rsi #32 +] qword
                                                        mov rdi, [rsi] qword
                                                        mov rax, [rsi #16 +] qword
                                                        mov rdx, [rsi #32 +] qword
                                                        mov [ebx] dword, rdi
                                                        push rdi
                                                        push rax
                                                        push rdx
                                                        add rsi, #48 b#
                                                        ;
It's interesting that iForth produced the same code for FOO.1 and
FOO.4, but different code for FOO.2. Both variants are suboptimal
IMO, because they contain unnecessary pushes.
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 CFP: http://www.euroforth.org/ef25/cfp.html
EuroForth 2025 registration: https://euro.theforth.net/