Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.forth > #135039

Re: ?DUP

From Hans Bezemer <the.beez.speaks@gmail.com>
Newsgroups comp.lang.forth
Subject Re: ?DUP
Date 2026-05-01 13:05 +0200
Organization A noiseless patient Spider
Message-ID <10t21da$1c6cc$1@dont-email.me> (permalink)
References <2026May1.084125@mips.complang.tuwien.ac.at>

Show all headers | View raw


On 01-05-2026 08:41, Anton Ertl wrote:
> While looking at the strcmp code, I saw that the code that used ?DUP
> was large, and replaced it with code using DUP and, in the 0 branch,
> DROP.  So now I look closely at the code produced for ?DUP and the
> alternatives.
> 
> Since the beginning Gforth has provided ?DUP-IF and ?DUP-0=-IF, which
> have the advantage that the value is not tested twice (once in ?DUP
> and once in IF), and there are not two branches.  One might imagine
> that this produces the optimal code, but see below.
> 
> Other systems may or may not combine the ?DUP and the IF by themselves
> to avoid the two branches mentioned above.
> 
> Here are three simple words that all do the same thing:
> 
> : ?dup-test1 ( n1 -- n2 )
>      ?dup if exit then 1 ;
> 
> : ?dup-test2 ( n1 -- n2 )
>      dup if exit then drop 1 ;
> 
> [defined] ?dup-if [if]
> : ?dup-test3 ( n1 -- n2 )
>      ?dup-if exit then 1 ;
> [then]
> 
> 
> Here's the code that gforth-fast produces for them (the first line
> shows only the code that is different):
> 
> ?dup if exit then       dup if exit then drop   ?dup-if exit then
> ?dup    1->1            dup    1->2             ?dup-?branch    1->1
>   add    $0x8,%r10        mov    %r13,%r15       <?dup-test2+$18>
>   test   %r13,%r13       ?branch    2->1          add    $0x10,%rbx
>   je     0x7f6ca1271b18  <?dup-test3+$20>         mov    -0x8(%rbx),%rax
>   mov    %r13,-0x8(%r10)  add    $0x20,%rbx       add    $0x8,%r10
>   sub    $0x8,%r10        mov    (%rbx),%rax      test   %r13,%r13
>   sub    $0x8,%r10        test   %r15,%r15        je     x
> ?branch    1->1          jne    ;s               mov    %r13,-0x8(%r10)
> <?dup-test1+$20>         jmp    *%rax            mov    %rbx,%rax
>   add    $0x20,%rbx      ;s    1->1               sub    $0x8,%r10
>   add    $0x8,%r10        mov    (%r14),%rbx    x:mov    (%r10),%r13
>   test   %r13,%r13        add    $0x8,%r14        mov    %rax,%rbx
>   mov    (%rbx),%rax      mov    (%rbx),%rax      mov    (%rbx),%rax
>   mov    (%r10),%r13      jmp    *%rax            jmp    *%rax
>   jne    ;s              drop    1->0            ;s    1->1
>   jmp    *%rax           lit    0->1              mov    (%r14),%rbx
> ;s    1->1              #1                       add    $0x8,%r14
>   mov    (%r14),%rbx      mov    0x10(%rbx),%r13  mov    (%rbx),%rax
>   add    $0x8,%r14       ;s    1->1               jmp    *%rax
>   mov    (%rbx),%rax      mov    (%r14),%rbx     lit    1->1
>   jmp    *%rax            add    $0x8,%r14       #1
> lit    1->1              mov    (%rbx),%rax      mov    %r13,(%r10)
> #1                       jmp    *%rax            sub    $0x8,%r10
>   mov    %r13,(%r10)                              mov    0x8(%rbx),%r13
>   sub    $0x8,%r10                               ;s    1->1
>   mov    0x8(%rbx),%r13                           mov    (%r14),%rbx
> ;s    1->1                                       add    $0x8,%r14
>   mov    (%r14),%rbx                              mov    (%rbx),%rax
>   add    $0x8,%r14                                jmp    *%rax
>   mov    (%rbx),%rax
>   jmp    *%rax
> 
> So in the left column ?DUP checks the TOS and performs one branch, and
> ?BRANCH checks and branches again.  The taken jne is the fall-through
> path on the Forth level, the Forth-level taken branch is performed by
> the "jmp *rax".  All control-flow in Gforth works in the
> direct-threaded code way, as seen in ?BRANCH and ;S.
> 
> In the middle column we see nicely that the DUP is compiled to only
> one instruction (and ?BRANCH) becomes shorter than in the left
> column. DROP is compiled to 0 instructions (and it makes the following
> LIT cheaper, compare with the LIT in the left and the right column).
> All of this is due to multi-state stack caching.
> 
> The right column drops TOS when TOS=0 (and restores the canonical
> state that TOS is in %r13), and ?DUP-?BRANCH uses the threaded-code
> dispatch on both outcomes.  The latter could be fixed, but the code
> would probably not become shorter.
> 
> Overall the middle-column approach shines.  This is a best case for
> this approach, but I think that it would still be at least on par for
> other cases.
> 
> Currently there are 60 instances of ?DUP-IF and one instance of
> ?DUP-0=-IF in Gforth.  It may be a good idea to replace them whith DUP
> IF ... THEN DROP.  Another day maybe.
> 
> Let's see how things look for other systems:
> 
> iForth 5.1-mini:
>   ?dup if exit then   dup if exit then drop
>    pop  rbx             pop  rbx
>    or   rbx, rbx        cmp  rbx, 0 b#
>    je   x               je   y
>    push rbx             push rbx
> x:cmp  rbx, 0 b#       jmp  z
>    je   y               pop  rbx
>    jmp  z             y:push 1 b#
> y:push 1 b#          z:;
> z:;
> 
> lxf:
>   ?dup if exit then         dup if exit then drop
>    or      ebx , ebx         cmp     ebx , # 0h
>    je      "0804FBEE"        je      "0804FC14"
>    mov     [ebp-4h] , ebx    ret     near
>    lea     ebp , [ebp-4h]  x:mov     ebx , # 1h
>    cmp     ebx , # 0h        ret     near
>    mov     ebx , [ebp]
>    lea     ebp , [ebp+4h]
>    je      x
>    ret     near
> x:mov     [ebp-4h] , ebx
>    mov     ebx , # 1h
>    lea     ebp , [ebp-4h]
>    ret     near
> 
> SwiftForth 4.0.0-RC89
>   ?dup if exit then         dup if exit then drop
>    RBX RBX OR                 RBX RBX OR
>    x JNZ                      x JZ
>    0 [RBP] RBX MOV            RET
>    8 [RBP] RBP LEA          x:0 [RBP] RBX MOV
>    y JMP                      8 [RBP] RBP LEA
> x:RET                        -8 [RBP] RBP LEA
> y:-8 [RBP] RBP LEA           RBX 0 [RBP] MOV
>    RBX 0 [RBP] MOV            1 # EBX MOV
>    1 # EBX MOV                RET
>    RET
> 
> SwiftForth peephole-optimizes ?DUP IF into ?DUP-IF with the rule:
> 
> OPTIMIZE ?DUP (IF)     SUBSTITUTE ?DUP-IF
> 
> (but, interestingly, "[defined] ?DUP-IF" produces 0 (while "locate
> ?DUP-IF" shows the source code).  It does not know how to optimize
> "DROP 1", but the result of the right-hand approach is still better.
> 
> vfx64 5.43
>   ?dup if exit then         dup if exit then drop
>    TEST    RBX, RBX           TEST    RBX, RBX
>    JNZ/NE  x                  JZ/E    x
>    MOV     RBX, [RBP]         RET/NEXT
>    LEA     RBP, [RBP+08]    x:MOV     EBX, # 00000001
>    JMP     y                  RET/NEXT
> x:RET/NEXT
> y:LEA     RBP, [RBP+-08]
>    MOV     [RBP], RBX
>    MOV     EBX, # 00000001
>    RET/NEXT
> 
> VFX also seems to optimize ?DUP IF to avoid the double checking, but
> the result is still worse than for the approach in the second column.
> 
> Bottom line: On all of these systems, DUP IF ... THEN DROP produces
> shorter code than ?DUP IF ... THEN or (if present) ?DUP-IF ... THEN.
> ?DUP and ?DUP-IF may be good ideas for traditional threaded-code
> systems, but produce larger code on these more sophisticated systems.
> 
> Of course, with additional sophistication, a compiler could produce
> the same code for ?DUP IF ... THEN as for DUP IF ... THEN DROP, but
> for now we are not there, and is ?DUP IF used often enough to add such
> sophistication?  In SwiftForth's source code there are 100 occurences
> of ?DUP, and 57 of ?DUP IF.
> 
> - anton

Frankly - I don't implement ?DUP. For the same reason I detest >FLOAT.
Words should *ALWAYS* have one - and only one stack diagram. It's 
already hard enough to follow the stack flow - let alone if you 
introduce conditional stack diagrams.

I love speed as much as the next guy - unless it destroys the clarity of 
the code.

Consequently, I always use DUP IF .. ELSE DROP .. THEN, being assured 
that the same stack diagram is provided by both branches.

In the roughly 400 example programs in the 4tH repository, ?DUP appears 
exactly 4 times - in programs I ported from Forth as verbatim as I could.

In the about 650 libraries of 4tH, ?DUP appears exactly *ONE* time - in 
the library where it is defined. And it shares that space with PICK and 
ROLL, a few other notorious citizens of NO-NO land.

Personally, I see where you're coming from with ?DUP-TEST - I just don't 
recognize the pattern, but I do like that the stack diagram remains the 
same.

However, I do enjoy that compilers seem to agree with the practice I've 
exercised for roughly the last 30 years ;-)

Hans Bezemer

Back to comp.lang.forth | Previous | NextPrevious in thread | Find similar


Thread

?DUP anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-05-01 06:41 +0000
  Re: ?DUP anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-05-01 10:36 +0000
  Re: ?DUP Hans Bezemer <the.beez.speaks@gmail.com> - 2026-05-01 13:05 +0200

csiph-web