


EuroForth 2025 preliminary proceedings

Started by: dxf <dxforth@gmail.com>
First post: 2026-01-15 17:41 +1100
Last post: 2026-01-20 22:17 +0000
Articles: 10 on this page of 30, 7 participants



Contents

  EuroForth 2025 preliminary proceedings dxf <dxforth@gmail.com> - 2026-01-15 17:41 +1100
    Re: EuroForth 2025 preliminary proceedings anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-15 12:04 +0000
      Re: EuroForth 2025 preliminary proceedings Hans Bezemer <the.beez.speaks@gmail.com> - 2026-01-16 15:25 +0100
        Re: EuroForth 2025 preliminary proceedings anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-16 17:38 +0000
          Re: EuroForth 2025 preliminary proceedings Hans Bezemer <the.beez.speaks@gmail.com> - 2026-01-22 16:51 +0100
            C compiler optimization and Forth engines (was: EuroForth 2025 ...) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-24 11:28 +0000
              Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-24 16:47 +0000
                Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...) peter <peter.noreply@tin.it> - 2026-01-25 23:31 +0100
                  Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-26 19:24 +0000
                    Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...) peter <peter.noreply@tin.it> - 2026-01-27 15:44 +0100
                      Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-29 18:27 +0000
                        Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...) albert@spenarnc.xs4all.nl - 2026-01-30 13:20 +0100
                          Re: C compiler optimization and Forth engines (was: EuroForth 2025 ...) anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-30 18:00 +0000
        Re: EuroForth 2025 preliminary proceedings Paul Rubin <no.email@nospam.invalid> - 2026-01-16 23:10 -0800
          Re: EuroForth 2025 preliminary proceedings Hans Bezemer <the.beez.speaks@gmail.com> - 2026-01-17 16:58 +0100
            Re: EuroForth 2025 preliminary proceedings Paul Rubin <no.email@nospam.invalid> - 2026-01-17 20:21 -0800
              Re: EuroForth 2025 preliminary proceedings Hans Bezemer <the.beez.speaks@gmail.com> - 2026-01-18 15:26 +0100
            Re: EuroForth 2025 preliminary proceedings anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-18 22:17 +0000
          Re: EuroForth 2025 preliminary proceedings albert@spenarnc.xs4all.nl - 2026-01-18 16:34 +0100
            Re: EuroForth 2025 preliminary proceedings Paul Rubin <no.email@nospam.invalid> - 2026-01-20 00:35 -0800
              Re: EuroForth 2025 preliminary proceedings albert@spenarnc.xs4all.nl - 2026-01-20 12:12 +0100
              Coroutines in Forth Gerry Jackson <do-not-use@swldwa.uk> - 2026-04-02 20:59 +0100
                Re: Coroutines in Forth Paul Rubin <no.email@nospam.invalid> - 2026-04-04 18:02 -0700
                  Re: Coroutines in Forth Paul Rubin <no.email@nospam.invalid> - 2026-04-04 21:21 -0700
          Re: EuroForth 2025 preliminary proceedings peter <peter.noreply@tin.it> - 2026-01-19 23:26 +0100
            Re: EuroForth 2025 preliminary proceedings Paul Rubin <no.email@nospam.invalid> - 2026-01-19 15:22 -0800
              Re: EuroForth 2025 preliminary proceedings peter <peter.noreply@tin.it> - 2026-01-20 10:44 +0100
              Re: EuroForth 2025 preliminary proceedings anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-20 22:36 +0000
            Re: EuroForth 2025 preliminary proceedings Paul Rubin <no.email@nospam.invalid> - 2026-01-20 00:33 -0800
            Re: EuroForth 2025 preliminary proceedings anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-01-20 22:17 +0000



#134525

From: albert@spenarnc.xs4all.nl
Date: 2026-01-20 12:12 +0100
Message-ID: <nnd$22c04faa$70acf6bc@a6aa6b616b7ee345>
In reply to: #134523
In article <87bjioptpk.fsf@nightsong.com>,
Paul Rubin  <no.email@nospam.invalid> wrote:
>albert@spenarnc.xs4all.nl writes:
>> If you pass an address a as a tail call is it approximately equal
>> to coroutines:
>
>No I don't think so.  The tail call is just a jump to that address
>(changes the program counter).  A coroutine jump also has to change the
>stack pointer.  See the section "Knuth's coroutines" here:

Which stack pointer do you mean? The data stack pointer or the
return stack pointer?
Where does the program continue after performing the tail call?
Probably at the same place as if the tail call were not present.

Pushing an address on the return stack and then continuing to
interpret is tantamount to a jump.

>
>https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
>
>Some Forths have a CO primitive that I think is similar.  There is
>something like it on the Greenarrays processor.

You can see the CO primitive used in the example. The name CO is
my own; Chuck Moore uses ;: , which is not to imply that the two are
exactly the same.

Groetjes Albert
-- 
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.



#134856 — Coroutines in Forth

From: Gerry Jackson <do-not-use@swldwa.uk>
Date: 2026-04-02 20:59 +0100
Subject: Coroutines in Forth
Message-ID: <10qmhrc$1csud$1@dont-email.me>
In reply to: #134523
On 20/01/2026 08:35, Paul Rubin wrote:
> albert@spenarnc.xs4all.nl writes:
>> If you pass an address a as a tail call is it approximately equal
>> to coroutines:
> 
> No I don't think so.  The tail call is just a jump to that address
> (changes the program counter).  A coroutine jump also has to change the
> stack pointer.  See the section "Knuth's coroutines" here:
> 
> https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
> 
> Some Forths have a CO primitive that I think is similar.  There is
> something like it on the Greenarrays processor.

Never having looked at coroutines before, I tried implementing the
examples in the above paper. The resulting code is:

-1 constant EOF

: get-char  ( ca u -- ca' u' [ch | EOF] )
    ?dup 0= if drop EOF exit then
    over c@ >r 1 /string r>
;

: is-alpha ( ch -- f )  'a' 'z' 1+ within  ; \ adequate for testing

create token 32 chars allot
2variable tok

: init-token  ( -- ) token 0 tok 2!  ;  \ reset the token buffer to empty

: add-to-token  ( ch -- )  tok 2@ + c! 1 tok +! ;

: got-token  ( type -- )
    cr if ." Word: " else ." Punct: " then
    tok 2@ type
   init-token
;

: ?got-token  ( -- ) \ To handle tokens left in the buffer e.g. on EOF
    tok @ if 1 got-token then \ Only needed for WORD tokens
;

\ The consumer
: parser  ( ch -- )
    dup is-alpha
    if
       begin
          add-to-token exit
[:] parser.alpha   ( ch2 )  \ Re-enter here
           dup is-alpha 0=
       until
    then
   ?got-token
    add-to-token
    0 got-token       ( Punctuation token )
    exit
[:] parser.end
    ?got-token
;

\ The producer
: decompressor  ( ca u -- )
    init-token
    begin
       get-char                      ( -- ca' u' ch ) \ or ( -- EOF )
       dup EOF =
       if drop parser.end cr ." End of file" cr exit then
       dup
    while
      dup $FF =
      if drop get-char >r get-char   ( -- ca' u' ch ) ( R: -- len )
         dup parser r> 1
         ?do  dup parser.alpha loop
         drop
       else
          parser
       then
    repeat
;

init-token
s\" \xFF\x03abc%\xFF\x02-&xyz@qwerty" decompressor .s
\ Displays
Word: aaabc
Punct: %
Punct: -
Punct: -
Punct: &
Word: xyz
Punct: @
Word: qwerty
End of file

Switching between the routines is achieved by calls to different entry
points in the Parser, and by EXIT or ; back to the decompressor. This is
not Standard Forth: it uses an extension that I implemented over 10
years ago and tested but never used. I cannot remember my motivation for
doing so; perhaps somebody mentioned it as an idea. I'd be pleased if
somebody could point to any previous use of this technique.

This extension is a word called [:] that behaves like : and can only be 
used in the middle of a standard colon definition. Usage e.g.
    : foo ... [:] bar ... [:] baz ... ;
to create named entry points into colon definition FOO.
Such an entry definition compiles no executable code of its own and 
execution of the enclosing definition simply executes the body of the 
definition as if the alternative entry points weren't there e.g.
    : foo 1 . [:] bar 2 . [:] baz 3 . ;
    foo 1 2 3
Executing an alternative entry point executes the code following the
definition of that entry point e.g.
    bar 2 3
    baz 3
Using BAR etc in another colon definition calls that entry point just 
like any standard colon definition to run the code following that entry 
point.
An entry point does not use the control stack in any way, therefore it 
can be positioned inside any control structure.
An entry point becomes visible immediately its definition is complete, 
so it can be called from anywhere in the rest of the enclosing definition.
As a pseudo colon definition an entry point has an execution token that 
can be obtained or used by the usual set of words such as ' POSTPONE etc.
Execution of code following an entry point can be terminated at any time
using EXIT.

In the example above, in addition to PARSER, there are two other entry
points: PARSER.ALPHA (in the middle of a loop) and PARSER.END.

There are other uses for [:] entry points:
1. Coroutines (see example above)

2. Recursion by name e.g.
    : foo ... [:] bar ... bar ...;  \ recurses to BAR or
    : foo [foo] ... foo ... ;
    The second example can be achieved by using SYNONYM
    e.g. SYNONYM FOO RECURSE but the first can include initialisation

3. Obtain the xt of the current definition, which has been requested a 
few times on c.l.f e.g.
    : x [:] my-xt ['] my-xt ... ;  \ Removes the need for LATEST

4. Generators akin to those of Python with next e.g.
    variable n : foo n ! exit [:] foo.next n @ 1 n +! ;
    1 foo
    foo.next   ( -- 1 )
    foo.next   ( -- 2 )

5. Debugging by inserting entry points at which a definition can be run 
with known test data on the stack

6. Nested colon definitions e.g. a possible use - unsure of its utility.
    : a 1 . [: [:] b 2 . 3 . ;] drop 4 . ;
    b 2 3
    b is effectively a nested colon definition - forbidden by the standard
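
For readers who think in C, the re-entry that PARSER.ALPHA provides (jumping back into the middle of a loop) can be approximated with the switch-based trick from the Tatham article linked above. What follows is only a sketch under that assumption, not a translation of the Forth code: the names parser, feed, emit, add and the state variable are made up for illustration, the run-length decompressor is left out, and tokens are collected in a buffer instead of being printed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define END_OF_INPUT (-1)

static char out[256];   /* collected token report, for inspection */
static char tok[32];    /* token under construction */
static int  toklen;
static int  state;      /* saved re-entry point: 0 = start, 1 = inside word loop */

/* Append "<kind>:<token>\n" to out and reset the token buffer. */
static void emit(const char *kind) {
    size_t used = strlen(out);
    snprintf(out + used, sizeof out - used, "%s:%.*s\n", kind, toklen, tok);
    toklen = 0;
}

static void add(int c) {
    assert(toklen < (int)sizeof tok);
    tok[toklen++] = (char)c;
}

/* Called once per character; resumes where the previous call left off. */
static void parser(int c) {
    switch (state) {
    case 0:
        if (c == END_OF_INPUT) return;
        if (isalpha(c)) {
            do {
                add(c);
                state = 1;
                return;        /* "yield": the next call resumes at case 1 */
    case 1:     ;              /* re-entry point in the middle of the loop */
                if (c == END_OF_INPUT) { emit("Word"); state = 0; return; }
            } while (isalpha(c));
            emit("Word");      /* a non-letter ended the word */
        }
        add(c);
        emit("Punct");
        state = 0;
    }
}

/* Hypothetical producer: feeds a string, then signals end of input. */
static void feed(const char *s) {
    for (; *s; s++) parser((unsigned char)*s);
    parser(END_OF_INPUT);
}
```

Calling feed("ab%cd") leaves "Word:ab", "Punct:%" and "Word:cd" in out, one per line. The visible difference from [:] is that C needs an explicit state variable and a single entry function, whereas the Forth version gives each re-entry point its own name and execution token.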

-- 
Gerry



#134866 — Re: Coroutines in Forth

From: Paul Rubin <no.email@nospam.invalid>
Date: 2026-04-04 18:02 -0700
Subject: Re: Coroutines in Forth
Message-ID: <87cy0eb4od.fsf@nightsong.com>
In reply to: #134856
Gerry Jackson <do-not-use@swldwa.uk> writes:
> Never having looked at coroutines before I tried implementing the
> examples in the above paper. The resulting code is: ...

It would take me a while to understand that, but [:] is cool (I haven't
seen it before), and it lets you do something similar to protothreads.

For coroutines in general you should also read this:

https://doi.org/10.1145/1462166.1462167

It discusses "stackful" coroutines where each coroutine has a separate
call stack that is preserved across coroutine jumps.  It's similar to
Forth's cooperative multitasking.
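
A minimal sketch of the "stackful" idea, assuming a system that still provides the POSIX ucontext functions (getcontext, makecontext, swapcontext; they are obsolescent, so treat this as illustration only). The names co_start, co_resume, co_yield_value and count_down are invented for the example. The point is that the yield happens from inside nested calls, which only works because the coroutine has its own call stack.

```c
#define _XOPEN_SOURCE 600   /* some systems need this to expose ucontext */
#include <assert.h>
#include <ucontext.h>

enum { CO_STACK_BYTES = 64 * 1024 };

static ucontext_t main_ctx, co_ctx;    /* saved contexts of the two sides */
static char co_stack[CO_STACK_BYTES];  /* the coroutine's private call stack */
static int  co_value;                  /* value handed over on each yield */
static int  co_done;

/* Yield from any call depth: the private stack is preserved as it is. */
static void co_yield_value(int v) {
    co_value = v;
    swapcontext(&co_ctx, &main_ctx);
}

/* Recursive on purpose, so that yields come from nested frames. */
static void count_down(int n) {
    if (n == 0) return;
    co_yield_value(n);
    count_down(n - 1);
}

static void co_body(void) {
    count_down(3);
    co_done = 1;
    /* returning from here resumes uc_link, i.e. main_ctx */
}

static void co_start(void) {
    int rc = getcontext(&co_ctx);
    assert(rc == 0);
    (void)rc;
    co_done = 0;
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, co_body, 0);
}

/* Resume the coroutine; returns 1 and stores a value, or 0 when finished. */
static int co_resume(int *out) {
    if (co_done) return 0;
    swapcontext(&main_ctx, &co_ctx);
    if (co_done) return 0;
    *out = co_value;
    return 1;
}
```

After co_start(), successive co_resume(&v) calls deliver 3, 2 and 1 and then report completion. A Forth cooperative multitasker does much the same job when PAUSE switches the data and return stack pointers between tasks.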



#134868 — Re: Coroutines in Forth

From: Paul Rubin <no.email@nospam.invalid>
Date: 2026-04-04 21:21 -0700
Subject: Re: Coroutines in Forth
Message-ID: <878qb2avga.fsf@nightsong.com>
In reply to: #134866
Paul Rubin <no.email@nospam.invalid> writes:
> https://doi.org/10.1145/1462166.1462167

Ehh, that article is very theoretical.  I had forgotten what it was
like, or maybe confused it with a different article.  The below is
probably more readable:

https://www.lua.org/doc/jucs04.pdf



#134520

From: peter <peter.noreply@tin.it>
Date: 2026-01-19 23:26 +0100
Message-ID: <20260119232635.00007bcd@tin.it>
In reply to: #134513
On Fri, 16 Jan 2026 23:10:24 -0800
Paul Rubin <no.email@nospam.invalid> wrote:

> Hans Bezemer <the.beez.speaks@gmail.com> writes:
> > 5. I added GCC extension support to 4tH in version 3.62.0. At the
> > time, it improved performance by about 25%. By accident I found out
> > that was no longer true. switch() based was faster. I didn't know
> > there had been changes in that regard to GCC.
> 
> If you mean the goto *a feature, these days you might try using tail
> calls instead.  GCC and LLVM both now support a musttail attribute that
> ensures this optimization, or signals a compile-time error if it can't.
> 
> https://lwn.net/Articles/1033373/

I was interested in understanding how tail calls could improve on
computed gotos. So I took the first five "opcodes" from the VM in
NTF64/LXF64 to compare the generated asm.
The VM was originally written in X64 assembler (13 years ago).
4 years ago I also implemented the VM in C to simplify porting to ARM64.
At that time the asm version was about 10% faster than the generated
C code; today the speed is about the same. C compilers have improved.
The C version was implemented using computed gotos, using the following macro
as the nesting code ending each "opcode":

#define RELOAD()  code=*ip++; goto *jmp_table[code]	

for the tail call version it was changed to

RELOAD() opcode func=(opcode)tbl[*ip++]; __attribute__((musttail)) 
                 return func(ip, tbl, TOP, FTOP, sp, rp, fp, lp)

(line broken for readability)

The noop "opcode" has just the nesting code and produces the following code

	movzx	r9d, byte ptr [rcx]
	inc	rcx
	jmp	qword ptr [rax + 8*r9]

and for the tailcall version

	movzx	eax, byte ptr [r12]
	inc	r12
	mov	rax, qword ptr [r13 + 8*rax]
	rex64 jmp	rax   

both compiled with 
clang -S -Wall -O2 -masm=intel -o vm8test3.asm vm8tail.c       

As I suspected the code is practically identical!

It also turns out that the musttail attribute is not necessary.
The compiler will generate a tail call anyway. The difference is that with
musttail it will report an error if it cannot do the tail call.

Much more important is the __attribute__((preserve_none)) before
each function. This indicates that more registers will be used to pass
parameters. As seen above, I pass 8 parameters to each function, and
they need to be in registers to match the assembler-written code.
This happens automatically in the goto version, as everything is in
one function there.

In the end it is more how you like to write your VM, as one function
or one for each "opcode".

Unfortunately, GCC does not recognize preserve_none and uses the stack
for some parameters.

Here is my test code

// VM8 C variant using computed goto

#include <stdint.h>

#define UNS8  unsigned char
#define INT64 long long int
#define UNS64 unsigned long long int

#define RELOAD()  code=*ip++; goto *jmp_table[code]	

void VM8(UNS8 *ip, UNS64 *sp, UNS64 *rp, double *fp, UNS64 *lp ) {

const static void* jmp_table[] = {	
	&&noop,
	&&swap,
	&&rot,
	&&eqzero,
	&&negate,
};
    
	UNS8 code=*ip;
	UNS64 tmp;
	UNS64 TOP=*sp++;
//	double FTOP=*fp++;

	RELOAD();
	

	noop: 			// do nothing
		RELOAD();
	swap: 			//  swap
        tmp=sp[0];
		sp[0]=TOP;
		TOP=tmp;
		RELOAD();
	rot: 			//  rot
		tmp=TOP;
		TOP=sp[1];
		sp[1]=sp[0];
		sp[0]=tmp;
		RELOAD();
	eqzero: 		//  0=
		TOP=-(TOP==0);
		RELOAD();
	negate:  		// negate
		TOP=-TOP;
		RELOAD();
	
	
} //vm8


And here is the tail call version. Sorry for the long lines!

// VM8 C variant using tailcalls

#include <stdint.h>

#define UNS8  unsigned char
#define INT64 long long int
#define UNS64 unsigned long long int


typedef  __attribute__((preserve_none)) void (*opcode) (UNS8*, UNS64*, UNS64, double, UNS64*, UNS64*, double*, UNS64*); 

#define RELOAD() opcode func=(opcode)tbl[*ip++]; __attribute__((musttail)) return func(ip, tbl, TOP, FTOP, sp, rp, fp, lp)	

#define FUNC   __attribute__((preserve_none)) void

FUNC	noop(UNS8 *ip, UNS64 *tbl, UNS64 TOP, double FTOP, UNS64 *sp, UNS64 *rp, double *fp, UNS64 *lp )  			// do nothing
        {
		RELOAD();
        }
        
FUNC	swap(UNS8 *ip, UNS64 *tbl, UNS64 TOP, double FTOP, UNS64 *sp, UNS64 *rp, double *fp, UNS64 *lp )  			//  swap
		{UNS64 tmp;
        tmp=sp[0];
		sp[0]=TOP;
		TOP=tmp;
		RELOAD();}
        
FUNC	rot(UNS8 *ip, UNS64 *tbl, UNS64 TOP, double FTOP, UNS64 *sp, UNS64 *rp, double *fp, UNS64 *lp )  			//  rot
		{UNS64 tmp=TOP;
		TOP=sp[1];
		sp[1]=sp[0];
		sp[0]=tmp;
		RELOAD();}

FUNC	eqzero(UNS8 *ip, UNS64 *tbl, UNS64 TOP, double FTOP, UNS64 *sp, UNS64 *rp, double *fp, UNS64 *lp )  		//  0=
		{TOP=-(TOP==0);
		RELOAD();}

FUNC	negate(UNS8 *ip, UNS64 *tbl, UNS64 TOP, double FTOP, UNS64 *sp, UNS64 *rp, double *fp, UNS64 *lp )   		// negate
		{TOP=-TOP;
		RELOAD();}

opcode jmp_table[]={	
	noop,
	swap,
	rot,
	eqzero,
	negate,
};
        

        
void VM8(UNS8 *ip, UNS64 *sp, UNS64 *rp, double *fp, UNS64 *lp ) {


    UNS64 *tbl=(UNS64*)&jmp_table;
    UNS64 TOP=*sp++;
    double FTOP=*fp++;
        

    opcode func=(opcode)tbl[*ip++]; 
    func( ip, tbl, TOP, FTOP, sp, rp, fp, lp);

}	
	
 //vm8

BR
Peter





#134521

From: Paul Rubin <no.email@nospam.invalid>
Date: 2026-01-19 15:22 -0800
Message-ID: <87o6mpp4rk.fsf@nightsong.com>
In reply to: #134520
peter <peter.noreply@tin.it> writes:
> for the tail call version it was changed to
>
> RELOAD() opcode func=(opcode)tbl[*ip++]; __attribute__((musttail)) 
>                  return func(ip, tbl, TOP, FTOP, sp, rp, fp, lp)


You could possibly use "inline RELOAD() { .... ;}" instead of the macro.

> and for the tailcall version
> 	mov	rax, qword ptr [r13 + 8*rax]
> 	rex64 jmp	rax   

I wonder why the tailcall version didn't combine the mov with the jmp
like the other version did.

> It also turns out that the musttail attribute is not necessary
> It will generate a tailcall anyway. The difference is that with
> musttail it will report an error if it cannot do the tailcall.

Yes, TCO has been present since the beginning but it's been
opportunistic rather than something you can rely on.  

> Unfortunately GCC does not recognize preserve_none and uses the stack
> for some parameters

Oh that's interesting.  I half remember there being some other feature
for that, but who knows.  Does -fwhole-program help?



#134524

From: peter <peter.noreply@tin.it>
Date: 2026-01-20 10:44 +0100
Message-ID: <20260120104440.00000ab0@tin.it>
In reply to: #134521
On Mon, 19 Jan 2026 15:22:07 -0800
Paul Rubin <no.email@nospam.invalid> wrote:

> peter <peter.noreply@tin.it> writes:
> > for the tail call version it was changed to
> >
> > RELOAD() opcode func=(opcode)tbl[*ip++]; __attribute__((musttail)) 
> >                  return func(ip, tbl, TOP, FTOP, sp, rp, fp, lp)
> 
> 
> You could possibly use "inline RELOAD() { .... ;}" instead of the macro.

No, that did not work. I also do not want the compiler to mess with this.
The pre-processor expansion does exactly what I want.
Musttail requires the parameters of the outgoing call to exactly match
those of the enclosing function, as if it were a recursive call.
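
The matching-signature requirement can be seen in a cut-down sketch. This is not the NTF64/LXF64 code: the names MUSTTAIL, handler, table, run and the op_ functions are illustrative, there are only two VM registers, and a halt opcode is added so that the example terminates. If the compiler does not report musttail via __has_attribute, the macro expands to nothing and the tail call becomes merely opportunistic.

```c
#include <assert.h>
#include <stdint.h>

/* Use the statement attribute only where the compiler advertises it. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL  /* fallback: plain return, tail call not guaranteed */
#endif

enum { OP_HALT, OP_INC, OP_NEGATE, OP_EQZERO };

/* Every handler has exactly this signature, which is what musttail needs. */
typedef int64_t handler(const uint8_t *ip, int64_t top);
static handler op_halt, op_inc, op_negate, op_eqzero;

static handler *const table[] = { op_halt, op_inc, op_negate, op_eqzero };

/* Dispatch: fetch the next opcode and tail-call its handler. */
#define NEXT() MUSTTAIL return table[*ip](ip + 1, top)

static int64_t op_halt(const uint8_t *ip, int64_t top)   { (void)ip; return top; }
static int64_t op_inc(const uint8_t *ip, int64_t top)    { top += 1; NEXT(); }
static int64_t op_negate(const uint8_t *ip, int64_t top) { top = -top; NEXT(); }
static int64_t op_eqzero(const uint8_t *ip, int64_t top) { top = -(top == 0); NEXT(); }

/* Ordinary (non-tail) call that starts the chain and receives the result. */
static int64_t run(const uint8_t *code, int64_t top) {
    assert(code);
    return table[*code](code + 1, top);
}
```

With the program { OP_INC, OP_INC, OP_NEGATE, OP_HALT } and an initial top of 0, run returns -2. Wrapping the dispatch in an inline function instead of a macro would turn the return into a return from the helper rather than from the handler, so the attribute would no longer apply to the call that matters.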


> 
> > and for the tailcall version
> > 	mov	rax, qword ptr [r13 + 8*rax]
> > 	rex64 jmp	rax   
> 
> I wonder why the tailcall version didn't combine the mov with the jmp
> like the other version did.

I do also wonder about that! From my previous testing it will not make a 
difference speed-wise.

> 
> > It also turns out that the musttail attribute is not necessary
> > It will generate a tailcall anyway. The difference is that with
> > musttail it will report an error if it cannot do the tailcall.
> 
> Yes, TCO has been present since the beginning but it's been
> opportunistic rather than something you can rely on.  
> 
> > Unfortunately GCC does not recognize preserve_none and uses the stack
> > for some parameters

It looks like it recognizes it but chooses to ignore it.
That is what the warning messages say.


> Oh that's interesting.  I half remember there being some other feature
> for that, but who knows.  Does -fwhole-program help?

I will certainly continue to use computed goto in the future.
The complete VM8 function, containing 157 opcodes, is about 1200
lines of code. 255 of them are for the function array. That leaves
945 lines for 157 opcodes, about 6 lines per opcode!

BR
Peter



#134528

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date: 2026-01-20 22:36 +0000
Message-ID: <2026Jan20.233605@mips.complang.tuwien.ac.at>
In reply to: #134521
Paul Rubin <no.email@nospam.invalid> writes:
>peter <peter.noreply@tin.it> writes:
>> and for the tailcall version
>> 	mov	rax, qword ptr [r13 + 8*rax]
>> 	rex64 jmp	rax   
>
>I wonder why the tailcall version didn't combine the mov with the jmp
>like the other version did.

I always wonder about that when I see the code generated by gcc for
goto *.  With any gcc since 3.0, I see direct-threaded dispatch
compiled to code like:

 add    $0x8,%rbx
 mov    (%rbx),%rax
 jmp    *%rax

GCC-2.95 and earlier know how to combine the last two instructions into

 jmp    (%rbx)

>Yes, TCO has been present since the beginning but it's been
>opportunistic rather than something you can rely on.

I tried to use tail-call optimization for threaded-code dispatch in
gcc in 1995, and even described it as a theoretical possibility
[ertl95pldi], but gcc of that time did not tail-call optimize code
like that shown by Peter.

@InProceedings{ertl95pldi,
  author =       "M. Anton Ertl",
  title =        "Stack Caching for Interpreters",
  booktitle =    "SIGPLAN Conference on Programming Language
                  Design and Implementation (PLDI'95)",
  year =         "1995",
  crossref =     "sigplan95",
  pages =	 "315--327",
  url =		 "https://www.complang.tuwien.ac.at/papers/ertl95pldi.ps.gz",
  abstract =     "An interpreter can spend a significant part of its
                  execution time on arguments of virtual machine
                  instructions. This paper explores two methods to
                  reduce this overhead for virtual stack machines by
                  caching top-of-stack values in (real machine)
                  registers. The {\em dynamic method} is based on
                  having, for every possible state of the cache, one
                  specialized version of the whole interpreter; the
                  execution of an instruction usually changes the
                  state of the cache and the next instruction is
                  executed in the version corresponding to the new
                  state. In the {\em static method} a state machine
                  that keeps track of the cache state is added to the
                  compiler. Common instructions exist in specialized
                  versions for several states, but it is not necessary
                  to have a version of every instruction for every
                  cache state. Stack manipulation instructions are
                  optimized away."
}

>> Unfortunately GCC does not recognize preserve_none and uses the stack
>> for some parameters
>
>Oh that's interesting.  I half remember there being some other feature
>for that, but who knows.

Explicit register variables.

>Does -fwhole-program help?

Unlikely.  How should it?

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2025 CFP: http://www.euroforth.org/ef25/cfp.html
EuroForth 2025 registration: https://euro.theforth.net/



#134522

From: Paul Rubin <no.email@nospam.invalid>
Date: 2026-01-20 00:33 -0800
Message-ID: <87fr80pttv.fsf@nightsong.com>
In reply to: #134520
peter <peter.noreply@tin.it> writes:
> // VM8 C variant using tailcalls ...
> #define FUNC   __attribute__((preserve_none)) void

Can you add static to that?  It stops the symbol from being exported, so
the compiler can omit the function call sequence when appropriate.

I think -fwhole-program isn't likely to work so it wasn't a helpful
suggestion, sorry.



#134527

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date: 2026-01-20 22:17 +0000
Message-ID: <2026Jan20.231745@mips.complang.tuwien.ac.at>
In reply to: #134520
peter <peter.noreply@tin.it> writes:
>On Fri, 16 Jan 2026 23:10:24 -0800
>Paul Rubin <no.email@nospam.invalid> wrote:
>The VM was written from the beginning in X64 assembler (13 years ago)
>4 years ago I also implemented the VM in C to simplify porting to ARM64.
>At that time the asm version was about 10% faster than the generated 
>C code, today the speed is about the same. C compilers have improved.

My impression is that they have not really improved in decades, for
the kinds of code in Gforth (and, I guess NTF64/LXF64).  Except that
we can now use one or two registers more on AMD64 if we do things
right.

>Much more important is the __attribute__((preserve_none)) before
>each function. This indicates that more registers will be used to pass
>parameters. As seen above I pass 8 parameters to each function and
>they need to be in registers to match the assembler written code.
>This is done automatically in the goto version as everything is in
>one function there. 
>
>In the end it is more how you like to write your VM, as one function
>or one for each "opcode".
>
>Unfortunately GCC does not recognize preserve_none and uses the stack
>for some parameters

With gcc you can use explicit register variables instead.

In any case, the tail-calling technique is not as portable as I would
like.  Depending on the compiler and ABI/architecture, one wants to
pass more or fewer VM registers as parameters, and maybe deal with the
rest with explicit register variables.  Sure, we can find ways to
parameterize this stuff so the main body of the code does not see the
difference, but working on a new architecture out of the box will
either not work or require a lot of work.
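
One way such parameterization might look, as a rough sketch only: the macro names VM_RP_AS_PARAM, VM_PARAMS, VM_ARGS, VM_ENTER, DEFOP and NEXT are hypothetical and not taken from Gforth or NTF64/LXF64. The switch decides whether the return-stack pointer travels as a parameter or lives in a file-scope variable, which is where a gcc port might place an explicit register variable. The opcode bodies are written once and do not change. No musttail is used here, so the tail calls are opportunistic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-port switch: 1 = rp is a parameter, 0 = rp is a global. */
#ifndef VM_RP_AS_PARAM
#define VM_RP_AS_PARAM 1
#endif

#if VM_RP_AS_PARAM
#  define VM_PARAMS     const uint8_t *ip, int64_t *sp, int64_t *rp
#  define VM_ARGS       ip, sp, rp
#  define VM_ENTER(rp0) int64_t *rp = (rp0)
#else
static int64_t *rp;     /* a gcc port could make this a register variable */
#  define VM_PARAMS     const uint8_t *ip, int64_t *sp
#  define VM_ARGS       ip, sp
#  define VM_ENTER(rp0) rp = (rp0)
#endif

enum { OP_HALT, OP_LIT, OP_ADD, OP_TO_R, OP_R_FROM };

typedef int64_t *handler(VM_PARAMS);   /* handlers return the final sp */
static handler op_halt, op_lit, op_add, op_to_r, op_r_from;
static handler *const table[] = { op_halt, op_lit, op_add, op_to_r, op_r_from };

#define DEFOP(name) static int64_t *name(VM_PARAMS)
#define NEXT()      do { ip++; return table[ip[-1]](VM_ARGS); } while (0)

/* Both stacks grow upwards; sp and rp point one past the top item. */
DEFOP(op_halt)   { return sp; }
DEFOP(op_lit)    { *sp++ = *ip++; NEXT(); }          /* push inline byte */
DEFOP(op_add)    { sp[-2] += sp[-1]; sp--; NEXT(); }
DEFOP(op_to_r)   { *rp++ = *--sp; NEXT(); }          /* >r */
DEFOP(op_r_from) { *sp++ = *--rp; NEXT(); }          /* r> */

/* Run a program and return the top of the data stack at halt. */
static int64_t run(const uint8_t *ip) {
    int64_t data[16], ret[16];
    int64_t *sp = data;
    VM_ENTER(ret);
    ip++;
    sp = table[ip[-1]](VM_ARGS);
    assert(sp > data);
    return sp[-1];
}
```

Running the byte sequence OP_LIT 2, OP_TO_R, OP_LIT 3, OP_R_FROM, OP_ADD, OP_HALT returns 5 in either configuration. How many VM registers belong on each side of the switch for a given compiler and ABI is exactly the part this sketch does not answer.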

- anton


