Path: csiph.com!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: minforth <minforth@gmx.net>
Newsgroups: comp.lang.forth
Subject: Re: Generating a random sequence of Forth words
Date: Wed, 1 Oct 2025 11:20:56 +0200
Lines: 130
Message-ID: <mk4a3oFovluU1@mid.individual.net>
References: <2025Sep30.183350@mips.complang.tuwien.ac.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net YSq9rWv3LIvJneWs/t0g2gSUe7t0sp2YQVuezZxy0BIl221X+N
Cancel-Lock: sha1:fT9UF6c0jO5OSaBrsYrp6cnjvsI= sha256:m89G2yf9qJ2nTi08RLr4Tyk5AJMYHoATSRemAS0chLo=
User-Agent: Mozilla Thunderbird
In-Reply-To: <2025Sep30.183350@mips.complang.tuwien.ac.at>
Xref: csiph.com comp.lang.forth:134224

Am 30.09.2025 um 18:33 schrieb Anton Ertl:
> I wanted to generate a random sequence consisting of a number, NOOP,
> and DROP.  When EVALUATEing the sequence, starting with an empty
> stack, it should end with an empty stack, and should not underflow the
> stack.
> 
> I implemented this twice.  The first attempt just generated a random
> word/number in the normal case, and a specific kind of word/number in
> the boundary cases:
> 
> : genseq {: u -- :}
>      0 u 0 do
>          case
>              dup 0= ?of 1+ ." 9 " endof
>              dup i + u u>= ?of 1- ." drop " endof
>              dup i + 1+ u u>= ?of ." noop " endof
>              rnd 3 um* nip 1- dup case
>                  -1 of ." drop " endof
>                  0  of ." noop " endof
>                  1  of ." 8 " endof
>              endcase
>              +
>              0 endcase
>      loop
>      drop ;
> 
> When I do 16 GENSEQ with seed 1234, I get the following sequence:
> 
> 9 drop 9 8 8 noop drop drop 8 noop drop noop noop 8 drop drop
> 
> The requirements are distributed across several places in this word,
> which I did not like.  I also did not like the non-randomness of the
> sequence of DROPs at the end, and the non-randomness of the 9 whenever
> the stack is empty.  I also found that I needed to add a clause for
> when the stack depth is one less than the number of remaining words.
> 
> So I thought about other ways to do it.
> 
> Eventually I came up with the following approach: Starts with counters
> for the different kinds of words/numbers.  Whenever one word/number is
> generated, its counter is reduced.  At each step the probability of
> generating a kind of word/number is proportional to the remaining
> counter for that word/number.  The same-depth requirement is
> implemented by having the same initial counter for the number and
> DROP.  The no-underflow requirement is implemented by skipping the
> output of a DROP if the count of remaining DROPs = the count of the
> remaining numbers.
> 
> Here's the code:
> 
> create counts 3 cells allot
> 
> : initcounts ( u -- )
>      dup 3 / dup counts ! dup counts 2 th ! 2* - counts cell+ ! ;
> 
> : .counts ( -- )
>      counts 3 cell array>mem mem+do
>          i @ 4 .r
>      loop ;
> 
> : countsth-- ( u -- )
>      -1 counts rot th +! ;
> 
> : select1 ( u -- )
>      \ u is the index of the word
>      dup case
>          0 of ." 9 "    countsth-- endof
>          1 of ." noop " countsth-- endof
>          2 of counts @ counts 2 th @ u< if
>                  ." drop " countsth--
>              else
>                  drop endif
>          endof
>      endcase ;
> 
> : sum ( addr u -- u2 )
>      \ addr u is an array with u elements, u2 is the sum of the elements
>      0 -rot cell array>mem mem+do
>          i @ +
>      loop ;
> 
> : genseq ( u -- )
>      initcounts
>      begin
>          counts 3 sum dup while
>              rnd um* nip ( sum )
>              3 0 u+do ( sum1 )
>                  counts i th @ 2dup u< if
>                      i select1 then
>                  - loop
>              drop
>      repeat
>      drop ;
> 
> The code is much longer, but I like it better.  The requirements are
> concentrated in two places.
> 
> 16 GENSEQ generates
> 
> 9 9 noop noop drop noop 9 drop noop drop noop 9 drop 9 noop drop
> 
> for seed 123.  A disadvantage of this approach is that it does not
> generate a word/number on each iteration through the main loop.  I
> have thought about how to achieve this, but the result would be quite
> a bit more complex; it's also only a minor issue.  E.g., for
> generating 1000 words/numbers with seed 123, the main loop has 1036
> iterations.
> 
> The second approach takes quite a bit more time; for generating a
> million words/numbers into a string on a Zen 4, 192_876_892 cycles
> compared to 146_644_156 cycles.
> 
> The original goal was to produce a sequence where various parts of the
> text interpreter have branch mispredictions; the less predictable the
> sequence, the better.  In this respect the second sequence is
> significantly better.  For a sequence of 1000 words/numbers,
> text-interpreted 1000 times, the number of mispredictions is:
> 
> 728_322 first approach
> 821_813 second approach
> 
> (with 247_125 mispredictions coming from (mainly) Gforth startup).
>

How about
1) quick generation of a suboptimal list
2) create an array of pointers to the list elements
3) random shuffle the array e.g.
https://en.wikipedia.org/wiki/Fisher-Yates_shuffle