Path: csiph.com!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail From: minforth Newsgroups: comp.lang.forth Subject: Re: Generating a random sequence of Forth words Date: Wed, 1 Oct 2025 11:20:56 +0200 Lines: 130 Message-ID: References: <2025Sep30.183350@mips.complang.tuwien.ac.at> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Trace: individual.net YSq9rWv3LIvJneWs/t0g2gSUe7t0sp2YQVuezZxy0BIl221X+N Cancel-Lock: sha1:fT9UF6c0jO5OSaBrsYrp6cnjvsI= sha256:m89G2yf9qJ2nTi08RLr4Tyk5AJMYHoATSRemAS0chLo= User-Agent: Mozilla Thunderbird In-Reply-To: <2025Sep30.183350@mips.complang.tuwien.ac.at> Xref: csiph.com comp.lang.forth:134224 Am 30.09.2025 um 18:33 schrieb Anton Ertl: > I wanted to generate a random sequence consisting of a number, NOOP, > and DROP. When EVALUATEing the sequence, starting with an empty > stack, it should end with an empty stack, and should not underflow the > stack. > > I implemented this twice. The first attempt just generated a random > word/number in the normal case, and a specific kind of word/number in > the boundary cases: > > : genseq {: u -- :} > 0 u 0 do > case > dup 0= ?of 1+ ." 9 " endof > dup i + u u>= ?of 1- ." drop " endof > dup i + 1+ u u>= ?of ." noop " endof > rnd 3 um* nip 1- dup case > -1 of ." drop " endof > 0 of ." noop " endof > 1 of ." 8 " endof > endcase > + > 0 endcase > loop > drop ; > > When I do 16 GENSEQ with seed 1234, I get the following sequence: > > 9 drop 9 8 8 noop drop drop 8 noop drop noop noop 8 drop drop > > The requirements are distributed across several places in this word, > which I did not like. I also did not like the non-randomness of the > sequence of DROPs at the end, and the non-randomness of the 9 whenever > the stack is empty. I also found that I needed to add a clause for > when the stack depth is one less than the number of remaining words. > > So I thought about other ways to do it. > > Eventually I came up with the following approach: Starts with counters > for the different kinds of words/numbers. Whenever one word/number is > generated, its counter is reduced. At each step the probability of > generating a kind of word/number is proportional to the remaining > counter for that word/number. The same-depth requirement is > implemented by having the same initial counter for the number and > DROP. The no-underflow requirement is implemented by skipping the > output of a DROP if the count of remaining DROPs = the count of the > remaining numbers. > > Here's the code: > > create counts 3 cells allot > > : initcounts ( u -- ) > dup 3 / dup counts ! dup counts 2 th ! 2* - counts cell+ ! ; > > : .counts ( -- ) > counts 3 cell array>mem mem+do > i @ 4 .r > loop ; > > : countsth-- ( u -- ) > -1 counts rot th +! ; > > : select1 ( u -- ) > \ u is the index of the word > dup case > 0 of ." 9 " countsth-- endof > 1 of ." noop " countsth-- endof > 2 of counts @ counts 2 th @ u< if > ." drop " countsth-- > else > drop endif > endof > endcase ; > > : sum ( addr u -- u2 ) > \ addr u is an array with u elements, u2 is the sum of the elements > 0 -rot cell array>mem mem+do > i @ + > loop ; > > : genseq ( u -- ) > initcounts > begin > counts 3 sum dup while > rnd um* nip ( sum ) > 3 0 u+do ( sum1 ) > counts i th @ 2dup u< if > i select1 then > - loop > drop > repeat > drop ; > > The code is much longer, but I like it better. The requirements are > concentrated in two places. > > 16 GENSEQ generates > > 9 9 noop noop drop noop 9 drop noop drop noop 9 drop 9 noop drop > > for seed 123. A disadvantage of this approach is that it does not > generate a word/number on each iteration through the main loop. I > have thought about how to achieve this, but the result would be quite > a bit more complex; it's also only a minor issue. E.g., for > generating 1000 words/numbers with seed 123, the main loop has 1036 > iterations. > > The second approach takes quite a bit more time; for generating a > million words/numbers into a string on a Zen 4, 192_876_892 cycles > compared to 146_644_156 cycles. > > The original goal was to produce a sequence where various parts of the > text interpreter have branch mispredictions; the less predictable the > sequence, the better. In this respect the second sequence is > significantly better. For a sequence of 1000 words/numbers, > text-interpreted 1000 times, the number of mispredictions is: > > 728_322 first approach > 821_813 second approach > > (with 247_125 mispredictions coming from (mainly) Gforth startup). > How about 1) quick generation of a suboptimal list 2) create an array of pointers to the list elements 3) random shuffle the array e.g. https://en.wikipedia.org/wiki/Fisher-Yates_shuffle