Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.arch > #114964 > unrolled thread

A useless machine

Started byThomas Koenig <tkoenig@netcologne.de>
First post2026-02-14 09:57 +0000
Last post2026-02-18 21:56 +0100
Articles 20 on this page of 81 — 12 participants

Back to article view | Back to comp.arch


Contents

  A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-14 09:57 +0000
    Re: A useless machine MitchAlsup <user5857@newsgrouper.org.invalid> - 2026-02-14 18:48 +0000
      Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-14 20:18 +0100
        Re: A useless machine MitchAlsup <user5857@newsgrouper.org.invalid> - 2026-02-14 21:22 +0000
          Re: A useless machine Stephen Fuld <sfuld@alumni.cmu.edu.invalid> - 2026-02-14 13:30 -0800
          Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-15 12:50 +0100
            Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-21 13:48 -0600
              Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-21 21:40 +0100
                Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-21 15:17 -0600
                  Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-22 04:50 -0600
                Re: A useless machine Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-03-12 21:01 -0700
              Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-22 14:43 +0100
                Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-22 12:38 -0600
                  Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-23 10:51 +0100
                    Re: A useless machine MitchAlsup <user5857@newsgrouper.org.invalid> - 2026-02-26 01:27 +0000
                      Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-26 09:21 +0100
                      Re: A useless machine Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-03-12 21:11 -0700
        Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 14:58 +0100
        Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-21 12:27 -0600
          Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-22 15:14 +0100
        Re: A useless machine Stefan Monnier <monnier@iro.umontreal.ca> - 2026-02-22 11:41 -0500
          Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-23 10:53 +0100
            Re: A useless machine Stefan Monnier <monnier@iro.umontreal.ca> - 2026-02-23 10:52 -0500
    Re: A useless machine Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-02-14 21:19 -0800
    Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-15 03:58 -0500
      Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-15 13:14 +0100
        Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-15 15:46 +0000
        Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-15 10:56 -0500
          Re: A useless machine MitchAlsup <user5857@newsgrouper.org.invalid> - 2026-02-15 19:16 +0000
            Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 22:40 +0100
              Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-26 19:31 +0000
          Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-18 18:22 +0200
            Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-18 22:56 -0500
              Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-19 12:10 +0200
                Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-21 01:56 -0500
                  Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-21 19:23 +0200
                  Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-21 20:36 +0200
                    Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-22 10:46 +0200
                      Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-22 12:45 +0100
                        Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-22 16:35 +0200
                    Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-22 19:12 -0500
                      Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-22 19:47 -0500
                  Re: A useless machine EricP <ThatWouldBeTelling@thevillage.com> - 2026-02-21 15:39 -0500
                    Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-22 03:16 +0200
                      Re: A useless machine EricP <ThatWouldBeTelling@thevillage.com> - 2026-02-22 14:48 -0500
                        Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-22 23:25 +0200
              Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-19 11:51 +0100
                Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-19 23:12 +0200
                  Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-20 15:11 +0100
              Re: A useless machine Stefan Monnier <monnier@iro.umontreal.ca> - 2026-02-19 15:27 -0500
                Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-20 11:41 +0200
                Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-22 18:41 -0600
                  Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-23 11:10 +0000
                    Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-23 17:24 -0600
                      Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-24 06:44 +0000
                        Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-24 01:38 -0600
                          Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-24 03:16 -0600
                            Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-24 04:52 -0600
                          Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-24 11:16 +0000
                            Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-24 13:19 -0600
                        Re: A useless machine Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-03-12 21:14 -0700
            Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-20 08:01 +0000
      Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-15 15:35 +0200
        Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-15 18:33 +0200
          Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-15 19:19 +0000
            Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-16 12:31 +0200
              Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-17 15:41 +0200
                Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-17 18:17 +0200
      Re: A useless machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-02-15 22:41 +0000
    Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 14:52 +0100
      Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-15 16:45 +0000
        Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 18:01 +0100
          Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-15 19:42 +0000
            Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 22:59 +0100
              Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-15 21:53 -0500
                Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-16 12:40 +0200
                  Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-16 20:34 +0200
                Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-16 16:08 +0100
    Re: A useless machine Stephen Fuld <sfuld@alumni.cmu.edu.invalid> - 2026-02-16 07:50 -0800
      Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-16 18:31 +0200
      Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-18 21:56 +0100

Page 3 of 5 — ← Prev page 1 2 [3] 4 5  Next page →


#115103

FromRobert Finch <robfi680@gmail.com>
Date2026-02-22 19:12 -0500
Message-ID<10ng622$2hurh$1@dont-email.me>
In reply to#115070
On 2026-02-21 1:36 p.m., Michael S wrote:
> On Sat, 21 Feb 2026 01:56:03 -0500
> Robert Finch <robfi680@gmail.com> wrote:
> 
>> On 2026-02-19 5:10 a.m., Michael S wrote:
>>> On Wed, 18 Feb 2026 22:56:19 -0500
>>> Robert Finch <robfi680@gmail.com> wrote:
>>>    
>>>> On 2026-02-18 11:22 a.m., Michael S wrote:
>>>>> On Sun, 15 Feb 2026 10:56:51 -0500
>>>>> Robert Finch <robfi680@gmail.com> wrote:
>>>>>       
>>>>>> I have coded a version that tests up to 8192 number
>>>>>> simultaneously. It is supposed to be able to run at over 200
>>>>>> MHz. Some of the number would take a large number of iterations
>>>>>> though.
>>>>>>
>>>>>> I found something strange, the code to count for more numbers
>>>>>> does not increase the size of the core very much. (But it takes
>>>>>> longer to build.) I am not sure if this occurs because there is
>>>>>> a mitsake somewhere, or if the tools are able to figure out how
>>>>>> to share all the logic. I am thinking the logic for adders is
>>>>>> shared somehow. 128-bit arithmetic is being used.
>>>>>>      
>>>>>
>>>>> There is a mistake in your code. No other explanation is possible.
>>>>>
>>>>> You should look for 3 sorts of mistakes
>>>>> 1. Core itself.
>>>>> The most likely mistake is failing to look for one of the ending
>>>>> conditions. You have to look for 3 conditions:
>>>>> a) - value produced at current step is smaller than initial value.
>>>>>     That is successful ending. Here you signal to the scheduler
>>>>> that you are ready to accept the next value.
>>>>> b) - overflow. Next value exceeds the width of register.
>>>>> c) - timeout. You were running for more than N steps. N=8192 looks
>>>>> like the reasonable value.
>>>>> a) and b) are suspected failures.
>>>>> The handling for b) and c) is similar - you raise alert signal and
>>>>> stop. Your test unit has to provide supervisor circuity with the
>>>>> mean to fetch the initial value that caused suspected failure and
>>>>> to clear alert condition.
>>>>> After alert is cleared your unit returns to the same state as
>>>>> after success - waiting for next initial value.
>>>>
>>>> I found the mistake. Core was working, but groups of cores were
>>>> not. Done signals were missing.
>>>>   
>>>>>
>>>>> 2. Scheduler.
>>>>> In proof-of-concept design the scheduler does not have to be very
>>>>> realistic. For example, it does not have to be efficiently
>>>>> pipelined. But it should be able to monitor ready signals of all
>>>>> test units and to supply different initial data to different
>>>>> units. Supplying the same data to all units simultaneously is a
>>>>> sort of mistake that explains your observed behavior.
>>>>>       
>>>> There is really simple scheduling ATM. There is just a bunch of
>>>> done signals that are gated together. Meaning the slowest
>>>> calculation of the group will slow everything else down. Once all
>>>> the dones are true the next set of numbers are loaded.
>>>>   
>>>>> 3. Supervisor
>>>>> Again, in proof-of-concept design the supervisor does not have to
>>>>> be very realistic. But it should be able to selectively exercise
>>>>> all core feature related to alert, including fetching of initial
>>>>> data and clearing of alert.
>>>>>
>>>>> After you coded everything correctly, do not even try to compile
>>>>> test with 8000 units. That is totally unrealistic. Even if you
>>>>> find huge FPGA in which such number of testers could fit (and I
>>>>> am not sure that such FPGA exists) the compilation will take many
>>>>> hours, possibly days.
>>>>>
>>>>>
>>>>>       
>>>> Yeah 8,000 was definitely out of the picture. I had since coded
>>>> 24x24 matrix (576) and although the logic would fit that turned
>>>> out not to be able to route. So, it is down to 20x20. 20x20 with
>>>> no real scheduling fits.
>>>>
>>>> I tried displaying progress as an on-screen bitmap, but that just
>>>> appeared noisy. I have been working on an SCP so may be able to use
>>>> that to manage the scheduling with some custom logic.
>>>>
>>>>
>>>>   
>>>
>>> Does 20x20 means 400 cores with something like 80-bit initial values
>>> and 128-bit processing?
>> Yes. 128-bit processing.
>>
>>> If you didn't make further mistakes then it should be rather big,
>>> order of 300K LEs. It would not fit in the biggest low-end Altera
>>> device (Cyclone 10GX 10CX220).
>>
>> IIRC it is about 150k LUTs. It uses the simplest approach, which is
>> not very fast iteration wise, but it allows a lot of cores and a fast
>> clock cycles time. One core is very small and has a fast iteration (1
>> clock cycle). But it takes a lot more iterations than can be done
>> with a more brainiac approach.
>>
>> Using a more brainiac approach would likely cut the performance in
>> half and use a lot more LUTs.
>>
> 
> According to my experiments, combining 2 steps have very small negative
> effect on achivable clock and increases area by ~1.8x. So, to me it
> looks like a win.
> That's the combined step I am talking about:
>    x = n/4;
>    switch (x % 4) {
>      case 0: n = x; break;
>      case 1: n = 3*x+1; break;
>      case 2: n = 2*x+2; break;
>      case 3: n = 9*x+8; break;
>    }
> 
> 
> 
>>> On the Xilinx side, it would not fit in any Spartan and in most
>>> Artix devices. May be, it would fit in the biggest Artix
>>> UltraScale+ (AU25P). But more realistically for 400 test units one
>>> would want Arria 10 on Altera side or Kintex on Xilinx side. Both
>>> are not prohibitively expensive, but not cheap.
>>>
>>> What device are you using?
>> I am not using an too inexpensive device. I have a Kintex-325 with
>> about 200k LUTs.
>>
> 
> According to Xilinx marketing materials XC7K325T has 326080 equivalents
> of traditional LEs.
> That explains why your design fits.
> 
> Pay attention that while Kintex evaluation boards could be
> non-expensive, right now buying devices for actual production work is
> rather costly, especially in small quantities.
> The lowest price I see on digikey site is 1147 USD. That's a bad value
> relatively to many other FPGAs.
> 
> 
>>> If it is smaller than Kintex KU5P then I'd strongly suspect that you
>>> didn't clean out all mistakes.
>>>
>>>
>>>    
>> There could be more mistakes for sure. But I am sure the lowest level
>> core works. It seemed to in simulation. I made a slight mod to it
>> since I tested it last, checking n < startn.
>> Counts are not used so the logic would be stripped out, but its good
>> for verification.
>>
>> 1 Core is:
>>
>> module Collatz_conjecture(rst, clk, startn, count, done);
>> input rst;
>> input clk;
>> input [127:0] startn;
>> output reg [127:0] count;
>> output reg done;
>> reg [127:0] n;
>>
>> always_ff @(posedge clk)
>> if (rst) begin
>> 	n <= startn;
>> 	count <= 0;
>> 	done <= FALSE;
>> end
>> else begin
>> 	if (!done) begin
>> 		if (~n[0])
>> 			n <= n >> 2'd1;
>> 		else
>> 			n <= n+n+n+1;
>> 		if (n < startn)
>> 			done <= TRUE;
>> 		count <= count + 1;
>> 	end
>> end
>>
>> endmodule
>>
> 
> 
> Ok.
> 1. Your core misses one critical part - overflow check.
> Assuming 80-bit initial count, 128-bit register will overflow
> eventually. It would happen very rarely, but it would happen.
> The check for overflow does not have to be precise. Conservative
> approach, like looking at 2 MS bits and one LS bit and signaling
> overflow not when it really happened but when it *could* happen, i.e.
> when MS bits > 0 and LS bit='1' is good enough. But the check can't be
> omitted.
> 
> 2. I see that instead of calculating n = (n*3+1)/2 as one step you are
> doing it in two separate steps. I am not sure that it is really makes
> your core smaller or faster. And it certainly increases # of iterations
> by 1/3rd.
> 
> 3.
> 128-bit count is pointless loss of resources. 12 or 13 bit are
> sufficient. If the count exceeded that size then you already found
> something of high interest and it is a job of supervisor to find out
> what exactly had happened.
> 128-bit counter actually not just too big, it is also pointless,
> because if we pretend to believe that our purpose is to disprove
> Collatz's conjecture then supervisor would have to implement stop
> condition with some sort of timeout that will effectively duplicate
> functionality of the counter.
> 
> 4.
> Your module does not latch the value of startn. It means that caller
> module has to do it on it's behalf. That is the most likely place where
> one can make a mistake.
> 
> 5.
> Coding style comment. Normally, name rst is used for asynchronous
> reset signal. In your design rst is a synchronous signal. For the
> benefit of casual reader I suggest to find different name.
> 
> 
Re-wrote the core using Michael's suggestion of iterations. Looks 
something like the below code.
Renamed the rst signal ld to better reflect its usage.
Timing is close to the same, but not as many cores can fit. It is a 
trade-off, fewer cores, but they process more iterations in one go. Net 
result is more numbers processed in the same length of time.
I am using 16x16 cores now. Overflow check using 130-bit numbers.

import const_pkg::*;

module Collatz_conjecture(clk, ld, startn, done, overflow);
input clk;
input ld;
input [127:0] startn;
output reg done;
output reg overflow;

reg [129:0] n;

always_ff @(posedge clk)
if (ld)
	n <= 130'(startn);
else begin
	if (!done)
		case(n[1:0])
		2'd0:	n <= n >> 2'd2;
		2'd1:	n <= n[129:2]+n[129:2]+n[129:2]+2'd1;
		2'd2:	n <= n[129:2]+n[129:2]+2'd2;
		2'd3:	n <= {n[129:2],3'd0}+n[129:2]+4'd8;
		endcase
end

always_ff @(posedge clk)
if (ld)
	done <= FALSE;
else begin
	if (!done) begin
		if (n < startn || (|n[129:128]))
			done <= TRUE;
	end
end

always_ff @(posedge clk)
if (ld)
	overflow <= FALSE;
else begin
	if (!done) begin
		if (|n[129:128])
			overflow <= TRUE;
	end
end

endmodule

[toc] | [prev] | [next] | [standalone]


#115105

FromRobert Finch <robfi680@gmail.com>
Date2026-02-22 19:47 -0500
Message-ID<10ng836$2j140$1@dont-email.me>
In reply to#115103
On 2026-02-22 7:12 p.m., Robert Finch wrote:
> On 2026-02-21 1:36 p.m., Michael S wrote:
>> On Sat, 21 Feb 2026 01:56:03 -0500
>> Robert Finch <robfi680@gmail.com> wrote:
>>
>>> On 2026-02-19 5:10 a.m., Michael S wrote:
>>>> On Wed, 18 Feb 2026 22:56:19 -0500
>>>> Robert Finch <robfi680@gmail.com> wrote:
>>>>> On 2026-02-18 11:22 a.m., Michael S wrote:
>>>>>> On Sun, 15 Feb 2026 10:56:51 -0500
>>>>>> Robert Finch <robfi680@gmail.com> wrote:
>>>>>>> I have coded a version that tests up to 8192 number
>>>>>>> simultaneously. It is supposed to be able to run at over 200
>>>>>>> MHz. Some of the number would take a large number of iterations
>>>>>>> though.
>>>>>>>
>>>>>>> I found something strange, the code to count for more numbers
>>>>>>> does not increase the size of the core very much. (But it takes
>>>>>>> longer to build.) I am not sure if this occurs because there is
>>>>>>> a mitsake somewhere, or if the tools are able to figure out how
>>>>>>> to share all the logic. I am thinking the logic for adders is
>>>>>>> shared somehow. 128-bit arithmetic is being used.
>>>>>>
>>>>>> There is a mistake in your code. No other explanation is possible.
>>>>>>
>>>>>> You should look for 3 sorts of mistakes
>>>>>> 1. Core itself.
>>>>>> The most likely mistake is failing to look for one of the ending
>>>>>> conditions. You have to look for 3 conditions:
>>>>>> a) - value produced at current step is smaller than initial value.
>>>>>>     That is successful ending. Here you signal to the scheduler
>>>>>> that you are ready to accept the next value.
>>>>>> b) - overflow. Next value exceeds the width of register.
>>>>>> c) - timeout. You were running for more than N steps. N=8192 looks
>>>>>> like the reasonable value.
>>>>>> a) and b) are suspected failures.
>>>>>> The handling for b) and c) is similar - you raise alert signal and
>>>>>> stop. Your test unit has to provide supervisor circuity with the
>>>>>> mean to fetch the initial value that caused suspected failure and
>>>>>> to clear alert condition.
>>>>>> After alert is cleared your unit returns to the same state as
>>>>>> after success - waiting for next initial value.
>>>>>
>>>>> I found the mistake. Core was working, but groups of cores were
>>>>> not. Done signals were missing.
>>>>>>
>>>>>> 2. Scheduler.
>>>>>> In proof-of-concept design the scheduler does not have to be very
>>>>>> realistic. For example, it does not have to be efficiently
>>>>>> pipelined. But it should be able to monitor ready signals of all
>>>>>> test units and to supply different initial data to different
>>>>>> units. Supplying the same data to all units simultaneously is a
>>>>>> sort of mistake that explains your observed behavior.
>>>>> There is really simple scheduling ATM. There is just a bunch of
>>>>> done signals that are gated together. Meaning the slowest
>>>>> calculation of the group will slow everything else down. Once all
>>>>> the dones are true the next set of numbers are loaded.
>>>>>> 3. Supervisor
>>>>>> Again, in proof-of-concept design the supervisor does not have to
>>>>>> be very realistic. But it should be able to selectively exercise
>>>>>> all core feature related to alert, including fetching of initial
>>>>>> data and clearing of alert.
>>>>>>
>>>>>> After you coded everything correctly, do not even try to compile
>>>>>> test with 8000 units. That is totally unrealistic. Even if you
>>>>>> find huge FPGA in which such number of testers could fit (and I
>>>>>> am not sure that such FPGA exists) the compilation will take many
>>>>>> hours, possibly days.
>>>>>>
>>>>>>
>>>>> Yeah 8,000 was definitely out of the picture. I had since coded
>>>>> 24x24 matrix (576) and although the logic would fit that turned
>>>>> out not to be able to route. So, it is down to 20x20. 20x20 with
>>>>> no real scheduling fits.
>>>>>
>>>>> I tried displaying progress as an on-screen bitmap, but that just
>>>>> appeared noisy. I have been working on an SCP so may be able to use
>>>>> that to manage the scheduling with some custom logic.
>>>>>
>>>>>
>>>>
>>>> Does 20x20 means 400 cores with something like 80-bit initial values
>>>> and 128-bit processing?
>>> Yes. 128-bit processing.
>>>
>>>> If you didn't make further mistakes then it should be rather big,
>>>> order of 300K LEs. It would not fit in the biggest low-end Altera
>>>> device (Cyclone 10GX 10CX220).
>>>
>>> IIRC it is about 150k LUTs. It uses the simplest approach, which is
>>> not very fast iteration wise, but it allows a lot of cores and a fast
>>> clock cycles time. One core is very small and has a fast iteration (1
>>> clock cycle). But it takes a lot more iterations than can be done
>>> with a more brainiac approach.
>>>
>>> Using a more brainiac approach would likely cut the performance in
>>> half and use a lot more LUTs.
>>>
>>
>> According to my experiments, combining 2 steps have very small negative
>> effect on achivable clock and increases area by ~1.8x. So, to me it
>> looks like a win.
>> That's the combined step I am talking about:
>>    x = n/4;
>>    switch (x % 4) {
>>      case 0: n = x; break;
>>      case 1: n = 3*x+1; break;
>>      case 2: n = 2*x+2; break;
>>      case 3: n = 9*x+8; break;
>>    }
>>
>>
>>
>>>> On the Xilinx side, it would not fit in any Spartan and in most
>>>> Artix devices. May be, it would fit in the biggest Artix
>>>> UltraScale+ (AU25P). But more realistically for 400 test units one
>>>> would want Arria 10 on Altera side or Kintex on Xilinx side. Both
>>>> are not prohibitively expensive, but not cheap.
>>>>
>>>> What device are you using?
>>> I am not using an too inexpensive device. I have a Kintex-325 with
>>> about 200k LUTs.
>>>
>>
>> According to Xilinx marketing materials XC7K325T has 326080 equivalents
>> of traditional LEs.
>> That explains why your design fits.
>>
>> Pay attention that while Kintex evaluation boards could be
>> non-expensive, right now buying devices for actual production work is
>> rather costly, especially in small quantities.
>> The lowest price I see on digikey site is 1147 USD. That's a bad value
>> relatively to many other FPGAs.
>>
>>
>>>> If it is smaller than Kintex KU5P then I'd strongly suspect that you
>>>> didn't clean out all mistakes.
>>>>
>>>>
>>> There could be more mistakes for sure. But I am sure the lowest level
>>> core works. It seemed to in simulation. I made a slight mod to it
>>> since I tested it last, checking n < startn.
>>> Counts are not used so the logic would be stripped out, but its good
>>> for verification.
>>>
>>> 1 Core is:
>>>
>>> module Collatz_conjecture(rst, clk, startn, count, done);
>>> input rst;
>>> input clk;
>>> input [127:0] startn;
>>> output reg [127:0] count;
>>> output reg done;
>>> reg [127:0] n;
>>>
>>> always_ff @(posedge clk)
>>> if (rst) begin
>>>     n <= startn;
>>>     count <= 0;
>>>     done <= FALSE;
>>> end
>>> else begin
>>>     if (!done) begin
>>>         if (~n[0])
>>>             n <= n >> 2'd1;
>>>         else
>>>             n <= n+n+n+1;
>>>         if (n < startn)
>>>             done <= TRUE;
>>>         count <= count + 1;
>>>     end
>>> end
>>>
>>> endmodule
>>>
>>
>>
>> Ok.
>> 1. Your core misses one critical part - overflow check.
>> Assuming 80-bit initial count, 128-bit register will overflow
>> eventually. It would happen very rarely, but it would happen.
>> The check for overflow does not have to be precise. Conservative
>> approach, like looking at 2 MS bits and one LS bit and signaling
>> overflow not when it really happened but when it *could* happen, i.e.
>> when MS bits > 0 and LS bit='1' is good enough. But the check can't be
>> omitted.
>>
>> 2. I see that instead of calculating n = (n*3+1)/2 as one step you are
>> doing it in two separate steps. I am not sure that it is really makes
>> your core smaller or faster. And it certainly increases # of iterations
>> by 1/3rd.
>>
>> 3.
>> 128-bit count is pointless loss of resources. 12 or 13 bit are
>> sufficient. If the count exceeded that size then you already found
>> something of high interest and it is a job of supervisor to find out
>> what exactly had happened.
>> 128-bit counter actually not just too big, it is also pointless,
>> because if we pretend to believe that our purpose is to disprove
>> Collatz's conjecture then supervisor would have to implement stop
>> condition with some sort of timeout that will effectively duplicate
>> functionality of the counter.
>>
>> 4.
>> Your module does not latch the value of startn. It means that caller
>> module has to do it on it's behalf. That is the most likely place where
>> one can make a mistake.
>>
>> 5.
>> Coding style comment. Normally, name rst is used for asynchronous
>> reset signal. In your design rst is a synchronous signal. For the
>> benefit of casual reader I suggest to find different name.
>>
>>
> Re-wrote the core using Michael's suggestion of iterations. Looks 
> something like the below code.
> Renamed the rst signal ld to better reflect its usage.
> Timing is close to the same, but not as many cores can fit. It is a 
> trade-off, fewer cores, but they process more iterations in one go. Net 
> result is more numbers processed in the same length of time.
> I am using 16x16 cores now. Overflow check using 130-bit numbers.
> 
> import const_pkg::*;
> 
> module Collatz_conjecture(clk, ld, startn, done, overflow);
> input clk;
> input ld;
> input [127:0] startn;
> output reg done;
> output reg overflow;
> 
> reg [129:0] n;
> 
> always_ff @(posedge clk)
> if (ld)
>      n <= 130'(startn);
> else begin
>      if (!done)
>          case(n[1:0])
>          2'd0:    n <= n >> 2'd2;
>          2'd1:    n <= n[129:2]+n[129:2]+n[129:2]+2'd1;
>          2'd2:    n <= n[129:2]+n[129:2]+2'd2;
>          2'd3:    n <= {n[129:2],3'd0}+n[129:2]+4'd8;
>          endcase
> end
> 
> always_ff @(posedge clk)
> if (ld)
>      done <= FALSE;
> else begin
>      if (!done) begin
>          if (n < startn || (|n[129:128]))
>              done <= TRUE;
>      end
> end
> 
> always_ff @(posedge clk)
> if (ld)
>      overflow <= FALSE;
> else begin
>      if (!done) begin
>          if (|n[129:128])
>              overflow <= TRUE;
>      end
> end
> 
> endmodule
> 
> 
Just noticed overflow needs a couple more extra bits since multiplying by 9.

[toc] | [prev] | [next] | [standalone]


#115077

FromEricP <ThatWouldBeTelling@thevillage.com>
Date2026-02-21 15:39 -0500
Message-ID<NKomR.158378$N9Eb.26408@fx46.iad>
In reply to#115066
Robert Finch wrote:
> On 2026-02-19 5:10 a.m., Michael S wrote:
> 
>> If it is smaller than Kintex KU5P then I'd strongly suspect that you
>> didn't clean out all mistakes.
>>
>>
>>
> There could be more mistakes for sure. But I am sure the lowest level 
> core works. It seemed to in simulation. I made a slight mod to it since 
> I tested it last, checking n < startn.
> Counts are not used so the logic would be stripped out, but its good for 
> verification.
> 
> 1 Core is:
> 
> module Collatz_conjecture(rst, clk, startn, count, done);
> input rst;
> input clk;
> input [127:0] startn;
> output reg [127:0] count;
> output reg done;
> reg [127:0] n;
> 
> always_ff @(posedge clk)
> if (rst) begin
>     n <= startn;
>     count <= 0;
>     done <= FALSE;
> end
> else begin
>     if (!done) begin
>         if (~n[0])
>             n <= n >> 2'd1;
>         else
>             n <= n+n+n+1;
>         if (n < startn)
>             done <= TRUE;
>         count <= count + 1;
>     end
> end
> 
> endmodule

I have a minimal knowledge of Verilog and may be wrong but...

- it looks to me in the code as written that startn, count and done args
   are coded either input or output, but are actually both read and written.
   The compiler should have given an error as that implies multiple drivers
   for those wires (the caller module and the assignments).
   Possibly it didn't detect an error because it defaults to data type
   "logic" which has 4 states, one of which is tri-state 'z'.

   Maybe setting the data types to 'bit' would allow it to detect the error.

   Changing them to "inout" would turn them all into tri-state buses which
   is probably not what you want.
   Possibly have separate args for input and output:

     input  bit[127:0] iCount;
     output bit]127:0] oCount.

- I don't know how smart your compiler is but the >> shift operator can
   be replaced by a bit slice operation that only routes wires:
     if (~n[0])
       n <= {1'b0, n[127:1]};

- it can eliminate a 128 bit adder by shifting the wires left:
     else
       n <= {n[126:0], 1'b0} + n + 1;



[toc] | [prev] | [next] | [standalone]


#115082

FromMichael S <already5chosen@yahoo.com>
Date2026-02-22 03:16 +0200
Message-ID<20260222031658.00002f1c@yahoo.com>
In reply to#115077
On Sat, 21 Feb 2026 15:39:42 -0500
EricP <ThatWouldBeTelling@thevillage.com> wrote:

> Robert Finch wrote:
> > On 2026-02-19 5:10 a.m., Michael S wrote:
> >   
> >> If it is smaller than Kintex KU5P then I'd strongly suspect that
> >> you didn't clean out all mistakes.
> >>
> >>
> >>  
> > There could be more mistakes for sure. But I am sure the lowest
> > level core works. It seemed to in simulation. I made a slight mod
> > to it since I tested it last, checking n < startn.
> > Counts are not used so the logic would be stripped out, but its
> > good for verification.
> > 
> > 1 Core is:
> > 
> > module Collatz_conjecture(rst, clk, startn, count, done);
> > input rst;
> > input clk;
> > input [127:0] startn;
> > output reg [127:0] count;
> > output reg done;
> > reg [127:0] n;
> > 
> > always_ff @(posedge clk)
> > if (rst) begin
> >     n <= startn;
> >     count <= 0;
> >     done <= FALSE;
> > end
> > else begin
> >     if (!done) begin
> >         if (~n[0])
> >             n <= n >> 2'd1;
> >         else
> >             n <= n+n+n+1;
> >         if (n < startn)
> >             done <= TRUE;
> >         count <= count + 1;
> >     end
> > end
> > 
> > endmodule  
> 
> I have a minimal knowledge of Verilog and may be wrong but...

I am not a Verilog expert either (use VHDL exclusively), but to me the
code of Robert Finch looks o.k from the HDL perspective (functionality
is something else, I addressed it in my replay below). 
His Verilog appears to be an exact equivalent of VHDL code below,
which is perfectly legal synthesizable code under VHDL-2008 rules.


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity Collatz_conjecture is
 port (
  rst    : in  boolean;
  clk    : in  std_logic;
  startn : in  unsigned(127 downto 0);
  count  : out unsigned(127 downto 0);
  done   : out boolean
 );
end entity Collatz_conjecture;

architecture a of Collatz_conjecture is
 signal n : unsigned(127 downto 0);
begin
 process (clk)
 begin
  if rising_edge(clk) then
    if rst then
      n <= startn;
      count <= (others => '0');
      done <= false;
    elsif not done then
      if n(0)='0' then
        n <= '0' & n(n'high downto 1);
      else
        n <= n+n+n+1;
      end if;

      if n < startn then
        done <= true;
      end if;

      count <= count + 1;
    end if;
  end if;
 end process;
end architecture a;

[toc] | [prev] | [next] | [standalone]


#115098

FromEricP <ThatWouldBeTelling@thevillage.com>
Date2026-02-22 14:48 -0500
Message-ID<65JmR.56061$nJQ7.5512@fx34.iad>
In reply to#115082
Michael S wrote:
> On Sat, 21 Feb 2026 15:39:42 -0500
> EricP <ThatWouldBeTelling@thevillage.com> wrote:
> 
>> Robert Finch wrote:
>>> On 2026-02-19 5:10 a.m., Michael S wrote:
>>>   
>>>> If it is smaller than Kintex KU5P then I'd strongly suspect that
>>>> you didn't clean out all mistakes.
>>>>
>>>>
>>>>  
>>> There could be more mistakes for sure. But I am sure the lowest
>>> level core works. It seemed to in simulation. I made a slight mod
>>> to it since I tested it last, checking n < startn.
>>> Counts are not used so the logic would be stripped out, but its
>>> good for verification.
>>>
>>> 1 Core is:
>>>
>>> module Collatz_conjecture(rst, clk, startn, count, done);
>>> input rst;
>>> input clk;
>>> input [127:0] startn;
>>> output reg [127:0] count;
>>> output reg done;
>>> reg [127:0] n;
>>>
>>> always_ff @(posedge clk)
>>> if (rst) begin
>>>     n <= startn;
>>>     count <= 0;
>>>     done <= FALSE;
>>> end
>>> else begin
>>>     if (!done) begin
>>>         if (~n[0])
>>>             n <= n >> 2'd1;
>>>         else
>>>             n <= n+n+n+1;
>>>         if (n < startn)
>>>             done <= TRUE;
>>>         count <= count + 1;
>>>     end
>>> end
>>>
>>> endmodule  
>> I have a minimal knowledge of Verilog and may be wrong but...
> 
> I am not a Verilog expert either (use VHDL exclusively), but to me the
> code of Robert Finch looks o.k from the HDL perspective (functionality
> is something else, I addressed it in my replay below). 
> His Verilog appears to be an exact equivalent of VHDL code below,
> which is perfectly legal synthesizable code under VHDL-2008 rules.
> 
> 
> library ieee;
> use ieee.std_logic_1164.all;
> use ieee.numeric_std.all;
> 
> entity Collatz_conjecture is
>  port (
>   rst    : in  boolean;
>   clk    : in  std_logic;
>   startn : in  unsigned(127 downto 0);
>   count  : out unsigned(127 downto 0);
>   done   : out boolean
>  );
> end entity Collatz_conjecture;
> 
> architecture a of Collatz_conjecture is
>  signal n : unsigned(127 downto 0);
> begin
>  process (clk)
>  begin
>   if rising_edge(clk) then
>     if rst then
>       n <= startn;
>       count <= (others => '0');
>       done <= false;
>     elsif not done then
>       if n(0)='0' then
>         n <= '0' & n(n'high downto 1);
>       else
>         n <= n+n+n+1;
>       end if;
> 
>       if n < startn then
>         done <= true;
>       end if;
> 
>       count <= count + 1;
>     end if;
>   end if;
>  end process;
> end architecture a;
> 
> 

Well then I just don't get this at all.
I have a VHDL textbook that explicitly states that IN ports are input only,
OUT ports are output only, and INOUT are input and output (tri-state).

The language spec SystemVerilog IEEE_Std-1800-2017
(it's a horrid language - the spec is 1315 pages long,
and so complex that no compiler implements all of it,
and some compilers still use the pre-2005 language spec)

says in section "23.3.3.2 Port connection rules for variables" that
"Assignments to variables declared as input ports shall be illegal."
"Procedural or continuous assignments to a variable connected to the
output port of an instance shall be illegal."

but it also says "An output port can be connected to a net (or a
concatenation) of a compatible data type. In this case, multiple drivers
shall be permitted on the net"

and in "23.3.3.1 Port coercion
A port that is declared as input (output) but used as an output (input)
or inout may be coerced to inout. If not coerced to inout, a warning
shall be issued."

but never defines what "coerced" means or how to do it.

Verilog-2017 also has the "uwire" data type which is a uni-driver type
that only allows a single driver for the wire.
(But some compilers don't support uwire)

"6.6.2 Unresolved nets
The uwire net is an unresolved or unidriver wire and is used to model nets
that allow only a single driver. The uwire type can be used to enforce this
restriction. It shall be an error to connect any bit of a uwire net to more
than one driver. It shall be an error to connect a uwire net to a
bidirectional terminal of a bidirectional pass switch.

The port connection rule in 23.3.3.6 enforces this restriction across
the net hierarchy or shall issue a warning if not."

"23.3.3.6 Single source nets (uwire)
If the net on either side of a port has the net type uwire,
a warning shall be issued if the nets are not merged
into a single net, as described in 23.3.3.7."

and spec contradicts itself as to whether it is an error or warning.

[toc] | [prev] | [next] | [standalone]


#115100

FromMichael S <already5chosen@yahoo.com>
Date2026-02-22 23:25 +0200
Message-ID<20260222232547.00003fff@yahoo.com>
In reply to#115098
On Sun, 22 Feb 2026 14:48:13 -0500
EricP <ThatWouldBeTelling@thevillage.com> wrote:

> Michael S wrote:
> > On Sat, 21 Feb 2026 15:39:42 -0500
> > EricP <ThatWouldBeTelling@thevillage.com> wrote:
> >   
> >> Robert Finch wrote:  
> >>> On 2026-02-19 5:10 a.m., Michael S wrote:
> >>>     
> >>>> If it is smaller than Kintex KU5P then I'd strongly suspect that
> >>>> you didn't clean out all mistakes.
> >>>>
> >>>>
> >>>>    
> >>> There could be more mistakes for sure. But I am sure the lowest
> >>> level core works. It seemed to in simulation. I made a slight mod
> >>> to it since I tested it last, checking n < startn.
> >>> Counts are not used so the logic would be stripped out, but its
> >>> good for verification.
> >>>
> >>> 1 Core is:
> >>>
> >>> module Collatz_conjecture(rst, clk, startn, count, done);
> >>> input rst;
> >>> input clk;
> >>> input [127:0] startn;
> >>> output reg [127:0] count;
> >>> output reg done;
> >>> reg [127:0] n;
> >>>
> >>> always_ff @(posedge clk)
> >>> if (rst) begin
> >>>     n <= startn;
> >>>     count <= 0;
> >>>     done <= FALSE;
> >>> end
> >>> else begin
> >>>     if (!done) begin
> >>>         if (~n[0])
> >>>             n <= n >> 2'd1;
> >>>         else
> >>>             n <= n+n+n+1;
> >>>         if (n < startn)
> >>>             done <= TRUE;
> >>>         count <= count + 1;
> >>>     end
> >>> end
> >>>
> >>> endmodule    
> >> I have a minimal knowledge of Verilog and may be wrong but...  
> > 
> > I am not a Verilog expert either (use VHDL exclusively), but to me
> > the code of Robert Finch looks o.k from the HDL perspective
> > (functionality is something else, I addressed it in my replay
> > below). His Verilog appears to be an exact equivalent of VHDL code
> > below, which is perfectly legal synthesizable code under VHDL-2008
> > rules.
> > 
> > 
> > library ieee;
> > use ieee.std_logic_1164.all;
> > use ieee.numeric_std.all;
> > 
> > entity Collatz_conjecture is
> >  port (
> >   rst    : in  boolean;
> >   clk    : in  std_logic;
> >   startn : in  unsigned(127 downto 0);
> >   count  : out unsigned(127 downto 0);
> >   done   : out boolean
> >  );
> > end entity Collatz_conjecture;
> > 
> > architecture a of Collatz_conjecture is
> >  signal n : unsigned(127 downto 0);
> > begin
> >  process (clk)
> >  begin
> >   if rising_edge(clk) then
> >     if rst then
> >       n <= startn;
> >       count <= (others => '0');
> >       done <= false;
> >     elsif not done then
> >       if n(0)='0' then
> >         n <= '0' & n(n'high downto 1);
> >       else
> >         n <= n+n+n+1;
> >       end if;
> > 
> >       if n < startn then
> >         done <= true;
> >       end if;
> > 
> >       count <= count + 1;
> >     end if;
> >   end if;
> >  end process;
> > end architecture a;
> > 
> >   
> 
> Well then I just don't get this at all.
> I have a VHDL textbook that explicitly states that IN ports are input
> only, OUT ports are output only, and INOUT are input and output
> (tri-state).
> 

Sounds like old textbook, not updated with VHDL-2008 changes related to
what is legal for 'out' ports.
See, for example, here:
https://docs.amd.com/r/en-US/ug901-vivado-synthesis/Reading-Output-Ports

[toc] | [prev] | [next] | [standalone]


#115042

FromTerje Mathisen <terje.mathisen@tmsw.no>
Date2026-02-19 11:51 +0100
Message-ID<10n6pup$3efqs$1@dont-email.me>
In reply to#115038
Robert Finch wrote:
> On 2026-02-18 11:22 a.m., Michael S wrote:
>> On Sun, 15 Feb 2026 10:56:51 -0500
>> Robert Finch <robfi680@gmail.com> wrote:
>>
>>> I have coded a version that tests up to 8192 number simultaneously.
>>> It is supposed to be able to run at over 200 MHz. Some of the number
>>> would take a large number of iterations though.
>>>
>>> I found something strange, the code to count for more numbers does
>>> not increase the size of the core very much. (But it takes longer to
>>> build.) I am not sure if this occurs because there is a mitsake
>>> somewhere, or if the tools are able to figure out how to share all
>>> the logic. I am thinking the logic for adders is shared somehow.
>>> 128-bit arithmetic is being used.
>>>
>>
>> There is a mistake in your code. No other explanation is possible.
>>
>> You should look for 3 sorts of mistakes
>> 1. Core itself.
>> The most likely mistake is failing to look for one of the ending
>> conditions. You have to look for 3 conditions:
>> a) - value produced at current step is smaller than initial value.
>>   That is successful ending. Here you signal to the scheduler that you
>>   are ready to accept the next value.
>> b) - overflow. Next value exceeds the width of register.
>> c) - timeout. You were running for more than N steps. N=8192 looks
>> like the reasonable value.
>> a) and b) are suspected failures.
>> The handling for b) and c) is similar - you raise alert signal and stop.
>> Your test unit has to provide supervisor circuity with the mean to
>> fetch the initial value that caused suspected failure and to clear
>> alert condition.
>> After alert is cleared your unit returns to the same state as after
>> success - waiting for next initial value.
> 
> I found the mistake. Core was working, but groups of cores were not. 
> Done signals were missing.
> 
>>
>> 2. Scheduler.
>> In proof-of-concept design the scheduler does not have to be very
>> realistic. For example, it does not have to be efficiently pipelined.
>> But it should be able to monitor ready signals of all test units and to
>> supply different initial data to different units.
>> Supplying the same data to all units simultaneously is a sort of
>> mistake that explains your observed behavior.
>>
> There is really simple scheduling ATM. There is just a bunch of done 
> signals that are gated together. Meaning the slowest calculation of the 
> group will slow everything else down. Once all the dones are true the 
> next set of numbers are loaded.

The obvious choice is to have a work-to-do list, each work item 
containing a start-end set of numbers to test.

Each worker thread would simply finish its current task, store the/any 
results, and then ask for the next available task.

A single producer - multiple consumers queue like this is very simple to 
get right with minimum admins: Just a single LOCK XADD to update the 
queue back index, grabbing the corresponding task.

I used this exact setup for my wire speed multi-threaded ntpd deamon 
about two decades ago. (I did need a few more XADD instructions to 
handle queue full/empty situations which are not applicable for the 
current problem.)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

[toc] | [prev] | [next] | [standalone]


#115050

FromMichael S <already5chosen@yahoo.com>
Date2026-02-19 23:12 +0200
Message-ID<20260219231241.0000718c@yahoo.com>
In reply to#115042
On Thu, 19 Feb 2026 11:51:03 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:

> Robert Finch wrote:
> > On 2026-02-18 11:22 a.m., Michael S wrote:  
> >> On Sun, 15 Feb 2026 10:56:51 -0500
> >> Robert Finch <robfi680@gmail.com> wrote:
> >>  
> >>> I have coded a version that tests up to 8192 number
> >>> simultaneously. It is supposed to be able to run at over 200 MHz.
> >>> Some of the number would take a large number of iterations though.
> >>>
> >>> I found something strange, the code to count for more numbers does
> >>> not increase the size of the core very much. (But it takes longer
> >>> to build.) I am not sure if this occurs because there is a mitsake
> >>> somewhere, or if the tools are able to figure out how to share all
> >>> the logic. I am thinking the logic for adders is shared somehow.
> >>> 128-bit arithmetic is being used.
> >>>  
> >>
> >> There is a mistake in your code. No other explanation is possible.
> >>
> >> You should look for 3 sorts of mistakes
> >> 1. Core itself.
> >> The most likely mistake is failing to look for one of the ending
> >> conditions. You have to look for 3 conditions:
> >> a) - value produced at current step is smaller than initial value.
> >>   That is successful ending. Here you signal to the scheduler that
> >> you are ready to accept the next value.
> >> b) - overflow. Next value exceeds the width of register.
> >> c) - timeout. You were running for more than N steps. N=8192 looks
> >> like the reasonable value.
> >> a) and b) are suspected failures.
> >> The handling for b) and c) is similar - you raise alert signal and
> >> stop. Your test unit has to provide supervisor circuity with the
> >> mean to fetch the initial value that caused suspected failure and
> >> to clear alert condition.
> >> After alert is cleared your unit returns to the same state as after
> >> success - waiting for next initial value.  
> > 
> > I found the mistake. Core was working, but groups of cores were
> > not. Done signals were missing.
> >   
> >>
> >> 2. Scheduler.
> >> In proof-of-concept design the scheduler does not have to be very
> >> realistic. For example, it does not have to be efficiently
> >> pipelined. But it should be able to monitor ready signals of all
> >> test units and to supply different initial data to different units.
> >> Supplying the same data to all units simultaneously is a sort of
> >> mistake that explains your observed behavior.
> >>  
> > There is really simple scheduling ATM. There is just a bunch of
> > done signals that are gated together. Meaning the slowest
> > calculation of the group will slow everything else down. Once all
> > the dones are true the next set of numbers are loaded.  
> 
> The obvious choice is to have a work-to-do list, each work item 
> containing a start-end set of numbers to test.
> 
> Each worker thread would simply finish its current task, store
> the/any results, and then ask for the next available task.
> 
> A single producer - multiple consumers queue like this is very simple
> to get right with minimum admins: Just a single LOCK XADD to update
> the queue back index, grabbing the corresponding task.
> 

There is no XADD in FPGA, just gate, multipliers and memory blocks.

> I used this exact setup for my wire speed multi-threaded ntpd deamon 
> about two decades ago. (I did need a few more XADD instructions to 
> handle queue full/empty situations which are not applicable for the 
> current problem.)
> 
> Terje
> 

Did your ntpd deamon scale to 300-400 threads?
300 is the # of test units that fits in FPGAs that appears to be the
most cost-effective for this sort of job, e.g. 10CX220YU484E5G.






[toc] | [prev] | [next] | [standalone]


#115059

FromTerje Mathisen <terje.mathisen@tmsw.no>
Date2026-02-20 15:11 +0100
Message-ID<10n9q2f$eof0$1@dont-email.me>
In reply to#115050
Michael S wrote:
> On Thu, 19 Feb 2026 11:51:03 +0100
> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> 
>> Robert Finch wrote:
>>> On 2026-02-18 11:22 a.m., Michael S wrote:
>>>> On Sun, 15 Feb 2026 10:56:51 -0500
>>>> Robert Finch <robfi680@gmail.com> wrote:
>>>>   
>>>>> I have coded a version that tests up to 8192 number
>>>>> simultaneously. It is supposed to be able to run at over 200 MHz.
>>>>> Some of the number would take a large number of iterations though.
>>>>>
>>>>> I found something strange, the code to count for more numbers does
>>>>> not increase the size of the core very much. (But it takes longer
>>>>> to build.) I am not sure if this occurs because there is a mitsake
>>>>> somewhere, or if the tools are able to figure out how to share all
>>>>> the logic. I am thinking the logic for adders is shared somehow.
>>>>> 128-bit arithmetic is being used.
>>>>>   
>>>>
>>>> There is a mistake in your code. No other explanation is possible.
>>>>
>>>> You should look for 3 sorts of mistakes
>>>> 1. Core itself.
>>>> The most likely mistake is failing to look for one of the ending
>>>> conditions. You have to look for 3 conditions:
>>>> a) - value produced at current step is smaller than initial value.
>>>>    That is successful ending. Here you signal to the scheduler that
>>>> you are ready to accept the next value.
>>>> b) - overflow. Next value exceeds the width of register.
>>>> c) - timeout. You were running for more than N steps. N=8192 looks
>>>> like the reasonable value.
>>>> a) and b) are suspected failures.
>>>> The handling for b) and c) is similar - you raise alert signal and
>>>> stop. Your test unit has to provide supervisor circuity with the
>>>> mean to fetch the initial value that caused suspected failure and
>>>> to clear alert condition.
>>>> After alert is cleared your unit returns to the same state as after
>>>> success - waiting for next initial value.
>>>
>>> I found the mistake. Core was working, but groups of cores were
>>> not. Done signals were missing.
>>>    
>>>>
>>>> 2. Scheduler.
>>>> In proof-of-concept design the scheduler does not have to be very
>>>> realistic. For example, it does not have to be efficiently
>>>> pipelined. But it should be able to monitor ready signals of all
>>>> test units and to supply different initial data to different units.
>>>> Supplying the same data to all units simultaneously is a sort of
>>>> mistake that explains your observed behavior.
>>>>   
>>> There is really simple scheduling ATM. There is just a bunch of
>>> done signals that are gated together. Meaning the slowest
>>> calculation of the group will slow everything else down. Once all
>>> the dones are true the next set of numbers are loaded.
>>
>> The obvious choice is to have a work-to-do list, each work item
>> containing a start-end set of numbers to test.
>>
>> Each worker thread would simply finish its current task, store
>> the/any results, and then ask for the next available task.
>>
>> A single producer - multiple consumers queue like this is very simple
>> to get right with minimum admins: Just a single LOCK XADD to update
>> the queue back index, grabbing the corresponding task.
>>
> 
> There is no XADD in FPGA, just gate, multipliers and memory blocks.

Oops! :-)

> 
>> I used this exact setup for my wire speed multi-threaded ntpd deamon
>> about two decades ago. (I did need a few more XADD instructions to
>> handle queue full/empty situations which are not applicable for the
>> current problem.)
>>
> 
> Did your ntpd deamon scale to 300-400 threads?
> 300 is the # of test units that fits in FPGAs that appears to be the
> most cost-effective for this sort of job, e.g. 10CX220YU484E5G.

Scaling does not depend on the number of threads, only on the frequency 
of pulls from the queue: My code was very latency sensitive so I could 
never batch up multiple requests, while this particular 
"Gedankenexperiment" can very easily use arbitrarily large work lists, 
right?

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

[toc] | [prev] | [next] | [standalone]


#115049

FromStefan Monnier <monnier@iro.umontreal.ca>
Date2026-02-19 15:27 -0500
Message-ID<jwvwm08a3e4.fsf-monnier+comp.arch@gnu.org>
In reply to#115038
I must say I find this whole thread baffling: when searching for
a counter example and we're already in the order of 2^70, it seems clear
that constant-factor improvements are completely irrelevant (even with
large constants).

I guess it all boils down to the "useless" part of the title?


=== Stefan

[toc] | [prev] | [next] | [standalone]


#115057

FromMichael S <already5chosen@yahoo.com>
Date2026-02-20 11:41 +0200
Message-ID<20260220114153.0000416f@yahoo.com>
In reply to#115049
On Thu, 19 Feb 2026 15:27:07 -0500
Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> I must say I find this whole thread baffling: when searching for
> a counter example and we're already in the order of 2^70, it seems
> clear that constant-factor improvements are completely irrelevant
> (even with large constants).
> 
> I guess it all boils down to the "useless" part of the title?
> 
> 
> === Stefan


As I wrote in earlier post, "useless" sounds too optimistic.
"Useless" in my book is proving or disproving 99% of conjectures known
in the numbers theory, both obscure and extremely famous (Fermat's
Last Theorem). And then 1% process ends up very useful.

But Collatz's conjecture believed to be true. Trying to disprove it by
experiments is futile.
Finding that all sequence with initial value in range [2**71:2**72]
converge to 1, which is by far the most likely outcome of experiment,
leaves us in exactly the same place where we started, i.e. conjecture
believed to be true, but unproven. That's not good enough to warrant a
title of "useless". "Pointless" sounds more proper, but may be we shall
find something more derogatory.

However, it's fun.
My current estimate, after considering how "smart" filtering by modulo
2**20 can be made practical, is that a single FPGA that costs 300 USD
would be able to test 2**71 values in ~500 years. Which means that the
company I am working for can be convinced to build gear capable of
doing calculations in 1 year for something like 1M USD or a little more.
I'd guess that smaller and hungrier engineering firms would accept the
job for 1/3rd of that sum. 
So, who has a pocket money to spend?

[toc] | [prev] | [next] | [standalone]


#115104

FromBGB <cr88192@gmail.com>
Date2026-02-22 18:41 -0600
Message-ID<10ng813$2im23$1@dont-email.me>
In reply to#115049
On 2/19/2026 2:27 PM, Stefan Monnier wrote:
> I must say I find this whole thread baffling: when searching for
> a counter example and we're already in the order of 2^70, it seems clear
> that constant-factor improvements are completely irrelevant (even with
> large constants).
> 
> I guess it all boils down to the "useless" part of the title?
> 

Yeah, basically.

It is useless in the sense that no matter how big you go, an exception 
wont be found, so rendering further testing essentially meaningless 
(doesn't prove anything unless an exception is found, exception wont be 
found if none can exist).



I did feel tempted to write a program to draw a graph...

Well, then did so, and posted a partial graph on X:
https://x.com/cr88192/status/2025698159519817849

Though, a linear/log mismatch makes it look wonky, and got annoyed 
fiddling with trying to make the code to draw the graph not make it look 
like a mess.

It is like, now realizing how it works and the fairly obvious nature of 
it being true, but also realizing that I am also not in a position to be 
able to prove it as such, is making me feel stupid.

But, alas, there are reasons I am not really a math of physics person...



Well, at least excluding the guys who go around promoting wacky/crazy 
ideas like recursive compression and perpetual motion machines and 
similar (or, all the "quantum consciousness" stuff and similar), 
wouldn't really want to be associated with this crowd either...

Actually, this latter camp I don't get:
If you can easily test things and see that they don't work, what is the 
point of continuing to promote them as-if they do?...



Or, like the whole fusion thing:
   Can someone make it happen? Yes.
   Can one get net positive energy output? Not so much.

Or, the seeming futility of trying to send humans on interstellar trips. 
Basically, no way to send full sized humans interstellar within any 
plausible timescale or resource/energy budget. That said, not entirely 
convinced it is outside the reach of machines (so, my sci-fi stuff 
wouldn't involve sending humans, rather machines with enough to try to 
bootstrap the process).

Ship can get to target system, find some asteroids or similar to begin 
consuming, building itself up into a more significant structure, then 
begins building other machinery it needs; building whatever it needs to 
initiate the colonization process. Even sending a full-set of organic 
samples is a bit of a stretch, so more likely it would preferentially 
bootstrap high priority species from actual cell lines, but may need 
backup strategies (recreating the whole biosphere from yeast or similar 
if need-be). In some sense, the main "payload" not likely being lots of 
organic matter, but enough dense information storage to be able to 
initiate the bootstrap process (and a machine just smart enough to  be 
able to figure out how to find suitable asteroids and start the process).

Galactic civilizations? No. Any such colonies being unlikely to realize 
the others exist, much less have any real way to communicate with them 
(light-speed limits effectively making any sort of large scale 
"civilization" all but impossible).

Etc.



Well, except in cases where I am feeling half tempted to do stories that 
just throw realism entirely out the window, "yeah, errm, how about we 
have wooden pirate ships, in space, fighting giant robots?..." maybe 
take the "solar wind" part a little too literally: have air in space and 
the ships just sorta tack-sailing on the wind ("moment of inertia be 
damned", things still move like they were floating on water, but in 
space). Stuff basically works like in 1980s cartoons (like, you know, 
the ones that mostly existed as glorified toy commercials). Then the 
worlds' physics are based less on realism and more a mix of "rule of 
cool" and "rule of funny".

Say, also:
Ships battle off giant robots by shooting at them with old style canons;
People can expand to 50ft tall to fist-fight said robots;
Rocket flatulence (can fly around by essentially farting fire);
Trinket based costume changes;
Giant robots vs dinosaurs (maybe also in space, dinosaurs also having 
rocket flatulence, and fire breath, ...);
Obvious plot armor;
People can escape restraints by burning them off with eye lasers;
...

But, say:
Characters are not just going around like Superman or something (because 
then there are no stakes), and thus only able to use their 
higher-powered abilities when the plot demands it.

Did recently throw a fragment of this sort of idea at Grok Imagine, but 
while it didn't give exactly what I was imagining, it was in the same 
general area:
https://x.com/cr88192/status/2025021520507011292

But, alas in this case the video generation only gives a few seconds 
(for much more than this, would cost money...).


Because, alas, reality is just kinda depressing sometimes...


Well, except a while back, when I did write a story set in a sort of 
reverse-urban-fantasy setting:
   As opposed to the typical "fantasy stuff in an urban setting";
   More "stereotypical urban stuff in a fantasy setting".

So, more rap-battling gangsta elves and stuff...


Then again, "pirate ships in space" wouldn't exactly be too far out of 
place in a fantasy world. Could pass it off as "sci-fi" in a loose sense 
in that while it is fairly obviously a fantasy setting, many of the 
characters are not themselves aware that there is anything unusual with 
them living within a fantasy-world magic system. But, then may as well 
borrow ideas from "Rainbow Brite" or something, and have the ships 
powered by "Star Sprinkles" or such (well, and puts a limit on how much 
characters can do: The more powerful magic they use, the more rapidly 
they use up their supply of star sprinkles...).

...



But, Alas...

[toc] | [prev] | [next] | [standalone]


#115112

FromThomas Koenig <tkoenig@netcologne.de>
Date2026-02-23 11:10 +0000
Message-ID<10nhci9$2t9mc$1@dont-email.me>
In reply to#115104
BGB <cr88192@gmail.com> schrieb:
> On 2/19/2026 2:27 PM, Stefan Monnier wrote:
>> I must say I find this whole thread baffling: when searching for
>> a counter example and we're already in the order of 2^70, it seems clear
>> that constant-factor improvements are completely irrelevant (even with
>> large constants).
>> 
>> I guess it all boils down to the "useless" part of the title?
>> 
>
> Yeah, basically.
>
> It is useless in the sense that no matter how big you go, an exception 
> wont be found, so rendering further testing essentially meaningless 
> (doesn't prove anything unless an exception is found, exception wont be 
> found if none can exist).

You are assuming that the Collatz conjecture is true.  But this
is unknown, and the whole point of the excercise.

Do I personally think that exchaustive search will find something?  No.

Does my personal belief matter in the slightest?  No.

[...]

> Or, the seeming futility of trying to send humans on interstellar trips. 
> Basically, no way to send full sized humans interstellar within any 
> plausible timescale or resource/energy budget.

My favorite candiate for that is the nuclear salt water rocket.
Which will be difficult to build, or so my son (who just received
his master's degree in nuclear engineering) tells me because
it needs to be prompt critical.

-- 
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

[toc] | [prev] | [next] | [standalone]


#115121

FromBGB <cr88192@gmail.com>
Date2026-02-23 17:24 -0600
Message-ID<10ninsn$3e0da$1@dont-email.me>
In reply to#115112
On 2/23/2026 5:10 AM, Thomas Koenig wrote:
> BGB <cr88192@gmail.com> schrieb:
>> On 2/19/2026 2:27 PM, Stefan Monnier wrote:
>>> I must say I find this whole thread baffling: when searching for
>>> a counter example and we're already in the order of 2^70, it seems clear
>>> that constant-factor improvements are completely irrelevant (even with
>>> large constants).
>>>
>>> I guess it all boils down to the "useless" part of the title?
>>>
>>
>> Yeah, basically.
>>
>> It is useless in the sense that no matter how big you go, an exception
>> wont be found, so rendering further testing essentially meaningless
>> (doesn't prove anything unless an exception is found, exception wont be
>> found if none can exist).
> 
> You are assuming that the Collatz conjecture is true.  But this
> is unknown, and the whole point of the excercise.
> 
> Do I personally think that exchaustive search will find something?  No.
> 
> Does my personal belief matter in the slightest?  No.
> 


My own looking into this led me to realize that the number-space is 
modulo (relative to a cone with a spiral that repeats with every 
power-of-2), and that all the odd steps only ever increment the phase 
angle by 1, or fall down to a lower level (whenever the phase angle is 
not incremented).

In effect, each odd step jumps to the opposite side of the cone, but 
slightly increments the angle. Since the phase-angle path can never move 
backwards, it can't touch any previous paths, thus, no loops can happen 
this way.


The only way a cycle could exist is if (somehow) there were an 
impassable wall where pretty much *no* value could descend unless it 
hits the power-of-2 descent path, and then somehow miss this path as well.

But, since "3*n+1" results in the minimum forward step, this means that 
one of the sides *has* to hit the power-of-2 descent cases at some point.

Even if you did get very unlucky and hit odd-paths at nearly every step, 
it would eventually hit the power-of-2 escape case which is basically an 
immediate descent down to 1.

Or, as I see it, the geometry of the number space itself prevents the 
possibility of a loop.


Like, while I can't write a proof for it, I can still see that it is 
true in a geometric/mechanistic sense.

Like, my confidence that it is true isn't so much because of a "no 
exception has been found, so I believe it is true" sense, but more 
because I can see the geometry of the number space well enough to see 
that no non-descending paths can exist through the space.


> [...]
> 
>> Or, the seeming futility of trying to send humans on interstellar trips.
>> Basically, no way to send full sized humans interstellar within any
>> plausible timescale or resource/energy budget.
> 
> My favorite candiate for that is the nuclear salt water rocket.
> Which will be difficult to build, or so my son (who just received
> his master's degree in nuclear engineering) tells me because
> it needs to be prompt critical.
> 

Yeah, I have floated a few ideas...


One I used in my own sci-fi was using cyclotrons and long linac (linear 
accelerator) stage to try to push atomic nuclei to relativistic speeds 
(and generate thrust mostly by the energy required to accelerating the 
nuclei).

Comparably, thrust would be modest, but the amount of propellant used 
could be very small. Harder part is powering the thing...

Main candidates are nuclear and fusion power. With fusion, one could 
separate out the He nuclei and use these. With nuclear, CO2 makes more 
sense as a working fluid and propellant (after first decomposing it into 
C and O nuclei).

While fusion would be the preferable option, nuclear would be easier to 
build in practice (probably a molten-salt breeder reactor with CO2 as 
the working fluid).

No light water reactors though, these would just suck...


For high-speed travel, would likely need to have a lot of ablative 
shielding. My current leaning here basically being to put a very big 
chunk of graphite on the front of the ship.

Could use magnetic deflection, but this is likely to result in more 
energy cost and drag than essentially using a big chunk of graphite and 
maybe some permanent magnets (to slightly reduce ablation rate). Likely 
if using permanent magnets, there would be an iron structure inside the 
big chunk of graphite.

Say, this to deal with the front of the ship being hit with a stream of 
hydrogen atoms potentially traveling at relativistic speeds, etc.

Would likely be pointy (sort of like the tip of a pencil) to reduce drag.

It is possibly that the iron structure (inside the graphite) could 
in-part be magnetized using part of the field strength of the magnets in 
the cyclotrons (say, for example, using big neodymium magnets or similar 
to limit energy cost).




Comparably, variations of nuclear thermal rockets have tradeoffs:
   Orders of magnitude more thrust;
   But, lower maximum exhaust velocity (so, more propellant is needed);
   ...

So, these are more likely to be able to get going good and fast, within 
a more reasonable timeframe, but the problem will then become how far 
you can get before running out of fuel. Assuming human passengers 
though, time and human lifespan become major limiting factors.

Also, for humans, you need a whole lot of stuff to keep them alive and 
comfortable, etc. So, you need a whole lot of thrust to get things 
moving quickly.

So, in this sense, nuclear thermal makes more sense.



Vs, say, trying to design the engine around the assumption of a 
"continuous burn" likely measured in centuries (for a ship with a 
comparably modest payload).

Ion engines are also possible, but downside of ion engines is that they 
would have higher propellant use rate than a cyclotron+linac while also 
giving significantly less thrust relative to the amount of propellant used.


Downside of cyclotron+linac though would be that it would be more energy 
hungry and necessarily significantly larger/bulkier. Such a ship would 
be impractical for ground launch so would need to be assembled in space.

Well, unless a very small version were built, likely with the 
(comparably long) linac tail in a folded up form, which then unpacks and 
assembles itself once in space.

Pretty much no one else that I am aware of is pushing for a 
cyclotron+linac based propulsion system though...


But, then again:
Less viability for manned interplanetary travel (nuclear powered plasma 
thrusters make more sense for this);
Less viability for unmanned interplanetary travel (domain likely better 
suited to ion engines);
Not viable for manned interstellar travel (limited payload, using it for 
a manned mission would likely result in everyone expiring long before it 
reached the destination).

So, it is mostly (at best) left as an option for unmanned interstellar 
travel.


Well, there are other ideas around, like using very small probes, and 
accelerating them using huge lasers, but downside is that these would 
have no way to decelerate.

With other strategies, one basically needing to flip the ship around and 
spending the latter half of the trip decelerating (say, with the "butt" 
end of the ship having the ablative graphite shield in a more concave 
shape in an attempt to maximize the drag profile).




One traditional idea has also been "generation ships", but the 
practicality of this is debatable IMO.

Wrote a bit on this, but decided to leave this out.

Seems like whatever approach to generation ships is taken, it still 
needs to be able to reach the target within a few centuries to have much 
of any real hope of success (1k years, very low probability; after 10k 
years, "inescapable problems" would make longer-term survival almost 
impossible, *1).

*1: Running out of usable fissile materials; running out of nitrogen 
(would need to initially stock up on nitrogen in the form of C-14, which 
could decay to replace that lost over time, but by 10k years, much of 
the C14 would be gone; and long term survival is kinda gone if nitrogen 
runs out), ... Stocking up on more thorium and C-14 and similar would 
ultimately only delay the issues.


Granted, one could reach further out via "system hopping", say, if each 
colony can spot another system with potentially habitable target worlds, 
say, within 25ly .. 50ly or similar, could spread pretty far over time.

Still none of the "galactic civilization" stuff (as I see it, the 
formation of a cohesive civilization spanning multiple star systems is 
most likely essentially impossible; absent the development of FTL travel 
and communication, etc, which is itself most likely impossible).

...

[toc] | [prev] | [next] | [standalone]


#115122

FromThomas Koenig <tkoenig@netcologne.de>
Date2026-02-24 06:44 +0000
Message-ID<10njhce$3l6lq$1@dont-email.me>
In reply to#115121
BGB <cr88192@gmail.com> schrieb:
> My own looking into this led me to realize that the number-space is 
> modulo (relative to a cone with a spiral that repeats with every 
> power-of-2), and that all the odd steps only ever increment the phase 
> angle by 1, or fall down to a lower level (whenever the phase angle is 
> not incremented).
>
> In effect, each odd step jumps to the opposite side of the cone, but 
> slightly increments the angle. Since the phase-angle path can never move 
> backwards, it can't touch any previous paths, thus, no loops can happen 
> this way.
>
>
> The only way a cycle could exist is if (somehow) there were an 
> impassable wall where pretty much *no* value could descend unless it 
> hits the power-of-2 descent path, and then somehow miss this path as well.
>
> But, since "3*n+1" results in the minimum forward step, this means that 
> one of the sides *has* to hit the power-of-2 descent cases at some point.

The same argument would hold for the 3n-1 problem, where there is
more than one cycle.

[...]

-- 
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

[toc] | [prev] | [next] | [standalone]


#115123

FromBGB <cr88192@gmail.com>
Date2026-02-24 01:38 -0600
Message-ID<10njkpj$3lt65$1@dont-email.me>
In reply to#115122
On 2/24/2026 12:44 AM, Thomas Koenig wrote:
> BGB <cr88192@gmail.com> schrieb:
>> My own looking into this led me to realize that the number-space is
>> modulo (relative to a cone with a spiral that repeats with every
>> power-of-2), and that all the odd steps only ever increment the phase
>> angle by 1, or fall down to a lower level (whenever the phase angle is
>> not incremented).
>>
>> In effect, each odd step jumps to the opposite side of the cone, but
>> slightly increments the angle. Since the phase-angle path can never move
>> backwards, it can't touch any previous paths, thus, no loops can happen
>> this way.
>>
>>
>> The only way a cycle could exist is if (somehow) there were an
>> impassable wall where pretty much *no* value could descend unless it
>> hits the power-of-2 descent path, and then somehow miss this path as well.
>>
>> But, since "3*n+1" results in the minimum forward step, this means that
>> one of the sides *has* to hit the power-of-2 descent cases at some point.
> 
> The same argument would hold for the 3n-1 problem, where there is
> more than one cycle.
> 

At least going by "the power of imagination" (would require testing to 
confirm):

"3*n-1" would be different in that it neither reliably advances the 
phase angle, nor reliably moves it backwards. So, one is likely to end 
up with pairs of odd-steps that loop back to the same position, creating 
infinite cycles (so, it is not the same geometry).


"3*n-3" might be stable (may appear to behave similarly to 3*n+1), but 
would instead advance the phase angle in the opposite direction.

"3*n+3" should initially appear similar to "3*n+1", but would have paths 
that ascend to infinity (paths would exist which manage to miss all of 
the even steps and also miss the magic "power-of-2 auto-win" step).

If the adjustment offset is even, then the "(3*n+1)/2" trick breaks, and 
also all paths that hit an odd number will ascend to infinity.


Granted, may make sense to check/confirm, if these don't match up I 
might need to revisit my mental model.

...


> [...]
> 

[toc] | [prev] | [next] | [standalone]


#115124

FromBGB <cr88192@gmail.com>
Date2026-02-24 03:16 -0600
Message-ID<10njqha$3o4lp$1@dont-email.me>
In reply to#115123
On 2/24/2026 1:38 AM, BGB wrote:
> On 2/24/2026 12:44 AM, Thomas Koenig wrote:
>> BGB <cr88192@gmail.com> schrieb:
>>> My own looking into this led me to realize that the number-space is
>>> modulo (relative to a cone with a spiral that repeats with every
>>> power-of-2), and that all the odd steps only ever increment the phase
>>> angle by 1, or fall down to a lower level (whenever the phase angle is
>>> not incremented).
>>>
>>> In effect, each odd step jumps to the opposite side of the cone, but
>>> slightly increments the angle. Since the phase-angle path can never move
>>> backwards, it can't touch any previous paths, thus, no loops can happen
>>> this way.
>>>
>>>
>>> The only way a cycle could exist is if (somehow) there were an
>>> impassable wall where pretty much *no* value could descend unless it
>>> hits the power-of-2 descent path, and then somehow miss this path as 
>>> well.
>>>
>>> But, since "3*n+1" results in the minimum forward step, this means that
>>> one of the sides *has* to hit the power-of-2 descent cases at some 
>>> point.
>>
>> The same argument would hold for the 3n-1 problem, where there is
>> more than one cycle.
>>
> 


Then, went and took a shower, realized there was more to this:

> At least going by "the power of imagination" (would require testing to 
> confirm):
> 
> "3*n-1" would be different in that it neither reliably advances the 
> phase angle, nor reliably moves it backwards. So, one is likely to end 
> up with pairs of odd-steps that loop back to the same position, creating 
> infinite cycles (so, it is not the same geometry).
> 

Imagination is mostly saying there will be a lot of fairly short loops here.

There should probably be a quite significant number of short loops in 
this case.


> 
> "3*n-3" might be stable (may appear to behave similarly to 3*n+1), but 
> would instead advance the phase angle in the opposite direction.
> 

But, "3*n-3" will change the ending state, so it will not land on 1 anymore.

Rather:
   Around 84% of the time it end land on 3;
   Around 16% of the time it end land on 0.
Give or take... ("imagination" showing around 84% or so of paths landing 
on 3).

Will probably still converge, just ending up at a different end state.


All other negative offset paths would lead to underflow.

So, can probably exclude cases where the value space goes negative.
Though... "looks like" if these were included there is another cone on 
the other side, if one goes this way.



> "3*n+3" should initially appear similar to "3*n+1", but would have paths 
> that ascend to infinity (paths would exist which manage to miss all of 
> the even steps and also miss the magic "power-of-2 auto-win" step).
> 
> If the adjustment offset is even, then the "(3*n+1)/2" trick breaks, and 
> also all paths that hit an odd number will ascend to infinity.
> 

Further assertion 1:
All other positive offset paths should contain paths to infinity (along 
with jostling around the ending states for those paths which do converge).

So, the specific end-state behavior seems unique.

Further assertion 2:
   "3*n+1" may be unique in this space.

Or, possibly the only one that reliably converges, lands on 1, and does 
not contain cycles.


> 
> Granted, may make sense to check/confirm, if these don't match up I 
> might need to revisit my mental model.
> 

Still TODO, but for me, shower seemed like a bigger immediate priority.


So, yeah, I guess I may report back later as to whether or not the 
actual math behaves the same way as it does in my imagination.

...

[toc] | [prev] | [next] | [standalone]


#115127

FromBGB <cr88192@gmail.com>
Date2026-02-24 04:52 -0600
Message-ID<10nk05k$3pk7b$1@dont-email.me>
In reply to#115124
On 2/24/2026 3:16 AM, BGB wrote:
> On 2/24/2026 1:38 AM, BGB wrote:
>> On 2/24/2026 12:44 AM, Thomas Koenig wrote:
>>> BGB <cr88192@gmail.com> schrieb:
>>>> My own looking into this led me to realize that the number-space is
>>>> modulo (relative to a cone with a spiral that repeats with every
>>>> power-of-2), and that all the odd steps only ever increment the phase
>>>> angle by 1, or fall down to a lower level (whenever the phase angle is
>>>> not incremented).
>>>>
>>>> In effect, each odd step jumps to the opposite side of the cone, but
>>>> slightly increments the angle. Since the phase-angle path can never 
>>>> move
>>>> backwards, it can't touch any previous paths, thus, no loops can happen
>>>> this way.
>>>>
>>>>
>>>> The only way a cycle could exist is if (somehow) there were an
>>>> impassable wall where pretty much *no* value could descend unless it
>>>> hits the power-of-2 descent path, and then somehow miss this path as 
>>>> well.
>>>>
>>>> But, since "3*n+1" results in the minimum forward step, this means that
>>>> one of the sides *has* to hit the power-of-2 descent cases at some 
>>>> point.
>>>
>>> The same argument would hold for the 3n-1 problem, where there is
>>> more than one cycle.
>>>
>>
> 
> 


Now, after actually testing (to confirm or disprove my imagination):

> Then, went and took a shower, realized there was more to this:
> 
>> At least going by "the power of imagination" (would require testing to 
>> confirm):
>>
>> "3*n-1" would be different in that it neither reliably advances the 
>> phase angle, nor reliably moves it backwards. So, one is likely to end 
>> up with pairs of odd-steps that loop back to the same position, 
>> creating infinite cycles (so, it is not the same geometry).
>>
> 
> Imagination is mostly saying there will be a lot of fairly short loops 
> here.
> 
> There should probably be a quite significant number of short loops in 
> this case.
> 

Quite a lot of loops, but they are not quite matching the expected 
pattern (the typical loop period is longer than imagination had implied).


> 
>>
>> "3*n-3" might be stable (may appear to behave similarly to 3*n+1), but 
>> would instead advance the phase angle in the opposite direction.
>>
> 
> But, "3*n-3" will change the ending state, so it will not land on 1 
> anymore.
> 
> Rather:
>    Around 84% of the time it end land on 3;
>    Around 16% of the time it end land on 0.
> Give or take... ("imagination" showing around 84% or so of paths landing 
> on 3).
> 
> Will probably still converge, just ending up at a different end state.
> 

Disproven:
   "3*n-3" is nowhere near stable.

It does land on 3, but is prone to loops (imagination did not predict 
prevalent looping for this one).


> 
> All other negative offset paths would lead to underflow.
> 
> So, can probably exclude cases where the value space goes negative.
> Though... "looks like" if these were included there is another cone on 
> the other side, if one goes this way.
> 

Not testing for this ATM.

> 
> 
>> "3*n+3" should initially appear similar to "3*n+1", but would have 
>> paths that ascend to infinity (paths would exist which manage to miss 
>> all of the even steps and also miss the magic "power-of-2 auto-win" 
>> step).
>>

Does appear similar, but with a different ending state (lands on 3).

But, thus far the infinity-paths are being no-show.

Thus far, otherwise, 3*n+3 appears stable; at least for the range 
tested. This part does match mental prediction so far (but appears more 
stable than expected).

Mental model expected existence of divergent paths to exist in this 
case, but thus far I am not seeing any.



>> If the adjustment offset is even, then the "(3*n+1)/2" trick breaks, 
>> and also all paths that hit an odd number will ascend to infinity.
>>

Confirmed: Even offsets do fire off towards infinity.


> 
> Further assertion 1:
> All other positive offset paths should contain paths to infinity (along 
> with jostling around the ending states for those paths which do converge).
> 
> So, the specific end-state behavior seems unique.
> 

Non-match:
   Loops (or long cycles) are more prevalent than expected;
   Paths to infinity are not.

Testing +5 and +7, seeing a whole lot of loops, not so many divergent paths.

Thus far, divergent paths have only appeared in one of the expected 
scenarios.


Granted, it is possible that what my imagination was predicting as 
divergent paths are instead manifesting as long-period cycles (probably 
because they are hitting too many even-numbered spots to achieve 
"sufficient escape velocity").

May need to add something to try to differentiate loops from "failed 
escape paths" where it just skips around, but doesn't represent an 
actual loop.


> Further assertion 2:
>    "3*n+1" may be unique in this space.
> 
> Or, possibly the only one that reliably converges, lands on 1, and does 
> not contain cycles.
> 

Thus far, it is looking like "3*n+1" is likely fairly unique here.


> 
>>
>> Granted, may make sense to check/confirm, if these don't match up I 
>> might need to revisit my mental model.
>>
> 
> Still TODO, but for me, shower seemed like a bigger immediate priority.
> 
> 
> So, yeah, I guess I may report back later as to whether or not the 
> actual math behaves the same way as it does in my imagination.
> 

Not quite as close of a match as I would like.


So, seems my imagination model is still subject to doubt in this case...

Given my seeming inability to sufficiently accurately predict the 
behavior of adjusting the offset to a sufficient degree of accuracy; it 
is possible my interpretation of the "3*n+1" case may not be 
sufficiently accurate.

Nothing wildly different from my imagination happened, but it is still 
not quite close enough it would seem.

...

[toc] | [prev] | [next] | [standalone]


#115128

FromThomas Koenig <tkoenig@netcologne.de>
Date2026-02-24 11:16 +0000
Message-ID<10nk1ao$3q2e9$1@dont-email.me>
In reply to#115123
BGB <cr88192@gmail.com> schrieb:

> "3*n-1" would be different in that it neither reliably advances the 
> phase angle, nor reliably moves it backwards. So, one is likely to end 
> up with pairs of odd-steps that loop back to the same position, creating 
> infinite cycles (so, it is not the same geometry).

What is this phase angle?  I am only familiar with that term for
complex numbers.

-- 
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

[toc] | [prev] | [next] | [standalone]


#115136

FromBGB <cr88192@gmail.com>
Date2026-02-24 13:19 -0600
Message-ID<10nkttm$53d4$1@dont-email.me>
In reply to#115128
On 2/24/2026 5:16 AM, Thomas Koenig wrote:
> BGB <cr88192@gmail.com> schrieb:
> 
>> "3*n-1" would be different in that it neither reliably advances the
>> phase angle, nor reliably moves it backwards. So, one is likely to end
>> up with pairs of odd-steps that loop back to the same position, creating
>> infinite cycles (so, it is not the same geometry).
> 
> What is this phase angle?  I am only familiar with that term for
> complex numbers.
> 

I had noted that it follows a pattern, but primarily if one splits it up 
into a spiral that itself resembles a floating point number.

Though, splitting it into real and imaginary components could also make 
sense as well, but I was thinking of something more like polar 
coordinates though.


So, say, level=exponent, phase=mantissa. But, differing some in that in 
this context it is more continuous (like a log scale number), but some 
of the behavioral patterns are more like those of a floating point number.

So, say:
   level ~= log2(n)
   phase ~= n/(2**floor(level))
     So, say, imagine it represents a value between 1.0 and 1.9999...

Each crossing of a power of 2 is an immediate drop to the "win".
Each odd step will either land on a spot that is equal (and drops a 
level) or increments the phase (or, roughly like incrementing the ULP in 
a floating point number, if the size of the mantissa were proportional 
to the value of the exponent).

It seems, the behavior only manifests in the desired ways if:
  Relative phase increases for each odd step;
  Increases by the minimum possible amount
    (what results from the +1 adjustment).

As noted, the phase effectively exists on two points on the circle at 
any moment:
   phaseA = phase
   phaseB = phase + 0.5

Or, if one wants to see it as a circle in radians, could be like:
   angle=(phase-1.0)*(2*PI);
Or, in degrees:
   angle=(phase-1.0)*(360);
   Phase0 and Phase1 are always at 180 degrees relative to each other.

And, at each odd step, these two points move very slightly on the 
circle. The relative angle relative to each side of the circle, relative 
to the starting point, typically remaining small (relative to the 
starting value).

Overall alignment of the two points can be almost anywhere in the 
circle, except that it can't cross the point at 0 degrees (or 
powers-of-2) because these drop straight to 1.

Or, in polar coordinate terms:
   rho = level
   phi = angle


My previous attempt at a graph though is wonky in that the numbers are 
following a log spacing around the circle, rather than an linear 
spacing. May need to fix this, since the explanation of the two points 
being 180 degrees out of phase looks wonky if the diagram doesn't show 
these points as 180 degrees apart.



Failing to increase leads to loops (leading to the same spot).
Trying to go backwards leads to loops and instabilities (divergent 
paths, or, "more backwards" leads into negative-land).
Trying to go forwards more than the minimum amount also leads to 
instabilities (it results in it stepping over downwards paths).


I initially thought these instabilities would head towards infinity, but 
it seems more like they just head to a larger orbit and get stuck 
bouncing around the circle. I would need to do more looking into it to 
figure out whether they get stuck in a loop, or continue to climb (but 
more gradually). My mental model implies that these paths should 
continue to climb, but that increasing the constant bias possibly causes 
them to ascend faster.

If they are more prone to forming actual closed loops (rather than 
climbing indefinitely), this would be inconsistent with my mental model, 
but seems I would need to do more testing here.

Though, loops can form if the ascent is slow enough that it manages to 
complete a full rotation of the circle without escaping to a higher 
orbit (if it manages to escape the original orbit without forming a 
loop, it should trend upwards to infinity).


Once the relative phase (from the starting point) exceeds 180 degrees 
(or the two points covering a full circle), if the path has not ascended 
to a higher level, a closed loop will form.

But, as noted, with the +1 step, the above scenario (for loop forming) 
becomes impossible.




So, in this case, the "3*n+1" step would appear to be uniquely stable in 
this particular configuration.

[toc] | [prev] | [next] | [standalone]


Page 3 of 5 — ← Prev page 1 2 [3] 4 5  Next page →

Back to top | Article view | comp.arch


csiph-web