Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.arch > #114964 > unrolled thread
| Started by | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| First post | 2026-02-14 09:57 +0000 |
| Last post | 2026-02-18 21:56 +0100 |
| Articles | 20 on this page of 81 — 12 participants |
Back to article view | Back to comp.arch
A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-14 09:57 +0000
Re: A useless machine MitchAlsup <user5857@newsgrouper.org.invalid> - 2026-02-14 18:48 +0000
Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-14 20:18 +0100
Re: A useless machine MitchAlsup <user5857@newsgrouper.org.invalid> - 2026-02-14 21:22 +0000
Re: A useless machine Stephen Fuld <sfuld@alumni.cmu.edu.invalid> - 2026-02-14 13:30 -0800
Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-15 12:50 +0100
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-21 13:48 -0600
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-21 21:40 +0100
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-21 15:17 -0600
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-22 04:50 -0600
Re: A useless machine Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-03-12 21:01 -0700
Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-22 14:43 +0100
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-22 12:38 -0600
Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-23 10:51 +0100
Re: A useless machine MitchAlsup <user5857@newsgrouper.org.invalid> - 2026-02-26 01:27 +0000
Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-26 09:21 +0100
Re: A useless machine Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-03-12 21:11 -0700
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 14:58 +0100
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-21 12:27 -0600
Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-22 15:14 +0100
Re: A useless machine Stefan Monnier <monnier@iro.umontreal.ca> - 2026-02-22 11:41 -0500
Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-23 10:53 +0100
Re: A useless machine Stefan Monnier <monnier@iro.umontreal.ca> - 2026-02-23 10:52 -0500
Re: A useless machine Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-02-14 21:19 -0800
Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-15 03:58 -0500
Re: A useless machine David Brown <david.brown@hesbynett.no> - 2026-02-15 13:14 +0100
Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-15 15:46 +0000
Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-15 10:56 -0500
Re: A useless machine MitchAlsup <user5857@newsgrouper.org.invalid> - 2026-02-15 19:16 +0000
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 22:40 +0100
Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-26 19:31 +0000
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-18 18:22 +0200
Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-18 22:56 -0500
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-19 12:10 +0200
Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-21 01:56 -0500
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-21 19:23 +0200
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-21 20:36 +0200
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-22 10:46 +0200
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-22 12:45 +0100
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-22 16:35 +0200
Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-22 19:12 -0500
Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-22 19:47 -0500
Re: A useless machine EricP <ThatWouldBeTelling@thevillage.com> - 2026-02-21 15:39 -0500
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-22 03:16 +0200
Re: A useless machine EricP <ThatWouldBeTelling@thevillage.com> - 2026-02-22 14:48 -0500
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-22 23:25 +0200
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-19 11:51 +0100
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-19 23:12 +0200
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-20 15:11 +0100
Re: A useless machine Stefan Monnier <monnier@iro.umontreal.ca> - 2026-02-19 15:27 -0500
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-20 11:41 +0200
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-22 18:41 -0600
Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-23 11:10 +0000
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-23 17:24 -0600
Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-24 06:44 +0000
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-24 01:38 -0600
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-24 03:16 -0600
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-24 04:52 -0600
Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-24 11:16 +0000
Re: A useless machine BGB <cr88192@gmail.com> - 2026-02-24 13:19 -0600
Re: A useless machine Tim Rentsch <tr.17687@z991.linuxsc.com> - 2026-03-12 21:14 -0700
Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-20 08:01 +0000
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-15 15:35 +0200
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-15 18:33 +0200
Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-15 19:19 +0000
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-16 12:31 +0200
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-17 15:41 +0200
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-17 18:17 +0200
Re: A useless machine anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2026-02-15 22:41 +0000
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 14:52 +0100
Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-15 16:45 +0000
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 18:01 +0100
Re: A useless machine Thomas Koenig <tkoenig@netcologne.de> - 2026-02-15 19:42 +0000
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-15 22:59 +0100
Re: A useless machine Robert Finch <robfi680@gmail.com> - 2026-02-15 21:53 -0500
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-16 12:40 +0200
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-16 20:34 +0200
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-16 16:08 +0100
Re: A useless machine Stephen Fuld <sfuld@alumni.cmu.edu.invalid> - 2026-02-16 07:50 -0800
Re: A useless machine Michael S <already5chosen@yahoo.com> - 2026-02-16 18:31 +0200
Re: A useless machine Terje Mathisen <terje.mathisen@tmsw.no> - 2026-02-18 21:56 +0100
Page 3 of 5 — ← Prev page 1 2 [3] 4 5 Next page →
| From | Robert Finch <robfi680@gmail.com> |
|---|---|
| Date | 2026-02-22 19:12 -0500 |
| Message-ID | <10ng622$2hurh$1@dont-email.me> |
| In reply to | #115070 |
On 2026-02-21 1:36 p.m., Michael S wrote:
> On Sat, 21 Feb 2026 01:56:03 -0500
> Robert Finch <robfi680@gmail.com> wrote:
>
>> On 2026-02-19 5:10 a.m., Michael S wrote:
>>> On Wed, 18 Feb 2026 22:56:19 -0500
>>> Robert Finch <robfi680@gmail.com> wrote:
>>>
>>>> On 2026-02-18 11:22 a.m., Michael S wrote:
>>>>> On Sun, 15 Feb 2026 10:56:51 -0500
>>>>> Robert Finch <robfi680@gmail.com> wrote:
>>>>>
>>>>>> I have coded a version that tests up to 8192 number
>>>>>> simultaneously. It is supposed to be able to run at over 200
>>>>>> MHz. Some of the number would take a large number of iterations
>>>>>> though.
>>>>>>
>>>>>> I found something strange, the code to count for more numbers
>>>>>> does not increase the size of the core very much. (But it takes
>>>>>> longer to build.) I am not sure if this occurs because there is
>>>>>> a mitsake somewhere, or if the tools are able to figure out how
>>>>>> to share all the logic. I am thinking the logic for adders is
>>>>>> shared somehow. 128-bit arithmetic is being used.
>>>>>>
>>>>>
>>>>> There is a mistake in your code. No other explanation is possible.
>>>>>
>>>>> You should look for 3 sorts of mistakes
>>>>> 1. Core itself.
>>>>> The most likely mistake is failing to look for one of the ending
>>>>> conditions. You have to look for 3 conditions:
>>>>> a) - value produced at current step is smaller than initial value.
>>>>> That is successful ending. Here you signal to the scheduler
>>>>> that you are ready to accept the next value.
>>>>> b) - overflow. Next value exceeds the width of register.
>>>>> c) - timeout. You were running for more than N steps. N=8192 looks
>>>>> like the reasonable value.
>>>>> a) and b) are suspected failures.
>>>>> The handling for b) and c) is similar - you raise alert signal and
>>>>> stop. Your test unit has to provide supervisor circuity with the
>>>>> mean to fetch the initial value that caused suspected failure and
>>>>> to clear alert condition.
>>>>> After alert is cleared your unit returns to the same state as
>>>>> after success - waiting for next initial value.
>>>>
>>>> I found the mistake. Core was working, but groups of cores were
>>>> not. Done signals were missing.
>>>>
>>>>>
>>>>> 2. Scheduler.
>>>>> In proof-of-concept design the scheduler does not have to be very
>>>>> realistic. For example, it does not have to be efficiently
>>>>> pipelined. But it should be able to monitor ready signals of all
>>>>> test units and to supply different initial data to different
>>>>> units. Supplying the same data to all units simultaneously is a
>>>>> sort of mistake that explains your observed behavior.
>>>>>
>>>> There is really simple scheduling ATM. There is just a bunch of
>>>> done signals that are gated together. Meaning the slowest
>>>> calculation of the group will slow everything else down. Once all
>>>> the dones are true the next set of numbers are loaded.
>>>>
>>>>> 3. Supervisor
>>>>> Again, in proof-of-concept design the supervisor does not have to
>>>>> be very realistic. But it should be able to selectively exercise
>>>>> all core feature related to alert, including fetching of initial
>>>>> data and clearing of alert.
>>>>>
>>>>> After you coded everything correctly, do not even try to compile
>>>>> test with 8000 units. That is totally unrealistic. Even if you
>>>>> find huge FPGA in which such number of testers could fit (and I
>>>>> am not sure that such FPGA exists) the compilation will take many
>>>>> hours, possibly days.
>>>>>
>>>>>
>>>>>
>>>> Yeah 8,000 was definitely out of the picture. I had since coded
>>>> 24x24 matrix (576) and although the logic would fit that turned
>>>> out not to be able to route. So, it is down to 20x20. 20x20 with
>>>> no real scheduling fits.
>>>>
>>>> I tried displaying progress as an on-screen bitmap, but that just
>>>> appeared noisy. I have been working on an SCP so may be able to use
>>>> that to manage the scheduling with some custom logic.
>>>>
>>>>
>>>>
>>>
>>> Does 20x20 means 400 cores with something like 80-bit initial values
>>> and 128-bit processing?
>> Yes. 128-bit processing.
>>
>>> If you didn't make further mistakes then it should be rather big,
>>> order of 300K LEs. It would not fit in the biggest low-end Altera
>>> device (Cyclone 10GX 10CX220).
>>
>> IIRC it is about 150k LUTs. It uses the simplest approach, which is
>> not very fast iteration wise, but it allows a lot of cores and a fast
>> clock cycles time. One core is very small and has a fast iteration (1
>> clock cycle). But it takes a lot more iterations than can be done
>> with a more brainiac approach.
>>
>> Using a more brainiac approach would likely cut the performance in
>> half and use a lot more LUTs.
>>
>
> According to my experiments, combining 2 steps have very small negative
> effect on achivable clock and increases area by ~1.8x. So, to me it
> looks like a win.
> That's the combined step I am talking about:
> x = n/4;
> switch (x % 4) {
> case 0: n = x; break;
> case 1: n = 3*x+1; break;
> case 2: n = 2*x+2; break;
> case 3: n = 9*x+8; break;
> }
>
>
>
>>> On the Xilinx side, it would not fit in any Spartan and in most
>>> Artix devices. May be, it would fit in the biggest Artix
>>> UltraScale+ (AU25P). But more realistically for 400 test units one
>>> would want Arria 10 on Altera side or Kintex on Xilinx side. Both
>>> are not prohibitively expensive, but not cheap.
>>>
>>> What device are you using?
>> I am not using an too inexpensive device. I have a Kintex-325 with
>> about 200k LUTs.
>>
>
> According to Xilinx marketing materials XC7K325T has 326080 equivalents
> of traditional LEs.
> That explains why your design fits.
>
> Pay attention that while Kintex evaluation boards could be
> non-expensive, right now buying devices for actual production work is
> rather costly, especially in small quantities.
> The lowest price I see on digikey site is 1147 USD. That's a bad value
> relatively to many other FPGAs.
>
>
>>> If it is smaller than Kintex KU5P then I'd strongly suspect that you
>>> didn't clean out all mistakes.
>>>
>>>
>>>
>> There could be more mistakes for sure. But I am sure the lowest level
>> core works. It seemed to in simulation. I made a slight mod to it
>> since I tested it last, checking n < startn.
>> Counts are not used so the logic would be stripped out, but its good
>> for verification.
>>
>> 1 Core is:
>>
>> module Collatz_conjecture(rst, clk, startn, count, done);
>> input rst;
>> input clk;
>> input [127:0] startn;
>> output reg [127:0] count;
>> output reg done;
>> reg [127:0] n;
>>
>> always_ff @(posedge clk)
>> if (rst) begin
>> n <= startn;
>> count <= 0;
>> done <= FALSE;
>> end
>> else begin
>> if (!done) begin
>> if (~n[0])
>> n <= n >> 2'd1;
>> else
>> n <= n+n+n+1;
>> if (n < startn)
>> done <= TRUE;
>> count <= count + 1;
>> end
>> end
>>
>> endmodule
>>
>
>
> Ok.
> 1. Your core misses one critical part - overflow check.
> Assuming 80-bit initial count, 128-bit register will overflow
> eventually. It would happen very rarely, but it would happen.
> The check for overflow does not have to be precise. Conservative
> approach, like looking at 2 MS bits and one LS bit and signaling
> overflow not when it really happened but when it *could* happen, i.e.
> when MS bits > 0 and LS bit='1' is good enough. But the check can't be
> omitted.
>
> 2. I see that instead of calculating n = (n*3+1)/2 as one step you are
> doing it in two separate steps. I am not sure that it is really makes
> your core smaller or faster. And it certainly increases # of iterations
> by 1/3rd.
>
> 3.
> 128-bit count is pointless loss of resources. 12 or 13 bit are
> sufficient. If the count exceeded that size then you already found
> something of high interest and it is a job of supervisor to find out
> what exactly had happened.
> 128-bit counter actually not just too big, it is also pointless,
> because if we pretend to believe that our purpose is to disprove
> Collatz's conjecture then supervisor would have to implement stop
> condition with some sort of timeout that will effectively duplicate
> functionality of the counter.
>
> 4.
> Your module does not latch the value of startn. It means that caller
> module has to do it on it's behalf. That is the most likely place where
> one can make a mistake.
>
> 5.
> Coding style comment. Normally, name rst is used for asynchronous
> reset signal. In your design rst is a synchronous signal. For the
> benefit of casual reader I suggest to find different name.
>
>
Re-wrote the core using Michael's suggestion of iterations. Looks
something like the below code.
Renamed the rst signal ld to better reflect its usage.
Timing is close to the same, but not as many cores can fit. It is a
trade-off, fewer cores, but they process more iterations in one go. Net
result is more numbers processed in the same length of time.
I am using 16x16 cores now. Overflow check using 130-bit numbers.
import const_pkg::*;
module Collatz_conjecture(clk, ld, startn, done, overflow);
input clk;
input ld;
input [127:0] startn;
output reg done;
output reg overflow;
reg [129:0] n;
always_ff @(posedge clk)
if (ld)
n <= 130'(startn);
else begin
if (!done)
case(n[1:0])
2'd0: n <= n >> 2'd2;
2'd1: n <= n[129:2]+n[129:2]+n[129:2]+2'd1;
2'd2: n <= n[129:2]+n[129:2]+2'd2;
2'd3: n <= {n[129:2],3'd0}+n[129:2]+4'd8;
endcase
end
always_ff @(posedge clk)
if (ld)
done <= FALSE;
else begin
if (!done) begin
if (n < startn || (|n[129:128]))
done <= TRUE;
end
end
always_ff @(posedge clk)
if (ld)
overflow <= FALSE;
else begin
if (!done) begin
if (|n[129:128])
overflow <= TRUE;
end
end
endmodule
[toc] | [prev] | [next] | [standalone]
| From | Robert Finch <robfi680@gmail.com> |
|---|---|
| Date | 2026-02-22 19:47 -0500 |
| Message-ID | <10ng836$2j140$1@dont-email.me> |
| In reply to | #115103 |
On 2026-02-22 7:12 p.m., Robert Finch wrote:
> On 2026-02-21 1:36 p.m., Michael S wrote:
>> On Sat, 21 Feb 2026 01:56:03 -0500
>> Robert Finch <robfi680@gmail.com> wrote:
>>
>>> On 2026-02-19 5:10 a.m., Michael S wrote:
>>>> On Wed, 18 Feb 2026 22:56:19 -0500
>>>> Robert Finch <robfi680@gmail.com> wrote:
>>>>> On 2026-02-18 11:22 a.m., Michael S wrote:
>>>>>> On Sun, 15 Feb 2026 10:56:51 -0500
>>>>>> Robert Finch <robfi680@gmail.com> wrote:
>>>>>>> I have coded a version that tests up to 8192 number
>>>>>>> simultaneously. It is supposed to be able to run at over 200
>>>>>>> MHz. Some of the number would take a large number of iterations
>>>>>>> though.
>>>>>>>
>>>>>>> I found something strange, the code to count for more numbers
>>>>>>> does not increase the size of the core very much. (But it takes
>>>>>>> longer to build.) I am not sure if this occurs because there is
>>>>>>> a mitsake somewhere, or if the tools are able to figure out how
>>>>>>> to share all the logic. I am thinking the logic for adders is
>>>>>>> shared somehow. 128-bit arithmetic is being used.
>>>>>>
>>>>>> There is a mistake in your code. No other explanation is possible.
>>>>>>
>>>>>> You should look for 3 sorts of mistakes
>>>>>> 1. Core itself.
>>>>>> The most likely mistake is failing to look for one of the ending
>>>>>> conditions. You have to look for 3 conditions:
>>>>>> a) - value produced at current step is smaller than initial value.
>>>>>> That is successful ending. Here you signal to the scheduler
>>>>>> that you are ready to accept the next value.
>>>>>> b) - overflow. Next value exceeds the width of register.
>>>>>> c) - timeout. You were running for more than N steps. N=8192 looks
>>>>>> like the reasonable value.
>>>>>> a) and b) are suspected failures.
>>>>>> The handling for b) and c) is similar - you raise alert signal and
>>>>>> stop. Your test unit has to provide supervisor circuity with the
>>>>>> mean to fetch the initial value that caused suspected failure and
>>>>>> to clear alert condition.
>>>>>> After alert is cleared your unit returns to the same state as
>>>>>> after success - waiting for next initial value.
>>>>>
>>>>> I found the mistake. Core was working, but groups of cores were
>>>>> not. Done signals were missing.
>>>>>>
>>>>>> 2. Scheduler.
>>>>>> In proof-of-concept design the scheduler does not have to be very
>>>>>> realistic. For example, it does not have to be efficiently
>>>>>> pipelined. But it should be able to monitor ready signals of all
>>>>>> test units and to supply different initial data to different
>>>>>> units. Supplying the same data to all units simultaneously is a
>>>>>> sort of mistake that explains your observed behavior.
>>>>> There is really simple scheduling ATM. There is just a bunch of
>>>>> done signals that are gated together. Meaning the slowest
>>>>> calculation of the group will slow everything else down. Once all
>>>>> the dones are true the next set of numbers are loaded.
>>>>>> 3. Supervisor
>>>>>> Again, in proof-of-concept design the supervisor does not have to
>>>>>> be very realistic. But it should be able to selectively exercise
>>>>>> all core feature related to alert, including fetching of initial
>>>>>> data and clearing of alert.
>>>>>>
>>>>>> After you coded everything correctly, do not even try to compile
>>>>>> test with 8000 units. That is totally unrealistic. Even if you
>>>>>> find huge FPGA in which such number of testers could fit (and I
>>>>>> am not sure that such FPGA exists) the compilation will take many
>>>>>> hours, possibly days.
>>>>>>
>>>>>>
>>>>> Yeah 8,000 was definitely out of the picture. I had since coded
>>>>> 24x24 matrix (576) and although the logic would fit that turned
>>>>> out not to be able to route. So, it is down to 20x20. 20x20 with
>>>>> no real scheduling fits.
>>>>>
>>>>> I tried displaying progress as an on-screen bitmap, but that just
>>>>> appeared noisy. I have been working on an SCP so may be able to use
>>>>> that to manage the scheduling with some custom logic.
>>>>>
>>>>>
>>>>
>>>> Does 20x20 means 400 cores with something like 80-bit initial values
>>>> and 128-bit processing?
>>> Yes. 128-bit processing.
>>>
>>>> If you didn't make further mistakes then it should be rather big,
>>>> order of 300K LEs. It would not fit in the biggest low-end Altera
>>>> device (Cyclone 10GX 10CX220).
>>>
>>> IIRC it is about 150k LUTs. It uses the simplest approach, which is
>>> not very fast iteration wise, but it allows a lot of cores and a fast
>>> clock cycles time. One core is very small and has a fast iteration (1
>>> clock cycle). But it takes a lot more iterations than can be done
>>> with a more brainiac approach.
>>>
>>> Using a more brainiac approach would likely cut the performance in
>>> half and use a lot more LUTs.
>>>
>>
>> According to my experiments, combining 2 steps have very small negative
>> effect on achivable clock and increases area by ~1.8x. So, to me it
>> looks like a win.
>> That's the combined step I am talking about:
>> x = n/4;
>> switch (x % 4) {
>> case 0: n = x; break;
>> case 1: n = 3*x+1; break;
>> case 2: n = 2*x+2; break;
>> case 3: n = 9*x+8; break;
>> }
>>
>>
>>
>>>> On the Xilinx side, it would not fit in any Spartan and in most
>>>> Artix devices. May be, it would fit in the biggest Artix
>>>> UltraScale+ (AU25P). But more realistically for 400 test units one
>>>> would want Arria 10 on Altera side or Kintex on Xilinx side. Both
>>>> are not prohibitively expensive, but not cheap.
>>>>
>>>> What device are you using?
>>> I am not using an too inexpensive device. I have a Kintex-325 with
>>> about 200k LUTs.
>>>
>>
>> According to Xilinx marketing materials XC7K325T has 326080 equivalents
>> of traditional LEs.
>> That explains why your design fits.
>>
>> Pay attention that while Kintex evaluation boards could be
>> non-expensive, right now buying devices for actual production work is
>> rather costly, especially in small quantities.
>> The lowest price I see on digikey site is 1147 USD. That's a bad value
>> relatively to many other FPGAs.
>>
>>
>>>> If it is smaller than Kintex KU5P then I'd strongly suspect that you
>>>> didn't clean out all mistakes.
>>>>
>>>>
>>> There could be more mistakes for sure. But I am sure the lowest level
>>> core works. It seemed to in simulation. I made a slight mod to it
>>> since I tested it last, checking n < startn.
>>> Counts are not used so the logic would be stripped out, but its good
>>> for verification.
>>>
>>> 1 Core is:
>>>
>>> module Collatz_conjecture(rst, clk, startn, count, done);
>>> input rst;
>>> input clk;
>>> input [127:0] startn;
>>> output reg [127:0] count;
>>> output reg done;
>>> reg [127:0] n;
>>>
>>> always_ff @(posedge clk)
>>> if (rst) begin
>>> n <= startn;
>>> count <= 0;
>>> done <= FALSE;
>>> end
>>> else begin
>>> if (!done) begin
>>> if (~n[0])
>>> n <= n >> 2'd1;
>>> else
>>> n <= n+n+n+1;
>>> if (n < startn)
>>> done <= TRUE;
>>> count <= count + 1;
>>> end
>>> end
>>>
>>> endmodule
>>>
>>
>>
>> Ok.
>> 1. Your core misses one critical part - overflow check.
>> Assuming 80-bit initial count, 128-bit register will overflow
>> eventually. It would happen very rarely, but it would happen.
>> The check for overflow does not have to be precise. Conservative
>> approach, like looking at 2 MS bits and one LS bit and signaling
>> overflow not when it really happened but when it *could* happen, i.e.
>> when MS bits > 0 and LS bit='1' is good enough. But the check can't be
>> omitted.
>>
>> 2. I see that instead of calculating n = (n*3+1)/2 as one step you are
>> doing it in two separate steps. I am not sure that it is really makes
>> your core smaller or faster. And it certainly increases # of iterations
>> by 1/3rd.
>>
>> 3.
>> 128-bit count is pointless loss of resources. 12 or 13 bit are
>> sufficient. If the count exceeded that size then you already found
>> something of high interest and it is a job of supervisor to find out
>> what exactly had happened.
>> 128-bit counter actually not just too big, it is also pointless,
>> because if we pretend to believe that our purpose is to disprove
>> Collatz's conjecture then supervisor would have to implement stop
>> condition with some sort of timeout that will effectively duplicate
>> functionality of the counter.
>>
>> 4.
>> Your module does not latch the value of startn. It means that caller
>> module has to do it on it's behalf. That is the most likely place where
>> one can make a mistake.
>>
>> 5.
>> Coding style comment. Normally, name rst is used for asynchronous
>> reset signal. In your design rst is a synchronous signal. For the
>> benefit of casual reader I suggest to find different name.
>>
>>
> Re-wrote the core using Michael's suggestion of iterations. Looks
> something like the below code.
> Renamed the rst signal ld to better reflect its usage.
> Timing is close to the same, but not as many cores can fit. It is a
> trade-off, fewer cores, but they process more iterations in one go. Net
> result is more numbers processed in the same length of time.
> I am using 16x16 cores now. Overflow check using 130-bit numbers.
>
> import const_pkg::*;
>
> module Collatz_conjecture(clk, ld, startn, done, overflow);
> input clk;
> input ld;
> input [127:0] startn;
> output reg done;
> output reg overflow;
>
> reg [129:0] n;
>
> always_ff @(posedge clk)
> if (ld)
> n <= 130'(startn);
> else begin
> if (!done)
> case(n[1:0])
> 2'd0: n <= n >> 2'd2;
> 2'd1: n <= n[129:2]+n[129:2]+n[129:2]+2'd1;
> 2'd2: n <= n[129:2]+n[129:2]+2'd2;
> 2'd3: n <= {n[129:2],3'd0}+n[129:2]+4'd8;
> endcase
> end
>
> always_ff @(posedge clk)
> if (ld)
> done <= FALSE;
> else begin
> if (!done) begin
> if (n < startn || (|n[129:128]))
> done <= TRUE;
> end
> end
>
> always_ff @(posedge clk)
> if (ld)
> overflow <= FALSE;
> else begin
> if (!done) begin
> if (|n[129:128])
> overflow <= TRUE;
> end
> end
>
> endmodule
>
>
Just noticed overflow needs a couple more extra bits since multiplying by 9.
[toc] | [prev] | [next] | [standalone]
| From | EricP <ThatWouldBeTelling@thevillage.com> |
|---|---|
| Date | 2026-02-21 15:39 -0500 |
| Message-ID | <NKomR.158378$N9Eb.26408@fx46.iad> |
| In reply to | #115066 |
Robert Finch wrote:
> On 2026-02-19 5:10 a.m., Michael S wrote:
>
>> If it is smaller than Kintex KU5P then I'd strongly suspect that you
>> didn't clean out all mistakes.
>>
>>
>>
> There could be more mistakes for sure. But I am sure the lowest level
> core works. It seemed to in simulation. I made a slight mod to it since
> I tested it last, checking n < startn.
> Counts are not used so the logic would be stripped out, but its good for
> verification.
>
> 1 Core is:
>
> module Collatz_conjecture(rst, clk, startn, count, done);
> input rst;
> input clk;
> input [127:0] startn;
> output reg [127:0] count;
> output reg done;
> reg [127:0] n;
>
> always_ff @(posedge clk)
> if (rst) begin
> n <= startn;
> count <= 0;
> done <= FALSE;
> end
> else begin
> if (!done) begin
> if (~n[0])
> n <= n >> 2'd1;
> else
> n <= n+n+n+1;
> if (n < startn)
> done <= TRUE;
> count <= count + 1;
> end
> end
>
> endmodule
I have a minimal knowledge of Verilog and may be wrong but...
- it looks to me in the code as written that startn, count and done args
are coded either input or output, but are actually both read and written.
The compiler should have given an error as that implies multiple drivers
for those wires (the caller module and the assignments).
Possibly it didn't detect an error because it defaults to data type
"logic" which has 4 states, one of which is tri-state 'z'.
Maybe setting the data types to 'bit' would allow it to detect the error.
Changing them to "inout" would turn them all into tri-state buses which
is probably not what you want.
Possibly have separate args for input and output:
input bit[127:0] iCount;
output bit]127:0] oCount.
- I don't know how smart your compiler is but the >> shift operator can
be replaced by a bit slice operation that only routes wires:
if (~n[0])
n <= {1'b0, n[127:1]};
- it can eliminate a 128 bit adder by shifting the wires left:
else
n <= {n[126:0], 1'b0} + n + 1;
[toc] | [prev] | [next] | [standalone]
| From | Michael S <already5chosen@yahoo.com> |
|---|---|
| Date | 2026-02-22 03:16 +0200 |
| Message-ID | <20260222031658.00002f1c@yahoo.com> |
| In reply to | #115077 |
On Sat, 21 Feb 2026 15:39:42 -0500
EricP <ThatWouldBeTelling@thevillage.com> wrote:
> Robert Finch wrote:
> > On 2026-02-19 5:10 a.m., Michael S wrote:
> >
> >> If it is smaller than Kintex KU5P then I'd strongly suspect that
> >> you didn't clean out all mistakes.
> >>
> >>
> >>
> > There could be more mistakes for sure. But I am sure the lowest
> > level core works. It seemed to in simulation. I made a slight mod
> > to it since I tested it last, checking n < startn.
> > Counts are not used so the logic would be stripped out, but its
> > good for verification.
> >
> > 1 Core is:
> >
> > module Collatz_conjecture(rst, clk, startn, count, done);
> > input rst;
> > input clk;
> > input [127:0] startn;
> > output reg [127:0] count;
> > output reg done;
> > reg [127:0] n;
> >
> > always_ff @(posedge clk)
> > if (rst) begin
> > n <= startn;
> > count <= 0;
> > done <= FALSE;
> > end
> > else begin
> > if (!done) begin
> > if (~n[0])
> > n <= n >> 2'd1;
> > else
> > n <= n+n+n+1;
> > if (n < startn)
> > done <= TRUE;
> > count <= count + 1;
> > end
> > end
> >
> > endmodule
>
> I have a minimal knowledge of Verilog and may be wrong but...
I am not a Verilog expert either (use VHDL exclusively), but to me the
code of Robert Finch looks o.k from the HDL perspective (functionality
is something else, I addressed it in my replay below).
His Verilog appears to be an exact equivalent of VHDL code below,
which is perfectly legal synthesizable code under VHDL-2008 rules.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity Collatz_conjecture is
port (
rst : in boolean;
clk : in std_logic;
startn : in unsigned(127 downto 0);
count : out unsigned(127 downto 0);
done : out boolean
);
end entity Collatz_conjecture;
architecture a of Collatz_conjecture is
signal n : unsigned(127 downto 0);
begin
process (clk)
begin
if rising_edge(clk) then
if rst then
n <= startn;
count <= (others => '0');
done <= false;
elsif not done then
if n(0)='0' then
n <= '0' & n(n'high downto 1);
else
n <= n+n+n+1;
end if;
if n < startn then
done <= true;
end if;
count <= count + 1;
end if;
end if;
end process;
end architecture a;
[toc] | [prev] | [next] | [standalone]
| From | EricP <ThatWouldBeTelling@thevillage.com> |
|---|---|
| Date | 2026-02-22 14:48 -0500 |
| Message-ID | <65JmR.56061$nJQ7.5512@fx34.iad> |
| In reply to | #115082 |
Michael S wrote: > On Sat, 21 Feb 2026 15:39:42 -0500 > EricP <ThatWouldBeTelling@thevillage.com> wrote: > >> Robert Finch wrote: >>> On 2026-02-19 5:10 a.m., Michael S wrote: >>> >>>> If it is smaller than Kintex KU5P then I'd strongly suspect that >>>> you didn't clean out all mistakes. >>>> >>>> >>>> >>> There could be more mistakes for sure. But I am sure the lowest >>> level core works. It seemed to in simulation. I made a slight mod >>> to it since I tested it last, checking n < startn. >>> Counts are not used so the logic would be stripped out, but its >>> good for verification. >>> >>> 1 Core is: >>> >>> module Collatz_conjecture(rst, clk, startn, count, done); >>> input rst; >>> input clk; >>> input [127:0] startn; >>> output reg [127:0] count; >>> output reg done; >>> reg [127:0] n; >>> >>> always_ff @(posedge clk) >>> if (rst) begin >>> n <= startn; >>> count <= 0; >>> done <= FALSE; >>> end >>> else begin >>> if (!done) begin >>> if (~n[0]) >>> n <= n >> 2'd1; >>> else >>> n <= n+n+n+1; >>> if (n < startn) >>> done <= TRUE; >>> count <= count + 1; >>> end >>> end >>> >>> endmodule >> I have a minimal knowledge of Verilog and may be wrong but... > > I am not a Verilog expert either (use VHDL exclusively), but to me the > code of Robert Finch looks o.k from the HDL perspective (functionality > is something else, I addressed it in my replay below). > His Verilog appears to be an exact equivalent of VHDL code below, > which is perfectly legal synthesizable code under VHDL-2008 rules. > > > library ieee; > use ieee.std_logic_1164.all; > use ieee.numeric_std.all; > > entity Collatz_conjecture is > port ( > rst : in boolean; > clk : in std_logic; > startn : in unsigned(127 downto 0); > count : out unsigned(127 downto 0); > done : out boolean > ); > end entity Collatz_conjecture; > > architecture a of Collatz_conjecture is > signal n : unsigned(127 downto 0); > begin > process (clk) > begin > if rising_edge(clk) then > if rst then > n <= startn; > count <= (others => '0'); > done <= false; > elsif not done then > if n(0)='0' then > n <= '0' & n(n'high downto 1); > else > n <= n+n+n+1; > end if; > > if n < startn then > done <= true; > end if; > > count <= count + 1; > end if; > end if; > end process; > end architecture a; > > Well then I just don't get this at all. I have a VHDL textbook that explicitly states that IN ports are input only, OUT ports are output only, and INOUT are input and output (tri-state). The language spec SystemVerilog IEEE_Std-1800-2017 (it's a horrid language - the spec is 1315 pages long, and so complex that no compiler implements all of it, and some compilers still use the pre-2005 language spec) says in section "23.3.3.2 Port connection rules for variables" that "Assignments to variables declared as input ports shall be illegal." "Procedural or continuous assignments to a variable connected to the output port of an instance shall be illegal." but it also says "An output port can be connected to a net (or a concatenation) of a compatible data type. In this case, multiple drivers shall be permitted on the net" and in "23.3.3.1 Port coercion A port that is declared as input (output) but used as an output (input) or inout may be coerced to inout. If not coerced to inout, a warning shall be issued." but never defines what "coerced" means or how to do it. Verilog-2017 also has the "uwire" data type which is a uni-driver type that only allows a single driver for the wire. (But some compilers don't support uwire) "6.6.2 Unresolved nets The uwire net is an unresolved or unidriver wire and is used to model nets that allow only a single driver. The uwire type can be used to enforce this restriction. It shall be an error to connect any bit of a uwire net to more than one driver. It shall be an error to connect a uwire net to a bidirectional terminal of a bidirectional pass switch. The port connection rule in 23.3.3.6 enforces this restriction across the net hierarchy or shall issue a warning if not." "23.3.3.6 Single source nets (uwire) If the net on either side of a port has the net type uwire, a warning shall be issued if the nets are not merged into a single net, as described in 23.3.3.7." and spec contradicts itself as to whether it is an error or warning.
[toc] | [prev] | [next] | [standalone]
| From | Michael S <already5chosen@yahoo.com> |
|---|---|
| Date | 2026-02-22 23:25 +0200 |
| Message-ID | <20260222232547.00003fff@yahoo.com> |
| In reply to | #115098 |
On Sun, 22 Feb 2026 14:48:13 -0500 EricP <ThatWouldBeTelling@thevillage.com> wrote: > Michael S wrote: > > On Sat, 21 Feb 2026 15:39:42 -0500 > > EricP <ThatWouldBeTelling@thevillage.com> wrote: > > > >> Robert Finch wrote: > >>> On 2026-02-19 5:10 a.m., Michael S wrote: > >>> > >>>> If it is smaller than Kintex KU5P then I'd strongly suspect that > >>>> you didn't clean out all mistakes. > >>>> > >>>> > >>>> > >>> There could be more mistakes for sure. But I am sure the lowest > >>> level core works. It seemed to in simulation. I made a slight mod > >>> to it since I tested it last, checking n < startn. > >>> Counts are not used so the logic would be stripped out, but its > >>> good for verification. > >>> > >>> 1 Core is: > >>> > >>> module Collatz_conjecture(rst, clk, startn, count, done); > >>> input rst; > >>> input clk; > >>> input [127:0] startn; > >>> output reg [127:0] count; > >>> output reg done; > >>> reg [127:0] n; > >>> > >>> always_ff @(posedge clk) > >>> if (rst) begin > >>> n <= startn; > >>> count <= 0; > >>> done <= FALSE; > >>> end > >>> else begin > >>> if (!done) begin > >>> if (~n[0]) > >>> n <= n >> 2'd1; > >>> else > >>> n <= n+n+n+1; > >>> if (n < startn) > >>> done <= TRUE; > >>> count <= count + 1; > >>> end > >>> end > >>> > >>> endmodule > >> I have a minimal knowledge of Verilog and may be wrong but... > > > > I am not a Verilog expert either (use VHDL exclusively), but to me > > the code of Robert Finch looks o.k from the HDL perspective > > (functionality is something else, I addressed it in my replay > > below). His Verilog appears to be an exact equivalent of VHDL code > > below, which is perfectly legal synthesizable code under VHDL-2008 > > rules. > > > > > > library ieee; > > use ieee.std_logic_1164.all; > > use ieee.numeric_std.all; > > > > entity Collatz_conjecture is > > port ( > > rst : in boolean; > > clk : in std_logic; > > startn : in unsigned(127 downto 0); > > count : out unsigned(127 downto 0); > > done : out boolean > > ); > > end entity Collatz_conjecture; > > > > architecture a of Collatz_conjecture is > > signal n : unsigned(127 downto 0); > > begin > > process (clk) > > begin > > if rising_edge(clk) then > > if rst then > > n <= startn; > > count <= (others => '0'); > > done <= false; > > elsif not done then > > if n(0)='0' then > > n <= '0' & n(n'high downto 1); > > else > > n <= n+n+n+1; > > end if; > > > > if n < startn then > > done <= true; > > end if; > > > > count <= count + 1; > > end if; > > end if; > > end process; > > end architecture a; > > > > > > Well then I just don't get this at all. > I have a VHDL textbook that explicitly states that IN ports are input > only, OUT ports are output only, and INOUT are input and output > (tri-state). > Sounds like old textbook, not updated with VHDL-2008 changes related to what is legal for 'out' ports. See, for example, here: https://docs.amd.com/r/en-US/ug901-vivado-synthesis/Reading-Output-Ports
[toc] | [prev] | [next] | [standalone]
| From | Terje Mathisen <terje.mathisen@tmsw.no> |
|---|---|
| Date | 2026-02-19 11:51 +0100 |
| Message-ID | <10n6pup$3efqs$1@dont-email.me> |
| In reply to | #115038 |
Robert Finch wrote: > On 2026-02-18 11:22 a.m., Michael S wrote: >> On Sun, 15 Feb 2026 10:56:51 -0500 >> Robert Finch <robfi680@gmail.com> wrote: >> >>> I have coded a version that tests up to 8192 number simultaneously. >>> It is supposed to be able to run at over 200 MHz. Some of the number >>> would take a large number of iterations though. >>> >>> I found something strange, the code to count for more numbers does >>> not increase the size of the core very much. (But it takes longer to >>> build.) I am not sure if this occurs because there is a mitsake >>> somewhere, or if the tools are able to figure out how to share all >>> the logic. I am thinking the logic for adders is shared somehow. >>> 128-bit arithmetic is being used. >>> >> >> There is a mistake in your code. No other explanation is possible. >> >> You should look for 3 sorts of mistakes >> 1. Core itself. >> The most likely mistake is failing to look for one of the ending >> conditions. You have to look for 3 conditions: >> a) - value produced at current step is smaller than initial value. >> That is successful ending. Here you signal to the scheduler that you >> are ready to accept the next value. >> b) - overflow. Next value exceeds the width of register. >> c) - timeout. You were running for more than N steps. N=8192 looks >> like the reasonable value. >> a) and b) are suspected failures. >> The handling for b) and c) is similar - you raise alert signal and stop. >> Your test unit has to provide supervisor circuity with the mean to >> fetch the initial value that caused suspected failure and to clear >> alert condition. >> After alert is cleared your unit returns to the same state as after >> success - waiting for next initial value. > > I found the mistake. Core was working, but groups of cores were not. > Done signals were missing. > >> >> 2. Scheduler. >> In proof-of-concept design the scheduler does not have to be very >> realistic. For example, it does not have to be efficiently pipelined. >> But it should be able to monitor ready signals of all test units and to >> supply different initial data to different units. >> Supplying the same data to all units simultaneously is a sort of >> mistake that explains your observed behavior. >> > There is really simple scheduling ATM. There is just a bunch of done > signals that are gated together. Meaning the slowest calculation of the > group will slow everything else down. Once all the dones are true the > next set of numbers are loaded. The obvious choice is to have a work-to-do list, each work item containing a start-end set of numbers to test. Each worker thread would simply finish its current task, store the/any results, and then ask for the next available task. A single producer - multiple consumers queue like this is very simple to get right with minimum admins: Just a single LOCK XADD to update the queue back index, grabbing the corresponding task. I used this exact setup for my wire speed multi-threaded ntpd deamon about two decades ago. (I did need a few more XADD instructions to handle queue full/empty situations which are not applicable for the current problem.) Terje -- - <Terje.Mathisen at tmsw.no> "almost all programming can be viewed as an exercise in caching"
[toc] | [prev] | [next] | [standalone]
| From | Michael S <already5chosen@yahoo.com> |
|---|---|
| Date | 2026-02-19 23:12 +0200 |
| Message-ID | <20260219231241.0000718c@yahoo.com> |
| In reply to | #115042 |
On Thu, 19 Feb 2026 11:51:03 +0100 Terje Mathisen <terje.mathisen@tmsw.no> wrote: > Robert Finch wrote: > > On 2026-02-18 11:22 a.m., Michael S wrote: > >> On Sun, 15 Feb 2026 10:56:51 -0500 > >> Robert Finch <robfi680@gmail.com> wrote: > >> > >>> I have coded a version that tests up to 8192 number > >>> simultaneously. It is supposed to be able to run at over 200 MHz. > >>> Some of the number would take a large number of iterations though. > >>> > >>> I found something strange, the code to count for more numbers does > >>> not increase the size of the core very much. (But it takes longer > >>> to build.) I am not sure if this occurs because there is a mitsake > >>> somewhere, or if the tools are able to figure out how to share all > >>> the logic. I am thinking the logic for adders is shared somehow. > >>> 128-bit arithmetic is being used. > >>> > >> > >> There is a mistake in your code. No other explanation is possible. > >> > >> You should look for 3 sorts of mistakes > >> 1. Core itself. > >> The most likely mistake is failing to look for one of the ending > >> conditions. You have to look for 3 conditions: > >> a) - value produced at current step is smaller than initial value. > >> That is successful ending. Here you signal to the scheduler that > >> you are ready to accept the next value. > >> b) - overflow. Next value exceeds the width of register. > >> c) - timeout. You were running for more than N steps. N=8192 looks > >> like the reasonable value. > >> a) and b) are suspected failures. > >> The handling for b) and c) is similar - you raise alert signal and > >> stop. Your test unit has to provide supervisor circuity with the > >> mean to fetch the initial value that caused suspected failure and > >> to clear alert condition. > >> After alert is cleared your unit returns to the same state as after > >> success - waiting for next initial value. > > > > I found the mistake. Core was working, but groups of cores were > > not. Done signals were missing. > > > >> > >> 2. Scheduler. > >> In proof-of-concept design the scheduler does not have to be very > >> realistic. For example, it does not have to be efficiently > >> pipelined. But it should be able to monitor ready signals of all > >> test units and to supply different initial data to different units. > >> Supplying the same data to all units simultaneously is a sort of > >> mistake that explains your observed behavior. > >> > > There is really simple scheduling ATM. There is just a bunch of > > done signals that are gated together. Meaning the slowest > > calculation of the group will slow everything else down. Once all > > the dones are true the next set of numbers are loaded. > > The obvious choice is to have a work-to-do list, each work item > containing a start-end set of numbers to test. > > Each worker thread would simply finish its current task, store > the/any results, and then ask for the next available task. > > A single producer - multiple consumers queue like this is very simple > to get right with minimum admins: Just a single LOCK XADD to update > the queue back index, grabbing the corresponding task. > There is no XADD in FPGA, just gate, multipliers and memory blocks. > I used this exact setup for my wire speed multi-threaded ntpd deamon > about two decades ago. (I did need a few more XADD instructions to > handle queue full/empty situations which are not applicable for the > current problem.) > > Terje > Did your ntpd deamon scale to 300-400 threads? 300 is the # of test units that fits in FPGAs that appears to be the most cost-effective for this sort of job, e.g. 10CX220YU484E5G.
[toc] | [prev] | [next] | [standalone]
| From | Terje Mathisen <terje.mathisen@tmsw.no> |
|---|---|
| Date | 2026-02-20 15:11 +0100 |
| Message-ID | <10n9q2f$eof0$1@dont-email.me> |
| In reply to | #115050 |
Michael S wrote: > On Thu, 19 Feb 2026 11:51:03 +0100 > Terje Mathisen <terje.mathisen@tmsw.no> wrote: > >> Robert Finch wrote: >>> On 2026-02-18 11:22 a.m., Michael S wrote: >>>> On Sun, 15 Feb 2026 10:56:51 -0500 >>>> Robert Finch <robfi680@gmail.com> wrote: >>>> >>>>> I have coded a version that tests up to 8192 number >>>>> simultaneously. It is supposed to be able to run at over 200 MHz. >>>>> Some of the number would take a large number of iterations though. >>>>> >>>>> I found something strange, the code to count for more numbers does >>>>> not increase the size of the core very much. (But it takes longer >>>>> to build.) I am not sure if this occurs because there is a mitsake >>>>> somewhere, or if the tools are able to figure out how to share all >>>>> the logic. I am thinking the logic for adders is shared somehow. >>>>> 128-bit arithmetic is being used. >>>>> >>>> >>>> There is a mistake in your code. No other explanation is possible. >>>> >>>> You should look for 3 sorts of mistakes >>>> 1. Core itself. >>>> The most likely mistake is failing to look for one of the ending >>>> conditions. You have to look for 3 conditions: >>>> a) - value produced at current step is smaller than initial value. >>>> That is successful ending. Here you signal to the scheduler that >>>> you are ready to accept the next value. >>>> b) - overflow. Next value exceeds the width of register. >>>> c) - timeout. You were running for more than N steps. N=8192 looks >>>> like the reasonable value. >>>> a) and b) are suspected failures. >>>> The handling for b) and c) is similar - you raise alert signal and >>>> stop. Your test unit has to provide supervisor circuity with the >>>> mean to fetch the initial value that caused suspected failure and >>>> to clear alert condition. >>>> After alert is cleared your unit returns to the same state as after >>>> success - waiting for next initial value. >>> >>> I found the mistake. Core was working, but groups of cores were >>> not. Done signals were missing. >>> >>>> >>>> 2. Scheduler. >>>> In proof-of-concept design the scheduler does not have to be very >>>> realistic. For example, it does not have to be efficiently >>>> pipelined. But it should be able to monitor ready signals of all >>>> test units and to supply different initial data to different units. >>>> Supplying the same data to all units simultaneously is a sort of >>>> mistake that explains your observed behavior. >>>> >>> There is really simple scheduling ATM. There is just a bunch of >>> done signals that are gated together. Meaning the slowest >>> calculation of the group will slow everything else down. Once all >>> the dones are true the next set of numbers are loaded. >> >> The obvious choice is to have a work-to-do list, each work item >> containing a start-end set of numbers to test. >> >> Each worker thread would simply finish its current task, store >> the/any results, and then ask for the next available task. >> >> A single producer - multiple consumers queue like this is very simple >> to get right with minimum admins: Just a single LOCK XADD to update >> the queue back index, grabbing the corresponding task. >> > > There is no XADD in FPGA, just gate, multipliers and memory blocks. Oops! :-) > >> I used this exact setup for my wire speed multi-threaded ntpd deamon >> about two decades ago. (I did need a few more XADD instructions to >> handle queue full/empty situations which are not applicable for the >> current problem.) >> > > Did your ntpd deamon scale to 300-400 threads? > 300 is the # of test units that fits in FPGAs that appears to be the > most cost-effective for this sort of job, e.g. 10CX220YU484E5G. Scaling does not depend on the number of threads, only on the frequency of pulls from the queue: My code was very latency sensitive so I could never batch up multiple requests, while this particular "Gedankenexperiment" can very easily use arbitrarily large work lists, right? Terje -- - <Terje.Mathisen at tmsw.no> "almost all programming can be viewed as an exercise in caching"
[toc] | [prev] | [next] | [standalone]
| From | Stefan Monnier <monnier@iro.umontreal.ca> |
|---|---|
| Date | 2026-02-19 15:27 -0500 |
| Message-ID | <jwvwm08a3e4.fsf-monnier+comp.arch@gnu.org> |
| In reply to | #115038 |
I must say I find this whole thread baffling: when searching for a counter example and we're already in the order of 2^70, it seems clear that constant-factor improvements are completely irrelevant (even with large constants). I guess it all boils down to the "useless" part of the title? === Stefan
[toc] | [prev] | [next] | [standalone]
| From | Michael S <already5chosen@yahoo.com> |
|---|---|
| Date | 2026-02-20 11:41 +0200 |
| Message-ID | <20260220114153.0000416f@yahoo.com> |
| In reply to | #115049 |
On Thu, 19 Feb 2026 15:27:07 -0500 Stefan Monnier <monnier@iro.umontreal.ca> wrote: > I must say I find this whole thread baffling: when searching for > a counter example and we're already in the order of 2^70, it seems > clear that constant-factor improvements are completely irrelevant > (even with large constants). > > I guess it all boils down to the "useless" part of the title? > > > === Stefan As I wrote in earlier post, "useless" sounds too optimistic. "Useless" in my book is proving or disproving 99% of conjectures known in the numbers theory, both obscure and extremely famous (Fermat's Last Theorem). And then 1% process ends up very useful. But Collatz's conjecture believed to be true. Trying to disprove it by experiments is futile. Finding that all sequence with initial value in range [2**71:2**72] converge to 1, which is by far the most likely outcome of experiment, leaves us in exactly the same place where we started, i.e. conjecture believed to be true, but unproven. That's not good enough to warrant a title of "useless". "Pointless" sounds more proper, but may be we shall find something more derogatory. However, it's fun. My current estimate, after considering how "smart" filtering by modulo 2**20 can be made practical, is that a single FPGA that costs 300 USD would be able to test 2**71 values in ~500 years. Which means that the company I am working for can be convinced to build gear capable of doing calculations in 1 year for something like 1M USD or a little more. I'd guess that smaller and hungrier engineering firms would accept the job for 1/3rd of that sum. So, who has a pocket money to spend?
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2026-02-22 18:41 -0600 |
| Message-ID | <10ng813$2im23$1@dont-email.me> |
| In reply to | #115049 |
On 2/19/2026 2:27 PM, Stefan Monnier wrote:
> I must say I find this whole thread baffling: when searching for
> a counter example and we're already in the order of 2^70, it seems clear
> that constant-factor improvements are completely irrelevant (even with
> large constants).
>
> I guess it all boils down to the "useless" part of the title?
>
Yeah, basically.
It is useless in the sense that no matter how big you go, an exception
wont be found, so rendering further testing essentially meaningless
(doesn't prove anything unless an exception is found, exception wont be
found if none can exist).
I did feel tempted to write a program to draw a graph...
Well, then did so, and posted a partial graph on X:
https://x.com/cr88192/status/2025698159519817849
Though, a linear/log mismatch makes it look wonky, and got annoyed
fiddling with trying to make the code to draw the graph not make it look
like a mess.
It is like, now realizing how it works and the fairly obvious nature of
it being true, but also realizing that I am also not in a position to be
able to prove it as such, is making me feel stupid.
But, alas, there are reasons I am not really a math of physics person...
Well, at least excluding the guys who go around promoting wacky/crazy
ideas like recursive compression and perpetual motion machines and
similar (or, all the "quantum consciousness" stuff and similar),
wouldn't really want to be associated with this crowd either...
Actually, this latter camp I don't get:
If you can easily test things and see that they don't work, what is the
point of continuing to promote them as-if they do?...
Or, like the whole fusion thing:
Can someone make it happen? Yes.
Can one get net positive energy output? Not so much.
Or, the seeming futility of trying to send humans on interstellar trips.
Basically, no way to send full sized humans interstellar within any
plausible timescale or resource/energy budget. That said, not entirely
convinced it is outside the reach of machines (so, my sci-fi stuff
wouldn't involve sending humans, rather machines with enough to try to
bootstrap the process).
Ship can get to target system, find some asteroids or similar to begin
consuming, building itself up into a more significant structure, then
begins building other machinery it needs; building whatever it needs to
initiate the colonization process. Even sending a full-set of organic
samples is a bit of a stretch, so more likely it would preferentially
bootstrap high priority species from actual cell lines, but may need
backup strategies (recreating the whole biosphere from yeast or similar
if need-be). In some sense, the main "payload" not likely being lots of
organic matter, but enough dense information storage to be able to
initiate the bootstrap process (and a machine just smart enough to be
able to figure out how to find suitable asteroids and start the process).
Galactic civilizations? No. Any such colonies being unlikely to realize
the others exist, much less have any real way to communicate with them
(light-speed limits effectively making any sort of large scale
"civilization" all but impossible).
Etc.
Well, except in cases where I am feeling half tempted to do stories that
just throw realism entirely out the window, "yeah, errm, how about we
have wooden pirate ships, in space, fighting giant robots?..." maybe
take the "solar wind" part a little too literally: have air in space and
the ships just sorta tack-sailing on the wind ("moment of inertia be
damned", things still move like they were floating on water, but in
space). Stuff basically works like in 1980s cartoons (like, you know,
the ones that mostly existed as glorified toy commercials). Then the
worlds' physics are based less on realism and more a mix of "rule of
cool" and "rule of funny".
Say, also:
Ships battle off giant robots by shooting at them with old style canons;
People can expand to 50ft tall to fist-fight said robots;
Rocket flatulence (can fly around by essentially farting fire);
Trinket based costume changes;
Giant robots vs dinosaurs (maybe also in space, dinosaurs also having
rocket flatulence, and fire breath, ...);
Obvious plot armor;
People can escape restraints by burning them off with eye lasers;
...
But, say:
Characters are not just going around like Superman or something (because
then there are no stakes), and thus only able to use their
higher-powered abilities when the plot demands it.
Did recently throw a fragment of this sort of idea at Grok Imagine, but
while it didn't give exactly what I was imagining, it was in the same
general area:
https://x.com/cr88192/status/2025021520507011292
But, alas in this case the video generation only gives a few seconds
(for much more than this, would cost money...).
Because, alas, reality is just kinda depressing sometimes...
Well, except a while back, when I did write a story set in a sort of
reverse-urban-fantasy setting:
As opposed to the typical "fantasy stuff in an urban setting";
More "stereotypical urban stuff in a fantasy setting".
So, more rap-battling gangsta elves and stuff...
Then again, "pirate ships in space" wouldn't exactly be too far out of
place in a fantasy world. Could pass it off as "sci-fi" in a loose sense
in that while it is fairly obviously a fantasy setting, many of the
characters are not themselves aware that there is anything unusual with
them living within a fantasy-world magic system. But, then may as well
borrow ideas from "Rainbow Brite" or something, and have the ships
powered by "Star Sprinkles" or such (well, and puts a limit on how much
characters can do: The more powerful magic they use, the more rapidly
they use up their supply of star sprinkles...).
...
But, Alas...
[toc] | [prev] | [next] | [standalone]
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2026-02-23 11:10 +0000 |
| Message-ID | <10nhci9$2t9mc$1@dont-email.me> |
| In reply to | #115104 |
BGB <cr88192@gmail.com> schrieb: > On 2/19/2026 2:27 PM, Stefan Monnier wrote: >> I must say I find this whole thread baffling: when searching for >> a counter example and we're already in the order of 2^70, it seems clear >> that constant-factor improvements are completely irrelevant (even with >> large constants). >> >> I guess it all boils down to the "useless" part of the title? >> > > Yeah, basically. > > It is useless in the sense that no matter how big you go, an exception > wont be found, so rendering further testing essentially meaningless > (doesn't prove anything unless an exception is found, exception wont be > found if none can exist). You are assuming that the Collatz conjecture is true. But this is unknown, and the whole point of the excercise. Do I personally think that exchaustive search will find something? No. Does my personal belief matter in the slightest? No. [...] > Or, the seeming futility of trying to send humans on interstellar trips. > Basically, no way to send full sized humans interstellar within any > plausible timescale or resource/energy budget. My favorite candiate for that is the nuclear salt water rocket. Which will be difficult to build, or so my son (who just received his master's degree in nuclear engineering) tells me because it needs to be prompt critical. -- This USENET posting was made without artificial intelligence, artificial impertinence, artificial arrogance, artificial stupidity, artificial flavorings or artificial colorants.
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2026-02-23 17:24 -0600 |
| Message-ID | <10ninsn$3e0da$1@dont-email.me> |
| In reply to | #115112 |
On 2/23/2026 5:10 AM, Thomas Koenig wrote: > BGB <cr88192@gmail.com> schrieb: >> On 2/19/2026 2:27 PM, Stefan Monnier wrote: >>> I must say I find this whole thread baffling: when searching for >>> a counter example and we're already in the order of 2^70, it seems clear >>> that constant-factor improvements are completely irrelevant (even with >>> large constants). >>> >>> I guess it all boils down to the "useless" part of the title? >>> >> >> Yeah, basically. >> >> It is useless in the sense that no matter how big you go, an exception >> wont be found, so rendering further testing essentially meaningless >> (doesn't prove anything unless an exception is found, exception wont be >> found if none can exist). > > You are assuming that the Collatz conjecture is true. But this > is unknown, and the whole point of the excercise. > > Do I personally think that exchaustive search will find something? No. > > Does my personal belief matter in the slightest? No. > My own looking into this led me to realize that the number-space is modulo (relative to a cone with a spiral that repeats with every power-of-2), and that all the odd steps only ever increment the phase angle by 1, or fall down to a lower level (whenever the phase angle is not incremented). In effect, each odd step jumps to the opposite side of the cone, but slightly increments the angle. Since the phase-angle path can never move backwards, it can't touch any previous paths, thus, no loops can happen this way. The only way a cycle could exist is if (somehow) there were an impassable wall where pretty much *no* value could descend unless it hits the power-of-2 descent path, and then somehow miss this path as well. But, since "3*n+1" results in the minimum forward step, this means that one of the sides *has* to hit the power-of-2 descent cases at some point. Even if you did get very unlucky and hit odd-paths at nearly every step, it would eventually hit the power-of-2 escape case which is basically an immediate descent down to 1. Or, as I see it, the geometry of the number space itself prevents the possibility of a loop. Like, while I can't write a proof for it, I can still see that it is true in a geometric/mechanistic sense. Like, my confidence that it is true isn't so much because of a "no exception has been found, so I believe it is true" sense, but more because I can see the geometry of the number space well enough to see that no non-descending paths can exist through the space. > [...] > >> Or, the seeming futility of trying to send humans on interstellar trips. >> Basically, no way to send full sized humans interstellar within any >> plausible timescale or resource/energy budget. > > My favorite candiate for that is the nuclear salt water rocket. > Which will be difficult to build, or so my son (who just received > his master's degree in nuclear engineering) tells me because > it needs to be prompt critical. > Yeah, I have floated a few ideas... One I used in my own sci-fi was using cyclotrons and long linac (linear accelerator) stage to try to push atomic nuclei to relativistic speeds (and generate thrust mostly by the energy required to accelerating the nuclei). Comparably, thrust would be modest, but the amount of propellant used could be very small. Harder part is powering the thing... Main candidates are nuclear and fusion power. With fusion, one could separate out the He nuclei and use these. With nuclear, CO2 makes more sense as a working fluid and propellant (after first decomposing it into C and O nuclei). While fusion would be the preferable option, nuclear would be easier to build in practice (probably a molten-salt breeder reactor with CO2 as the working fluid). No light water reactors though, these would just suck... For high-speed travel, would likely need to have a lot of ablative shielding. My current leaning here basically being to put a very big chunk of graphite on the front of the ship. Could use magnetic deflection, but this is likely to result in more energy cost and drag than essentially using a big chunk of graphite and maybe some permanent magnets (to slightly reduce ablation rate). Likely if using permanent magnets, there would be an iron structure inside the big chunk of graphite. Say, this to deal with the front of the ship being hit with a stream of hydrogen atoms potentially traveling at relativistic speeds, etc. Would likely be pointy (sort of like the tip of a pencil) to reduce drag. It is possibly that the iron structure (inside the graphite) could in-part be magnetized using part of the field strength of the magnets in the cyclotrons (say, for example, using big neodymium magnets or similar to limit energy cost). Comparably, variations of nuclear thermal rockets have tradeoffs: Orders of magnitude more thrust; But, lower maximum exhaust velocity (so, more propellant is needed); ... So, these are more likely to be able to get going good and fast, within a more reasonable timeframe, but the problem will then become how far you can get before running out of fuel. Assuming human passengers though, time and human lifespan become major limiting factors. Also, for humans, you need a whole lot of stuff to keep them alive and comfortable, etc. So, you need a whole lot of thrust to get things moving quickly. So, in this sense, nuclear thermal makes more sense. Vs, say, trying to design the engine around the assumption of a "continuous burn" likely measured in centuries (for a ship with a comparably modest payload). Ion engines are also possible, but downside of ion engines is that they would have higher propellant use rate than a cyclotron+linac while also giving significantly less thrust relative to the amount of propellant used. Downside of cyclotron+linac though would be that it would be more energy hungry and necessarily significantly larger/bulkier. Such a ship would be impractical for ground launch so would need to be assembled in space. Well, unless a very small version were built, likely with the (comparably long) linac tail in a folded up form, which then unpacks and assembles itself once in space. Pretty much no one else that I am aware of is pushing for a cyclotron+linac based propulsion system though... But, then again: Less viability for manned interplanetary travel (nuclear powered plasma thrusters make more sense for this); Less viability for unmanned interplanetary travel (domain likely better suited to ion engines); Not viable for manned interstellar travel (limited payload, using it for a manned mission would likely result in everyone expiring long before it reached the destination). So, it is mostly (at best) left as an option for unmanned interstellar travel. Well, there are other ideas around, like using very small probes, and accelerating them using huge lasers, but downside is that these would have no way to decelerate. With other strategies, one basically needing to flip the ship around and spending the latter half of the trip decelerating (say, with the "butt" end of the ship having the ablative graphite shield in a more concave shape in an attempt to maximize the drag profile). One traditional idea has also been "generation ships", but the practicality of this is debatable IMO. Wrote a bit on this, but decided to leave this out. Seems like whatever approach to generation ships is taken, it still needs to be able to reach the target within a few centuries to have much of any real hope of success (1k years, very low probability; after 10k years, "inescapable problems" would make longer-term survival almost impossible, *1). *1: Running out of usable fissile materials; running out of nitrogen (would need to initially stock up on nitrogen in the form of C-14, which could decay to replace that lost over time, but by 10k years, much of the C14 would be gone; and long term survival is kinda gone if nitrogen runs out), ... Stocking up on more thorium and C-14 and similar would ultimately only delay the issues. Granted, one could reach further out via "system hopping", say, if each colony can spot another system with potentially habitable target worlds, say, within 25ly .. 50ly or similar, could spread pretty far over time. Still none of the "galactic civilization" stuff (as I see it, the formation of a cohesive civilization spanning multiple star systems is most likely essentially impossible; absent the development of FTL travel and communication, etc, which is itself most likely impossible). ...
[toc] | [prev] | [next] | [standalone]
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2026-02-24 06:44 +0000 |
| Message-ID | <10njhce$3l6lq$1@dont-email.me> |
| In reply to | #115121 |
BGB <cr88192@gmail.com> schrieb: > My own looking into this led me to realize that the number-space is > modulo (relative to a cone with a spiral that repeats with every > power-of-2), and that all the odd steps only ever increment the phase > angle by 1, or fall down to a lower level (whenever the phase angle is > not incremented). > > In effect, each odd step jumps to the opposite side of the cone, but > slightly increments the angle. Since the phase-angle path can never move > backwards, it can't touch any previous paths, thus, no loops can happen > this way. > > > The only way a cycle could exist is if (somehow) there were an > impassable wall where pretty much *no* value could descend unless it > hits the power-of-2 descent path, and then somehow miss this path as well. > > But, since "3*n+1" results in the minimum forward step, this means that > one of the sides *has* to hit the power-of-2 descent cases at some point. The same argument would hold for the 3n-1 problem, where there is more than one cycle. [...] -- This USENET posting was made without artificial intelligence, artificial impertinence, artificial arrogance, artificial stupidity, artificial flavorings or artificial colorants.
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2026-02-24 01:38 -0600 |
| Message-ID | <10njkpj$3lt65$1@dont-email.me> |
| In reply to | #115122 |
On 2/24/2026 12:44 AM, Thomas Koenig wrote: > BGB <cr88192@gmail.com> schrieb: >> My own looking into this led me to realize that the number-space is >> modulo (relative to a cone with a spiral that repeats with every >> power-of-2), and that all the odd steps only ever increment the phase >> angle by 1, or fall down to a lower level (whenever the phase angle is >> not incremented). >> >> In effect, each odd step jumps to the opposite side of the cone, but >> slightly increments the angle. Since the phase-angle path can never move >> backwards, it can't touch any previous paths, thus, no loops can happen >> this way. >> >> >> The only way a cycle could exist is if (somehow) there were an >> impassable wall where pretty much *no* value could descend unless it >> hits the power-of-2 descent path, and then somehow miss this path as well. >> >> But, since "3*n+1" results in the minimum forward step, this means that >> one of the sides *has* to hit the power-of-2 descent cases at some point. > > The same argument would hold for the 3n-1 problem, where there is > more than one cycle. > At least going by "the power of imagination" (would require testing to confirm): "3*n-1" would be different in that it neither reliably advances the phase angle, nor reliably moves it backwards. So, one is likely to end up with pairs of odd-steps that loop back to the same position, creating infinite cycles (so, it is not the same geometry). "3*n-3" might be stable (may appear to behave similarly to 3*n+1), but would instead advance the phase angle in the opposite direction. "3*n+3" should initially appear similar to "3*n+1", but would have paths that ascend to infinity (paths would exist which manage to miss all of the even steps and also miss the magic "power-of-2 auto-win" step). If the adjustment offset is even, then the "(3*n+1)/2" trick breaks, and also all paths that hit an odd number will ascend to infinity. Granted, may make sense to check/confirm, if these don't match up I might need to revisit my mental model. ... > [...] >
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2026-02-24 03:16 -0600 |
| Message-ID | <10njqha$3o4lp$1@dont-email.me> |
| In reply to | #115123 |
On 2/24/2026 1:38 AM, BGB wrote:
> On 2/24/2026 12:44 AM, Thomas Koenig wrote:
>> BGB <cr88192@gmail.com> schrieb:
>>> My own looking into this led me to realize that the number-space is
>>> modulo (relative to a cone with a spiral that repeats with every
>>> power-of-2), and that all the odd steps only ever increment the phase
>>> angle by 1, or fall down to a lower level (whenever the phase angle is
>>> not incremented).
>>>
>>> In effect, each odd step jumps to the opposite side of the cone, but
>>> slightly increments the angle. Since the phase-angle path can never move
>>> backwards, it can't touch any previous paths, thus, no loops can happen
>>> this way.
>>>
>>>
>>> The only way a cycle could exist is if (somehow) there were an
>>> impassable wall where pretty much *no* value could descend unless it
>>> hits the power-of-2 descent path, and then somehow miss this path as
>>> well.
>>>
>>> But, since "3*n+1" results in the minimum forward step, this means that
>>> one of the sides *has* to hit the power-of-2 descent cases at some
>>> point.
>>
>> The same argument would hold for the 3n-1 problem, where there is
>> more than one cycle.
>>
>
Then, went and took a shower, realized there was more to this:
> At least going by "the power of imagination" (would require testing to
> confirm):
>
> "3*n-1" would be different in that it neither reliably advances the
> phase angle, nor reliably moves it backwards. So, one is likely to end
> up with pairs of odd-steps that loop back to the same position, creating
> infinite cycles (so, it is not the same geometry).
>
Imagination is mostly saying there will be a lot of fairly short loops here.
There should probably be a quite significant number of short loops in
this case.
>
> "3*n-3" might be stable (may appear to behave similarly to 3*n+1), but
> would instead advance the phase angle in the opposite direction.
>
But, "3*n-3" will change the ending state, so it will not land on 1 anymore.
Rather:
Around 84% of the time it end land on 3;
Around 16% of the time it end land on 0.
Give or take... ("imagination" showing around 84% or so of paths landing
on 3).
Will probably still converge, just ending up at a different end state.
All other negative offset paths would lead to underflow.
So, can probably exclude cases where the value space goes negative.
Though... "looks like" if these were included there is another cone on
the other side, if one goes this way.
> "3*n+3" should initially appear similar to "3*n+1", but would have paths
> that ascend to infinity (paths would exist which manage to miss all of
> the even steps and also miss the magic "power-of-2 auto-win" step).
>
> If the adjustment offset is even, then the "(3*n+1)/2" trick breaks, and
> also all paths that hit an odd number will ascend to infinity.
>
Further assertion 1:
All other positive offset paths should contain paths to infinity (along
with jostling around the ending states for those paths which do converge).
So, the specific end-state behavior seems unique.
Further assertion 2:
"3*n+1" may be unique in this space.
Or, possibly the only one that reliably converges, lands on 1, and does
not contain cycles.
>
> Granted, may make sense to check/confirm, if these don't match up I
> might need to revisit my mental model.
>
Still TODO, but for me, shower seemed like a bigger immediate priority.
So, yeah, I guess I may report back later as to whether or not the
actual math behaves the same way as it does in my imagination.
...
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2026-02-24 04:52 -0600 |
| Message-ID | <10nk05k$3pk7b$1@dont-email.me> |
| In reply to | #115124 |
On 2/24/2026 3:16 AM, BGB wrote:
> On 2/24/2026 1:38 AM, BGB wrote:
>> On 2/24/2026 12:44 AM, Thomas Koenig wrote:
>>> BGB <cr88192@gmail.com> schrieb:
>>>> My own looking into this led me to realize that the number-space is
>>>> modulo (relative to a cone with a spiral that repeats with every
>>>> power-of-2), and that all the odd steps only ever increment the phase
>>>> angle by 1, or fall down to a lower level (whenever the phase angle is
>>>> not incremented).
>>>>
>>>> In effect, each odd step jumps to the opposite side of the cone, but
>>>> slightly increments the angle. Since the phase-angle path can never
>>>> move
>>>> backwards, it can't touch any previous paths, thus, no loops can happen
>>>> this way.
>>>>
>>>>
>>>> The only way a cycle could exist is if (somehow) there were an
>>>> impassable wall where pretty much *no* value could descend unless it
>>>> hits the power-of-2 descent path, and then somehow miss this path as
>>>> well.
>>>>
>>>> But, since "3*n+1" results in the minimum forward step, this means that
>>>> one of the sides *has* to hit the power-of-2 descent cases at some
>>>> point.
>>>
>>> The same argument would hold for the 3n-1 problem, where there is
>>> more than one cycle.
>>>
>>
>
>
Now, after actually testing (to confirm or disprove my imagination):
> Then, went and took a shower, realized there was more to this:
>
>> At least going by "the power of imagination" (would require testing to
>> confirm):
>>
>> "3*n-1" would be different in that it neither reliably advances the
>> phase angle, nor reliably moves it backwards. So, one is likely to end
>> up with pairs of odd-steps that loop back to the same position,
>> creating infinite cycles (so, it is not the same geometry).
>>
>
> Imagination is mostly saying there will be a lot of fairly short loops
> here.
>
> There should probably be a quite significant number of short loops in
> this case.
>
Quite a lot of loops, but they are not quite matching the expected
pattern (the typical loop period is longer than imagination had implied).
>
>>
>> "3*n-3" might be stable (may appear to behave similarly to 3*n+1), but
>> would instead advance the phase angle in the opposite direction.
>>
>
> But, "3*n-3" will change the ending state, so it will not land on 1
> anymore.
>
> Rather:
> Around 84% of the time it end land on 3;
> Around 16% of the time it end land on 0.
> Give or take... ("imagination" showing around 84% or so of paths landing
> on 3).
>
> Will probably still converge, just ending up at a different end state.
>
Disproven:
"3*n-3" is nowhere near stable.
It does land on 3, but is prone to loops (imagination did not predict
prevalent looping for this one).
>
> All other negative offset paths would lead to underflow.
>
> So, can probably exclude cases where the value space goes negative.
> Though... "looks like" if these were included there is another cone on
> the other side, if one goes this way.
>
Not testing for this ATM.
>
>
>> "3*n+3" should initially appear similar to "3*n+1", but would have
>> paths that ascend to infinity (paths would exist which manage to miss
>> all of the even steps and also miss the magic "power-of-2 auto-win"
>> step).
>>
Does appear similar, but with a different ending state (lands on 3).
But, thus far the infinity-paths are being no-show.
Thus far, otherwise, 3*n+3 appears stable; at least for the range
tested. This part does match mental prediction so far (but appears more
stable than expected).
Mental model expected existence of divergent paths to exist in this
case, but thus far I am not seeing any.
>> If the adjustment offset is even, then the "(3*n+1)/2" trick breaks,
>> and also all paths that hit an odd number will ascend to infinity.
>>
Confirmed: Even offsets do fire off towards infinity.
>
> Further assertion 1:
> All other positive offset paths should contain paths to infinity (along
> with jostling around the ending states for those paths which do converge).
>
> So, the specific end-state behavior seems unique.
>
Non-match:
Loops (or long cycles) are more prevalent than expected;
Paths to infinity are not.
Testing +5 and +7, seeing a whole lot of loops, not so many divergent paths.
Thus far, divergent paths have only appeared in one of the expected
scenarios.
Granted, it is possible that what my imagination was predicting as
divergent paths are instead manifesting as long-period cycles (probably
because they are hitting too many even-numbered spots to achieve
"sufficient escape velocity").
May need to add something to try to differentiate loops from "failed
escape paths" where it just skips around, but doesn't represent an
actual loop.
> Further assertion 2:
> "3*n+1" may be unique in this space.
>
> Or, possibly the only one that reliably converges, lands on 1, and does
> not contain cycles.
>
Thus far, it is looking like "3*n+1" is likely fairly unique here.
>
>>
>> Granted, may make sense to check/confirm, if these don't match up I
>> might need to revisit my mental model.
>>
>
> Still TODO, but for me, shower seemed like a bigger immediate priority.
>
>
> So, yeah, I guess I may report back later as to whether or not the
> actual math behaves the same way as it does in my imagination.
>
Not quite as close of a match as I would like.
So, seems my imagination model is still subject to doubt in this case...
Given my seeming inability to sufficiently accurately predict the
behavior of adjusting the offset to a sufficient degree of accuracy; it
is possible my interpretation of the "3*n+1" case may not be
sufficiently accurate.
Nothing wildly different from my imagination happened, but it is still
not quite close enough it would seem.
...
[toc] | [prev] | [next] | [standalone]
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2026-02-24 11:16 +0000 |
| Message-ID | <10nk1ao$3q2e9$1@dont-email.me> |
| In reply to | #115123 |
BGB <cr88192@gmail.com> schrieb: > "3*n-1" would be different in that it neither reliably advances the > phase angle, nor reliably moves it backwards. So, one is likely to end > up with pairs of odd-steps that loop back to the same position, creating > infinite cycles (so, it is not the same geometry). What is this phase angle? I am only familiar with that term for complex numbers. -- This USENET posting was made without artificial intelligence, artificial impertinence, artificial arrogance, artificial stupidity, artificial flavorings or artificial colorants.
[toc] | [prev] | [next] | [standalone]
| From | BGB <cr88192@gmail.com> |
|---|---|
| Date | 2026-02-24 13:19 -0600 |
| Message-ID | <10nkttm$53d4$1@dont-email.me> |
| In reply to | #115128 |
On 2/24/2026 5:16 AM, Thomas Koenig wrote:
> BGB <cr88192@gmail.com> schrieb:
>
>> "3*n-1" would be different in that it neither reliably advances the
>> phase angle, nor reliably moves it backwards. So, one is likely to end
>> up with pairs of odd-steps that loop back to the same position, creating
>> infinite cycles (so, it is not the same geometry).
>
> What is this phase angle? I am only familiar with that term for
> complex numbers.
>
I had noted that it follows a pattern, but primarily if one splits it up
into a spiral that itself resembles a floating point number.
Though, splitting it into real and imaginary components could also make
sense as well, but I was thinking of something more like polar
coordinates though.
So, say, level=exponent, phase=mantissa. But, differing some in that in
this context it is more continuous (like a log scale number), but some
of the behavioral patterns are more like those of a floating point number.
So, say:
level ~= log2(n)
phase ~= n/(2**floor(level))
So, say, imagine it represents a value between 1.0 and 1.9999...
Each crossing of a power of 2 is an immediate drop to the "win".
Each odd step will either land on a spot that is equal (and drops a
level) or increments the phase (or, roughly like incrementing the ULP in
a floating point number, if the size of the mantissa were proportional
to the value of the exponent).
It seems, the behavior only manifests in the desired ways if:
Relative phase increases for each odd step;
Increases by the minimum possible amount
(what results from the +1 adjustment).
As noted, the phase effectively exists on two points on the circle at
any moment:
phaseA = phase
phaseB = phase + 0.5
Or, if one wants to see it as a circle in radians, could be like:
angle=(phase-1.0)*(2*PI);
Or, in degrees:
angle=(phase-1.0)*(360);
Phase0 and Phase1 are always at 180 degrees relative to each other.
And, at each odd step, these two points move very slightly on the
circle. The relative angle relative to each side of the circle, relative
to the starting point, typically remaining small (relative to the
starting value).
Overall alignment of the two points can be almost anywhere in the
circle, except that it can't cross the point at 0 degrees (or
powers-of-2) because these drop straight to 1.
Or, in polar coordinate terms:
rho = level
phi = angle
My previous attempt at a graph though is wonky in that the numbers are
following a log spacing around the circle, rather than an linear
spacing. May need to fix this, since the explanation of the two points
being 180 degrees out of phase looks wonky if the diagram doesn't show
these points as 180 degrees apart.
Failing to increase leads to loops (leading to the same spot).
Trying to go backwards leads to loops and instabilities (divergent
paths, or, "more backwards" leads into negative-land).
Trying to go forwards more than the minimum amount also leads to
instabilities (it results in it stepping over downwards paths).
I initially thought these instabilities would head towards infinity, but
it seems more like they just head to a larger orbit and get stuck
bouncing around the circle. I would need to do more looking into it to
figure out whether they get stuck in a loop, or continue to climb (but
more gradually). My mental model implies that these paths should
continue to climb, but that increasing the constant bias possibly causes
them to ascend faster.
If they are more prone to forming actual closed loops (rather than
climbing indefinitely), this would be inconsistent with my mental model,
but seems I would need to do more testing here.
Though, loops can form if the ascent is slow enough that it manages to
complete a full rotation of the circle without escaping to a higher
orbit (if it manages to escape the original orbit without forming a
loop, it should trend upwards to infinity).
Once the relative phase (from the starting point) exceeds 180 degrees
(or the two points covering a full circle), if the path has not ascended
to a higher level, a closed loop will form.
But, as noted, with the +1 step, the above scenario (for loop forming)
becomes impossible.
So, in this case, the "3*n+1" step would appear to be uniquely stable in
this particular configuration.
[toc] | [prev] | [next] | [standalone]
Page 3 of 5 — ← Prev page 1 2 [3] 4 5 Next page →
Back to top | Article view | comp.arch
csiph-web