Path: csiph.com!news.swapon.de!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch Newsgroups: comp.lang.c++ Subject: Re: Sieve of Erastosthenes optimized to the max Date: Sun, 02 Jun 2024 03:23:24 -0700 Organization: A noiseless patient Spider Lines: 186 Message-ID: <86ttibod7n.fsf@linuxsc.com> References: <86r0duwqgg.fsf@linuxsc.com> <86o78mpnlf.fsf@linuxsc.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Date: Sun, 02 Jun 2024 12:23:26 +0200 (CEST) Injection-Info: dont-email.me; posting-host="487e0958afb8d1cfa0659a77e612b1fd"; logging-data="3488671"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18SUqdVTwQdmXg8DZf+qAPQkkBo/Qel5gA=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:JSRULToYKC7+b5Mf5ZQ+sG42xGU= sha1:7ppeVLHopJ6VwYNxKhnANWifQeY= Xref: csiph.com comp.lang.c++:119455 Vir Campestris writes: > On 31/05/2024 06:17, Tim Rentsch wrote: > >> Vir Campestris writes: [...] >>> I don't store even numbers. That's an optimisation, and is of >>> course equivalent to Tim's Mod 30 trick, but with mod 2 instead. >>> And it doesn't require an divide which may be slow. >> >> Like I keep saying, keeping the bits in a mod 30 representation >> doesn't need any divides (at least not in the main loop, where >> multiples are being computed). [...] > I'm missing something. > > When setting the bits for a multiple of a reasonably large prime > what I do is: > > (1) Shift right 1 to drop the last bit. > > (2) The remaining top bits tell me which word needs to be accessed. > I shift the multiple right to get the index of the word. > > (3) The remaining bottom bits tell me which bit in the word needs > to be set, so I shift 1 left by those bottom bits to get a mask. > > My operation (2) is a divide, by a power of 2. > > Take 997 as an example: > multiple mod30 div30 > 997 7 33 > 1994 14 66 > 2991 21 99 > 3988 28 132 > 4985 5 166 > 5982 12 199 > 6979 19 232 > 7976 26 265 > 8973 3 299 > 9970 10 332 > [etc] > > I don't see how you get that last column without a divide by 30. Good example. Let's look at this example more closely, first in the a-bit-for-only-odd-numbers scheme. Forgive me if any of the following seems obvious. Let me redo your table slightly: multiplier product mod 2 1 997 1 2 1994 0 3 2991 1 4 3988 0 5 4985 1 6 5982 0 7 6979 1 8 7976 0 9 8973 1 10 9970 0 11 10967 1 [etc] Which multipliers do we need to consider? Obviously not 1. After that, we don't need any even multipliers, since they are all divisible by two, and there aren't even places in the table where the product could go. So we quickly get to this: multiplier product mod 2 3 2991 1 5 4985 1 7 6979 1 9 8973 1 11 10967 1 [etc] Now which multipliers do we need? We don't need 3 - we have already crossed out all multiples of 3. Similarly we don't need 5, 7, or 11. We don't need 9, because it has a factor of 3, - any multiple of 9 is also a multiple of 3 and has already been crossed out. And so forth for 13, 15, 17, 19, 21, ... A little thought should convince you we don't need to consider any multiplier less than 997. Let's pick up the table from there (I have taken out 'product' and put in 'factors'): multiplier factors 997 997 999 3 3 3 37 1001 7 11 13 1003 17 59 1005 3 5 67 1007 19 53 1009 1009 1011 3 337 We need to eliminate 997*997, as it certainly has not been crossed out before. We don't need to consider 999, 1001, 1003, 1005, 1007, or 1011, because they all have prime factors less than 997, and so these products have already been eliminated. For 1009 though, it's prime, and has no smaller prime factors, so we need to cross out 997*1009. And so forth. Speaking more generally, we need to consider multipliers greater than 997 (as well as 997 itself) that have not already been marked as composite. We discover what those numbers are by scanning the bitmap. In pseudocode: for each prime p : starting at the bitmap location for p, and up for each not-yet-marked-composite m : mark as composite p*m end for end for (I need to put in an asterisk here to say that this code mostly works but misses some composites, and if you play around with it you will see why. But that isn't important to my explanation so I'd like to pretend for now that this code is what we need to look at (and in fact it isn't far from code that actually does work).) Now let's think about how things change when we are storing bits only for numbers that are nonzero mod 30, so one block of 30 per byte. First it should be obvious that we need to consider only those multipliers that are nonzero mod 30, or in other words those numbers that correspond to a bit in the table. Pseudocode now looks something like this: for each index i in the table : for each bit in table[i] : if that bit has been marked composite : continue // we have found the next prime for each index j > i in the table : for each bit in table[j] : if that bit has beem marked composite : continue // we have found a multiplier m and .. // need to mark as composite p*m end end end end The relevant values are: p == i*30 + x m == j*30 + y where x and y are constant values that are nonzero mod 30, that is, from the set { 1, 7, 11, 13, 17, 19, 23, 29 }, depending only on the respective bit positions for p and m. The product p*m is what we're looking for, and in particular p*m / 30 and p*m % 30. Doing a little algebra p*m = (i*30 + x) * (j*30 + y) = i*30 * j*30 + i*30*y + j*30*x + x*y The components of this last expression are p*m / 30 = 30*i*j + i*y + j*x + (x*y/30) p*m % 30 = (x*y%30) In the pseudocode we said 'for each bit in table[...]', but rather than use a for loop we write 8 specific tests, so the values x and y are specific constants in each of the 8 cases. Since they are constants, the values (x*y/30) and (x*y%30) can in each of the 64 cases be computed as compile-time constants, and so we eliminate the divide and remainder operations from run-time by computing them at compile time. Similarly the value (x*y%30) can be turned into a bit mask at compile-time, because it's just a long series of ?: expressions, one for each of the 8 values of nonzero residues mod 30. Note that the expression for p*m/30 gives the index in the table for where to zap the bit for the product. Were you able to follow all that? Do you see how the scheme works now? I have deliberately left out explaining the problem of missing some composite values and what to do about it, but I can explain further if someone has trouble with that.