Path: csiph.com!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch Newsgroups: comp.lang.c++ Subject: Re: Sieve of Erastosthenes optimized to the max Date: Sun, 24 Dec 2023 00:36:42 -0800 Organization: A noiseless patient Spider Lines: 50 Message-ID: <86a5q0uhd1.fsf@linuxsc.com> References: <86edfcvkjm.fsf@linuxsc.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: dont-email.me; posting-host="e91b714cfd3f51772707cd580185f87c"; logging-data="2640454"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18653vuyhC0XQlBvlzbrV8p5m4vElefOEk=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:WmDrc1q/bl46P8FXGL88MxhdTxc= sha1:uxfVDNveoH1P1f66t4eETrVYTUs= Xref: csiph.com comp.lang.c++:118124 Vir Campestris writes: > On 23/12/2023 18:30, Tim Rentsch wrote: > >> A more compact form is to consider numbers mod 30, in which case >> numbers that are 0 mod {2, 3, or 5} can be ignored. Conveniently >> there are just 8 such values - 1, 7, 11, 13, 17, 19, 23, and 29 - >> so each block of 30 can be represented in one 8-bit byte. Doing >> that gives a 25% reduction in space compared to a 6n+/-1 scheme. > > I found that on my system the modulo operation was so slow this wasn't > worth doing. Depending on how the code is written, no modulo operations need to be done, because they will be optimized away and done at compile time. If we look at multiplying two numbers represented by bits in bytes i and j, the two numbers are i*30 + a j*30 + b for some a and b in { 1, 7, 11, 13 17, 19, 23, 29 }. The values we're interested in are the index of the product and the residue of the product, namely (i*30+a) * (j*30+b) / 30 (for the index) (i*30+a) * (j*30+b) % 30 (for the residue) Any term with a *30 in the numerator doesn't contribute to the residue, and also can be combined with the by-30 divide for computing the index. Thus these expressions can be rewritten as i*30*j + i*b + j*a + (a*b/30) (for the index) a*b%30 (for the residue) When a and b have values that are known at compile time, neither the divide nor the remainder result in run-time operations being done; all of that heavy lifting is optimized away and done at compile time. Of course there are some multiplies, but they are cheaper than divides, and also can be done in parallel. (The multiplication a*b also can be done at compile time.) The residue needs to be turned into a bit mask to do the logical operation on the byte of bits, but here again that computation can be optimized away and done at compile time. Does that all make sense?