Path: csiph.com!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch Newsgroups: comp.lang.c++ Subject: Re: Sieve of Erastosthenes optimized to the max Date: Sat, 13 Jan 2024 21:31:42 -0800 Organization: A noiseless patient Spider Lines: 53 Message-ID: <865xzwmqf5.fsf@linuxsc.com> References: <86edfcvkjm.fsf@linuxsc.com> <86a5q0uhd1.fsf@linuxsc.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: dont-email.me; posting-host="8638bc13aeaeb22b15b54cee031022ba"; logging-data="363857"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/iMI9akCeg+W8+QUAOF36EUkx9P9Mci60=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:t2RIV+H8H3cZEHFzhMxP++4/bi0= sha1:TEUw05wEOftD8OUK2WrtAV9w/S0= Xref: csiph.com comp.lang.c++:118214 Vir Campestris writes: > On 24/12/2023 08:36, Tim Rentsch wrote: > >> Depending on how the code is written, no modulo operations need >> to be done, because they will be optimized away and done at >> compile time. If we look at multiplying two numbers represented >> by bits in bytes i and j, the two numbers are >> >> i*30 + a >> j*30 + b >> >> for some a and b in { 1, 7, 11, 13 17, 19, 23, 29 }. >> >> The values we're interested in are the index of the product and >> the residue of the product, namely >> >> (i*30+a) * (j*30+b) / 30 (for the index) >> (i*30+a) * (j*30+b) % 30 (for the residue) >> >> Any term with a *30 in the numerator doesn't contribute to >> the residue, and also can be combined with the by-30 divide >> for computing the index. Thus these expressions can be >> rewritten as >> >> i*30*j + i*b + j*a + (a*b/30) (for the index) >> a*b%30 (for the residue) >> >> When a and b have values that are known at compile time, >> neither the divide nor the remainder result in run-time >> operations being done; all of that heavy lifting is >> optimized away and done at compile time. Of course there >> are some multiplies, but they are cheaper than divides, and >> also can be done in parallel. (The multiplication a*b also >> can be done at compile time.) >> >> The residue needs to be turned into a bit mask to do the >> logical operation on the byte of bits, but here again that >> computation can be optimized away and done at compile time. >> >> Does that all make sense? > > Right now, no. But that's me. I'll flag it to read again when I've > had a better night's sleep. I'm posting to nudge you into looking at this again, if you haven't already. I have now had a chance to get your source and run some comparisons. A program along the lines I outlined can run much faster than the code you posted (as well as needing less memory). A good target is to find all primes less than 1e11, which needs less than 4gB of ram.