Path: csiph.com!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c++
Subject: Re: Sieve of Erastosthenes optimized to the max
Date: Sat, 13 Jan 2024 21:31:42 -0800
Organization: A noiseless patient Spider
Lines: 53
Message-ID: <865xzwmqf5.fsf@linuxsc.com>
References: <86edfcvkjm.fsf@linuxsc.com> <86a5q0uhd1.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="8638bc13aeaeb22b15b54cee031022ba"; logging-data="363857"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/iMI9akCeg+W8+QUAOF36EUkx9P9Mci60="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:t2RIV+H8H3cZEHFzhMxP++4/bi0= sha1:TEUw05wEOftD8OUK2WrtAV9w/S0=
Xref: csiph.com comp.lang.c++:118214
Vir Campestris writes:
> On 24/12/2023 08:36, Tim Rentsch wrote:
>
>> Depending on how the code is written, no modulo operations need
>> to be done, because they will be optimized away and done at
>> compile time. If we look at multiplying two numbers represented
>> by bits in bytes i and j, the two numbers are
>>
>> i*30 + a
>> j*30 + b
>>
>> for some a and b in { 1, 7, 11, 13 17, 19, 23, 29 }.
>>
>> The values we're interested in are the index of the product and
>> the residue of the product, namely
>>
>> (i*30+a) * (j*30+b) / 30 (for the index)
>> (i*30+a) * (j*30+b) % 30 (for the residue)
>>
>> Any term with a *30 in the numerator doesn't contribute to
>> the residue, and also can be combined with the by-30 divide
>> for computing the index. Thus these expressions can be
>> rewritten as
>>
>> i*30*j + i*b + j*a + (a*b/30) (for the index)
>> a*b%30 (for the residue)
>>
>> When a and b have values that are known at compile time,
>> neither the divide nor the remainder result in run-time
>> operations being done; all of that heavy lifting is
>> optimized away and done at compile time. Of course there
>> are some multiplies, but they are cheaper than divides, and
>> also can be done in parallel. (The multiplication a*b also
>> can be done at compile time.)
>>
>> The residue needs to be turned into a bit mask to do the
>> logical operation on the byte of bits, but here again that
>> computation can be optimized away and done at compile time.
>>
>> Does that all make sense?
>
> Right now, no. But that's me. I'll flag it to read again when I've
> had a better night's sleep.
I'm posting to nudge you into looking at this again, if
you haven't already.
I have now had a chance to get your source and run some
comparisons. A program along the lines I outlined can run much
faster than the code you posted (as well as needing less memory).
A good target is to find all primes less than 1e11, which needs
less than 4gB of ram.