Path: csiph.com!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c++
Subject: Re: Sieve of Erastosthenes optimized to the max
Date: Sun, 24 Dec 2023 00:36:42 -0800
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <86a5q0uhd1.fsf@linuxsc.com>
References: <86edfcvkjm.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="e91b714cfd3f51772707cd580185f87c"; logging-data="2640454"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18653vuyhC0XQlBvlzbrV8p5m4vElefOEk="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:WmDrc1q/bl46P8FXGL88MxhdTxc= sha1:uxfVDNveoH1P1f66t4eETrVYTUs=
Xref: csiph.com comp.lang.c++:118124
Vir Campestris writes:
> On 23/12/2023 18:30, Tim Rentsch wrote:
>
>> A more compact form is to consider numbers mod 30, in which case
>> numbers that are 0 mod {2, 3, or 5} can be ignored. Conveniently
>> there are just 8 such values - 1, 7, 11, 13, 17, 19, 23, and 29 -
>> so each block of 30 can be represented in one 8-bit byte. Doing
>> that gives a 25% reduction in space compared to a 6n+/-1 scheme.
>
> I found that on my system the modulo operation was so slow this wasn't
> worth doing.
Depending on how the code is written, no modulo operations need
to be done, because they will be optimized away and done at
compile time. If we look at multiplying two numbers represented
by bits in bytes i and j, the two numbers are
i*30 + a
j*30 + b
for some a and b in { 1, 7, 11, 13 17, 19, 23, 29 }.
The values we're interested in are the index of the product and
the residue of the product, namely
(i*30+a) * (j*30+b) / 30 (for the index)
(i*30+a) * (j*30+b) % 30 (for the residue)
Any term with a *30 in the numerator doesn't contribute to
the residue, and also can be combined with the by-30 divide
for computing the index. Thus these expressions can be
rewritten as
i*30*j + i*b + j*a + (a*b/30) (for the index)
a*b%30 (for the residue)
When a and b have values that are known at compile time,
neither the divide nor the remainder result in run-time
operations being done; all of that heavy lifting is
optimized away and done at compile time. Of course there
are some multiplies, but they are cheaper than divides, and
also can be done in parallel. (The multiplication a*b also
can be done at compile time.)
The residue needs to be turned into a bit mask to do the
logical operation on the byte of bits, but here again that
computation can be optimized away and done at compile time.
Does that all make sense?