X-Received: by 10.236.222.166 with SMTP id t36mr7335583yhp.24.1413360551213; Wed, 15 Oct 2014 01:09:11 -0700 (PDT)
X-Received: by 10.140.34.231 with SMTP id l94mr9311qgl.10.1413360551175; Wed, 15 Oct 2014 01:09:11 -0700 (PDT)
Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!usenet.blueworldhosting.com!feeder01.blueworldhosting.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!s7no2845961qap.0!news-out.google.com!i10ni86qaf.0!nntp.google.com!s7no2845960qap.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.arch.arithmetic
Date: Wed, 15 Oct 2014 01:09:11 -0700 (PDT)
In-Reply-To: <m1g20k$a5r$1@speranza.aioe.org>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=5.55.57.237; posting-account=lD5X3AoAAAB2_KDReA0WuoP_A_fBgycC
NNTP-Posting-Host: 5.55.57.237
References: <74d951ee-6fd2-43ad-8f0a-2852a031bd2c@googlegroups.com> <m1g20k$a5r$1@speranza.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <aa29be5c-2076-4288-90ce-2dff24a99c5e@googlegroups.com>
Subject: Re: New appromixation for integer division by 63 or 127
From: nikolaos.kavvadias@gmail.com
Injection-Date: Wed, 15 Oct 2014 08:09:11 +0000
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Lines: 141
Xref: csiph.com comp.arch.arithmetic:79

Hi Terje,

indeed, my formula:

- will actually work for any divisor of the form 2^n-1
- can be simplified to (x + (x>>n) + 1)>>n
- works for a limited range: x should be in the closed interval: [0, 2^(2n)=
-2]=20

I understand this is not new, but it has never been properly documented.
I have spent quite some time tracking a textbook (or paper) on this. Implic=
itly, yes, it's there, it can be inferred; but no direct reference to it.

BTW pkhuong contributed a nice number-theoretical proof for the formula reg=
arding a somewhat larger range here:=20
http://www.reddit.com/r/programming/comments/2j18nt/multiplierless_hack_for=
_constant_division_by_2n1/

I also have a blog article on the topic:
- http://nkavvadias.com/blog/2014/10/11/multless-kdiv/

Further, it is useful and even more portable to have the formula working in=
 any high-level language, I prefer it compared to x86 or ARM assembly trick=
ery :). Your div-by-100 using leas for scaled multiplication is nice and yo=
u can do a similar thing with ARM's addressing modes.

I have written a tool for generating optimum code for ANSI C and "generic a=
ssembly". It is named kdiv and you can find it here: http://github.com/nkka=
v/kdiv=20
It is based on Hacker's Delight' ideas and code for calculating the magic n=
umber (multiplicative inverse)

There is also kmul, its sister (or brother?) project here: http://github.co=
m/nkkav/kmul . This is based on Preston Briggs' code for Bernstein's algori=
thm on multiplication by integer constants.=20


Best regards
Nikolaos Kavvadias
http://www.nkavvadias.com





=CE=A4=CE=B7 =CE=94=CE=B5=CF=85=CF=84=CE=AD=CF=81=CE=B1, 13 =CE=9F=CE=BA=CF=
=84=CF=89=CE=B2=CF=81=CE=AF=CE=BF=CF=85 2014 11:20:38 =CF=80.=CE=BC. UTC+3,=
 =CE=BF =CF=87=CF=81=CE=AE=CF=83=CF=84=CE=B7=CF=82 Terje Mathisen =CE=AD=CE=
=B3=CF=81=CE=B1=CF=88=CE=B5:
>=20
> > I believe I have invented a new approximation for quotient
>=20
> > calculation that works for x/63 and x/127.
>=20
> >
>=20
> > The formula is as follows and it is divisionless and multiplierless:
>=20
> >
>=20
> > y =3D (((x>>n)+x+((1<<n)+1))>>n)-1;
>=20
> >
>=20
> > Use n=3D6 for 63, and n=3D7 for 127. 1<<n is the strength-reduced form
>=20
> > for calculating 2^n.
>=20
>=20
>=20
> Nothing new here.:-)
>=20
>=20
>=20
> Instead of dividing by 63 you are multiplying by 65, and since=20
>=20
> (x+1)(x-1) =3D x^2 - 1, the error is manageable.
>=20
>=20
>=20
> Effectively you are doing reciprocal mul with a multiplier with very few=
=20
>=20
> significant bits.
>=20
>=20
>=20
> This also means that your formula only works for a limited range of=20
>=20
> inputs (i.e. number of bits in x), and it fails badly when (for a 32-bit=
=20
>=20
> number) x is so close to 2^32 that adding x/64 to it causes a carry.
>=20
>=20
>=20
> BTW, I use a similar trick in my data calculations, when I need to=20
>=20
> divide a small range of inputs by 100 in order to convert a year number=
=20
>=20
> to century:
>=20
>=20
>=20
>    //  y100 =3D y / 100;  // Reciprocal mul
>=20
>    y100 =3D y * 41 >> 12;  // Faster version with very short reciprocal
>=20
>=20
>=20
> The multiplier has just three bits set (101001) so it can be calculated=
=20
>=20
> with a pair of LEAs:
>=20
>=20
>=20
>    lea ebx, [eax+eax*4]     ;; y*5
>=20
>    lea ebx, [eax+ebx*8]	;; y*41
>=20
>    shr ebx,12
>=20
>=20
>=20
> The result is exact for all possible inputs in the 0..400 range which is=
=20
>=20
> what I need at this point.
>=20
>=20
>=20
> Terje
>=20
>=20
>=20
> --=20
>=20
> - <Terje.Mathisen at tmsw.no>
>=20
> "almost all programming can be viewed as an exercise in caching"