X-Received: by 10.236.222.166 with SMTP id t36mr7335583yhp.24.1413360551213; Wed, 15 Oct 2014 01:09:11 -0700 (PDT) X-Received: by 10.140.34.231 with SMTP id l94mr9311qgl.10.1413360551175; Wed, 15 Oct 2014 01:09:11 -0700 (PDT) Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!usenet.blueworldhosting.com!feeder01.blueworldhosting.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!s7no2845961qap.0!news-out.google.com!i10ni86qaf.0!nntp.google.com!s7no2845960qap.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail Newsgroups: comp.arch.arithmetic Date: Wed, 15 Oct 2014 01:09:11 -0700 (PDT) In-Reply-To: Complaints-To: groups-abuse@google.com Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=5.55.57.237; posting-account=lD5X3AoAAAB2_KDReA0WuoP_A_fBgycC NNTP-Posting-Host: 5.55.57.237 References: <74d951ee-6fd2-43ad-8f0a-2852a031bd2c@googlegroups.com> User-Agent: G2/1.0 MIME-Version: 1.0 Message-ID: Subject: Re: New appromixation for integer division by 63 or 127 From: nikolaos.kavvadias@gmail.com Injection-Date: Wed, 15 Oct 2014 08:09:11 +0000 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Lines: 141 Xref: csiph.com comp.arch.arithmetic:79 Hi Terje, indeed, my formula: - will actually work for any divisor of the form 2^n-1 - can be simplified to (x + (x>>n) + 1)>>n - works for a limited range: x should be in the closed interval: [0, 2^(2n)= -2]=20 I understand this is not new, but it has never been properly documented. I have spent quite some time tracking a textbook (or paper) on this. Implic= itly, yes, it's there, it can be inferred; but no direct reference to it. BTW pkhuong contributed a nice number-theoretical proof for the formula reg= arding a somewhat larger range here:=20 http://www.reddit.com/r/programming/comments/2j18nt/multiplierless_hack_for= _constant_division_by_2n1/ I also have a blog article on the topic: - http://nkavvadias.com/blog/2014/10/11/multless-kdiv/ Further, it is useful and even more portable to have the formula working in= any high-level language, I prefer it compared to x86 or ARM assembly trick= ery :). Your div-by-100 using leas for scaled multiplication is nice and yo= u can do a similar thing with ARM's addressing modes. I have written a tool for generating optimum code for ANSI C and "generic a= ssembly". It is named kdiv and you can find it here: http://github.com/nkka= v/kdiv=20 It is based on Hacker's Delight' ideas and code for calculating the magic n= umber (multiplicative inverse) There is also kmul, its sister (or brother?) project here: http://github.co= m/nkkav/kmul . This is based on Preston Briggs' code for Bernstein's algori= thm on multiplication by integer constants.=20 Best regards Nikolaos Kavvadias http://www.nkavvadias.com =CE=A4=CE=B7 =CE=94=CE=B5=CF=85=CF=84=CE=AD=CF=81=CE=B1, 13 =CE=9F=CE=BA=CF= =84=CF=89=CE=B2=CF=81=CE=AF=CE=BF=CF=85 2014 11:20:38 =CF=80.=CE=BC. UTC+3,= =CE=BF =CF=87=CF=81=CE=AE=CF=83=CF=84=CE=B7=CF=82 Terje Mathisen =CE=AD=CE= =B3=CF=81=CE=B1=CF=88=CE=B5: >=20 > > I believe I have invented a new approximation for quotient >=20 > > calculation that works for x/63 and x/127. >=20 > > >=20 > > The formula is as follows and it is divisionless and multiplierless: >=20 > > >=20 > > y =3D (((x>>n)+x+((1<>n)-1; >=20 > > >=20 > > Use n=3D6 for 63, and n=3D7 for 127. 1<=20 > > for calculating 2^n. >=20 >=20 >=20 > Nothing new here.:-) >=20 >=20 >=20 > Instead of dividing by 63 you are multiplying by 65, and since=20 >=20 > (x+1)(x-1) =3D x^2 - 1, the error is manageable. >=20 >=20 >=20 > Effectively you are doing reciprocal mul with a multiplier with very few= =20 >=20 > significant bits. >=20 >=20 >=20 > This also means that your formula only works for a limited range of=20 >=20 > inputs (i.e. number of bits in x), and it fails badly when (for a 32-bit= =20 >=20 > number) x is so close to 2^32 that adding x/64 to it causes a carry. >=20 >=20 >=20 > BTW, I use a similar trick in my data calculations, when I need to=20 >=20 > divide a small range of inputs by 100 in order to convert a year number= =20 >=20 > to century: >=20 >=20 >=20 > // y100 =3D y / 100; // Reciprocal mul >=20 > y100 =3D y * 41 >> 12; // Faster version with very short reciprocal >=20 >=20 >=20 > The multiplier has just three bits set (101001) so it can be calculated= =20 >=20 > with a pair of LEAs: >=20 >=20 >=20 > lea ebx, [eax+eax*4] ;; y*5 >=20 > lea ebx, [eax+ebx*8] ;; y*41 >=20 > shr ebx,12 >=20 >=20 >=20 > The result is exact for all possible inputs in the 0..400 range which is= =20 >=20 > what I need at this point. >=20 >=20 >=20 > Terje >=20 >=20 >=20 > --=20 >=20 > - >=20 > "almost all programming can be viewed as an exercise in caching"