Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #3726 > unrolled thread

Paper: dividing by seven

Started byJohn R Levine <johnl@taugh.com>
First post2026-04-10 12:04 -0400
Last post2026-04-19 14:30 +0000
Articles 4 — 3 participants

Back to article view | Back to comp.compilers


Contents

  Paper: dividing by seven John R Levine <johnl@taugh.com> - 2026-04-10 12:04 -0400
    Re: Paper: dividing by seven anton@mips.complang.tuwien.ac.at - 2026-04-13 05:48 +0000
    Re: Paper: dividing by seven robin51@dodo.com.au - 2026-04-19 00:19 +1000
      Re: Paper: dividing by seven anton@mips.complang.tuwien.ac.at - 2026-04-19 14:30 +0000

#3726 — Paper: dividing by seven

FromJohn R Levine <johnl@taugh.com>
Date2026-04-10 12:04 -0400
SubjectPaper: dividing by seven
Message-ID<26-04-002@comp.compilers>
Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets

Shigeo Mitsunari, Takashi Hoshino

Granlund and Montgomery proposed an optimization method for unsigned
integer division by constants [3]. Their method (called the GM method in
this paper) was further improved in part by works such as [1] and [7], and
is now adopted by major compilers including GCC, Clang, Microsoft
Compiler, and Apple Clang. However, for example, for x/7, the generated
code is designed for 32-bit CPUs and therefore does not fully exploit
64-bit capabilities. This paper proposes an optimization method for 32-bit
unsigned division by constants targeting 64-bit CPUs. We implemented
patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X
(Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the
microbenchmark described later. The LLVM patch has already been merged
into llvm:main [6], demonstrating the practical applicability of the
proposed method.

https://arxiv.org/abs/2604.07902

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly

[toc] | [next] | [standalone]


#3727

Fromanton@mips.complang.tuwien.ac.at
Date2026-04-13 05:48 +0000
Message-ID<26-04-003@comp.compilers>
In reply to#3726
John R Levine <johnl@taugh.com> writes:
>Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets
>
>Shigeo Mitsunari, Takashi Hoshino
>
>Granlund and Montgomery proposed an optimization method for unsigned
>integer division by constants [3]. Their method (called the GM method in
>this paper) was further improved in part by works such as [1] and [7], and
>is now adopted by major compilers including GCC, Clang, Microsoft
>Compiler, and Apple Clang. However, for example, for x/7, the generated
>code is designed for 32-bit CPUs and therefore does not fully exploit
>64-bit capabilities. This paper proposes an optimization method for 32-bit
>unsigned division by constants targeting 64-bit CPUs. We implemented
>patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X
>(Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the
>microbenchmark described later. The LLVM patch has already been merged
>into llvm:main [6], demonstrating the practical applicability of the
>proposed method.
>
>https://arxiv.org/abs/2604.07902

In a paper [ertl19kps] I looked at full-width division by multiplying
with the double-width (128 bits on 64-bit targets) reciprocal , and
that gave a small reduction in latency (over the conventional approach
of using a single-width multiplication with corrections) in some
unsigned cases (e.g., n/7).  I did not look into half-width division,
because the programming language I am interested in does not have
half-width arithmetics, but of course one can apply the same idea in
this context by multiplying with the full-width reciprocal.  It's
interesting that the benefits are much higher in this context.

@InProceedings{ertl19kps,
  author =       {M. Anton Ertl},
  title =        {Integer Division by Multiplying with the
                  Double-Width Reciprocal},
  crossref =     {kps19},
  pages =        {75--84},
  url =          {https://www.complang.tuwien.ac.at/papers/ertl19kps.pdf},
  url-slides =   {https://www.complang.tuwien.ac.at/papers/ertl19kps-slides.pdf},
  abstract =     {Earlier work on integer division by multiplying with
                  the reciprocal has focused on multiplying with a
                  single-width reciprocal, combined with a correction
                  and followed by a shift.  The present work explores
                  using a double-width reciprocal to allow getting rid
                  of the correction and shift.}
}

@Proceedings{kps19,
  title =        {20. Kolloquium Programmiersprachen und Grundlagen
                  der Programmierung (KPS)},
  booktitle =    {20. Kolloquium Programmiersprachen und Grundlagen
                  der Programmierung (KPS)},
  year =         {2019},
  key =          {kps19},
  editor =       {Martin Pl\"umicke and Fayez Abu Alia},
  url =          {https://www.hb.dhbw-stuttgart.de/kps2019/kps2019_Tagungsband.pdf}
}

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]


#3728

Fromrobin51@dodo.com.au
Date2026-04-19 00:19 +1000
Message-ID<26-04-004@comp.compilers>
In reply to#3726
Might be useful to look at Hacker's Delight (H. S. Warren, Jr).

On 11/04/2026 02:04, John R Levine wrote:
> Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets
>
> Shigeo Mitsunari, Takashi Hoshino
> ...
> https://arxiv.org/abs/2604.07902

[It's an amazing book that belongs on every compiler developer's shelf. Here's a
version of the first edition, of dubious legality, at the Internet archive:
https://dn710003.ca.archive.org/0/items/random-papers9/Henry%20S.%20Warren%20-%20Hacker%27s%20delight-Addison-Wesley%20%282003%29.pdf
-John]

[toc] | [prev] | [next] | [standalone]


#3729

Fromanton@mips.complang.tuwien.ac.at
Date2026-04-19 14:30 +0000
Message-ID<26-04-005@comp.compilers>
In reply to#3728
robin51@dodo.com.au writes:
>Might be useful to look at Hacker's Delight (H. S. Warren, Jr).

I looked at the first edition of this book for my paper
<https://www.complang.tuwien.ac.at/papers/ertl19kps.pdf>, but its
Chapter 10 only got an honourable mention at the end of my
Related-Work section; there is a second edition of this book, however.

In any case, I wrote at the start of my related-work section:

|If you read only one paper about the topic, my recommendation is
|Robison���s [Rob05].

@InProceedings{robison05,
  author =	"Arch D. Robison",
  title =	"{$N$}-Bit Unsigned Division Via {$N$}-Bit
		 Multiply-Add",
  OPTeditor =	"Paolo Montuschi and Eric (Eric Mark) Schwarz",
  booktitle =	"{Proceedings of the 17th IEEE Symposium on Computer
		 Arithmetic (ARITH-17)}",
  publisher =	"IEEE Computer Society Press",
  ISBN = 	"0-7695-2366-8",
  ISBN-13 =	"978-0-7695-2366-8",
  year = 	"2005",
  bibdate =	"Wed Jun 22 07:02:55 2005",
  bibsource =	"http://www.math.utah.edu/pub/tex/bib/fparith.bib",
  URL =  	"http://www.acsel-lab.com/arithmetic/arith17/papers/ARITH17_Robison.pdf",
  abstract =	"Integer division on modern processors is expensive
		 compared to multiplication. Previous algorithms for
		 performing unsigned division by an invariant divisor,
		 via reciprocal approximation, suffer in the worst case
		 from a common requirement for $ n + 1 $ bit
		 multiplication, which typically must be synthesized
		 from $n$-bit multiplication and extra arithmetic
		 operations. This paper presents, and proves, a hybrid
		 of previous algorithms that replaces $ n + 1 $ bit
		 multiplication with a single fused multiply-add
		 operation on $n$-bit operands, thus reducing any
		 $n$-bit unsigned division to the upper $n$ bits of a
		 multiply-add, followed by a single right shift. An
		 additional benefit is that the prerequisite
		 calculations are simple and fast. On the Itanium 2
		 processor, the technique is advantageous for as few as
		 two quotients that share a common run-time divisor.",
  acknowledgement = "Nelson H. F. Beebe, University of Utah, Department
		 of Mathematics, 110 LCB, 155 S 1400 E RM 233, Salt Lake
		 City, UT 84112-0090, USA, Tel: +1 801 581 5254, FAX: +1
		 801 581 4148, e-mail: \path|beebe@math.utah.edu|,
		 \path|beebe@acm.org|, \path|beebe@computer.org|
		 (Internet), URL:
		 \path|http://www.math.utah.edu/~beebe/|",
  keywords =	"ARITH-17",
  pagecount =	"9",
}

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [standalone]


Back to top | Article view | comp.compilers


csiph-web