Path: csiph.com!weretis.net!feeder9.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: anton@mips.complang.tuwien.ac.at
Newsgroups: comp.compilers
Subject: Re: Paper: dividing by seven
Date: Mon, 13 Apr 2026 05:48:46 +0000
Organization: Compilers Central
Sender: johnl%iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <26-04-003@comp.compilers>
References: <26-04-002@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="89953"; mail-complaints-to="abuse@iecc.com"
Keywords: code, optimize, paper
Posted-Date: 13 Apr 2026 20:08:38 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:3727

John R Levine <johnl@taugh.com> writes:
>Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets
>
>Shigeo Mitsunari, Takashi Hoshino
>
>Granlund and Montgomery proposed an optimization method for unsigned
>integer division by constants [3]. Their method (called the GM method in
>this paper) was further improved in part by works such as [1] and [7], and
>is now adopted by major compilers including GCC, Clang, Microsoft
>Compiler, and Apple Clang. However, for example, for x/7, the generated
>code is designed for 32-bit CPUs and therefore does not fully exploit
>64-bit capabilities. This paper proposes an optimization method for 32-bit
>unsigned division by constants targeting 64-bit CPUs. We implemented
>patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X
>(Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the
>microbenchmark described later. The LLVM patch has already been merged
>into llvm:main [6], demonstrating the practical applicability of the
>proposed method.
>
>https://arxiv.org/abs/2604.07902

In a paper [ertl19kps] I looked at full-width division by multiplying
with the double-width (128 bits on 64-bit targets) reciprocal , and
that gave a small reduction in latency (over the conventional approach
of using a single-width multiplication with corrections) in some
unsigned cases (e.g., n/7).  I did not look into half-width division,
because the programming language I am interested in does not have
half-width arithmetics, but of course one can apply the same idea in
this context by multiplying with the full-width reciprocal.  It's
interesting that the benefits are much higher in this context.

@InProceedings{ertl19kps,
  author =       {M. Anton Ertl},
  title =        {Integer Division by Multiplying with the
                  Double-Width Reciprocal},
  crossref =     {kps19},
  pages =        {75--84},
  url =          {https://www.complang.tuwien.ac.at/papers/ertl19kps.pdf},
  url-slides =   {https://www.complang.tuwien.ac.at/papers/ertl19kps-slides.pdf},
  abstract =     {Earlier work on integer division by multiplying with
                  the reciprocal has focused on multiplying with a
                  single-width reciprocal, combined with a correction
                  and followed by a shift.  The present work explores
                  using a double-width reciprocal to allow getting rid
                  of the correction and shift.}
}

@Proceedings{kps19,
  title =        {20. Kolloquium Programmiersprachen und Grundlagen
                  der Programmierung (KPS)},
  booktitle =    {20. Kolloquium Programmiersprachen und Grundlagen
                  der Programmierung (KPS)},
  year =         {2019},
  key =          {kps19},
  editor =       {Martin Pl\"umicke and Fayez Abu Alia},
  url =          {https://www.hb.dhbw-stuttgart.de/kps2019/kps2019_Tagungsband.pdf}
}

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/