Path: csiph.com!weretis.net!feeder9.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: anton@mips.complang.tuwien.ac.at Newsgroups: comp.compilers Subject: Re: Paper: dividing by seven Date: Mon, 13 Apr 2026 05:48:46 +0000 Organization: Compilers Central Sender: johnl%iecc.com Approved: comp.compilers@iecc.com Message-ID: <26-04-003@comp.compilers> References: <26-04-002@comp.compilers> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="89953"; mail-complaints-to="abuse@iecc.com" Keywords: code, optimize, paper Posted-Date: 13 Apr 2026 20:08:38 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:3727 John R Levine writes: >Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets > >Shigeo Mitsunari, Takashi Hoshino > >Granlund and Montgomery proposed an optimization method for unsigned >integer division by constants [3]. Their method (called the GM method in >this paper) was further improved in part by works such as [1] and [7], and >is now adopted by major compilers including GCC, Clang, Microsoft >Compiler, and Apple Clang. However, for example, for x/7, the generated >code is designed for 32-bit CPUs and therefore does not fully exploit >64-bit capabilities. This paper proposes an optimization method for 32-bit >unsigned division by constants targeting 64-bit CPUs. We implemented >patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X >(Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the >microbenchmark described later. The LLVM patch has already been merged >into llvm:main [6], demonstrating the practical applicability of the >proposed method. > >https://arxiv.org/abs/2604.07902 In a paper [ertl19kps] I looked at full-width division by multiplying with the double-width (128 bits on 64-bit targets) reciprocal , and that gave a small reduction in latency (over the conventional approach of using a single-width multiplication with corrections) in some unsigned cases (e.g., n/7). I did not look into half-width division, because the programming language I am interested in does not have half-width arithmetics, but of course one can apply the same idea in this context by multiplying with the full-width reciprocal. It's interesting that the benefits are much higher in this context. @InProceedings{ertl19kps, author = {M. Anton Ertl}, title = {Integer Division by Multiplying with the Double-Width Reciprocal}, crossref = {kps19}, pages = {75--84}, url = {https://www.complang.tuwien.ac.at/papers/ertl19kps.pdf}, url-slides = {https://www.complang.tuwien.ac.at/papers/ertl19kps-slides.pdf}, abstract = {Earlier work on integer division by multiplying with the reciprocal has focused on multiplying with a single-width reciprocal, combined with a correction and followed by a shift. The present work explores using a double-width reciprocal to allow getting rid of the correction and shift.} } @Proceedings{kps19, title = {20. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS)}, booktitle = {20. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS)}, year = {2019}, key = {kps19}, editor = {Martin Pl\"umicke and Fayez Abu Alia}, url = {https://www.hb.dhbw-stuttgart.de/kps2019/kps2019_Tagungsband.pdf} } - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/