Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: antispam@math.uni.wroc.pl
Newsgroups: comp.compilers,comp.arch
Subject: Re: Assembling span-dependent instructions
Date: Thu, 28 Jul 2022 12:15:14 -0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 58
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-055@comp.compilers>
References: <22-07-049@comp.compilers> <22-07-052@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="49930"; mail-complaints-to="abuse@iecc.com"
Keywords: optimize, assembler
Posted-Date: 29 Jul 2022 16:38:22 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:3142 comp.arch:76960

In comp.arch Kaz Kylheku <480-992-1380@kylheku.com> wrote:
> On 2022-07-27, Anton Ertl wrote:
> > However, one can also construct cases where making the code larger can
> > reduce the minimum size of the immediate operand, e.g.:
> >
> > foo:
> >         movl    foo+133-bar(%rdi),%eax
> > bar:
>
> That's weird; what is accessed this way, relative to the code,
> and does it occur in compiler output?

Code like this may appear due to alignment, say a jump to a page or
cache line boundary. In a realistic situation one faces a much more
complex problem. Namely, on many architectures the best way to provide
constant arguments is to store the constants in memory. This leads to
"constant pools" and the problem of where to place them. One wants
constant pools as close as possible to the code, so that short offsets
can be used to access them. But for performance reasons it is desirable
to put constants in separate cache lines. Also, one needs jumps to jump
around constant pools. Some jumps occur naturally in the program, and
it is good to re-use them. But then parts of cache lines may go unused
(both for code and for constant pools).
So one needs to balance the loss due to unused parts of cache lines
(probably the dominant factor), the length of instructions, and
possible overhead due to extra instructions.

There is an extra complication when the machine has a limited range of
offsets that can be used in a single instruction: when the needed
offset exceeds the allowed range, one has to switch to an indirect form,
which needs a free register. So there is an extra interaction with
register allocation. This is particularly nasty if one needs a free
register and wants to spill some other register, but the address
constants exceed the length allowed in an instruction, so in order to
free a register one already needs a free register to hold the address
constant.

i386 allows a rather easy solution to such problems, because loads and
jumps can reach any location and most instructions accept 32-bit
constant (immediate) arguments. On x86_64 (that is, in 64-bit mode) the
situation is more interesting, because immediates and offsets are
limited to 32 bits, so one can no longer reach the whole memory. But
current practice makes it easy, though wasteful: each part of a program
(the main executable and each shared library) is limited to 2G, so that
32-bit offsets are enough, and accesses to other parts are indirect.

32-bit ARM has most of these problems: offsets are quite limited and
most constants need to be loaded from memory. ARM instructions are of
fixed length, but when an offset is too big to fit into a single
instruction one has to use an alternative sequence of several
instructions (and deal with register allocation).

Z architecture (modern versions of IBM 360) has such problems too:
there are variants of instructions with different lengths, but even the
longest variant has a limited range of available offsets. At least some
versions of Z architecture had a severe penalty for simultaneously
accessing the same cache line for instruction fetch and data access, so
putting constant pools in separate cache lines was very important.

-- 
                              Waldek Hebisch
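
As an illustration of the simplest form of the problem in the subject
line (choosing between a short and a long encoding when the offset
range is limited), here is a minimal sketch of a "start short, grow
when out of range" pass. Everything in it is an assumption made for
the example: the item format, the name relax, and the sizes 2 and 5
with a signed 8-bit short range, which are loosely modelled on x86
jmp rel8 / jmp rel32. It ignores constant pools, alignment and
register allocation, and it does not try to handle cases like the
quoted foo+133-bar example, where larger code can allow a smaller
operand.

```python
# Hypothetical encoding sizes, loosely modelled on x86 jmp rel8 / jmp rel32.
SHORT, LONG = 2, 5
SHORT_MIN, SHORT_MAX = -128, 127


def initial_size(item):
    """Optimistic starting size: every jump is assumed to be short."""
    kind = item[0]
    if kind == "jmp":
        return SHORT
    if kind == "op":
        return item[1]      # fixed-size instruction or data
    return 0                # a label occupies no space


def relax(items):
    """items is a list of ("op", size), ("label", name) or ("jmp", target).

    Returns (sizes, labels): the size chosen for each item and the final
    address of each label. Jumps only ever grow, so the loop terminates.
    """
    sizes = [initial_size(it) for it in items]
    while True:
        # Lay out the code using the current size assumptions.
        addr, addrs, labels = 0, [], {}
        for it, size in zip(items, sizes):
            addrs.append(addr)
            if it[0] == "label":
                labels[it[1]] = addr
            addr += size

        # Grow every short jump whose displacement no longer fits.
        changed = False
        for i, it in enumerate(items):
            if it[0] == "jmp" and sizes[i] == SHORT:
                # Displacement is measured from the end of the jump.
                disp = labels[it[1]] - (addrs[i] + sizes[i])
                if not SHORT_MIN <= disp <= SHORT_MAX:
                    sizes[i] = LONG
                    changed = True
        if not changed:
            return sizes, labels


if __name__ == "__main__":
    program = [
        ("jmp", "near"),    # fits in a short jump only while the next jump is short
        ("jmp", "far"),     # out of short range from the start
        ("op", 125),
        ("label", "near"),
        ("op", 10),
        ("label", "far"),
    ]
    print(relax(program))
```

In this example the second jump has to grow first, and that growth
pushes the first jump out of range on the next pass, which is why the
layout has to be repeated until nothing changes.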