From: Thomas Koenig
Newsgroups: comp.compilers
Subject: Re: Interpreters and caller-saved registers
Date: Sun, 15 Oct 2023 19:52:45 -0000
Message-ID: <23-10-002@comp.compilers>
References: <23-10-001@comp.compilers>

[Replying to comp.compilers as this is more pertinent there]

anton@mips.complang.tuwien.ac.at wrote:
> asm("":"=X"(s2))
>
> This tells gcc that the asm statement writes to s2, and thus kills it,
> but it actually does not generate any assembly language.
[...]
> Unfortunately, gcc-11.4 also introduced two additional redundant move
> instructions in every VM instruction, and Bernd Paysan reported that
> gcc-12 and gcc-13 introduced even more superfluous code in every VM
> instruction.

It is well known that compilers in general, and gcc specifically, often generate superfluous register moves; there are quite a few PRs in gcc's bug database about this. I have submitted a few of them myself, such as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111373, which includes compiler-generated code like

    movq    %rdx, %rsi
    movq    %rax, %rdx
    movq    %rcx, 8(%rdi)
    movq    %rsi, %rax
    movq    %rdx, 16(%rdi)
    movq    %rax, (%rdi)
    ret

where it is obvious to anybody who can read assembly that the register moves are unneeded (although they are likely to be zero-cycle operations because of register renaming).
However, if this got worse between releases, it is a regression, and regressions get higher priority for fixing. So, if it is feasible to generate a reduced test case (cvise, for example, is an excellent tool for this), filing a bug report would be a good thing.

> This is similar to what we have seen from gcc-3.0 for
> Gforth at that time, and what we have seen from clang last we tried
> it.
>
> I tried to work around this issue by having the kills only at the end
> of VM instructions that perform a call, and indeed, that worked for
> gcc-11.4. However, gcc-12 and gcc-13 still produced bad code.
>
> Finally Bernd Paysan had the right idea and added -fno-tree-vectorize
> to the list of options that we use to avoid gcc shenanigans, and now
> we can also use this idea with gcc-12 and gcc-13.

That is strange, and it would give valuable hints for investigating this regression.

This sort of code is an example of the contradictions in today's compiler technology. On the one hand, compilers perform amazing optimizations on large amounts of code, at a level no programmer could hope to match while staying productive. On the other hand, it is very common to see glaring inefficiencies when one looks at even small chunks of code. (A good assembler programmer can often beat compiler-generated code by a factor of two or more, especially if SIMD is involved, but SIMD is really hard to generate code for.)

So far, nobody has found an algorithm to "just remove the silliness" from compiled programs. Maybe it would be feasible to run some peephole optimization as a last pass, which could improve code like the example above, but that might also be difficult in the more general case where registers are reused in other basic blocks (which would amount to redoing the register allocation). So, there is still work to do...