From: Thomas Koenig <tkoenig@netcologne.de>
Newsgroups: comp.compilers
Subject: Re: Interpreters and caller-saved registers
Date: Sun, 15 Oct 2023 19:52:45 -0000
Organization: Compilers Central
Sender: johnl%iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <23-10-002@comp.compilers>
References: <23-10-001@comp.compilers>
Keywords: interpreter, optimize
Posted-Date: 15 Oct 2023 15:54:53 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com

[Replying to comp.compilers as this is more pertinent there]

anton@mips.complang.tuwien.ac.at <anton@mips.complang.tuwien.ac.at> wrote:

> asm("":"=X"(s2))
>
> This tells gcc that the asm statement writes to s2, and thus kills it,
> but it actually does not generate any assembly language.
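
For readers who have not seen this idiom, here is a minimal sketch of
how I read it, in C.  The names KILL, slow_path and vm_call_insn are
made up for illustration and are not taken from Gforth; the reload
after the kill only stands in for whatever the next VM instruction
would do.

```c
#include <assert.h>
#include <stddef.h>

/* Tell gcc that `var` is written here, so its previous value is dead.
   The template is empty, so no instruction is emitted.
   (KILL is a hypothetical name for this sketch.) */
#define KILL(var) __asm__("" : "=X"(var))

/* Hypothetical out-of-line helper that a VM instruction might call. */
long slow_path(long x)
{
    return 2 * x + 1;
}

/* Hypothetical VM instruction body.  `s2` models a cached stack item
   whose old value the VM no longer needs once the call has been made. */
long vm_call_insn(const long *sp)
{
    assert(sp != NULL);
    long s2 = sp[1];            /* cached stack item */
    long r = slow_path(sp[0]);  /* the call */
    KILL(s2);                   /* old value of s2 is dead from here on */
    s2 = sp[1];                 /* reload from memory before using it again */
    return r + s2;
}
```

In a real threaded-code interpreter the variable would stay live
across the dispatch to the next VM instruction, which is where the
kill makes a difference; this small function only shows where the
statement goes.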

[...]

> Unfortunately, gcc-11.4 also introduced two additional redundant move
> instructions in every VM instruction, and Bernd Paysan reported that
> gcc-12 and gcc-13 introduced even more superfluous code in every VM
> instruction.

It is well known that compilers in general, and gcc specifically, often
generate superfluous register moves.  There are quite a few PRs about
this in gcc's bug database; I have submitted a few of them myself,
such as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111373, which
includes compiler-generated code like

        movq    %rdx, %rsi
        movq    %rax, %rdx
        movq    %rcx, 8(%rdi)
        movq    %rsi, %rax
        movq    %rdx, 16(%rdi)
        movq    %rax, (%rdi)
        ret

where it is obvious to anybody who can read assembly that the
register moves are unneeded (although they are likely to
be zero-cycle operations because of register renaming).

However, if this got worse between releases, it is a regression,
and regressions get higher priority for fixing.  So, if it is feasible
to generate a reduced test case (for which cvise, for example,
is an excellent tool), filing a bug report would be a good thing.

> This is similar to what we have seen from gcc-3.0 for
> Gforth at that time, and what we have seen from clang last we tried
> it.

> I tried to work around this issue by having the kills only at the end
> of VM instructions that perform a call, and indeed, that worked for
> gcc-11.4.  However, gcc-12 and gcc-13 still produced bad code.
> Finally Bernd Paysan had the right idea and added -fno-tree-vectorize
> to the list of options that we use to avoid gcc shenanigans, and now
> we can also use this idea with gcc-12 and gcc-13.

That is strange.  The fact that -fno-tree-vectorize makes a difference
would be a valuable hint for investigating this regression.

This sort of code is an example of the contradictions in today's
compiler technology.  On the one hand, compilers perform amazing
optimizations on large amounts of code, which no programmer could hope
to match while staying productive.  On the other hand, it is very
common to see glaring inefficiencies when one looks at even small
chunks of code.

(A good assembly programmer can often beat compiler-generated
code by a factor of two or more, especially if SIMD is involved,
but then SIMD is really hard to generate code for.)

So far, nobody has found an algorithm that would "just remove the
silliness" from compiled programs.  Maybe it would be feasible to
run some peephole optimization as a last pass, which could improve
code like the example above, but that might also be difficult in the
more general case where registers are reused in other basic blocks
(which would amount to redoing the register allocation).
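
To make the idea concrete, here is a minimal sketch in C of such a
late pass for the straight-line sequence from PR 111373 quoted
earlier.  It is an illustration only and not anything gcc does: the
insn type, the register enum, forward_copies and print_block are all
made up for this example.  It assumes that the block contains nothing
but register-to-register moves and stores, and that no register is
live after the block.  The second assumption is exactly the
information that a real pass would have to get from the other basic
blocks.

```c
#include <assert.h>
#include <stdio.h>

enum reg { RAX, RCX, RDX, RSI, RDI, NREGS };
static const char *const reg_name[NREGS] = {
    "%rax", "%rcx", "%rdx", "%rsi", "%rdi"
};

enum kind { MOVE, STORE };  /* MOVE: dst = src;  STORE: mem[dst + off] = src */

struct insn {
    enum kind kind;
    enum reg src;
    enum reg dst;  /* destination register (MOVE) or base register (STORE) */
    int off;       /* displacement, used by STORE only */
};

/* Forward register copies into the stores and drop the moves.
   ASSUMPTION: every register is dead at the end of the block, so no
   move has to be kept.  Rewrites `code` in place and returns the new
   instruction count. */
int forward_copies(struct insn *code, int n)
{
    enum reg origin[NREGS];  /* origin[r]: register whose entry value r holds */
    int out = 0;

    for (int r = 0; r < NREGS; r++)
        origin[r] = (enum reg)r;

    for (int i = 0; i < n; i++) {
        struct insn in = code[i];
        assert(in.src < NREGS && in.dst < NREGS);
        if (in.kind == MOVE) {
            /* Record the copy but emit nothing. */
            origin[in.dst] = origin[in.src];
        } else {
            /* No move is ever emitted, so every register still holds
               its entry value and can be used directly. */
            in.src = origin[in.src];
            in.dst = origin[in.dst];
            code[out++] = in;
        }
    }
    return out;
}

/* Print a block in AT&T syntax. */
void print_block(const struct insn *code, int n)
{
    for (int i = 0; i < n; i++) {
        const struct insn *in = &code[i];
        if (in->kind == MOVE)
            printf("\tmovq\t%s, %s\n", reg_name[in->src], reg_name[in->dst]);
        else if (in->off != 0)
            printf("\tmovq\t%s, %d(%s)\n", reg_name[in->src], in->off,
                   reg_name[in->dst]);
        else
            printf("\tmovq\t%s, (%s)\n", reg_name[in->src], reg_name[in->dst]);
    }
}
```

Fed the six movq instructions from above, forward_copies leaves three
stores: %rcx to 8(%rdi), %rax to 16(%rdi) and %rdx to (%rdi).  As soon
as one of the registers is live on exit, this simple scheme is no
longer valid, which is the difficulty mentioned above.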

So, still work to do...