From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Is This a Dumb Idea? parallelizing byte codes
Date: Fri, 28 Oct 2022 17:06:55 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-10-062@comp.compilers>
References: <22-10-046@comp.compilers> <22-10-048@comp.compilers> <22-10-056@comp.compilers> <22-10-059@comp.compilers>
Keywords: interpreter, optimize
Posted-Date: 30 Oct 2022 00:50:20 EDT

Alain Ketterlin <alain@universite-de-strasbourg.fr> writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>
>> Alain Ketterlin <alain@universite-de-strasbourg.fr> writes:
>>>I've heard/read several times that byte-code micro-optimizations are not
>>>worth the trouble.
...
>This is not directly related to the paper I mention later. I was talking
>about optimizing bytecode vs. compiler optimizations. I know of no
>interpreter doing elaborate static byte-code optimization.

If I understand you correctly, you mean optimizations that the
compiler that generates "byte code" performs, e.g., stuff like partial
redundancy elimination.
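
To make "optimizing at the VM-code level" concrete, here is a minimal
sketch of a far simpler optimization than partial redundancy
elimination: constant folding over straight-line stack-VM code.  The
toy VM, its opcodes, and the name fold_constants are all assumptions
made up for this illustration; they do not come from any system
discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy stack VM; opcodes are invented for illustration.
   OP_PUSH is followed by one operand word, the others have none. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Copy n words of VM code from in to out, rewriting
       PUSH a; PUSH b; ADD   (or MUL)
   into
       PUSH (a op b).
   Returns the number of words written; out needs room for n words.
   Assumes straight-line code (this toy VM has no branches, so no
   branch targets can point into a folded sequence) and ignores
   overflow of the folded result. */
size_t fold_constants(const long *in, size_t n, long *out)
{
    size_t i = 0, o = 0;
    size_t pushes = 0;   /* consecutive PUSHes at the end of out */

    while (i < n) {
        long op = in[i++];
        if (op == OP_PUSH) {
            assert(i < n);           /* operand must be present */
            out[o++] = OP_PUSH;
            out[o++] = in[i++];
            pushes++;
        } else if ((op == OP_ADD || op == OP_MUL) && pushes >= 2) {
            long b = out[o - 1];     /* operand of the last PUSH */
            long a = out[o - 3];     /* operand of the PUSH before it */
            o -= 2;                  /* drop "PUSH b" */
            out[o - 1] = (op == OP_ADD) ? a + b : a * b;
            pushes--;                /* two PUSHes became one */
        } else {
            out[o++] = op;           /* anything else is copied as-is */
            pushes = 0;
        }
    }
    return o;
}
```

For the input PUSH 2; PUSH 3; ADD; PUSH 4; MUL; HALT this produces
PUSH 20; HALT.  A real VM-code optimizer would also have to deal with
control flow and with the VM's exact arithmetic semantics.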

I expect that these optimizations are as effective for virtual machine
code as for native (i.e., real-machine) code, but if you want to go to
these lengths, you use a native-code compiler.  And for systems that
use a JIT compiler (i.e., a two-stage process: source -> VM (aka byte
code) -> native code), the preferred place for putting these
optimizations is the second stage (probably because that allows
optimization decisions to take the target machine into account).
There have been some efforts to perform analysis at the source code
level (or anyway, before JIT compilation), and to embed the results as
an optional component in the .class file to speed up JIT compilation,
but has this made it into production systems?

Otherwise: I dimly remember optimizations by Prolog compilers that
generate WAM (Warren abstract machine) code.

>>>https://ieeexplore.ieee.org/document/7054191

https://hal.inria.fr/hal-01100647/document

>I'm glad it works for you.

What's "it"?  Anyway, you miss the point: The paper suggests that one
should just write a switch-based interpreter and that more advanced
techniques are no longer needed.  My results disprove this, on the
same hardware that they base their claims on.  Branch mispredictions
may play a smaller role now than they used to, but apparently there
are other reasons that make the more advanced techniques still very
profitable.
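
For readers who have not seen the two dispatch styles side by side,
here is a minimal sketch.  It is not code from the paper or from my
measurements; the toy VM, its opcodes, and the names run_switch,
run_threaded and demo_prog are assumptions invented for illustration.
The second function uses the GNU C labels-as-values extension, so it
needs GCC or Clang.  It looks labels up in a table (token threading);
direct threaded code would store the label addresses in the VM code
itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy stack VM; OP_PUSH takes one operand word. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };
#define STACK_SIZE 64

/* Sample program computing (2 + 3) * 4. */
const long demo_prog[] = {
    OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT
};

/* Switch dispatch: all VM instructions share the single dispatch
   point that the compiler generates for the switch (typically one
   indirect branch through a jump table). */
long run_switch(const long *code)
{
    long stack[STACK_SIZE];
    size_t sp = 0;
    const long *ip = code;

    for (;;) {
        switch (*ip++) {
        case OP_PUSH:
            assert(sp < STACK_SIZE);
            stack[sp++] = *ip++;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}

/* Threaded-style dispatch (GNU C extension): every VM instruction
   ends with its own indirect branch to the next one. */
long run_threaded(const long *code)
{
    static void *const labels[OP_COUNT] = {
        [OP_PUSH] = &&do_push, [OP_ADD] = &&do_add,
        [OP_MUL] = &&do_mul,   [OP_HALT] = &&do_halt
    };
    long stack[STACK_SIZE];
    size_t sp = 0;
    const long *ip = code;

#define NEXT() do { assert(*ip >= 0 && *ip < OP_COUNT); \
                    goto *labels[*ip++]; } while (0)
    NEXT();
do_push:
    assert(sp < STACK_SIZE);
    stack[sp++] = *ip++;
    NEXT();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    NEXT();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
#undef NEXT
}
```

Both run_switch(demo_prog) and run_threaded(demo_prog) return 20.  The
sketch only shows the structural difference; it says nothing about
how large the speed difference is on any particular machine.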

This was somewhat surprising to me, too.  We also did some work with
simulations of more advanced branch predictors in this context
[ertl&gregg03jilp], so I expected the performance benefits of our
advanced techniques to diminish significantly once hardware adopted
such predictors, but I never really saw that happen.  And that's even
on hardware that has very good indirect branch prediction (as Rohou
et al. showed).

@Article{ertl&gregg03jilp,
  author =	 {M. Anton Ertl and David Gregg},
  title =	 {The Structure and Performance of \emph{Efficient}
                  Interpreters},
  journal =	 {The Journal of Instruction-Level Parallelism},
  year =	 {2003},
  volume =	 {5},
  month =	 nov,
  url =         {http://www.complang.tuwien.ac.at/papers/ertl%26gregg03jilp.ps.gz},
  url2 =	 {http://www.jilp.org/vol5/v5paper12.pdf},
  note =	 {http://www.jilp.org/vol5/},
  abstract =	 {Interpreters designed for high general-purpose
                  performance typically perform a large number of
                  indirect branches (3.2\%--13\% of all executed
                  instructions in our benchmarks). These branches
                  consume more than half of the run-time in a number
                  of configurations we simulated. We evaluate how
                  accurate various existing and proposed branch
                  prediction schemes are on a number of interpreters,
                  how the mispredictions affect the performance of the
                  interpreters and how two different interpreter
                  implementation techniques perform with various
                  branch predictors. We also suggest various ways in
                  which hardware designers, C compiler writers, and
                  interpreter writers can improve the performance of
                  interpreters.}
}

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/