From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Is This a Dumb Idea? paralellizing byte codes
Date: Fri, 28 Oct 2022 17:06:55 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <22-10-062@comp.compilers>
References: <22-10-046@comp.compilers> <22-10-048@comp.compilers> <22-10-056@comp.compilers> <22-10-059@comp.compilers>
Keywords: interpreter, optimize
Posted-Date: 30 Oct 2022 00:50:20 EDT

Alain Ketterlin writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>
>> Alain Ketterlin writes:
>>>I've heard/read several times that byte-code micro-optimizations are not
>>>worth the trouble.
...
>This is not directly related to the paper I mention later. I was talking
>about optimizing bytecode vs. compiler optimizations. I know of no
>interpreter doing elaborate static byte-code optimization.

If I understand you correctly, you mean optimizations performed by the
compiler that generates the "byte code", e.g., partial redundancy
elimination. I expect that these optimizations are as effective for
virtual-machine code as for native (i.e., real-machine) code, but if
you want to go to these lengths, you use a native-code compiler.
And for systems that use a JIT compiler (i.e., a two-stage process:
source -> VM code (aka byte code) -> native code), the preferred place
for these optimizations is the second stage (probably because that
allows optimization decisions to take the target machine into
account). There have been some efforts to perform analysis at the
source-code level (or at any rate before JIT compilation) and to embed
the results as an optional component in the .class file in order to
speed up JIT compilation, but has this made it into production systems?

Otherwise: I dimly remember optimizations by Prolog compilers that
generate WAM (Warren abstract machine) code.

>>>https://ieeexplore.ieee.org/document/7054191
https://hal.inria.fr/hal-01100647/document

>I'm glad it works for you.

What's "it"? Anyway, you miss the point: The paper suggests that one
should just write a switch-based interpreter and that more advanced
techniques are no longer needed. My results disprove this, on the
same hardware that they base their claims on.

Branch mispredictions may play a smaller role now than they used to,
but apparently there are other reasons why the more advanced
techniques are still very profitable.

This was somewhat surprising for me, too. We also did some work with
simulations of more advanced branch predictors in this context
[ertl&gregg03jilp], so I expected the performance benefits of our
advanced techniques to diminish significantly once hardware acquired
such predictors, but I never really saw that happen. And that's even
on hardware that has very good indirect branch prediction (as Rohou et
al. showed).

@Article{ertl&gregg03jilp,
  author =       {M. Anton Ertl and David Gregg},
  title =        {The Structure and Performance of \emph{Efficient} Interpreters},
  journal =      {The Journal of Instruction-Level Parallelism},
  year =         {2003},
  volume =       {5},
  month =        nov,
  url =          {http://www.complang.tuwien.ac.at/papers/ertl%26gregg03jilp.ps.gz},
  url2 =         {http://www.jilp.org/vol5/v5paper12.pdf},
  note =         {http://www.jilp.org/vol5/},
  abstract =     {Interpreters designed for high general-purpose
                  performance typically perform a large number of
                  indirect branches (3.2\%--13\% of all executed
                  instructions in our benchmarks). These branches
                  consume more than half of the run-time in a number
                  of configurations we simulated. We evaluate how
                  accurate various existing and proposed branch
                  prediction schemes are on a number of interpreters,
                  how the mispredictions affect the performance of the
                  interpreters and how two different interpreter
                  implementation techniques perform with various
                  branch predictors. We also suggest various ways in
                  which hardware designers, C compiler writers, and
                  interpreter writers can improve the performance of
                  interpreters.}
}

- anton
-- 
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
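
For readers unfamiliar with the distinction discussed above, here is a minimal sketch of switch-based dispatch next to one commonly cited alternative, threaded-style dispatch using GNU C labels as values. The toy stack VM, its opcodes, and the names run_switch and run_threaded are hypothetical and chosen only for illustration. The post does not say which "more advanced techniques" were measured, so taking this one as representative is an assumption. The second function needs GCC or Clang, since labels as values are not standard C.

```c
#include <stddef.h>

/* Hypothetical toy stack VM; these opcodes are illustrative only. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Switch-based dispatch: every VM instruction returns to the top of the
   loop, so all dispatches typically share one indirect branch. */
long run_switch(const long *code)
{
    long stack[64];   /* no overflow checks: sketch only */
    size_t sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH: stack[sp++] = *code++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      return -1;   /* unknown opcode */
        }
    }
}

/* Threaded-style dispatch (GNU C extension): each instruction body ends
   with its own indirect jump to the next instruction's body. */
long run_threaded(const long *code)
{
    static void *const labels[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[64];   /* no overflow or opcode checks: sketch only */
    size_t sp = 0;
#define NEXT goto *labels[*code++]
    NEXT;
do_push: stack[sp++] = *code++; NEXT;
do_add:  sp--; stack[sp - 1] += stack[sp]; NEXT;
do_mul:  sp--; stack[sp - 1] *= stack[sp]; NEXT;
do_halt: return stack[sp - 1];
#undef NEXT
}
```

Both functions interpret the same code array; for example, the sequence OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT computes (2+3)*4 in either one. The only difference is where the indirect branches sit, which is the aspect the branch-prediction discussion is about.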