Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Newsgroups: comp.compilers
Subject: Re: Is This a Dumb Idea? paralellizing byte codes
Date: Sun, 23 Oct 2022 02:21:53 +0200
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-10-050@comp.compilers>
References: <22-10-046@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="89666"; mail-complaints-to="abuse@iecc.com"
Keywords: parallel, optimize, comment
Posted-Date: 22 Oct 2022 22:57:36 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-10-046@comp.compilers>
Xref: csiph.com comp.compilers:3220

On 10/22/22 8:00 PM, Jon Forrest wrote:
> Modern CPUs employ all kinds of clever techniques to improve
> instruction level parallelism (ILP). I was wondering if it
> makes sense to try to employ similar techniques in the
> virtual machines used to execute byte code produced by language
> compilers.
>
> By that I mean what if virtual machines were to examine byte code
> streams to detect when it would be safe to execute multiple
> byte codes concurrently?

I came across instruction reordering when writing decompilers for
various machines. The solution was easy to find but not so easy to
implement: dependency analysis between successive instructions. On which
register contents or state flags does each instruction depend, and which
was the most recent instruction that changed these items?
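The two questions above can be sketched in a few lines of Python. This
is only an illustration, not the decompiler code: the Insn record, the
register and flag names, and the function last_writer_deps are all
hypothetical, and each instruction is assumed to come with known sets
of items it reads and writes.

```python
from typing import NamedTuple

class Insn(NamedTuple):
    """Hypothetical instruction record: text plus the items it reads/writes."""
    name: str
    reads: frozenset
    writes: frozenset

def last_writer_deps(code):
    """For each instruction, return the indices of the most recent earlier
    instructions that changed a register or flag it reads."""
    last_writer = {}   # item name -> index of the latest instruction writing it
    deps = []
    for i, insn in enumerate(code):
        # Look up readers first, so an instruction never depends on itself.
        deps.append({last_writer[r] for r in insn.reads if r in last_writer})
        for w in insn.writes:
            last_writer[w] = i
    return deps

code = [
    Insn("mov r0, 1",      frozenset(),             frozenset({"r0"})),
    Insn("mov r1, 2",      frozenset(),             frozenset({"r1"})),
    Insn("add r2, r0, r1", frozenset({"r0", "r1"}), frozenset({"r2"})),
    Insn("cmp r2, 0",      frozenset({"r2"}),       frozenset({"Z"})),
    Insn("mov r0, 5",      frozenset(),             frozenset({"r0"})),
    Insn("beq label",      frozenset({"Z"}),        frozenset()),
]
deps = last_writer_deps(code)
for insn, d in zip(code, deps):
    print(f"{insn.name:16} depends on {sorted(d)}")
```

Note that this records only "who produced the value I read". The second
"mov r0" has no such dependency, yet it overwrites a register that the
"add" still needs, so actual reordering would also have to keep track of
overwrites of items that are still in use.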

Dependencies on memory contents are less important for decompilation. I
found it sufficient and safe to execute all memory-writing instructions
that precede a memory read before that read, so that memory writes and
their addresses did not require special handling.
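A minimal sketch of such a conservative memory rule, with no address
comparison at all. The function memory_deps and the "load"/"store"/"other"
classification are hypothetical. Keeping the stores in program order, and
making a store wait for the loads before it, are assumptions added here
for the parallel case; the rule stated above only requires that writes
preceding a read are done before that read.

```python
def memory_deps(kinds):
    """kinds is a list of "load", "store" or "other", in program order.
    Returns, per instruction, the set of indices it must wait for."""
    deps = []
    last_store = None        # index of the most recent store, if any
    loads_since_store = []   # loads seen after that store
    for i, kind in enumerate(kinds):
        d = set()
        if kind == "load":
            # Stores are chained, so waiting for the latest one
            # implies that all earlier stores are done as well.
            if last_store is not None:
                d.add(last_store)
            loads_since_store.append(i)
        elif kind == "store":
            # Assumption: stores stay in program order and never
            # overtake a load that comes before them.
            if last_store is not None:
                d.add(last_store)
            d.update(loads_since_store)
            last_store = i
            loads_since_store = []
        deps.append(d)
    return deps

kinds = ["store", "other", "load", "load", "store", "load"]
mem = memory_deps(kinds)
for i, (kind, d) in enumerate(zip(kinds, mem)):
    print(i, kind, sorted(d))
```

The two loads in the middle depend only on the first store, so they may
still run side by side; it is only the crossing of a store that is
forbidden.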

If the same analysis is possible with virtual machine code then it will
be possible to emulate multiple instructions in parallel.
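As a rough sketch of that last step: once every instruction has a set of
earlier instructions it must wait for, the instructions can be grouped
into "waves" whose members could be emulated side by side. The function
schedule_waves and the sample dependency sets are made up for
illustration, and the sets are assumed to be complete, i.e. to cover
registers, flags, memory, and overwrites of items still in use.

```python
def schedule_waves(deps):
    """deps[i] is the set of earlier instruction indices that instruction i
    must wait for. Returns a list of waves; all instructions in one wave
    depend only on instructions in earlier waves."""
    level = []
    for d in deps:
        # An instruction goes one wave after its latest dependency,
        # or into wave 0 when it depends on nothing.
        level.append(1 + max((level[j] for j in d), default=-1))
    waves = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        waves[lv].append(i)
    return waves

# Hypothetical dependency sets for a five-instruction sequence.
deps = [set(), set(), {0, 1}, set(), {2, 3}]
waves = schedule_waves(deps)
print(waves)   # [[0, 1, 3], [2], [4]]
```

Five instructions collapse into three steps here. Whether that pays off
in an interpreter depends on whether dispatching a wave costs less than
what the parallel execution saves.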


> I have no idea if the overhead of the byte code examination would
> exceed any advantage of the concurrent execution, although it's
> important to point out that this examination would only have to
> be done once, and the results could somehow be stored along with
> the byte code. Of course, if the byte code changes the examination
> would have to be done again.

The next step would be compilation of the virtual machine code into
physical machine code. In most cases this too can be done once, and the
compiled version can then be much faster than the interpreted code.

But here a word of warning, also from my decompiler research:

When Microsoft announced a fantastic speedup of compiled vs. interpreted
Visual Basic, the 7 times speed gain was soon reduced to a more realistic
"up to 30%", and even then they did not succeed in compiling to really
equivalent code. AFAIR the biggest problem was the Variant type, which
left not much room for optimizations. The presented example code was a
real hall of shame, where desperate coders tried whatever tricks they
could to evaluate the simplest boolean expressions, which were
randomized by the infamous compiler :-(

To be honest: later VB versions came with a much better compiler, but
even then the speed gain was, hmmm, noticeable. A look at the .NET
compiled VB code revealed how much housekeeping was required for line
numbers alone.

> This idea, if it works, would be a great way to take advantage of
> multiple cores without having to rewrite any user code. The big
> question is whether it would work.

Automated decompilation and re-compilation to native code may be the
most promising approach for a real speed explosion, provided there are
no line numbers, Variants, On Error statements and other BASIC brake
shoes in the virtual machine. But here again an observation:

If fully automated decompilation is not possible then don't do it!
Otherwise you'll spend more time on the manual analysis of the next
program version than you gain in faster execution.

Just my $0.02
   DoDi
[I'm wondering whether coarse-grain analysis might be worth it, e.g.,
look at two Java functions and see that they share no writable data so
you can run the entire functions in parallel. -John]
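
The coarse-grain check in the note above might look roughly like this,
assuming the sets of data each function reads and writes are already
known. Footprint, independent, and the sample functions are invented for
illustration; the test itself is the usual one of no function writing
anything the other reads or writes.

```python
from typing import NamedTuple

class Footprint(NamedTuple):
    """Hypothetical summary of the data one function touches."""
    reads: frozenset
    writes: frozenset

def independent(a, b):
    """True if neither function writes data the other reads or writes,
    so the two could run in parallel without changing the result."""
    return (a.writes.isdisjoint(b.reads)
            and a.reads.isdisjoint(b.writes)
            and a.writes.isdisjoint(b.writes))

parse = Footprint(frozenset({"config", "input"}), frozenset({"ast"}))
log   = Footprint(frozenset({"config"}),          frozenset({"logfile"}))
emit  = Footprint(frozenset({"ast"}),             frozenset({"output"}))

print(independent(parse, log))    # True: they only share read-only "config"
print(independent(parse, emit))   # False: emit reads the "ast" parse writes
```

The check is the easy part. Computing reliable footprints for real code,
where two references may point to the same object, is where the work
would be.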