Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: gah4 <gah4@u.washington.edu>
Newsgroups: comp.compilers
Subject: Re: Is This a Dumb Idea? paralellizing byte codes
Date: Sat, 22 Oct 2022 23:50:51 -0700 (PDT)
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-10-051@comp.compilers>
References: <22-10-046@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="82455"; mail-complaints-to="abuse@iecc.com"
Keywords: parallel, interpreter
Posted-Date: 23 Oct 2022 12:27:28 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-10-046@comp.compilers>
Xref: csiph.com comp.compilers:3221

On Saturday, October 22, 2022 at 11:51:31 AM UTC-7, nob...@gmail.com wrote:
> Modern CPUs employ all kinds of clever techniques to improve
> instruction level parallelism (ILP). I was wondering if it
> makes sense to try to employ similar techniques in the
> virtual machines used to execute byte code produced by language
> compilers.

I always find it odd to call any instruction set designed to be
interpreted (by a program written in a high level language) a byte
code.

It seems to me that no such language is more byte-oriented than VAX.
(Though VAX was designed to be interpreted by microcode, so maybe that
counts.)

In any case, one reason VAX didn't stay around as long as its
designers expected was the difficulty of parallel execution.

Instructions have up to 6 operands, each of which has an addressing
mode byte, possibly an index specifier, and then a variable number of
bytes that depends on the specified addressing mode. The mode
specified by each mode byte determines where the next mode byte is,
so the operands have to be decoded one after another.
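
To make the serial dependency concrete, here is a rough Python sketch
of walking the operand specifiers. The table is only loosely modeled
on VAX addressing modes and is an assumption for illustration: it
ignores immediate operands, whose length depends on the opcode, and
the names EXTRA_BYTES and operand_offsets are made up.

```python
# Bytes that follow a specifier byte, keyed by its high nibble (the mode).
# Simplified and hypothetical: loosely modeled on VAX operand specifiers.
EXTRA_BYTES = {
    0x0: 0, 0x1: 0, 0x2: 0, 0x3: 0,  # short literal
    0x5: 0,                          # register
    0x6: 0,                          # register deferred
    0x7: 0,                          # autodecrement
    0x8: 0,                          # autoincrement
    0x9: 0,                          # autoincrement deferred
    0xA: 1, 0xB: 1,                  # byte displacement (and deferred)
    0xC: 2, 0xD: 2,                  # word displacement (and deferred)
    0xE: 4, 0xF: 4,                  # longword displacement (and deferred)
}
INDEX_MODE = 0x4  # index prefix: a base specifier follows it


def operand_offsets(code, start, n_operands):
    """Return (offsets of each operand specifier, offset of the next opcode)."""
    offsets = []
    pos = start
    for _ in range(n_operands):
        offsets.append(pos)
        mode = code[pos] >> 4
        if mode == INDEX_MODE:
            pos += 1                 # skip the index byte, look at the base
            mode = code[pos] >> 4
        pos += 1 + EXTRA_BYTES[mode]
    return offsets, pos
```

Note that pos for operand k is only known after operand k-1 has been
examined, which is exactly what makes decoding several operands (or
several instructions) at once awkward.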

On the other hand, the instruction bits for RISC-V are arranged in an
unusual order, such that related fields (register numbers, for
example) stay in the same bit positions across instruction formats.
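
By contrast, a sketch of pulling the fixed-position fields out of a
32-bit RISC-V instruction word. The function name common_fields is
mine; not every format uses every field, but where a field exists it
is at these bit positions, so the extraction needs no knowledge of
the format and no dependence on earlier fields.

```python
def common_fields(insn):
    """Extract the fields that sit at fixed bit positions in a 32-bit word."""
    return {
        "opcode": insn & 0x7F,          # bits 6:0
        "rd":     (insn >> 7) & 0x1F,   # bits 11:7
        "funct3": (insn >> 12) & 0x7,   # bits 14:12
        "rs1":    (insn >> 15) & 0x1F,  # bits 19:15
        "rs2":    (insn >> 20) & 0x1F,  # bits 24:20
    }
```

For example, common_fields(0x002081B3) picks apart "add x3, x1, x2".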

> By that I mean what if virtual machines were to examine byte code
> streams to detect when it would be safe to execute multiple
> byte codes concurrently? Then, based on its findings, the virtual
> machine would execute as many byte codes concurrently as is safe.

Pipelined hardware can make some of those decisions in parallel.

But okay, in the case of JIT compilers, if it can figure it out once
and use it many times, then it might work.

> I have no idea if the overhead of the byte code examination would
> exceed any advantage of the concurrent execution, although it's
> important to point out that this examination would only have to
> be done once, and the results could somehow be stored along with
> the byte code. Of course, if the byte code changes the examination
> would have to be done again.

Done once and results stored is pretty much how JIT works.
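
A minimal sketch of the "examine once, store the result" part, for a
made-up three-address byte code where each instruction is a
(dest, src1, src2) tuple of register names. This is not any real
VM's format, and schedule is a hypothetical name. It groups
instructions into batches with no read-after-write, write-after-read
or write-after-write conflicts inside a batch, and caches the result
keyed on the program itself.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def schedule(program):
    """Group instruction indices into batches of mutually independent ops.

    `program` is a tuple of (dest, src1, src2) tuples (hypothetical format).
    Batches must run in order; members of one batch may run concurrently.
    """
    level = []
    for i, (dest, a, b) in enumerate(program):
        lvl = 0
        for j in range(i):
            d2, a2, b2 = program[j]
            raw = d2 in (a, b)       # i reads what j wrote
            war = dest in (a2, b2)   # i overwrites what j read
            waw = dest == d2         # both write the same register
            if raw or war or waw:
                lvl = max(lvl, level[j] + 1)
        level.append(lvl)
    batches = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        batches[lvl].append(i)
    return tuple(tuple(batch) for batch in batches)
```

The analysis here is quadratic in program length, which is fine if it
really does run once per piece of byte code. Whether the batches are
big enough to pay for the thread synchronization is a separate
question.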

> I'm also worried that internal virtual machine locking requirements
> might make this idea infeasible. For example, in a virtual machine with
> a global interpreter lock, would it be possible for there to be any
> concurrent execution?
>
> This idea, if it works, would be a great way to take advantage of
> multiple cores without having to rewrite any user code. The big
> question is whether it would work.

As someone else mentioned, vectors (and vector processors)
are the favorite way to do this. JIT should work well for vectors.

As for fine-grain parallelism, there is OpenMP for explicit
(requested by the programmer) parallel execution, based
on threads and thread synchronization.

I am not at all sure how much time goes into keeping the
threads doing what they are supposed to do, rather than into
actual work. There is still Amdahl's law.
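
For reference, Amdahl's law as a small Python function. The overhead
parameter is my own addition, not part of the law: it assumes thread
management costs a fixed fraction of the original run time, which is
a crude model.

```python
def speedup(parallel_fraction, workers, overhead=0.0):
    """Amdahl's law: speedup over serial execution.

    parallel_fraction: share of the run time that can be parallelized (0..1)
    workers:           number of processors or threads
    overhead:          assumed synchronization cost, as a fraction of the
                       original run time (hypothetical extra term)
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers + overhead)
```

With 90% of the work parallelizable, speedup(0.9, 4) is a bit over 3,
and no number of workers gets it past 10. Any overhead only lowers
that ceiling.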

As far as I know, OpenMP works best on vector
operations, which would, of course, be convenient
to run on actual vector processors.  (Too bad there
aren't any Cray-1 machines still running.)

So, the question you didn't ask: how about explicit
requests like OpenMP?