From: gah4
Newsgroups: comp.compilers
Subject: Re: Is This a Dumb Idea? paralellizing byte codes
Date: Sat, 22 Oct 2022 23:50:51 -0700 (PDT)
Organization: Compilers Central
Message-ID: <22-10-051@comp.compilers>
References: <22-10-046@comp.compilers>
Keywords: parallel, interpreter

On Saturday, October 22, 2022 at 11:51:31 AM UTC-7, nob...@gmail.com wrote:
> Modern CPUs employ all kinds of clever techniques to improve
> instruction level parallelism (ILP). I was wondering if it
> makes sense to try to employ similar techniques in the
> virtual machines used to execute byte code produced by language
> compilers.

I have always found it curious to call an instruction set designed to
be interpreted (by a program written in a high-level language) a
"byte code". It seems to me that no such instruction set is more
byte-oriented than VAX. (Though VAX was designed to be interpreted by
microcode, so maybe that counts.)

In any case, one reason VAX didn't stay around as long as its
designers expected was the difficulty of parallel execution.
Instructions have up to 6 operands, each of which has an addressing
mode byte, possibly an index operation, and then the appropriate (and
variable) number of bytes for the specified addressing mode. The mode
specified by each mode byte determines where the next mode byte is,
so the operands have to be decoded one after another.
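To make the sequential dependence concrete, here is a toy decoder in
Python, loosely modeled on VAX operand specifiers. The mode table is a
simplification for illustration (PC-based immediate and absolute forms
are not modeled), and `operand_offsets` is a hypothetical helper, not
part of any real tool.

```python
# Toy decoder loosely modeled on VAX operand specifiers (a simplification,
# not the full VAX definition): the high nibble of each mode byte selects
# the addressing mode, and the mode decides how many extension bytes follow.
# PC-based immediate and absolute forms are not modeled.
EXTENSION_BYTES = {
    0x0: 0, 0x1: 0, 0x2: 0, 0x3: 0,  # short literal, value is in the mode byte
    0x5: 0, 0x6: 0,                  # register, register deferred
    0x7: 0, 0x8: 0, 0x9: 0,          # autodecrement, autoincrement (deferred)
    0xA: 1, 0xB: 1,                  # byte displacement (deferred)
    0xC: 2, 0xD: 2,                  # word displacement (deferred)
    0xE: 4, 0xF: 4,                  # longword displacement (deferred)
}
INDEX_MODE = 0x4  # index prefix byte; a base operand specifier follows it


def operand_offsets(code: bytes, n_operands: int, start: int = 0) -> list[int]:
    """Return the byte offset at which each operand specifier begins."""
    offsets = []
    pos = start
    for _ in range(n_operands):
        offsets.append(pos)
        if code[pos] >> 4 == INDEX_MODE:
            pos += 1  # skip the index prefix to reach the base specifier
        mode = code[pos] >> 4
        # The next specifier's position is only known after this one's
        # mode byte has been read, which forces one-at-a-time decoding.
        pos += 1 + EXTENSION_BYTES[mode]
    return offsets


# R0, 0x10(R1) with a byte displacement, 0(R2) with a longword
# displacement, then R3.
code = bytes([0x50, 0xA1, 0x10, 0xE2, 0x00, 0x00, 0x00, 0x00, 0x53])
print(operand_offsets(code, 4))  # [0, 1, 3, 8]
```

The loop cannot simply be handed to several decoders at once: where
operand k starts is unknown until operands 0 through k-1 have been
examined.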
On the other hand, the instruction bits for RISC-V are arranged in an
unusual order, chosen so that related bits stay in the same position
across instruction formats. For example, the register numbers and the
sign bit of each immediate are always in the same place, so they can
be extracted before the format is even known.

> By that I mean what if virtual machines were to examine byte code
> streams to detect when it would be safe to execute multiple
> byte codes concurrently? Then, based on its findings, the virtual
> machine would execute as many byte codes concurrently as is safe.

Pipelined hardware can make some of those decisions in parallel. But
okay, in the case of JIT compilers, if the analysis can be done once
and the result used many times, then it might work.

> I have no idea if the overhead of the byte code examination would
> exceed any advantage of the concurrent execution, although it's
> important to point out that this examination would only have to
> be done once, and the results could somehow be stored along with
> the byte code. Of course, if the byte code changes the examination
> would have to be done again.

Doing it once and storing the results is pretty much how JIT works.

> I'm also worried that internal virtual machine locking requirements
> might make this idea infeasible. For example, in a virtual machine with
> a global interpreter lock, would it be possible for there to be any
> concurrent execution?
>
> This idea, if it works, would be a great way to take advantage of
> multiple cores without having to rewrite any user code. The big
> question is whether it would work.

As someone else mentioned, vectors (and vector processors) are the
favorite way to do this. JIT should work well for vectors.

As for fine-grain parallelism, there is OpenMP for explicit (requested
by the programmer) parallel execution, based on threads and thread
synchronization. I am not at all sure how much time is spent keeping
the threads doing what they are supposed to do, rather than doing
actual work. There is still Amdahl's law.

As far as I know, OpenMP works best on vector operations, which would,
of course, be convenient to run on actual vector processors.
(Too bad there aren't any more running Cray-1 machines.)

So, on to what you didn't ask: how about explicit requests, like OpenMP?
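Whether the parallelism is found automatically by a VM or requested
explicitly, as with OpenMP, Amdahl's law caps the payoff. A minimal
sketch of the bound, speedup = 1 / ((1 - p) + p / n), where p is the
fraction of run time that can be parallelized and n is the number of
threads. The function name is hypothetical, and the formula ignores the
thread management overhead mentioned above, so real speedups will be
lower.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the run
    time can be spread over `workers` threads (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction  # this part runs on one thread no matter what
    return 1.0 / (serial + parallel_fraction / workers)


# With 90% of the work parallel, the bound creeps toward 10x, never past it.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With p = 0.9, eight threads give at most about 4.7x, and no number of
threads gets past 10x, because the remaining serial 10% still has to
run one step at a time.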