Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Hans-Peter Diettrich
Newsgroups: comp.compilers
Subject: Re: Is This a Dumb Idea? paralellizing byte codes
Date: Sun, 23 Oct 2022 02:21:53 +0200
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-10-050@comp.compilers>
References: <22-10-046@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="89666"; mail-complaints-to="abuse@iecc.com"
Keywords: parallel, optimize, comment
Posted-Date: 22 Oct 2022 22:57:36 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-10-046@comp.compilers>
Xref: csiph.com comp.compilers:3220

On 10/22/22 8:00 PM, Jon Forrest wrote:
> Modern CPUs employ all kinds of clever techniques to improve
> instruction level parallelism (ILP). I was wondering if it
> makes sense to try to employ similar techniques in the
> virtual machines used to execute byte code produced by language
> compilers.
>
> By that I mean what if virtual machines were to examine byte code
> streams to detect when it would be safe to execute multiple
> byte codes concurrently?

I came across instruction reordering when writing decompilers for
various machines. The solution was easy to find, but not so easy to
implement: dependency analysis between successive instructions. On
which register contents or state flags does each instruction depend,
and which was the most recent instruction that changed these items?
Less important for decompilation are dependencies on memory contents.
I found it sufficient and safe to execute all memory-writing
instructions that precede a memory read before the read itself, so
that memory writes and their addresses did not require special
handling. If the same analysis is possible with virtual machine code
then it will be possible to emulate multiple instructions in parallel.

> I have no idea if the overhead of the byte code examination would
> exceed any advantage of the concurrent execution, although it's
> important to point out that this examination would only have to
> be done once, and the results could somehow be stored along with
> the byte code. Of course, if the byte code changes the examination
> would have to be done again.

The next step would be compilation of the virtual machine code into
physical machine code. This also can be done once, in most cases, and
then the compiled version can be much faster than the interpreted
code.

But here a word of warning, also from my decompiler research: When
Microsoft announced a fantastic speedup of compiled vs. interpreted
Visual Basic, the 7-fold speed gain was soon reduced to a more
realistic "up to 30%", and even then they did not succeed in compiling
to really equivalent code. AFAIR the biggest problem was the Variant
type, which left not much room for optimizations. The presented
example code was a real hall of shame, where desperate coders tried
whatever tricks they could to evaluate the simplest boolean
expressions, which were randomized by the infamous compiler :-(

To be honest: later VB versions came with a much better compiler, but
even then the speed gain was, hmmm, noticeable. A look at the .NET
compiled VB code revealed how much housekeeping was required for line
numbers alone.

> This idea, if it works, would be a great way to take advantage of
> multiple cores without having to rewrite any user code. The big
> question is whether it would work.

Automated decompilation and re-compilation to native code may be the
most promising approach for a real speed explosion.
Provided there are no line numbers, Variants, On Error statements and
other BASIC brake shoes in the virtual machine.

But here again an observation: If fully automated decompilation is not
possible then don't do it! Otherwise you'll spend more time on the
manual analysis of the next program version than you gain in faster
execution.

Just my $0.02
DoDi
[I'm wondering whether coarse-grain analysis might be worth it, e.g.,
look at two Java functions and see that they share no writable data so
you can run the entire functions in parallel. -John]
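
As a footnote to the dependency analysis described in the post, here is
a minimal sketch in Python of how such a pass might group instructions
that could run in parallel. Everything in it is an assumption made for
illustration: the Instr record, its fields, and the schedule() function
are hypothetical names and do not come from any real virtual machine or
decompiler. Registers and flags are tracked by name. Memory is handled
conservatively without looking at addresses: a memory read waits for
all earlier memory writes, as the post suggests, and (an added
assumption) a memory write also waits for all earlier memory accesses.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical instruction record, for illustration only."""
    name: str
    reads: frozenset = frozenset()    # registers/flags the instruction reads
    writes: frozenset = frozenset()   # registers/flags the instruction writes
    mem_read: bool = False            # touches memory as a reader
    mem_write: bool = False           # touches memory as a writer


def schedule(instrs):
    """Group instructions into levels; one level could run in parallel.

    Each instruction is placed on the earliest level that comes after
    every instruction it depends on.
    """
    last_write = {}      # register -> level of its most recent writer
    last_read = {}       # register -> highest level of any reader so far
    mem_write_lvl = -1   # level of the most recent memory write
    mem_read_lvl = -1    # highest level of any memory read so far
    groups = {}

    for ins in instrs:
        lvl = 0
        for r in ins.reads:
            # read-after-write: run after the most recent writer
            lvl = max(lvl, last_write.get(r, -1) + 1)
        for r in ins.writes:
            # write-after-write and write-after-read: run after both
            lvl = max(lvl, last_write.get(r, -1) + 1,
                      last_read.get(r, -1) + 1)
        if ins.mem_read:
            # all earlier memory writes come first
            lvl = max(lvl, mem_write_lvl + 1)
        if ins.mem_write:
            # conservative: after every earlier memory access
            lvl = max(lvl, mem_write_lvl + 1, mem_read_lvl + 1)

        # record this instruction's effects for later instructions
        for r in ins.reads:
            last_read[r] = max(last_read.get(r, -1), lvl)
        for r in ins.writes:
            last_write[r] = lvl
        if ins.mem_read:
            mem_read_lvl = max(mem_read_lvl, lvl)
        if ins.mem_write:
            mem_write_lvl = lvl

        groups.setdefault(lvl, []).append(ins.name)

    return [groups[lvl] for lvl in sorted(groups)]


if __name__ == "__main__":
    prog = [
        Instr("load r1", writes={"r1"}, mem_read=True),
        Instr("load r2", writes={"r2"}, mem_read=True),
        Instr("add r3", reads={"r1", "r2"}, writes={"r3"}),
        Instr("mul r4", reads={"r1"}, writes={"r4"}),
        Instr("store r3", reads={"r3"}, mem_write=True),
        Instr("load r5", writes={"r5"}, mem_read=True),
    ]
    for level, names in enumerate(schedule(prog)):
        print(level, names)
```

In this toy program the two initial loads share a level, as do the add
and the mul, while the store and the final load each get a level of
their own because of the memory ordering rule. Without register
renaming the write-after-read rule is deliberately strict; a real
implementation might relax it.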