Parallelizing byte codes

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups	comp.compilers
Subject	Parallelizing byte codes
Date	2022-10-23 10:17 +0300
Organization	Compilers Central
Message-ID	<22-10-053@comp.compilers> (permalink)
References	<22-10-046@comp.compilers>

Show all headers | View raw

This is not necessarily a dumb idea.  However, the details are important.

Relevant important factors are the cost communication and
synchronization.  Fine grained parallelism often flounders on those
aspects.

ILP works in things like modern x86 processors, because the x86 is
effectively a very odd byte code developed under severe constraints.
The code an out-of-order x86 processor runs doesn't really look like
the x86 and tricks like register renaming internal micro-ops make that
possible.

However, at the more generic byte code level, you have different
things to worry about (unless you are doing an actual hardware
design).  The impact of the software framework that is used to
implement the byte code interpreter can take a large toll.  That's one
of the reasons hotspot compilers are effective.  They can remove that
framework.  They can do that fine grained parallelism which mimics
what goes on behind the scenes in an out-of-order processor.

We were able to exploit that in Hyperscan (a software regular
expression accelerator) that Intel acquired, optimized, and then open
sourced.  That was essentially my last project at Intel.  We took an
NFA, turned it into upto 8 parallel DFAs, but then wrote a specialized
interpreter that interleaved the 8 DFAs, with the help of VTune.
Doing that and interleaving the code carefully to reduce instruction
stalls (which were mostly due to memory reads), we managed to run the
8 DFAs at the same performance as the original DFA code ran 1, simply
by effectively filling up "delay slots" that were implicit in the
code.  Note the NFA and DFAs were essentially byte code, it was just
the interpreter we tweaked.

A different example was exploited at NuoDB, where we were optimizing
SQL queries.  We did this by noting that most Queries deal with
multiple rows, and changed the byte code emitted to pass through the
rows.  The interpreter then would do the rows as batches, meaning that
1 byte code operation could operate on 1000 rows concurrently,
reducing the overhead by the batch size.  This was a speed up even
without using the vector instructions on the machine.

But notice in both cases, we did analysis to figure out what the
relevant parallelism was.  It was not ad hoc.  And, it was cases where
we had large enough and slow enough sections that were key
bottlenecks. Very rarely can one take serial code and find that
parallelism without having a model that exposes it.

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------

Back to comp.compilers | Previous | Next — Previous in thread | Next in thread | Find similar

Thread

Is This a Dumb Idea? paralellizing byte codes Jon Forrest <nobozo@gmail.com> - 2022-10-22 11:00 -0700
  Re: Is This a Dumb Idea? paralellizing byte codes Alain Ketterlin <alain@universite-de-strasbourg.fr> - 2022-10-22 23:50 +0200
    Re: Is This a Dumb Idea? paralellizing byte codes anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-10-23 13:16 +0000
      Re: Is This a Dumb Idea? paralellizing byte codes Alain Ketterlin <alain@universite-de-strasbourg.fr> - 2022-10-23 21:29 +0200
        Re: Is This a Dumb Idea? paralellizing byte codes anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-10-28 17:06 +0000
  Re: Is This a Dumb Idea? paralellizing byte codes Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-10-23 02:21 +0200
  Re: Is This a Dumb Idea? paralellizing byte codes gah4 <gah4@u.washington.edu> - 2022-10-22 23:50 -0700
  Parallelizing byte codes Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-10-23 10:17 +0300
  Re: Is This a Dumb Idea? paralellizing byte codes anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-10-23 12:33 +0000
  Re: Is This a Dumb Idea? paralellizing byte codes gah4 <gah4@u.washington.edu> - 2022-10-26 18:18 -0700
    Re: Is This a Dumb Idea? paralellizing byte codes Kaz Kylheku <864-117-4973@kylheku.com> - 2022-10-27 14:51 +0000
    Re: Is This a Dumb Idea? paralellizing byte codes anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-10-29 09:06 +0000

csiph-web