Groups > comp.lang.python > #2310 > unrolled thread

a basic bytecode to machine code compiler

Started by	Rouslan Korneychuk <rouslank@msn.com>
First post	2011-03-31 18:33 -0400
Last post	2011-04-01 15:31 -0400
Articles	11 — 7 participants

Back to article view | Back to comp.lang.python

  a basic bytecode to machine code compiler Rouslan Korneychuk <rouslank@msn.com> - 2011-03-31 18:33 -0400
    Re: a basic bytecode to machine code compiler Stefan Behnel <stefan_ml@behnel.de> - 2011-04-01 02:05 +0200
    Re: a basic bytecode to machine code compiler Terry Reedy <tjreedy@udel.edu> - 2011-03-31 20:52 -0400
    Re: a basic bytecode to machine code compiler Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-04-01 12:57 +0000
      Re: a basic bytecode to machine code compiler Stefan Behnel <stefan_ml@behnel.de> - 2011-04-01 17:45 +0200
        Re: a basic bytecode to machine code compiler Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-04-02 10:04 +0000
          Re: a basic bytecode to machine code compiler Stefan Behnel <stefan_ml@behnel.de> - 2011-04-02 12:30 +0200
            Re: a basic bytecode to machine code compiler John Nagle <nagle@animats.com> - 2011-04-02 12:05 -0700
              Re: a basic bytecode to machine code compiler Paul Rubin <no.email@nospam.invalid> - 2011-04-02 14:32 -0700
              Re: a basic bytecode to machine code compiler Robert Kern <robert.kern@gmail.com> - 2011-04-02 19:12 -0500
    Re: a basic bytecode to machine code compiler Rouslan Korneychuk <rouslank@msn.com> - 2011-04-01 15:31 -0400

#2310 — a basic bytecode to machine code compiler

From	Rouslan Korneychuk <rouslank@msn.com>
Date	2011-03-31 18:33 -0400
Subject	a basic bytecode to machine code compiler
Message-ID	<4j7lp.5204$sS4.1784@newsfe11.iad>

I was looking at the list of bytecode instructions that Python uses and 
I noticed how much it looked like assembly. So I figured it can't be to 
hard to convert this to actual machine code, to get at least a small 
boost in speed.

And so I whipped up a proof of concept, available at 
https://github.com/Rouslan/nativecompile

I'm aware that PyPy already has a working JIT compiler, but I figure it 
will be a long time before they have a version of Python that is ready 
for everybody to use, so this could be useful in the mean time.

I chose to create this for the latest stable version of Python and I 
happen to use some functionality that is only available since Python 3.2.

The basic usage is:

 >>> import nativecompile
 >>> bcode = compile('print("Hello World!")','<string>','exec')
 >>> mcode = nativecompile.compile(bcode)
 >>> mcode()
Hello World!


This compiler does absolutely nothing clever. The only difference 
between the bytecode version and the compiled version is there is no 
interpreter loop and the real stack is used instead of an array.

Most of it is written in Python itself. There is one module written in C 
that does the things that cannot easily be done in pure Python, such as 
get the addresses of API functions and to execute the newly created code.

So far I have only implemented a few bytecode instructions and only have 
32-bit x86-compatible support. I have only tested this on Linux. It 
might work on Windows but only if you can run programs without any sort 
of data execution prevention (I can fix that if anyone wants). And I'm 
sure more optimized machine code can be generated (such as rearranging 
the error checking code to work better with the CPU's branch predictor).

Since so few instructions are implemented I haven't done any benchmarks.


What do people think? Would I be wasting my time going further with this?

[toc] | [next] | [standalone]

#2318

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2011-04-01 02:05 +0200
Message-ID	<mailman.55.1301616327.2990.python-list@python.org>
In reply to	#2310

Rouslan Korneychuk, 01.04.2011 00:33:
> I was looking at the list of bytecode instructions that Python uses and I
> noticed how much it looked like assembly. So I figured it can't be to hard
> to convert this to actual machine code, to get at least a small boost in
> speed.

I think I recall having read about something like this before, but I can't 
quite find it. In any case, you may want to take a look at both Cython and 
Psyco.

Stefan

[toc] | [prev] | [next] | [standalone]

#2323

From	Terry Reedy <tjreedy@udel.edu>
Date	2011-03-31 20:52 -0400
Message-ID	<mailman.58.1301619169.2990.python-list@python.org>
In reply to	#2310

On 3/31/2011 6:33 PM, Rouslan Korneychuk wrote:
> I was looking at the list of bytecode instructions that Python uses and
> I noticed how much it looked like assembly. So I figured it can't be to
> hard to convert this to actual machine code, to get at least a small
> boost in speed.
>
> And so I whipped up a proof of concept, available at
> https://github.com/Rouslan/nativecompile
>
> I'm aware that PyPy already has a working JIT compiler, but I figure it
> will be a long time before they have a version of Python that is ready
> for everybody to use, so this could be useful in the mean time.

I believe PyPy folk think it ready now, at least for some uses. 
Speedwise, it is more or less is now comparable to CPython, winning some 
benchmarks, losing others.
...
> What do people think? Would I be wasting my time going further with this?

Depends on whether your goal is personal (learning, fun, use) or 
usefulness to others. For the latter, you *might* do better to help with 
an existing project, such as Cython or Dufour's ShedSkin.

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]

#2350

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2011-04-01 12:57 +0000
Message-ID	<4d95cba6$0$29992$c3e8da3$5496439d@news.astraweb.com>
In reply to	#2310

On Thu, 31 Mar 2011 18:33:36 -0400, Rouslan Korneychuk wrote:

> I'm aware that PyPy already has a working JIT compiler, but I figure it
> will be a long time before they have a version of Python that is ready
> for everybody to use, so this could be useful in the mean time.

PyPy is ready to use *now*, if you are happy writing code that targets 
Python 2.5 and don't need C extensions.

[...]
> What do people think? Would I be wasting my time going further with
> this?

Depends on what your ultimate aim is. If it is to learn things yourself, 
then it is never a waste of time to learn new things.

If your aim is to get a good working project that you can be proud to put 
on your CV, then go right ahead.

If your aim is to contribute to a Python compiler that will actually be 
used by people other than yourself, I'm not so sure... personally, I 
expect that PyPy is the future of Python optimizing compilers, but I 
could be wrong.

I suggest you check out the competitors:

Shedskin is a Python to C++ compiler;
Psyco is a JIT specialising compiler;
Nuitka claims to be a C++ implementation that compiles to machine code; 
Berp claims to be a Haskell implementation that does the same;
Compyler claims to be a native x86 assembly compiler;
UnPython claims to be an experimental Python to C compiler.

Of the six, as far as I know only Shedskin and Psyco are widely used. 

Good luck, and remember: 

Release early, release often, and let the community know when you do!

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#2367

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2011-04-01 17:45 +0200
Message-ID	<mailman.80.1301672757.2990.python-list@python.org>
In reply to	#2350

Steven D'Aprano, 01.04.2011 14:57:
> I suggest you check out the competitors:
>
> Shedskin is a Python to C++ compiler;
> Psyco is a JIT specialising compiler;
> Nuitka claims to be a C++ implementation that compiles to machine code;
> Berp claims to be a Haskell implementation that does the same;
> Compyler claims to be a native x86 assembly compiler;
> UnPython claims to be an experimental Python to C compiler.
>
>
> Of the six, as far as I know only Shedskin and Psyco are widely used.

Erm, yes, right. If you want to exclude Cython, which arguably is the only 
static Python compiler that actually has a large user base, then those may 
really be the only two that are widely used. Except that Psyco is certainly 
being used a lot more often than Shedskin, mainly because it actually 
allows you to execute Python code.

Stefan

[toc] | [prev] | [next] | [standalone]

#2446

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2011-04-02 10:04 +0000
Message-ID	<4d96f49b$0$29992$c3e8da3$5496439d@news.astraweb.com>
In reply to	#2367

On Fri, 01 Apr 2011 17:45:39 +0200, Stefan Behnel wrote:

> Steven D'Aprano, 01.04.2011 14:57:
>> I suggest you check out the competitors:
>>
>> Shedskin is a Python to C++ compiler; Psyco is a JIT specialising
>> compiler; Nuitka claims to be a C++ implementation that compiles to
>> machine code; Berp claims to be a Haskell implementation that does the
>> same; Compyler claims to be a native x86 assembly compiler; UnPython
>> claims to be an experimental Python to C compiler.
>>
>>
>> Of the six, as far as I know only Shedskin and Psyco are widely used.
> 
> Erm, yes, right. If you want to exclude Cython, which arguably is the
> only static Python compiler that actually has a large user base, then
> those may really be the only two that are widely used. Except that Psyco
> is certainly being used a lot more often than Shedskin, mainly because
> it actually allows you to execute Python code.

My apologies, I thought about including Cython in the list, but my 
understanding of it is that it is a derivative of Pyrex, and used for 
writing C extensions in a Python-like language (Python + type 
annotations). We were talking about talking ordinary, unmodified Python 
code and compiling it to machine code, and I didn't think either Pyrex or 
Cython do that.




-- 
Steven

[toc] | [prev] | [next] | [standalone]

#2448

From	Stefan Behnel <stefan_ml@behnel.de>
Date	2011-04-02 12:30 +0200
Message-ID	<mailman.124.1301740231.2990.python-list@python.org>
In reply to	#2446

Steven D'Aprano, 02.04.2011 12:04:
> On Fri, 01 Apr 2011 17:45:39 +0200, Stefan Behnel wrote:
>
>> Steven D'Aprano, 01.04.2011 14:57:
>>> I suggest you check out the competitors:
>>>
>>> Shedskin is a Python to C++ compiler; Psyco is a JIT specialising
>>> compiler; Nuitka claims to be a C++ implementation that compiles to
>>> machine code; Berp claims to be a Haskell implementation that does the
>>> same; Compyler claims to be a native x86 assembly compiler; UnPython
>>> claims to be an experimental Python to C compiler.
>>>
>>>
>>> Of the six, as far as I know only Shedskin and Psyco are widely used.
>>
>> Erm, yes, right. If you want to exclude Cython, which arguably is the
>> only static Python compiler that actually has a large user base, then
>> those may really be the only two that are widely used. Except that Psyco
>> is certainly being used a lot more often than Shedskin, mainly because
>> it actually allows you to execute Python code.
>
> My apologies, I thought about including Cython in the list, but my
> understanding of it is that it is a derivative of Pyrex, and used for
> writing C extensions in a Python-like language (Python + type
> annotations). We were talking about talking ordinary, unmodified Python
> code and compiling it to machine code, and I didn't think either Pyrex or
> Cython do that.

Ok, no problem. Pyrex certainly doesn't play in the same league.

Cython actually supports most Python language features now (including 
generators in the development branch), both from Python 2 and Python 3. 
Chances are that the next release will actually compile most of your Python 
code unchanged, or only with minor adaptations.

Stefan

[toc] | [prev] | [next] | [standalone]

#2465

From	John Nagle <nagle@animats.com>
Date	2011-04-02 12:05 -0700
Message-ID	<4d977362$0$10551$742ec2ed@news.sonic.net>
In reply to	#2448

On 4/2/2011 3:30 AM, Stefan Behnel wrote:
> Cython actually supports most Python language features now (including
> generators in the development branch), both from Python 2 and Python 3.
> Chances are that the next release will actually compile most of your
> Python code unchanged, or only with minor adaptations.

     Cython requires the user to insert type declarations, though, to
get a speed improvement.

     Shed Skin has a good type inference system, but it insists on
a unique type for each object (which includes "null"; a type of
"str or null" is not acceptable).  The rest of Shed Skin, outside
the type inference system, is not very well developed.

     There's a path there to a fast Python with some restrictions.
The Shed Skin inference engine with the Cython engine might have
potential.

     PyPy gets some speedups, mostly by recognizing when numbers can be
unboxed.  (CPython carries around a CObject for every integer; that's
what's meant by "boxing")  But outside of number-crunching, PyPy
doesn't show big gains.  And the whole PyPy JIT system is far
too complex.  They try to retain all of Python's dynamism, and
as a result, they need to be able to bail from compiled code
to one of two different interpreters, then recompile and go
back to compiled mode.

     Unladen Swallow failed to produce much of a speed improvement
and Google pulled the plug.  It's time to look at alternatives.

     There's no easy way to speed up Python; that's been tried.
It needs either a very, very elaborate JIT system, more complex
than the ones for Java or Self, or some language restrictions.
The main restriction I would impose is to provide a call that says:
"OK, we're done with loading, initialization, and configuration.
Now freeze the code."  At that moment, all the global
analysis and compiling takes place.  This allows getting rid
of the GIL and getting real performance out of multithread
CPUs.

				John Nagle

[toc] | [prev] | [next] | [standalone]

#2471

From	Paul Rubin <no.email@nospam.invalid>
Date	2011-04-02 14:32 -0700
Message-ID	<7x4o6ge3vz.fsf@ruckus.brouhaha.com>
In reply to	#2465

John Nagle <nagle@animats.com> writes:
>     There's no easy way to speed up Python; that's been tried.
> It needs either a very, very elaborate JIT system, more complex
> than the ones for Java or Self, or some language restrictions.

Is it worse than Javascript?  Tracemonkey and its descendants produce
pretty fast code, I think.

> The main restriction I would impose is to provide a call that says:
> "OK, we're done with loading, initialization, and configuration.
> Now freeze the code."  At that moment, all the global
> analysis and compiling takes place.  This allows getting rid
> of the GIL and getting real performance out of multithread CPUs.

That's quite an interesting idea.  I do think a lot of production Python
code implicitly depends on the GIL and would need rework for multicore.
For example, code that expects "n += 1" to be atomic, because the
CPython bytecode interpreter won't switch threads in the middle of it.

[toc] | [prev] | [next] | [standalone]

#2478

From	Robert Kern <robert.kern@gmail.com>
Date	2011-04-02 19:12 -0500
Message-ID	<mailman.142.1301789559.2990.python-list@python.org>
In reply to	#2465

On 4/2/11 2:05 PM, John Nagle wrote:

> There's no easy way to speed up Python; that's been tried.
> It needs either a very, very elaborate JIT system, more complex
> than the ones for Java or Self, or some language restrictions.
> The main restriction I would impose is to provide a call that says:
> "OK, we're done with loading, initialization, and configuration.
> Now freeze the code." At that moment, all the global
> analysis and compiling takes place. This allows getting rid
> of the GIL and getting real performance out of multithread
> CPUs.

That is not the reason the GIL exists, and being able to "freeze" the code will 
not let you get rid of the GIL alone. CPython's GIL exists to protect the data 
structures internal to its implementation and provide implicit protection to C 
extension modules (making them relatively easy to write threadsafe). Reference 
counts are the primary data structures being protected. IronPython and Jython 
allow just as much dynamism as CPython, and they have no mechanism for 
"freezing" the code, but they do not have a GIL. I believe this has been pointed 
out to you before, but I don't think I've seen you acknowledge it.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

[toc] | [prev] | [next] | [standalone]

#2386

From	Rouslan Korneychuk <rouslank@msn.com>
Date	2011-04-01 15:31 -0400
Message-ID	<PKplp.3047$0s5.2777@newsfe17.iad>
In reply to	#2310

Thanks for all the replies. I wasn't aware of some of these 
alternatives. Most of these seem to transform Python code/bytecode into 
another language. I was already well aware of Cython. On the Nuitka 
blog, I notice it says "Compiling takes a lot [sic] time, ...". Compyler 
seems to generate assembly and then parse the assembly to generate a 
Windows exe. Berp turns python into Haskell, not directly into machine code.

The closest thing to mine seems to be Psyco. It tries to do something 
more ambitious. It analyzes the program while it's running to create 
specialized versions of certain functions. High memory usage seems to be 
an issue with Psyco.

My approach is to simply translate the bytecode into raw machine code as 
directly as possible, quickly and without using much memory. Basically I 
was going for a solution with no significant drawbacks. It was also 
meant to be very easy to maintain. The machine code is generated with a 
series of functions that very closely mirrors AT&T syntax (same as the 
default syntax for the GNU assembler) with some convenience functions 
that make it look like some kind of high-level assembly. For example, 
here is the implementation for LOAD_GLOBAL:

@hasname
def _op_LOAD_GLOBAL(f,name):
     return (
         f.stack.push_tos(True) + [
         ops.push(address_of(name)),
         ops.push(GLOBALS)
     ] + call('PyDict_GetItem') +
         if_eax_is_zero([
             discard_stack_items(1),
             ops.push(BUILTINS)
         ] + call('PyDict_GetItem') +
             if_eax_is_zero([
                 discard_stack_items(1),
                 ops.push(pyinternals.raw_addresses[
                     'GLOBAL_NAME_ERROR_MSG']),
                 ops.push(pyinternals.raw_addresses['PyExc_NameError'])
             ] + call('format_exc_check_arg') + [
                 goto(f.end)
             ])
         ) + [
         discard_stack_items(2)
     ])

To make sense of it, you just need to ignore the square brackets and 
plus signs (they are there to create a list that gets joined into one 
byte string at the very end) and imagine it's assembly code (I should 
probably just write a variadic function or use operator overloading to 
make this syntactically clearer). Any time a new version of Python is 
released, you would just run diff on Python-X.X/Python/ceval.c and see 
which op codes need updating. You wouldn't need to make a broad series 
of changes because of a minor change in Python's syntax or bytecode.

And that's just one of the larger op code implementations. Here's the 
one for LOAD_CONST:

@hasconst
def _op_LOAD_CONST(f,const):
     return f.stack.push_tos() + [f.stack.push(address_of(const))]


Anyway, It was certainly interesting working on this. I'll probably at 
least implement looping and arithmetic so I can have something 
meaningful to benchmark.

[toc] | [prev] | [standalone]

csiph-web

a basic bytecode to machine code compiler

Contents

#2310 — a basic bytecode to machine code compiler

#2318

#2323

#2350

#2367

#2446

#2448

#2465

#2471

#2478

#2386