
Calling python from C with OpenMP

Started by	oysteijo@gmail.com
First post	2016-05-12 12:28 -0700
Last post	2016-05-13 19:09 -0700
Articles	7 — 6 participants


  Calling python from C with OpenMP oysteijo@gmail.com - 2016-05-12 12:28 -0700
    Re: Calling python from C with OpenMP Sturla Molden <sturla.molden@gmail.com> - 2016-05-13 00:04 +0000
      Re: Calling python from C with OpenMP Øystein Schønning-Johansen <oysteijo@gmail.com> - 2016-05-13 09:22 -0700
        Re: Calling python from C with OpenMP MRAB <python@mrabarnett.plus.com> - 2016-05-13 18:12 +0100
          Re: Calling python from C with OpenMP Øystein Schønning-Johansen <oysteijo@gmail.com> - 2016-05-13 13:16 -0700
            Re: Calling python from C with OpenMP Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-05-13 20:04 -0400
              Re: Calling python from C with OpenMP Paul Rubin <no.email@nospam.invalid> - 2016-05-13 19:09 -0700

#108566 — Calling python from C with OpenMP

From	oysteijo@gmail.com
Date	2016-05-12 12:28 -0700
Subject	Calling python from C with OpenMP
Message-ID	<8224bdd2-9afe-487b-804b-f3b88dee2028@googlegroups.com>

Hi,

I have a framework written in C and I need to call Python from that framework. I have written the code, and it runs fine; however, when I recompile with OpenMP enabled, I get segmentation faults and sometimes an error message:

Fatal Python error: GC object already tracked

I'm able to reproduce the bug with this simple code:

/* main.c */
#include <Python.h>
#include <omp.h>
int main()
{
    wchar_t *program = Py_DecodeLocale( "My_problem", NULL );
    if( !program ){
        fprintf(stderr, "cannot decode python program name\n");
        return -1;
    }

    Py_SetProgramName( program );
    Py_Initialize();

    PyObject *sys = PyImport_ImportModule("sys");
    PyObject *path = PyObject_GetAttrString(sys, "path");
    PyList_Append(path, PyUnicode_FromString("."));

    PyObject *module_filename = PyUnicode_FromString( "multiplier" );
    if(!module_filename){
        printf("Cannot create python module name 'multiplier'.\n");
        return -1;
    }
    PyObject *module = PyImport_Import( module_filename );
    if(!module){
        printf("Cannot import python module 'multiplier'.\n");
        return -1;
    }

    PyObject *mult_obj = PyObject_CallMethod( module ,"multiplier", "i", 7);
    if(!mult_obj){
        printf("Cannot create python multiplier class instance\n");
        return -1;
    }

    Py_DECREF( module );
    Py_DECREF( module_filename );
    Py_DECREF( path );
    Py_DECREF( sys );
    /* Up to now we have actually done:
     * >>> import multiplier
     * >>> mult_obj = multiplier.multiplier(7)
     */

    /* Let's try something like:
     * >>> for x in range(10):
     * ...     print(mult_obj.do_multiply(x))
     */

#pragma omp parallel for 
    for( int i = 0; i < 10; i++ ){
        PyObject *ret = PyObject_CallMethod( mult_obj, "do_multiply", "i", i );
        if( !ret ){
            printf("Cannot call 'do_multiply'\n");
            continue;
        }
        printf("The value calculated in Python was: %3d\n", (int) PyLong_AsLong(ret));
        Py_DECREF(ret);
    }

    Py_DECREF(mult_obj);
    Py_Finalize();

    return 0;
} 

Compile with:
gcc -std=gnu99 -O3 -Wall -Wextra -fopenmp main.c -o main `pkg-config --cflags --libs python3`

Then you need the python code:

# multiplier.py
class multiplier(object):
    def __init__(self, factor):
        self.factor = factor
    def do_multiply(self, x):
        return self.factor * x

First question: Does my C code leak memory? Valgrind says it does, but the memory footprint of the executable is stable while looping.

Second and most important question: When I run this code it sometimes segmentation faults, and sometimes some threads run normally while other threads say "Cannot call 'do_multiply'". Sometimes I get the message: Fatal Python error: GC object already tracked. And sometimes it even runs normally...
I understand there is some kind of race condition here, where Python tries to refer to some memory that has already been released. But how can I avoid this? What am I doing wrong? (Or, less likely, is this a bug?)

Maybe needless to say, but the code works when compiled without OpenMP.

Using Python 3.5.1

Thanks,
-Øystein


#108571

From	Sturla Molden <sturla.molden@gmail.com>
Date	2016-05-13 00:04 +0000
Message-ID	<mailman.618.1463097878.32212.python-list@python.org>
In reply to	#108566

<oysteijo@gmail.com> wrote:

> Second and most important question: When I run this code it sometimes
> segmentation faults, and sometimes some threads run normally while
> other threads say "Cannot call 'do_multiply'". Sometimes I get the
> message: Fatal Python error: GC object already tracked. And sometimes it
> even runs normally...
> I understand there is some kind of race condition here, where Python
> tries to refer to some memory that has already been released. But how can
> I avoid this? What am I doing wrong? (Or, less likely, is this a bug?)


You must own the GIL before you can safely use the Python C API, object
creation and refcounting in particular. Use the "Simplified GIL API" to
grab the GIL and release it when you are done.


#108599

From	Øystein Schønning-Johansen <oysteijo@gmail.com>
Date	2016-05-13 09:22 -0700
Message-ID	<91ea4bb0-2d34-4a27-8b18-640c79712f64@googlegroups.com>
In reply to	#108571

On Friday, May 13, 2016 at 2:04:53 AM UTC+2, Sturla Molden wrote:
> You must own the GIL before you can safely use the Python C API, object
> creation and refcounting in particular. Use the "Simplified GIL API" to
> grab the GIL and release it when you are done.

I've now read about the GIL and it looks like I am in deep trouble.

I've added the GILState lock to the threaded loop like this:

#pragma omp parallel for
    for( int i = 0; i < 10; i++ ){
        PyGILState_STATE gstate;
        gstate = PyGILState_Ensure();
        PyObject *ret = PyObject_CallMethod( mult_obj, "do_multiply", "i", i );
        if( !ret ){
            printf("Cannot call 'do_multiply'\n");
            continue;
        }
        printf("The value calculated in Python was: %3d\n", (int) PyLong_AsLong(ret));
        Py_DECREF(ret);
        PyGILState_Release(gstate);
    }

... but still no success. Have I done it right?

regs,
-Øystein


#108607

From	MRAB <python@mrabarnett.plus.com>
Date	2016-05-13 18:12 +0100
Message-ID	<mailman.637.1463159537.32212.python-list@python.org>
In reply to	#108599

On 2016-05-13 17:22, Øystein Schønning-Johansen wrote:
> On Friday, May 13, 2016 at 2:04:53 AM UTC+2, Sturla Molden wrote:
>> You must own the GIL before you can safely use the Python C API, object
>> creation and refcounting in particular. Use the "Simplified GIL API" to
>> grab the GIL and release it when you are done.
>
> I've now read about the GIL and it looks like I am in deep trouble.
>
> I've added the GILState lock to the threaded loop like this:
>
> #pragma omp parallel for
>     for( int i = 0; i < 10; i++ ){
>         PyGILState_STATE gstate;
>         gstate = PyGILState_Ensure();
>         PyObject *ret = PyObject_CallMethod( mult_obj, "do_multiply", "i", i );
>         if( !ret ){
>             printf("Cannot call 'do_multiply'\n");
>             continue;
>         }
>         printf("The value calculated in Python was: %3d\n", (int) PyLong_AsLong(ret));
>         Py_DECREF(ret);
>         PyGILState_Release(gstate);
>     }
>
> ... but still no success. Have I done it right?
>
> regs,
> -Øystein
>
Every PyGILState_Ensure call must be matched with a PyGILState_Release 
call. The way it's currently written, it won't call PyGILState_Release 
if ret is NULL.

However, I don't think you'll gain much here because you can gain from 
multi-threading only if the threads can run in parallel. You need to 
hold the GIL while making Python calls, and only 1 thread can hold the 
GIL at any time.
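
The serialization can be illustrated without Python at all. In the sketch below an OpenMP critical section stands in for the GIL (an analogy only, not how CPython implements it), and max_threads_inside_lock is a hypothetical function that records how many threads were ever inside the locked region at the same time. However many threads OpenMP starts, the answer is 1, so the locked part of every iteration runs serially and only work done outside the lock can overlap.

```c
/* Stand-in for the GIL: a named OpenMP critical section. Returns the largest
 * number of threads ever observed inside the locked region at once. */
static int max_threads_inside_lock(int iterations)
{
    int inside = 0;       /* threads currently inside the locked region */
    int max_inside = 0;   /* highest value of 'inside' seen so far */

#pragma omp parallel for shared(inside, max_inside)
    for (int i = 0; i < iterations; i++) {
#pragma omp critical(fake_gil)
        {
            inside++;
            if (inside > max_inside)
                max_inside = inside;
            /* ... the call into Python would happen here ... */
            inside--;
        }
    }
    return max_inside;
}
```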


#108617

From	Øystein Schønning-Johansen <oysteijo@gmail.com>
Date	2016-05-13 13:16 -0700
Message-ID	<2a0461ad-fd4c-4ffa-9a4a-b1bf2a21b6cf@googlegroups.com>
In reply to	#108607

On Friday, May 13, 2016 at 7:12:33 PM UTC+2, MRAB wrote:
> Every PyGILState_Ensure call must be matched with a PyGILState_Release 
> call. The way it's currently written, it won't call PyGILState_Release 
> if ret is NULL.

Yeah, that's a tiny bug; however, it is not the main problem...

> However, I don't think you'll gain much here because you can gain from 
> multi-threading only if the threads can run in parallel. You need to 
> hold the GIL while making Python calls, and only 1 thread can hold the 
> GIL at any time.

Apart from the fact that it still does not work, what you point out here is the main problem. I think I'm giving up for now. Maybe I'll be able to solve this later... (when Python no longer uses a GIL?)

Thanks to Sturla and MRAB anyway!
-Øystein


#108629

From	Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date	2016-05-13 20:04 -0400
Message-ID	<mailman.656.1463184279.32212.python-list@python.org>
In reply to	#108617

On Fri, 13 May 2016 13:16:58 -0700 (PDT), Øystein Schønning-Johansen
<oysteijo@gmail.com> declaimed the following:

>Apart from the fact that it still does not work, what you point out here is the main problem. I think I'm giving up for now. Maybe I'll be able to solve this later... (when Python no longer uses a GIL?)
>
	It's been tried -- but the non-GIL implementations tend to be slower at
everything else.

	I don't think Jython uses a GIL... (it's only the baseline C-based
Python [aka CPython -- not to be confused with Cython] that uses the GIL as I
recall).


-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/


#108630

From	Paul Rubin <no.email@nospam.invalid>
Date	2016-05-13 19:09 -0700
Message-ID	<87inyh2xw3.fsf@jester.gateway.pace.com>
In reply to	#108629

Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
> 	It's been tried -- but the non-GIL implementations tend to be
> slower at everything else.

Has MicroPython been compared?  CPython needs the GIL because of its
frequent twiddling of reference counts.  Without the GIL, multi-threaded
CPython would have to acquire and release a lock whenever it touched a
refcount, which slows things down badly.

MicroPython uses a tracing garbage collector instead of refcounts, so
there's no issue of having to lock refcounts all the time.  It's fairly
common in such systems to stop all the user threads during GC, but they
can happily run in parallel the rest of the time.  

Come to think of it, I don't know if MicroPython currently supports
threads at all!  But its implementation style (i.e. no refcounts) is
more parallelism-friendly than CPython's.
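
To make the refcount cost concrete, here is a minimal sketch of a thread-safe reference count, assuming a C11 compiler. It uses atomic operations instead of a lock per object, which is a different technique from the locking described above but carries the same kind of synchronization cost on every touch. shared_obj, obj_incref, obj_decref and hammer_refcount are made-up names, not CPython API.

```c
#include <stdatomic.h>

/* Hypothetical object header with a thread-safe reference count. */
typedef struct {
    atomic_long refcnt;
} shared_obj;

/* Every increment is an atomic read-modify-write, not a plain ++. */
static void obj_incref(shared_obj *o)
{
    atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Returns 1 when the caller has just dropped the last reference. */
static int obj_decref(shared_obj *o)
{
    return atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_acq_rel) == 1;
}

/* Take n references from an OpenMP loop and return the resulting count.
 * With a plain long and ++ instead, updates could be lost under threads. */
static long hammer_refcount(shared_obj *o, int n)
{
#pragma omp parallel for
    for (int i = 0; i < n; i++) {
        obj_incref(o);
    }
    return atomic_load(&o->refcnt);
}
```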

