


Re: multiprocessing module backport from 3 to 2.7 - spawn feature

Started by: Sturla Molden <sturla.molden@gmail.com>
First post: 2015-01-30 21:11 +0000
Last post: 2015-01-31 03:01 +0100
Articles: 3 (2 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: multiprocessing module backport from 3 to 2.7 - spawn feature Sturla Molden <sturla.molden@gmail.com> - 2015-01-30 21:11 +0000
    Re: multiprocessing module backport from 3 to 2.7 - spawn feature Marko Rauhamaa <marko@pacujo.net> - 2015-01-31 00:25 +0200
      Re: multiprocessing module backport from 3 to 2.7 - spawn feature Sturla Molden <sturla.molden@gmail.com> - 2015-01-31 03:01 +0100

#84917 — Re: multiprocessing module backport from 3 to 2.7 - spawn feature

From: Sturla Molden <sturla.molden@gmail.com>
Date: 2015-01-30 21:11 +0000
Subject: Re: multiprocessing module backport from 3 to 2.7 - spawn feature
Message-ID: <mailman.18320.1422652292.18130.python-list@python.org>

Skip Montanaro <skip.montanaro@gmail.com> wrote:

> Can you explain what you see as the difference between "spawn" and "fork"
> in this context? Are you using Windows perhaps? I don't know anything
> obviously different between the two terms on Unix systems.

spawn is fork + exec.

Only a handful of POSIX functions are required to be "fork safe", i.e.
callable on each side of a fork without an exec. 

An example of an API that is not safe to use on both sides of a fork is
Apple's GCD. The default builds of NumPy and SciPy depend on it on OS X
because it is used in the Accelerate Framework. You can thus run into
problems if you use numpy.dot in a process started with multiprocessing:
the call to numpy.dot never returns if the parent called any BLAS or
LAPACK function at least once before the multiprocessing.Process instance
was started. This is not a bug in NumPy or in the Accelerate Framework; it
is a bug in multiprocessing, which assumes that BLAS is fork safe. The
correct way of doing this is to start processes with spawn (fork + exec),
which multiprocessing supports as of Python 3.4.
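
A minimal sketch of selecting the spawn start method on Python 3.4 or later, assuming the standard multiprocessing.get_context() call. The names square and run_with_spawn are hypothetical and used only for illustration; square stands in for a BLAS-backed call such as numpy.dot.

```python
import multiprocessing as mp


def square(n):
    # Stand-in for real work such as a numpy.dot call (assumption: any
    # picklable, module-level function is handled the same way).
    return n * n


def run_with_spawn(values):
    # Request the "spawn" start method explicitly instead of relying on
    # the platform default.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard matters with spawn: each child re-imports this module,
    # and unguarded code would try to start processes again.
    print(run_with_spawn([1, 2, 3]))
```

Using a context object rather than a global set_start_method() call keeps the choice local to this code, so other libraries in the same program are not affected.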

Sturla



#84921

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2015-01-31 00:25 +0200
Message-ID: <87h9v7hpo1.fsf@elektro.pacujo.net>
In reply to: #84917

Sturla Molden <sturla.molden@gmail.com>:

> Only a handful of POSIX functions are required to be "fork safe", i.e.
> callable on each side of a fork without an exec.

That is a pretty surprising statement. Forking without an exec is a
routine way to do multiprocessing.

I understand there are things to consider, but all system calls are
available and safe.


Marko



#84929

From: Sturla Molden <sturla.molden@gmail.com>
Date: 2015-01-31 03:01 +0100
Message-ID: <mailman.18326.1422669687.18130.python-list@python.org>
In reply to: #84921

On 30/01/15 23:25, Marko Rauhamaa wrote:
> Sturla Molden <sturla.molden@gmail.com>:
>
>> Only a handful of POSIX functions are required to be "fork safe", i.e.
>> callable on each side of a fork without an exec.
>
> That is a pretty surprising statement. Forking without an exec is a
> routine way to do multiprocessing.
>
> I understand there are things to consider, but all system calls are
> available and safe.

POSIX says this:


- No asynchronous input or asynchronous output operations shall be 
inherited by the child process.

- A process shall be created with a single thread. If a multi-threaded 
process calls fork(), the new process shall contain a replica of the 
calling thread and its entire address space, possibly including the 
states of mutexes and other resources. Consequently, to avoid errors, 
the child process may only execute async-signal-safe operations until 
such time as one of the exec functions is called.

- Fork handlers may be established by means of the pthread_atfork() 
function in order to maintain application invariants across fork() calls.

- When the application calls fork() from a signal handler and any of the 
fork handlers registered by pthread_atfork() calls a function that is 
not async-signal-safe, the behavior is undefined.
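
As a rough Python-level analogue of the pthread_atfork() item above: Python 3.7 and later (newer than the versions discussed in this thread) provide os.register_at_fork(). The sketch below assumes a Unix system; state_lock, the three handler functions, and child_can_take_lock are hypothetical names chosen for illustration.

```python
import os
import threading

# A lock protecting some application state (hypothetical).
state_lock = threading.Lock()


def _acquire_before_fork():
    # Hold the lock across fork() so no other thread owns it at that moment.
    state_lock.acquire()


def _release_in_parent():
    state_lock.release()


def _reset_in_child():
    # The child gets a fresh lock rather than an inherited, held one.
    global state_lock
    state_lock = threading.Lock()


os.register_at_fork(
    before=_acquire_before_fork,
    after_in_parent=_release_in_parent,
    after_in_child=_reset_in_child,
)


def child_can_take_lock():
    """Fork and report whether the child could acquire state_lock."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            ok = state_lock.acquire(timeout=1)
            os.write(write_fd, b"1" if ok else b"0")
        finally:
            os._exit(0)  # never fall back into the parent's code path
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"
```

Handlers like these only cover state the application controls. They do nothing for libraries such as GCD that keep their own internal state.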


Hence you must be very careful about which functions you call after 
forking and before calling exec. As a rule, never use an API layered 
above POSIX in that window, e.g. BLAS or Apple's CoreFoundation.
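
The fork-then-exec pattern can be sketched in Python as follows, assuming a Unix system. The helper name run_via_fork_exec is hypothetical. This is an illustration of the shape only: the Python interpreter itself does work between fork() and exec that is not strictly async-signal-safe, which is one reason real code would normally use the subprocess module instead.

```python
import os
import sys


def run_via_fork_exec(argv):
    """Fork, exec argv in the child at once, and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: do nothing except replace the process image.
        try:
            os.execv(argv[0], argv)
        except OSError:
            pass
        # Only reached if exec failed; exit without running cleanup
        # handlers inherited from the parent.
        os._exit(127)
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    code = run_via_fork_exec([sys.executable, "-c", "print('hello from child')"])
    print("child exited with", code)
```

Because the child replaces its image immediately, it never touches library state copied from the parent.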



Apple said this when the problem with multiprocessing and the Accelerate 
Framework was first discovered:


---------- Forwarded message ----------
From:  <devbugs@apple.com>
Date: 2012/8/2
Subject: Bug ID 11036478: Segfault when calling dgemm with Accelerate
/ GCD after in a forked process
To: ******@******


Hi Olivier,

Thank you for contacting us regarding Bug ID# 11036478.

Thank you for filing this bug report.

This usage of fork() is not supported on our platform.

For API outside of POSIX, including GCD and technologies like
Accelerate, we do not support usage on both sides of a fork(). For
this reason among others, use of fork() without exec is discouraged in
general in processes that use layers above POSIX.

We recommend that you either restrict usage of blas to the parent or
the child process but not both, or that you switch to using GCD or
pthreads rather than forking to create parallelism.
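
The thread-based alternative that Apple suggests could look roughly like this in Python, using the standard concurrent.futures module. The functions dot and parallel_dots are hypothetical; dot is a pure-Python stand-in for a BLAS call. Pure-Python work like this does not run in parallel under the GIL, so any speed-up depends on the real call releasing the GIL, which is an assumption about the library in use.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(a, b):
    # Pure-Python stand-in for a BLAS-backed dot product (assumption).
    return sum(x * y for x, y in zip(a, b))


def parallel_dots(pairs, max_workers=4):
    # All work stays inside one process, so there is no fork() involved.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda pair: dot(*pair), pairs))


if __name__ == "__main__":
    print(parallel_dots([([1, 2], [3, 4]), ([0, 1], [5, 6])]))
```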



Also see this:

http://bugs.python.org/issue8713

https://mail.python.org/pipermail/python-ideas/2012-November/017930.html





Sturla




