


parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool

Started by: Hseu-Ming Chen <hseuming@gmail.com>
First post: 2011-06-10 16:23 -0400
Last post: 2011-06-12 22:00 +0000
Articles: 2 (2 participants)




#7399 — parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool

From: Hseu-Ming Chen <hseuming@gmail.com>
Date: 2011-06-10 16:23 -0400
Subject: parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool
Message-ID: <mailman.105.1307737402.11593.python-list@python.org>

Hi,
I am having an issue when making a shell call from within a
multiprocessing.Process().  Here is the story: I tried to parallelize
the computations in 800-ish Matlab scripts and then save the results
to MySQL.  The non-parallel/serial version has been running fine for
about 2 years.  However, in the parallel version via multiprocessing
that I'm working on, it appears that the Matlab scripts have never
been kicked off and nothing happened with subprocess.Popen.  The debug
printing below does not show up either.

Moreover, even if I replace the Matlab invocation with some trivial
"sed" call, still nothing happens.

Is it possible that the Python interpreter I'm using (version 2.6,
released on Oct. 1, 2008) is too old?  Nevertheless, I would like to
make sure the basic framework I have now is not blatantly wrong.

Below is a skeleton of my Python program:

----------------------------------------------
import subprocess
import sys
from multiprocessing import Pool

def worker(DBrow,config):
   #  run one Matlab script
   cmd1 = "/usr/local/bin/matlab  ...  myMatlab.1.m"
   subprocess.Popen([cmd1], shell=True, stdout=subprocess.PIPE).communicate()[0]
   print "this does not get printed"

   cmd2 = "sed ..."
   print subprocess.Popen(cmd2, shell=True, stdout=subprocess.PIPE).communicate()[0]
   print "this does not get printed either"
   sys.stdout.flush()

###   main program below
......
# kick off parallel processing
pool = Pool()
for DBrow in DBrows: pool.apply_async(worker,(DBrow,config))
pool.close()
pool.join()
......
----------------------------------------------

Furthermore, I also tried adding the following:
  multiprocessing.current_process().curr_proc.daemon = False
at the beginning of the "worker" function above but to no avail.

Any help would really be appreciated.



#7497 — Re: parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool

From: Chris Torek <nospam@torek.net>
Date: 2011-06-12 22:00 +0000
Subject: Re: parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool
Message-ID: <it3cuq01l04@news6.newsguy.com>
In reply to: #7399

In article <mailman.105.1307737402.11593.python-list@python.org>
Hseu-Ming Chen  <hseuming@gmail.com> wrote:
>I am having an issue when making a shell call from within a
>multiprocessing.Process().  Here is the story: I tried to parallelize
>the computations in 800-ish Matlab scripts and then save the results
>to MySQL.  The non-parallel/serial version has been running fine for
>about 2 years.  However, in the parallel version via multiprocessing
>that I'm working on, it appears that the Matlab scripts have never
>been kicked off and nothing happened with subprocess.Popen.  The debug
>printing below does not show up either.

I obviously do not have your code, and have not even tried this as
an experiment in a simplified environment, but:

>import subprocess
>from multiprocessing import Pool
>
>def worker(DBrow,config):
>   #  run one Matlab script
>   cmd1 = "/usr/local/bin/matlab  ...  myMatlab.1.m"
>   subprocess.Popen([cmd1], shell=True, stdout=subprocess.PIPE).communicate()[0]
>   print "this does not get printed"
 ...
># kick off parallel processing
>pool = Pool()
>for DBrow in DBrows: pool.apply_async(worker,(DBrow,config))
>pool.close()
>pool.join()

The multiprocessing code makes use of pipes to communicate between
the various subprocesses it creates.  I suspect these "extra" pipes
are interfering with your subprocesses, when pool.close() waits
for the Matlab script to do something with its copy of the pipes.
To make the subprocess module close them -- so that Matlab does
not have them in the first place and hence pool.close() cannot get
stuck there -- add "close_fds=True" to the Popen() call.
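
As a rough illustration of the close_fds suggestion, here is a minimal
sketch.  It is written for a current Python 3 on a Unix-like system, not
the 2.6 interpreter from the question.  The helper name run_command is
hypothetical, and "echo hello" only stands in for the elided Matlab
command line.  On POSIX, Python 3.2 and later already default to
close_fds=True, so spelling it out matters mostly on older interpreters.

```python
import subprocess


def run_command(cmd):
    """Run cmd through the shell and return its captured stdout as bytes."""
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        # Ask subprocess to close inherited descriptors in the child, so the
        # command never holds a copy of pipes opened by the parent
        # (for example the ones multiprocessing creates for its workers).
        close_fds=True,
    )
    out, _ = proc.communicate()  # wait for the command and collect stdout
    return out
```

Inside the worker from the question, the equivalent change would be
passing close_fds=True to each existing Popen() call.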

There could still be issues with competing wait() and/or waitpid()
calls (assuming you are using a Unix-like system, or whatever the
equivalent is for Windows) "eating" the wrong subprocess completion
notifications, but that one is harder to solve in general :-) so
if close_fds fixes things, it was just the pipes.  If close_fds
does not fix things, you will probably need to defer the pool.close()
step until after all the subprocesses complete.
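
One possible way to defer the shutdown, sketched under assumptions: keep
the AsyncResult objects that apply_async() returns and call get() on each
one before closing the pool.  This is written for Python 3; the names
worker and run_all, and the child Python command standing in for Matlab,
are hypothetical.  Two related facts: close() itself only stops new
submissions and it is join() that actually waits, and get() re-raises any
exception from the worker, which may also help explain a worker that
appears to do nothing.

```python
import subprocess
import sys
from multiprocessing import Pool


def worker(row):
    """Run one external command for row and return its stdout as text."""
    # Stand-in for the Matlab call: a child Python that echoes its argument.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", str(row)]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, close_fds=True)
    out, _ = proc.communicate()
    return out.decode().strip()


def run_all(rows):
    """Submit every row, wait for all results, and only then close the pool."""
    pool = Pool()
    try:
        pending = [pool.apply_async(worker, (row,)) for row in rows]
        # get() blocks until the task has finished and re-raises any
        # exception raised inside the worker process.
        results = [p.get() for p in pending]
    finally:
        pool.close()  # no further tasks will be submitted
        pool.join()   # wait for the worker processes to exit
    return results


if __name__ == "__main__":
    print(run_all(["a", "b", "c"]))
```

The results come back in submission order because get() is called on the
pending list in that order, regardless of which task finished first.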
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html




