| Started by | Hseu-Ming Chen <hseuming@gmail.com> |
|---|---|
| First post | 2011-06-10 16:23 -0400 |
| Last post | 2011-06-12 22:00 +0000 |
| Articles | 2 — 2 participants |
parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool Hseu-Ming Chen <hseuming@gmail.com> - 2011-06-10 16:23 -0400
Re: parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool Chris Torek <nospam@torek.net> - 2011-06-12 22:00 +0000
| From | Hseu-Ming Chen <hseuming@gmail.com> |
|---|---|
| Date | 2011-06-10 16:23 -0400 |
| Subject | parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool |
| Message-ID | <mailman.105.1307737402.11593.python-list@python.org> |
Hi,

I am having an issue when making a shell call from within a multiprocessing.Process(). Here is the story: i tried to parallelize the computations in 800-ish Matlab scripts and then save the results to MySQL. The non-parallel/serial version has been running fine for about 2 years. However, in the parallel version via multiprocessing that i'm working on, it appears that the Matlab scripts have never been kicked off and nothing happened with subprocess.Popen. The debug printing below does not show up either.

Moreover, even if i replace the Matlab invocation with some trivial "sed" call, still nothing happens.

Is it possible that the Python interpreter i'm using (version 2.6 released on Oct. 1, 2008) is too old? Nevertheless, i would like to make sure the basic framework i've now is not blatantly wrong.

Below is a skeleton of my Python program:

----------------------------------------------
import subprocess
from multiprocessing import Pool

def worker(DBrow,config):
    # run one Matlab script
    cmd1 = "/usr/local/bin/matlab ... myMatlab.1.m"
    subprocess.Popen([cmd1], shell=True, stdout=subprocess.PIPE).communicate()[0]
    print "this does not get printed"

    cmd2 = "sed ..."
    print subprocess.Popen(cmd2, shell=True, stdout=subprocess.PIPE).communicate()[0]
    print "this does not get printed either"
    sys.stdout.flush()

### main program below
......
# kick off parallel processing
pool = Pool()
for DBrow in DBrows: pool.apply_async(worker,(DBrow,config))
pool.close()
pool.join()
......
----------------------------------------------

Furthermore, i also tried adding the following:

    multiprocessing.current_process().curr_proc.daemon = False

at the beginning of the "worker" function above but to no avail.

Any help would really be appreciated.
| From | Chris Torek <nospam@torek.net> |
|---|---|
| Date | 2011-06-12 22:00 +0000 |
| Subject | Re: parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool |
| Message-ID | <it3cuq01l04@news6.newsguy.com> |
| In reply to | #7399 |
In article <mailman.105.1307737402.11593.python-list@python.org>
Hseu-Ming Chen <hseuming@gmail.com> wrote:
>I am having an issue when making a shell call from within a
>multiprocessing.Process(). Here is the story: i tried to parallelize
>the computations in 800-ish Matlab scripts and then save the results
>to MySQL. The non-parallel/serial version has been running fine for
>about 2 years. However, in the parallel version via multiprocessing
>that i'm working on, it appears that the Matlab scripts have never
>been kicked off and nothing happened with subprocess.Popen. The debug
>printing below does not show up either.

I obviously do not have your code, and have not even tried this as an experiment in a simplified environment, but:

>import subprocess
>from multiprocessing import Pool
>
>def worker(DBrow,config):
>    # run one Matlab script
>    cmd1 = "/usr/local/bin/matlab ... myMatlab.1.m"
>    subprocess.Popen([cmd1], shell=True, stdout=subprocess.PIPE).communicate()[0]
>    print "this does not get printed"
 ...
># kick off parallel processing
>pool = Pool()
>for DBrow in DBrows: pool.apply_async(worker,(DBrow,config))
>pool.close()
>pool.join()

The multiprocessing code makes use of pipes to communicate between the various subprocesses it creates. I suspect these "extra" pipes are interfering with your subprocesses, when pool.close() waits for the Matlab script to do something with its copy of the pipes. To make the subprocess module close them -- so that Matlab does not have them in the first place and hence pool.close() cannot get stuck there -- add "close_fds=True" to the Popen() call.

There could still be issues with competing wait() and/or waitpid() calls (assuming you are using a Unix-like system, or whatever the equivalent is for Windows) "eating" the wrong subprocess completion notifications, but that one is harder to solve in general :-) so if close_fds fixes things, it was just the pipes.
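
A minimal sketch of the close_fds suggestion, not the poster's actual program. The Matlab command line is elided in the thread, so the worker here runs a stand-in command (the Python interpreter itself); the `label` argument and the pool size of 2 are likewise assumptions made only for the demo. It is written in Python 3 syntax, whereas the thread uses Python 2.6. On Python 3.2 and later, close_fds already defaults to True on POSIX systems, so the explicit argument matters mainly on older interpreters such as 2.6.

```python
import subprocess
import sys
from multiprocessing import Pool


def worker(label):
    # Stand-in command (an assumption): the real Matlab command line is elided in the post.
    cmd = [sys.executable, "-c", "print('ran %s')" % label]
    # close_fds=True asks subprocess to close inherited descriptors other than
    # stdin/stdout/stderr in the child, so it never holds the pool's pipes.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, close_fds=True)
    out = proc.communicate()[0]  # bytes on Python 3
    return out.decode().strip()


if __name__ == "__main__":
    pool = Pool(2)  # pool size chosen arbitrarily for the demo
    pending = [pool.apply_async(worker, (label,)) for label in ("a", "b")]
    pool.close()
    pool.join()
    # get() also re-raises any exception that occurred inside a worker
    print([p.get() for p in pending])
```

Passing the command as a list without shell=True is a choice made for the sketch; the original skeleton uses shell=True with a command string.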

If close_fds does not fix things, you will probably need to defer the pool.close() step until after all the subprocesses complete.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html
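
The fallback mentioned in the reply, deferring pool.close() until the submitted work has finished, might be sketched as follows. This is an illustration under assumptions: the worker body is a placeholder for the real Matlab job, and the pool size of 2 and the 60-second timeout are hypothetical values. Each AsyncResult returned by apply_async() is waited on with get() before the pool is shut down.

```python
from multiprocessing import Pool


def worker(n):
    # Placeholder job (hypothetical): the thread's worker would launch Matlab here.
    return n * n


def run_deferred(items):
    pool = Pool(2)  # pool size is an arbitrary choice for the sketch
    pending = [pool.apply_async(worker, (n,)) for n in items]
    # get() blocks until that task is done and re-raises any exception
    # raised in the worker; the timeout value is an assumption.
    results = [p.get(timeout=60) for p in pending]
    # Only now stop accepting work and wait for the worker processes to exit.
    pool.close()
    pool.join()
    return results


if __name__ == "__main__":
    print(run_deferred([1, 2, 3]))
```

Keeping the AsyncResult objects has a side benefit for debugging: an exception raised inside a worker is otherwise never reported to the parent process.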