Subject: subprocess.Popen and multiprocessing fails to execute external program
From: Niklas Berliner
Date: Wed, 9 Jan 2013 23:08:33 -0500
Newsgroups: comp.lang.python

I have a pipeline that involves processing some data, handing the data to an external program (t_coffee, used for sequence alignments in bioinformatics), and postprocessing the result. Since I have a lot of data, I need to run my pipeline in parallel, which I implemented using the multiprocessing module following Doug Hellmann's blog (http://blog.doughellmann.com/2009/04/pymotw-multiprocessing-part-1.html).

My pipeline works perfectly fine when I run it with the multiprocessing implementation and one consumer, i.e. on one core. If I increase the number of consumers, i.e. run multiple instances of my pipeline in parallel, the external program fails with a core dump.
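
The post does not show the consumer code itself. As a rough picture of the setup described above, here is a minimal sketch of a queue-based consumer pool. It is written for Python 3, whereas the post uses 2.7.3. Consumer processes pull items from a shared JoinableQueue and stop when they receive a None "poison pill". `process_item` is a hypothetical stand-in for the real preprocess/t_coffee/postprocess step. The explicit "fork" start method is an assumption matching a Linux host. Setting `num_consumers=1` corresponds to the configuration that works in the post.

```python
import multiprocessing

# Assumption: the POSIX "fork" start method, which Python 2.7 always uses on Linux.
CTX = multiprocessing.get_context("fork")


def process_item(item):
    """Hypothetical stand-in for one pipeline step (prepare, align, postprocess)."""
    return item * item


def consumer(task_queue, result_queue):
    """Take items until a None "poison pill" arrives, then exit."""
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            break
        result_queue.put((item, process_item(item)))
        task_queue.task_done()


def run_pipeline(items, num_consumers):
    """Run process_item over items using num_consumers worker processes."""
    tasks = CTX.JoinableQueue()
    results = CTX.Queue()
    workers = [CTX.Process(target=consumer, args=(tasks, results))
               for _ in range(num_consumers)]
    for worker in workers:
        worker.start()
    for item in items:
        tasks.put(item)
    for _ in workers:
        tasks.put(None)  # one poison pill per consumer
    tasks.join()  # blocks until every item and pill has been marked done
    collected = [results.get() for _ in items]  # drain before joining workers
    for worker in workers:
        worker.join()
    return sorted(collected)


if __name__ == "__main__":
    print(run_pipeline(range(5), num_consumers=3))
```

The result queue is drained before the workers are joined. This follows the multiprocessing documentation's advice that a process which has put items on a queue may not terminate until those items have been consumed.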

To call the external program, I let Python write a bash wrapper script that is called by:
childProcess = subprocess.Popen(system_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
result, error = childProcess.communicate()
rc = childProcess.returncode
(I also tried shell=False and calling the program directly, specifying the env for the call.)
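
Because the failure shows up as a core dump, it helps to know how that surfaces in Python. On POSIX, a negative Popen `returncode` of -N means the child was killed by signal N (e.g. -11 for SIGSEGV, -6 for SIGABRT). A bash wrapper started with shell=True, by contrast, typically reports the same event as an exit status of 128+N (139 for SIGSEGV). Below is a minimal Python 3 sketch of the shell=False variant mentioned above. `run_tool` and `describe_returncode` are hypothetical helpers, and the `sh` commands in the demo only stand in for t_coffee.

```python
import signal
import subprocess


def run_tool(argv, env, cwd=None):
    """Run argv directly (shell=False) with an explicit environment.

    Returns (stdout_bytes, stderr_bytes, returncode).
    """
    child = subprocess.Popen(argv, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, env=env, cwd=cwd)
    out, err = child.communicate()  # reads both pipes, then waits for exit
    return out, err, child.returncode


def describe_returncode(rc):
    """Hypothetical helper: turn a Popen returncode into a readable message."""
    if rc < 0:  # POSIX: the child was killed by signal -rc
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = "signal %d" % -rc
        return "killed by %s" % name
    if rc > 128:  # bash reports "child killed by signal N" as 128 + N
        return "exited with status %d (signal %d inside a shell?)" % (rc, rc - 128)
    return "exited with status %d" % rc


if __name__ == "__main__":
    env = {"PATH": "/usr/bin:/bin"}  # minimal environment for the demo
    out, err, rc = run_tool(["sh", "-c", "echo hello; exit 3"], env)
    print(out, describe_returncode(rc))
```

Logging `describe_returncode(rc)` together with `error` for every failed call shows whether the crash is reported as a signal (direct call) or as a 128+N status (through the wrapper).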

To avoid conflicts between instances of the external program, each program call gets a flushed environment, and the important environment variables are set to unique, existing paths. An example looks like this:
#!/bin/bash
env -i
export HOME_4_TCOFFEE="/home/niklas/tcoffee/parallel/99-1-Consumer-2/"
export CACHE_4_TCOFFEE="$HOME_4_TCOFFEE/cache/"
export TMP_4_TCOFFEE="$HOME_4_TCOFFEE/tmp/"
export LOCKDIR_4_TCOFFEE="$HOME_4_TCOFFEE/lock/"
mkdir -p "$CACHE_4_TCOFFEE"
mkdir -p "$TMP_4_TCOFFEE"
mkdir -p "$LOCKDIR_4_TCOFFEE"
t_coffee -mode expresso -seq /home/niklas/tcoffee/parallel/Consumer-2Q9FHL4_ARATH -blast_server=LOCAL -pdb_db=pdbaa -outorder=input -output fasta_aln -quiet -no_warning -outfile=/tmp/tmpm3mViZ
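
One detail in this wrapper is worth knowing. A line consisting only of `env -i` runs `env` with an empty environment, prints that (empty) environment and exits. It does not clear the environment of the script itself, so the `export` lines are added on top of everything the script inherited from Python. If a genuinely minimal environment is wanted, one option is to build it in Python and pass it through Popen's `env` argument. The Python 3 sketch below does that. `tcoffee_env` is a hypothetical helper whose variable names are copied from the wrapper, and keeping `PATH` is an assumption so that `t_coffee` can still be found.

```python
import os
import tempfile


def tcoffee_env(base_dir, consumer_name):
    """Hypothetical helper: a minimal environment with per-consumer t_coffee dirs.

    Variable names follow the bash wrapper; whether t_coffee needs any other
    variables is not known here.
    """
    home = os.path.join(base_dir, consumer_name)
    env = {
        # Assumption: keep PATH so the t_coffee executable can still be found.
        "PATH": os.environ.get("PATH", "/usr/bin:/bin"),
        "HOME_4_TCOFFEE": home,
        "CACHE_4_TCOFFEE": os.path.join(home, "cache"),
        "TMP_4_TCOFFEE": os.path.join(home, "tmp"),
        "LOCKDIR_4_TCOFFEE": os.path.join(home, "lock"),
    }
    for key in ("CACHE_4_TCOFFEE", "TMP_4_TCOFFEE", "LOCKDIR_4_TCOFFEE"):
        os.makedirs(env[key], exist_ok=True)  # same effect as mkdir -p
    return env


if __name__ == "__main__":
    env = tcoffee_env(tempfile.mkdtemp(), "Consumer-2")
    # The dict would then be passed as Popen(argv, env=env, ...) with shell=False.
    print(sorted(env))
```
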
If I replace the t_coffee command with something simple like 'touch I-<unique ID>-was-here', the files are created as expected and no error is produced.
The developers of the external program assured me that running their program in parallel should not be a problem if the environment variables are set correctly. I also took the exact same bash scripts that Python generated, which failed when run in parallel through Python. When I execute batches of them manually, using a for loop in each of several terminals (i.e. in parallel), they don't produce an error.
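
The manual experiment (several terminals, each looping over a batch of the generated scripts) can be approximated from a single Python process without the multiprocessing consumers. That may help narrow down whether the process pool itself matters. Below is a Python 3 sketch that assumes the wrapper scripts already exist on disk. `run_script`, `run_scripts_concurrently` and `max_parallel` are hypothetical names, and threads are used here only to launch and wait on the bash subprocesses.

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_script(path):
    """Run one generated wrapper script with bash; return its exit status."""
    proc = subprocess.run(["bash", path], stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
    return proc.returncode


def run_scripts_concurrently(paths, max_parallel):
    """Run up to max_parallel scripts at a time, similar to one loop per terminal."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(zip(paths, pool.map(run_script, paths)))


if __name__ == "__main__":
    # Demo with throwaway scripts in place of the real t_coffee wrappers.
    workdir = tempfile.mkdtemp()
    scripts = []
    for i in range(4):
        path = os.path.join(workdir, "job-%d.sh" % i)
        with open(path, "w") as fh:
            fh.write("#!/bin/bash\nexit %d\n" % i)
        scripts.append(path)
    print(run_scripts_concurrently(scripts, max_parallel=2))
```

If this runs cleanly with real wrapper scripts while the multiprocessing version still crashes, the difference lies in how the pipeline processes start t_coffee rather than in the scripts themselves.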


I am really puzzled and stuck. Python seems to work correctly on its own, and the external program seems to work correctly on its own, but somehow, when combined, they won't work.
Any help and hints would be much appreciated! I need this to work.

I am using Ubuntu 12.04 with Python 2.7.3.

Cheers,
Niklas