Piping processes works with 'shell = True' but not otherwise.

Started by: Luca Cerone <luca.cerone@gmail.com>
First post: 2013-05-24 07:04 -0700
Last post: 2013-05-31 11:52 +0200
Articles: 13 (7 participants)

Contents

  Piping processes works with 'shell = True' but not otherwise. Luca Cerone <luca.cerone@gmail.com> - 2013-05-24 07:04 -0700
    Re: Piping processes works with 'shell = True' but not otherwise. Luca Cerone <luca.cerone@gmail.com> - 2013-05-26 03:31 -0700
    Re: Piping processes works with 'shell = True' but not otherwise. Chris Rebert <clp2@rebertia.com> - 2013-05-26 14:05 -0700
      Re: Piping processes works with 'shell = True' but not otherwise. Luca Cerone <luca.cerone@gmail.com> - 2013-05-26 16:58 -0700
        RE: Piping processes works with 'shell = True' but not otherwise. Carlos Nepomuceno <carlosnepomuceno@outlook.com> - 2013-05-27 03:14 +0300
          Re: Piping processes works with 'shell = True' but not otherwise. Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2013-05-29 19:39 +0200
            RE: Piping processes works with 'shell = True' but not otherwise. Carlos Nepomuceno <carlosnepomuceno@outlook.com> - 2013-05-29 22:31 +0300
            Re: Piping processes works with 'shell = True' but not otherwise. Cameron Simpson <cs@zip.com.au> - 2013-05-30 08:18 +1000
        Re: Piping processes works with 'shell = True' but not otherwise. Chris Angelico <rosuav@gmail.com> - 2013-05-27 18:28 +1000
          Re: Piping processes works with 'shell = True' but not otherwise. Luca Cerone <luca.cerone@gmail.com> - 2013-05-27 04:33 -0700
        Re: Piping processes works with 'shell = True' but not otherwise. Chris Rebert <clp2@rebertia.com> - 2013-05-29 10:17 -0700
          Re: Piping processes works with 'shell = True' but not otherwise. Luca Cerone <luca.cerone@gmail.com> - 2013-05-31 02:28 -0700
            Re: Piping processes works with 'shell = True' but not otherwise. Peter Otten <__peter__@web.de> - 2013-05-31 11:52 +0200

#45894 — Piping processes works with 'shell = True' but not otherwise.

From: Luca Cerone <luca.cerone@gmail.com>
Date: 2013-05-24 07:04 -0700
Subject: Piping processes works with 'shell = True' but not otherwise.
Message-ID: <062f557e-8e1a-4efb-9178-7d685b47a834@googlegroups.com>

Hi everybody, 
I am new to the group (and relatively new to Python),
so I am sorry if this issue has already been discussed (although, searching the group's topics, I couldn't find a solution to my problem).

I am using Python 2.7.3 to analyse the output of two third-party programs that can be launched in a Linux shell as:

 program1 | program2

To do this I have written a function that pipes program1 into program2 (using subprocess.Popen) and returns the stdout of the second subprocess, and a function that parses the output:

A basic example:

from subprocess import Popen, STDOUT, PIPE
def run():
  p1 = Popen(['program1'], stdout = PIPE, stderr = STDOUT)
  p2 = Popen(['program2'], stdin = p1.stdout, stdout = PIPE, stderr = STDOUT)
  p1.stdout.close()
  return p2.stdout


def parse(out):
  for row in out:
    print row
    #do something else with each line
  out.close()
  return parsed_output
   

# main block here

pout = run()

parsed = parse(pout)

#--- END OF PROGRAM ----#

I want to parse the output of 'program1 | program2' line by line because the output is very large.

When running the code above, an error occasionally occurs (IOERROR: [Errno 0]). However, this error doesn't occur if I write the run() function as:

def run():
  p = Popen('program1 | program2', shell = True, stderr = STDOUT, stdout = PIPE)
  return p.stdout

I really can't understand why the first version causes errors, while the second one doesn't.

Can you please help me understand the difference between the two cases?

Thanks a lot in advance for the help,
Cheers, Luca
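
For reference, a minimal Python 3 sketch of the streaming approach described in this post: it reads the pipeline's output one line at a time and collects both exit statuses once the output is exhausted. The name stream_pipeline is hypothetical, and because the real program1 and program2 are not public, any commands passed to it are stand-ins. It is only an illustration of the pattern, not a diagnosis of the Errno 0 error.

```python
import subprocess
import sys


def stream_pipeline(cmd1, cmd2):
    """Yield decoded lines from `cmd1 | cmd2`; raise if either command fails.

    Hypothetical helper: cmd1 and cmd2 are argument lists, as for Popen.
    """
    p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(cmd2, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Drop the parent's copy of the read end so that p1 is notified
    # (SIGPIPE/EPIPE) if p2 exits before consuming all of its input.
    p1.stdout.close()
    try:
        for raw in p2.stdout:
            yield raw.decode().rstrip("\r\n")
    finally:
        p2.stdout.close()
        # Reap both children so their exit statuses are not lost.
        codes = (p1.wait(), p2.wait())
    if any(codes):
        raise RuntimeError("pipeline exit codes: %r" % (codes,))
```

Because the function is a generator, a caller can iterate over it with a plain for loop and never holds more than one line of output in memory.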


#46058

From: Luca Cerone <luca.cerone@gmail.com>
Date: 2013-05-26 03:31 -0700
Message-ID: <53d92a77-14ed-4999-90ac-e1c00f9889d5@googlegroups.com>
In reply to: #45894

> 
> Can you please help me understanding what's the difference between the two cases? 
> 

Hi guys, do any of you have ideas about what is causing my issue?


#46122

From: Chris Rebert <clp2@rebertia.com>
Date: 2013-05-26 14:05 -0700
Message-ID: <mailman.2202.1369602328.3114.python-list@python.org>
In reply to: #45894

On May 24, 2013 7:06 AM, "Luca Cerone" <luca.cerone@gmail.com> wrote:
>
> Hi everybody,
> I am new to the group (and relatively new to Python)
> so I am sorry if this issues has been discussed (although searching for topics in the group I couldn't find a solution to my problem).
>
> I am using Python 2.7.3 to analyse the output of two 3rd parties programs that can be launched in a linux shell as:
>
>  program1 | program2
>
> To do this I have written a function that pipes program1 and program2 (using subprocess.Popen) and the stdout of the subprocess, and a function that parses the output:
>
> A basic example:
>
> from subprocess import Popen, STDOUT, PIPE
> def run():
>   p1 = Popen(['program1'], stdout = PIPE, stderr = STDOUT)
>   p2 = Popen(['program2'], stdin = p1.stdout, stdout = PIPE, stderr = STDOUT)

Could you provide the *actual* commands you're using, rather than the
generic "program1" and "program2" placeholders? It's *very* common for
people to get the tokenization of a command line wrong (see the Note box in
http://docs.python.org/2/library/subprocess.html#subprocess.Popen for some
relevant advice).

>   p1.stdout.close()
>   return p2.stdout
>
>
> def parse(out):
>   for row in out:
>     print row
>     #do something else with each line
>   out.close()
>   return parsed_output
>
>
> # main block here
>
> pout = run()
>
> parsed = parse(pout)
>
> #--- END OF PROGRAM ----#
>
> I want to parse the output of 'program1 | program2' line by line because the output is very large.
>
> When running the code above, occasionally some error occurs (IOERROR: [Errno 0]).

Could you provide the full & complete error message and exception traceback?

> However this error doesn't occur if I code the run() function as:
>
> def run():
>   p = Popen('program1 | program2', shell = True, stderr = STDOUT, stdout = PIPE)
>   return p.stdout
>
> I really can't understand why the first version causes errors, while the second one doesn't.
>
> Can you please help me understanding what's the difference between the two cases?

One obvious difference between the 2 approaches is that the shell doesn't
redirect the stderr streams of the programs, whereas you /are/ redirecting
the stderrs to stdout in the non-shell version of your code. But this is
unlikely to be causing the error you're currently seeing.

You may also want to provide /dev/null as p1's stdin, out of an abundance
of caution.

Lastly, you may want to consider using a wrapper library such as
http://plumbum.readthedocs.org/en/latest/ , which makes it easier to do
pipelining and other such "fancy" things with subprocesses, while still
avoiding the many perils of the shell.

Cheers,
Chris
--
Be patient; it's Memorial Day weekend.
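
A small Python 3 sketch of the stderr point made in this reply. It assumes two stand-in children launched through sys.executable; PRODUCER, CONSUMER and run_pipeline are illustrative names, not the poster's programs. With stderr=STDOUT on the first process, its stderr shares the pipe that feeds the second process, so the second program receives the warning text as input. In the shell=True version, by contrast, program1 inherits the shell's stderr, which bypasses program2. The sketch also gives the first process the null device as stdin, as suggested above.

```python
import subprocess
import sys

# Stand-in for program1: one line on stdout, one line on stderr.
PRODUCER = (
    "import sys; "
    "sys.stdout.write('data\\n'); sys.stdout.flush(); "
    "sys.stderr.write('warning\\n')"
)
# Stand-in for program2: upper-cases whatever arrives on stdin.
CONSUMER = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def run_pipeline(merge_p1_stderr):
    """Run PRODUCER | CONSUMER without a shell; return the sorted output words."""
    p1_stderr = subprocess.STDOUT if merge_p1_stderr else subprocess.DEVNULL
    p1 = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdin=subprocess.DEVNULL,   # the first process never waits on a terminal
        stdout=subprocess.PIPE,
        stderr=p1_stderr,
    )
    p2 = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=p1.stdout,
        stdout=subprocess.PIPE,
    )
    p1.stdout.close()               # the parent no longer needs the read end
    out, _ = p2.communicate()
    p1.wait()
    return sorted(out.decode().split())
```

Calling run_pipeline(True) shows the warning text arriving in the consumer's output, while run_pipeline(False) shows only the data line.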


#46145

From: Luca Cerone <luca.cerone@gmail.com>
Date: 2013-05-26 16:58 -0700
Message-ID: <08aa32b7-0fb7-4665-83fa-5cc7ec36f898@googlegroups.com>
In reply to: #46122

> Could you provide the *actual* commands you're using, rather than the generic "program1" and "program2" placeholders? It's *very* common for people to get the tokenization of a command line wrong (see the Note box in http://docs.python.org/2/library/subprocess.html#subprocess.Popen for some relevant advice).
> 
Hi Chris, first of all thanks for the help. Unfortunately I can't provide the actual commands because they are tools that are not publicly available.
I think I get the tokenization right, though.. the problem is not that the programs don't run.. it is just that sometimes I get that error..

Just to be clear I run the process like:

p = subprocess.Popen(['program1','--opt1','val1',...'--optn','valn'], ... the rest)

which I think is the right way to pass arguments (it works fine for other commands)..

> 
> Could you provide the full & complete error message and exception traceback?
> 
yes, as soon as I get to my work laptop..

> 
> One obvious difference between the 2 approaches is that the shell doesn't redirect the stderr streams of the programs, whereas you /are/ redirecting the stderrs to stdout in the non-shell version of your code. But this is unlikely to be causing the error you're currently seeing.
> 
> 
> You may also want to provide /dev/null as p1's stdin, out of an abundance of caution.
>

I tried to redirect stdin from /dev/null using the Popen argument
'stdin = os.path.devnull' (having imported os, of course),
but this seemed to cause even more trouble...
 
> Lastly, you may want to consider using a wrapper library such as http://plumbum.readthedocs.org/en/latest/ , which makes it easier to do pipelining and other such "fancy" things with subprocesses, while still avoiding the many perils of the shell.
> 
> 
Thanks, I didn't know this library, I'll give it a try.
Though I forgot to mention that I was using the subprocess module because I want the code to be portable (even though, for now, it is OK if it only works on Unix platforms).

Thanks a lot for your help,
Cheers,
Luca


#46146

From: Carlos Nepomuceno <carlosnepomuceno@outlook.com>
Date: 2013-05-27 03:14 +0300
Message-ID: <mailman.2222.1369613672.3114.python-list@python.org>
In reply to: #46145

Pipes usually consume disk storage at '/tmp'. Are you sure you have enough room on that filesystem? Make sure no other processes are competing for that space. Just my 50c, because I don't know what's causing Errno 0. I don't even know what the possible causes of such an error are. Good luck!

----------------------------------------
> Date: Sun, 26 May 2013 16:58:57 -0700
> Subject: Re: Piping processes works with 'shell = True' but not otherwise.
> From: luca.cerone@gmail.com
> To: python-list@python.org
[...]
> I tried to redirect the output to /dev/null using the Popen argument:
> 'stdin = os.path.devnull' (having imported os of course)..
> But this seemed to cause even more troubles...
>
>> Lastly, you may want to consider using a wrapper library such as http://plumbum.readthedocs.org/en/latest/ , which makes it easier to do pipelining and other such "fancy" things with subprocesses, while still avoiding the many perils of the shell.
>>
>>
> Thanks, I didn't know this library, I'll give it a try.
> Though I forgot to mention that I was using the subprocess module, because I want the code to be portable (even though for now if it works in Unix platform is OK).
>
> Thanks a lot for your help,
> Cheers,
> Luca
> --
> http://mail.python.org/mailman/listinfo/python-list 		 	   		  


#46396

From: Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de>
Date: 2013-05-29 19:39 +0200
Message-ID: <ko5egs$knk$1@r01.glglgl.de>
In reply to: #46146

On 27.05.2013 at 02:14, Carlos Nepomuceno wrote:
> pipes usually consumes disk storage at '/tmp'.

Good that my pipes don't know about that.

Why should that happen?


Thomas


#46409

From: Carlos Nepomuceno <carlosnepomuceno@outlook.com>
Date: 2013-05-29 22:31 +0300
Message-ID: <mailman.2378.1369855873.3114.python-list@python.org>
In reply to: #46396

----------------------------------------
> From: nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de
> Subject: Re: Piping processes works with 'shell = True' but not otherwise.
> Date: Wed, 29 May 2013 19:39:40 +0200
> To: python-list@python.org
>
> On 27.05.2013 at 02:14, Carlos Nepomuceno wrote:
>> pipes usually consumes disk storage at '/tmp'.
>
> Good that my pipes don't know about that.
>
> Why should that happen?
>
>
> Thomas

Oops! My mistake! We've been using 'tee' when in debugging mode and I thought that would apply to this case. Never mind!


#46419

From: Cameron Simpson <cs@zip.com.au>
Date: 2013-05-30 08:18 +1000
Message-ID: <mailman.2386.1369867287.3114.python-list@python.org>
In reply to: #46396

On 29May2013 19:39, Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> wrote:
| On 27.05.2013 at 02:14, Carlos Nepomuceno wrote:
| >pipes usually consumes disk storage at '/tmp'.
| 
| Good that my pipes don't know about that.
| Why should that happen?

It probably doesn't on anything modern. On V7 UNIX at least there
was a kernel notion of the "pipe fs", where pipe storage existed;
usually /tmp; using small real (but unnamed) files is an easy way
to implement them, especially on systems where RAM is very small
and without a paging VM - for example, V7 UNIX ran on PDP-11s amongst
other things. And files need a filesystem.

But even then pipes are still small fixed length buffers; they don't
grow without bound as you might have inferred from the quoted
statement.

Cheers,
-- 
Cameron Simpson <cs@zip.com.au>

ERROR 155 - You can't do that.  - Data General S200 Fortran error code list
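
The remark that pipes are small fixed-length buffers can be checked with a short Python 3 sketch. It assumes a Unix-like system (os.set_blocking on a pipe descriptor), and pipe_capacity is a hypothetical helper name. It fills an unread pipe in non-blocking mode and counts how many bytes the kernel accepts before refusing more.

```python
import os


def pipe_capacity():
    """Return how many bytes an unread pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)    # fail with BlockingIOError instead of hanging
    chunk = b"x" * 1024
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        pass                            # the kernel buffer is full
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total
```

The figure depends on the platform and its configuration; on Linux the default is typically 64 KiB. A writer that outpaces its reader simply blocks once this buffer is full, so the pipe itself does not grow.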


#46181

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-05-27 18:28 +1000
Message-ID: <mailman.2244.1369643321.3114.python-list@python.org>
In reply to: #46145

On Mon, May 27, 2013 at 9:58 AM, Luca Cerone <luca.cerone@gmail.com> wrote:
>> Could you provide the *actual* commands you're using, rather than the generic "program1" and "program2" placeholders? It's *very* common for people to get the tokenization of a command line wrong (see the Note box in http://docs.python.org/2/library/subprocess.html#subprocess.Popen for some relevant advice).
>>
> Hi Chris, first of all thanks for the help. Unfortunately I can't provide the actual commands because are tools that are not publicly available.
> I think I get the tokenization right, though.. the problem is not that the programs don't run.. it is just that sometimes I get that error..

Will it violate privacy / NDA to post the command line? Even if we
can't actually replicate your system, we may be able to see something
from the commands given.

ChrisA


#46190

From: Luca Cerone <luca.cerone@gmail.com>
Date: 2013-05-27 04:33 -0700
Message-ID: <4fd43d0e-9c72-41d0-8c8d-dbc0d0a50022@googlegroups.com>
In reply to: #46181

> Will it violate privacy / NDA to post the command line? Even if we
> can't actually replicate your system, we may be able to see something
> from the commands given.

Unfortunately yes..


#46395

From: Chris Rebert <clp2@rebertia.com>
Date: 2013-05-29 10:17 -0700
Message-ID: <mailman.2366.1369847860.3114.python-list@python.org>
In reply to: #46145

On Sun, May 26, 2013 at 4:58 PM, Luca Cerone <luca.cerone@gmail.com> wrote:
<snip>
> Hi Chris, first of all thanks for the help. Unfortunately I can't provide the actual commands because are tools that are not publicly available.
> I think I get the tokenization right, though.. the problem is not that the programs don't run.. it is just that sometimes I get that error..
>
> Just to be clear I run the process like:
>
> p = subprocess.Popen(['program1','--opt1','val1',...'--optn','valn'], ... the rest)
>
> which I think is the right way to pass arguments (it works fine for other commands)..
<snip>
>> You may also want to provide /dev/null as p1's stdin, out of an abundance of caution.
>
> I tried to redirect the output to /dev/null using the Popen argument:
> 'stdin = os.path.devnull' (having imported os of course)..
> But this seemed to cause even more troubles...

That's because stdin/stdout/stderr take file descriptors or file
objects, not path strings.

Cheers,
Chris
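
A short Python 3 sketch of the distinction made in this reply, using a stand-in child launched through sys.executable (CHILD, rejected and out are illustrative names). Passing the path string os.devnull is refused before any process is started, while an integer file descriptor obtained from os.open is accepted.

```python
import os
import subprocess
import sys

# Stand-in child: reports what it could read from its stdin.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.write(repr(sys.stdin.read()))"]

# A path string is neither a descriptor nor a file object, so Popen rejects it.
try:
    subprocess.Popen(CHILD, stdin=os.devnull)
    rejected = False
except (AttributeError, TypeError):
    rejected = True

# An integer file descriptor is accepted; the caller stays responsible for it.
fd = os.open(os.devnull, os.O_RDONLY)
try:
    out = subprocess.check_output(CHILD, stdin=fd)
finally:
    os.close(fd)
```

Here the child sees an immediate end of file on stdin, so out holds the repr of an empty string.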


#46585

From: Luca Cerone <luca.cerone@gmail.com>
Date: 2013-05-31 02:28 -0700
Message-ID: <8c89fb10-63eb-446c-855b-f4c5406976ea@googlegroups.com>
In reply to: #46395

> That's because stdin/stdout/stderr take file descriptors or file
> objects, not path strings.

Thanks Chris, how do I set the file descriptor to /dev/null then?


#46588

From: Peter Otten <__peter__@web.de>
Date: 2013-05-31 11:52 +0200
Message-ID: <mailman.2480.1369993961.3114.python-list@python.org>
In reply to: #46585

Luca Cerone wrote:

>> That's because stdin/stdout/stderr take file descriptors or file
>> objects, not path strings.
> 
> Thanks Chris, how do I set the file descriptor to /dev/null then?

For example:

with open(os.devnull, "wb") as stderr:
    p = subprocess.Popen(..., stderr=stderr)
    ...


In Python 3.3 and above:

p = subprocess.Popen(..., stderr=subprocess.DEVNULL)
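
The same two forms apply to stdin, which is what the earlier suggestion in the thread was about. A minimal Python 3 sketch follows, with a stand-in child launched through sys.executable (CHILD, via_file and via_constant are illustrative names). Note that for stdin the null device is opened for reading rather than writing.

```python
import os
import subprocess
import sys

# Stand-in child: prints how many characters it could read from stdin.
CHILD = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]

# Any Python version: open the null device for reading and pass the file object.
with open(os.devnull, "rb") as devnull:
    via_file = subprocess.check_output(CHILD, stdin=devnull)

# Python 3.3 and above: let subprocess open the null device itself.
via_constant = subprocess.check_output(CHILD, stdin=subprocess.DEVNULL)
```

In both cases the child reads nothing and reports a length of zero.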
