Groups > comp.lang.python > #77100 > unrolled thread

Reading from sys.stdin reads the whole file in

Started by: Steven D'Aprano <steve@pearwood.info>
First post: 2014-08-27 05:19 +0000
Last post: 2014-08-27 11:31 +0200
Articles: 10 (5 participants)


Contents

  Reading from sys.stdin reads the whole file in Steven D'Aprano <steve@pearwood.info> - 2014-08-27 05:19 +0000
    Re: Reading from sys.stdin reads the whole file in Marko Rauhamaa <marko@pacujo.net> - 2014-08-27 08:29 +0300
      Re: Reading from sys.stdin reads the whole file in Marko Rauhamaa <marko@pacujo.net> - 2014-08-27 08:31 +0300
      Re: Reading from sys.stdin reads the whole file in Steven D'Aprano <steve@pearwood.info> - 2014-08-27 06:37 +0000
        Re: Reading from sys.stdin reads the whole file in Chris Angelico <rosuav@gmail.com> - 2014-08-27 16:45 +1000
        Re: Reading from sys.stdin reads the whole file in Akira Li <4kir4.1i@gmail.com> - 2014-08-29 04:02 +0400
    Re: Reading from sys.stdin reads the whole file in Chris Angelico <rosuav@gmail.com> - 2014-08-27 16:02 +1000
    Re: Reading from sys.stdin reads the whole file in Peter Otten <__peter__@web.de> - 2014-08-27 09:42 +0200
      Re: Reading from sys.stdin reads the whole file in Marko Rauhamaa <marko@pacujo.net> - 2014-08-27 11:39 +0300
        Re: Reading from sys.stdin reads the whole file in Peter Otten <__peter__@web.de> - 2014-08-27 11:31 +0200

#77100 — Reading from sys.stdin reads the whole file in

From: Steven D'Aprano <steve@pearwood.info>
Date: 2014-08-27 05:19 +0000
Subject: Reading from sys.stdin reads the whole file in
Message-ID: <53fd6a48$0$11111$c3e8da3@news.astraweb.com>

I'm trying to read from stdin. Here I simulate a process that slowly 
outputs data to stdout:

steve@runes:~$ cat out.py
import time

print "Hello..."
time.sleep(10)
print "World!"
time.sleep(10)
print "Goodbye!"


and another process that reads from stdin:

steve@runes:~$ cat slurp.py 
import sys
import time

for line in sys.stdin:
	print time.ctime(), line



When I pipe one to the other, I expect each line to be printed as they 
arrive, but instead they all queue up and happen at once:


steve@runes:~$ python out.py | python slurp.py 
Wed Aug 27 15:13:44 2014 Hello...

Wed Aug 27 15:13:44 2014 World!

Wed Aug 27 15:13:44 2014 Goodbye!



(Note how the time stamps are all together, instead of ten seconds apart.)


Why is this happening?

How can I read from sys.stdin "on the fly", so to speak, without waiting 
for the first process to end?

Is there established terminology for talking about this sort of thing?



-- 
Steven



#77101

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-08-27 08:29 +0300
Message-ID: <87mwaqxzyn.fsf@elektro.pacujo.net>
In reply to: #77100

Steven D'Aprano <steve@pearwood.info>:

> When I pipe one to the other, I expect each line to be printed as they
> arrive, but instead they all queue up and happen at once:

Try flushing after each print.

When sys.stdout is a pipe, flushing happens only when the internal
buffer fills up.


Marko
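
The effect Marko describes can be observed from a single script. The following is a minimal Python 3 sketch, not code from the thread: `PRODUCER` and `arrival_times` are hypothetical names, and the 0.3 second pauses stand in for the 10 second sleeps in out.py. It runs a small producer with its stdout connected to a pipe and records when each line reaches the reader.

```python
import os
import subprocess
import sys
import time

# Hypothetical stand-in for out.py: three lines with a short pause after each.
PRODUCER = """
import sys, time
for word in ("Hello...", "World!", "Goodbye!"):
    print(word)
    if {flush}:
        sys.stdout.flush()
    time.sleep(0.3)
"""


def arrival_times(flush):
    """Run the producer on a pipe; return the time (in seconds) each line arrived."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # keep the child's default buffering
    proc = subprocess.Popen(
        [sys.executable, "-c", PRODUCER.format(flush=flush)],
        stdout=subprocess.PIPE,
        env=env,
    )
    start = time.monotonic()
    stamps = []
    # readline() returns as soon as one complete line is available.
    for _ in iter(proc.stdout.readline, b""):
        stamps.append(time.monotonic() - start)
    proc.wait()
    return stamps
```

Calling `arrival_times(True)` and `arrival_times(False)` and comparing the two lists shows whether the lines were spread out or delivered together when the producer exited.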



#77102

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-08-27 08:31 +0300
Message-ID: <87iolexzv5.fsf@elektro.pacujo.net>
In reply to: #77101

Marko Rauhamaa <marko@pacujo.net>:

> Try flushing after each print.

   <URL: http://stackoverflow.com/questions/230751/how-to-flush-output-of-python-print>

   Since Python 3.3, there is no need to use sys.stdout.flush():

    print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)


Marko



#77108

From: Steven D'Aprano <steve@pearwood.info>
Date: 2014-08-27 06:37 +0000
Message-ID: <53fd7cab$0$11111$c3e8da3@news.astraweb.com>
In reply to: #77101

On Wed, 27 Aug 2014 08:29:20 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano <steve@pearwood.info>:
> 
>> When I pipe one to the other, I expect each line to be printed as they
>> arrive, but instead they all queue up and happen at once:
> 
> Try flushing after each print.

Doesn't help.

Here is an update that may make the problem more clear:

steve@runes:~$ cat out.py
import time
import sys

print "Time of output:", time.ctime()
sys.stdout.flush()
time.sleep(10)
print "Time of output:", time.ctime()
sys.stdout.flush()
time.sleep(10)
print "Time of output:", time.ctime()

steve@runes:~$ cat slurp.py 
import sys
import time

for line in sys.stdin:
        print "Time of input:", time.ctime(), line
        sys.stdin.flush()
        sys.stdout.flush()



And the results:

steve@runes:~$ python out.py | python slurp.py 
Time of input: Wed Aug 27 16:35:48 2014 Time of output: Wed Aug 27 16:35:28 2014

Time of input: Wed Aug 27 16:35:48 2014 Time of output: Wed Aug 27 16:35:38 2014

Time of input: Wed Aug 27 16:35:48 2014 Time of output: Wed Aug 27 16:35:48 2014


-- 
Steven



#77109

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-08-27 16:45 +1000
Message-ID: <mailman.13492.1409121937.18130.python-list@python.org>
In reply to: #77108

On Wed, Aug 27, 2014 at 4:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Wed, 27 Aug 2014 08:29:20 +0300, Marko Rauhamaa wrote:
>
>> Try flushing after each print.
>
> Doesn't help.

It does, but insufficiently. If slurp.py is run under Py3, it works
fine; or take Naoki's suggestion (although without the parens):

import sys
import time

for line in iter(sys.stdin.readline, ''):
        print "Time of input:", time.ctime(), line
        sys.stdin.flush()
        sys.stdout.flush()

Then it works.

ChrisA
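
For reference, the same readline loop can be written as a small generator. This is a sketch in Python 3 syntax rather than the Python 2 code above, and `stamp_lines` is a hypothetical name. The two-argument form of `iter()` calls `readline()` repeatedly until it returns the empty string, which happens only at end of file.

```python
import io
import time


def stamp_lines(stream):
    """Yield (timestamp, line) pairs, fetching one line per readline() call."""
    # A blank input line arrives as "\n", so only end of file ("") stops the loop.
    for line in iter(stream.readline, ""):
        yield time.ctime(), line.rstrip("\n")
```

A consumer script could then loop over `stamp_lines(sys.stdin)` and print each pair, flushing its own output if that is piped onward.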



#77245

From: Akira Li <4kir4.1i@gmail.com>
Date: 2014-08-29 04:02 +0400
Message-ID: <mailman.13590.1409270573.18130.python-list@python.org>
In reply to: #77108

Chris Angelico <rosuav@gmail.com> writes:

> On Wed, Aug 27, 2014 at 4:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
>> On Wed, 27 Aug 2014 08:29:20 +0300, Marko Rauhamaa wrote:
>>
>>> Try flushing after each print.
>>
>> Doesn't help.
>
> It does, but insufficiently. If slurp.py is run under Py3, it works
> fine; or take Naoki's suggestion (although without the parens):
>
> import sys
> import time
>
> for line in iter(sys.stdin.readline, ''):
>         print "Time of input:", time.ctime(), line
>         sys.stdin.flush()
>         sys.stdout.flush()
>
> Then it works.
>
> ChrisA

It looks like this bug: http://bugs.python.org/issue3907

`python -u out.py | python -u slurp.py`  could be used to avoid .flush()
calls everywhere.

Or reassign `sys.stdin = io.open(sys.stdin.fileno(), 'r', 1)` inside the
script.

http://stackoverflow.com/questions/107705/python-output-buffering


--
Akira
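
The in-script alternative can be sketched as a helper that reopens a file descriptor through the `io` module with `buffering=1`. This is an illustration in Python 3, not a tested fix for the issue linked above. `reopen_line_buffered` is a hypothetical name, and `closefd=False` is an addition so that the caller keeps ownership of the descriptor.

```python
import io
import os


def reopen_line_buffered(fd):
    """Return a text-mode reader over fd, opened with buffering=1."""
    # closefd=False: closing the returned object leaves fd itself open.
    return io.open(fd, "r", 1, closefd=False)
```

In a script this would be applied as `sys.stdin = reopen_line_buffered(sys.stdin.fileno())` before the first read.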



#77104

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-08-27 16:02 +1000
Message-ID: <mailman.13488.1409119385.18130.python-list@python.org>
In reply to: #77100

On Wed, Aug 27, 2014 at 3:19 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> When I pipe one to the other, I expect each line to be printed as they
> arrive, but instead they all queue up and happen at once:

You're seeing two different problems here. One is the flushing of
stdout in out.py, as Marko mentioned, but it's easily proven that
that's not the whole issue. Compare "python out.py" and "python
out.py|cat" - the latter will demonstrate whether or not it's getting
flushed properly (the former, where stdout is a tty, will always flush
correctly).

But even with that sorted, iterating over stdin has issues in Python
2. Here's a tweaked version of your files (note that I cut the sleeps
to 2 seconds, but the effect is the same):

rosuav@sikorsky:~$ cat out.py
import time

print("Hello...",flush=True)
time.sleep(2)
print("World!",flush=True)
time.sleep(2)
print("Goodbye!",flush=True)
rosuav@sikorsky:~$ cat slurp.py
from __future__ import print_function
import sys
import time

for line in sys.stdin:
        print(time.ctime(), line)
rosuav@sikorsky:~$ python3 out.py|python slurp.py
Wed Aug 27 16:00:16 2014 Hello...

Wed Aug 27 16:00:16 2014 World!

Wed Aug 27 16:00:16 2014 Goodbye!

rosuav@sikorsky:~$ python3 out.py|python3 slurp.py
Wed Aug 27 16:00:19 2014 Hello...

Wed Aug 27 16:00:21 2014 World!

Wed Aug 27 16:00:23 2014 Goodbye!

rosuav@sikorsky:~$


With a Py2 consumer, there's still buffering happening. With a Py3
consumer, it works correctly. How to control the Py2 buffering,
though, I don't know.

ChrisA



#77115

From: Peter Otten <__peter__@web.de>
Date: 2014-08-27 09:42 +0200
Message-ID: <mailman.13497.1409125378.18130.python-list@python.org>
In reply to: #77100

Steven D'Aprano wrote:

> I'm trying to read from stdin. Here I simulate a process that slowly
> outputs data to stdout:
> 
> steve@runes:~$ cat out.py
> import time
> 
> print "Hello..."
> time.sleep(10)
> print "World!"
> time.sleep(10)
> print "Goodbye!"

In addition to what already has been said: you can switch off output 
buffering of stdout/stderr with

python -u out.py

or by setting the PYTHONUNBUFFERED environment variable.

You still need the readline trick to get unbuffered input. Quoting the man page:

"""

       -u     Force stdin, stdout and stderr to be totally unbuffered.  On
              systems where it matters, also put stdin, stdout and stderr in
              binary mode.  Note that there is internal buffering in
              xreadlines(), readlines() and file-object iterators ("for line
              in sys.stdin") which is not influenced by this option.  To work
              around this, you will want to use "sys.stdin.readline()" inside
              a "while 1:" loop.
"""



#77119

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-08-27 11:39 +0300
Message-ID: <87a96qgwbl.fsf@elektro.pacujo.net>
In reply to: #77115

Peter Otten <__peter__@web.de>:

> In addition to what already has been said: you can switch off output
> buffering of stdout/stderr with
>
> python -u out.py
>
> or by setting the PYTHONUNBUFFERED environment variable.

Very often such externalities are not in the control of the application
developer.


Marko



#77123

From: Peter Otten <__peter__@web.de>
Date: 2014-08-27 11:31 +0200
Message-ID: <mailman.13501.1409131921.18130.python-list@python.org>
In reply to: #77119

Marko Rauhamaa wrote:

> Peter Otten <__peter__@web.de>:
> 
>> In addition to what already has been said: you can switch off output
>> buffering of stdout/stderr with
>>
>> python -u out.py
>>
>> or by setting the PYTHONUNBUFFERED environment variable.
> 
> Very often such externalities are not in the control of the application
> developer.

Sometimes it's possible to use a wrapper script rather than to sprinkle your 
code with flush(). 

Sometimes the "offending" python script is not even written and maintained 
by you, and setting an environment variable may be a price you are willing 
to pay to keep it that way.
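
A wrapper of the kind mentioned above might look like the following Python 3 sketch. `run_unbuffered` is a hypothetical name, and the wrapper assumes the target script should run under the same interpreter as the wrapper itself. It sets PYTHONUNBUFFERED in the child's environment only, so the script being run needs no changes.

```python
import os
import subprocess
import sys


def run_unbuffered(script_args):
    """Run a Python script as a child with PYTHONUNBUFFERED set; return its exit code."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    return subprocess.call([sys.executable] + list(script_args), env=env)
```

For example, `run_unbuffered(["out.py"])` would run out.py with unbuffered output while leaving the parent's environment untouched.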


