| Started by | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| First post | 2014-08-27 05:19 +0000 |
| Last post | 2014-08-27 11:31 +0200 |
| Articles | 10 — 5 participants |
Reading from sys.stdin reads the whole file in Steven D'Aprano <steve@pearwood.info> - 2014-08-27 05:19 +0000
Re: Reading from sys.stdin reads the whole file in Marko Rauhamaa <marko@pacujo.net> - 2014-08-27 08:29 +0300
Re: Reading from sys.stdin reads the whole file in Marko Rauhamaa <marko@pacujo.net> - 2014-08-27 08:31 +0300
Re: Reading from sys.stdin reads the whole file in Steven D'Aprano <steve@pearwood.info> - 2014-08-27 06:37 +0000
Re: Reading from sys.stdin reads the whole file in Chris Angelico <rosuav@gmail.com> - 2014-08-27 16:45 +1000
Re: Reading from sys.stdin reads the whole file in Akira Li <4kir4.1i@gmail.com> - 2014-08-29 04:02 +0400
Re: Reading from sys.stdin reads the whole file in Chris Angelico <rosuav@gmail.com> - 2014-08-27 16:02 +1000
Re: Reading from sys.stdin reads the whole file in Peter Otten <__peter__@web.de> - 2014-08-27 09:42 +0200
Re: Reading from sys.stdin reads the whole file in Marko Rauhamaa <marko@pacujo.net> - 2014-08-27 11:39 +0300
Re: Reading from sys.stdin reads the whole file in Peter Otten <__peter__@web.de> - 2014-08-27 11:31 +0200
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2014-08-27 05:19 +0000 |
| Subject | Reading from sys.stdin reads the whole file in |
| Message-ID | <53fd6a48$0$11111$c3e8da3@news.astraweb.com> |
I'm trying to read from stdin. Here I simulate a process that slowly
outputs data to stdout:

steve@runes:~$ cat out.py
import time

print "Hello..."
time.sleep(10)
print "World!"
time.sleep(10)
print "Goodbye!"

and another process that reads from stdin:

steve@runes:~$ cat slurp.py
import sys
import time

for line in sys.stdin:
    print time.ctime(), line

When I pipe one to the other, I expect each line to be printed as they
arrive, but instead they all queue up and happen at once:

steve@runes:~$ python out.py | python slurp.py
Wed Aug 27 15:13:44 2014 Hello...
Wed Aug 27 15:13:44 2014 World!
Wed Aug 27 15:13:44 2014 Goodbye!

(Note how the time stamps are all together, instead of ten seconds apart.)

Why is this happening? How can I read from sys.stdin "on the fly", so to
speak, without waiting for the first process to end?

Is there established terminology for talking about this sort of thing?

--
Steven
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-08-27 08:29 +0300 |
| Message-ID | <87mwaqxzyn.fsf@elektro.pacujo.net> |
| In reply to | #77100 |
Steven D'Aprano <steve@pearwood.info>:

> When I pipe one to the other, I expect each line to be printed as they
> arrive, but instead they all queue up and happen at once:

Try flushing after each print.

When sys.stdout is a pipe, flushing happens only when the internal
buffer fills up.

Marko
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-08-27 08:31 +0300 |
| Message-ID | <87iolexzv5.fsf@elektro.pacujo.net> |
| In reply to | #77101 |
Marko Rauhamaa <marko@pacujo.net>:
> Try flushing after each print.
<URL: http://stackoverflow.com/questions/230751/how-to-flush-output-of-python-print>
Since Python 3.3, there is no need to call sys.stdout.flush() separately;
print() accepts a flush argument (pass flush=True to flush after the call):
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
Marko
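A minimal sketch of the flush argument in use, assuming Python 3.3 or later. The helper name `emit` and its parameters are illustrative and do not come from the thread; the optional `stream` parameter is there only so the function can be tried without a real pipe.

```python
import sys
import time


def emit(messages, delay=0.0, stream=None):
    """Print each message and flush right away so a pipe reader sees it at once."""
    out = sys.stdout if stream is None else stream
    for index, message in enumerate(messages):
        # flush=True pushes the line out of Python's buffer immediately,
        # even when stdout is a pipe rather than a terminal.
        print(message, file=out, flush=True)
        if delay and index < len(messages) - 1:
            time.sleep(delay)  # simulate a slow producer between lines
```

Calling `emit(["Hello...", "World!", "Goodbye!"], delay=10)` would roughly mirror the original out.py, with the flush handled by print() itself.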
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2014-08-27 06:37 +0000 |
| Message-ID | <53fd7cab$0$11111$c3e8da3@news.astraweb.com> |
| In reply to | #77101 |
On Wed, 27 Aug 2014 08:29:20 +0300, Marko Rauhamaa wrote:
> Steven D'Aprano <steve@pearwood.info>:
>
>> When I pipe one to the other, I expect each line to be printed as they
>> arrive, but instead they all queue up and happen at once:
>
> Try flushing after each print.
Doesn't help.
Here is an update that may make the problem more clear:
steve@runes:~$ cat out.py
import time
import sys
print "Time of output:", time.ctime()
sys.stdout.flush()
time.sleep(10)
print "Time of output:", time.ctime()
sys.stdout.flush()
time.sleep(10)
print "Time of output:", time.ctime()
steve@runes:~$ cat slurp.py
import sys
import time
for line in sys.stdin:
    print "Time of input:", time.ctime(), line
    sys.stdin.flush()
    sys.stdout.flush()
And the results:
steve@runes:~$ python out.py | python slurp.py
Time of input: Wed Aug 27 16:35:48 2014 Time of output: Wed Aug 27 16:35:28 2014
Time of input: Wed Aug 27 16:35:48 2014 Time of output: Wed Aug 27 16:35:38 2014
Time of input: Wed Aug 27 16:35:48 2014 Time of output: Wed Aug 27 16:35:48 2014
--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-08-27 16:45 +1000 |
| Message-ID | <mailman.13492.1409121937.18130.python-list@python.org> |
| In reply to | #77108 |
On Wed, Aug 27, 2014 at 4:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Wed, 27 Aug 2014 08:29:20 +0300, Marko Rauhamaa wrote:
>
>> Try flushing after each print.
>
> Doesn't help.
It does, but insufficiently. If slurp.py is run under Py3, it works
fine; or take Naoki's suggestion (although without the parens):
import sys
import time
for line in iter(sys.stdin.readline, ''):
    print "Time of input:", time.ctime(), line
    sys.stdin.flush()
    sys.stdout.flush()
Then it works.
ChrisA
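For reference, the two-argument form of iter() calls readline() repeatedly and stops when it returns the sentinel '' (which readline() does at end of file), so each line is handed over as soon as it is available. A small sketch in Python 3 syntax; the function name `stamp_lines` and the `stream` parameter are assumptions made for illustration.

```python
import sys
import time


def stamp_lines(stream=None):
    """Yield (timestamp, line) pairs, fetching one line at a time."""
    source = sys.stdin if stream is None else stream
    # iter(callable, sentinel) keeps calling source.readline() and stops
    # when it returns '' (end of file). A blank input line is '\n', not '',
    # so it does not end the loop early.
    for line in iter(source.readline, ''):
        yield time.ctime(), line.rstrip('\n')
```

A consumer would then loop with `for stamp, line in stamp_lines(): print(stamp, line, flush=True)`.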
| From | Akira Li <4kir4.1i@gmail.com> |
|---|---|
| Date | 2014-08-29 04:02 +0400 |
| Message-ID | <mailman.13590.1409270573.18130.python-list@python.org> |
| In reply to | #77108 |
Chris Angelico <rosuav@gmail.com> writes:

> On Wed, Aug 27, 2014 at 4:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
>> On Wed, 27 Aug 2014 08:29:20 +0300, Marko Rauhamaa wrote:
>>
>>> Try flushing after each print.
>>
>> Doesn't help.
>
> It does, but insufficiently. If slurp.py is run under Py3, it works
> fine; or take Naoki's suggestion (although without the parens):
>
> import sys
> import time
>
> for line in iter(sys.stdin.readline, ''):
>     print "Time of input:", time.ctime(), line
>     sys.stdin.flush()
>     sys.stdout.flush()
>
> Then it works.
>
> ChrisA

It looks like this bug http://bugs.python.org/issue3907

`python -u out.py | python -u slurp.py` could be used to avoid .flush()
calls everywhere.

Or reassign `sys.stdin = io.open(sys.stdin.fileno(), 'r', 1)` inside the
script.

http://stackoverflow.com/questions/107705/python-output-buffering

--
Akira
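A hedged sketch of the reassignment idea, applied to an arbitrary file descriptor so it can be tried on a pipe. The function name `line_buffered_reader` is hypothetical. In Python 2, io.open returns a stream from the io module rather than the built-in file type, which is presumably why the reassignment helps there; the sketch itself is written to run on Python 3.

```python
import io
import os


def line_buffered_reader(fd):
    """Wrap an existing file descriptor in a text-mode io stream."""
    # buffering=1 requests line buffering, which is only valid in text mode.
    # Closing the returned stream also closes fd (closefd defaults to True).
    return io.open(fd, 'r', 1)
```

Inside a script, the equivalent of the suggestion above would be `sys.stdin = line_buffered_reader(sys.stdin.fileno())`.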
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-08-27 16:02 +1000 |
| Message-ID | <mailman.13488.1409119385.18130.python-list@python.org> |
| In reply to | #77100 |
On Wed, Aug 27, 2014 at 3:19 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> When I pipe one to the other, I expect each line to be printed as they
> arrive, but instead they all queue up and happen at once:
You're seeing two different problems here. One is the flushing of
stdout in out.py, as Marko mentioned, but it's easily proven that
that's not the whole issue. Compare "python out.py" and "python
out.py|cat" - the latter will demonstrate whether or not it's getting
flushed properly (the former, where stdout is a tty, will always flush
correctly).
But even with that sorted, iterating over stdin has issues in Python
2. Here's a tweaked version of your files (note that I cut the sleeps
to 2 seconds, but the effect is the same):
rosuav@sikorsky:~$ cat out.py
import time
print("Hello...",flush=True)
time.sleep(2)
print("World!",flush=True)
time.sleep(2)
print("Goodbye!",flush=True)
rosuav@sikorsky:~$ cat slurp.py
from __future__ import print_function
import sys
import time
for line in sys.stdin:
    print(time.ctime(), line)
rosuav@sikorsky:~$ python3 out.py|python slurp.py
Wed Aug 27 16:00:16 2014 Hello...
Wed Aug 27 16:00:16 2014 World!
Wed Aug 27 16:00:16 2014 Goodbye!
rosuav@sikorsky:~$ python3 out.py|python3 slurp.py
Wed Aug 27 16:00:19 2014 Hello...
Wed Aug 27 16:00:21 2014 World!
Wed Aug 27 16:00:23 2014 Goodbye!
rosuav@sikorsky:~$
With a Py2 consumer, there's still buffering happening. With a Py3
consumer, it works correctly. How to control the Py2 buffering,
though, I don't know.
ChrisA
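One way to check whether a producer's lines really arrive one at a time is to run it as a child process and record when each line shows up. The sketch below assumes Python 3; `PRODUCER` is a hypothetical stand-in for out.py with shorter pauses, and `arrival_times` is an illustrative name, not something from the thread.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for out.py: three flushed lines with short pauses.
PRODUCER = (
    "import time\n"
    "for word in ('Hello...', 'World!', 'Goodbye!'):\n"
    "    print(word, flush=True)\n"
    "    time.sleep(0.5)\n"
)


def arrival_times(command):
    """Run command and return (seconds_since_start, line) per output line."""
    start = time.monotonic()
    arrivals = []
    with subprocess.Popen(command, stdout=subprocess.PIPE,
                          universal_newlines=True) as proc:
        # readline() returns as soon as one full line is available,
        # and '' once the child closes its end of the pipe.
        for line in iter(proc.stdout.readline, ''):
            arrivals.append((time.monotonic() - start, line.rstrip('\n')))
    return arrivals
```

Running `arrival_times([sys.executable, "-c", PRODUCER])` and comparing the recorded offsets shows whether the lines were spread out or delivered together; removing `flush=True` from the producer is an easy way to see the difference.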
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-08-27 09:42 +0200 |
| Message-ID | <mailman.13497.1409125378.18130.python-list@python.org> |
| In reply to | #77100 |
Steven D'Aprano wrote:
> I'm trying to read from stdin. Here I simulate a process that slowly
> outputs data to stdout:
>
> steve@runes:~$ cat out.py
> import time
>
> print "Hello..."
> time.sleep(10)
> print "World!"
> time.sleep(10)
> print "Goodbye!"
In addition to what has already been said: you can switch off output
buffering of stdout/stderr with
python -u out.py
or by setting the PYTHONUNBUFFERED environment variable.
You still need the readline trick to get unbuffered input. Quoting the man
page:
"""
-u Force stdin, stdout and stderr to be totally unbuffered. On
systems where it matters, also put stdin, stdout and stderr in
binary mode. Note that there is internal buffering in
xreadlines(), readlines() and file-object iterators ("for line in
sys.stdin") which is not influenced by this option. To work
around this, you will want to use "sys.stdin.readline()"
inside a "while 1:" loop.
"""
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-08-27 11:39 +0300 |
| Message-ID | <87a96qgwbl.fsf@elektro.pacujo.net> |
| In reply to | #77115 |
Peter Otten <__peter__@web.de>:

> In addition to what already has been said: you can switch off output
> buffering of stdout/stderr with
>
> python -u out.py
>
> or by setting the PYTHONUNBUFFERED environment variable.

Very often such externalities are not in the control of the application
developer.

Marko
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2014-08-27 11:31 +0200 |
| Message-ID | <mailman.13501.1409131921.18130.python-list@python.org> |
| In reply to | #77119 |
Marko Rauhamaa wrote:

> Peter Otten <__peter__@web.de>:
>
>> In addition to what already has been said: you can switch off output
>> buffering of stdout/stderr with
>>
>> python -u out.py
>>
>> or by setting the PYTHONUNBUFFERED environment variable.
>
> Very often such externalities are not in the control of the application
> developer.

Sometimes it's possible to use a wrapper script rather than to sprinkle your
code with flush().

Sometimes the "offending" python script is not even written and maintained
by you, and setting an environment variable may be a price you are willing
to pay to keep it that way.
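A possible shape for such a wrapper, sketched under the assumption that the wrapped program is itself a Python script. The name `run_unbuffered` is made up for illustration; the only behaviour relied on is that a non-empty PYTHONUNBUFFERED has the same effect as passing -u.

```python
import os
import subprocess
import sys


def run_unbuffered(script_args):
    """Run a Python script in a child interpreter with PYTHONUNBUFFERED set."""
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    # Any non-empty value is equivalent to the -u command-line option.
    env["PYTHONUNBUFFERED"] = "1"
    # The child inherits this process's stdin/stdout, so it still works
    # as one stage of a shell pipeline. Returns the child's exit status.
    return subprocess.call([sys.executable] + list(script_args), env=env)
```

With this, `run_unbuffered(["out.py"])` leaves out.py itself unmodified, at the cost of an extra process.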