Re: how does pipe data chunking work?

From	pacman@kosh.dhis.org (Alan Curry)
Newsgroups	comp.os.linux.development.system
Subject	Re: how does pipe data chunking work?
Date	2011-01-30 23:17 +0000
Organization	Aioe.org NNTP Server
Message-ID	<ii4rip$km2$1@speranza.aioe.org>
References	<ii4ovm$ld$1@news.mixmin.net>

In article <ii4ovm$ld$1@news.mixmin.net>, 1jam  <com@example.net> wrote:
>Have to admit I'm not clear on this one.. when piping programs together on a 
>terminal, what determines how often the output of the first program is sent 
>to the second program? And what amount of data?

The first program decides this itself. In most cases it uses the output
functions from <stdio.h>, and the C library chooses a buffer size. When
stdout is a pipe rather than a terminal, stdio normally buffers fully instead
of line by line, so the data appears in the pipe for the second program to
read only when the buffer fills, the program calls fflush(), or the program
exits.

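There is a "stdbuf" command in recent versions of GNU coreutils, which you
can use to alter the default buffering of stdio (for example, "stdbuf -oL"
makes stdout line-buffered). It only helps with programs that use stdio and
don't set their own buffering.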

>
>It seems that with a command like:
>find . -iname \*bz2 -print0 | xargs -0 bunzip2
>
>..find will execute in entirety, then pass on its output all at once..? I 
>tested this with a double zipped file. Like some_file.bz2.bz2. It gets 
>decompressed only once, resulting in some_file.bz2.

Even if you could guarantee that find flushed each filename to its output
before continuing to read the directory, this would still be unlikely to
work. First, find is probably reading the directory with a readdir(3)
implementation that calls getdents(2), which fetches many entries per call,
so the directory reading itself is effectively buffered. Second, it probably
takes more time for xargs to launch bunzip2 than it takes for find to do
another readdir() call. Third, directories are not ordered objects, so your
new some_file.bz2 might end up appearing in an earlier position in the
directory, where find can't see it without rereading from the beginning.
Fourth, xargs is designed to bundle multiple names into a single command
line, so it won't actually try to launch bunzip2 until it has filled a
command line or reached the end of its input. (xargs -n 1 would fix that
last problem; the lowercase -l form is deprecated.)
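
To make the fourth point concrete, here is a rough sketch of the batching
idea. It is not how xargs is actually implemented: real xargs limits a batch
by total command-line size unless told otherwise, while this hypothetical
count_launches() uses a simple per-command name limit (max_args), similar in
spirit to -n.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given a buffer of NUL-delimited names (the format
 * produced by find -print0), count how many commands a batching reader
 * would launch if it put at most max_args names on each command line.
 */
size_t count_launches(const char *buf, size_t len, size_t max_args)
{
    size_t pending = 0;   /* names collected for the next command */
    size_t launches = 0;
    size_t i = 0;

    assert(buf != NULL);
    if (max_args == 0)
        return 0;

    while (i < len) {
        const char *end = memchr(buf + i, '\0', len - i);
        size_t n = end ? (size_t)(end - (buf + i)) : len - i;

        if (n > 0)
            pending++;              /* skip empty names */
        i += n + 1;

        if (pending == max_args) {  /* batch is full: launch now */
            launches++;
            pending = 0;
        }
    }
    if (pending > 0)                /* end of input: launch the remainder */
        launches++;
    return launches;
}
```

With a large limit, three names produce a single launch that happens only at
the end of the input; with a limit of 1, each name is launched as soon as it
arrives.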

>
>Yet a command like:
>dd ... | gzip ...
>
>..will send blocks at a time, and both programs work simultaneously.

dd is special. It doesn't go through stdio; it calls read() and write()
itself, and it has all those weird options, especially bs=N, because it
wants to give you control over buffering.
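
A sketch of a copy loop in that style, assuming POSIX read()/write(). This is
not dd's source; copy_blocks and its parameters are names invented for the
example. Every block goes to the output descriptor as soon as it has been
read, so a reader on the other end of a pipe can start working right away:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy in_fd to out_fd using reads of at most bs bytes,
 * writing each block out immediately with no stdio layer in between.
 * Stores the number of blocks read in *blocks.
 * Returns the total number of bytes copied, or -1 on error.
 */
long copy_blocks(int in_fd, int out_fd, size_t bs, long *blocks)
{
    char *buf;
    long total = 0;
    ssize_t n;

    assert(blocks != NULL);
    *blocks = 0;
    if (bs == 0 || (buf = malloc(bs)) == NULL)
        return -1;

    while ((n = read(in_fd, buf, bs)) != 0) {
        ssize_t off = 0;

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry the read */
            free(buf);
            return -1;
        }
        while (off < n) {           /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                free(buf);
                return -1;
            }
            off += w;
        }
        total += n;
        (*blocks)++;
    }
    free(buf);
    return total;
}
```

Calling copy_blocks(0, 1, 4096, &count) would copy stdin to stdout in blocks
of up to 4096 bytes. A read() from a pipe can return fewer bytes than
requested, so blocks are not guaranteed to be full.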

-- 
Alan Curry
