Re: Design question(s), re: why use of tmp-files or named-pipes(/dev/fd/N) instead of plain pipes?

Path	csiph.com!xmission!news.glorb.com!usenet.stanford.edu!not-for-mail
From	Linda Walsh <bash@tlinx.org>
Newsgroups	gnu.bash.bug
Subject	Re: Design question(s), re: why use of tmp-files or named-pipes(/dev/fd/N) instead of plain pipes?
Date	Sat, 17 Oct 2015 17:43:15 -0700
Lines	125
Approved	bug-bash@gnu.org
Message-ID	<mailman.529.1445129014.7904.bug-bash@gnu.org> (permalink)
References	<56218DA5.8030501@tlinx.org> <5622CDC8.2030102@case.edu>
NNTP-Posting-Host	lists.gnu.org
Mime-Version	1.0
Content-Type	text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding	7bit
X-Trace	usenet.stanford.edu 1445129014 5546 208.118.235.17 (18 Oct 2015 00:43:34 GMT)
X-Complaints-To	action@cs.stanford.edu
To	chet.ramey@case.edu, bug-bash <bug-bash@gnu.org>
Envelope-to	bug-bash@gnu.org
User-Agent	Thunderbird
In-Reply-To	<5622CDC8.2030102@case.edu>


Chet Ramey wrote:
> On 10/16/15 7:52 PM, Linda Walsh wrote:
> 
>> As I mentioned, my initial take on implementation was
>> using standard pipes instead of named pipes (not having read
>> or perhaps having glossed over the 'named pipes' aspect).
> 
> I think you're missing that process substitution is a word expansion
> that is defined to expand to a filename.  When it uses /dev/fd, it
> uses pipes and exposes that pipe to the process as a filename in
> /dev/fd.  Named pipes are an alternative for systems that don't support
> /dev/fd.
-----
	??? I've never seen a usage where it expands to a filename and
is treated as such.

Do you mean:
readarray foo </etc/passwd  (the read-from-a-filename case)
vs.
readarray foo < <(cat /etc/passwd), where "<(...)" creates a
process that puts the input on some "/dev/fd/xx", and readarray
(or read) then handles that as though it were a normal file?

Is that what you mean by process substitution?...

I wasn't understanding the literalness of the wording...(?)
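
For what it's worth, the "expands to a filename" part can be seen directly by printing the word instead of redirecting from it. This is only a minimal sketch; the variable names are mine, and the exact path printed depends on whether the system uses /dev/fd or named pipes.

```shell
# Requires bash (process substitution is not POSIX sh).
# Capture the word that <(...) expands to, without reading from it.
expanded=$(echo <(true))
echo "expanded word: $expanded"

# Because the word is a filename, any command that takes a path can open it.
contents=$(cat <(echo hello))
echo "contents: $contents"
```

In "readarray foo < <(...)" the first "<" is then an ordinary redirection from that filename.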

But read and readarray read from file handles, not filenames --
as in:
 "cat /etc/passwd |readarray foo"

The only conceptual difference between that and
readarray foo < <(cat /etc/passwd) is whether the readarray
is done in the parent or the child.  I.e.,
from a semantic point of view, how is:

   readarray foo < <(cat /etc/passwd)

different from

   shopt -s lastpipe
   cat /etc/passwd |readarray foo

Is there something in the semantics that would require they
be implemented differently?
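
A side-by-side sketch of the two forms, assuming a non-interactive script (lastpipe only takes effect when job control is off) and a stand-in producer function named input, which is hypothetical:

```shell
# Requires bash >= 4.2 for lastpipe; run as a script, not interactively.
input() { printf '%s\n' alpha beta gamma; }

# Form 1: redirect stdin from the filename that <(...) expands to.
readarray -t via_procsub < <(input)

# Form 2: plain pipe, with the last pipeline element run in the current shell.
shopt -s lastpipe
input | readarray -t via_pipe

echo "procsub: ${#via_procsub[@]} lines, pipe: ${#via_pipe[@]} lines"
```

Both arrays end up with the same contents in the parent shell. One difference a script can observe is that the pipeline form reports the producer's exit status through PIPESTATUS, which the redirection form does not.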

> While using pipes for here-documents is possible, the original file-based
> implementation (and I mean original -- it's almost 28 years old now) is
> simple and works.  Inserting another process just to keep writing data to
> a pipe (many here documents are larger than the OS pipe buffer size), deal
> with write errors and SIGPIPE asynchronously, and add a few pipe file
> descriptors to manage, seems like more trouble than benefit.  Files are
> simple, can handle arbitrary amounts of data, and don't require an
> additional process or bookkeeping.
----
  But they do require a temp store, which wasn't present in a system-start
bash script I was writing.  I kept having problems with the different
bash communication methods all needing some external filesystem names to
be present to handle what could have been done in memory.
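
  One way to check which mechanism a given bash uses is to ask what kind of object stdin is inside the command.  A minimal sketch; stdin_kind is a hypothetical helper, and the answer for the here-document case can vary with the bash version and the size of the document.

```shell
# Requires bash: its test builtin checks fd 0 when given /dev/stdin.
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo pipe
    elif [ -f /dev/stdin ]; then
        echo file
    else
        echo other
    fi
}

# A pipeline always hands the reader a pipe.
kind_from_pipe=$(printf 'x\n' | stdin_kind)

# A here-document backed by a temp file shows up as a regular file.
kind_from_heredoc=$(stdin_kind <<'EOF'
some here-document text
EOF
)
echo "pipeline: $kind_from_pipe, here-document: $kind_from_heredoc"
```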

  BTW, dealing with the SIGPIPE is why I kept the output pipe open in
the reader process, and used a child-signal handler to close the specific
pipe.  I.e., when I wrote similar code in perl, I kept track of child
process->pipe pairs, so my foreground process could close the
"writer-pipe" only after the reader-pipe had everything.  Alternatively,
one could use a pair of pipes, and the parent could send a message to the
child process asking for more data (or close the pipe) -- a handshake, as
it were.
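
  The SIGPIPE case itself is easy to reproduce: a writer that keeps writing after its reader has gone away.  This sketch only shows the failure, not the handshake; the variable name is mine.

```shell
# Requires bash for PIPESTATUS.
# head exits after one line; yes then writes into a pipe with no reader.
yes | head -n 1 >/dev/null
writer_status=${PIPESTATUS[0]}
echo "writer exit status: $writer_status"
```

  With the default signal disposition the writer's status is usually 128 plus the SIGPIPE signal number; if SIGPIPE is being ignored, the writer sees a write error instead.  Either way it is nonzero, and that is what the shell would have to handle.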

  As far as many docs being larger than the buffer size -- that shouldn't
be an issue unless you are trying 2-way communication -- but that has lots
of warnings about each reader waiting for output from the other as
a writer, and the need for an async/non-blocking design.
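
  A quick one-way check, as a sketch: push more data through a pipe than the pipe buffer can hold at once, with the reader draining it concurrently.  The size is an arbitrary assumption, picked to be larger than a typical 64 KiB pipe buffer.

```shell
# 1000000 bytes is an arbitrary size chosen to exceed a typical pipe buffer.
size=1000000
received=$(head -c "$size" /dev/zero | wc -c)
received=$((received))   # strip any padding wc adds to its output
echo "sent $size bytes, reader counted $received"
```

  The writer simply blocks whenever the buffer is full and resumes as the reader consumes data, so nothing is lost.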

  Writing to a tmp file requires enough room for the entire tmp doc on
disk.  My tmp right now only has 4.6 available.  If some heredoc tried to
encode the contents of a video, or a Blu-ray data disk, and then
something tried to unpack it to tmp first, it would die.  With a tmp
file you have to stage the whole thing at once -- that requires
basically full space on sender and receiver, or 2x the HEREDOC space --
but with pipes you don't.  If it could hold it in memory... my linux box
has 2x48G (2 NUMA nodes), or ~96G + ~8G swap.  So I'd say it really is
case dependent -- ideally, if bash were going to be enhanced in this area,
it would allow the user to set the defaults to use processes/pipes or
tmp files.

  The problem I had with bash & tmps, was that bash scripts started
execution off of 'root' before a separate FS for /tmp was mounted. The
root partition has even less space available on it than /tmp.  So the
use of a /tmp file when /tmp wasn't really available yet was an issue.
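
  A possible workaround sketch, on the assumption that bash creates its here-document temp files under $TMPDIR when that is set: pick a directory that is already writable at that point in boot.  The helper name and the candidate list are hypothetical.

```shell
# Hypothetical helper: print the first candidate that exists and is writable.
first_writable_dir() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate list is an assumption; adjust for what is mounted that early.
if chosen=$(first_writable_dir /dev/shm /run /tmp); then
    export TMPDIR=$chosen
fi

cat <<EOF
here-doc text, staged under ${TMPDIR:-the default temp directory}
EOF
```

  That still needs some writable filesystem, of course, so it only moves the problem rather than removing it.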

> 
> here-strings are similar, and there's no reason to have here strings use
> a different mechanism than here documents.
> 
> There was also a backwards compatibility issue.  Since historical shells
> had used temp files for here-documents, there was a set of applications
> that assumed stdin was seekable as long as it wasn't a terminal, and it was
> important to support them.
----
	Oh?  What POSIX compatible applications?

> Yes.  You are missing the fundamental nature of process substitution:
> a word expansion that results in a filename.
---
	hopefully my q's at top were more on track?

> If you think that you can improve the implementation of here
> documents and here strings by using a pipe instead of a temp
> file, by all means take a shot at it.  You might want to get
> ahold of some old Unix `shar' archives to test your implementation
> with very large here-documents.
----
	Or, I might read those same shar archives today directly
into memory -- depends on system resources.
