Re: Design question(s), re: why use of tmp-files or named-pipes(/dev/fd/N) instead of plain pipes?

Path csiph.com!au2pb.net!usenet.blueworldhosting.com!feeder01.blueworldhosting.com!news.glorb.com!usenet.stanford.edu!not-for-mail
From Chet Ramey <chet.ramey@case.edu>
Newsgroups gnu.bash.bug
Subject Re: Design question(s), re: why use of tmp-files or named-pipes(/dev/fd/N) instead of plain pipes?
Date Sun, 18 Oct 2015 19:36:49 -0400
Organization ITS, Case Western Reserve University
Lines 161
Approved bug-bash@gnu.org
Message-ID <mailman.589.1445211431.7904.bug-bash@gnu.org>
References <56218DA5.8030501@tlinx.org> <5622CDC8.2030102@case.edu> <5622EB23.6020700@tlinx.org>
Reply-To chet.ramey@case.edu
NNTP-Posting-Host lists.gnu.org
Mime-Version 1.0
Content-Type text/plain; charset=utf-8
Content-Transfer-Encoding 7bit
X-Trace usenet.stanford.edu 1445211432 20632 208.118.235.17 (18 Oct 2015 23:37:12 GMT)
X-Complaints-To action@cs.stanford.edu
Cc chet.ramey@case.edu
To Linda Walsh <bash@tlinx.org>, bug-bash <bug-bash@gnu.org>
Envelope-to bug-bash@gnu.org
User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
In-Reply-To <5622EB23.6020700@tlinx.org>



On 10/17/15 8:43 PM, Linda Walsh wrote:
> 
> 
> Chet Ramey wrote:
>> On 10/16/15 7:52 PM, Linda Walsh wrote:
>>
>>> As I mentioned, my initial take on implementation was
>>> using standard pipes instead of named pipes (not having read
>>> or perhaps having glossed over the 'named pipes' aspect).
>>
>> I think you're missing that process substitution is a word expansion
>> that is defined to expand to a filename.  When it uses /dev/fd, it
>> uses pipes and exposes that pipe to the process as a filename in
>> /dev/fd.  Named pipes are an alternative for systems that don't support
>> /dev/fd.
> -----
>     ??? I've never seen a usage where it expands to a filename and
> is treated as such.

Every example of process substitution ever given expands to a filename,
and the result is treated as a filename.

> 
> Are you meaning:
> readarray foo </etc/passwd  (being a read from filename case).
> vs.
> readarray foo < <(cat /etc/passwd), and in this case, "<(...)"
> is creating a process ... that puts the input file on some "/dev/fd/xx",
> and that readarray (or read)
> is then "handling that" as though it were a normal file?

Yes.  This expands to something akin to

	cat /etc/passwd > /dev/fd/63 &
	readarray foo < /dev/fd/62

where /dev/fd/62 and /dev/fd/63 are representations of file descriptors 62
and 63, which are the read and write ends of a pipe.  The open that
input redirection performs gives you a file handle that is a dup of fd 62,
the read end of the pipe.
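
A minimal sketch of the point above, assuming bash 4 or later: the substitution word is replaced by a filename before the command runs, and the separate `<' then opens that filename. The exact name printed depends on the system (a /dev/fd entry where available, a named pipe otherwise), so no particular value is assumed here.

```shell
#!/usr/bin/env bash
# Process substitution is a word expansion: echo simply receives the
# resulting filename as an ordinary argument and prints it.
name=$(echo <(true))
echo "expanded to: $name"

# Combined with the separate `<' redirection, that filename is opened
# for reading like any other file, and readarray reads from the
# resulting file descriptor.
readarray -t lines < <(printf '%s\n' one two three)
echo "read ${#lines[@]} lines"
```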

> 
> But read or readarray, read from "file handle"s, not filenames --
> as in:
> "cat /etc/passwd |readarray foo"

Yes, they do.  Input redirection is the operation that `transforms' a
filename into a file descriptor.

> The only conceptual difference between that and
> readarray foo < <(cat /etc/passwd) is whether or not the readarray
> is done in the parent or the child, .. i.e.
> from a semantic point of view,  how is:
> 
>   readarray foo < <(cat /etc/passwd)
> 
> different from
> 
>   shopt -s lastpipe
>   cat /etc/passwd |readarray foo
> 
> Is there something in the semantics that would require they
> be implemented differently?

You really don't seem to be understanding that < <(cat /etc/passwd) is
composed of two separate primitives: the `<', which is input redirection,
and the `<(cat /etc/passwd)', which is a process substitution.  There is
no bash primitive that combines the two in the way you seem to think.
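
The two primitives can be exercised separately, which may make the distinction easier to see. This is only an illustrative sketch; the temporary file and the variable names are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Process substitution alone: cat is given a filename as an argument
# and opens it itself; no redirection is involved.
alone=$(cat <(echo hello))

# Input redirection alone: the shell opens a named file and hands the
# command a file descriptor on standard input.
tmpfile=$(mktemp)            # hypothetical scratch file for the demo
echo hello > "$tmpfile"
read -r redirected < "$tmpfile"
rm -f "$tmpfile"

# Both together: the first `<' is the redirection operator, the second
# begins the process substitution, so the space between them matters.
read -r both < <(echo hello)

echo "$alone / $redirected / $both"
```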

>> While using pipes for here-documents is possible, the original file-based
>> implementation (and I mean original -- it's almost 28 years old now) is
>> simple and works.  Inserting another process just to keep writing data to
>> a pipe (many here documents are larger than the OS pipe buffer size), deal
>> with write errors and SIGPIPE asynchronously, and add a few pipe file
>> descriptors to manage, seems like more trouble than benefit.  Files are
>> simple, can handle arbitrary amounts of data, and don't require an
>> additional process or bookkeeping.
> ----
>  But they do require a tempstore that wasn't present in a system-start
> bash script I was writing -- kept having problems with the different
> bash communication methods all needing some external-fs names to be
> present to handle what could have been done in memory.

OK.  So you need to either figure out how to use the filesystem that the
boot environment provides (e.g., TMPDIR=. to create temporary files in
the current directory, which is presumably writable), or figure out a
way to accomplish what you want in the presence of no available writable
filesystems.
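
As a rough sketch of the TMPDIR suggestion: the directory created below is a hypothetical stand-in for whatever writable location the boot environment actually provides. Whether a given bash version creates a temporary file at all for a here-document this small is not assumed; newer releases may use a pipe for small documents.

```shell
#!/usr/bin/env bash
# Assumption: $scratch stands in for a writable directory available
# early in boot; it is created with mktemp only to keep the demo tidy.
scratch=$(mktemp -d)
TMPDIR=$scratch
export TMPDIR

# If bash backs this here-document with a temporary file, that file is
# created under $TMPDIR rather than under the default location.
readarray -t lines <<'EOF'
line one
line two
EOF
echo "read ${#lines[@]} lines"

# Clean up; the directory is expected to be empty again by now.
unset TMPDIR
rmdir "$scratch"
```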

>  BTW, Dealing with the SIGPIPE is why I kept the output pipe open in
> the reader process, and use a child-sig handler to close the specific
> pipe -- I.e. when I wrote similar code in perl, I kept track of child
> processes->pipe pairs, so my foreground process could close the
> "writer-pipe" only after the reader-pipe had everything.  Alternatively,
> one could use a pair of pipes, and the parent could send a message to the
> child process for more data (or close the pipe) -- a handshake, as it were.

None of which is simpler or more efficient than the current mechanism, nor
does any of it provide advantages that outweigh the implementation cost.

>  As far as many docs being larger than the buffer size -- shouldn't be
> an issue unless you are trying 2-way communication -- but that has lots
> of warnings about each reader waiting for output from the other as
> a writer and the need for asynch/non-blocking design.

Your writer process has to be prepared to persist indefinitely due to
output blocking.

> 
>  Writing to a tmp file requires enough room for the entire tmp doc on
> disk.  My tmp right now only has 4.6Available.  If some heredoc tried to
> encode the contents of a video, or Blu-ray data disk, and then
> something tried to unpack it to tmp first, it would die --- but with
> pipes, you don't have to send the whole thing at once -- that requires
> basically full space on sender and receiver or 2x the HEREDOC space.  If
> it could hold it in memory... my linux box has 2x48G (2 NUMA nodes) or
> ~96G + ~8G Swap.  So I'd say it really is case dependent -- ideally if
> bash was going to be enhanced in this area, it would allow the user to set the
> defaults to use processes/pipes or tmp files.

If you think it's worthwhile, take a shot at an implementation and see
how it works out.

>  The problem I had with bash & tmps, was that bash scripts started
> execution off of 'root' before a separate FS for /tmp was mounted. The
> root partition has even less space available on it than /tmp.  So the
> use of a /tmp file when /tmp wasn't really available yet was an issue.

Then you need to work within the constraints of your environment.  Your
circumstances aren't that common.


>> There was also a backwards compatibility issue.  Since historical shells
>> had used temp files for here-documents, there was a set of applications
>> that assumed stdin was seekable as long as it wasn't a terminal, and it was
>> important to support them.
> ----
>     Oh?  What POSIX compatible applications?

This predated Posix, and included applications that aren't Posix, but were
widely (or maybe not so widely) used.  What does Posix matter?

>> If you think that you can improve the implementation of here
>> documents and here strings by using a pipe instead of a temp
>> file, by all means take a shot at it.  You might want to get
>> ahold of some old Unix `shar' archives to test your implementation
>> with very large here-documents.
> ----
>     Or, I might read those same shar archives today directly
> into memory -- depends on system resources.

You might, were you to implement a shell mechanism that uses memory
for here documents, since shar archives use here documents to write the
files contained in the archive.  I'm suggesting that shar files are a
convenient way to test a here-document implementation.
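
For anyone without old archives at hand, a shar-style test case can be approximated with a generated script. The sketch below is an assumption-laden stand-in, not a real shar archive: the file names and the payload are invented, and the size is chosen only to exceed a typical pipe buffer.

```shell
#!/usr/bin/env bash
# Hypothetical scratch locations for the demo.
workdir=$(mktemp -d)
archive=$workdir/archive.sh
payload=$workdir/payload.txt

# Build a shar-style archive: a script that recreates payload.txt from
# one large here-document (roughly half a megabyte of text).
{
  echo "cat > '$payload' <<'SHAR_EOF'"
  for ((i = 0; i < 20000; i++)); do
    echo "line $i of the payload"
  done
  echo "SHAR_EOF"
} > "$archive"

# Unpack it with the shell under test and count what arrived.
bash "$archive"
extracted=$(wc -l < "$payload")
echo "extracted $extracted lines"

rm -rf "$workdir"
```

Substituting a patched shell for `bash' in the unpacking step would exercise a different here-document implementation against the same input.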

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/
