From: Earnestly
Newsgroups: gnu.bash.bug
Subject: How does this wait -n work to cap parallelism?
Date: Mon, 29 Jul 2019 19:12:42 +0100
To: bug-bash@gnu.org
X-Mailman-Original-Message-ID: <20190729181242.GB2032@teapot>

This mail was spurred on by users in the #bash IRC channel.  It started
after reading an article which introduces an example using 'wait -n' as a
means to provide capped parallelism:

    #!/usr/bin/env bash

    # number of processes to run in parallel
    num_procs=5

    # function that processes one item
    my_job() {
        printf 'Processing %s\n' "$1"
        sleep "$(( RANDOM % 5 + 1 ))"
    }

    i=0
    while IFS= read -r line; do
        if (( i++ >= num_procs )); then
            wait -n   # wait for any job to complete. New in 4.3
        fi
        my_job "$line" &
    done < inputlist
    wait # wait for the remaining processes

The question is about how the example manages to keep parallelism capped
at num_procs.  Below I've provided a synthetic scenario which hopefully
highlights my (and others') confusion.

The logic is to provide two loops: one generating an initially slow feed
of "work", and a second which reads that feed and starts "agents" in the
background.  The counter 'i' is compared against 'nproc' (for which I use
3) to guard the calls to 'wait -n', so that 'wait -n' is only called once
'i' equals or exceeds 'nproc'.

As the feed initially produces one item every 2 seconds and each
backgrounded agent also takes 2 seconds, there is only ever one agent
running at a time, one after the other.

A typical process tree in top or htop might look something like this:

    bash scriptname
     |- bash scriptname (while read)
     |   `- bash scriptname (agent)
     |       `- sleep 2
     `- bash scriptname (slowthenfast)
         `- sleep 2

After some time the value of 'i' will have incremented well beyond the
value of 'nproc'.  It is now that the feed rate speeds up dramatically,
providing more work for the agents.  Due to this more agents are started,
while still maintaining the nproc limit:

    bash scriptname
     |- bash scriptname
     |   |- bash scriptname
     |   |   `- sleep 2
     |   |- bash scriptname
     |   |   `- sleep 2
     |   `- bash scriptname
     |       `- sleep 2
     `- bash scriptname
         `- sleep 0.1

And I have no idea why or how this works.  I hope the list can help
explain this behaviour.

---

My intuition, or assumption, is as follows:

I would expect that the if statement in the second loop would always
succeed.  It would then call 'wait -n' and wait for the existing agent to
end (as I assume it's the only job running at this point).

Once it ends a new agent will be started and back to the 'wait -n' the
loop will go.  In effect it should keep starting only one agent after the
other.  E.g.:

    agent0 (this is the last agent that ran before the loop speed increased)
    while read
      (i++ >= nproc) => always true
        wait -n => waits for agent0 (as it's the only job?)
        agent0 ends
      agent1 starts
    while read
      (i++ >= nproc) => always true
        wait -n => waits for agent1 (as it's the only job?)
        agent1 ends
      agent2 starts
    while read
      (i++ >= nproc) => always true
        wait -n => waits for agent2 (as it's the only job?)
        agent2 ends
      agent3 starts

But what appears to be happening is this:

    agent0 (this is the last agent that ran before the loop speed increased)
    while read
      (i++ >= nproc) => always true
        wait -n => waits for agent0 (as it's the only job?)
        agent0 ends
      agent1 starts
      agent2 starts
      agent3 starts

---

    #!/bin/bash

    nproc=3

    agent() {
        printf 'agent: %d: started... (i is %d)\n' "$1" "$2"
        sleep 2
        printf 'agent: %d: finished\n' "$1"
    }

    slowthenfast() {
        local a=0

        while :; do
            printf '%d\n' "$a"

            if (( a >= 10 )); then
                sleep 0.1
            else
                sleep 2
            fi

            (( ++a ))
        done
    }

    i=0
    slowthenfast | while read -r work; do
        if (( i++ >= nproc )); then
            wait -n
        fi

        agent "$work" "$i" &
    done
    wait
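
---

In case it helps anyone poke at this, here is a shortened sketch of the
same loop which logs how many jobs bash lists as running just before each
'wait -n', and records the highest count seen.  The fixed list of eight
work items, the 0.3 second sleep and the 'peak' variable are my own
assumptions for illustration and are not part of the script above.  It
only observes the behaviour, it does not explain it.

```shell
#!/usr/bin/env bash
# Instrumented sketch of the capped loop (assumed timings and item count).

nproc=3
peak=0          # hypothetical helper: highest number of running jobs seen

agent() {
    sleep 0.3
}

i=0
for work in 0 1 2 3 4 5 6 7; do
    if (( i++ >= nproc )); then
        # jobs -pr prints one PID per job bash still considers running
        n=$(jobs -pr | wc -l)
        printf 'work %d: %d running before wait -n\n' "$work" "$(( n ))"
        wait -n
    fi

    agent "$work" &

    # count again after starting the new agent and remember the maximum
    n=$(jobs -pr | wc -l)
    if (( n > peak )); then
        peak=$n
    fi
done
wait

printf 'peak=%d\n' "$(( peak ))"
```

Replacing the for loop with the slowthenfast pipeline from the script above
should show the same counts for the slow and the fast phase, although I
have only reasoned about the short version here.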