How does this wait -n work to cap parallelism?

From	Earnestly <zibeon@googlemail.com>
Newsgroups	gnu.bash.bug
Subject	How does this wait -n work to cap parallelism?
Date	Mon, 29 Jul 2019 19:12:42 +0100

This mail was prompted by a discussion in the #bash IRC channel.  It started
after reading <http://mywiki.wooledge.org/ProcessManagement>, where the
article introduces an example using 'wait -n' as a means of providing capped
parallelism:


        #!/usr/bin/env bash

        # number of processes to run in parallel
        num_procs=5

        # function that processes one item
        my_job() {
            printf 'Processing %s\n' "$1"
            sleep "$(( RANDOM % 5 + 1 ))"
        }

        i=0
        while IFS= read -r line; do
            if (( i++ >= num_procs )); then
                wait -n   # wait for any job to complete. New in 4.3
            fi
            my_job "$line" &
        done < inputlist
        wait # wait for the remaining processes
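
One way to check that the cap holds is to record how many jobs bash reports
as running right after each launch.  The following is only a sketch of the
same loop shape: the generated item list, the shortened sleep, and the
'peak' and 'running' variables are assumptions added for illustration and
are not part of the wiki page.

```shell
#!/usr/bin/env bash
# Sketch: same loop shape as the wiki example, plus a peak-concurrency probe.
# The item list, the short sleep and the 'peak'/'running' variables are
# illustrative assumptions, not part of the original example.

num_procs=5
peak=0

my_job() {
    printf 'Processing %s\n' "$1"
    sleep "0.$(( RANDOM % 3 + 1 ))"   # 0.1 to 0.3 seconds
}

i=0
for item in {1..12}; do
    if (( i++ >= num_procs )); then
        wait -n   # wait for any one job to complete (bash 4.3+)
    fi
    my_job "$item" &

    # Count the jobs bash still considers running right after the launch.
    running=$(( $(jobs -rp | wc -l) ))
    if (( running > peak )); then
        peak=$running
    fi
done
wait   # wait for the remaining processes

printf 'peak concurrency observed: %d (cap: %d)\n' "$peak" "$num_procs"
```

Here the loop runs in the main shell rather than on the receiving end of a
pipeline, so the background jobs belong to the same shell that calls 'wait'.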


The question is about how the example works in order to keep
parallelism capped at num_procs.

Below I've provided a synthetic scenario which hopefully highlights my
(and others') confusion.

The logic is to provide two loops: one generates an initially slow feed of
"work", and the second reads that feed and starts "agents" in the background.
The iteration counter 'i' is then compared against 'nproc' (for which I use 3)
to guard calls to 'wait -n' once 'i' equals or exceeds 'nproc'.

Because the feed initially produces one item every 2 seconds and each
backgrounded agent also runs for 2 seconds, only one agent is ever running
at a time, one after the other.

A typical process tree in top or htop might look something like this:


        bash scriptname
        |- bash scriptname (while read)
        |  `- bash scriptname (agent)
        |     `- sleep 2
        `- bash scriptname (slowthenfast)
         `- sleep 2


After some time the value of 'i' will have incremented well beyond the
value of 'nproc'.  It is now that the feed rate speeds up dramatically,
providing more work for the agents.

As a result, more agents are started, yet the nproc limit is still respected:


        bash scriptname
        |- bash scriptname
        |  |- bash scriptname
        |  |  `- sleep 2
        |  |- bash scriptname
        |  |  `- sleep 2
        |  `- bash scriptname
        |     `- sleep 2
        `- bash scriptname
         `- sleep 0.1


And I have no idea why or how this works.  I hope the list can help
explain this behaviour.

---

My intuition, or assumption, is as follows:

I would expect that the if statement in the second loop would always
succeed.  It would then call 'wait -n' and wait for the existing agent
to end (as I assume it's the only job running at this point).  Once it
ends a new agent will be started and back to the 'wait -n' the loop will
go.

In effect it should keep starting only one agent after the other.  E.g.:


        agent0 (this is the last agent that ran before the loop speed increased)

    while read
        (i++ >= nproc) => always true
            wait -n => waits for agent0 (as it's the only job?)
            agent0 ends

        agent1 starts

    while read
        (i++ >= nproc) => always true
            wait -n => waits for agent1 (as it's the only job?)
            agent1 ends

        agent2 starts

    while read
        (i++ >= nproc) => always true
            wait -n => waits for agent2 (as it's the only job?)
            agent2 ends

        agent3 starts
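
For contrast, the strictly serial behaviour described above can be written
down explicitly.  This is a hypothetical sketch, not the script under
discussion: it waits for the specific previous agent by PID instead of
calling 'wait -n', and the 'run_serial' function, the 'prev' variable, the
fixed work list, and the shortened sleep are all assumptions made for
illustration.

```shell
#!/usr/bin/env bash
# Hypothetical contrast case: strictly one agent at a time.
# 'run_serial', 'prev' and the 0.2 s sleep are illustrative assumptions.

agent() {
    printf 'agent %d started\n' "$1"
    sleep 0.2
    printf 'agent %d finished\n' "$1"
}

run_serial() {
    local prev='' work
    for work in 0 1 2 3; do
        # Wait for the specific previous agent by PID, not for "any job".
        if [[ -n $prev ]]; then
            wait "$prev"
        fi
        agent "$work" &
        prev=$!
    done
    wait   # collect the last agent
}

run_serial
```

With this loop each agent's "finished" line is printed before the next
agent's "started" line, which is the pattern I expected to see.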


But what appears to be happening is this:


        agent0 (this is the last agent that ran before the loop speed increased)

    while read
        (i++ >= nproc) => always true
            wait -n => waits for agent0 (as it's the only job?)
            agent0 ends

        agent1 starts
        agent2 starts
        agent3 starts
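
A small probe that might help narrow this down is to time a single
'wait -n' call when one background job has already exited and another is
still running.  This is only a sketch under stated assumptions: the sleep
durations and the 'start', 'elapsed' and 'rc' variables are chosen for
illustration, and the result may well differ between bash versions, so no
particular outcome is assumed here.

```shell
#!/usr/bin/env bash
# Probe: does 'wait -n' block when a background job has already exited?
# Durations and variable names are illustrative assumptions.

sleep 0.2 &    # job A: exits quickly and is never waited for explicitly
sleep 1        # foreground pause, long enough for job A to finish
sleep 3 &      # job B: still running when 'wait -n' is called

start=$SECONDS
wait -n
rc=$?
elapsed=$(( SECONDS - start ))

# Roughly 0-1 s suggests the call returned for the already-finished job A;
# roughly 3 s suggests it blocked until job B exited.
printf 'wait -n returned %d after about %d s\n' "$rc" "$elapsed"

wait   # collect whatever is still running
```

SECONDS only has one-second resolution, which is why job B sleeps long
enough to make the two cases easy to tell apart.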


---

#!/bin/bash

nproc=3

agent() {
    printf 'agent: %d: started... (i is %d)\n' "$1" "$2"
    sleep 2
    printf 'agent: %d: finished\n' "$1"
}

slowthenfast() {
    local a=0

    while :; do
        printf '%d\n' "$a"

        if (( a >= 10 )); then
            sleep 0.1
        else
            sleep 2
        fi

        (( ++a ))
    done
}

i=0
slowthenfast | while read -r work; do
    if (( i++ >= nproc )); then
        wait -n
    fi

    agent "$work" "$i" &
done

wait
