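How can a bash script wait for several subprocesses spawned from that script to finish, and then return an exit code != 0 when any of the subprocesses ends with code != 0?

Simple script: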

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

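The script above waits for all 10 spawned subprocesses, but it always gives exit status 0 (see help wait). How can I modify this script so that it discovers the exit statuses of the spawned subprocesses and returns exit code 1 when any of the subprocesses ends with code != 0?

Is there a better solution than collecting the PIDs of the subprocesses, waiting for them in order, and summing the exit statuses?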


Current answer

Just store the results outside the shell, e.g. in a file.

#!/bin/bash
tmp=/tmp/results

: > $tmp  #clean the file

for i in `seq 0 9`; do
  (doCalculations $i; echo $i:$?>>$tmp)&
done      #iterate

wait      #wait until all ready

sort $tmp | grep -v ':0'  #... handle as required
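To turn the collected results into the exit status the question asks for, something along these lines might work. This is only a sketch under assumptions: run_all_and_check is a hypothetical helper name, the doCalculations shown here is a stand-in that fails for odd arguments, and mktemp is used instead of the fixed /tmp/results path so that parallel runs of the script do not share a file.

```shell
#!/bin/bash
# Stand-in for the real doCalculations (assumption for this sketch):
# it fails for odd arguments and succeeds for even ones.
doCalculations() { return $(( $1 % 2 )); }

# Hypothetical helper: run one job per argument in parallel, record
# "index:status" lines in a private temp file, and return non-zero
# if any recorded status is not 0.
run_all_and_check() {
    local tmp i failed
    tmp=$(mktemp) || return 2
    for i in "$@"; do
        ( doCalculations "$i"; echo "$i:$?" >> "$tmp" ) &
    done
    wait    # every subshell has written its line once this returns
    # Count the lines whose status field is not exactly 0.
    failed=$(grep -cv ':0$' "$tmp")
    rm -f "$tmp"
    [ "$failed" -eq 0 ]
}
```

In the original script this could be called as run_all_and_check $(seq 0 9) || exit 1.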

Other answers

This is what I use:

#wait for jobs
for job in `jobs -p`; do wait ${job}; done
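The loop above waits for every job but discards the individual exit statuses. A minimal sketch of how they could be kept, assuming the PIDs are passed in explicitly (wait_all is a hypothetical helper name, not something from the answer):

```shell
#!/bin/bash
# Hypothetical helper: wait for every PID given as an argument and
# return 1 if at least one of them exited with a non-zero status.
wait_all() {
    local pid status=0
    for pid in "$@"; do
        # `wait PID` returns the exit status of that particular child.
        wait "$pid" || status=1
    done
    return "$status"
}
```

One way to use it is to append $! to an array after each doCalculations $i & and then run wait_all "${pids[@]}" || exit 1.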

I needed this, but the target process was not a child of the current shell, in which case wait $PID does not work. I did find the following alternative:

while [ -e /proc/$PID ]; do sleep 0.1 ; done

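This relies on the presence of procfs, which may not be available (Mac does not provide it, for example). So for portability, you can use this instead: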

while ps -p $PID >/dev/null ; do sleep 0.1 ; done
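Both loops above spin forever if the process never exits. A hedged sketch of the same polling idea with an upper bound follows. wait_for_pid is a hypothetical helper name, and it swaps ps -p for kill -0, which sends no signal and only checks that the process exists. That check assumes you are allowed to signal the process; for a process owned by another user it can fail even though the process is alive. Note that polling can only tell you the process is gone, not its exit status.

```shell
#!/bin/bash
# Hypothetical helper: poll until PID $1 disappears or until roughly
# $2 seconds (default 10) have passed.
# Returns 0 if the process is gone, 1 on timeout.
wait_for_pid() {
    local pid=$1 timeout=${2:-10} waited=0   # waited counts 0.1 s ticks
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$(( timeout * 10 ))" ]; then
            return 1
        fi
        sleep 0.1
        waited=$(( waited + 1 ))
    done
    return 0
}
```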

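This is an extension of the most-upvoted answer, by @Luca Tettamanti, to create a fully runnable example.

That answer left me wondering:

What type of variable is n_procs, and what does it contain? What type of variable is procs, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how.

...and also:

How do you get the return code from the subprocess when it has completed (which is the whole crux of this question)?

Anyway, I figured it out, so here is a fully runnable example.

Notes:

$! is how to obtain the PID (Process ID) of the last-executed subprocess.

Running any command with the & after it, like cmd &, for example, causes it to run in the background as a subprocess in parallel with the main process.

myarray=() is how to create an array in bash.

To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.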

Full, runnable program: wait for all processes to end

multi_process_program.sh (from my eRCaGuy_hello_world repo):

#!/usr/bin/env bash


# This is a special sleep function which returns the number of seconds slept as
# the "error code" or "return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
    seconds_to_sleep="$1"
    sleep "$seconds_to_sleep"
    return "$seconds_to_sleep"
}

# Create an array of whatever commands you want to run as subprocesses
procs=()  # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")

num_procs=${#procs[@]}  # number of processes
echo "num_procs = $num_procs"

# run commands as subprocesses and store pids in an array
pids=()  # bash array
for (( i=0; i<"$num_procs"; i++ )); do
    echo "cmd = ${procs[$i]}"
    ${procs[$i]} &  # run the cmd as a subprocess
    # store pid of last subprocess started; see:
    # https://unix.stackexchange.com/a/30371/114401
    pids+=("$!")
    echo "    pid = ${pids[$i]}"
done

# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
    wait "$pid"
    return_code="$?"
    echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."

Make the file above executable by running chmod +x multi_process_program.sh, then run it like this:

time ./multi_process_program.sh 

Sample output. Note how the output of the time command in the call shows that the run took 5.084 seconds. We were also able to successfully retrieve the return code from each subprocess.

eRCaGuy_hello_world/bash$ time ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
    pid = 21694
cmd = my_sleep 2
    pid = 21695
cmd = my_sleep 3
    pid = 21697
cmd = my_sleep 4
    pid = 21699
PID = 21694; return_code = 5
PID = 21695; return_code = 2
PID = 21697; return_code = 3
PID = 21699; return_code = 4
All 4 processes have ended.
PID 21694 is done; return_code = 5; 3 PIDs remaining.
PID 21695 is done; return_code = 2; 2 PIDs remaining.
PID 21697 is done; return_code = 3; 1 PIDs remaining.
PID 21699 is done; return_code = 4; 0 PIDs remaining.

real    0m5.084s
user    0m0.025s
sys     0m0.061s

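Going further: determine live when each individual process ends

If you would like to take some action as each process finishes, and you do not know when they will finish, you can poll in an infinite while loop to see when each process terminates, then do whatever you want.

Simply comment out the "OPTION 1" block of code above, and replace it with this "OPTION 2" block: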

# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
    for i in "${!pids[@]}"; do
        pid="${pids[$i]}"
        # echo "pid = $pid"  # debugging

        # See if PID is still running; see my answer here:
        # https://stackoverflow.com/a/71134379/4561887
        ps --pid "$pid" > /dev/null
        if [ "$?" -ne 0 ]; then
            # PID doesn't exist anymore, meaning it terminated

            # 1st, read its return code
            wait "$pid"
            return_code="$?"

            # 2nd, remove this PID from the `pids` array by `unset`ting the
            # element at this index; NB: due to how bash arrays work, this does
            # NOT actually remove this element from the array. Rather, it
            # removes its index from the `"${!pids[@]}"` list of indices,
            # adjusts the array count(`"${#pids[@]}"`) accordingly, and it sets
            # the value at this index to either a null value of some sort, or
            # an empty string (I'm not exactly sure).
            unset "pids[$i]"

            num_pids="${#pids[@]}"
            echo "PID $pid is done; return_code = $return_code;" \
                 "$num_pids PIDs remaining."
        fi
    done

    # exit the while loop if the `pids` array is empty
    if [ "${#pids[@]}" -eq 0 ]; then
        break
    fi

    # Do some small sleep here to keep your polling loop from sucking up
    # 100% of one of your CPUs unnecessarily. Sleeping allows other processes
    # to run during this time.
    sleep 0.1
done

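Sample run and output of the full program with Option 1 commented out and Option 2 in use:

eRCaGuy_hello_world/bash$ ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
    pid = 22275
cmd = my_sleep 2
    pid = 22276
cmd = my_sleep 3
    pid = 22277
cmd = my_sleep 4
    pid = 22280
PID 22276 is done; return_code = 2; 3 PIDs remaining.
PID 22277 is done; return_code = 3; 2 PIDs remaining.
PID 22280 is done; return_code = 4; 1 PIDs remaining.
PID 22275 is done; return_code = 5; 0 PIDs remaining.

Each of those "PID XXXXX is done" lines is printed live, right after that process has terminated! Notice that even though the process for sleep 5 (PID 22275 in this case) was started first, it finished last, and we successfully detected each process right after it terminated. We also successfully detected each return code, just like in Option 1.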
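If polling is not wanted, a similar effect may be possible with wait -n, which blocks until the next background job terminates and returns its exit status. The following is only a sketch, assuming bash 4.3 or newer; run_and_report is a hypothetical helper name. Plain wait -n does not say which PID finished; bash 5.1 added wait -n -p VARNAME for that, which is not used here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start each argument as a background command, report
# every completion as it happens, and return 1 if any command failed.
# Assumes bash 4.3 or newer, where `wait -n` waits for the next job to end.
run_and_report() {
    local cmd rc remaining=0 failed=0
    for cmd in "$@"; do
        $cmd &    # unquoted on purpose so "my_sleep 5" splits into words
        remaining=$(( remaining + 1 ))
    done
    while [ "$remaining" -gt 0 ]; do
        rc=0
        wait -n || rc=$?    # blocks until any one job terminates
        remaining=$(( remaining - 1 ))
        echo "a job finished; return_code = $rc; $remaining remaining"
        [ "$rc" -eq 0 ] || failed=1
    done
    return "$failed"
}
```

With the my_sleep function from the program above, this could be called as run_and_report "my_sleep 5" "my_sleep 2" "my_sleep 3" "my_sleep 4".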

Other references:

*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added): wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process. In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code!

How to check if a process id (PID) exists (see my answer there).

Remove an element from a Bash array - note that elements in a bash array aren't actually deleted, they are just "unset". See my comments in the code above for what that means.

How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/

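I have just been modifying a script to background and parallelise a process.

I did some experimenting (on Solaris, with both bash and ksh) and discovered that 'wait' outputs the exit status if it is not zero, or, when no PID argument is provided, a list of the jobs that returned a non-zero exit status. For example: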

Bash:

$ sleep 20 && exit 2 &
$ sleep 10 && exit 1 &
$ wait
[1]-  Exit 2                  sleep 20 && exit 2
[2]+  Exit 1                  sleep 10 && exit 1

Ksh:

$ sleep 20 && exit 2 &
$ sleep 10 && exit 1 &
$ wait
[1]+  Done(2)                  sleep 20 && exit 2
[2]+  Done(1)                  sleep 10 && exit 1

This output is written to stderr, so a simple solution to the OP's example could be:

#!/bin/bash

trap "rm -f /tmp/x.$$" EXIT

for i in `seq 0 9`; do
  doCalculations $i &
done

wait 2> /tmp/x.$$
if [ `wc -l < /tmp/x.$$` -gt 0 ] ; then
  exit 1
fi

Although this:

wait 2> >(wc -l)

will also return a count, but without the tmp file. It could also be used like this, for example:

wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)

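But this is not much more useful than the tmp file. I could not find a useful way to avoid the tmp file while also avoiding running "wait" in a subshell, which will not work at all.

The following code will wait for all calculations to complete and return exit status 1 if any of the doCalculations calls fails.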

#!/bin/bash
for i in $(seq 0 9); do
   (doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qvx 0 && exit 1