如何在bash脚本中等待从该脚本派生的几个子进程完成,然后在任何子进程以code !=0结束时返回退出代码!=0?

简单的脚本:

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

上面的脚本将等待所有10个子进程,但它总是给出退出状态0(参见help wait)。我如何修改这个脚本,以便它将发现衍生子进程的退出状态,并在任何子进程以code !=0结束时返回退出代码1 ?

有没有比收集子进程的pid、按顺序等待它们并求和退出状态更好的解决方案呢?


当前回答

使用'wait -n'来等待多个子进程,并在其中任何一个进程以非零状态码退出时退出。

#!/bin/bash
wait_for_pids()
{
    for (( i = 1; i <= $#; i++ )) do
        wait -n $@
        status=$?
        echo "received status: "$status
        if [ $status -ne 0 ] && [ $status -ne 127 ]; then
            exit 1
        fi
    done
}

sleep_for_10()
{
    sleep 10
    exit 10
}

sleep_for_20()
{
    sleep 20
}

sleep_for_10 &
pid1=$!

sleep_for_20 &
pid2=$!

wait_for_pids $pid2 $pid1

状态代码'127'是不存在的进程,这意味着子进程可能已经退出。

其他回答

这是我使用的东西:

#wait for jobs
for job in `jobs -p`; do wait ${job}; done

陷阱是你的朋友。在很多系统中都可能出现ERR。您可以捕获EXIT,或在DEBUG上在每个命令之后执行一段代码。

这除了所有的标准信号。

edit

这是一次意外的登录错误的帐户,所以我没有看到示例的请求。

试试这里,用我的普通账户。

在bash脚本中处理异常

我有一个类似的情况,但有各种各样的问题与循环子shell,确保这里的其他解决方案不能工作,所以我让我的循环编写脚本,我将运行,等待结束。有效:

#!/bin/bash
echo > tmpscript.sh
for i in `seq 0 9`; do
    echo "doCalculations $i &" >> tmpscript.sh
done
echo "wait" >> tmpscript.sh
chmod u+x tmpscript.sh
./tmpscript.sh

愚蠢,但简单,并帮助调试一些事后的事情。

如果我有时间,我会更深入地了解GNU并行,但这对我自己的“doCalculations”过程来说很困难。

从Bash 5.1开始,由于引入了wait -p,有了一种很好的等待和处理多个后台作业结果的新方法:

#!/usr/bin/env bash

# Spawn background jobs
for ((i=0; i < 10; i++)); do
    secs=$((RANDOM % 10)); code=$((RANDOM % 256))
    (sleep ${secs}; exit ${code}) &
    echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done

# Wait for background jobs, print individual results, determine overall result
result=0
while true; do
    wait -n -p pid; code=$?
    [[ -z "${pid}" ]] && break
    echo "Background job ${pid} finished with code ${code}"
    (( ${code} != 0 )) && result=1
done

# Return overall result
exit ${result}

这是对@Luca Tettamanti获得最多赞的答案的扩展,以创建一个完全可运行的示例。

这个答案让我很好奇:

n_procs是什么类型的变量,它包含什么?procs是什么类型的变量,它包含什么?有人能更新这个答案,使其可运行通过添加这些变量的定义吗?我不明白是怎么回事。

...还有:

当子流程完成时,如何从子流程获得返回代码(这是这个问题的整个关键)?

总之,我算出来了,这是一个完全可运行的例子。

注:

$! is how to obtain the PID (Process ID) of the last-executed sub-process. Running any command with the & after it, like cmd &, for example, causes it to run in the background as a parallel suprocess with the main process. myarray=() is how to create an array in bash. To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.

完整的、可运行的程序:等待所有进程结束

multi_process_program.sh (from my eRCaGuy_hello_world repo):

#!/usr/bin/env bash


# This is a special sleep function which returns the number of seconds slept as
# the "error code" or return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
    seconds_to_sleep="$1"
    sleep "$seconds_to_sleep"
    return "$seconds_to_sleep"
}

# Create an array of whatever commands you want to run as subprocesses
procs=()  # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")

num_procs=${#procs[@]}  # number of processes
echo "num_procs = $num_procs"

# run commands as subprocesses and store pids in an array
pids=()  # bash array
for (( i=0; i<"$num_procs"; i++ )); do
    echo "cmd = ${procs[$i]}"
    ${procs[$i]} &  # run the cmd as a subprocess
    # store pid of last subprocess started; see:
    # https://unix.stackexchange.com/a/30371/114401
    pids+=("$!")
    echo "    pid = ${pids[$i]}"
done

# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
    wait "$pid"
    return_code="$?"
    echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."

通过运行chmod +x multi_process_program.sh将上面的文件修改为可执行文件,然后像这样运行:

time ./multi_process_program.sh 

样例输出。查看调用中的time命令的输出如何显示运行时间为5.084秒。我们还能够成功地从每个子流程检索返回代码。

eRCaGuy_hello_world/bash$ time ./multi_process_program.sh num_procs = 4 cmd = my_sleep 5 pid = 21694 cmd = my_sleep 2 pid = 21695 cmd = my_sleep 3 pid = 21697 cmd = my_sleep 4 pid = 21699 PID = 21694; return_code = 5 PID = 21695; return_code = 2 PID = 21697; return_code = 3 PID = 21699; return_code = 4 All 4 processes have ended. PID 21694 is done; return_code = 5; 3 PIDs remaining. PID 21695 is done; return_code = 2; 2 PIDs remaining. PID 21697 is done; return_code = 3; 1 PIDs remaining. PID 21699 is done; return_code = 4; 0 PIDs remaining. real 0m5.084s user 0m0.025s sys 0m0.061s

更进一步:确定每个单独进程结束的时间

如果您希望在每个进程结束时执行一些操作,而您不知道它们将在何时结束,那么可以在无限while循环中轮询以查看每个进程何时结束,然后执行您想要的任何操作。

只需注释掉上面的“OPTION 1”代码块,并将其替换为“OPTION 2”代码块:

# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
    for i in "${!pids[@]}"; do
        pid="${pids[$i]}"
        # echo "pid = $pid"  # debugging

        # See if PID is still running; see my answer here:
        # https://stackoverflow.com/a/71134379/4561887
        ps --pid "$pid" > /dev/null
        if [ "$?" -ne 0 ]; then
            # PID doesn't exist anymore, meaning it terminated

            # 1st, read its return code
            wait "$pid"
            return_code="$?"

            # 2nd, remove this PID from the `pids` array by `unset`ting the
            # element at this index; NB: due to how bash arrays work, this does
            # NOT actually remove this element from the array. Rather, it
            # removes its index from the `"${!pids[@]}"` list of indices,
            # adjusts the array count(`"${#pids[@]}"`) accordingly, and it sets
            # the value at this index to either a null value of some sort, or
            # an empty string (I'm not exactly sure).
            unset "pids[$i]"

            num_pids="${#pids[@]}"
            echo "PID $pid is done; return_code = $return_code;" \
                 "$num_pids PIDs remaining."
        fi
    done

    # exit the while loop if the `pids` array is empty
    if [ "${#pids[@]}" -eq 0 ]; then
        break
    fi

    # Do some small sleep here to keep your polling loop from sucking up
    # 100% of one of your CPUs unnecessarily. Sleeping allows other processes
    # to run during this time.
    sleep 0.1
done

完整程序的示例运行和输出,注释掉选项1,使用选项2:

eRCaGuy_hello_world / bash。美元/ multi_process_program.sh Num_procs = 4 CMD = my_sleep Pid = 22275 CMD = my_sleep Pid = 22276 CMD = my_sleep Pid = 22277 CMD = my_sleep Pid = 22280 PID 22276完成;Return_code = 2;剩余3个pid。 PID 22277完成;Return_code = 3;剩余2个pid。 PID 22280完成;Return_code = 4;剩余1个pid。 PID 22275完成;Return_code = 5;剩余的pid为0。

每一个PID XXXXX都是在进程结束后实时打印出来的!请注意,尽管sleep 5的进程(在本例中为PID 22275)是首先运行的,但它是最后完成的,并且我们在每个进程终止后成功检测到它。我们还成功地检测了每个返回代码,就像选项1一样。

其他参考资料:

*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added): wait <n> waits until the process with PID is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process. In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code! How to check if a process id (PID) exists my answer Remove an element from a Bash array - note that elements in a bash array aren't actually deleted, they are just "unset". See my comments in the code above for what that means. How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/