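How do I wait in a bash script for several subprocesses spawned from that script to finish, and then return an exit code != 0 if any of the subprocesses ended with code != 0?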

Simple script:

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

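The script above waits for all 10 spawned subprocesses, but it always gives exit status 0 (see help wait). How can I modify this script so that it discovers the exit statuses of the spawned subprocesses and returns exit code 1 when any of them ends with code != 0?

Is there a better solution than collecting the PIDs of the subprocesses, waiting for them in order, and summing the exit statuses?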


Current answer

Here is a simple example using wait.

Run some processes:

$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &

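Then wait for them with the wait command:

$ wait $(jobs -p)

Or just wait (without arguments).

This will wait for all jobs in the background to complete.

If the -n option is supplied, wait waits for the next job to terminate and returns its exit status.

See: help wait and help jobs for syntax.

However, the downside is that wait returns only the status of the last ID it was given, so you need to check the status of each subprocess and store it in a variable.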
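
The per-process bookkeeping described above could look roughly like this. It is a minimal sketch, not a definitive implementation: doCalculations is a stand-in defined only for illustration (it fails for one input), and the names pids and failed are made up for this example.

```bash
#!/bin/bash
# Hypothetical stand-in for the real work: fails only for input 3.
doCalculations() {
    [ "$1" -ne 3 ]
}

pids=()
for i in $(seq 0 9); do
    doCalculations "$i" &
    pids+=("$!")          # remember the PID of the job just started
done

failed=0
for pid in "${pids[@]}"; do
    # `wait PID` returns the exit status of that particular child
    if ! wait "$pid"; then
        failed=1
    fi
done

echo "failed = $failed"
# A real script would typically end with: exit "$failed"
```

Waiting on each PID separately is what makes every exit status visible; a single wait with several IDs would only report the last one.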

Or make your calculation function create a file on failure (empty, or containing a failure log), then check whether that file exists, e.g.

$ sleep 20 && true || touch fail &
$ sleep 20 && false || touch fail &
$ wait
$ test -f fail && echo Calculation failed.

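Other answers

Wait for all jobs and return the exit code of the last failing job. Unlike the solutions above, this does not require saving PIDs or modifying the inner loops of your script. Just put your jobs in the background, then wait.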

function wait_ex {
    # this waits for all jobs and returns the exit code of the last failing job
    ecode=0
    while true; do
        [ -z "$(jobs)" ] && break
        wait -n
        err="$?"
        [ "$err" != "0" ] && ecode="$err"
    done
    return $ecode
}

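EDIT: Fixed a bug where this could be fooled by a script that ran a command that didn't exist.

Trap is your friend. You can trap on ERR in a lot of systems. You can trap EXIT, or trap on DEBUG to run a piece of code before every simple command.

This is in addition to all the standard signals.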
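
As one possible illustration of the trap approach, the sketch below counts failing children with an ERR trap. The variable names and the exit codes of the sample jobs are assumptions made up for this example; it relies on the fact that a plain `wait PID` returning non-zero fires the ERR trap.

```bash
#!/bin/bash
# Count failures with an ERR trap (all names here are illustrative only).
failures=0
trap 'failures=$((failures + 1))' ERR

pids=()
for code in 0 3 7 0; do
    ( exit "$code" ) &    # background job that exits with the given status
    pids+=("$!")
done

for pid in "${pids[@]}"; do
    wait "$pid"           # a non-zero status here fires the ERR trap
done

trap - ERR                # remove the trap again
echo "failures = $failures"
```

Note that the ERR trap does not fire for commands tested by if, while, && or ||, so the wait call has to stand on its own line for this to work.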

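edit

I was accidentally logged in with the wrong account, so I did not see the request for examples.

Try here, on my regular account:

Handle exceptions in bash scripts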

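This is an extension of the most-upvoted answer, by @Luca Tettamanti, to make a fully runnable example.

That answer left me wondering:

What type of variable is n_procs, and what does it contain? What type of variable is procs, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how it works.

...and also:

How do you get the return code from a subprocess once it has completed (which is the whole crux of this question)?

Anyway, I figured it out, so here is a fully runnable example.

Notes:

$! is how to obtain the PID (Process ID) of the last-executed subprocess.

Running any command with the & after it, like cmd &, for example, causes it to run in the background as a parallel subprocess alongside the main process.

myarray=() is how to create an array in bash.

To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.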

Full, runnable program: wait for all processes to end

multi_process_program.sh (from my eRCaGuy_hello_world repo):

#!/usr/bin/env bash


# This is a special sleep function which returns the number of seconds slept as
# the "error code" or "return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
    seconds_to_sleep="$1"
    sleep "$seconds_to_sleep"
    return "$seconds_to_sleep"
}

# Create an array of whatever commands you want to run as subprocesses
procs=()  # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")

num_procs=${#procs[@]}  # number of processes
echo "num_procs = $num_procs"

# run commands as subprocesses and store pids in an array
pids=()  # bash array
for (( i=0; i<"$num_procs"; i++ )); do
    echo "cmd = ${procs[$i]}"
    ${procs[$i]} &  # run the cmd as a subprocess
    # store pid of last subprocess started; see:
    # https://unix.stackexchange.com/a/30371/114401
    pids+=("$!")
    echo "    pid = ${pids[$i]}"
done

# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
    wait "$pid"
    return_code="$?"
    echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."

Make the file above executable by running chmod +x multi_process_program.sh, then run it like this:

time ./multi_process_program.sh 

Sample output. Notice that the time command reports a run time of 5.084 seconds, which is roughly the duration of the longest sleep rather than the sum of all four, so the processes really did run in parallel. We were also able to successfully retrieve the return code from each subprocess.

eRCaGuy_hello_world/bash$ time ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
    pid = 21694
cmd = my_sleep 2
    pid = 21695
cmd = my_sleep 3
    pid = 21697
cmd = my_sleep 4
    pid = 21699
PID = 21694; return_code = 5
PID = 21695; return_code = 2
PID = 21697; return_code = 3
PID = 21699; return_code = 4
All 4 processes have ended.
PID 21694 is done; return_code = 5; 3 PIDs remaining.
PID 21695 is done; return_code = 2; 2 PIDs remaining.
PID 21697 is done; return_code = 3; 1 PIDs remaining.
PID 21699 is done; return_code = 4; 0 PIDs remaining.

real    0m5.084s
user    0m0.025s
sys     0m0.061s

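Going further: determine live when each individual process ends

If you'd like to do something when each process finishes, and you don't know when they will finish, you can poll in an infinite while loop to see when each process terminates, then do whatever you want.

Simply comment out the "OPTION 1" block of code above, and replace it with this "OPTION 2" block: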

# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
    for i in "${!pids[@]}"; do
        pid="${pids[$i]}"
        # echo "pid = $pid"  # debugging

        # See if PID is still running; see my answer here:
        # https://stackoverflow.com/a/71134379/4561887
        ps --pid "$pid" > /dev/null
        if [ "$?" -ne 0 ]; then
            # PID doesn't exist anymore, meaning it terminated

            # 1st, read its return code
            wait "$pid"
            return_code="$?"

            # 2nd, remove this PID from the `pids` array by `unset`ting the
            # element at this index; NB: bash arrays are sparse, so this
            # deletes only this element and does NOT renumber the others.
            # The index disappears from the `"${!pids[@]}"` list of indices
            # and the array count (`"${#pids[@]}"`) drops by one, while the
            # remaining elements keep their original indices.
            unset "pids[$i]"

            num_pids="${#pids[@]}"
            echo "PID $pid is done; return_code = $return_code;" \
                 "$num_pids PIDs remaining."
        fi
    done

    # exit the while loop if the `pids` array is empty
    if [ "${#pids[@]}" -eq 0 ]; then
        break
    fi

    # Do some small sleep here to keep your polling loop from sucking up
    # 100% of one of your CPUs unnecessarily. Sleeping allows other processes
    # to run during this time.
    sleep 0.1
done

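Sample run and output of the full program, with Option 1 commented out and Option 2 in use:

eRCaGuy_hello_world/bash$ ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
    pid = 22275
cmd = my_sleep 2
    pid = 22276
cmd = my_sleep 3
    pid = 22277
cmd = my_sleep 4
    pid = 22280
PID 22276 is done; return_code = 2; 3 PIDs remaining.
PID 22277 is done; return_code = 3; 2 PIDs remaining.
PID 22280 is done; return_code = 4; 1 PIDs remaining.
PID 22275 is done; return_code = 5; 0 PIDs remaining.

Each of those "PID XXXXX is done" lines is printed live, right after that process has terminated! Notice that even though the sleep 5 process (PID 22275 in this case) was started first, it finished last, and we successfully detected each process right after it terminated. We also successfully retrieved each return code, just like in Option 1.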
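
The ps --pid option used above comes from the Linux procps version of ps. As a hedged alternative, the same polling loop can be sketched with the kill -0 idiom, which sends no signal and only reports whether the PID still exists. The sample jobs and the counters finished and failed are assumptions made up for this sketch, and, like any PID-based check, it could in principle be confused if the system reuses a PID.

```bash
#!/usr/bin/env bash
# Sketch: poll with `kill -0` instead of `ps --pid` (illustrative names only).
pids=()
for code in 3 0 1; do
    ( sleep "0.$code"; exit "$code" ) &   # job sleeps briefly, then exits with $code
    pids+=("$!")
done

finished=0
failed=0
while [ "${#pids[@]}" -gt 0 ]; do
    for i in "${!pids[@]}"; do
        pid="${pids[$i]}"
        # `kill -0` fails once the PID no longer exists
        if ! kill -0 "$pid" 2>/dev/null; then
            wait "$pid"                   # still returns the saved exit status
            return_code="$?"
            finished=$((finished + 1))
            [ "$return_code" -ne 0 ] && failed=$((failed + 1))
            unset "pids[$i]"
            echo "PID $pid is done; return_code = $return_code"
        fi
    done
    sleep 0.05                            # avoid spinning at 100% CPU
done
echo "finished = $finished; failed = $failed"
```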

Other references:

*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added): wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process. In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code!

How to check if a process id (PID) exists - my answer

Remove an element from a Bash array - note that the remaining elements in a bash array aren't renumbered when one is removed; the element is just "unset". See my comments in the code above for what that means.

How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/

Use 'wait -n' to wait for multiple subprocesses, and exit as soon as any one of them exits with a non-zero status code.

#!/bin/bash
wait_for_pids()
{
    for (( i = 1; i <= $#; i++ )) do
        wait -n "$@"
        status=$?
        echo "received status: $status"
        if [ $status -ne 0 ] && [ $status -ne 127 ]; then
            exit 1
        fi
    done
}

sleep_for_10()
{
    sleep 10
    exit 10
}

sleep_for_20()
{
    sleep 20
}

sleep_for_10 &
pid1=$!

sleep_for_20 &
pid2=$!

wait_for_pids $pid2 $pid1

Status code '127' is for a non-existent process, which means the child may have already exited.
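
To see where 127 comes from, here is a small sketch assuming bash. It waits on a PID that cannot be a child of the current shell (the shell's own PID, $$) and compares that with a real child. The variable names are illustrative only.

```bash
#!/bin/bash
# `wait` on a PID that is not a child of this shell fails with status 127.
wait "$$" 2>/dev/null     # $$ is this shell itself, so it is not its own child
not_a_child=$?

( exit 10 ) &
wait "$!"                 # a real child: wait returns the child's own status
real_child=$?

echo "not a child: $not_a_child; real child: $real_child"
```

Under bash this should report 127 for the first wait and 10 for the second, which is why the function above treats 127 as "nothing left to wait for" rather than as a failure.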

To parallelize this...

for i in $(whatever_list) ; do
   do_something $i
done

Translate it to this...

for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
   (
   export -f do_something ## export functions (if needed)
   export PATH ## export any variables that are required
   xargs -I{} --max-procs 0 bash -c ' ## process in batches...
      {
      echo "processing {}" ## optional
      do_something {}
      }' 
   )

If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole.

Exporting functions and variables may or may not be necessary, in any particular case.

You can set --max-procs based on how much parallelism you want (0 means "all at once").

GNU Parallel offers some additional features when used in place of xargs -- but it isn't always installed by default.

The for loop isn't strictly necessary in this example since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.

Bash string handling can be confusing -- I have found that using single quotes works best for wrapping non-trivial scripts.

You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.

Here's a simplified working example...

for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
   {
   echo sleep {}
   sleep 2s
   }'
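
To tie this back to the original question, the exit status of xargs can be checked after the pipeline finishes. The sketch below is an illustration only: the input values and the xargs_status variable are made up, and each input line is simply used as the exit status of one worker. GNU xargs documents status 123 when any invocation exits with a status from 1 to 125; other xargs implementations may use a different non-zero value, so the check only tests for non-zero.

```bash
#!/bin/bash
# Each input line becomes the exit status of one parallel worker.
printf '%s\n' 0 1 0 | xargs -I{} -P 2 bash -c 'exit {}'
xargs_status=$?           # status of xargs, the last command in the pipeline

if [ "$xargs_status" -ne 0 ]; then
    echo "at least one worker failed (xargs status $xargs_status)"
fi
```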