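How can I wait in a bash script for several subprocesses spawned from that script to finish, and then return an exit code != 0 if any of the subprocesses ended with code != 0?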

Simple script:

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

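The script above waits for all 10 spawned subprocesses, but it always gives exit status 0 (see help wait). How can I modify this script so that it discovers the exit statuses of the spawned subprocesses and returns exit code 1 if any of them ended with code != 0?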

Is there a better solution than collecting the PIDs of the subprocesses, waiting for them in order, and summing the exit statuses?


Current answer

I don't believe this is possible with Bash's builtin functionality.

You can get notified when a child exits:

#!/bin/sh
set -o monitor        # enable script job control
trap 'echo "child died"' CHLD

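However, there is no apparent way to get the child's exit status inside the signal handler.

Getting that child status is usually the job of the wait family of functions in the lower-level POSIX APIs. Unfortunately, Bash's support for that is limited: you can wait for one specific child process (and get its exit status), or you can wait for all of them, in which case the result is always 0.

What appears to be impossible is the equivalent of waitpid(-1), which blocks until any child process returns. (Bash 4.3 and later provide wait -n for this, as one of the answers below uses.)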
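
That said, waiting on each child individually is enough for the case in the question. The following is a minimal sketch, not a definitive implementation: doCalculations here is a hypothetical stub that fails only for input 3, standing in for the real function.

```shell
#!/bin/bash
# Hypothetical stand-in for the real doCalculations: fails only for input 3.
doCalculations() {
    sleep 0.1
    [ "$1" -ne 3 ]
}

pids=()
for i in $(seq 0 9); do
    doCalculations "$i" &
    pids+=("$!")          # remember each child's PID as it is started
done

failed=0
for pid in "${pids[@]}"; do
    # "wait PID" returns that child's exit status, even if it has already exited
    wait "$pid" || failed=1
done

echo "failed=$failed"
```

A real script would finish with exit "$failed" so that the caller sees 1 whenever any child failed.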

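Other answers

I almost fell into the trap of using jobs -p to collect the PIDs, which does not work if a child has already exited, as the script below shows. The solution I chose was simply to call wait -n N times, where N is the number of children I have, which I happen to know for certain.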

#!/usr/bin/env bash

sleeper() {
    echo "Sleeper $1"
    sleep $2
    echo "Exiting $1"
    return $3
}

start_sleepers() {
    sleeper 1 1 0 &
    sleeper 2 2 $1 &
    sleeper 3 5 0 &
    sleeper 4 6 0 &
    sleep 4
}

echo "Using jobs"
start_sleepers 1

pids=( $(jobs -p) )

echo "PIDS: ${pids[*]}"

for pid in "${pids[@]}"; do
    wait "$pid"
    echo "Exit code $?"
done

echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"

echo "Waiting for N processes"
start_sleepers 2

for ignored in $(seq 1 4); do
    wait -n
    echo "Exit code $?"
done

Output:

Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0
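
Applied to the original question, the same idea can be reduced to a failure counter. This is a sketch under stated assumptions: it needs bash 4.3 or later for wait -n, and task is a hypothetical helper that sleeps and then returns the given status.

```shell
#!/usr/bin/env bash
# Requires bash 4.3+ (wait -n).
# Hypothetical helper: sleep $1 seconds, then return status $2.
task() {
    sleep "$1"
    return "$2"
}

task 0.2 0 &
task 0.4 3 &
task 0.6 0 &
n=3                       # number of children started above

failures=0
for ((k = 0; k < n; k++)); do
    # wait -n returns the exit status of whichever child finishes next
    wait -n || failures=$((failures + 1))
done

echo "failures=$failures"
```

Because the loop runs exactly once per child, it does not depend on jobs -p still listing children that have already exited.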

The following code waits for all calculations to complete and returns exit status 1 if any doCalculations call fails.

#!/bin/bash
for i in $(seq 0 9); do
   (doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qvx 0 && exit 1   # -x: match the whole line, so a status such as 10 is not mistaken for 0

Just store the results outside the shell, for example in a file.

#!/bin/bash
tmp=/tmp/results

: > $tmp  #clean the file

for i in `seq 0 9`; do
  (doCalculations $i; echo $i:$?>>$tmp)&
done      #iterate

wait      #wait until all ready

sort $tmp | grep -v ':0'  #... handle as required
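
A fixed path such as /tmp/results is shared by every copy of the script that runs at the same time. The variant below is a sketch that uses mktemp for a private file and counts the failures; doCalculations is again a hypothetical stub, here failing for inputs 4 and 7.

```shell
#!/bin/bash
# Hypothetical stand-in for the real doCalculations: fails for inputs 4 and 7.
doCalculations() {
    [ "$1" -ne 4 ] && [ "$1" -ne 7 ]
}

tmp=$(mktemp)             # private results file instead of a fixed path

for i in $(seq 0 9); do
    # each subshell appends one "index:status" line
    ( doCalculations "$i"; echo "$i:$?" >> "$tmp" ) &
done

wait                      # wait until every subshell has written its line

# count the lines whose status field is not exactly 0
failed=$(grep -vc ':0$' "$tmp")
rm -f "$tmp"

echo "failed tasks: $failed"
```

The pattern is anchored with $ so that only a status of exactly 0 counts as success.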

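Here is my version. It works for multiple PIDs, logs a warning if execution takes too long, and stops the subprocesses if execution takes longer than a given value.

[EDIT] I have uploaded a newer implementation of WaitForTaskCompletion, called ExecTasks, at https://github.com/deajan/ofunctions. There is also a compat layer for WaitForTaskCompletion. [/EDIT]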

function WaitForTaskCompletion {
    local pids="${1}" # pids to wait for, separated by semi-colon
    local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
    local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
    local caller_name="${4}" # Who called this function
    local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors       

    Logger "${FUNCNAME[0]} called by [$caller_name]."

    local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once 
    local log_ttime=0 # local time instance for comparison

    local seconds_begin=$SECONDS # Seconds since the beginning of the script
    local exec_time=0 # Seconds since the beginning of this function

    local retval=0 # return value of monitored pid process
    local errorcount=0 # Number of pids that finished with errors

    local pidCount # number of given pids

    IFS=';' read -a pidsArray <<< "$pids"
    pidCount=${#pidsArray[@]}

    while [ ${#pidsArray[@]} -gt 0 ]; do
        newPidsArray=()
        for pid in "${pidsArray[@]}"; do
            if kill -0 $pid > /dev/null 2>&1; then
                newPidsArray+=($pid)
            else
                wait $pid
                result=$?
                if [ $result -ne 0 ]; then
                    errorcount=$((errorcount+1))
                    Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
                fi
            fi
        done

        ## Log a standby message every hour
        exec_time=$(($SECONDS - $seconds_begin))
        if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
            if [ $log_ttime -ne $exec_time ]; then
                log_ttime=$exec_time
                Logger "Current tasks still running with pids [${pidsArray[@]}]."
            fi
        fi

        if [ $exec_time -gt $soft_max_time ]; then
            if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
                Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
                soft_alert=1
                SendAlert

            fi
            if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
                Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
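                # Stop every pid that is still running, not just the last one the loop above visited
                for pid in "${newPidsArray[@]}"; do
                    if kill -SIGTERM $pid; then
                        Logger "Task [$pid] stopped successfully"
                    else
                        errorcount=$((errorcount+1))
                    fi
                done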
            fi
        fi

        pidsArray=("${newPidsArray[@]}")
        sleep 1
    done

    Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
    if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
        Logger "Stopping execution."
        exit 1337
    else
        return $errorcount
    fi
}

# Just a plain stupid logging function to be replaced by yours
function Logger {
    local value="${1}"

    echo $value
}

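For example, wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, and stop all processes if execution takes longer than 120 seconds. Do not exit the program on failure.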

function something {

    sleep 10 &
    pids="$!"
    sleep 12 &
    pids="$pids;$!"
    sleep 9 &
    pids="$pids;$!"

    WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
something
    

I needed this, but the target process was not a child of the current shell, in which case wait $PID does not work. I did find the following alternative:

while [ -e /proc/$PID ]; do sleep 0.1 ; done

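This relies on the presence of procfs, which may not be available (Mac does not provide it, for example). So for portability, you could use this instead: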

while ps -p $PID >/dev/null ; do sleep 0.1 ; done
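
The polling loops above can be wrapped in a function with a timeout. This is only a sketch: wait_for_pid is a hypothetical helper, and it probes with kill -0 instead of ps, which assumes you are allowed to signal the target process. Like the loops above, it cannot recover the exit status, and it can be fooled if the PID is reused.

```shell
#!/bin/bash
# Hypothetical helper: poll until PID is gone.
# Usage: wait_for_pid PID [TIMEOUT_SECONDS]   (0 or omitted = no limit)
wait_for_pid() {
    local pid=$1 timeout=${2:-0} start=$SECONDS
    # kill -0 sends no signal; it only checks whether the process can be signalled
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$timeout" -gt 0 ] && [ $((SECONDS - start)) -ge "$timeout" ]; then
            return 1      # still running when the timeout expired
        fi
        sleep 0.1
    done
    return 0
}

# Demo only: a background child keeps the example self-contained.
# The function never calls wait, so it can be given unrelated PIDs as well.
sleep 0.5 &
pid=$!

wait_for_pid "$pid" 10
status=$?
echo "wait_for_pid returned $status"
```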