如何在bash脚本中等待从该脚本派生的几个子进程完成,然后在任何子进程以code !=0结束时返回退出代码!=0?

简单的脚本:

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

上面的脚本将等待所有10个子进程,但它总是给出退出状态0(参见help wait)。我如何修改这个脚本,以便它将发现衍生子进程的退出状态,并在任何子进程以code !=0结束时返回退出代码1 ?

有没有比收集子进程的pid、按顺序等待它们并求和退出状态更好的解决方案呢?


当前回答

我不相信Bash的内置功能可以做到这一点。

你可以在child退出时得到通知:

#!/bin/sh
set -o monitor        # enable script job control
trap 'echo "child died"' CHLD

然而,在信号处理程序中没有明显的方法来获取子进程的退出状态。

获得子状态通常是底层POSIX api中的等待函数族的工作。不幸的是,Bash对此的支持是有限的——您可以等待一个特定的子进程(并获得其退出状态),也可以等待所有子进程,但总是得到0结果。

似乎不可能做到的是waitpid(-1)的等效,它会阻塞直到任何子进程返回。

其他回答

我想运行doCalculations;echo $ ?”>>/tmp/acc在一个子shell中被发送到后台,然后等待,然后/tmp/acc将包含退出状态,每行一个。不过,我不知道多个进程附加到累加器文件的任何后果。

下面是这个建议的一个例子:

文件:doCalcualtions

#!/bin/sh

random -e 20
sleep $?
random -e 10

文件:

#!/bin/sh

rm /tmp/acc

for i in $( seq 0 20 ) 
do
        ( ./doCalculations "$i"; echo "$?" >>/tmp/acc ) &
done

wait

cat /tmp/acc | fmt
rm /tmp/acc

running ./try的输出

5 1 9 6 8 1 2 0 9 6 5 9 6 0 0 4 9 5 5 9 8

简单地说:

#!/bin/bash

pids=""

for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

wait $pids

...code continued here ...

更新:

正如许多评论者指出的那样,上面的代码在继续之前会等待所有进程完成,但如果其中一个进程失败了,它不会退出,也不会失败,可以通过@Bryan, @SamBrightman和其他人建议的以下修改来完成:

#!/bin/bash

pids=""
RESULT=0


for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

for pid in $pids; do
    wait $pid || let "RESULT=1"
done

if [ "$RESULT" == "1" ];
    then
       exit 1
fi

...code continued here ...

http://jeremy.zawodny.com/blog/archives/010717.html:

#!/bin/bash

FAIL=0

echo "starting"

./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &

for job in `jobs -p`
do
echo $job
    wait $job || let "FAIL+=1"
done

echo $FAIL

if [ "$FAIL" == "0" ];
then
echo "YAY!"
else
echo "FAIL! ($FAIL)"
fi

下面是我的版本,适用于多个pid,如果执行时间过长,则记录警告,如果执行时间超过给定值,则停止子进程。

[编辑]我已经在https://github.com/deajan/ofunctions上传了我的WaitForTaskCompletion的新实现,称为ExecTasks。 还有一个用于WaitForTaskCompletion的compat层 (/编辑)

function WaitForTaskCompletion {
    local pids="${1}" # pids to wait for, separated by semi-colon
    local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
    local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
    local caller_name="${4}" # Who called this function
    local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors       

    Logger "${FUNCNAME[0]} called by [$caller_name]."

    local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once 
    local log_ttime=0 # local time instance for comparaison

    local seconds_begin=$SECONDS # Seconds since the beginning of the script
    local exec_time=0 # Seconds since the beginning of this function

    local retval=0 # return value of monitored pid process
    local errorcount=0 # Number of pids that finished with errors

    local pidCount # number of given pids

    IFS=';' read -a pidsArray <<< "$pids"
    pidCount=${#pidsArray[@]}

    while [ ${#pidsArray[@]} -gt 0 ]; do
        newPidsArray=()
        for pid in "${pidsArray[@]}"; do
            if kill -0 $pid > /dev/null 2>&1; then
                newPidsArray+=($pid)
            else
                wait $pid
                result=$?
                if [ $result -ne 0 ]; then
                    errorcount=$((errorcount+1))
                    Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
                fi
            fi
        done

        ## Log a standby message every hour
        exec_time=$(($SECONDS - $seconds_begin))
        if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
            if [ $log_ttime -ne $exec_time ]; then
                log_ttime=$exec_time
                Logger "Current tasks still running with pids [${pidsArray[@]}]."
            fi
        fi

        if [ $exec_time -gt $soft_max_time ]; then
            if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
                Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
                soft_alert=1
                SendAlert

            fi
            if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
                Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
                kill -SIGTERM $pid
                if [ $? == 0 ]; then
                    Logger "Task stopped successfully"
                else
                    errrorcount=$((errorcount+1))
                fi
            fi
        fi

        pidsArray=("${newPidsArray[@]}")
        sleep 1
    done

    Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
    if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
        Logger "Stopping execution."
        exit 1337
    else
        return $errorcount
    fi
}

# Just a plain stupid logging function to be replaced by yours
function Logger {
    local value="${1}"

    echo $value
}

例如,等待所有三个进程完成,如果执行时间超过5秒,则记录警告,如果执行时间超过120秒,则停止所有进程。不要在失败时退出程序。

function something {

    sleep 10 &
    pids="$!"
    sleep 12 &
    pids="$pids;$!"
    sleep 9 &
    pids="$pids;$!"

    WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
someting
    

这里已经有很多答案了,但我很惊讶似乎没有人建议使用数组……这就是我所做的——这可能在将来对一些人有用。

n=10 # run 10 jobs
c=0
PIDS=()

while true

    my_function_or_command &
    PID=$!
    echo "Launched job as PID=$PID"
    PIDS+=($PID)

    (( c+=1 ))

    # required to prevent any exit due to error
    # caused by additional commands run which you
    # may add when modifying this example
    true

do

    if (( c < n ))
    then
        continue
    else
        break
    fi
done 


# collect launched jobs

for pid in "${PIDS[@]}"
do
    wait $pid || echo "failed job PID=$pid"
done