如何在bash脚本中等待从该脚本派生的几个子进程完成,然后在任何子进程以code !=0结束时返回退出代码!=0?
简单的脚本:
#!/bin/bash
for i in `seq 0 9`; do
doCalculations $i &
done
wait
上面的脚本将等待所有10个子进程,但它总是给出退出状态0(参见help wait)。我如何修改这个脚本,以便它将发现衍生子进程的退出状态,并在任何子进程以code !=0结束时返回退出代码1 ?
有没有比收集子进程的pid、按顺序等待它们并求和退出状态更好的解决方案呢?
正是为了这个目的,我写了一个bash函数:for。
注意::for不仅保留并返回失败函数的退出码,而且终止所有并行运行的实例。在这种情况下可能不需要。
#!/usr/bin/env bash
# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
local pids=("$@")
[ ${#pids} -eq 0 ] && return $?
trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM
for pid in "${pids[@]}"; do
wait "${pid}" || return $?
done
trap - INT RETURN TERM
}
# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
local f="${1}" && shift
local i=0
local pids=()
for arg in "$@"; do
( ${f} "${arg}" ) &
pids+=("$!")
if [ ! -z ${FOR_PARALLEL+x} ]; then
(( i=(i+1)%${FOR_PARALLEL} ))
if (( i==0 )) ;then
:wait "${pids[@]}" || return $?
pids=()
fi
fi
done && [ ${#pids} -eq 0 ] || :wait "${pids[@]}" || return $?
}
使用
for.sh:
#!/usr/bin/env bash
set -e
# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)
msg="You should see this three times"
:(){
i="${1}" && shift
echo "${msg}"
sleep 1
if [ "$i" == "1" ]; then sleep 1
elif [ "$i" == "2" ]; then false
elif [ "$i" == "3" ]; then
sleep 3
echo "You should never see this"
fi
} && :for : 1 2 3 || exit $?
echo "You should never see this"
$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1
参考文献
[1]:博客
[2]:要点
下面是我的版本,适用于多个pid,如果执行时间过长,则记录警告,如果执行时间超过给定值,则停止子进程。
[编辑]我已经在https://github.com/deajan/ofunctions上传了我的WaitForTaskCompletion的新实现,称为ExecTasks。
还有一个用于WaitForTaskCompletion的compat层
(/编辑)
function WaitForTaskCompletion {
local pids="${1}" # pids to wait for, separated by semi-colon
local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
local caller_name="${4}" # Who called this function
local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors
Logger "${FUNCNAME[0]} called by [$caller_name]."
local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once
local log_ttime=0 # local time instance for comparaison
local seconds_begin=$SECONDS # Seconds since the beginning of the script
local exec_time=0 # Seconds since the beginning of this function
local retval=0 # return value of monitored pid process
local errorcount=0 # Number of pids that finished with errors
local pidCount # number of given pids
IFS=';' read -a pidsArray <<< "$pids"
pidCount=${#pidsArray[@]}
while [ ${#pidsArray[@]} -gt 0 ]; do
newPidsArray=()
for pid in "${pidsArray[@]}"; do
if kill -0 $pid > /dev/null 2>&1; then
newPidsArray+=($pid)
else
wait $pid
result=$?
if [ $result -ne 0 ]; then
errorcount=$((errorcount+1))
Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
fi
fi
done
## Log a standby message every hour
exec_time=$(($SECONDS - $seconds_begin))
if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
if [ $log_ttime -ne $exec_time ]; then
log_ttime=$exec_time
Logger "Current tasks still running with pids [${pidsArray[@]}]."
fi
fi
if [ $exec_time -gt $soft_max_time ]; then
if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
soft_alert=1
SendAlert
fi
if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
kill -SIGTERM $pid
if [ $? == 0 ]; then
Logger "Task stopped successfully"
else
errrorcount=$((errorcount+1))
fi
fi
fi
pidsArray=("${newPidsArray[@]}")
sleep 1
done
Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
Logger "Stopping execution."
exit 1337
else
return $errorcount
fi
}
# Just a plain stupid logging function to be replaced by yours
function Logger {
local value="${1}"
echo $value
}
例如,等待所有三个进程完成,如果执行时间超过5秒,则记录警告,如果执行时间超过120秒,则停止所有进程。不要在失败时退出程序。
function something {
sleep 10 &
pids="$!"
sleep 12 &
pids="$pids;$!"
sleep 9 &
pids="$pids;$!"
WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
someting
这里有一个使用等待的简单例子。
运行一些进程:
$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &
然后用wait命令等待他们:
$ wait < <(jobs -p)
或者只是等待(没有争论)。
这将等待后台的所有作业完成。
如果提供了-n选项,则等待下一个作业终止并返回其退出状态。
参见:帮助等待和帮助作业语法。
然而,缺点是这将只返回最后一个ID的状态,因此您需要检查每个子流程的状态并将其存储在变量中。
或者让你的计算函数在失败时创建一些文件(空的或有失败日志的),然后检查该文件是否存在。
$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait < <(jobs -p)
$ test -f fail && echo Calculation failed.