如何在bash脚本中等待从该脚本派生的几个子进程完成,然后在任何子进程以code !=0结束时返回退出代码!=0?

简单的脚本:

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

上面的脚本将等待所有10个子进程,但它总是给出退出状态0(参见help wait)。我如何修改这个脚本,以便它将发现衍生子进程的退出状态,并在任何子进程以code !=0结束时返回退出代码1 ?

有没有比收集子进程的pid、按顺序等待它们并求和退出状态更好的解决方案呢?


wait还(可选地)接受要等待的进程的PID,并且使用$!你会得到最后一个命令的PID在后台启动。 修改循环,将每个衍生子进程的PID存储到一个数组中,然后再次循环等待每个PID。

# run processes and store pids in array
for i in $n_procs; do
    ./procs[${i}] &
    pids[${i}]=$!
done

# wait for all pids
for pid in ${pids[*]}; do
    wait $pid
done

我不相信Bash的内置功能可以做到这一点。

你可以在child退出时得到通知:

#!/bin/sh
set -o monitor        # enable script job control
trap 'echo "child died"' CHLD

然而,在信号处理程序中没有明显的方法来获取子进程的退出状态。

获得子状态通常是底层POSIX api中的等待函数族的工作。不幸的是,Bash对此的支持是有限的——您可以等待一个特定的子进程(并获得其退出状态),也可以等待所有子进程,但总是得到0结果。

似乎不可能做到的是waitpid(-1)的等效,它会阻塞直到任何子进程返回。


我想运行doCalculations;echo $ ?”>>/tmp/acc在一个子shell中被发送到后台,然后等待,然后/tmp/acc将包含退出状态,每行一个。不过,我不知道多个进程附加到累加器文件的任何后果。

下面是这个建议的一个例子:

文件:doCalcualtions

#!/bin/sh

random -e 20
sleep $?
random -e 10

文件:

#!/bin/sh

rm /tmp/acc

for i in $( seq 0 20 ) 
do
        ( ./doCalculations "$i"; echo "$?" >>/tmp/acc ) &
done

wait

cat /tmp/acc | fmt
rm /tmp/acc

running ./try的输出

5 1 9 6 8 1 2 0 9 6 5 9 6 0 0 4 9 5 5 9 8

http://jeremy.zawodny.com/blog/archives/010717.html:

#!/bin/bash

FAIL=0

echo "starting"

./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &

for job in `jobs -p`
do
echo $job
    wait $job || let "FAIL+=1"
done

echo $FAIL

if [ "$FAIL" == "0" ];
then
echo "YAY!"
else
echo "FAIL! ($FAIL)"
fi

这是我目前想到的。我想了解如何在子进程终止时中断sleep命令,这样就不必根据使用情况调优WAITALL_DELAY。

waitall() { # PID...
  ## Wait for children to exit and indicate whether all exited with 0 status.
  local errors=0
  while :; do
    debug "Processes remaining: $*"
    for pid in "$@"; do
      shift
      if kill -0 "$pid" 2>/dev/null; then
        debug "$pid is still alive."
        set -- "$@" "$pid"
      elif wait "$pid"; then
        debug "$pid exited with zero exit status."
      else
        debug "$pid exited with non-zero exit status."
        ((++errors))
      fi
    done
    (("$#" > 0)) || break
    # TODO: how to interrupt this sleep when a child terminates?
    sleep ${WAITALL_DELAY:-1}
   done
  ((errors == 0))
}

debug() { echo "DEBUG: $*" >&2; }

pids=""
for t in 3 5 4; do 
  sleep "$t" &
  pids="$pids $!"
done
waitall $pids

下面的代码将等待所有计算的完成,并在任何doccalculation失败时返回退出状态1。

#!/bin/bash
for i in $(seq 0 9); do
   (doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qv 0 && exit 1

如果你安装了GNU Parallel,你可以这样做:

# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}

GNU Parallel会给你退出代码:

0 -所有作业运行无错误。 1-253 -部分作业失败。退出状态给出了失败作业的数量 254—超过253个作业失败。 255 -其他错误。

观看介绍视频了解更多信息:http://pi.dk/1


只需将结果存储在shell之外,例如在一个文件中。

#!/bin/bash
tmp=/tmp/results

: > $tmp  #clean the file

for i in `seq 0 9`; do
  (doCalculations $i; echo $i:$?>>$tmp)&
done      #iterate

wait      #wait until all ready

sort $tmp | grep -v ':0'  #... handle as required

为了将此并行化…

for i in $(whatever_list) ; do
   do_something $i
done

翻译成这样…

for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
   (
   export -f do_something ## export functions (if needed)
   export PATH ## export any variables that are required
   xargs -I{} --max-procs 0 bash -c ' ## process in batches...
      {
      echo "processing {}" ## optional
      do_something {}
      }' 
   )

If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole. Exporting functions and variables may or may not be necessary, in any particular case. You can set --max-procs based on how much parallelism you want (0 means "all at once"). GNU Parallel offers some additional features when used in place of xargs -- but it isn't always installed by default. The for loop isn't strictly necessary in this example since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on. Bash string handling can be confusing -- I have found that using single quotes works best for wrapping non-trivial scripts. You can easily interrupt the entire operation (using ^C or similar), unlike the the more direct approach to Bash parallelism.

下面是一个简化的工作示例……

for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
   {
   echo sleep {}
   sleep 2s
   }'

如果您有bash 4.2或更高版本可用,以下内容可能对您有用。它使用关联数组存储任务名称及其“代码”,以及任务名称及其pid。我还内置了一个简单的速率限制方法,如果你的任务消耗了大量CPU或I/O时间,你想限制并发任务的数量,这个方法可能会派上用场。

脚本在第一个循环中启动所有任务,在第二个循环中使用结果。

对于简单的情况,这有点过分,但它允许非常简洁的东西。例如,可以将每个任务的错误消息存储在另一个关联数组中,并在一切都解决后打印它们。

#! /bin/bash

main () {
    local -A pids=()
    local -A tasks=([task1]="echo 1"
                    [task2]="echo 2"
                    [task3]="echo 3"
                    [task4]="false"
                    [task5]="echo 5"
                    [task6]="false")
    local max_concurrent_tasks=2

    for key in "${!tasks[@]}"; do
        while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
            sleep 1 # gnu sleep allows floating point here...
        done
        ${tasks[$key]} &
        pids+=(["$key"]="$!")
    done

    errors=0
    for key in "${!tasks[@]}"; do
        pid=${pids[$key]}
        local cur_ret=0
        if [ -z "$pid" ]; then
            echo "No Job ID known for the $key process" # should never happen
            cur_ret=1
        else
            wait $pid
            cur_ret=$?
        fi
        if [ "$cur_ret" -ne 0 ]; then
            errors=$(($errors + 1))
            echo "$key (${tasks[$key]}) failed."
        fi
    done

    return $errors
}

main

我最近用过这个(感谢Alnitak):

#!/bin/bash
# activate child monitoring
set -o monitor

# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!

# count, and kill when all done
c=0
function kill_on_count() {
    # you could kill on whatever criterion you wish for
    # I just counted to simulate bash's wait with no args
    [ $c -eq 9 ] && kill $pid
    c=$((c+1))
    echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD

function save_status() {
    local i=$1;
    local rc=$2;
    # do whatever, and here you know which one stopped
    # but remember, you're called from a subshell
    # so vars have their values at fork time
}

# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
    (doCalculations $i; save_status $i $?) &
done

# wait for locking subprocess to be killed
wait $pid
echo

从这里,我们可以很容易地推断,并拥有一个触发器(触摸文件,发送信号)并改变计数标准(计数触摸的文件,或其他)以响应该触发器。或者如果你只是想要'any'非零rc,只需从save_status中杀死锁。


我需要这个,但目标进程不是当前shell的子进程,在这种情况下,等待$PID不起作用。我确实找到了以下替代方案:

while [ -e /proc/$PID ]; do sleep 0.1 ; done

这依赖于procfs的存在,它可能不可用(例如Mac不提供它)。所以对于可移植性,你可以用这个代替:

while ps -p $PID >/dev/null ; do sleep 0.1 ; done

#!/bin/bash
set -m
for i in `seq 0 9`; do
  doCalculations $i &
done
while fg; do true; done

Set -m允许您在脚本中使用fg和bg Fg除了将最后一个进程放在前台之外,它的退出状态与它所前台的进程相同 而当任何fg以非零退出状态退出时,fg将停止循环

不幸的是,当后台进程以非零退出状态退出时,这将无法处理这种情况。(循环不会立即终止。它将等待前面的进程完成。)


我已经尝试过了,并结合了其他例子中最好的部分。该脚本将在任何后台进程退出时执行checkpid函数,并输出退出状态而不诉诸轮询。

#!/bin/bash

set -o monitor

sleep 2 &
sleep 4 && exit 1 &
sleep 6 &

pids=`jobs -p`

checkpids() {
    for pid in $pids; do
        if kill -0 $pid 2>/dev/null; then
            echo $pid is still alive.
        elif wait $pid; then
            echo $pid exited with zero exit status.
        else
            echo $pid exited with non-zero exit status.
        fi
    done
    echo
}

trap checkpids CHLD

wait

简单地说:

#!/bin/bash

pids=""

for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

wait $pids

...code continued here ...

更新:

正如许多评论者指出的那样,上面的代码在继续之前会等待所有进程完成,但如果其中一个进程失败了,它不会退出,也不会失败,可以通过@Bryan, @SamBrightman和其他人建议的以下修改来完成:

#!/bin/bash

pids=""
RESULT=0


for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

for pid in $pids; do
    wait $pid || let "RESULT=1"
done

if [ "$RESULT" == "1" ];
    then
       exit 1
fi

...code continued here ...

我刚刚修改了一个脚本到后台和并行化的过程。

我做了一些实验(在Solaris上使用bash和ksh),发现如果退出状态不为零,'wait'将输出退出状态,或者当没有提供PID参数时,将输出一个返回非零退出的作业列表。如。

Bash:

$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]-  Exit 2                  sleep 20 && exit 2
[2]+  Exit 1                  sleep 10 && exit 1

Ksh:

$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]+  Done(2)                  sleep 20 && exit 2
[2]+  Done(1)                  sleep 10 && exit 1

这个输出被写入stderr,所以OPs示例的简单解决方案可以是:

#!/bin/bash

trap "rm -f /tmp/x.$$" EXIT

for i in `seq 0 9`; do
  doCalculations $i &
done

wait 2> /tmp/x.$$
if [ `wc -l /tmp/x.$$` -gt 0 ] ; then
  exit 1
fi

虽然这:

wait 2> >(wc -l)

也将返回一个计数,但不包含TMP文件。这也可以这样使用,例如:

wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)

但是这并不比tmp文件有用多少。我找不到一种有效的方法来避免tmp文件,同时也避免在子shell中运行“等待”,这根本不会起作用。


捕获CHLD信号可能不起作用,因为如果它们同时到达,您可能会丢失一些信号。

#!/bin/bash

trap 'rm -f $tmpfile' EXIT

tmpfile=$(mktemp)

doCalculations() {
    echo start job $i...
    sleep $((RANDOM % 5)) 
    echo ...end job $i
    exit $((RANDOM % 10))
}

number_of_jobs=10

for i in $( seq 1 $number_of_jobs )
do
    ( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done

wait 

i=0
while read res; do
    echo "$res"
    let i++
done < "$tmpfile"

echo $i jobs done !!!

陷阱是你的朋友。在很多系统中都可能出现ERR。您可以捕获EXIT,或在DEBUG上在每个命令之后执行一段代码。

这除了所有的标准信号。

edit

这是一次意外的登录错误的帐户,所以我没有看到示例的请求。

试试这里,用我的普通账户。

在bash脚本中处理异常


我看到这里列出了很多很好的例子,我也想把我的举出来。

#! /bin/bash

items="1 2 3 4 5 6"
pids=""

for item in $items; do
    sleep $item &
    pids+="$! "
done

for pid in $pids; do
    wait $pid
    if [ $? -eq 0 ]; then
        echo "SUCCESS - Job $pid exited with a status of $?"
    else
        echo "FAILED - Job $pid exited with a status of $?"
    fi
done

我使用非常类似的方法并行启动/停止服务器/服务,并检查每个退出状态。对我来说很好。希望这能帮助到一些人!


这里有一个使用等待的简单例子。

运行一些进程:

$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &

然后用wait命令等待他们:

$ wait < <(jobs -p)

或者只是等待(没有争论)。

这将等待后台的所有作业完成。

如果提供了-n选项,则等待下一个作业终止并返回其退出状态。

参见:帮助等待和帮助作业语法。

然而,缺点是这将只返回最后一个ID的状态,因此您需要检查每个子流程的状态并将其存储在变量中。

或者让你的计算函数在失败时创建一些文件(空的或有失败日志的),然后检查该文件是否存在。

$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait < <(jobs -p)
$ test -f fail && echo Calculation failed.

set -e
fail () {
    touch .failure
}
expect () {
    wait
    if [ -f .failure ]; then
        rm -f .failure
        exit 1
    fi
}

sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect

顶部的set -e使脚本在失败时停止。

如果任何子作业失败,Expect将返回1。


下面是我的版本,适用于多个pid,如果执行时间过长,则记录警告,如果执行时间超过给定值,则停止子进程。

[编辑]我已经在https://github.com/deajan/ofunctions上传了我的WaitForTaskCompletion的新实现,称为ExecTasks。 还有一个用于WaitForTaskCompletion的compat层 (/编辑)

function WaitForTaskCompletion {
    local pids="${1}" # pids to wait for, separated by semi-colon
    local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
    local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
    local caller_name="${4}" # Who called this function
    local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors       

    Logger "${FUNCNAME[0]} called by [$caller_name]."

    local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once 
    local log_ttime=0 # local time instance for comparaison

    local seconds_begin=$SECONDS # Seconds since the beginning of the script
    local exec_time=0 # Seconds since the beginning of this function

    local retval=0 # return value of monitored pid process
    local errorcount=0 # Number of pids that finished with errors

    local pidCount # number of given pids

    IFS=';' read -a pidsArray <<< "$pids"
    pidCount=${#pidsArray[@]}

    while [ ${#pidsArray[@]} -gt 0 ]; do
        newPidsArray=()
        for pid in "${pidsArray[@]}"; do
            if kill -0 $pid > /dev/null 2>&1; then
                newPidsArray+=($pid)
            else
                wait $pid
                result=$?
                if [ $result -ne 0 ]; then
                    errorcount=$((errorcount+1))
                    Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
                fi
            fi
        done

        ## Log a standby message every hour
        exec_time=$(($SECONDS - $seconds_begin))
        if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
            if [ $log_ttime -ne $exec_time ]; then
                log_ttime=$exec_time
                Logger "Current tasks still running with pids [${pidsArray[@]}]."
            fi
        fi

        if [ $exec_time -gt $soft_max_time ]; then
            if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
                Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
                soft_alert=1
                SendAlert

            fi
            if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
                Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
                kill -SIGTERM $pid
                if [ $? == 0 ]; then
                    Logger "Task stopped successfully"
                else
                    errrorcount=$((errorcount+1))
                fi
            fi
        fi

        pidsArray=("${newPidsArray[@]}")
        sleep 1
    done

    Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
    if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
        Logger "Stopping execution."
        exit 1337
    else
        return $errorcount
    fi
}

# Just a plain stupid logging function to be replaced by yours
function Logger {
    local value="${1}"

    echo $value
}

例如,等待所有三个进程完成,如果执行时间超过5秒,则记录警告,如果执行时间超过120秒,则停止所有进程。不要在失败时退出程序。

function something {

    sleep 10 &
    pids="$!"
    sleep 12 &
    pids="$pids;$!"
    sleep 9 &
    pids="$pids;$!"

    WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
someting
    

这里已经有很多答案了,但我很惊讶似乎没有人建议使用数组……这就是我所做的——这可能在将来对一些人有用。

n=10 # run 10 jobs
c=0
PIDS=()

while true

    my_function_or_command &
    PID=$!
    echo "Launched job as PID=$PID"
    PIDS+=($PID)

    (( c+=1 ))

    # required to prevent any exit due to error
    # caused by additional commands run which you
    # may add when modifying this example
    true

do

    if (( c < n ))
    then
        continue
    else
        break
    fi
done 


# collect launched jobs

for pid in "${PIDS[@]}"
do
    wait $pid || echo "failed job PID=$pid"
done

这是有效的,应该是一个很好的,如果不是更好的@HoverHell的答案!

#!/usr/bin/env bash

set -m # allow for job control
EXIT_CODE=0;  # exit code of overall script

function foo() {
     echo "CHLD exit code is $1"
     echo "CHLD pid is $2"
     echo $(jobs -l)

     for job in `jobs -p`; do
         echo "PID => ${job}"
         wait ${job} ||  echo "At least one test failed with exit code => $?" ; EXIT_CODE=1
     done
}

trap 'foo $? $$' CHLD

DIRN=$(dirname "$0");

commands=(
    "{ echo "foo" && exit 4; }"
    "{ echo "bar" && exit 3; }"
    "{ echo "baz" && exit 5; }"
)

clen=`expr "${#commands[@]}" - 1` # get length of commands - 1

for i in `seq 0 "$clen"`; do
    (echo "${commands[$i]}" | bash) &   # run the command via bash in subshell
    echo "$i ith command has been issued as a background job"
done

# wait for all to finish
wait;

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"

# end

当然,我已经在一个NPM项目中保存了这个脚本,它允许你并行运行bash命令,对测试很有用:

https://github.com/ORESoftware/generic-subshell


在等待流程之前,流程可能已经完成。如果我们触发等待一个已经完成的进程,它将触发一个错误,比如pid不是这个shell的子进程。为了避免这种情况,可以使用以下函数来查找过程是否完成:

isProcessComplete(){
PID=$1
while [ -e /proc/$PID ]
do
    echo "Process: $PID is still running"
    sleep 5
done
echo "Process $PID has finished"
}

我认为并行运行作业并检查状态的最直接方法是使用临时文件。已经有几个类似的答案(例如nietzsche -jou和mug896)。

#!/bin/bash
rm -f fail
for i in `seq 0 9`; do
  doCalculations $i || touch fail &
done
wait 
! [ -f fail ]

上面的代码不是线程安全的。如果你担心上面的代码会同时运行,最好使用一个更独特的文件名,比如fail.$$。最后一行是满足需求:“当任何子进程以code !=0结束时,返回退出代码1 ?”我又加了一条要求,要清理干净。这样写可能会更清楚:

#!/bin/bash
trap 'rm -f fail.$$' EXIT
for i in `seq 0 9`; do
  doCalculations $i || touch fail.$$ &
done
wait 
! [ -f fail.$$ ] 

下面是一个类似的代码片段,用于从多个作业收集结果:我创建一个临时目录,在一个单独的文件中描述所有子任务的输出,然后转储它们以供查看。这和问题不太匹配——我把它作为奖励扔进去:

#!/bin/bash
trap 'rm -fr $WORK' EXIT

WORK=/tmp/$$.work
mkdir -p $WORK
cd $WORK

for i in `seq 0 9`; do
  doCalculations $i >$i.result &
done
wait 
grep $ *  # display the results with filenames and contents

使用'wait -n'来等待多个子进程,并在其中任何一个进程以非零状态码退出时退出。

#!/bin/bash
wait_for_pids()
{
    for (( i = 1; i <= $#; i++ )) do
        wait -n $@
        status=$?
        echo "received status: "$status
        if [ $status -ne 0 ] && [ $status -ne 127 ]; then
            exit 1
        fi
    done
}

sleep_for_10()
{
    sleep 10
    exit 10
}

sleep_for_20()
{
    sleep 20
}

sleep_for_10 &
pid1=$!

sleep_for_20 &
pid2=$!

wait_for_pids $pid2 $pid1

状态代码'127'是不存在的进程,这意味着子进程可能已经退出。


这是我使用的东西:

#wait for jobs
for job in `jobs -p`; do wait ${job}; done

正是为了这个目的,我写了一个bash函数:for。

注意::for不仅保留并返回失败函数的退出码,而且终止所有并行运行的实例。在这种情况下可能不需要。

#!/usr/bin/env bash

# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
    local pids=("$@")
    [ ${#pids} -eq 0 ] && return $?

    trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
    trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM

    for pid in "${pids[@]}"; do
        wait "${pid}" || return $?
    done

    trap - INT RETURN TERM
}

# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
    local f="${1}" && shift

    local i=0
    local pids=()
    for arg in "$@"; do
        ( ${f} "${arg}" ) &
        pids+=("$!")
        if [ ! -z ${FOR_PARALLEL+x} ]; then
            (( i=(i+1)%${FOR_PARALLEL} ))
            if (( i==0 )) ;then
                :wait "${pids[@]}" || return $?
                pids=()
            fi
        fi
    done && [ ${#pids} -eq 0 ] || :wait "${pids[@]}" || return $?
}

使用

for.sh:

#!/usr/bin/env bash
set -e

# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)

msg="You should see this three times"

:(){
  i="${1}" && shift

  echo "${msg}"

  sleep 1
  if   [ "$i" == "1" ]; then sleep 1
  elif [ "$i" == "2" ]; then false
  elif [ "$i" == "3" ]; then
    sleep 3
    echo "You should never see this"
  fi
} && :for : 1 2 3 || exit $?

echo "You should never see this"
$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1

参考文献

[1]:博客 [2]:要点


等待所有作业并返回最后一个失败作业的退出码。与上面的解决方案不同,这不需要保存pid,也不需要修改脚本的内部循环。走开,等着吧。

function wait_ex {
    # this waits for all jobs and returns the exit code of the last failing job
    ecode=0
    while true; do
        [ -z "$(jobs)" ] && break
        wait -n
        err="$?"
        [ "$err" != "0" ] && ecode="$err"
    done
    return $ecode
}

编辑:修正了脚本运行不存在的命令时可能被愚弄的错误。


我几乎陷入了使用jobs -p来收集pid的陷阱,如果子进程已经退出,这将不起作用,如下面的脚本所示。我选择的解决方案是简单地调用-n N次,其中N是我有孩子的数量,这是我确定知道的。

#!/usr/bin/env bash

sleeper() {
    echo "Sleeper $1"
    sleep $2
    echo "Exiting $1"
    return $3
}

start_sleepers() {
    sleeper 1 1 0 &
    sleeper 2 2 $1 &
    sleeper 3 5 0 &
    sleeper 4 6 0 &
    sleep 4
}

echo "Using jobs"
start_sleepers 1

pids=( $(jobs -p) )

echo "PIDS: ${pids[*]}"

for pid in "${pids[@]}"; do
    wait "$pid"
    echo "Exit code $?"
done

echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"

echo "Waiting for N processes"
start_sleepers 2

for ignored in $(seq 1 4); do
    wait -n
    echo "Exit code $?"
done

输出:

Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0

我有一个类似的情况,但有各种各样的问题与循环子shell,确保这里的其他解决方案不能工作,所以我让我的循环编写脚本,我将运行,等待结束。有效:

#!/bin/bash
echo > tmpscript.sh
for i in `seq 0 9`; do
    echo "doCalculations $i &" >> tmpscript.sh
done
echo "wait" >> tmpscript.sh
chmod u+x tmpscript.sh
./tmpscript.sh

愚蠢,但简单,并帮助调试一些事后的事情。

如果我有时间,我会更深入地了解GNU并行,但这对我自己的“doCalculations”过程来说很困难。


从Bash 5.1开始,由于引入了wait -p,有了一种很好的等待和处理多个后台作业结果的新方法:

#!/usr/bin/env bash

# Spawn background jobs
for ((i=0; i < 10; i++)); do
    secs=$((RANDOM % 10)); code=$((RANDOM % 256))
    (sleep ${secs}; exit ${code}) &
    echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done

# Wait for background jobs, print individual results, determine overall result
result=0
while true; do
    wait -n -p pid; code=$?
    [[ -z "${pid}" ]] && break
    echo "Background job ${pid} finished with code ${code}"
    (( ${code} != 0 )) && result=1
done

# Return overall result
exit ${result}

这是对@Luca Tettamanti获得最多赞的答案的扩展,以创建一个完全可运行的示例。

这个答案让我很好奇:

n_procs是什么类型的变量,它包含什么?procs是什么类型的变量,它包含什么?有人能更新这个答案,使其可运行通过添加这些变量的定义吗?我不明白是怎么回事。

...还有:

当子流程完成时,如何从子流程获得返回代码(这是这个问题的整个关键)?

总之,我算出来了,这是一个完全可运行的例子。

注:

$! is how to obtain the PID (Process ID) of the last-executed sub-process. Running any command with the & after it, like cmd &, for example, causes it to run in the background as a parallel suprocess with the main process. myarray=() is how to create an array in bash. To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.

完整的、可运行的程序:等待所有进程结束

multi_process_program.sh (from my eRCaGuy_hello_world repo):

#!/usr/bin/env bash


# This is a special sleep function which returns the number of seconds slept as
# the "error code" or return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
    seconds_to_sleep="$1"
    sleep "$seconds_to_sleep"
    return "$seconds_to_sleep"
}

# Create an array of whatever commands you want to run as subprocesses
procs=()  # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")

num_procs=${#procs[@]}  # number of processes
echo "num_procs = $num_procs"

# run commands as subprocesses and store pids in an array
pids=()  # bash array
for (( i=0; i<"$num_procs"; i++ )); do
    echo "cmd = ${procs[$i]}"
    ${procs[$i]} &  # run the cmd as a subprocess
    # store pid of last subprocess started; see:
    # https://unix.stackexchange.com/a/30371/114401
    pids+=("$!")
    echo "    pid = ${pids[$i]}"
done

# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
    wait "$pid"
    return_code="$?"
    echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."

通过运行chmod +x multi_process_program.sh将上面的文件修改为可执行文件,然后像这样运行:

time ./multi_process_program.sh 

样例输出。查看调用中的time命令的输出如何显示运行时间为5.084秒。我们还能够成功地从每个子流程检索返回代码。

eRCaGuy_hello_world/bash$ time ./multi_process_program.sh num_procs = 4 cmd = my_sleep 5 pid = 21694 cmd = my_sleep 2 pid = 21695 cmd = my_sleep 3 pid = 21697 cmd = my_sleep 4 pid = 21699 PID = 21694; return_code = 5 PID = 21695; return_code = 2 PID = 21697; return_code = 3 PID = 21699; return_code = 4 All 4 processes have ended. PID 21694 is done; return_code = 5; 3 PIDs remaining. PID 21695 is done; return_code = 2; 2 PIDs remaining. PID 21697 is done; return_code = 3; 1 PIDs remaining. PID 21699 is done; return_code = 4; 0 PIDs remaining. real 0m5.084s user 0m0.025s sys 0m0.061s

更进一步:确定每个单独进程结束的时间

如果您希望在每个进程结束时执行一些操作,而您不知道它们将在何时结束,那么可以在无限while循环中轮询以查看每个进程何时结束,然后执行您想要的任何操作。

只需注释掉上面的“OPTION 1”代码块,并将其替换为“OPTION 2”代码块:

# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
    for i in "${!pids[@]}"; do
        pid="${pids[$i]}"
        # echo "pid = $pid"  # debugging

        # See if PID is still running; see my answer here:
        # https://stackoverflow.com/a/71134379/4561887
        ps --pid "$pid" > /dev/null
        if [ "$?" -ne 0 ]; then
            # PID doesn't exist anymore, meaning it terminated

            # 1st, read its return code
            wait "$pid"
            return_code="$?"

            # 2nd, remove this PID from the `pids` array by `unset`ting the
            # element at this index; NB: due to how bash arrays work, this does
            # NOT actually remove this element from the array. Rather, it
            # removes its index from the `"${!pids[@]}"` list of indices,
            # adjusts the array count(`"${#pids[@]}"`) accordingly, and it sets
            # the value at this index to either a null value of some sort, or
            # an empty string (I'm not exactly sure).
            unset "pids[$i]"

            num_pids="${#pids[@]}"
            echo "PID $pid is done; return_code = $return_code;" \
                 "$num_pids PIDs remaining."
        fi
    done

    # exit the while loop if the `pids` array is empty
    if [ "${#pids[@]}" -eq 0 ]; then
        break
    fi

    # Do some small sleep here to keep your polling loop from sucking up
    # 100% of one of your CPUs unnecessarily. Sleeping allows other processes
    # to run during this time.
    sleep 0.1
done

完整程序的示例运行和输出,注释掉选项1,使用选项2:

eRCaGuy_hello_world / bash。美元/ multi_process_program.sh Num_procs = 4 CMD = my_sleep Pid = 22275 CMD = my_sleep Pid = 22276 CMD = my_sleep Pid = 22277 CMD = my_sleep Pid = 22280 PID 22276完成;Return_code = 2;剩余3个pid。 PID 22277完成;Return_code = 3;剩余2个pid。 PID 22280完成;Return_code = 4;剩余1个pid。 PID 22275完成;Return_code = 5;剩余的pid为0。

每一个PID XXXXX都是在进程结束后实时打印出来的!请注意,尽管sleep 5的进程(在本例中为PID 22275)是首先运行的,但它是最后完成的,并且我们在每个进程终止后成功检测到它。我们还成功地检测了每个返回代码,就像选项1一样。

其他参考资料:

*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added): wait <n> waits until the process with PID is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process. In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code! How to check if a process id (PID) exists my answer Remove an element from a Bash array - note that elements in a bash array aren't actually deleted, they are just "unset". See my comments in the code above for what that means. How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/


我真的很喜欢卢卡的回答,但需要它为zsh,所以这里是作为参考:

pids=()

# run processes and store pids in array
for i in $n_procs; do
    ./procs[${i}] &
    pids+=($!)
done

# wait for all pids
for pid in ${pids[*]}; do
    wait $pid
done```