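How can I wait in a bash script for several subprocesses spawned from that script to finish, and then return an exit code != 0 when any of the subprocesses ends with code != 0?

Simple script: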

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

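The script above waits for all 10 spawned subprocesses, but it always gives exit status 0 (see help wait). How can I modify this script so that it discovers the exit statuses of the spawned subprocesses and returns exit code 1 when any of them ends with code != 0?

Is there a better solution than collecting the PIDs of the subprocesses, waiting for them in order, and summing the exit statuses?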


Current answer

I've used this recently (thanks to Alnitak):

#!/bin/bash
# activate child monitoring
set -o monitor

# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!

# count, and kill when all done
c=0
function kill_on_count() {
    # you could kill on whatever criterion you wish for
    # I just counted to simulate bash's wait with no args
    [ $c -eq 9 ] && kill $pid
    c=$((c+1))
    echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD

function save_status() {
    local i=$1;
    local rc=$2;
    # do whatever, and here you know which one stopped
    # but remember, you're called from a subshell
    # so vars have their values at fork time
}

# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
    (doCalculations $i; save_status $i $?) &
done

# wait for locking subprocess to be killed
wait $pid
echo

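From there one can easily extrapolate: add a trigger (touch a file, send a signal) and change the counting criterion (count the touched files, or whatever) to respond to that trigger. Or, if you only care about 'any' non-zero rc, just kill the lock from save_status.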
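
A minimal sketch of the file-based variant, assuming each child can write its exit status into a private temporary directory. The names run_all, job, and status_dir are hypothetical and not part of the answer above; job stands in for doCalculations.

```bash
#!/bin/bash
# Hypothetical stand-in for doCalculations: fails only for argument 3.
job() {
    [ "$1" -ne 3 ]
}

# Run job once per argument in the background. Each child records its
# exit status in a file named after its position in the argument list.
run_all() {
    local status_dir arg f rc n=0 failed=0
    status_dir=$(mktemp -d) || return 1

    for arg in "$@"; do
        n=$((n + 1))
        (
            # Runs in a subshell: $n and $arg keep their fork-time values.
            rc=0
            job "$arg" || rc=$?
            echo "$rc" > "$status_dir/$n"
        ) &
    done
    wait    # returns 0 regardless, so the files carry the real results

    for f in "$status_dir"/*; do
        [ -e "$f" ] || continue          # no jobs were started
        if [ "$(cat "$f")" -ne 0 ]; then
            echo "job #${f##*/} failed" >&2
            failed=1
        fi
    done

    rm -rf "$status_dir"
    return "$failed"
}

run_all 0 1 2 && echo "all jobs succeeded"
run_all 1 2 3 || echo "at least one job failed"
```

Note that the bare wait also waits for any other background jobs the caller started, so this sketch suits scripts where run_all owns all of the children.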

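Other answers

I wrote a bash function called :for for exactly this purpose.

Note: :for not only preserves and returns the exit code of the failing function, it also terminates all instances still running in parallel, which may not be needed in this case.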

#!/usr/bin/env bash

# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
    local pids=("$@")
    # ${#pids[@]} is the element count; ${#pids} is the length of pids[0]
    [ "${#pids[@]}" -eq 0 ] && return 0

    trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
    trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM

    for pid in "${pids[@]}"; do
        wait "${pid}" || return $?
    done

    trap - INT RETURN TERM
}

# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
    local f="${1}" && shift

    local i=0
    local pids=()
    for arg in "$@"; do
        ( ${f} "${arg}" ) &
        pids+=("$!")
        if [ ! -z ${FOR_PARALLEL+x} ]; then
            (( i=(i+1)%${FOR_PARALLEL} ))
            if (( i==0 )) ;then
                :wait "${pids[@]}" || return $?
                pids=()
            fi
        fi
    done && [ "${#pids[@]}" -eq 0 ] || :wait "${pids[@]}" || return $?
}

Usage

for.sh:

#!/usr/bin/env bash
set -e

# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)

msg="You should see this three times"

:(){
  i="${1}" && shift

  echo "${msg}"

  sleep 1
  if   [ "$i" == "1" ]; then sleep 1
  elif [ "$i" == "2" ]; then false
  elif [ "$i" == "3" ]; then
    sleep 3
    echo "You should never see this"
  fi
} && :for : 1 2 3 || exit $?

echo "You should never see this"

$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1

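There are already a lot of answers here, but I am surprised that no one seems to have suggested using arrays. Here is what I did; it may be useful to someone in the future.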

n=10 # run 10 jobs
c=0
PIDS=()

# launch n jobs in the background and remember their PIDs
for (( c = 0; c < n; c++ ))
do
    my_function_or_command &
    PID=$!
    echo "Launched job as PID=$PID"
    PIDS+=("$PID")
done


# collect launched jobs

for pid in "${PIDS[@]}"
do
    wait $pid || echo "failed job PID=$pid"
done
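
The loop above only reports failures. A possible extension, assuming Bash 4 or later for associative arrays, maps each PID to a label and returns a non-zero status when any job failed. The names launch, collect, and JOBS_BY_PID are hypothetical.

```bash
#!/bin/bash
declare -A JOBS_BY_PID=()   # PID -> human-readable label

# Start a command in the background and remember it under a label.
launch() {
    local label=$1
    shift
    "$@" &
    JOBS_BY_PID[$!]=$label
}

# Wait for every recorded job; return 1 if any of them failed.
collect() {
    local pid rc failed=0
    for pid in "${!JOBS_BY_PID[@]}"; do
        rc=0
        wait "$pid" || rc=$?
        if [ "$rc" -ne 0 ]; then
            echo "job '${JOBS_BY_PID[$pid]}' (PID $pid) exited with $rc" >&2
            failed=1
        fi
    done
    JOBS_BY_PID=()          # forget the jobs that were just collected
    return "$failed"
}

launch quick  true
launch broken sh -c 'exit 7'
launch slow   sleep 0.2
collect || echo "some jobs failed"
```

Waiting on each PID explicitly means a job that finished early still yields its own status, since bash keeps the status of a background child until it is waited for.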

set -e
fail () {
    touch .failure
}
expect () {
    wait
    if [ -f .failure ]; then
        rm -f .failure
        exit 1
    fi
}

sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect

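The set -e at the top makes the script stop on failure.

expect exits with status 1 if any subjob failed.

Use 'wait -n' to wait for multiple subprocesses and exit as soon as any one of them exits with a non-zero status code.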

#!/bin/bash
wait_for_pids()
{
    for (( i = 1; i <= $#; i++ )) do
        wait -n "$@"
        status=$?
        echo "received status: $status"
        if [ $status -ne 0 ] && [ $status -ne 127 ]; then
            exit 1
        fi
    done
}

sleep_for_10()
{
    sleep 10
    exit 10
}

sleep_for_20()
{
    sleep 20
}

sleep_for_10 &
pid1=$!

sleep_for_20 &
pid2=$!

wait_for_pids $pid2 $pid1

Status code '127' means there is no such process, which means the child may already have exited.

To parallelize this...
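
A shorter variant is sketched below, assuming Bash 4.3 or later, where wait -n without arguments waits for the next background job to finish. The function name wait_fail_fast is hypothetical. It returns the first non-zero status it sees and leaves the remaining jobs running.

```bash
#!/bin/bash
# Wait for the given number of background jobs, one at a time.
# Returns the first non-zero exit status seen, or 0 if all succeeded.
wait_fail_fast() {
    local remaining=$1 rc
    while [ "$remaining" -gt 0 ]; do
        rc=0
        wait -n || rc=$?
        # wait -n returns 127 when no children are left. A child that
        # itself exits with 127 cannot be told apart from that here.
        [ "$rc" -eq 127 ] && return 0
        [ "$rc" -ne 0 ] && return "$rc"
        remaining=$((remaining - 1))
    done
    return 0
}

(sleep 0.1)         &
(sleep 0.3; exit 3) &
(sleep 0.5)         &
rc=0
wait_fail_fast 3 || rc=$?
echo "first failure status: $rc"
wait   # reap whatever is still running
```

Because the function stops at the first failure, the caller decides what happens to the survivors: the plain wait above lets them finish, while a kill on the remaining PIDs would stop them early.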

for i in $(whatever_list) ; do
   do_something $i
done

Translate it to this...

for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
   (
   export -f do_something ## export functions (if needed)
   export PATH ## export any variables that are required
   xargs -I{} --max-procs 0 bash -c ' ## process in batches...
      {
      echo "processing {}" ## optional
      do_something {}
      }' 
   )

If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole. Exporting functions and variables may or may not be necessary in any particular case. You can set --max-procs based on how much parallelism you want (0 means "all at once"). GNU Parallel offers some additional features when used in place of xargs, but it isn't always installed by default.

The for loop isn't strictly necessary in this example, since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on. Bash string handling can be confusing; I have found that using single quotes works best for wrapping non-trivial scripts.

You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.

Here is a simplified working example...

for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
   {
   echo sleep {}
   sleep 2s
   }'
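
To turn the xargs exit status into the script's own, a minimal sketch follows. It assumes an xargs that supports -P (GNU and BSD both do). The function name run_parallel and the inline worker are hypothetical. Passing each item as a positional argument, instead of substituting {} into the script text, keeps the input from being re-parsed as shell code.

```bash
#!/bin/bash
# Read one item per line on stdin and run the worker on each item,
# at most $1 at a time. Returns 1 if any worker failed.
run_parallel() {
    local max_procs=$1
    # The worker receives the item as "$1". Here it simply exits with
    # that value, so failures are easy to simulate.
    if xargs -n 1 -P "$max_procs" sh -c 'exit "$1"' _; then
        return 0
    fi
    return 1
}

printf '%s\n' 0 0 0 | run_parallel 2 && echo "all succeeded"
printf '%s\n' 0 1 0 | run_parallel 2 || echo "at least one failed"
```

GNU xargs reports 123 when any invocation exits with a status from 1 to 125, but testing only for non-zero, as above, avoids depending on that exact value.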