What's a quick and simple way to make sure that only one instance of a shell script is running at a given time?


Current answer

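I use a simple approach that handles stale lock files.

Note that some of the solutions above that store the PID ignore the fact that PIDs can wrap around. So just checking whether there is a valid process with the stored PID is not enough, especially for long-running scripts.

I use noclobber to make sure that only one script at a time can open and write to the lock file. In addition, I store enough information in the lock file to uniquely identify a process. I define the set of data that uniquely identifies a process as pid, ppid, lstart.

When a new script starts up, if it fails to create the lock file, it verifies that the process that created the lock file is still around. If it is not, we assume the original process died an ungraceful death and left a stale lock file. The new script then takes ownership of the lock file, and all is well again.

Should work with multiple shells across multiple platforms. Fast, portable and simple.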

#!/usr/bin/env sh
# Author: rouble

LOCKFILE=/var/tmp/lockfile #customize this line

trap release INT TERM EXIT

# Creates a lockfile. Sets global variable $ACQUIRED to true on success.
# 
# Returns 0 if it is successfully able to create lockfile.
acquire () {
    set -C #Shell noclobber option. If file exists, > will fail.
    UUID=`ps -o pid,ppid,lstart -p $$ | tail -1`
    if (echo "$UUID" > "$LOCKFILE") 2>/dev/null; then
        ACQUIRED="TRUE"
        return 0
    else
        if [ -e "$LOCKFILE" ]; then
            # We may be dealing with a stale lock file.
            # Bring out the magnifying glass.
            CURRENT_UUID_FROM_LOCKFILE=`cat "$LOCKFILE"`
            # ps right-aligns the PID column, so skip any leading blanks with awk.
            CURRENT_PID_FROM_LOCKFILE=`awk '{print $1; exit}' "$LOCKFILE"`
            CURRENT_UUID_FROM_PS=`ps -o pid,ppid,lstart -p "$CURRENT_PID_FROM_LOCKFILE" | tail -1`
            if [ "$CURRENT_UUID_FROM_LOCKFILE" = "$CURRENT_UUID_FROM_PS" ]; then
                echo "Script already running with following identification: $CURRENT_UUID_FROM_LOCKFILE" >&2
                return 1
            else
                # The process that created this lock file died an ungraceful death. 
                # Take ownership of the lock file.
                echo "The process $CURRENT_UUID_FROM_LOCKFILE is no longer around. Taking ownership of $LOCKFILE"
                release "FORCE"
                if (echo "$UUID" > "$LOCKFILE") 2>/dev/null; then
                    ACQUIRED="TRUE"
                    return 0
                else
                    echo "Cannot write to $LOCKFILE. Error." >&2
                    return 1
                fi
            fi
        else
            echo "Do you have write permissions to $LOCKFILE ?" >&2
            return 1
        fi
    fi
}

# Removes the lock file only if this script created it ($ACQUIRED is set), 
# OR, if we are removing a stale lock file (first parameter is "FORCE") 
release () {
    #Destroy lock file. Take no prisoners.
    if [ "$ACQUIRED" ] || [ "$1" = "FORCE" ]; then
        rm -f "$LOCKFILE"
    fi
}

# Test code
# int main( int argc, const char* argv[] )
echo "Acquiring lock."
acquire
if [ $? -eq 0 ]; then 
    echo "Acquired lock."
    # read -p is not available in every POSIX sh, so print the prompt separately.
    printf "Press [Enter] key to release lock..."
    read -r dummy
    release
    echo "Released lock."
else
    echo "Unable to acquire lock."
fi

Other answers

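PIDs and lock files are definitely the most reliable. When you attempt to run the program, it can check for the lock file and, if it exists, use ps to see whether the process is still running. If it is not, the script can start and update the PID in the lock file to its own.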
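The check described above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the PIDFILE path and the acquire_pidfile name are made up for the example, and the check-then-write sequence is not atomic, so two scripts starting at the same instant could both pass. It also inherits the PID-reuse problem mentioned in the first answer.

```shell
#!/bin/sh
# Hypothetical lock file location; change it to suit your script.
PIDFILE="${PIDFILE:-/tmp/myscript.pid}"

# Returns 0 if this process now owns the PID file,
# 1 if the PID stored in it still belongs to a running process.
acquire_pidfile() {
    if [ -f "$PIDFILE" ]; then
        old_pid=$(cat "$PIDFILE" 2>/dev/null)
        # ps -p exits non-zero when no process has that PID.
        if [ -n "$old_pid" ] && ps -p "$old_pid" >/dev/null 2>&1; then
            return 1
        fi
    fi
    # No lock file, or a stale one: record our own PID.
    echo "$$" > "$PIDFILE"
}

# Typical use at the top of a script:
#   acquire_pidfile || { echo "already running" >&2; exit 1; }
#   trap 'rm -f "$PIDFILE"' EXIT
```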

Add this line at the beginning of your script:

[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -en "$0" "$0" "$@" || :

This is the boilerplate code from man flock.

If you want more logging, use this one:

[ "${FLOCKER}" != "$0" ] && { echo "Trying to start build from queue... "; exec bash -c "FLOCKER='$0' flock -E $E_LOCKED -en '$0' '$0' '$@' || if [ \"\$?\" -eq $E_LOCKED ]; then echo 'Locked.'; fi"; } || echo "Lock is free. Completing."

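This sets and checks the lock using the flock utility. The code detects whether it is being run for the first time by checking the FLOCKER variable. If FLOCKER is not set to the script name, it re-executes the script under flock with FLOCKER initialized. If FLOCKER is set correctly, flock succeeded on the previous pass and it is fine to proceed. If the lock is busy, it fails with a configurable exit code.

It does not seem to work on Debian 7, but it seems to work again with the experimental util-linux 2.25 package. It reports "flock: ... Text file busy". This can be worked around by removing write permission on the script.

This will work if your script name is unique: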

#!/bin/bash
if [ "$(pgrep -c "$(basename "$0")")" -gt 1 ]; then
  echo "$(basename "$0") is already running"
  exit 0
fi

If the script name is not unique, this works on most Linux distributions:

#!/bin/bash
exec 9>/tmp/my_lock_file
if ! flock -n 9  ; then
   echo "another instance of this script is already running";
   exit 1
fi

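Source: http://mywiki.wooledge.org/BashFAQ/045

You need an atomic operation, like flock, or else this will eventually fail.

But what do you do if flock is not available? Well, there is mkdir. That is an atomic operation too. Only one process will succeed in running mkdir; all the others will fail.

So the code is: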

if mkdir /var/lock/.myscript.exclusivelock
then
  # do stuff
  :
  rmdir /var/lock/.myscript.exclusivelock
fi

You need to take care of stale locks, or else after a crash your script will never run again.
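One possible way to deal with stale locks is to store the owner's PID inside the lock directory and reclaim the directory when that process is gone. The following is a minimal sketch under assumptions, not part of the original answer: LOCKDIR, acquire_lockdir and release_lockdir are hypothetical names, and the takeover step (rm followed by mkdir) is not atomic, so two scripts reclaiming the same stale lock at the same moment could still collide.

```shell
#!/bin/sh
# Hypothetical lock directory; adjust to taste.
LOCKDIR="${LOCKDIR:-/tmp/myscript.lock}"

# Returns 0 if we now hold the lock, 1 otherwise.
acquire_lockdir() {
    # mkdir is atomic: only one process can create the directory.
    if mkdir "$LOCKDIR" 2>/dev/null; then
        echo "$$" > "$LOCKDIR/pid"
        return 0
    fi
    holder=$(cat "$LOCKDIR/pid" 2>/dev/null)
    # No PID recorded: the owner may be between mkdir and echo,
    # so be conservative and treat the lock as held.
    if [ -z "$holder" ]; then
        return 1
    fi
    # The recorded owner is still running: the lock is valid.
    if ps -p "$holder" >/dev/null 2>&1; then
        return 1
    fi
    # The owner is gone: remove the stale lock and try once more.
    rm -rf "$LOCKDIR"
    mkdir "$LOCKDIR" 2>/dev/null || return 1
    echo "$$" > "$LOCKDIR/pid"
}

release_lockdir() {
    rm -rf "$LOCKDIR"
}

# Typical use:
#   acquire_lockdir || exit 1
#   trap release_lockdir EXIT
```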

This example is explained in man flock, but it needs some improvements, because we should handle errors and exit codes:

#!/bin/bash
# set -e is useful only for very simple scripts: the script aborts as soon as any command exits with a non-zero status, with no chance to capture the exit code, and not every non-zero exit is a failure.

( #start subprocess
  # Wait for lock on /var/lock/.myscript.exclusivelock (fd 200) for 10 seconds
  flock -x -w 10 200
  if [ "$?" != "0" ]; then echo Cannot lock!; exit 1; fi
  # Record our PID in the lock file, for compatibility with older lock-directory setups.
  # Note that the redirection at the bottom, ) 200>/var/lock/.myscript.exclusivelock, has already been applied by this point.
  echo $$>>/var/lock/.myscript.exclusivelock
  # Do stuff
  # You can manage exit codes properly here, across multiple commands and processing steps.
  # I suggest moving all of this into an external procedure that can properly handle commands that exit with a given status.

) 200>/var/lock/.myscript.exclusivelock   #exit subprocess

FLOCKEXIT=$?  #save exitcode status
    #do some finish commands

exit $FLOCKEXIT   #return the proper exit code, may be useful inside external scripts

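You can use another method that I used in the past: listing processes. But it is more complicated than the method above. You list processes with ps, filter by name, add a grep -v grep filter to remove the grep process itself, and finally count with grep -c and compare the result with a number. It is complicated and unreliable.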
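For completeness, the ps pipeline described above might look something like this. It is a rough sketch only: count_matching and count_instances are hypothetical helper names, and the approach keeps the weaknesses the answer mentions. Any unrelated process whose command line happens to contain the name is counted too, and any command line containing the word grep is dropped.

```shell
#!/bin/sh
# Reads a process listing on stdin and prints how many lines mention
# the given name, ignoring lines that belong to grep itself.
count_matching() {
    grep -- "$1" | grep -v grep | grep -c .
}

# Takes the ps snapshot first and filters it afterwards, so that the
# short-lived subshells of the pipeline do not show up in the listing.
count_instances() {
    snapshot=$(ps -e -o args)
    printf '%s\n' "$snapshot" | count_matching "$1"
}

# Typical use at the top of a script:
#   if [ "$(count_instances "$(basename "$0")")" -gt 1 ]; then
#       echo "already running" >&2
#       exit 1
#   fi
```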