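Avoid PID files, crons, or anything else that tries to evaluate processes that aren't their children.
There is a very good reason why in UNIX you can ONLY wait on your children. Any method that tries to work around that (ps parsing, pgrep, storing a PID, and so on) is flawed and has gaping holes in it. Just say no.
Instead, you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.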
until myserver; do
    echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
    sleep 1
done
The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
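To watch the loop's behavior without a real server, here is a minimal sketch in which myserver is replaced by a hypothetical shell function that fails twice and then exits cleanly. The function body and the attempts counter are assumptions made purely for illustration; they are not part of the original setup.

```shell
#!/bin/bash
# Hypothetical stand-in for the real server: fails on the first two
# runs (non-zero exit status) and succeeds on the third (exit status 0).
attempts=0
myserver() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

# Same loop as above: the body runs only when myserver exits non-zero.
until myserver; do
    echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
    sleep 1
done

echo "myserver exited cleanly after $attempts attempts"
```

The loop body runs once per failed attempt and the loop ends as soon as the stand-in returns 0, which is the "graceful shutdown" case described above.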
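Why do we wait a second? Because if something is wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab: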
crontab -e
Then add a rule to start your monitor script:
@reboot /usr/local/bin/myservermonitor
Alternatively, look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.
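As a rough illustration, assuming a SysV-style init that reads /etc/inittab, entries take the form id:runlevels:action:process. The id "ms", the runlevels "2345", and the path /usr/local/bin/myserver below are all assumptions chosen for the example; check inittab(5) on your own system before using anything like it.

```
# Hypothetical entry: id "ms", runlevels 2-5, respawn action.
ms:2345:respawn:/usr/local/bin/myserver
```

Note that with the respawn action, init restarts the process whenever it terminates, so unlike the until loop it does not distinguish a clean exit from a crash.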
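Edit.
Let me add some information on why not to use PID files. While they are very popular, they are also very flawed, and there's no reason why you wouldn't just do it the correct way.
Consider this: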
1. PID recycling (killing the wrong process):
   - /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
   - A while later: foo dies somehow.
   - A while later: any random process that starts (call it bar) takes a random PID; imagine it taking foo's old PID.
   - You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
2. PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1.
3. What if you don't even have write access or are in a read-only environment?
4. It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
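The stale-PID-file problem described above can be reproduced with a small sketch. Everything here is hypothetical: a background sleep stands in for foo, a temporary file stands in for /var/run/foo.pid, and the variable names are made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical demo; pidfile, pid and verdict are illustrative names only.
pidfile=$(mktemp)

sleep 30 &                         # stand-in for the daemon "foo"
echo "$!" > "$pidfile"             # what an init script would record

pid=$(cat "$pidfile")
kill "$pid"                        # foo dies somehow
wait "$pid" 2>/dev/null || true    # reap it, so the PID is free again

# Nothing cleaned up the file: it still names a PID that no longer
# belongs to foo. kill -0 only tests whether *some* process has that
# PID; it cannot tell you whether that process is foo.
if kill -0 "$pid" 2>/dev/null; then
    verdict="a process with this PID exists, but is it foo?"
else
    verdict="stale"
fi
echo "PID file says $pid: $verdict"
rm -f "$pidfile"
```

Either branch is a bad outcome for a PID-file-based restart script: the file is stale, or the PID now belongs to an unrelated process that a naive check would mistake for foo.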
See also: Are PID files still flawed when doing it "right"?
By the way, even worse than PID files is parsing ps! Don't ever do this.
ps is very unportable. While you find it on almost every UNIX system, its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as an argument that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
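The false-positive problem is easy to see on canned text. The two lines below are made-up sample data that merely resemble ps output (real ps output differs between systems, which is part of the point); the users, PIDs, and commands are all hypothetical.

```shell
#!/bin/bash
# Made-up sample resembling ps output; not captured from a real system.
sample_ps_output='alice  1234  0.0  0.1 /usr/local/bin/myserver
bob    5678  0.0  0.1 tail -n 1234 /var/log/syslog'

# Grepping for the daemon PID 1234 also matches bob's unrelated tail,
# because 1234 appears there as a command-line argument.
matches=$(printf '%s\n' "$sample_ps_output" | grep -c 1234)
echo "lines matching 1234: $matches"
```

A script that kills whatever this grep turns up would take down bob's tail along with the daemon.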
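If you don't want to manage the process yourself, there are some perfectly good systems out there that will act as a monitor for your processes. Look into runit, for example.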