我的应用程序在Linux上作为后台进程运行。它目前在终端窗口的命令行中启动。

最近,一个用户正在执行应用程序一段时间,它神秘地死亡了。文本:

杀了

在终端机上。这样发生了两次。我问是否有人在不同的终端使用kill命令杀死进程?不。

在什么情况下Linux会决定终止我的进程?我相信shell显示“已杀”是因为进程在接收到kill(9)信号后死亡。如果Linux发送了终止信号,系统日志中是否应该有一条消息解释为什么它被终止?


当前回答

这看起来是一篇关于这个主题的好文章:驯服OOM杀手(1)。

The gist is that Linux overcommits memory. When a process asks for more space, Linux will give it that space, even if it is claimed by another process, under the assumption that nobody actually uses all of the memory they ask for. The process will get exclusive use of the memory it has allocated when it actually uses it, not when it asks for it. This makes allocation quick, and might allow you to "cheat" and allocate more memory than you really have. However, once processes start using this memory, Linux might realize that it has been too generous in allocating memory it doesn't have, and will have to kill off a process to free some up. The process to be killed is based on a score taking into account runtime (long-running processes are safer), memory usage (greedy processes are less safe), and a few other factors, including a value you can adjust to make a process less likely to be killed. It's all described in the article in a lot more detail.

编辑:下面是[另一篇文章](2),它很好地解释了如何选择进程(带有一些内核代码示例的注释)。它的优点在于,它包含了一些关于各种bad()规则背后原因的注释。

其他回答

正如dwc和Adam Jaskiewicz所说,罪魁祸首很可能是OOM杀手。然而,接下来的问题是:我如何预防这种情况?

有几种方法:

如果可以的话,给你的系统更多的内存(如果是虚拟机,这很简单) 确保OOM杀手选择不同的进程。 禁用OOM杀手 选择一个禁用OOM杀手的Linux发行版。

多亏了这篇文章,我发现(2)特别容易实现。

让我先解释一下什么时候以及为什么调用OOMKiller ?

假设你有512内存+ 1GB交换内存。所以理论上,你的CPU总共可以访问1.5GB的虚拟内存。

现在,在总内存为1.5GB的情况下,一切运行正常。但是突然(或逐渐地),您的系统开始消耗越来越多的内存,它达到了总内存使用量的95%左右。

现在假设有任何进程向内核请求了大块内存。内核检查可用内存,发现无法为进程分配更多内存。因此,它将尝试调用/调用OOMKiller (http://linux-mm.org/OOM)来释放一些内存。

OOMKiller有自己的算法来为每个进程评分。通常,使用更多内存的进程将成为被杀死的受害者。

我在哪里可以找到OOMKiller的日志?

通常在/var/log目录下。/var/log/kern.log或/var/log/dmesg

希望这对你有所帮助。

一些典型的解决方案:

增加内存(不是交换) 找到程序中的内存泄漏并修复它们 限制任何进程可以使用的内存(例如,可以使用JAVA_OPTS限制JVM内存) 查看日志和谷歌:)

这看起来是一篇关于这个主题的好文章:驯服OOM杀手(1)。

The gist is that Linux overcommits memory. When a process asks for more space, Linux will give it that space, even if it is claimed by another process, under the assumption that nobody actually uses all of the memory they ask for. The process will get exclusive use of the memory it has allocated when it actually uses it, not when it asks for it. This makes allocation quick, and might allow you to "cheat" and allocate more memory than you really have. However, once processes start using this memory, Linux might realize that it has been too generous in allocating memory it doesn't have, and will have to kill off a process to free some up. The process to be killed is based on a score taking into account runtime (long-running processes are safer), memory usage (greedy processes are less safe), and a few other factors, including a value you can adjust to make a process less likely to be killed. It's all described in the article in a lot more detail.

编辑:下面是[另一篇文章](2),它很好地解释了如何选择进程(带有一些内核代码示例的注释)。它的优点在于,它包含了一些关于各种bad()规则背后原因的注释。

我最近遇到了这个问题。最后,我发现我的进程在自动调用Opensuse zypper更新后被杀死了。禁用zypper更新解决了我的问题。

像systemtap(或跟踪程序)这样的工具可以监视内核信号传输逻辑和报告。例如,https://sourceware.org/systemtap/examples/process/sigmon.stp

# stap --example sigmon.stp -x 31994 SIGKILL
   SPID     SNAME            RPID  RNAME            SIGNUM SIGNAME
   5609     bash             31994 find             9      SIGKILL

该脚本中的过滤if块可以根据口味进行调整,也可以取消以跟踪系统范围的信号流量。可以通过收集回溯跟踪来进一步隔离原因(分别为内核和用户空间向探针添加print_backtrace()和/或print_ubacktrace())。