Previous post: https://blog.csdn.net/reliveit/article/details/106291631
Kernel documentation: https://www.kernel.org/doc/Documentation/sysctl/vm.txt (the main options discussed here come from this document; the others come from LWN)
1. Overview
Overall flow:
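- When the system hits an OOM condition, panic_on_oom decides whether the kernel panics or kills a process.
  - panic_on_oom=0: kill a process. Which process dies depends on oom_kill_allocating_task.
    - oom_kill_allocating_task=0: scan all processes, score each one with the kernel's heuristic, and kill the one with the highest score. The oom_score_adj setting lets you adjust a process's oom_score and so steer the heuristic by hand.
      - Legacy option (deprecated): the file /proc/<pid>/oom_adj. The range is -17 to 15; the larger the value, the more likely the OOM killer is to pick the process. Setting oom_adj to -17 exempts the process from the OOM killer.
      - Current option: the file /proc/<pid>/oom_score_adj. The range is -1000 to 1000; the larger the value, the more likely the OOM killer is to pick the process. Setting oom_score_adj to -1000 exempts the process from the OOM killer entirely.
    - oom_kill_allocating_task non-zero: kill the process that triggered the OOM directly.
  - panic_on_oom=1: see below.
  - panic_on_oom=2: the kernel panics.
- When the system hits an OOM condition, oom_dump_tasks controls whether task information is dumped.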
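To see which branch of the flow above a given machine would take, the relevant files can simply be read back. The following is a minimal sketch, not a definitive tool: the helper names (`read_int`, `read_oom_knobs`, `read_process_oom`, `write_oom_score_adj`) are made up for illustration, and the root directories are parameters so the code can be tried against a scratch directory instead of the real /proc. Writing oom_score_adj for another process, or lowering it, typically needs elevated privileges.

```python
from pathlib import Path

# Range for oom_score_adj as stated in the overview above.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def read_int(path):
    """Return the integer stored in a procfs-style file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def read_oom_knobs(vm_root="/proc/sys/vm"):
    """Collect the three system-wide OOM settings mentioned in the overview."""
    names = ("panic_on_oom", "oom_kill_allocating_task", "oom_dump_tasks")
    return {name: read_int(Path(vm_root) / name) for name in names}


def read_process_oom(pid, proc_root="/proc"):
    """Return (oom_score, oom_score_adj) for one process."""
    base = Path(proc_root) / str(pid)
    return read_int(base / "oom_score"), read_int(base / "oom_score_adj")


def write_oom_score_adj(pid, value, proc_root="/proc"):
    """Write oom_score_adj after checking it against the documented range."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    (Path(proc_root) / str(pid) / "oom_score_adj").write_text(f"{value}\n")
```

On a Linux host, `read_oom_knobs()` and `read_process_oom(os.getpid())` read the live values; a knob that does not exist on the running kernel comes back as `None`.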
2. panic_on_oom
When Linux runs out of memory, the panic_on_oom setting determines whether the kernel panics.
This enables or disables panic on out-of-memory feature.
If this is set to 0, the kernel will kill some rogue process, called oom_killer. Usually, oom_killer can kill rogue processes and system will survive.
If this is set to 1, the kernel panics when out-of-memory happens. However, if a process limits using nodes by mempolicy/cpusets, and those nodes become memory exhaustion status, one process may be killed by oom-killer. No panic occurs in this case. Because other nodes' memory may be free. This means system total status may be not fatal yet.
If this is set to 2, the kernel panics compulsorily even on the above-mentioned. Even oom happens under memory cgroup, the whole system panics.
The default value is 0. 1 and 2 are for failover of clustering. Please select either according to your policy of failover.
panic_on_oom=2+kdump gives you very strong tool to investigate why oom happens. You can get snapshot.
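- panic_on_oom defaults to 0. On OOM the kernel kills a process so that the system itself survives.
- If panic_on_oom is set to 1, the kernel panics on OOM.
- There is one exception: if a process is restricted to certain nodes by mempolicy/cpusets and only those nodes run out of memory, the kernel does not panic and the OOM killer kills a process instead, because other nodes may still have free memory.
- If panic_on_oom is set to 2, the kernel panics even in that restricted case, and also when the OOM happens inside a memory cgroup.
With the default value of 0, the kernel kills a process on OOM so that the system stays alive. How does it choose which process to kill? Does it just pick one at random? That is decided by the second option, oom_kill_allocating_task.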
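The three values can be summarized as a small decision function. This is only a model of the behavior described in the quoted documentation, not kernel code: `oom_panics` and its `constrained` flag are hypothetical names, where `constrained` stands for an OOM confined to mempolicy/cpuset nodes or to a memory cgroup. Treating the memory cgroup case like the mempolicy/cpuset case for value 1 is an assumption drawn from the wording of the quoted text for value 2.

```python
def oom_panics(panic_on_oom, constrained):
    """Model whether the kernel panics on OOM.

    panic_on_oom: value of the sysctl (0, 1 or 2).
    constrained:  True if the OOM is confined to mempolicy/cpuset nodes
                  or a memory cgroup (assumption, see the text above).
    """
    if panic_on_oom == 2:
        # Panic unconditionally, even for constrained OOMs.
        return True
    if panic_on_oom == 1:
        # Panic only when the whole system is out of memory.
        return not constrained
    # 0 (the default): never panic, let the OOM killer pick a process.
    return False


# Enumerate every combination to see the outcome side by side.
for value in (0, 1, 2):
    for constrained in (False, True):
        outcome = "panic" if oom_panics(value, constrained) else "kill a process"
        print(f"panic_on_oom={value} constrained={constrained}: {outcome}")
```

The only row where `constrained` changes the result is panic_on_oom=1, which is exactly the difference between values 1 and 2.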
3. oom_kill_allocating_task
This enables or disables killing the OOM-triggering task in out-of-memory situations.
If this is set to zero, the OOM killer will scan through the entire tasklist and select a task based on heuristics to kill. This normally selects a rogue memory-hogging task that frees up a large amount of memory when killed.
If this is set to non-zero, the OOM killer simply kills the task that triggered the out-of-memory condition. This avoids the expensive tasklist scan.
If panic_on_oom is selected, it takes precedence over whatever value is used in oom_kill_allocating_task.
The default value is 0.
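The two modes quoted above can be sketched as a selection function. This is a deliberately simplified model: `select_victim` is a hypothetical helper, the `scores` mapping stands in for each process's oom_score, and the real kernel applies further checks (for example, processes exempted via oom_score_adj) that are left out here.

```python
def select_victim(scores, trigger_pid, oom_kill_allocating_task):
    """Pick the pid to kill under a simplified model of the two modes.

    scores:      dict mapping pid -> oom_score (illustrative values).
    trigger_pid: pid of the task whose allocation triggered the OOM.
    """
    if not scores:
        return None
    if oom_kill_allocating_task != 0 and trigger_pid in scores:
        # Non-zero: kill the triggering task and skip the full scan.
        return trigger_pid
    # Zero: scan every task and take the one with the highest score.
    return max(scores, key=scores.get)


# Illustrative scores only; pid 300 is the task that triggered the OOM.
example = {100: 12, 200: 830, 300: 45}
print(select_victim(example, trigger_pid=300, oom_kill_allocating_task=0))
print(select_victim(example, trigger_pid=300, oom_kill_allocating_task=1))
```

With the setting at 0 the memory hog (pid 200) is chosen; with a non-zero setting the triggering task (pid 300) is chosen even though its score is low, which is the trade-off between an expensive scan and a possibly unlucky victim.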
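The oom_kill_allocating_task setting determines which process the OOM killer chooses when the system runs out of memory.
- oom_kill_allocating_task defaults to 0. In this mode the kernel scans all processes, scores each one with its heuristic, and kills the one with the highest oom_score (typically a process that has not been running long yet uses a large amount of memory).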