【Configuring the OOM Killer】
The Linux OOM killer has several configuration options that give
developers some choice over how the system behaves when it faces an
out-of-memory condition. The appropriate settings depend on the
environment and on the applications running on the system.
Note: It is best to test and tune these settings in a development
environment before making changes on important production systems.
In some environments, where a system runs a single critical task,
rebooting when the system runs into an OOM condition can be a viable
way to return it to operational status quickly without administrator
intervention. This is not an optimal approach, but the logic is as
follows: if the OOM killer kills our application and leaves it unable
to operate, a reboot restores the application, provided that it
starts with the system at boot time. If the application is started
manually by an administrator, this option is not beneficial.
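(Translator's note: I did not fully follow this. Does it mean that if
the program cannot be restarted automatically on its own, but does
start together with the system at boot, it is best to reboot the
system once an OOM occurs?)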
The following settings cause the system to panic and then reboot
when it hits an out-of-memory condition. The sysctl commands apply
the settings immediately, and appending them to /etc/sysctl.conf
makes them survive reboots. The X in kernel.panic is the number of
seconds to wait before the system is rebooted; replace it with a
value that meets the needs of your environment.
sysctl vm.panic_on_oom=1
sysctl kernel.panic=X
echo "vm.panic_on_oom=1" >> /etc/sysctl.conf
echo "kernel.panic=X" >> /etc/sysctl.conf
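It can be useful to confirm what the kernel currently has for these
two settings before and after changing them. The following is a
minimal read-only sketch, not part of the original procedure. The
function name show_oom_panic_settings and its optional directory
argument are illustrative assumptions.

```shell
# Print the current values of vm.panic_on_oom and kernel.panic.
# Usage: show_oom_panic_settings [sysctl_root]
# sysctl_root defaults to /proc/sys; the argument exists only so the
# function can be pointed at a scratch directory (an assumption made
# for this sketch).
show_oom_panic_settings() {
    proc_sys="${1:-/proc/sys}"
    for key in vm/panic_on_oom kernel/panic; do
        # sysctl names use dots where /proc/sys uses slashes.
        name=$(printf '%s' "$key" | tr '/' '.')
        if [ -r "$proc_sys/$key" ]; then
            printf '%s = %s\n' "$name" "$(cat "$proc_sys/$key")"
        else
            printf '%s = (not readable)\n' "$name"
        fi
    done
}
```

Calling show_oom_panic_settings with no argument reads the live
values and changes nothing on the system.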
We can also tune how the OOM killer treats particular processes.
Take, for example, our oracle process 2592 that was killed earlier.
If we want to make our oracle process less likely to be killed by
the OOM killer, we can do the following.
echo -15 > /proc/2592/oom_adj
Conversely, we can make the OOM killer more likely to kill our
oracle process by doing the following.
echo 10 > /proc/2592/oom_adj
If we want to exclude our oracle process from the OOM killer
completely, we can do the following. It is important to note that
this might cause unexpected behavior, depending on the resources and
configuration of the system. If the kernel is unable to kill a
process that is using a large amount of memory, it moves on to other
available processes. Some of those might be important operating
system processes, and killing them might ultimately cause the system
to go down.
echo -17 > /proc/2592/oom_adj
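The oom_adj writes above accept only integers from -17 to 15, so a
small guard can catch typos before anything is written. The
following is a sketch under stated assumptions: the function names
is_valid_oom_adj and set_oom_adj and the optional proc_root argument
are hypothetical helpers, not existing tools. Note also that newer
kernels deprecate oom_adj in favor of /proc/<pid>/oom_score_adj,
which ranges from -1000 to 1000; this sketch stays with oom_adj as
used in the text.

```shell
# Return success if $1 is an integer from -17 (exempt) to 15.
# Values must be written without a leading plus sign.
is_valid_oom_adj() {
    case "$1" in
        ''|*[!0-9-]*|?*-*|-) return 1 ;;   # empty or not an integer
    esac
    [ "$1" -ge -17 ] && [ "$1" -le 15 ]
}

# Validate the value, then write it to <proc_root>/<pid>/oom_adj.
# Usage: set_oom_adj <pid> <value> [proc_root]
# proc_root defaults to /proc (the override is an assumption made so
# the function can be tried against a scratch directory).
set_oom_adj() {
    pid="$1"
    value="$2"
    proc_root="${3:-/proc}"
    if ! is_valid_oom_adj "$value"; then
        echo "oom_adj must be an integer from -17 to 15, got '$value'" >&2
        return 1
    fi
    target="$proc_root/$pid/oom_adj"
    if [ ! -w "$target" ]; then
        echo "cannot write $target (no such process, or not root?)" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$target"
}
```

With these helpers, "set_oom_adj 2592 -15" run as root would have
the same effect as the first echo command shown earlier, and an
out-of-range value such as 99 is rejected before anything is
written.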
Valid values for oom_adj range from -16 to +15, and the special
value -17 exempts a process entirely from the OOM killer. The higher
the number, the more likely our process is to be selected for
termination if the system encounters an OOM condition. The contents
of /proc/2592/oom_score can also be viewed to determine how likely
the process is to be killed by the OOM killer. A score of 0
indicates that our process is exempt from the OOM killer. The higher
the OOM score, the more likely the process is to be killed in an OOM
condition.
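To see which processes the OOM killer would favor, the oom_score
files can be read for every process and sorted. The following is a
minimal sketch; the function name top_oom_scores and its arguments
are illustrative assumptions, and it relies on /proc/<pid>/comm
being available for the process name.

```shell
# List processes by oom_score, highest first.
# Output columns: score, PID, command name.
# Usage: top_oom_scores [count] [proc_root]
# proc_root defaults to /proc (the override is an assumption made so
# the function can be tried against a scratch directory).
top_oom_scores() {
    count="${1:-5}"
    proc_root="${2:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        # A process may exit while we iterate, so skip missing entries.
        [ -r "$dir/oom_score" ] || continue
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        [ -n "$score" ] || continue
        name=$(cat "$dir/comm" 2>/dev/null) || name="?"
        printf '%s %s %s\n' "$score" "${dir##*/}" "$name"
    done | sort -k1,1nr -k2,2n | head -n "$count"
}
```

Running "top_oom_scores 10" prints the ten processes with the
highest scores, which are the most likely candidates for termination
in an OOM condition.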
The OOM killer can be effectively disabled with the following
command, which sets vm.overcommit_memory to 2 so that the kernel
refuses allocations beyond its commit limit instead of overcommitting
memory. This is not recommended for production environments: if an
out-of-memory condition does present itself, the behavior is
unpredictable and depends on the system resources and configuration
available at the time. It could be anything from a kernel panic to a
hang.
sysctl vm.overcommit_memory=2
echo "vm.overcommit_memory=2" >> /etc/sysctl.conf
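Before changing the overcommit mode, it may help to look at the
current mode and at how much memory is already committed. The
following is a read-only sketch; the function name show_overcommit
and its optional proc_root argument are illustrative assumptions.

```shell
# Show the overcommit mode and the kernel's commit accounting.
# Usage: show_overcommit [proc_root]
# proc_root defaults to /proc (the override is an assumption made so
# the function can be tried against a scratch directory).
show_overcommit() {
    proc_root="${1:-/proc}"
    mode=$(cat "$proc_root/sys/vm/overcommit_memory") || return 1
    case "$mode" in
        0) desc="heuristic overcommit" ;;
        1) desc="always overcommit" ;;
        2) desc="strict accounting, no overcommit" ;;
        *) desc="unknown" ;;
    esac
    echo "vm.overcommit_memory = $mode ($desc)"
    # CommitLimit is the ceiling enforced in mode 2; Committed_AS is
    # the memory currently promised to processes. Both are in kB.
    awk '/^(CommitLimit|Committed_AS):/ { print $1, $2, $3 }' \
        "$proc_root/meminfo"
}
```

If Committed_AS is already close to or above CommitLimit, switching
to mode 2 is likely to make new allocations fail right away, which
is one reason to try the change on a test system first.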
For some environments, these configuration options are not optimal,
and further tuning and adjustments might be needed. Depending on the
needs of the applications running on the system, configuring
HugePages for your kernel can assist with OOM issues.
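As a starting point for HugePages tuning, the current pool can be
read from /proc/meminfo. The following is a minimal sketch that only
reports what is configured; the function name hugepages_summary and
its optional file argument are illustrative assumptions.

```shell
# Summarize the HugePages pool from a meminfo-style file.
# Usage: hugepages_summary [meminfo_file]
# meminfo_file defaults to /proc/meminfo (the override is an
# assumption made so the function can be tried on a sample file).
hugepages_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            # Pool size is the page count times the page size in kB.
            printf "HugePages: %d total, %d free, %d kB each, %d kB pool\n",
                   total, free, size_kb, total * size_kb
        }' "$meminfo"
}
```

A total of 0 means that no HugePages are configured on the system.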
See Also
Here are some additional resources about HugePages, dtrace,
sar, and OOM for NUMA architectures:
HugePages information on My Oracle Support (requires login):
HugePages on Linux: What It Is... and What It Is Not... [ID
361323.1]
dtrace
sar
NUMA architecture and OOM
About the Author
Robert Chase is a member of the Oracle Linux product
management team. He has been involved with Linux and open source
software since 1996. He has worked with systems as small as
embedded devices and with large supercomputer-class hardware.
Revision 1.0, 02/19/2013