OOM Killer (Memory-Related Notes)

Applies to:

Oracle Cloud Infrastructure - Version N/A and later
Linux OS - Version Enterprise Linux 3.0 and later
Oracle Database - Enterprise Edition - Version 11.2.0.4 to 11.2.0.4 [Release 11.2]
Linux x86-64
Linux x86

Purpose

This document aims to provide basic information about the OOM Killer.

Scope

This document covers basic knowledge of the OOM Killer and explains the reasons that lead to OOM errors. It also provides some methods to avoid these errors.

Details

What is OOM Killer?

The OOM killer, a feature enabled by default, is a self-protection mechanism employed by the Linux kernel when under severe memory pressure.

If the kernel cannot find memory to allocate when it is needed, it puts in-use user data pages on the swap-out queue to be swapped out. If the Virtual Memory (VM) subsystem cannot allocate memory and cannot swap out in-use memory, the Out-of-Memory killer may begin killing userspace processes: when all else fails, it will sacrifice one or more processes in order to free up memory for the system.

If you have a line like the one below in /var/log/messages:

Apr 1 00:01:02 <HOST> kernel: Out of Memory: Killed process 2592 (oracle).

The above means that the OOM killer has killed the oracle dedicated server process 2592.
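
As a quick check (a sketch; the exact log file location depends on the syslog configuration), the kernel ring buffer and the system log can be searched for OOM killer messages:

dmesg | grep -i "out of memory"
grep -i "out of memory" /var/log/messages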


Implementation of the OOM Killer in the 2.6 Kernel

The implementation of the OOM killer in the 2.6 kernel lives in the file mm/oom_kill.c. The out_of_memory() function is the entry point of the OOM killer. Here is the calling sequence:

mm/page_alloc.c::__alloc_pages() --> mm/vmscan.c::try_to_free_pages() --> mm/oom_kill.c::out_of_memory() --> select_bad_process() --> badness()

out_of_memory() is invoked by the __alloc_pages() function when free memory is very low. The __alloc_pages() function is the heart of the zoned buddy system allocator: all higher-level functions that request free page frames, such as alloc_pages() and __get_free_pages(), ultimately invoke __alloc_pages(). The __alloc_pages() function tries multiple methods and walks the zone list to allocate the requested page frames. If it still cannot get the page frames, it calls the out_of_memory() function to invoke the OOM killer, which either kills a process to free memory or panics the system.

In principle, the OOM killer behaves as follows:

  • Lose the minimum amount of work done
  • Recover as much memory as it can
  • Do not kill any process that is not, by itself, using a lot of memory
  • Kill the minimum number of processes (ideally one)
  • Try to kill the process the user would expect to be killed
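
On recent 2.6 kernels the result of the badness() calculation for a process can be inspected at runtime through the /proc interface. For example (assuming a process with PID 2592 exists, as in the log message above):

cat /proc/2592/oom_score

A higher oom_score means the process is a more likely victim when the OOM killer runs.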

What Causes OOM Killer Event?

The Kernel is Really out of Memory

The workload uses more memory than the system has in RAM and swap space. If both "SwapFree" and "MemFree" in /proc/meminfo are very low (less than 1% of their respective totals), this is likely the case.
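
A minimal sketch of this check, computing the free percentages directly from /proc/meminfo:

awk '/^(MemTotal|MemFree|SwapTotal|SwapFree):/ {m[$1]=$2} END {printf "MemFree: %.1f%%\n", 100*m["MemFree:"]/m["MemTotal:"]; if (m["SwapTotal:"] > 0) printf "SwapFree: %.1f%%\n", 100*m["SwapFree:"]/m["SwapTotal:"]}' /proc/meminfo

If both percentages are below 1%, the system genuinely has no memory left to reclaim.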


The Kernel is out of LowMem (32-bit architectures only)

"LowFree" in /proc/meminfo is very low, but HighFree is much higher. OOM killer will take action under that situation. The specific workload may benefit from being run on a 64-bit architectures or other methods (See Document 452326.1) .

 
The Kernel Data Structures Take Up Too Much Space or There is Non-Freed Memory
 
Kernel data structures appear as named objects in /proc/slabinfo. Please refer to Document 434351.1 for more information about the SLAB allocator.

If one kind of object is taking up a large portion of the system's total memory, that object may be responsible for the memory shortage. Look for the subsystem (such as ocfs2) that owns the specific kernel structure. To see the object usage, run this on the command line:

awk 'NR>2 {printf "%s %d MB\n",$1, $3*$4/(1024*1024)}' /proc/slabinfo | sort -nr -k2,2

You can also see Document 434351.1 for other tools you can use for analysing the SLABs and objects.


The Kernel is Not Using The Swap Space Properly

If an application uses mlock() or HugeTLB pages (HugePages), the kernel may not be able to swap out that application's memory (because locked pages and HugePages are not swappable). If this happens, SwapFree may still show a very large value when the OOM occurs. Overusing locked or huge pages may therefore exhaust system memory and leave the system with no other recourse.
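
To see whether HugePages or locked memory are involved, check /proc/meminfo and the per-process status file (PID 2592 is used here only as an example):

grep -i HugePages /proc/meminfo
grep VmLck /proc/2592/status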


Other Potential Reasons

It is also possible for the system to find itself in a sort of deadlock. Writing data out to disk may itself require allocating memory for various I/O data structures. If the system cannot find even that memory, the very functions used to create free memory will be hamstrung and the system will likely run out of memory.

It is possible to do some minor tuning to start paging earlier, but if the system cannot write dirty pages out fast enough to free memory, one can only conclude that the workload is mis-sized for the installed memory and there is little to be done. Raising the value in /proc/sys/vm/min_free_kbytes will cause the system to start reclaiming memory at an earlier time than it would have before. This makes it harder to get into these kinds of deadlocks. If you get these deadlocks, this is a good value to tune.
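
A sketch of this tuning (the value 65536 is only an example and must be sized for the system; see the min_free_kbytes recommendation later in this document):

cat /proc/sys/vm/min_free_kbytes
# sysctl -w vm.min_free_kbytes=65536

To make the change persistent across reboots, add vm.min_free_kbytes=65536 to /etc/sysctl.conf.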

Alternatively, the kernel might have made a bad decision and misread its statistics, sometimes going OOM while it still had usable RAM. This would be a kernel bug that needs to be fixed.
 

How to Avoid OOM Situation?

Basically, an OOM error is not an OS problem; it is a tactic the kernel uses to keep the system running. Add more memory or increase swap space if physical memory/swap is too low on the affected box. Another solution may be to move some of the applications off the problematic system.
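
If more RAM cannot be added immediately, swap can be extended with a swap file as a stopgap (a sketch; /swapfile and the 2 GB size are only examples, choose a path and size appropriate for the system):

# dd if=/dev/zero of=/swapfile bs=1M count=2048
# mkswap /swapfile
# swapon /swapfile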

There are also many methods to avoid OOM Killer problem, such as:

  • Set up the hugemem kernel.
  • Set the kernel parameter vm.lower_zone_protection (this only applies to x86 32-bit environments, i.e. it does not apply to x86-64 etc.); see also Document 842886.1.
  • If vm.lower_zone_protection is not available, then vm.min_free_kbytes should be set. The current recommended setting is a value (x) > 4*sqrt(MemTotal), with MemTotal taken in kB from /proc/meminfo; see the sketch below.
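
A minimal sketch of computing that recommendation from the current system's memory (MemTotal is in kB, so the result is in kB as well):

awk '/^MemTotal:/ {printf "suggested vm.min_free_kbytes > %d\n", 4*sqrt($2)}' /proc/meminfo

The resulting value can then be applied with sysctl -w vm.min_free_kbytes=<value> and persisted in /etc/sysctl.conf.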

For more information, please refer to Document 452326.1, section "Troubleshooting LowMem Pressure Problems".
 

Tuning OOM Killer

With OL/RHEL 5, the oom_adj value in the /proc pseudo-filesystem for each process can be used to tune the likelihood of that process being terminated by the OOM killer. Higher values increase the likelihood of the process being killed. Valid values are in the range -16 to 15, plus the special value -17, which disables OOM-killing of that process altogether. The default is 0.

e.g. for process 2592:

# echo 10 > /proc/2592/oom_adj


This makes process 2592 a good candidate for OOM killing. To make OOM killing of the process less likely, set a negative value:

# echo -15 > /proc/2592/oom_adj


To disable the OOM killer for that specific process, use the special value -17:

# echo -17 > /proc/2592/oom_adj


Until that value is reset, the process will be protected from OOM killer activity. This can be set up for critical Oracle database background processes, as sketched below.
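
A hedged sketch (the process name patterns below are only examples; adjust them to the background processes and instance names on the system):

for pid in $(pgrep -f 'ora_pmon|ora_smon|ora_lgwr|ora_dbw|ora_ckpt'); do
    echo -17 > /proc/$pid/oom_adj
done

Note that the setting does not survive a process restart, so it must be reapplied whenever the instance is restarted.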
