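I. The kernel parameter zone_reclaim_mode
a. When a node runs short of free memory:
1. If the value is 0, the system tends to allocate memory from other nodes.
2. If the value is 1, the system tends to reclaim cache memory from the local node.
b. In most cases the cache matters a great deal for performance, so 0 is the better choice.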
II.
a. Workaround for "swapper allocate failed"
sysctl -w vm.zone_reclaim_mode=1
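Before changing the value with sysctl, it can help to check what the kernel is currently using. The following is a minimal sketch, assuming a Linux system that exposes the tunable at /proc/sys/vm/zone_reclaim_mode; the function name read_zone_reclaim_mode is made up for this illustration.

```python
from pathlib import Path

# procfs location of the tunable on Linux.
ZONE_RECLAIM_PATH = "/proc/sys/vm/zone_reclaim_mode"


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Return the current zone_reclaim_mode as an integer (hypothetical helper)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        print("vm.zone_reclaim_mode =", read_zone_reclaim_mode())
    except OSError as exc:
        # Not Linux, or this kernel does not expose the tunable.
        print("could not read zone_reclaim_mode:", exc)
```

Note that a value set with sysctl -w does not survive a reboot.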
b. Description in the kernel documentation
Zone_reclaim_mode allows someone to set more or less aggressive approaches to
reclaim memory when a zone runs out of memory. If it is set to zero then no
zone reclaim occurs. Allocations will be satisfied from other zones / nodes
in the system.
This is value ORed together of
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
zone_reclaim_mode is set during bootup to 1 if it is determined that pages
from remote zones will cause a measurable performance reduction. The
page allocator will then reclaim easily reusable pages (those page
cache pages that are currently not used) before allocating off node pages.
It may be beneficial to switch off zone reclaim if the system is
used for a file server and all of memory should be used for caching files
from disk. In that case the caching effect is more important than
data locality.
Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
reclaim will write out dirty pages if a zone fills up and so effectively
throttle the process. This may decrease the performance of a single process
since it cannot use all of system memory to buffer the outgoing writes
anymore but it preserve the memory on other nodes so that the performance
of other processes running on other nodes will not be affected.
Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.
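In short, a nonzero value tells the kernel to reclaim buffer/cache memory from a zone as soon as that zone runs short, instead of falling back to other zones or nodes.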
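Because the value is a set of flags ORed together, a setting such as 3 or 7 combines several behaviors. The sketch below decodes a value into the three flags quoted from the kernel documentation; the constant and function names are chosen for this illustration only.

```python
# Bit values as listed in the kernel documentation quoted above.
RECLAIM_ZONE = 1   # zone reclaim on
RECLAIM_WRITE = 2  # zone reclaim writes dirty pages out
RECLAIM_SWAP = 4   # zone reclaim swaps pages


def decode_zone_reclaim_mode(value):
    """Return human-readable descriptions of the flags set in value."""
    if value == 0:
        return ["zone reclaim off"]
    flags = []
    if value & RECLAIM_ZONE:
        flags.append("zone reclaim on")
    if value & RECLAIM_WRITE:
        flags.append("writes dirty pages out")
    if value & RECLAIM_SWAP:
        flags.append("swaps pages")
    return flags


if __name__ == "__main__":
    for v in (0, 1, 3, 7):
        print(v, decode_zone_reclaim_mode(v))
```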
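III. Memory management
get_page_from_freelist [mm/page_alloc.c] looks for available memory by walking the zones in the system, and its behavior during that walk differs slightly depending on the zone_reclaim_mode setting. zone_reclaim_mode is a configurable Linux parameter; to see how it affects memory allocation, open the code of get_page_from_freelist and follow the zone traversal step by step: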
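As shown above, when zone_reclaim_mode is nonzero and a zone does not have enough memory, the allocator tries to trigger one round of memory reclaim (zone_reclaim); when it is zero, the allocator moves straight on to the next zone.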
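As the flowchart shows, when zone_reclaim_mode is nonzero, get_page_from_freelist [mm/page_alloc.c] calls zone_watermark_ok to check free_area; if there is currently not enough free memory, it calls zone_reclaim [mm/vmscan.c] to reclaim memory, and zone_reclaim in turn calls __zone_reclaim [mm/vmscan.] to do the actual reclaim.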
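The traversal described above can be sketched as a toy model. This is not kernel code: the Zone fields, the page counts, and the simplified watermark check are all assumptions made for illustration, and only the function names mirror the ones mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    free: int         # free pages
    watermark: int    # pages that must stay free after an allocation
    reclaimable: int  # unused page-cache pages that reclaim could free


def zone_watermark_ok(zone, pages):
    """Simplified check: would the zone stay at or above its watermark?"""
    return zone.free - pages >= zone.watermark


def zone_reclaim(zone, pages):
    """Toy reclaim: free just enough cache pages to satisfy the request."""
    needed = pages + zone.watermark - zone.free
    freed = min(zone.reclaimable, max(needed, 0))
    zone.reclaimable -= freed
    zone.free += freed
    return freed


def get_page_from_freelist(zonelist, pages, zone_reclaim_mode):
    """Walk the zones in order; return the name of the zone used, or None."""
    for zone in zonelist:
        if not zone_watermark_ok(zone, pages):
            if zone_reclaim_mode == 0:
                continue  # no reclaim: try the next zone
            zone_reclaim(zone, pages)
            if not zone_watermark_ok(zone, pages):
                continue  # reclaim did not free enough: try the next zone
        zone.free -= pages
        return zone.name
    return None


def make_zones():
    # Local node is nearly full but holds plenty of cache; remote node is mostly free.
    return [Zone("node0", free=10, watermark=8, reclaimable=100),
            Zone("node1", free=500, watermark=8, reclaimable=0)]


if __name__ == "__main__":
    print(get_page_from_freelist(make_zones(), 16, 0))  # node1
    print(get_page_from_freelist(make_zones(), 16, 1))  # node0
```

With mode 0 the request is served from the remote node and the local cache is left alone; with mode 1 the local cache is shrunk so that the allocation stays on the local node.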