Open Source Tech Sharing: A Detailed Look at the Linux Kernel Parameter swappiness

OpenInfra

Published 2018-07-20 11:35:14


Tags: Linux, swappiness, 九州云, open-source cloud, private cloud

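Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/OpenInfra/article/details/81129444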


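This article is a source-code analysis of swappiness (based on kernel version v4.14-13151-g5a787756b809). It reflects only my personal understanding; corrections and discussion are welcome.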

About Swap and swappiness

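Swap (the swap partition) is how the operating system relieves a shortage of memory. When memory is tight, the kernel makes a decision based on some configuration values and current statistics, and may move some anon memory (anonymous memory allocated to processes, i.e. memory not backed by a file) out to the swap partition.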

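swappiness is a system parameter that controls how readily swap is used. The Linux documentation (Documentation/sysctl/vm.txt) describes it as follows: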

swappiness

This control is used to define how aggressive the kernel will swap memory pages. Higher values will increase aggressiveness, lower values decrease the amount of swap. A value of 0 instructs the kernel not to initiate swap until the amount of free and file-backed pages is less than the high water mark in a zone.

The default value is 60.

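In plain terms: the parameter defines how aggressively the kernel swaps memory pages out. Higher values make swapping more aggressive and lower values reduce the amount of swapping. A value of 0 tells the kernel not to start swapping until the number of free and file-backed pages in a zone drops below that zone's high watermark. The default is 60.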
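On a running system the value lives in /proc/sys/vm/swappiness, the file behind the vm.swappiness sysctl. Below is a minimal C sketch for reading it. parse_swappiness() and read_swappiness() are hypothetical helpers of my own, and the 0 to 100 range check assumes the v4.14 kernel analysed here (kernels from 5.8 on accept values up to 200).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse the text of /proc/sys/vm/swappiness.
 * Returns the value, or -1 if the text is not an integer in 0..100
 * (the range accepted by the v4.14 kernel discussed here). */
int parse_swappiness(const char *text)
{
	char *end;
	long v;

	assert(text != NULL);
	errno = 0;
	v = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;
	/* procfs output ends with a single newline; allow it. */
	if (*end == '\n')
		end++;
	if (*end != '\0' || v < 0 || v > 100)
		return -1;
	return (int)v;
}

/* Hypothetical helper: read the live value; returns -1 on any error. */
int read_swappiness(void)
{
	char buf[32];
	FILE *f = fopen("/proc/sys/vm/swappiness", "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_swappiness(buf);
}
```

read_swappiness() returns -1 if the file cannot be read. On most systems it reports 60 unless the distribution or an administrator has changed it, for example with `sysctl -w vm.swappiness=10`.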

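The word "aggressive" here is rather vague and gives only a rough idea of what the value means. In some environments, users keep complaining that swap usage is high even though there is clearly plenty of available memory.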

Linux memory allocation

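Linux memory allocations usually carry some flags that affect the allocation path; I won't go into them here. This section covers the common case, in which the allocation is allowed to wait for memory to be freed (true of user-space allocations and most kernel-space allocations).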

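__alloc_pages first walks the memory zones in order, looking for the first zone with a large enough free block; if a zone is full, it tries the next one. If no zone can satisfy the request, the allocator enters its slow path, which wakes kswapd and, for allocations that are allowed to sleep, performs direct reclaim. (When CPUSETS is enabled, only the zones permitted by the task's cpuset are considered.)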
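The zone fallback described above can be pictured with the toy model below. It is not kernel code: struct zone_model, pick_zone() and all page counts are hypothetical, and the single comparison against the low watermark stands in for the kernel's much richer zone_watermark_ok() check (in v4.14 the first, fast-path attempt uses the low watermark).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of one zone; the real
 * struct zone and its watermark checks are far more involved. */
struct zone_model {
	const char *name;
	unsigned long free_pages;
	unsigned long low_wmark;   /* pages that must stay free */
};

/* Walk the zones in fallback order and return the index of the first
 * one that can hand out nr_pages while staying at or above its low
 * watermark. Returns -1 when every zone is too full: the point where
 * the real allocator would leave the fast path and start reclaim. */
int pick_zone(const struct zone_model *zones, size_t nr_zones,
	      unsigned long nr_pages)
{
	assert(zones != NULL || nr_zones == 0);
	for (size_t i = 0; i < nr_zones; i++) {
		if (zones[i].free_pages >= nr_pages &&
		    zones[i].free_pages - nr_pages >= zones[i].low_wmark)
			return (int)i;
	}
	return -1;
}
```

With the zones listed in fallback order (for example Normal before DMA32), a result of -1 is where the real allocator would wake kswapd and, if it may sleep, reclaim directly.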

swappiness takes effect mainly during this memory reclaim.

Ways of reclaiming memory

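Broadly, there are two ways to reclaim memory: freeing file-related memory (the page cache), and swapping anon memory (memory allocated to processes) out to the swap partition.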

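One principle of Linux memory management is to make as much use of memory as possible. When files are read or written, their cache stays in memory, and nothing proactively releases it until memory runs short. The next read of a cached file can then be served straight from memory without disk I/O, which makes file reads much faster.

As a result, even when plenty of memory is "available" (much of it being reclaimable cache), free memory can still run low, which triggers reclaim and swaps part of memory out to the swap partition.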
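The gap between free and available memory shows up directly in /proc/meminfo: MemFree counts unused pages, while MemAvailable (Linux 3.14 and later) estimates how much could be used without swapping, including reclaimable page cache. The sketch below pulls single fields out of meminfo-style text; meminfo_kb() is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the line "key:" in /proc/meminfo-style
 * text and return its value in kB, or -1 if the key is missing. */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	assert(text != NULL && key != NULL);
	while (line != NULL && *line != '\0') {
		long kb;

		/* Match the whole key, so "Mem" does not match "MemFree". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &kb) == 1)
			return kb;
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}
```

Fed the contents of /proc/meminfo, a machine in the situation described above shows a small MemFree next to a large Cached and MemAvailable.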

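How swappiness takes effect

swappiness is used in the get_scan_count() function in mm/vmscan.c.

As the code below shows, when no swap space is left (none is configured, or it is full), or the reclaim context does not allow swapping (!sc->may_swap), this parameter has no effect: only file pages are scanned.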

```c
/* If we have no swap space, do not bother scanning anon pages. */
if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) {
	scan_balance = SCAN_FILE;
	goto out;
}
```

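When reclaim targets a memory cgroup rather than the whole system (!global_reclaim(sc)) and swappiness is 0, only the file cache is scanned; anon pages are never considered for swapping.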

```c
/*
 * Global reclaim will swap to prevent OOM even with no
 * swappiness, but memcg users want to use this knob to
 * disable swapping for individual groups completely when
 * using the memory controller's swap limit feature would be
 * too expensive.
 */
if (!global_reclaim(sc) && !swappiness) {
	scan_balance = SCAN_FILE;
	goto out;
}
```

When the system is close to OOM (the reclaim priority has dropped to 0) and swappiness is non-zero, anon and file memory are scanned equally.

```c
/*
 * Do not apply any pressure balancing cleverness when the
 * system is close to OOM, scan both anon and file equally
 * (unless the swappiness setting disagrees with swapping).
 */
if (!sc->priority && swappiness) {
	scan_balance = SCAN_EQUAL;
	goto out;
}
```
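Why does priority 0 mean "close to OOM"? In v4.14, sc->priority starts at DEF_PRIORITY (12) and is decremented on each reclaim pass that fails to free enough, and each LRU list's scan target is size >> priority (the same shift used in the checks that follow). The sketch below, with hypothetical names, makes that window concrete and restates the SCAN_EQUAL condition.

```c
#include <assert.h>

#define MODEL_DEF_PRIORITY 12	/* mirrors DEF_PRIORITY in the kernel */

/* Pages targeted on an LRU list of lru_size pages at a given
 * priority; the kernel computes size >> sc->priority. */
unsigned long scan_target(unsigned long lru_size, int priority)
{
	assert(priority >= 0 && priority <= MODEL_DEF_PRIORITY);
	return lru_size >> priority;
}

/* Hypothetical restatement of the SCAN_EQUAL test quoted above:
 * returns 1 at priority 0 unless swappiness is 0. */
int scan_equal(int priority, int swappiness)
{
	return priority == 0 && swappiness != 0;
}
```

For a list of 1,048,576 pages (4 GiB with 4 KiB pages), the target is 256 pages at priority 12 and the whole list at priority 0. A small list yields 0 at high priorities, which is why the later checks test whether `lruvec_lru_size(...) >> sc->priority` is non-zero.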

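The next check applies only to global reclaim. When the node's free pages plus file pages have fallen to or below the sum of its zones' high watermarks, reclaim switches to anon memory only (SCAN_ANON), provided the inactive anon list is large enough; otherwise the small file LRU would be thrashed. Combined with the branches above, this means that with swappiness 0 the kernel reclaims only file cache while free plus file pages stay above the high watermark, and considers swapping memory out only once they drop to or below it.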

```c
/*
 * Prevent the reclaimer from falling into the cache trap: as
 * cache pages start out inactive, every cache fault will tip
 * the scan balance towards the file LRU. And as the file LRU
 * shrinks, so does the window for rotation from references.
 * This means we have a runaway feedback loop where a tiny
 * thrashing file LRU becomes infinitely more attractive than
 * anon pages. Try to detect this based on file LRU size.
 */
if (global_reclaim(sc)) {
	unsigned long pgdatfile;
	unsigned long pgdatfree;
	int z;
	unsigned long total_high_wmark = 0;

	pgdatfree = sum_zone_node_page_state(pgdat->node_id, NR_FREE_PAGES);
	pgdatfile = node_page_state(pgdat, NR_ACTIVE_FILE) +
		   node_page_state(pgdat, NR_INACTIVE_FILE);

	for (z = 0; z < MAX_NR_ZONES; z++) {
		struct zone *zone = &pgdat->node_zones[z];
		if (!managed_zone(zone))
			continue;

		total_high_wmark += high_wmark_pages(zone);
	}

	if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
		/*
		 * Force SCAN_ANON if there are enough inactive
		 * anonymous pages on the LRU in eligible zones.
		 * Otherwise, the small LRU gets thrashed.
		 */
		if (!inactive_list_is_low(lruvec, false, memcg, sc, false) &&
		    lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
				>> sc->priority) {
			scan_balance = SCAN_ANON;
			goto out;
		}
	}
}
```
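The cache-trap test above boils down to one comparison per node. The model below restates it in plain C. struct zone_wm and force_scan_anon() are hypothetical; the model assumes the caller already knows it is in global reclaim, and it takes the result of inactive_list_is_low() for the anon list as an input rather than reimplementing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone input for the model. */
struct zone_wm {
	bool managed;              /* managed_zone(zone) in the kernel */
	unsigned long high_wmark;  /* high_wmark_pages(zone) */
};

/* Model of the cache-trap check (global reclaim only): force anon-only
 * scanning when free + file pages on the node are at or below the sum
 * of the managed zones' high watermarks, provided the inactive anon
 * list is not low and still yields pages at this priority. */
bool force_scan_anon(unsigned long free_pages, unsigned long file_pages,
		     const struct zone_wm *zones, size_t nr_zones,
		     bool inactive_anon_low, unsigned long inactive_anon,
		     int priority)
{
	unsigned long total_high_wmark = 0;

	assert(priority >= 0);
	for (size_t i = 0; i < nr_zones; i++) {
		if (!zones[i].managed)
			continue;	/* unmanaged zones do not count */
		total_high_wmark += zones[i].high_wmark;
	}
	if (file_pages + free_pages > total_high_wmark)
		return false;
	return !inactive_anon_low && (inactive_anon >> priority) != 0;
}
```

For example, with managed zones whose high watermarks sum to 4,000 pages, 1,000 free pages plus 2,000 file pages trips the check, whereas 1,000 free plus 5,000 file pages does not. The second condition matters too: an inactive anon list of 100 pages yields nothing at priority 12 (100 >> 12 is 0), so the check falls through at that priority even when the watermark condition holds.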

When there are enough inactive page-cache pages, only the file cache is reclaimed for now.

```c
/*
 * If there is enough inactive page cache, i.e. if the size of the
 * inactive list is greater than that of the active list *and* the
 * inactive list actually has some pages to scan on this priority, we
 * do not reclaim anything from the anonymous working set right now.
 * Without the second condition we could end up never scanning an
 * lruvec even if it has plenty of old anonymous pages unless the
 * system is under heavy pressure.
 */
if (!inactive_list_is_low(lruvec, true, memcg, sc, false) &&
    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
	scan_balance = SCAN_FILE;
	goto out;
}
```

Note that this is where swappiness starts to play its general role: in the SCAN_FRACT case, anon_prio is set to the swappiness value and file_prio to 200 - anon_prio.

```c
scan_balance = SCAN_FRACT;

/*
 * With swappiness at 100, anonymous and file have the same priority.
 * This scanning priority is essentially the inverse of IO cost.
 */
anon_prio = swappiness;
file_prio = 200 - anon_prio;
```
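The two base weights can be tabulated with a one-line function. scan_prios() and struct prio_pair are hypothetical names; the arithmetic is the same as in the snippet above, and the 0 to 100 bound again assumes v4.14.

```c
#include <assert.h>

/* Hypothetical pair of base weights for the SCAN_FRACT case. */
struct prio_pair {
	unsigned long anon_prio;
	unsigned long file_prio;
};

struct prio_pair scan_prios(int swappiness)
{
	struct prio_pair p;

	assert(swappiness >= 0 && swappiness <= 100);
	p.anon_prio = (unsigned long)swappiness;  /* anon_prio = swappiness */
	p.file_prio = 200 - p.anon_prio;          /* file_prio = 200 - anon_prio */
	return p;
}
```

At the default of 60 the weights are 60 : 140, so before the rotation statistics are applied the anon list is weighted at roughly 30% and the file list at roughly 70%. At 100 the two are equal, and at 0 anon gets no weight at all.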

anon_prio and file_prio are then used to compute ap and fp, the scan pressure on the anon and file lists:

```c
/*
 * The amount of pressure on anon vs file pages is inversely
 * proportional to the fraction of recently scanned pages on
 * each list that were recently referenced and in active use.
 */
ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
ap /= reclaim_stat->recent_rotated[0] + 1;

fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
fp /= reclaim_stat->recent_rotated[1] + 1;
```

The remaining details and the rest of the algorithm are left for a later analysis.

Summary

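swappiness comes into play only when memory is tight (here meaning that free memory is low). Specifically:

When swappiness is 0, only the file cache is reclaimed while available memory is sufficient (more precisely, while free plus file-backed pages stay above the high watermark); once that is no longer the case, some memory is swapped out. In memory-cgroup reclaim, a swappiness of 0 disables swapping for that cgroup entirely.
When swappiness is non-zero, its value mainly controls the ratio between swapping anon memory out and releasing file cache each time memory gets tight.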

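Note: many people assume that swappiness makes the kernel start swapping once the percentage of remaining memory falls to the swappiness value. That is a misconception.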
