Understanding lowmem_reserve

The zone structure in the 2.6 kernel contains a member variable called lowmem_reserve:

struct zone {
    /* Fields commonly accessed by the page allocator */

    /* zone watermarks, access with *_wmark_pages(zone) macros */
    unsigned long watermark[NR_WMARK];

    /*
     * We don't know if the memory that we're going to allocate will be freeable
     * or/and it will be released eventually, so to avoid totally wasting several
     * GB of ram we must reserve some of the lower zone memory (otherwise we risk
     * to run OOM on the lower zones despite there's tons of freeable ram
     * on the higher zones). This array is recalculated at runtime if the
     * sysctl_lowmem_reserve_ratio sysctl changes.
     */
    unsigned long       lowmem_reserve[MAX_NR_ZONES];
    ...
};

When the kernel allocates memory, several zones may be involved: the allocator first tries the first zone in the zonelist and, if that fails, falls back to the next, lower zone ("lower" here refers only to the zone's position in physical memory; in practice the low-address zones are the scarcer resource). Imagine an application that maps memory, gets its pages from Highmem, and pins them with mlock. If the Highmem zone cannot satisfy the request, the allocation falls back to the Normal zone. This creates a problem: requests that originally targeted Highmem can exhaust the Normal zone, and because the pages are mlocked they cannot be reclaimed. The end result is that the Normal zone has nothing left for the kernel's own allocations, while Highmem still holds plenty of reclaimable memory that goes unused.

lowmem_reserve addresses exactly this case. When the Normal zone sees an allocation request that fell back from Highmem, it can effectively declare: "you may use my memory, but lowmem_reserve[HIGHMEM] pages of it must be kept for my own use" (each zone's lowmem_reserve array is indexed by the zone the request originally targeted).

Likewise, when allocation from Normal also fails, the allocator falls back to the DMA zone in the zonelist, and the DMA zone's lowmem_reserve entries limit the requests coming from Highmem and Normal in the same way.
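This check happens on the allocation path. The snippet below is a minimal, self-contained sketch of the idea (the struct and function names are simplified stand-ins invented for illustration, not the kernel's; the real test is part of zone_watermark_ok() in mm/page_alloc.c): a lower zone adds lowmem_reserve[classzone_idx], indexed by the zone the request originally targeted, on top of its normal watermark before deciding whether it can serve the request.

#include <stdbool.h>
#include <stdio.h>

#define MAX_NR_ZONES 4

/* Simplified stand-in for struct zone: only the fields this check needs. */
struct zone_sketch {
    unsigned long free_pages;                   /* pages currently free         */
    unsigned long watermark_min;                /* the zone's normal watermark  */
    unsigned long lowmem_reserve[MAX_NR_ZONES]; /* extra reserve, indexed by the
                                                   zone the request originally
                                                   targeted                     */
};

/*
 * Can this (lower) zone serve a request that originally targeted the zone at
 * index classzone_idx?  The higher the originating zone, the larger the
 * reserve added on top of the watermark -- the same idea as the check in the
 * kernel's zone_watermark_ok().
 */
static bool zone_can_allocate(const struct zone_sketch *z,
                              int classzone_idx, unsigned long nr_pages)
{
    unsigned long min = z->watermark_min + z->lowmem_reserve[classzone_idx];

    return z->free_pages >= min + nr_pages;
}

int main(void)
{
    /* A made-up Normal zone: 1000 free pages, watermark 200, and a reserve of
     * 500 pages against requests that fell back from the zone at index 3
     * (say, Highmem in this invented layout). */
    struct zone_sketch normal = {
        .free_pages     = 1000,
        .watermark_min  = 200,
        .lowmem_reserve = { 0, 0, 0, 500 },
    };

    printf("request targeting this zone (idx 1), 400 pages: %s\n",
           zone_can_allocate(&normal, 1, 400) ? "ok" : "refused");
    printf("request fallen back from idx 3, 400 pages:      %s\n",
           zone_can_allocate(&normal, 3, 400) ? "ok" : "refused");
    return 0;
}

With these numbers a 400-page request that targets the zone itself succeeds, while the same request fallen back from the higher zone is refused even though 1000 pages are still free: that remainder is exactly what lowmem_reserve protects.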

/*
 * results with 256, 32 in the lowmem_reserve sysctl:
 *  1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
 *  1G machine -> (16M dma, 784M normal, 224M high)
 *  NORMAL allocation will leave 784M/256 of ram reserved in the ZONE_DMA
 *  HIGHMEM allocation will leave 224M/32 of ram reserved in ZONE_NORMAL
 *  HIGHMEM allocation will leave (224M+784M)/256 of ram reserved in ZONE_DMA
 *
 * TBD: should special case ZONE_DMA32 machines here - in those we normally
 * don't need any ZONE_NORMAL reservation
 */
int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
#ifdef CONFIG_ZONE_DMA
     256,
#endif
#ifdef CONFIG_ZONE_DMA32
     256,
#endif
#ifdef CONFIG_HIGHMEM
     32,
#endif
     32,
};

If you do not want a lower zone to serve allocations that fell back from higher zones at all, you can set the corresponding ratio to 1, which gives the maximum protection: the reserve then equals the full size of the higher zones.

The way this value is computed does look odd, though: for an allocation coming from NORMAL, the DMA zone's reserve is lowmem_reserve[DMA] = normal_size / ratio, i.e. it uses the Normal zone's size rather than the DMA zone's size, which I have never quite figured out.
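Plugging the 1G example from the kernel comment above into this formula: with a ratio of 256, a NORMAL allocation falling back into ZONE_DMA has to leave 784M/256 ≈ 3MB untouched there, and a HIGHMEM allocation has to leave (784M+224M)/256 ≈ 4MB, because the calculation sums the managed pages of every zone above the one being protected, as the loop in the next section shows.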


The lowmem_reserve array

When the user changes the ratios through /proc/sys/vm/lowmem_reserve_ratio, the kernel calls setup_per_zone_lowmem_reserve() to recompute the reserves:

5637         for_each_online_pgdat(pgdat) {
5638                 for (j = 0; j < MAX_NR_ZONES; j++) {
5639                         struct zone *zone = pgdat->node_zones + j;
5640                         unsigned long managed_pages = zone->managed_pages;
5641 
5642                         zone->lowmem_reserve[j] = 0;
5643 
5644                         idx = j;
5645                         while (idx) {
5646                                 struct zone *lower_zone;
5647 
5648                                 idx--;
5649 
5650                                 if (sysctl_lowmem_reserve_ratio[idx] < 1)
5651                                         sysctl_lowmem_reserve_ratio[idx] = 1;
5652 
5653                                 lower_zone = pgdat->node_zones + idx;
5654                                 lower_zone->lowmem_reserve[j] = managed_pages /
5655                                         sysctl_lowmem_reserve_ratio[idx];
5656                                 managed_pages += lower_zone->managed_pages;
5657                         }
5658                 }
5659         }
5660 
5661         /* update totalreserve_pages */
5662         calculate_totalreserve_pages();

Assume our system contains only two zones: a Normal zone and a Highmem zone.

Line 5642 sets zone->lowmem_reserve[j] to 0, i.e. normal_zone->lowmem_reserve[0] and highmem_zone->lowmem_reserve[1] are both 0.
In other words, the Normal zone puts no restriction on allocations that target Normal, and the Highmem zone puts no restriction on allocations that target Highmem.

The inner loop at lines 5645-5657 sets each lower zone's lowmem_reserve entry for the current zone, based on the pages managed by the current zone, accumulating the managed pages of every zone it passes as it walks downward.
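To make the cumulative logic concrete, the following standalone program replays the same loop in userspace for a hypothetical three-zone node (DMA, Normal, HighMem). The zone sizes are made up to match the 16M/784M/224M split from the kernel comment quoted earlier, assuming 4K pages, and the ratios are the defaults shown above (256 for DMA, 32 for Normal):

#include <stdio.h>

#define MAX_NR_ZONES 3
enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

static const char *names[MAX_NR_ZONES] = { "DMA", "Normal", "HighMem" };

int main(void)
{
    /* Managed pages per zone (4K pages): ~16M DMA, ~784M Normal, ~224M HighMem. */
    unsigned long managed[MAX_NR_ZONES]   = { 4096, 200704, 57344 };
    unsigned long ratio[MAX_NR_ZONES - 1] = { 256, 32 };
    unsigned long reserve[MAX_NR_ZONES][MAX_NR_ZONES] = { { 0 } };

    /*
     * Mirror of the loop in setup_per_zone_lowmem_reserve(): for each zone j,
     * walk downward and give every lower zone a reserve equal to the total
     * managed pages of all zones above it, up to and including j, divided by
     * that lower zone's ratio.
     */
    for (int j = 0; j < MAX_NR_ZONES; j++) {
        unsigned long pages = managed[j];

        reserve[j][j] = 0;                       /* no reserve against itself     */
        for (int idx = j - 1; idx >= 0; idx--) {
            reserve[idx][j] = pages / ratio[idx];
            pages += managed[idx];               /* accumulate while walking down */
        }
    }

    for (int i = 0; i < MAX_NR_ZONES; i++)
        for (int j = i + 1; j < MAX_NR_ZONES; j++)
            printf("%-7s reserves %4lu pages against %s allocations\n",
                   names[i], reserve[i][j], names[j]);
    return 0;
}

The output matches the figures in the kernel comment: DMA reserves 784 pages (784M/256) against Normal allocations and 1008 pages ((224M+784M)/256) against HighMem allocations, while Normal reserves 1792 pages (224M/32) against HighMem allocations. This also shows why the formula uses the higher zone's size rather than the lower zone's own size: presumably the reserve is meant to scale with the amount of memory that could spill down from above.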



In addition, Documentation/sysctl/vm.txt in newer kernel source trees gives a very accurate description of lowmem_reserve_ratio.






