Initialization of section_mem_map in the sparse memory model (SPARSEMEM)

In Linux, the sparse memory model (SPARSEMEM) treats the physical address space as a collection of address ranges that may have holes between them. It suits sparsely populated physical address layouts and is the model used to support memory hotplug.
SPARSEMEM divides the whole physical address space into sections. Each section is described by a struct mem_section, whose most important field is section_mem_map: it encodes the NUMA node id (during early boot), the address of the section's struct page array, and a few flag bits.
Declaration of the mem_section array:

/*
 * Permanent SPARSEMEM data:
 *
 * 1) mem_section   - memory sections, mem_map's for valid memory
 */
#ifdef CONFIG_SPARSEMEM_EXTREME
struct mem_section **mem_section;
#else
struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
    ____cacheline_internodealigned_in_smp;
#endif
EXPORT_SYMBOL(mem_section);
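
With CONFIG_SPARSEMEM_EXTREME the array is two-level (the root array is allocated during boot); otherwise it is a flat static array. Either way, a section number is translated into a struct mem_section pointer via a root index plus an offset within that root. For reference, the lookup helpers in include/linux/mmzone.h of the same kernel version read roughly as follows:

#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
#define SECTION_ROOT_MASK       (SECTIONS_PER_ROOT - 1)

static inline struct mem_section *__nr_to_section(unsigned long nr)
{
#ifdef CONFIG_SPARSEMEM_EXTREME
    if (!mem_section)
        return NULL;
#endif
    if (!mem_section[SECTION_NR_TO_ROOT(nr)])
        return NULL;
    return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
}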

Definition of struct mem_section:

struct mem_section {
    /*
     * This is, logically, a pointer to an array of struct
     * pages.  However, it is stored with some other magic.
     * (see sparse.c::sparse_init_one_section())
     *
     * Additionally during early boot we encode node id of
     * the location of the section here to guide allocation.
     * (see sparse.c::memory_present())
     *
     * Making it a UL at least makes someone do a cast
     * before using it wrong.
     */
    unsigned long section_mem_map;
   
    /* See declaration of similar field in struct zone */
    unsigned long *pageblock_flags;
#ifdef CONFIG_PAGE_EXTENSION
    /*
     * If SPARSEMEM, pgdat doesn't have page_ext pointer. We use
     * section. (see page_ext.h about this.)
     */
    struct page_ext *page_ext;
    unsigned long pad;
#endif
    /*
     * WARNING: mem_section must be a power-of-2 in size for the
     * calculation and use of SECTION_ROOT_MASK to make sense.
     */
};
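
The low bits of section_mem_map carry flag bits (and, during early boot, the node id shifted above them), while the remaining bits hold the encoded mem_map pointer. In 4.19 the relevant definitions and the decode helper in include/linux/mmzone.h look roughly like this:

#define SECTION_MARKED_PRESENT  (1UL<<0)
#define SECTION_HAS_MEM_MAP     (1UL<<1)
#define SECTION_IS_ONLINE       (1UL<<2)
#define SECTION_MAP_LAST_BIT    (1UL<<3)
#define SECTION_MAP_MASK        (~(SECTION_MAP_LAST_BIT-1))
#define SECTION_NID_SHIFT       3

/* Strip the flag bits to recover the encoded mem_map "pointer". */
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
    unsigned long map = section->section_mem_map;
    map &= SECTION_MAP_MASK;
    return (struct page *)map;
}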

The excerpt below (from kernel 4.19) initializes every present section of one node in the mem_section array. sparse_buffer_init() pre-allocates space for the node's struct page arrays. pnum is the index of a section when the mem_section array is flattened into one dimension; it identifies a section and, with it, a sub-array of struct page. sparse_mem_map_populate() returns the start address of the struct page sub-array for pnum, and sparse_init_one_section() fills in the section_mem_map field of the mem_section that pnum refers to.

/*       
 * Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end)
 * And number of present sections in this node is map_count.
 */      
static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
                   unsigned long pnum_end,
                   unsigned long map_count)
{
    unsigned long pnum, usemap_longs, *usemap;
    struct page *map;

    ...
    sparse_buffer_init(map_count * section_map_size(), nid);
    for_each_present_section_nr(pnum_begin, pnum) {
        if (pnum >= pnum_end)
            break;

        map = sparse_mem_map_populate(pnum, nid, NULL);
        ...
        sparse_init_one_section(__nr_to_section(pnum), pnum, map, usemap);
        ...
    }
    sparse_buffer_fini();
    return;
    ...
}
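
For context, the non-VMEMMAP variant of sparse_mem_map_populate() in 4.19 first tries to carve the section's memmap out of the buffer pre-allocated by sparse_buffer_init(), and only falls back to a fresh per-section allocation when that fails. Slightly abridged:

#ifndef CONFIG_SPARSEMEM_VMEMMAP
struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
        struct vmem_altmap *altmap)
{
    unsigned long size = section_map_size();
    struct page *map = sparse_buffer_alloc(size);

    if (map)
        return map;
    /* Pre-allocated buffer exhausted: fall back to a separate allocation. */
    ...
}
#endif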

If every section of a node is present, the section_mem_map value of each of its sections should be identical: sparse_buffer_init() has already pre-allocated one contiguous struct page array for the whole node, and as each section is processed the start address of the struct page sub-array for pnum is obtained and, after a simple transformation, stored into section_mem_map. The transformation is what guarantees equality: section_mem_map = (unsigned long)(map - section_nr_to_pfn(pnum)). The subtraction is struct page pointer arithmetic, and map and pnum advance in lockstep (per section, map grows by PAGES_PER_SECTION struct page entries while section_nr_to_pfn(pnum) grows by PAGES_PER_SECTION pfns), so the difference is the same for every (map, pnum) pair.
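
The transformation is performed by sparse_encode_mem_map(), called from sparse_init_one_section(); in 4.19 (mm/sparse.c) the two look roughly like this:

/*
 * Subtle: the real pfn is encoded into the mem_map pointer such that
 * "pfn + decoded section_mem_map" yields the page's descriptor.
 */
static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long pnum)
{
    unsigned long coded_mem_map =
        (unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
    BUILD_BUG_ON(SECTION_MAP_LAST_BIT > (1UL<<PFN_SECTION_SHIFT));
    BUG_ON(coded_mem_map & ~SECTION_MAP_MASK);
    return coded_mem_map;
}

static void __meminit sparse_init_one_section(struct mem_section *ms,
        unsigned long pnum, struct page *mem_map,
        unsigned long *pageblock_bitmap)
{
    ms->section_mem_map &= ~SECTION_MAP_MASK;
    ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) |
                            SECTION_HAS_MEM_MAP;
    ms->pageblock_flags = pageblock_bitmap;
}
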
In practice, however, the last section's section_mem_map often turns out to be different. Debugging showed that sparse_buffer_init() allocates its buffer with PAGE_SIZE alignment, whereas the per-section allocation made from sparse_mem_map_populate() is aligned to section_map_size(), which is on the order of sizeof(struct page) * PAGES_PER_SECTION (PAGES_PER_SECTION being the number of pages in one section). The latter is normally larger than the former, so by the time the last section is reached the pre-allocated struct page buffer has run out; the last section then has to be allocated separately, and the part of the buffer skipped for alignment is wasted.
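
The mismatch can be seen in sparse_buffer_alloc() (mm/sparse.c, 4.19, roughly reproduced below): the requested size is also used as the alignment, so when it exceeds the PAGE_SIZE alignment of the buffer reserved by sparse_buffer_init(), PTR_ALIGN() skips part of the buffer and the last section no longer fits:

void * __meminit sparse_buffer_alloc(unsigned long size)
{
    void *ptr = NULL;

    if (sparsemap_buf) {
        /* Align to the requested size; the skipped bytes are never reclaimed. */
        ptr = PTR_ALIGN(sparsemap_buf, size);
        if (ptr + size > sparsemap_buf_end)
            ptr = NULL;     /* buffer exhausted: caller falls back */
        else
            sparsemap_buf = ptr + size;
    }
    return ptr;
}
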
Looking at the current kernel code, this was fixed by commit 09dbcf422e9b791d2d43cad8c283d9bdaef019a9:

commit 09dbcf422e9b791d2d43cad8c283d9bdaef019a9
Author: Michal Hocko <mhocko@suse.com>
Date:   Sat Nov 30 17:54:27 2019 -0800

    mm/sparse.c: do not waste pre allocated memmap space
    
    Vincent has noticed [1] that there is something unusual with the memmap
    allocations going on on his platform
    
    : I noticed this because on my ARM64 platform, with 1 GiB of memory the
    : first [and only] section is allocated from the zeroing path while with
    : 2 GiB of memory the first 1 GiB section is allocated from the
    : non-zeroing path.
    
    The underlying problem is that although sparse_buffer_init allocates
    enough memory for all sections on the node sparse_buffer_alloc is not
    able to consume them due to mismatch in the expected allocation
    alignement.  While sparse_buffer_init preallocation uses the PAGE_SIZE
    alignment the real memmap has to be aligned to section_map_size() this
    results in a wasted initial chunk of the preallocated memmap and
    unnecessary fallback allocation for a section.
    
    While we are at it also change __populate_section_memmap to align to the
    requested size because at least VMEMMAP has constrains to have memmap
    properly aligned.
    
    [1] http://lkml.kernel.org/r/20191030131122.8256-1-vincent.whitchurch@axis.com
    
    [akpm@linux-foundation.org: tweak layout, per David]
    Link: http://lkml.kernel.org/r/20191119092642.31799-1-mhocko@kernel.org
    Fixes: 35fd1eb1e821 ("mm/sparse: abstract sparse buffer allocations")
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
    Debugged-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Oscar Salvador <OSalvador@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

 
