Linux 5.3 kernel source analysis; Linux 3.4.10 kernel memory management source code analysis 5: buddy system initialization

4638                                 unsigned long kernel_pages;
4639                                 kernel_pages = min(end_pfn, usable_startpfn)
4640                                                                 - start_pfn;
4641
4642                                 kernelcore_remaining -= min(kernel_pages,
4643                                                         kernelcore_remaining);
4644                                 required_kernelcore -= min(kernel_pages,
4645                                                         required_kernelcore);
4646
4647                                 /* Continue if range is now fully accounted */
4648                                 if (end_pfn <= usable_startpfn) {
4649
4650                                         /*
4651                                          * Push zone_movable_pfn to the end so
4652                                          * that if we have to rebalance
4653                                          * kernelcore across nodes, we will
4654                                          * not double account here
4655                                          */
4656                                         zone_movable_pfn[nid] = end_pfn;
4657                                         continue;
4658                                 }
4659                                 start_pfn = usable_startpfn;
4660                         }
4661
4662                         /*
4663                          * The usable PFN range for ZONE_MOVABLE is from
4664                          * start_pfn->end_pfn. Calculate size_pages as the
4665                          * number of pages used as kernelcore
4666                          */
4667                         size_pages = end_pfn - start_pfn;
4668                         if (size_pages > kernelcore_remaining)
4669                                 size_pages = kernelcore_remaining;
4670                         zone_movable_pfn[nid] = start_pfn + size_pages;
4671
4672                         /*
4673                          * Some kernelcore has been met, update counts and
4674                          * break if the kernelcore for this node has been
4675                          * satisified
4676                          */
4677                         required_kernelcore -= min(required_kernelcore,
4678                                                                 size_pages);
4679                         kernelcore_remaining -= size_pages;
4680                         if (!kernelcore_remaining)
4681                                 break;
4682                 }
4683         }
4684
4685         /*
4686          * If there is still required_kernelcore, we do another pass with one
4687          * less node in the count. This will push zone_movable_pfn[nid] further
4688          * along on the nodes that still have memory until kernelcore is
4689          * satisified
4690          */
4691         usable_nodes--;
4692         if (usable_nodes && required_kernelcore > usable_nodes)
4693                 goto restart;
4694
4695         /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
4696         for (nid = 0; nid < MAX_NUMNODES; nid++)
4697                 zone_movable_pfn[nid] =
4698                         roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
4699
4700 out:
4701         /* restore the node_state */
4702         node_states[N_HIGH_MEMORY] = saved_node_state;
4703 }

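The purpose of this function is to compute the zone_movable_pfn array. Two variables, required_movablecore and required_kernelcore, take their values from the kernel command line: required_movablecore tells the kernel how many pages to reserve for ZONE_MOVABLE, and required_kernelcore is the number of pages to reserve for non-ZONE_MOVABLE zones.

Lines 4585-4601: lines 4585 and 4600 show that if neither value was set on the command line, the function jumps straight to the out label, which leaves ZONE_MOVABLE empty. corepages is derived from the total page count returned by early_calculate_totalpages. roundup(x, y) is a macro that returns the smallest multiple of y that is greater than or equal to x. Lines 4592-4593 round required_movablecore up to a multiple of MAX_ORDER_NR_PAGES. Line 4596 sets required_kernelcore to the larger of the requested required_kernelcore and the pages left over after setting aside required_movablecore pages; in effect, pages are given to non-ZONE_MOVABLE zones first.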
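
The rounding and the max step described above can be sketched in user-space C. This is only an illustration under stated assumptions: ROUNDUP reproduces the behavior the text attributes to roundup, while kernelcore_target, the underflow guard, and the block size of 1024 pages are hypothetical stand-ins rather than kernel definitions.

```c
/* Smallest multiple of y that is >= x, as the text describes for roundup(). */
#define ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Assumed value for illustration; the real MAX_ORDER_NR_PAGES depends on the configuration. */
#define SKETCH_MAX_ORDER_NR_PAGES 1024UL

/*
 * Hypothetical helper: returns how many pages should stay outside
 * ZONE_MOVABLE, given the total page count and the two command-line requests.
 * A result of 0 means ZONE_MOVABLE stays empty.
 */
unsigned long kernelcore_target(unsigned long totalpages,
                                unsigned long required_kernelcore,
                                unsigned long required_movablecore)
{
    if (required_movablecore) {
        unsigned long corepages;

        /* Make ZONE_MOVABLE at least as large as requested. */
        required_movablecore = ROUNDUP(required_movablecore,
                                       SKETCH_MAX_ORDER_NR_PAGES);

        /* Pages left for the kernel; the guard against underflow is an addition of this sketch. */
        corepages = totalpages > required_movablecore
                        ? totalpages - required_movablecore : 0;

        if (corepages > required_kernelcore)
            required_kernelcore = corepages;
    }
    return required_kernelcore;
}
```

For example, with 10000 total pages and a movable request of 1000 pages, the request is rounded up to 1024 and the helper returns 8976.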

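After line 4602 required_movablecore does not appear again. The remaining code does two things. It first picks a zone, namely the first non-empty non-ZONE_MOVABLE zone found when scanning from highest to lowest. It then pushes the start of ZONE_MOVABLE upward from the low end of that zone until the number of non-ZONE_MOVABLE pages reaches required_kernelcore.

Line 4604 calls find_usable_zone_for_movable to set movable_zone, which becomes the highest non-empty non-ZONE_MOVABLE zone.

Line 4605 sets usable_startpfn, the first page frame number that may belong to ZONE_MOVABLE.

Line 4609 sets kernelcore_node. usable_nodes is the divisor; it starts as the number of nodes eligible for ZONE_MOVABLE (the nodes set in node_states[N_HIGH_MEMORY]) and is decremented after each pass. On the first pass kernelcore_node spreads the non-ZONE_MOVABLE reservation evenly across the nodes. The computation of zone_movable_pfn walks a set of nodes, and kernelcore_node is the number of pages each node should keep for non-ZONE_MOVABLE zones.

Line 4610 iterates over every node set in the node mask node_states[N_HIGH_MEMORY].

Lines 4618-4619 recompute kernelcore_node if required_kernelcore < kernelcore_node.

Line 4626: kernelcore_remaining is the number of pages still to be reserved during the scan of this node; it is assigned kernelcore_node.

Line 4629 iterates over every memory range registered with the boot-time memory allocator.

Lines 4632-4634: zone_movable_pfn[nid] is the lowest page frame number of ZONE_MOVABLE found so far on the node being scanned. If end_pfn <= zone_movable_pfn[nid] or end_pfn <= start_pfn, the range lies outside ZONE_MOVABLE or is empty, and the scan moves on to the next range.

Line 4637: start_pfn is the first page frame of the range being scanned, and usable_startpfn is the lowest frame allowed for ZONE_MOVABLE. start_pfn < usable_startpfn means that the frames from start_pfn to usable_startpfn belong to non-ZONE_MOVABLE zones. Lines 4638-4645 subtract the pages in this span from the number of pages still to be reserved.

Line 4648: end_pfn <= usable_startpfn means the entire range belongs to non-ZONE_MOVABLE zones. Line 4656 sets zone_movable_pfn[nid] = end_pfn, so that if enough pages have already been reserved for non-ZONE_MOVABLE zones, the end of this range becomes the first page frame of ZONE_MOVABLE on this node. Note that a range includes its first frame start_pfn but excludes its end frame end_pfn.

Reaching line 4659 means end_pfn > usable_startpfn. The assignment start_pfn = usable_startpfn makes the following code treat usable_startpfn -> end_pfn as one range.

Lines 4667-4681: reaching this code means the whole remaining range could serve as ZONE_MOVABLE pages. This code sets aside the non-ZONE_MOVABLE pages still owed out of this range.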
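
The per-node scan described above can be condensed into a small user-space sketch. It is a simplified reading of the walkthrough, not the kernel function: struct pfn_range, min_ul and movable_start_pfn are hypothetical names, only one node is handled, and the cross-node counter required_kernelcore is left out.

```c
#include <stddef.h>

/* A range of page frames: start_pfn is included, end_pfn is excluded. */
struct pfn_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical single-node scan: walks the ranges in ascending order and
 * returns the PFN where ZONE_MOVABLE would start once kernelcore_remaining
 * pages have been kept for non-ZONE_MOVABLE use.
 */
unsigned long movable_start_pfn(const struct pfn_range *r, size_t n,
                                unsigned long usable_startpfn,
                                unsigned long kernelcore_remaining)
{
    unsigned long movable_pfn = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned long start_pfn = r[i].start_pfn;
        unsigned long end_pfn = r[i].end_pfn;
        unsigned long size_pages;

        /* Skip what lies below the current ZONE_MOVABLE start, and empty ranges. */
        if (start_pfn < movable_pfn)
            start_pfn = movable_pfn;
        if (start_pfn >= end_pfn)
            continue;

        if (start_pfn < usable_startpfn) {
            /* Frames below usable_startpfn can only be kernel pages. */
            unsigned long kernel_pages =
                min_ul(end_pfn, usable_startpfn) - start_pfn;

            kernelcore_remaining -= min_ul(kernel_pages, kernelcore_remaining);
            if (end_pfn <= usable_startpfn) {
                movable_pfn = end_pfn;
                continue;
            }
            start_pfn = usable_startpfn;
        }

        /* Take what is still owed to the kernel from the low end of the range. */
        size_pages = min_ul(end_pfn - start_pfn, kernelcore_remaining);
        movable_pfn = start_pfn + size_pages;
        kernelcore_remaining -= size_pages;
        if (!kernelcore_remaining)
            break;
    }
    return movable_pfn;
}
```

With the ranges [0, 100) and [200, 1000), usable_startpfn = 300 and 500 pages to keep, the sketch returns 600: 200 pages are accounted for below frame 300, and the remaining 300 are taken from frames 300 to 599.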

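Lines 4691-4693 decrement the divisor usable_nodes and test usable_nodes && required_kernelcore > usable_nodes. This avoids an infinite loop, and another pass is made only while each remaining node would still need to reserve more than one non-ZONE_MOVABLE page.

Lines 4696-4698 align the first page frame of ZONE_MOVABLE on every node to MAX_ORDER_NR_PAGES.

Line 4702 restores the node_states array.

The free_area_init_node function

free_area_init_node initializes a node. It is implemented in mm/page_alloc.c; the code is as follows: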

4420 void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
4421                 unsigned long node_start_pfn, unsigned long *zholes_size)
4422 {
4423         pg_data_t *pgdat = NODE_DATA(nid);
4424
4425         pgdat->node_id = nid;
4426         pgdat->node_start_pfn = node_start_pfn;
4427         calculate_node_totalpages(pgdat, zones_size, zholes_size);
4428
4429         alloc_node_mem_map(pgdat);
4430 #ifdef CONFIG_FLAT_NODE_MEM_MAP
4431         printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n",
4432                 nid, (unsigned long)pgdat,
4433                 (unsigned long)pgdat->node_mem_map);
4434 #endif
4435
4436         free_area_init_core(pgdat, zones_size, zholes_size);
4437 }

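free_area_init_node calls calculate_node_totalpages to initialize the node's span and its total number of usable pages. calculate_node_totalpages is implemented by calling zone_spanned_pages_in_node and zone_absent_pages_in_node, both of which were analyzed above.

alloc_node_mem_map initializes the node's page management data. The rest of the initialization is done in free_area_init_core.

The alloc_node_mem_map function

alloc_node_mem_map allocates the memory for a node's array of page structures. It is implemented in mm/page_alloc.c; the code is as follows: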

4379 static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
4380 {
4381         /* Skip empty nodes */
4382         if (!pgdat->node_spanned_pages)
4383                 return;
4384
4385 #ifdef CONFIG_FLAT_NODE_MEM_MAP
4386         /* ia64 gets its own node_mem_map, before this, without bootmem */
4387         if (!pgdat->node_mem_map) {
4388                 unsigned long size, start, end;
4389                 struct page *map;
4390
4391                 /*
4392                  * The zone's endpoints aren't required to be MAX_ORDER
4393                  * aligned but the node_mem_map endpoints must be in order
4394                  * for the buddy allocator to function correctly.
4395                  */
4396                 start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
4397                 end = pgdat->node_start_pfn + pgdat->node_spanned_pages;
4398                 end = ALIGN(end, MAX_ORDER_NR_PAGES);
4399                 size =  (end - start) * sizeof(struct page);
4400                 map = alloc_remap(pgdat->node_id, size);
4401                 if (!map)
4402                         map = alloc_bootmem_node_nopanic(pgdat, size);
4403                 pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
4404         }
4405 #ifndef CONFIG_NEED_MULTIPLE_NODES
4406         /*
4407          * With no DISCONTIG, the global mem_map is just set as node 0's
4408          */
4409         if (pgdat == NODE_DATA(0)) {
4410                 mem_map = NODE_DATA(0)->node_mem_map;
4411 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
4412                 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
4413                         mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
4414 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
4415         }
4416 #endif
4417 #endif /* CONFIG_FLAT_NODE_MEM_MAP */
4418 }

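In the node structure pglist_data, the member node_start_pfn is the node's first page frame number, and node_spanned_pages is the length of the node including any unusable pages in the middle. node_mem_map points into the node's array of page structures, specifically at the page structure of the node's first page.

The logic of lines 4388-4403 is as follows. Compute a page frame range that is the smallest range containing all of the node's pages and whose start and end frames are both aligned to the largest buddy block (MAX_ORDER_NR_PAGES). Allocate the memory for the array of page structures to cover that range. After the allocation (line 4403), make node_mem_map point to the page structure of frame node_start_pfn.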
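
The arithmetic of lines 4396-4403 can be tried out in isolation with the sketch below. It is an illustration, not kernel code: struct memmap_layout and memmap_layout_for are hypothetical names, and the block size of 1024 pages is an assumed value (the only requirement used here is that it is a power of two).

```c
/* Assumed for illustration; must be a power of two for the mask arithmetic to work. */
#define SKETCH_MAX_ORDER_NR_PAGES 1024UL

/* Hypothetical summary of the values computed in alloc_node_mem_map. */
struct memmap_layout {
    unsigned long start;    /* first PFN covered, aligned down */
    unsigned long end;      /* end PFN (exclusive), aligned up */
    unsigned long nr_pages; /* number of struct page entries to allocate */
    unsigned long offset;   /* index of node_start_pfn's entry in the array */
};

struct memmap_layout memmap_layout_for(unsigned long node_start_pfn,
                                       unsigned long node_spanned_pages)
{
    struct memmap_layout l;
    unsigned long mask = SKETCH_MAX_ORDER_NR_PAGES - 1;

    /* Align the start down and the end up to a whole largest block. */
    l.start = node_start_pfn & ~mask;
    l.end = (node_start_pfn + node_spanned_pages + mask) & ~mask;

    /* The array covers the aligned range; node_mem_map points offset entries in. */
    l.nr_pages = l.end - l.start;
    l.offset = node_start_pfn - l.start;
    return l;
}
```

For a node that starts at frame 1500 and spans 3000 frames, the covered range is 1024 to 5120, so 4096 entries are needed and node_mem_map would point at entry 476.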

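The memory for the page array is allocated by calling alloc_remap and alloc_bootmem_node_nopanic. Both functions are covered in the chapter on the boot-time memory allocator.

Line 4410: in earlier versions the base address of the page array was stored in the variable mem_map. Now this variable points to the page array of node 0.

Lines 4412-4413 correct the conversion between page structure addresses and page frame numbers.

The free_area_init_core function

free_area_init_core is the core function of buddy system initialization. It is implemented in mm/page_alloc.c; the code is as follows: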

4291 static void __paginginit free_area_init_core(struct pglist_data *pgdat,
4292                 unsigned long *zones_size, unsigned long *zholes_size)
4293 {
4294         enum zone_type j;
4295         int nid = pgdat->node_id;
4296         unsigned long zone_start_pfn = pgdat->node_start_pfn;
4297         int ret;
4298
4299         pgdat_resize_init(pgdat);
4300         pgdat->nr_zones = 0;
4301         init_waitqueue_head(&pgdat->kswapd_wait);
4302         pgdat->kswapd_max_order = 0;
4303         pgdat_page_cgroup_init(pgdat);
4304
4305         for (j = 0; j < MAX_NR_ZONES; j++) {
4306                 struct zone *zone = pgdat->node_zones + j;
4307                 unsigned long size, realsize, memmap_pages;
4308                 enum lru_list lru;
4309
4310                 size = zone_spanned_pages_in_node(nid, j, zones_size);
4311                 realsize = size - zone_absent_pages_in_node(nid, j,
4312                                                                 zholes_size);
4313
4314                 /*
4315                  * Adjust realsize so that it accounts for how much memory
4316                  * is used by this zone for memmap. This affects the watermark
4317                  * and per-cpu initialisations
4318                  */
4319                 memmap_pages =
4320                         PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
4321                 if (realsize >= memmap_pages) {
4322                         realsize -= memmap_pages;
4323                         if (memmap_pages)
4324                                 printk(KERN_DEBUG
4325                                        "  %s zone: %lu pages used for memmap\n",
4326                                        zone_names[j], memmap_pages);
4327                 } else
4328                         printk(KERN_WARNING
4329                                 "  %s zone: %lu pages exceeds realsize %lu\n",
4330                                 zone_names[j], memmap_pages, realsize);
4331
4332                 /* Account for reserved pages */
4333                 if (j == 0 && realsize > dma_reserve) {
4334                         realsize -= dma_reserve;
4335                         printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
4336                                         zone_names[0], dma_reserve);
4337                 }
4338
4339                 if (!is_highmem_idx(j))
4340                         nr_kernel_pages += realsize;
4341                 nr_all_pages += realsize;
4342
4343                 zone->spanned_pages = size;
4344                 zone->present_pages = realsize;
4345 #ifdef CONFIG_NUMA
4346                 zone->node = nid;
4347                 zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
4348                                                 / 100;
4349                 zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
4350 #endif
4351                 zone->name = zone_names[j];
4352                 spin_lock_init(&zone->lock);
4353                 spin_lock_init(&zone->lru_lock);
4354                 zone_seqlock_init(zone);
4355                 zone->zone_pgdat = pgdat;
4356
4357                 zone_pcp_init(zone);
4358                 for_each_lru(lru)
4359                         INIT_LIST_HEAD(&zone->lruvec.lists[lru]);
4360                 zone->reclaim_stat.recent_rotated[0] = 0;
4361                 zone->reclaim_stat.recent_rotated[1] = 0;
4362                 zone->reclaim_stat.recent_scanned[0] = 0;
4363                 zone->reclaim_stat.recent_scanned[1] = 0;
4364                 zap_zone_vm_stats(zone);
4365                 zone->flags = 0;
4366                 if (!size)
4367                         continue;
4368
4369                 set_pageblock_order(pageblock_default_order());
4370                 setup_usemap(pgdat, zone, size);
4371                 ret = init_currently_empty_zone(zone, zone_start_pfn,
4372                                                 size, MEMMAP_EARLY);
4373                 BUG_ON(ret);
4374                 memmap_init(size, nid, j, zone_start_pfn);
4375                 zone_start_pfn += size;
4376         }
4377 }

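This function is fairly long but simple: it just initializes a number of variables, locks and lists. The function itself is not analyzed further here. Instead, here is a short introduction to memmap_init, which it calls. memmap_init is a macro defined as follows:

#define memmap_init(size, nid, zone, start_pfn) \

        memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)

In other words, it is a call to memmap_init_zone.

The memmap_init_zone function

memmap_init_zone initializes the page structures of one zone. It is implemented in mm/page_alloc.c; the code is as follows: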

3619  * done. Non-atomic initialization, single-pass.
3620  */
3621 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
3622                 unsigned long start_pfn, enum memmap_context context)
3623 {
3624         struct page *page;
3625         unsigned long end_pfn = start_pfn + size;
3626         unsigned long pfn;
3627         struct zone *z;
3628
3629         if (highest_memmap_pfn < end_pfn - 1)
3630                 highest_memmap_pfn = end_pfn - 1;
3631
3632         z = &NODE_DATA(nid)->node_zones[zone];
3633         for (pfn = start_pfn; pfn < end_pfn; pfn++) {
3634                 /*
3635                  * There can be holes in boot-time mem_map[]s
3636                  * handed to this function.  They do not
3637                  * exist on hotplugged memory.
3638                  */
3639                 if (context == MEMMAP_EARLY) {
3640                         if (!early_pfn_valid(pfn))
3641                                 continue;
3642                         if (!early_pfn_in_nid(pfn, nid))
3643                                 continue;
3644                 }
3645                 page = pfn_to_page(pfn);
3646                 set_page_links(page, zone, nid, pfn);
3647                 mminit_verify_page_links(page, zone, nid, pfn);
3648                 init_page_count(page);
3649                 reset_page_mapcount(page);
3650                 SetPageReserved(page);
3651                 /*
3652                  * Mark the block movable so that blocks are reserved for
3653                  * movable at startup. This will force kernel allocations
3654                  * to reserve their blocks rather than leaking throughout
3655                  * the address space during boot when many long-lived
3656                  * kernel allocations are made. Later some blocks near
3657                  * the start are marked MIGRATE_RESERVE by
3658                  * setup_zone_migrate_reserve()
3659                  *
3660                  * bitmap is created for zone's valid pfn range. but memmap
3661                  * can be created for invalid pages (for alignment)
3662                  * check here not to call set_pageblock_migratetype() against
3663                  * pfn out of zone.
3664                  */
3665                 if ((z->zone_start_pfn <= pfn)
3666                     && (pfn < z->zone_start_pfn + z->spanned_pages)
3667                     && !(pfn & (pageblock_nr_pages - 1)))
3668                         set_pageblock_migratetype(page, MIGRATE_MOVABLE);
3669
3670                 INIT_LIST_HEAD(&page->lru);
3671 #ifdef WANT_PAGE_VIRTUAL
3672                 /* The shift won't overflow because ZONE_NORMAL is below 4G. */
3673                 if (!is_highmem_idx(zone))
3674                         set_page_address(page, __va(pfn << PAGE_SHIFT));
3675 #endif
3676         }
3677 }

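Lines 3629-3630: highest_memmap_pfn is the highest page frame number that has a page structure. If the highest such frame in this zone is greater than highest_memmap_pfn, highest_memmap_pfn is updated.

Line 3632 obtains the address of the zone structure.

Line 3633 iterates over all page frames of the zone.

Lines 3640-3641 check whether the page frame number is valid, that is, below the highest page frame number in the system and above the lowest one allowed.

Lines 3642-3643 check whether frame pfn belongs to node nid.

Line 3645 obtains the address of the page structure for frame pfn.

Line 3646 calls set_page_links to set the page's links, mainly the node the page is on, the zone type of the page, and the section the page is in. All of this information is stored in the flags member of the page structure, with each item occupying a few bits. Line 3647 verifies the node, zone type and section information that was just set, and prints debugging output if something is wrong.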
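
How several small values can share one flags word is shown in the sketch below. The field widths and positions are invented for illustration and do not match any particular kernel configuration; the sk_ names are hypothetical, and the section number mentioned in the text is left out to keep the example short.

```c
/* Hypothetical layout: node and zone fields packed at the top of the word. */
#define SK_BITS_PER_LONG  (sizeof(unsigned long) * 8)
#define SK_NODES_WIDTH    6
#define SK_ZONES_WIDTH    2
#define SK_NODES_PGSHIFT  (SK_BITS_PER_LONG - SK_NODES_WIDTH)
#define SK_ZONES_PGSHIFT  (SK_NODES_PGSHIFT - SK_ZONES_WIDTH)
#define SK_NODES_MASK     ((1UL << SK_NODES_WIDTH) - 1)
#define SK_ZONES_MASK     ((1UL << SK_ZONES_WIDTH) - 1)

/* Store the zone type and node id in their bit fields, leaving other bits alone. */
void sk_set_page_links(unsigned long *flags, unsigned long zone,
                       unsigned long node)
{
    *flags &= ~(SK_ZONES_MASK << SK_ZONES_PGSHIFT);
    *flags |= (zone & SK_ZONES_MASK) << SK_ZONES_PGSHIFT;
    *flags &= ~(SK_NODES_MASK << SK_NODES_PGSHIFT);
    *flags |= (node & SK_NODES_MASK) << SK_NODES_PGSHIFT;
}

/* Read the zone type back out of the flags word. */
unsigned long sk_page_zonenum(unsigned long flags)
{
    return (flags >> SK_ZONES_PGSHIFT) & SK_ZONES_MASK;
}

/* Read the node id back out of the flags word. */
unsigned long sk_page_to_nid(unsigned long flags)
{
    return (flags >> SK_NODES_PGSHIFT) & SK_NODES_MASK;
}
```

Because the fields live in the high bits, the ordinary flag bits at the low end of the word are untouched by sk_set_page_links.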

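Line 3648 initializes the reference count and line 3649 initializes the map count.

Lines 3665-3668: for the first frame of each pageblock inside the zone, set_pageblock_migratetype is called to set the migrate type. set_pageblock_migratetype is analyzed in the section on memory migration in the buddy system.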
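
The three-part condition of lines 3665-3667 can be isolated as a small predicate. This is a sketch only: starts_pageblock_in_zone is a hypothetical name and the pageblock size of 512 pages is an assumed value (it must be a power of two for the mask test to work).

```c
#include <stdbool.h>

/* Assumed for illustration; must be a power of two. */
#define SK_PAGEBLOCK_NR_PAGES 512UL

/*
 * True when pfn lies inside [zone_start_pfn, zone_start_pfn + spanned_pages)
 * and is the first frame of a pageblock.
 */
bool starts_pageblock_in_zone(unsigned long pfn,
                              unsigned long zone_start_pfn,
                              unsigned long spanned_pages)
{
    return zone_start_pfn <= pfn
        && pfn < zone_start_pfn + spanned_pages
        && !(pfn & (SK_PAGEBLOCK_NR_PAGES - 1));
}
```

For a zone covering frames 1000 to 2999, frame 1024 qualifies, frame 1025 does not (not block aligned), and frame 512 does not (aligned but outside the zone).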

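Lines 3671-3675 set the virtual address the page is mapped at, when WANT_PAGE_VIRTUAL is defined and the zone is not a highmem zone.

====Initialization of the zone lists

The build_all_zonelists function

The zone lists are initialized by build_all_zonelists, which is reached along the following path:

start_kernel() at init/main.c:504

build_all_zonelists() at mm/page_alloc.c:3409

build_all_zonelists is implemented in mm/page_alloc.c; the code is as follows: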

3408 void __ref build_all_zonelists(void *data)
3409 {
3410         set_zonelist_order();
3411
3412         if (system_state == SYSTEM_BOOTING) {
3413                 __build_all_zonelists(NULL);
3414                 mminit_verify_zonelist();
3415                 cpuset_init_current_mems_allowed();
3416         } else {
3417                 /* we have to stop all cpus to guarantee there is no user
3418                    of zonelist */
3419 #ifdef CONFIG_MEMORY_HOTPLUG
3420                 if (data)
3421                         setup_zone_pageset((struct zone *)data);
3422 #endif
3423                 stop_machine(__build_all_zonelists, NULL, NULL);
3424                 /* cpuset refresh routine should be here */
3425         }
3426         vm_total_pages = nr_free_pagecache_pages();
3427         /*
3428          * Disable grouping by mobility if the number of pages in the
3429          * system is too low to allow the mechanism to work. It would be
3430          * more accurate, but expensive to check per-zone. This check is
3431          * made on memory-hotadd so a system can start with mobility
3432          * disabled and enable it later
3433          */
3434         if (vm_total_pages < (pageblock_nr_pages * MIGRATE_TYPES))
3435                 page_group_by_mobility_disabled = 1;
3436         else
3437                 page_group_by_mobility_disabled = 0;
3438
3439         printk("Built %i zonelists in %s order, mobility grouping %s.  "
3440                 "Total pages: %ld\n",
3441                         nr_online_nodes,
3442                         zonelist_order_name[current_zonelist_order],
3443                         page_group_by_mobility_disabled ? "off" : "on",
3444                         vm_total_pages);
3445 #ifdef CONFIG_NUMA
3446         printk("Policy zone: %s\n", zone_names[policy_zone]);
3447 #endif
3448 }

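During initialization the function takes the branch at lines 3413-3415.

Line 3413: the main work of initializing the zone lists is done in __build_all_zonelists, which is described after this function.

Line 3414 calls mminit_verify_zonelist to perform some verification.

In the section on buddy system memory allocation, allocation was divided into three stages, and the main task of the first stage is to determine the zone list and the node mask. The process structure has a member mems_allowed, a node mask that lists the nodes from which the process may allocate memory. Memory is allocated from a node only if that node is in the process's mems_allowed and the memory policy also allows allocation from it. cpuset_init_current_mems_allowed sets the current process's mems_allowed to include all nodes.

Line 3426: nr_free_pagecache_pages returns, summed over all zones, the number of usable pages in each zone that remain after subtracting the high watermark. This value is used as the number of remaining usable pages.

Line 3434: if the remaining usable pages number fewer than pageblock_nr_pages * MIGRATE_TYPES, that is, if there are not enough pages for every migrate type to have one pageblock, grouping by migrate type is disabled. Once it is disabled, all pages are treated as belonging to the MIGRATE_UNMOVABLE migrate type, the unmovable type.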
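
The two steps just described, summing the pages above each zone's high watermark and comparing the total against one pageblock per migrate type, can be sketched as follows. The struct, the function names, the pageblock size of 512 pages and the count of 5 migrate types are all assumptions made for this illustration.

```c
#include <stddef.h>

/* Assumed values for illustration only. */
#define SK_PAGEBLOCK_NR_PAGES 512UL
#define SK_MIGRATE_TYPES      5UL

/* Hypothetical, stripped-down view of a zone. */
struct sk_zone {
    unsigned long present_pages;
    unsigned long high_wmark;
};

/* Sum, over all zones, of the pages that remain above the high watermark. */
unsigned long sk_pages_above_high(const struct sk_zone *zones, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (zones[i].present_pages > zones[i].high_wmark)
            sum += zones[i].present_pages - zones[i].high_wmark;
    return sum;
}

/* Returns 1 when there are too few pages to give each migrate type a pageblock. */
int sk_mobility_grouping_disabled(unsigned long vm_total_pages)
{
    return vm_total_pages < SK_PAGEBLOCK_NR_PAGES * SK_MIGRATE_TYPES;
}
```

With these assumed constants the threshold is 2560 pages, so a system with 900 pages above the watermarks would have grouping disabled and one with 4000 would keep it enabled.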

The __build_all_zonelists function

__build_all_zonelists is implemented in mm/page_alloc.c; the code is as follows:

3356 static __init_refok int __build_all_zonelists(void *data)
3357 {
3358         int nid;
3359         int cpu;
3360
3361 #ifdef CONFIG_NUMA
3362         memset(node_load, 0, sizeof(node_load));
3363 #endif
3364         for_each_online_node(nid) {
3365                 pg_data_t *pgdat = NODE_DATA(nid);
3366
3367                 build_zonelists(pgdat);
3368                 build_zonelist_cache(pgdat);
3369         }
3370
3371         /*
3372          * Initialize the boot_pagesets that are going to be used
3373          * for bootstrapping processors. The real pagesets for
3374          * each zone will be allocated later when the per cpu
3375          * allocator is available.
3376          *
3377          * boot_pagesets are used also for bootstrapping offline
3378          * cpus if the system is already booted because the pagesets
3379          * are needed to initialize allocators on a specific cpu too.
3380          * F.e. the percpu allocator needs the page allocator which
3381          * needs the percpu allocator in order to allocate its pagesets
3382          * (a chicken-egg dilemma).
3383          */
3384         for_each_possible_cpu(cpu) {
3385                 setup_pageset(&per_cpu(boot_pageset, cpu), 0);
3386
3387 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
3388                 /*
3389                  * We now know the "local memory node" for each node--
3390                  * i.e., the node of the first zone in the generic zonelist.
3391                  * Set up numa_mem percpu variable for on-line cpus.  During
3392                  * boot, only the boot cpu should be on-line;  we'll init the
3393                  * secondary cpus' numa_mem as they come on-line.  During
3394                  * node/memory hotplug, we'll fixup all on-line cpus.
3395                  */
3396                 if (cpu_online(cpu))
3397                         set_cpu_numa_mem(cpu, local_memory_node(cpu_to_node(cpu)));
3398 #endif
3399         }
3400
3401         return 0;
3402 }

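A zone list is an ordered collection of zones. Its purpose is to let the allocator pick a zone from the list and allocate memory from that zone.

Several factors affect which zone is chosen:

1: The order of the zones in the zone list.

2: The highest zone type permitted by the allocation flags. Some allocations can only come from low memory, for example those made by drivers for devices that can only access low memory. When a zone is considered, its type is taken into account: it is chosen only if its type is less than or equal to the highest zone type permitted by the flags.

3: If the fast path fails to allocate memory, the slow path records cached information that a zone is short of memory. Later allocations check this cached information, which affects zone selection.

4: The node mask also affects zone selection: only zones on nodes in the node mask are chosen.

With these factors in mind, the definition of the zone list structure zonelist can be explained. Why does the list contain an array of zoneref rather than simply an array of zone pointers? A zoneref contains a pointer to a zone structure, zone, and zone_idx, the type of that zone. Because of the second factor, when an entry of the zone list is scanned, the zone type that is needed can be read directly from the zone_idx member of the zoneref.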
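
The benefit of carrying zone_idx next to the zone pointer can be seen in a small scan routine. The sketch below uses simplified, hypothetical types (struct sk_zone, struct sk_zoneref, sk_first_allowed) rather than the kernel's definitions, and it models only the second factor, the highest permitted zone type.

```c
#include <stddef.h>

/* Hypothetical, stripped-down zone. */
struct sk_zone {
    const char *name;
};

/* One entry of the list: the zone plus its type, cached for fast filtering. */
struct sk_zoneref {
    struct sk_zone *zone; /* NULL marks the end of the list */
    int zone_idx;         /* type of the zone */
};

/*
 * Returns the first entry whose zone type does not exceed highest_zoneidx,
 * or NULL if the end of the list is reached first. The zone structure itself
 * is never dereferenced during the scan.
 */
struct sk_zoneref *sk_first_allowed(struct sk_zoneref *refs,
                                    int highest_zoneidx)
{
    while (refs->zone != NULL && refs->zone_idx > highest_zoneidx)
        refs++;
    return refs->zone != NULL ? refs : NULL;
}
```

For a list ordered HighMem (type 2), Normal (type 1), DMA (type 0), an allocation limited to type 1 skips the first entry and lands on Normal.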

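The zlcache member of zonelist is a zonelist_cache structure, which records whether each zone in the list has enough memory. Its member fullzones is a bitmap that parallels the zoneref array of the zonelist and indicates whether the zone at each array index is short of memory. z_to_n converts an array index into a node number. These members are used in the function zlc_zone_worth_trying.

The zlcache_ptr member of zonelist points to the zonelist_cache structure that is actually in use. zlcache_ptr does not always point to the zonelist's own zonelist_cache.

Lines 3364-3369 iterate over all online nodes and call build_zonelists to initialize each node's zone lists; each node has several zone lists. build_zonelist_cache is called to initialize the node's cache of memory-shortage information.

Lines 3384-3399 iterate over all possible CPUs and call setup_pageset to initialize the per-CPU page cache. For online CPUs, lines 3396-3397 call set_cpu_numa_mem to set the CPU's local memory node.

Only the execution flow of build_zonelists is described below. build_zonelist_cache and the other parts are not analyzed.

The build_zonelists function

build_zonelists initializes the zone list of one node. It is implemented in mm/page_alloc.c; the code is as follows: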

3286 static void build_zonelists(pg_data_t *pgdat)
3287 {
3288         int node, local_node;
3289         enum zone_type j;
3290         struct zonelist *zonelist;
3291
3292         local_node = pgdat->node_id;
3293
3294         zonelist = &pgdat->node_zonelists[0];
3295         j = build_zonelists_node(pgdat, zonelist, 0, MAX_NR_ZONES - 1);
3296
3297         /*
3298          * Now we build the zonelist so that it contains the zones
3299          * of all the other nodes.
3300          * We don't want to pressure a particular node, so when
3301          * building the zones for node N, we make sure that the
3302          * zones coming right after the local ones are those from
3303          * node N+1 (modulo N)
3304          */
3305         for (node = local_node + 1; node < MAX_NUMNODES; node++) {
3306                 if (!node_online(node))
3307                         continue;
3308                 j = build_zonelists_node(NODE_DATA(node), zonelist, j,
3309                                                         MAX_NR_ZONES - 1);
3310         }
3311         for (node = 0; node < local_node; node++) {
3312                 if (!node_online(node))
3313                         continue;
3314                 j = build_zonelists_node(NODE_DATA(node), zonelist, j,
3315                                                         MAX_NR_ZONES - 1);
3316         }
3317
3318         zonelist->_zonerefs[j].zone = NULL;
3319         zonelist->_zonerefs[j].zone_idx = 0;
3320 }

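build_zonelists_node adds the zones of one node to the zone list.

The key point of this function is the order in which the zone list is filled. local_node is the number of the current node. From lines 3295, 3305 and 3311 we can see that, considering online nodes only, the nodes are processed in the order local_node, local_node+1, ..., MAX_NUMNODES - 1, 0, ..., local_node-1.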
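
The visiting order can be reproduced with a few lines of C. In this sketch sk_node_order is a hypothetical helper; it takes an array of online flags in place of node_online() and writes the node numbers into a caller-provided array instead of appending zones.

```c
#include <stddef.h>

/*
 * Writes the order in which nodes would be visited into order[] and returns
 * how many were written: the local node first, then the higher-numbered
 * online nodes, then the lower-numbered online nodes.
 * order[] must have room for max_nodes entries.
 */
size_t sk_node_order(int local_node, const int *online, int max_nodes,
                     int *order)
{
    size_t n = 0;
    int node;

    order[n++] = local_node;
    for (node = local_node + 1; node < max_nodes; node++)
        if (online[node])
            order[n++] = node;
    for (node = 0; node < local_node; node++)
        if (online[node])
            order[n++] = node;
    return n;
}
```

With four possible nodes of which 0, 1 and 3 are online, the order seen from node 1 is 1, 3, 0.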

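Lines 3318-3319 show that the last entry refers to a null zone, while every entry before it points to a non-empty zone. This is how the end of a zone list is detected.

The build_zonelists_node function

build_zonelists_node adds the zones of one node to a zone list: the zones of node pgdat whose type is less than or equal to zone_type are added to the zone list zonelist, starting at entry nr_zones. build_zonelists_node is implemented in mm/page_alloc.c; the code is as follows: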

2860 static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist,
2861                                 int nr_zones, enum zone_type zone_type)
2862 {
2863         struct zone *zone;
2864
2865         BUG_ON(zone_type >= MAX_NR_ZONES);
2866         zone_type++;
2867
2868         do {
2869                 zone_type--;
2870                 zone = pgdat->node_zones + zone_type;
2871                 if (populated_zone(zone)) {
2872                         zoneref_set_zone(zone,
2873                                 &zonelist->_zonerefs[nr_zones++]);
2874                         check_highest_zone(zone_type);
2875                 }
2876
2877         } while (zone_type);
2878         return nr_zones;
2879 }

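Zones are added in order of zone type, from the highest type down to the lowest. populated_zone tests whether a zone has usable pages: it returns true if it does and false otherwise. check_highest_zone updates the variable policy_zone, which holds the highest usable non-ZONE_MOVABLE zone type in the system.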
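
The do/while loop and its descending order can be sketched on plain arrays. Here sk_add_node_zones is a hypothetical helper: a per-type page count stands in for populated_zone, the zone types themselves are stored instead of zoneref entries, and the policy_zone update is omitted.

```c
#include <stddef.h>

/*
 * Appends to out[], starting at index nr, the types of all populated zones
 * from highest_type down to 0, and returns the new entry count.
 * present_pages[t] == 0 means the zone of type t has no usable pages.
 */
size_t sk_add_node_zones(const unsigned long *present_pages, int highest_type,
                         int *out, size_t nr)
{
    int zone_type = highest_type + 1;

    do {
        zone_type--;
        if (present_pages[zone_type] != 0)
            out[nr++] = zone_type;
    } while (zone_type);
    return nr;
}
```

For a node with pages in zone types 0 and 2 but none in type 1, the entries added are 2 followed by 0.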
