memblock 介绍
memblock 内存管理机制主要用于Linux Kernel 启动阶段(kernel启动 -> kernel 通用内存管理初始化完成.) 或者可以认为free_initmem 为止. 在启动阶段, 内存分配器并不需要很复杂, memblock 是基于静态数组, 采用的逆向最先适配的分配策略.
memblock 数据结构
memblock
struct memblock {
bool bottom_up; /* is bottom up direction? */
phys_addr_t current_limit;
struct memblock_type memory;
struct memblock_type reserved;
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
struct memblock_type physmem;
#endif
};
memblock 内存管理的核心数据结构
bottom_up
内存分配的方向
current_limit
内存分配最大限制值
memblock 的内存分为3类, memory,reserved, 和 physmem
memory
可用的内存的集合
reserved
已分配出去内存的集合
memblock_type
struct memblock_type {
unsigned long cnt; /* number of regions */
unsigned long max; /* size of the allocated array */
phys_addr_t total_size; /* size of all regions */
struct memblock_region *regions;
char *name;
};
memblock_type 用于描述在当前的memblock中此类型的memory region的数量
memblock_region
struct memblock_region {
phys_addr_t base;
phys_addr_t size;
unsigned long flags;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
int nid;
#endif
};
memblock_region 用于描述此内存region中的基地址和大小
flags
定义于 include/linux/memblock.h
/* Definition of memblock flags. */
enum {
MEMBLOCK_NONE = 0x0, /* No special request */
MEMBLOCK_HOTPLUG = 0x1, /* hotpluggable region */
MEMBLOCK_MIRROR = 0x2, /* mirrored region */
MEMBLOCK_NOMAP = 0x4, /* don't add to kernel direct mapping */
};
memblock API
在指定的node和范围内寻找可用大小的内存
phys_addr_t memblock_find_in_range_node(phys_addr_t size, phys_addr_t align,
phys_addr_t start, phys_addr_t end,
int nid, ulong flags);
phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
phys_addr_t size, phys_addr_t align);
内存添加和删除type为memory的memblock region
int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
int memblock_add(phys_addr_t base, phys_addr_t size);
int memblock_remove(phys_addr_t base, phys_addr_t size);
memblock的内存分配和释放
phys_addr_t memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid);
phys_addr_t memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
int memblock_free(phys_addr_t base, phys_addr_t size);
void memblock_allow_resize(void);
int memblock_reserve(phys_addr_t base, phys_addr_t size);
void memblock_trim_memory(phys_addr_t align);
bool memblock_overlaps_region(struct memblock_type *type,
phys_addr_t base, phys_addr_t size);
int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
int memblock_mark_nomap(phys_addr_t base, phys_addr_t size);
int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
ulong choose_memblock_flags(void);
memblock 实现
memblock_init
#define INIT_MEMBLOCK_REGIONS 128
#define INIT_PHYSMEM_REGIONS 4
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
static struct memblock_region memblock_physmem_init_regions[INIT_PHYSMEM_REGIONS] __initdata_memblock;
#endif
struct memblock memblock __initdata_memblock = {
.memory.regions = memblock_memory_init_regions,
.memory.cnt = 1, /* empty dummy entry */
.memory.max = INIT_MEMBLOCK_REGIONS,
.memory.name = "memory",
.reserved.regions = memblock_reserved_init_regions,
.reserved.cnt = 1, /* empty dummy entry */
.reserved.max = INIT_MEMBLOCK_REGIONS,
.reserved.name = "reserved",
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
.physmem.regions = memblock_physmem_init_regions,
.physmem.cnt = 1, /* empty dummy entry */
.physmem.max = INIT_PHYSMEM_REGIONS,
.physmem.name = "physmem",
#endif
.bottom_up = false,
.current_limit = MEMBLOCK_ALLOC_ANYWHERE,
};
系统会初始化 memory 和 reserved 的 128 个 memblock_region
bottom_up = false
默认的分配方式是从上到下
此时memblock_region 是没有可用内存的, 初次添加可用内存是发生在setup_machine_fdt()
中, 它会解析device tree中的内存物理信息,然后调用函数memblock_add
添加到memblock_region中去.
current_limit
在 kernel的启动过程的 sanity_check_meminfo()
中设定
当kernel启动到此处, 已经可以使用memblock来进行内存分配,从而进行页表的建立.
memblock_add
int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
{
phys_addr_t end = base + size - 1;
memblock_dbg("memblock_add: [%pa-%pa] %pF\n",
&base, &end, (void *)_RET_IP_);
return memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0);
}
int __init_memblock memblock_add_range(struct memblock_type *type,
phys_addr_t base, phys_addr_t size,
int nid, unsigned long flags)
{
bool insert = false;
phys_addr_t obase = base;
phys_addr_t end = base + memblock_cap_size(base, &size);
int idx, nr_new;
struct memblock_region *rgn;
if (!size)
return 0;
第一次添加可用内存到memory类型的memblock_region
/* special case for empty array */
if (type->regions[0].size == 0) {
WARN_ON(type->cnt != 1 || type->total_size);
type->regions[0].base = base;
type->regions[0].size = size;
type->regions[0].flags = flags;
memblock_set_region_node(&type->regions[0], nid);
type->total_size = size;
return 0;
}
repeat:
所有要添加的memblock region 都需要跑2次才能最终插入
第一次是判断是否需要插入,以及是否需要扩充存放memory region的数量大小
第二次才是真正的添加操作
base = obase;
nr_new = 0;
for_each_memblock_type(type, rgn) {
phys_addr_t rbase = rgn->base;
phys_addr_t rend = rbase + rgn->size;
if (rbase >= end)
break;
找到需要输入的节点位置
if (rend <= base)
continue;
将非重叠的部分插入类型为memory的memblock_region
if (rbase > base) {
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
WARN_ON(nid != memblock_get_region_node(rgn));
#endif
WARN_ON(flags != rgn->flags);
nr_new++;
if (insert)
memblock_insert_region(type, idx++, base,
rbase - base, nid,
flags);
}
/* area below @rend is dealt with, forget about it */
base = min(rend, end);
}
插入剩余的部分 或者是 当前要添加的内存区域需要插入到memblock_region的尾部
if (base < end) {
nr_new++;
if (insert)
memblock_insert_region(type, idx, base, end - base,
nid, flags);
}
如果第一次执行检查发现要添加的内存已经全部重叠,则直接退出
if (!nr_new)
return 0;
第一次执行检查是否需要调整内存区数组大小,第二次执行合并操作
if (!insert) {
while (type->cnt + nr_new > type->max)
if (memblock_double_array(type, obase, size) < 0)
return -ENOMEM;
insert = true;
goto repeat;
} else {
检查是否需要做合并的动作
memblock_merge_regions(type);
return 0;
}
}
memblock_alloc
static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
phys_addr_t end, int nid)
{
phys_addr_t found;
if (!align)
align = SMP_CACHE_BYTES;
在类型为memory的memblock_region中寻找合适的分配位置
found = memblock_find_in_range_node(size, align, start, end, nid);
将找到的内存保存到类型为reserved的memblock_region中,代表已经分配
if (found && !memblock_reserve(found, size)) {
/*
* The min_count is set to 0 so that memblock allocations are
* never reported as leaks.
*/
检查是否发生了memleak
kmemleak_alloc(__va(found), size, 0, 0);
return found;
}
return 0;
}
memblock_alloc 实际就是调用 memblock_alloc_range_nid 来实现分配
memblock_find_in_range_node
phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
phys_addr_t end, int nid)
{
phys_addr_t kernel_end, ret;
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
end = memblock.current_limit;
永远都不会分配第一个页面
start = max_t(phys_addr_t, start, PAGE_SIZE);
end = max(start, end);
kernel_end = __pa_symbol(_end);
判断分配的方向
if (memblock_bottom_up() && end > kernel_end) {
phys_addr_t bottom_up_start;
/* make sure we will allocate above the kernel */
bottom_up_start = max(start, kernel_end);
/* ok, try bottom-up allocation first */
ret = __memblock_find_range_bottom_up(bottom_up_start, end,
size, align, nid);
if (ret)
return ret;
}
默认使用的是从上到下的方式寻找可用的内存
return __memblock_find_range_top_down(start, end, size, align, nid);
}
memblock_reserve
static int __init_memblock memblock_reserve_region(phys_addr_t base,
phys_addr_t size,
int nid,
unsigned long flags)
{
struct memblock_type *_rgn = &memblock.reserved;
memblock_dbg("memblock_reserve: [%#016llx-%#016llx] flags %#02lx %pF\n",
(unsigned long long)base,
(unsigned long long)base + size - 1,
flags, (void *)_RET_IP_);
return memblock_add_range(_rgn, base, size, nid, flags);
}
memblock_reserve 和 memblock_add 一样也是调用memblock_add_range 来添加,只不过对象从memory变成了memblock.reserved.