Memory mapping: paging_init
Introduction
Earlier articles worked through idmap, fixmap, memblock, and so on. At this stage the kernel can already:
- access the kernel image's address space;
- access the FDT address space, i.e. the device-tree data is available;
- allocate memory through the memblock interfaces.
What remains is to turn the memory layout read from the FDT into actual mappings.
"Establishing a mapping" is conceptually simple: based on the page size and the address-space size, fill in the PGD, PMD, and PTE tables.
1. Page-table translation
This was covered in an earlier article; here is only a brief recap:
1.1 Page-table-level macros
Macro (Config) | Expansion | Value | Description |
---|---|---|---|
CONFIG_ARM64_VA_BITS_39=y | NA | NA | Sets the virtual address range; bits 39~63 distinguish kernel from user, and user + kernel cover 1 TB in total |
CONFIG_ARM64_VA_BITS=39 | NA | 39 | Same as above |
CONFIG_ARM64_4K_PAGES=y | NA | NA | Sets the page size, configured here as 4K |
CONFIG_ARM64_PAGE_SHIFT=12 | NA | 12 | Same as above |
CONFIG_PGTABLE_LEVELS=3 | NA | 3 | Sets the number of page-table levels; this platform uses 3: PGD, PMD, PTE |
PTRS_PER_PTE | 1 << (PAGE_SHIFT - 3) | 1<<9 | Number of entries in a PTE/PMD/PUD table |
PGDIR_SHIFT | ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - CONFIG_PGTABLE_LEVELS) | 30 | Shift for the PGD index |
PUD_SHIFT | ARM64_HW_PGTABLE_LEVEL_SHIFT(1) | NA | Shift for the PUD index; with 3 levels there is no PUD |
PMD_SHIFT | ARM64_HW_PGTABLE_LEVEL_SHIFT(2) | 21 | Shift for the PMD index |
ARM64_HW_PGTABLE_LEVEL_SHIFT(n) | (PAGE_SHIFT - 3) * (4 - (n)) + 3 | NA | Computes the shift for level n |
1.2 Page-table translation
A single address translation follows the process shown in the figure above. The level count, page size, VA size, and so on are fixed per platform by the Config options and macros listed above, so the translation layout itself is fixed.
What the paging_init covered in this article does, then, is construct that translation structure:
- determine the PGD entry point (in practice, the address of swapper_pg_dir);
- compute the PMD table address from the PGD plus the offset, and establish the mapping:
  - allocate a physical page for the table;
  - map it at the computed address;
- obtain and map the PTE table the same way;
- obtain and map the final page the same way.
In this process:
- the page-table entry point, the PGD, is already accessible at this stage;
- the core job is establishing the mapping, i.e. making the linear address resolve to the actual physical address.
1.3 Page-table construction
Key source files for this part:
Path | Description |
---|---|
./arch/arm64/mm/mmu.c | Implementation of the MMU mapping functions |
./kernel-4.9/arch/arm64/include/asm/pgtable.h | Page-table manipulation interfaces |
./kernel-4.9/arch/arm64/include/asm/pgtable-hwdef.h | Per-level shift, size, etc. definitions |
2. Functional overview
The overall flow is easiest to see in a diagram:
3. Code walkthrough
3.1 paging_init
paging_init is not a long function, and its comment already sums it up well:
- set up the page tables;
- initialise the zone memory maps;
- set up the zero page.
/*
* paging_init() sets up the page tables, initialises the zone memory
* maps and sets up the zero page.
*/
void __init paging_init(void)
{
	phys_addr_t pgd_phys = early_pgtable_alloc();	/* allocate one physical page */
	pgd_t *pgd = pgd_set_fixmap(pgd_phys);	/* map it via the fixmap so it can be accessed */
	map_kernel(pgd);	/* map the kernel segments, i.e. what vmlinux.lds.S laid out */
	map_mem(pgd);	/* map the regions previously added to memblock */
	/*
	 * Reuse swapper_pg_dir: switch to the new tables, copy them into
	 * swapper_pg_dir, then switch back; note the PA-to-VA conversion.
	 */
	cpu_replace_ttbr1(__va(pgd_phys));
	memcpy(swapper_pg_dir, pgd, PAGE_SIZE);
	cpu_replace_ttbr1(swapper_pg_dir);
	/*
	 * The mapping now lives in swapper_pg_dir, so the whole system uses a
	 * single PGD; drop the fixmap slot and free the temporary page.
	 */
	pgd_clear_fixmap();
	memblock_free(pgd_phys, PAGE_SIZE);
	/*
	 * Free the pud/pmd levels of swapper_pg_dir; they were filled in by the
	 * early assembly code so kernel space could be reached, and are unused now.
	 */
	memblock_free(__pa(swapper_pg_dir) + PAGE_SIZE, SWAPPER_DIR_SIZE - PAGE_SIZE);
}
Step by step, it does the following:
- obtain pgd_phys, the physical address of the new page table;
- map it through the fixmap for convenient access;
- map the kernel segments;
- map the regions added to memblock;
- reuse swapper_pg_dir's storage, copying the pgd into it;
- free the resources no longer in use.
3.2 Allocating a physical page
early_pgtable_alloc allocates a single page:
static phys_addr_t __init early_pgtable_alloc(void)
{
phys_addr_t phys;
void *ptr;
phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE); /* allocate one page via memblock; note this is a physical address */
ptr = pte_set_fixmap(phys); /* map the page into the fixmap's PTE slot, so the freshly allocated physical page can be reached through that fixed virtual address */
memset(ptr, 0, PAGE_SIZE); /* zero it */
pte_clear_fixmap(); /* remove the temporary mapping */
return phys; /* return the physical address */
}
The function's sole purpose is to allocate a block of physical memory, borrowing the fixmap's PTE slot just long enough to zero it.
3.3 Mapping through the fixmap
Now look at the fixmap mapping operation in detail. It was analysed in the fixmap article, so here we just expand the macro step by step:
pgd_t *pgd = pgd_set_fixmap(pgd_phys);
- #define pgd_set_fixmap(addr) ((pgd_t *)set_fixmap_offset(FIX_PGD, addr))
pgd_t *pgd = ((pgd_t *)set_fixmap_offset(FIX_PGD, pgd_phys))
- #define set_fixmap_offset(idx, phys) __set_fixmap_offset(idx, phys, FIXMAP_PAGE_NORMAL)
pgd_t *pgd = ((pgd_t *)__set_fixmap_offset(FIX_PGD, pgd_phys, FIXMAP_PAGE_NORMAL))
- #define __set_fixmap_offset(idx, phys, flags) ({ unsigned long ________addr; __set_fixmap(idx, phys, flags); ________addr = fix_to_virt(idx) + ((phys) & (PAGE_SIZE - 1)); ________addr; })
pgd_t *pgd = ((pgd_t *)({ unsigned long ________addr; __set_fixmap(FIX_PGD, pgd_phys, FIXMAP_PAGE_NORMAL); ________addr = fix_to_virt(FIX_PGD) + ((pgd_phys) & (PAGE_SIZE - 1)); ________addr; }))
So what is this actually doing?
- a 4K physical page was allocated earlier;
- the virtual address of the FIX_PGD slot is looked up and the in-page offset of the physical address is added to it, i.e. fix_to_virt(FIX_PGD) + ((pgd_phys) & (PAGE_SIZE - 1));
- __set_fixmap establishes the mapping between the physical and virtual addresses.
Now let's look at how the mapping itself is established:
void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags)
{
unsigned long addr = __fix_to_virt(idx); /* get the slot's virtual address from the fixmap index */
pte_t *pte;
BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
pte = fixmap_pte(addr); /* use addr as pte_index to locate the pte slot inside bm_pte, which was wired up in early_fixmap_init */
if (pgprot_val(flags)) { /* FIXMAP_PAGE_NORMAL */
set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, flags)); /* set_pte writes the physical frame into that pte of bm_pte */
} else {
pte_clear(&init_mm, addr, pte);
flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
}
}
Because the fixmap's linear addresses are fixed, all that needs to happen is writing the corresponding physical address into the pte.
The pte is located via the pmd offset of the virtual address, the pmd via the pud offset, and the pud via the pgd offset; in practice the slot's virtual address leads straight to bm_pte and the pte_index within it.
3.4 Mapping the kernel
The core of this function is installing the kernel's segments into the pgd page table, in these steps:
- map_kernel_segment maps each segment;
- the pgd entry covering FIXADDR_START is hooked up.
/*
* Create fine-grained mappings for the kernel.
*/
static void __init map_kernel(pgd_t *pgd)
{
static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, vmlinux_data;
map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_EXEC, &vmlinux_text); // text segment
map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata); // rodata segment
map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC, &vmlinux_init); // init segment
map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data); // data segment
if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
/*
* The fixmap falls in a separate pgd to the kernel, and doesn't
* live in the carveout for the swapper_pg_dir. We can simply
* re-use the existing dir for the fixmap.
*/
set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
} else if (CONFIG_PGTABLE_LEVELS > 3) {
/*
* The fixmap shares its top level pgd entry with the kernel
* mapping. This can really only occur when we are running
* with 16k/4 levels, so we can simply reuse the pud level
* entry instead.
*/
BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START), __pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
pud_clear_fixmap();
} else {
BUG();
}
kasan_copy_shadow(pgd); // kasan has not been studied yet; ignore it for now
}
// map_kernel_segment( pgd, _text, _etext, PAGE_KERNEL_EXEC, &vmlinux_text);
static void __init map_kernel_segment(pgd_t *pgd, void *va_start, void *va_end, pgprot_t prot, struct vm_struct *vma)
{
phys_addr_t pa_start = __pa(va_start); // convert VA to PA
unsigned long size = va_end - va_start; // compute the size
BUG_ON(!PAGE_ALIGNED(pa_start));
BUG_ON(!PAGE_ALIGNED(size));
__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot, early_pgtable_alloc, !debug_pagealloc_enabled()); // establish the mapping
// !debug_pagealloc_enabled() depends on CONFIG_DEBUG_PAGEALLOC; it is not set on this platform, so the argument passed is true, i.e. allow_block_mappings
// record the computed virtual and physical addresses in the vm_struct for later use
vma->addr = va_start;
vma->phys_addr = pa_start;
vma->size = size;
vma->flags = VM_MAP;
vma->caller = __builtin_return_address(0);
vm_area_add_early(vma);
}
// establish the mapping
static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot,
phys_addr_t (*pgtable_alloc)(void),
bool allow_block_mappings)
{
unsigned long addr, length, end, next;
pgd_t *pgd = pgd_offset_raw(pgdir, virt);
/*
* If the virtual and physical address don't have the same offset
* within a page, we cannot map the region as the caller expects.
*/
if (WARN_ON((phys ^ virt) & ~PAGE_MASK))
return;
phys &= PAGE_MASK;
addr = virt & PAGE_MASK;
length = PAGE_ALIGN(size + (virt & ~PAGE_MASK));
end = addr + length;
do {
next = pgd_addr_end(addr, end);
alloc_init_pud(pgd, addr, next, phys, prot, pgtable_alloc, allow_block_mappings);
phys += next - addr;
} while (pgd++, addr = next, addr != end);
}
// Add the vm_struct we just mapped to vmlist for bookkeeping; note this function runs before vmalloc_init
/**
* vm_area_add_early - add vmap area early during boot
* @vm: vm_struct to add
*
* This function is used to add fixed kernel vm area to vmlist before
* vmalloc_init() is called. @vm->addr, @vm->size, and @vm->flags
* should contain proper values and the other fields should be zero.
*
* DO NOT USE THIS FUNCTION UNLESS YOU KNOW WHAT YOU'RE DOING.
*/
void __init vm_area_add_early(struct vm_struct *vm)
{
struct vm_struct *tmp, **p;
BUG_ON(vmap_initialized);
for (p = &vmlist; (tmp = *p) != NULL; p = &tmp->next) {
if (tmp->addr >= vm->addr) {
BUG_ON(tmp->addr < vm->addr + vm->size);
break;
} else
BUG_ON(tmp->addr + tmp->size > vm->addr);
}
vm->next = *p;
*p = vm;
}
3.5 Mapping memblock
Here every region added to memblock is walked, and __map_memblock establishes the mapping between each region's physical addresses and the linear addresses:
static void __init map_mem(pgd_t *pgd)
{
struct memblock_region *reg;
/* map all the memory banks */
for_each_memblock(memory, reg) {
phys_addr_t start = reg->base;
phys_addr_t end = start + reg->size;
if (start >= end) break;
if (memblock_is_nomap(reg)) continue;
__map_memblock(pgd, start, end);
}
}
// The actual map step is simple: __create_pgd_mapping does the work; the checks only decide which sub-ranges to map, and with what permissions
static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
{
unsigned long kernel_start = __pa(_text);
unsigned long kernel_end = __pa(__init_begin);
/*
* Take care not to create a writable alias for the
* read-only text and rodata sections of the kernel image.
*/
/* No overlap with the kernel text/rodata */
if (end < kernel_start || start >= kernel_end) {
__create_pgd_mapping(pgd, start, __phys_to_virt(start), end - start, PAGE_KERNEL, early_pgtable_alloc, !debug_pagealloc_enabled());
return;
}
/*
* This block overlaps the kernel text/rodata mappings.
* Map the portion(s) which don't overlap.
*/
if (start < kernel_start)
__create_pgd_mapping(pgd, start, __phys_to_virt(start), kernel_start - start, PAGE_KERNEL, early_pgtable_alloc, !debug_pagealloc_enabled());
if (kernel_end < end)
__create_pgd_mapping(pgd, kernel_end, __phys_to_virt(kernel_end), end - kernel_end, PAGE_KERNEL, early_pgtable_alloc, !debug_pagealloc_enabled());
/*
* Map the linear alias of the [_text, __init_begin) interval as
* read-only/non-executable. This makes the contents of the
* region accessible to subsystems such as hibernate, but
* protects it from inadvertent modification or execution.
*/
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start), kernel_end - kernel_start, PAGE_KERNEL_RO, early_pgtable_alloc, !debug_pagealloc_enabled());
}
4. Summary
4.1 Page-table construction
This part is not organised as well as I'd like; it was wrapped up in a hurry, and I'll find time to revisit and restructure it later.