[Original] (15) Linux Memory Management: RMAP

Background

Read the fucking source code! --By Lu Xun

A picture is worth a thousand words. --By Gorky

Notes:

Kernel version: 4.14

ARM64 processor, Cortex-A53, dual-core

Tools: Source Insight 3.5, Visio

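1. Overview

RMAP (reverse mapping) is a mechanism for finding, starting from a physical page, the virtual addresses that map it.

Mapping

Page tables map virtual addresses to physical addresses. Each PTE (page table entry) records one such mapping, and the mapcount field in struct page records how many PTEs map that physical page.

Reverse mapping

When a physical page is about to be reclaimed or migrated, the kernel has to find every virtual address that maps it and tear those mappings down. Without a reverse mapping mechanism it would have to walk the page tables of the processes, which is clearly inefficient. Reverse mapping locates the VMAs (virtual memory areas) that map the page and removes the mapping only from the user page tables those VMAs use, which solves the problem quickly.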

351603455aae8b093e895bf2f752d4fa.png
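
To make the cost argument concrete, here is a minimal toy model in user-space C, not kernel code. Every name in it (toy_page, toy_mm, toy_map, toy_unmap_bruteforce, NPROC, NVPAGE) is hypothetical. It counts mappings from 0 for readability, whereas the kernel's _mapcount field, as far as I know, starts at -1. The sketch shows that without reverse mapping, unmapping one page means inspecting every PTE of every process.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC  3    /* hypothetical number of processes */
#define NVPAGE 8    /* hypothetical virtual pages per process */
#define NO_MAP (-1) /* marks an empty PTE in this toy */

/* Toy physical page: only the field discussed above. */
struct toy_page {
    int mapcount;   /* number of PTEs that map this page */
};

/* Toy per-process page table: virtual page number -> page frame number. */
struct toy_mm {
    int pte[NVPAGE];
};

static void toy_mm_init(struct toy_mm *mm)
{
    for (int i = 0; i < NVPAGE; i++)
        mm->pte[i] = NO_MAP;
}

/* Forward mapping: install a PTE and account for it in mapcount. */
static void toy_map(struct toy_mm *mm, int vpn, int pfn, struct toy_page *pages)
{
    assert(vpn >= 0 && vpn < NVPAGE);
    assert(mm->pte[vpn] == NO_MAP);
    mm->pte[vpn] = pfn;
    pages[pfn].mapcount++;
}

/*
 * Reverse lookup without rmap: scan every PTE of every process.
 * Returns the number of PTEs cleared; *visited counts the PTEs inspected.
 */
static int toy_unmap_bruteforce(struct toy_mm *mms, int nproc, int pfn,
                                struct toy_page *pages, int *visited)
{
    int cleared = 0;

    *visited = 0;
    for (int p = 0; p < nproc; p++) {
        for (int v = 0; v < NVPAGE; v++) {
            (*visited)++;
            if (mms[p].pte[v] == pfn) {
                mms[p].pte[v] = NO_MAP;
                pages[pfn].mapcount--;
                cleared++;
            }
        }
    }
    return cleared;
}
```

In this toy, clearing two mappings of a single page still visits all NPROC * NVPAGE entries; the rmap structures described in the next section exist to avoid exactly that scan.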

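Typical use cases for reverse mapping:

When kswapd reclaims pages, every PTE that maps the anonymous page must be cleared;

During page migration, every PTE that maps the anonymous page must be cleared;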

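2. Data structures

Reverse mapping relies on three key structures:

struct vm_area_struct, VMA for short;

The VMA was covered in an earlier article; it describes one region of a process's address space. The fields relevant to reverse mapping are: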

struct vm_area_struct {
    ...
    /*
     * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
     * list, after a COW of one of the file pages. A MAP_SHARED vma
     * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
     * or brk vma (with NULL file) can only be in an anon_vma list.
     */
    struct list_head anon_vma_chain; /* Serialized by mmap_sem &
                                      * page_table_lock */
    struct anon_vma *anon_vma;       /* Serialized by page_table_lock */
    ...
};

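struct anon_vma, AV for short;

The AV manages anonymous VMAs. When an anonymous page has to be unmapped, the kernel first finds the AV and then looks up the related VMAs through it. The structure is: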

/*
 * The anon_vma heads a list of private "related" vmas, to scan if
 * an anonymous page pointing to this anon_vma needs to be unmapped:
 * the vmas on the list will be related by forking, or by splitting.
 *
 * Since vmas come and go as they are split and merged (particularly
 * in mprotect), the mapping field of an anonymous page cannot point
 * directly to a vma: instead it points to an anon_vma, on whose list
 * the related vmas can be easily linked or unlinked.
 *
 * After unlinking the last vma on the list, we must garbage collect
 * the anon_vma object itself: we're guaranteed no page can be
 * pointing to this anon_vma once its vma list is empty.
 */
struct anon_vma {
    struct anon_vma *root;      /* Root of this anon_vma tree */
    struct rw_semaphore rwsem;  /* W: modification, R: walking the list */
    /*
     * The refcount is taken on an anon_vma when there is no
     * guarantee that the vma of page tables will exist for
     * the duration of the operation. A caller that takes
     * the reference is responsible for clearing up the
     * anon_vma if they are the last user on release
     */
    atomic_t refcount;

    /*
     * Count of child anon_vmas and VMAs which points to this anon_vma.
     *
     * This counter is used for making decision about reusing anon_vma
     * instead of forking new one. See comments in function anon_vma_clone.
     */
    unsigned degree;

    struct anon_vma *parent;    /* Parent of this anon_vma */

    /*
     * NOTE: the LSB of the rb_root.rb_node is set by
     * mm_take_all_locks() _after_ taking the above lock. So the
     * rb_root must only be read/written after taking the above lock
     * to be sure to see a valid next pointer. The LSB bit itself
     * is serialized by a system wide lock only visible to
     * mm_take_all_locks() (mm_all_locks_mutex).
     */

    /* Interval tree of private "related" vmas */
    struct rb_root_cached rb_root;
};

struct anon_vma_chain, AVC for short;

The AVC is the bridge between a VMA and an AV.

/*
 * The copy-on-write semantics of fork mean that an anon_vma
 * can become associated with multiple processes. Furthermore,
 * each child process will have its own anon_vma, where new
 * pages for that process are instantiated.
 *
 * This structure allows us to find the anon_vmas associated
 * with a VMA, or the VMAs associated with an anon_vma.
 * The "same_vma" list contains the anon_vma_chains linking
 * all the anon_vmas associated with this VMA.
 * The "rb" field indexes on an interval tree the anon_vma_chains
 * which link all the VMAs associated with this anon_vma.
 */
struct anon_vma_chain {
    struct vm_area_struct *vma;
    struct anon_vma *anon_vma;
    struct list_head same_vma;  /* locked by mmap_sem & page_table_lock */
    struct rb_node rb;          /* locked by anon_vma->rwsem */
    unsigned long rb_subtree_last;
#ifdef CONFIG_DEBUG_VM_RB
    unsigned long cached_vma_start, cached_vma_last;
#endif
};

A diagram makes the relationship clear:

bf3950817f191071611e84ea8f4e35da.png

The same_vma list node adds the anon_vma_chain to the vma->anon_vma_chain list;

The rb node adds the anon_vma_chain to the red-black (interval) tree rooted at anon_vma->rb_root;
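
The two links above can be sketched in user-space C. This is a simplified model, not the kernel implementation: the toy_* types are hypothetical, and plain singly linked lists stand in for both the kernel's list_head and its interval tree. To my understanding the kernel does the equivalent work in anon_vma_chain_link() in mm/rmap.c.

```c
#include <assert.h>
#include <stddef.h>

struct toy_avc;

/* Toy AV: heads the set of AVCs that lead to related VMAs. */
struct toy_anon_vma {
    struct toy_avc *avc_tree;      /* stands in for anon_vma->rb_root */
};

/* Toy VMA: heads the list of AVCs that lead to its AVs. */
struct toy_vma {
    struct toy_avc *avc_list;      /* stands in for vma->anon_vma_chain */
    struct toy_anon_vma *anon_vma;
};

/* Toy AVC: one (VMA, AV) pair that sits on both containers at once. */
struct toy_avc {
    struct toy_vma *vma;
    struct toy_anon_vma *anon_vma;
    struct toy_avc *next_same_vma; /* stands in for the same_vma list node */
    struct toy_avc *next_in_av;    /* stands in for the rb tree node */
};

/* Link one AVC into the VMA's list and into the AV's "tree". */
static void toy_avc_link(struct toy_vma *vma, struct toy_avc *avc,
                         struct toy_anon_vma *av)
{
    avc->vma = vma;
    avc->anon_vma = av;
    avc->next_same_vma = vma->avc_list;
    vma->avc_list = avc;
    avc->next_in_av = av->avc_tree;
    av->avc_tree = avc;
}

/* VMA -> AVs direction: count the AVs this VMA is linked to. */
static int toy_count_avs_of_vma(const struct toy_vma *vma)
{
    int n = 0;

    for (const struct toy_avc *c = vma->avc_list; c; c = c->next_same_vma)
        n++;
    return n;
}

/* AV -> VMAs direction: count the VMAs reachable from this AV. */
static int toy_count_vmas_of_av(const struct toy_anon_vma *av)
{
    int n = 0;

    for (const struct toy_avc *c = av->avc_tree; c; c = c->next_in_av)
        n++;
    return n;
}
```

Because each AVC sits on both containers, the relationship can be walked in either direction: from a VMA to all of its AVs, or from an AV to all related VMAs, which is the direction reverse mapping needs.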

2. Flow analysis

First, the big picture:

cde3adbfeedd0c63e388f9f439566b39.png

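A VMA in the address space maps virtual addresses to physical addresses through the page tables;

Each page frame has a corresponding struct page, whose mapping field points to an anon_vma, so the RMAP mechanism can find the VMAs associated with the page;

2.1 anon_vma_prepare

The earlier article on page faults mentioned anon_vma_prepare. Its job is to prepare a struct anon_vma for a VMA in the process address space.

The call path and function flow are shown in the figure below: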

6d3ffce6d3681a632459d1422d8fdfa0.png

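The relationship among the VMA, AV, and AVC is already described in the diagram above.

Once the AV associated with the VMA has been created, one more key step is needed before the RMAP path is truly complete: the page must be associated with the AV. Only then can the kernel go from the page to the AV, from the AV to the VMAs, and finally unmap the corresponding PTEs.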

e983c7cd51f718b6f640cc5b1a2205d7.png
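
As a rough illustration of how a page can point at its AV, here is a user-space sketch of a tagged mapping field. The assumption, based on my reading of the kernel sources, is that the kernel stores the anon_vma pointer in page->mapping with the low bit PAGE_MAPPING_ANON set, so that anonymous pages can be told apart from file pages. The toy_* names are hypothetical, and the kernel's other mapping flag bits are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_MAPPING_ANON ((uintptr_t)0x1) /* mirrors PAGE_MAPPING_ANON */

struct toy_anon_vma {
    int id;                 /* placeholder payload for the demo */
};

struct toy_page {
    uintptr_t mapping;      /* tagged pointer, like page->mapping */
    unsigned long index;    /* page offset of the page, like page->index */
};

/* Associate the page with its AV: store the pointer with the low bit set. */
static void toy_page_set_anon_rmap(struct toy_page *page,
                                   struct toy_anon_vma *av,
                                   unsigned long pgoff)
{
    /* The tag only works if the pointer's low bit is free (aligned object). */
    assert(((uintptr_t)av & TOY_PAGE_MAPPING_ANON) == 0);
    page->mapping = (uintptr_t)av | TOY_PAGE_MAPPING_ANON;
    page->index = pgoff;
}

/* An anonymous page is recognized by the tag bit. */
static int toy_page_is_anon(const struct toy_page *page)
{
    return (page->mapping & TOY_PAGE_MAPPING_ANON) != 0;
}

/* page -> AV: strip the tag bit to recover the pointer. */
static struct toy_anon_vma *toy_page_anon_vma(const struct toy_page *page)
{
    if (!toy_page_is_anon(page))
        return NULL;
    return (struct toy_anon_vma *)(page->mapping & ~TOY_PAGE_MAPPING_ANON);
}
```

The page offset stored alongside the pointer matters later: together with a VMA's start address and offset it lets the rmap walk compute the virtual address at which the page is mapped in that VMA.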

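2.2 Creating the anon_vma for a child process

A parent process creates a child with fork(), and the child copies the parent's entire address space and page tables. The child copies the contents of the parent's VMA structures, while the matching anon_vma structure for the child is created by anon_vma_fork().

The effect of anon_vma_fork() is shown below: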

05330b0ca08ccf2134290e9ca42fd609.png

Taking two actual fork() calls as an example, the figure below shows how the three processes are linked after COW has occurred:

5379378c5b061cb37eae5028079542bb.png
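
The linking pattern produced by repeated forks can be modeled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel code: the toy_* names are hypothetical, a small array replaces the AVC list, the caller supplies the storage for each new AV, and the kernel's degree-based reuse of an existing anon_vma is left out. The model assumes the second fork is performed by the first child.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_AVC 8  /* hypothetical cap on links per VMA in this toy */

struct toy_anon_vma {
    struct toy_anon_vma *root;    /* root of the AV tree */
    struct toy_anon_vma *parent;  /* AV of the VMA we were forked from */
    int nr_vmas;                  /* how many VMAs are linked to this AV */
};

struct toy_vma {
    struct toy_anon_vma *anon_vma;           /* AV for this VMA's new pages */
    struct toy_anon_vma *chain[TOY_MAX_AVC]; /* one entry per AVC of this VMA */
    int nr_chain;
};

/* Stand-in for allocating an AVC and linking the VMA and the AV through it. */
static void toy_link(struct toy_vma *vma, struct toy_anon_vma *av)
{
    assert(vma->nr_chain < TOY_MAX_AVC);
    vma->chain[vma->nr_chain++] = av;
    av->nr_vmas++;
}

/* First anonymous fault in a VMA: give it its own AV (compare anon_vma_prepare). */
static void toy_prepare(struct toy_vma *vma, struct toy_anon_vma *new_av)
{
    if (vma->anon_vma)
        return;
    new_av->root = new_av;
    new_av->parent = new_av;
    new_av->nr_vmas = 0;
    vma->anon_vma = new_av;
    toy_link(vma, new_av);
}

/* fork: link the child VMA to every AV of the parent VMA, then add a fresh AV. */
static void toy_fork(struct toy_vma *child, const struct toy_vma *parent,
                     struct toy_anon_vma *new_av)
{
    child->anon_vma = NULL;
    child->nr_chain = 0;
    if (!parent->anon_vma)
        return;                   /* parent has no anonymous pages yet */

    for (int i = 0; i < parent->nr_chain; i++)
        toy_link(child, parent->chain[i]);

    new_av->root = parent->anon_vma->root;
    new_av->parent = parent->anon_vma;
    new_av->nr_vmas = 0;
    child->anon_vma = new_av;
    toy_link(child, new_av);
}
```

After a parent, a child, and a grandchild have been set up this way, the oldest AV is linked to three VMAs and the newest to one. That is why a page still shared since before the forks can be reached from every process, while a page created after COW is only searched for in the processes that can actually map it.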

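2.3 TTU (try to unmap) and Rmap Walk

If a page is mapped at multiple virtual addresses, the Rmap Walk mechanism can traverse all the related VMAs and call a callback on each of them to remove the mapping.

The structure involved is struct rmap_walk_control: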

/*
 * rmap_walk_control: To control rmap traversing for specific needs
 *
 * arg: passed to rmap_one() and invalid_vma()
 * rmap_one: executed on each vma where page is mapped
 * done: for checking traversing termination condition
 * anon_lock: for getting anon_lock by optimized way rather than default
 * invalid_vma: for skipping uninterested vma
 */
struct rmap_walk_control {
    void *arg;
    /*
     * Return false if page table scanning in rmap_walk should be stopped.
     * Otherwise, return true.
     */
    bool (*rmap_one)(struct page *page, struct vm_area_struct *vma,
                     unsigned long addr, void *arg);
    int (*done)(struct page *page);
    struct anon_vma *(*anon_lock)(struct page *page);
    bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
};

389cc561cedd4137930122c97468f0ff.png

The entry point for unmapping is try_to_unmap; its flow is shown in the figure below:

192ddf2c69b9d83f003dea51fed00386.png

The basic pattern is to build everything around struct rmap_walk_control: initialize its callbacks so that the walk can invoke them at the right moments.
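
That callback pattern can be sketched as a user-space toy. It is not the kernel code: all toy_* names are hypothetical, locking and the anon_lock callback are omitted, a linked list replaces the interval tree lookup, and the "unmap" step only logs the address and decrements a counter. The address computation follows the formula I believe the kernel uses: vm_start + ((page offset - vm_pgoff) << PAGE_SHIFT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12

struct toy_page {
    unsigned long index;      /* page offset, like page->index */
    int mapcount;             /* number of PTEs still mapping the page */
};

struct toy_vma {
    unsigned long vm_start, vm_end;
    unsigned long vm_pgoff;   /* page offset of vm_start */
    struct toy_vma *next;     /* next VMA linked to the same AV */
};

struct toy_rmap_walk_control {
    void *arg;
    bool (*rmap_one)(struct toy_page *page, struct toy_vma *vma,
                     unsigned long addr, void *arg);
    int (*done)(struct toy_page *page);
    bool (*invalid_vma)(struct toy_vma *vma, void *arg);
};

/* Does the VMA's range contain the page's offset? */
static bool toy_vma_covers(const struct toy_page *page, const struct toy_vma *vma)
{
    unsigned long nr_pages = (vma->vm_end - vma->vm_start) >> TOY_PAGE_SHIFT;

    return page->index >= vma->vm_pgoff &&
           page->index - vma->vm_pgoff < nr_pages;
}

/* Visit each related VMA and let the callbacks decide what to do and when to stop. */
static void toy_rmap_walk(struct toy_page *page, struct toy_vma *vmas,
                          struct toy_rmap_walk_control *rwc)
{
    for (struct toy_vma *vma = vmas; vma; vma = vma->next) {
        unsigned long addr;

        if (!toy_vma_covers(page, vma))
            continue;
        if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
            continue;
        addr = vma->vm_start +
               ((page->index - vma->vm_pgoff) << TOY_PAGE_SHIFT);
        if (!rwc->rmap_one(page, vma, addr, rwc->arg))
            break;
        if (rwc->done && rwc->done(page))
            break;
    }
}

struct toy_unmap_log {
    unsigned long addr[4];    /* addresses that were "unmapped" */
    int n;
};

/* rmap_one callback: record the address and drop one mapping. */
static bool toy_try_to_unmap_one(struct toy_page *page, struct toy_vma *vma,
                                 unsigned long addr, void *arg)
{
    struct toy_unmap_log *log = arg;

    (void)vma;
    assert(log->n < 4);
    log->addr[log->n++] = addr;
    page->mapcount--;
    return true;              /* keep walking */
}

/* done callback: stop as soon as nothing maps the page any more. */
static int toy_page_mapcount_is_zero(struct toy_page *page)
{
    return page->mapcount == 0;
}

/* Entry point: fill in the control structure, then run the walk. */
static bool toy_try_to_unmap(struct toy_page *page, struct toy_vma *vmas,
                             struct toy_unmap_log *log)
{
    struct toy_rmap_walk_control rwc = {
        .arg = log,
        .rmap_one = toy_try_to_unmap_one,
        .done = toy_page_mapcount_is_zero,
    };

    toy_rmap_walk(page, vmas, &rwc);
    return page->mapcount == 0;
}
```

Keeping the walk generic and putting the policy in callbacks means the same traversal can serve other users, such as page migration, simply by supplying a different rmap_one.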

The finer details of try_to_unmap_one are not explored further here; a firm grasp of the overall framework is enough.

Article URL: https://c.lanmit.com/czxt/Linux/30934.html
