pv_entry in FreeBSD Memory Management

Preface

Having read through the FreeBSD memory-management source, I find the kernel's memory code falls roughly into three layers.
The bottom layer is the 4-level page tables (PML4, PDP, PD, PT), which handle VA-to-PA translation.
The middle layer is page management, which includes:

  1. Page allocation: initialization, allocation and freeing follow the buddy system (the buddy system is built on free lists, where each free list holds runs of pages whose lengths are powers of two)
  2. Page resource management: the paging (page-in) mechanism and the page-replacement mechanism
  3. The data structures involved: vmspace, vm_map, vm_object, vm_map_object, pmap, vm_page, vm_map_entry and pv_entry, which respectively represent a process's resources, its VA resources, information about a memory backing object, the mapping between a VA range and its backing object, physical-mapping information, per-physical-page information, a range of virtual addresses, and the reverse page table (the subject of this article)
  4. superpage: FreeBSD's large-page support, which involves page-table promotion and demotion, reservation of contiguous physical pages, and so on

The top layer is memory consumers, the UMA zones, used mainly to maintain and manage the storage behind kernel data structures.

This article focuses on the middle layer's pv_entry, also known as the reverse page table. The reverse page table ties together the bottom and middle layers described above: the page tables implement the VA-to-PA lookup, but when the resource-management machinery needs to act on a particular page (for example, when the page-reclaim mechanism finds a long-inactive page and wants to repurpose it), it must use the reverse page table to locate the page's current mappers and tear down those mappings; in effect, this is a lookup from PA to VA.

Main Text

First, the data structures involved:


/*
 * For each vm_page_t, there is a list of all currently valid virtual
 * mappings of that page.  An entry is a pv_entry_t, the list is pv_list.
 */
typedef struct pv_entry {
	vm_offset_t		pv_va;		/* virtual address for mapping */
	TAILQ_ENTRY(pv_entry)	pv_next;
} *pv_entry_t;

/*
 * pv_entries are allocated in chunks per-process.  This avoids the
 * need to track per-pmap assignments.
 */
#define	_NPCM	3
#define	_NPCPV	168
struct pv_chunk {
	pmap_t			pc_pmap;
	TAILQ_ENTRY(pv_chunk)	pc_list;
	uint64_t		pc_map[_NPCM];	/* bitmap; 1 = free */
	TAILQ_ENTRY(pv_chunk)	pc_lru;
	struct pv_entry		pc_pventry[_NPCPV];
};

struct pmap {
	struct mtx		pm_mtx;
	pml4_entry_t		*pm_pml4;	/* KVA of level 4 page table */
	uint64_t		pm_cr3;
	TAILQ_HEAD(,pv_chunk)	pm_pvchunk;	/* list of mappings in pmap */
	cpuset_t		pm_active;	/* active on cpus */
	cpuset_t		pm_save;	/* Context valid on cpus mask */
	int			pm_pcid;	/* context id */
	enum pmap_type		pm_type;	/* regular or nested tables */
	struct pmap_statistics	pm_stats;	/* pmap statistics */
	struct vm_radix		pm_root;	/* spare page table pages */
	long			pm_eptgen;	/* EPT pmap generation id */
	int			pm_flags;
};

Note the pm_pvchunk member: each pmap heads a list of the pv_chunks allocated for it.

A conjecture about how these data structures relate (the original relationship diagram is omitted): every pv_entry is embedded in a pv_chunk, and every pv_chunk points back to the pmap that owns it.
The following code confirms this conjecture:

#define PV_PMAP(pv) (pv_to_chunk(pv)->pc_pmap)

static boolean_t
pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m,
    struct rwlock **lockp)
{
	pv_entry_t pv;

	rw_assert(&pvh_global_lock, RA_LOCKED);
	PMAP_LOCK_ASSERT(pmap, MA_OWNED);
	/* Pass NULL instead of the lock pointer to disable reclamation. */
	if ((pv = get_pv_entry(pmap, NULL)) != NULL) {
		pv->pv_va = va;
		CHANGE_PV_LIST_LOCK_TO_VM_PAGE(lockp, m);
		TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next);
		m->md.pv_gen++;
		return (TRUE);
	} else
		return (FALSE);
}

As this flow shows, every pv_entry is allocated out of a pmap; once the va is recorded in it, the entry is placed on the queue headed by the vm_page member md.pv_list.

Below is a related explanation found online:

A vm_page represents an (object,index#) tuple. A pv_entry represents a hardware page table entry (pte). If you have five processes sharing the same physical page, and three of those processes' page tables actually map the page, that page will be represented by a single vm_page structure and three pv_entry structures.
pv_entry structures only represent pages mapped by the MMU (one pv_entry represents one pte). This means that when we need to remove all hardware references to a vm_page (in order to reuse the page for something else, page it out, clear it, dirty it, and so forth) we can simply scan the linked list of pv_entry’s associated with that vm_page to remove or modify the pte’s from their page tables.
Under Linux there is no such linked list. In order to remove all the hardware page table mappings for a vm_page linux must index into every VM object that might have mapped the page. For example, if you have 50 processes all mapping the same shared library and want to get rid of page X in that library, you need to index into the page table for each of those 50 processes even if only 10 of them have actually mapped the page. So Linux is trading off the simplicity of its design against performance. Many VM algorithms which are O(1) or (small N) under FreeBSD wind up being O(N), O(N^2), or worse under Linux. Since the pte’s representing a particular page in an object tend to be at the same offset in all the page tables they are mapped in, reducing the number of accesses into the page tables at the same pte offset will often avoid blowing away the L1 cache line for that offset, which can lead to better performance.

FreeBSD has added complexity (the pv_entry scheme) in order to increase performance (to limit page table accesses to only those pte's that need to be modified). In other words, performance improves because only the PTEs that actually need changing are touched.

But FreeBSD has a scaling problem that Linux does not in that there are a limited number of pv_entry structures and this causes problems when you have massive sharing of data. In this case you may run out of pv_entry structures even though there is plenty of free memory available. This can be fixed easily enough by bumping up the number of pv_entry structures in the kernel config, but we really need to find a better way to do it.
