mit6.s081 - lab3 page tables

zju_cxl

Article link: https://blog.csdn.net/hrbust_cxl/article/details/131213053

These are my notes from completing 6.S081 (Fall 2021), kept only as a personal memo.

Code repository: code

task 1: Speed up system calls (easy)

Problem description

When each process is created, map one read-only page at USYSCALL (a VA defined in memlayout.h). At the start of this page, store a struct usyscall (also defined in memlayout.h), and initialize it to store the PID of the current process. For this lab, ugetpid() has been provided on the userspace side and will automatically use the USYSCALL mapping. You will receive full credit for this part of the lab if the ugetpid test case passes when running pgtbltest.

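When xv6 executes a system call, it has to trap into the kernel with the ecall instruction, run the call in kernel mode, and then return to user mode, which is relatively expensive.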

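For performance, the kernel can share a page of physical memory with user space, mapped read-only on the user side. Reading it requires no switch between user mode and kernel mode, which speeds up some system calls.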

Approach

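When each process is created, allocate one physical page and map it at USYSCALL to hold the process ID (PID). As the figure below shows, this address sits directly below TRAPFRAME; it is defined in kernel/memlayout.h.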

After that, getting the PID no longer requires trapping into the kernel; the process simply reads the USYSCALL page.
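
To make the idea concrete, here is a minimal user-space simulation of the shared page. It is only a sketch: `shared_page`, `kernel_publish_pid` and `fast_getpid` are hypothetical names invented for illustration, and the union stands in for the physical page that xv6 would obtain from kalloc() and map at USYSCALL. In the lab itself the reader on the user side is the provided ugetpid().

```c
#include <stdint.h>

#define PGSIZE 4096

// Same layout as the struct defined in kernel/memlayout.h.
struct usyscall {
  int pid;  // Process ID
};

// Stand-in for the physical page mapped at USYSCALL (hypothetical).
// The struct lives at the very start of the page.
static union {
  struct usyscall u;
  uint8_t bytes[PGSIZE];
} shared_page;

// "Kernel side": store the PID once, when the process is created.
void kernel_publish_pid(int pid) {
  shared_page.u.pid = pid;
}

// "User side": read the PID directly from the page. No ecall, no trap.
int fast_getpid(void) {
  return shared_page.u.pid;
}
```

The kernel writes the page once per process, and every later read is an ordinary memory load, which is where the speedup comes from.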

Figure: process address space

The definition in xv6 is as follows:

// User memory layout.
// Address zero first:
//   text
//   original data and bss
//   fixed-size stack
//   expandable heap
//   ...
//   USYSCALL (shared with kernel)
//   TRAPFRAME (p->trapframe, used by the trampoline)
//   TRAMPOLINE (the same page as in the kernel)
#define TRAPFRAME (TRAMPOLINE - PGSIZE)
#ifdef LAB_PGTBL
#define USYSCALL (TRAPFRAME - PGSIZE)

struct usyscall {
  int pid;  // Process ID
};
#endif
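
As a quick check of where these pages end up, the following sketch evaluates the macros on the host. MAXVA and TRAMPOLINE are written here the way xv6 defines them (MAXVA is one bit short of the full 39-bit Sv39 range); the three accessor functions are hypothetical helpers added only so the values can be inspected.

```c
#include <stdint.h>

#define PGSIZE 4096ULL
// xv6 uses 38 bits of the Sv39 address space to avoid sign extension.
#define MAXVA (1ULL << (9 + 9 + 9 + 12 - 1))
#define TRAMPOLINE (MAXVA - PGSIZE)
#define TRAPFRAME (TRAMPOLINE - PGSIZE)
#define USYSCALL (TRAPFRAME - PGSIZE)

// Hypothetical accessors, only for inspecting the computed addresses.
uint64_t trampoline_va(void) { return TRAMPOLINE; }
uint64_t trapframe_va(void) { return TRAPFRAME; }
uint64_t usyscall_va(void) { return USYSCALL; }
```

The three pages are the top three pages of the user address space: 0x3FFFFFF000, 0x3FFFFFE000 and 0x3FFFFFD000.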

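So the work is simple:

When a process is created:
- Allocate a physical page.
- Map the virtual address to that physical page.
- Store the PID in the USYSCALL page.
When a process is destroyed:
- Free the physical page.
- Remove the mapping.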

Implementation

When a process is created:

Allocate the physical page when the process is created (kernel/proc.c@allocproc).

+  if((p->usyscallframe = (struct usyscall *)kalloc()) == 0){
+    freeproc(p);
+    release(&p->lock);
+    return 0;
+  }
+
   // An empty user page table.
   p->pagetable = proc_pagetable(p);
   if(p->pagetable == 0){

Map the virtual address to the physical page (kernel/proc.c@proc_pagetable).

+  // map pid
+  if(mappages(pagetable, USYSCALL, PGSIZE,
+              (uint64)(p->usyscallframe), PTE_R | PTE_U) < 0){
+    uvmunmap(pagetable, TRAMPOLINE, 1, 0);
+    uvmunmap(pagetable, TRAPFRAME, 1, 0);
+    uvmfree(pagetable, 0);
+    return 0;
+  }
+
   return pagetable;
 }

Store the PID in the USYSCALL page (kernel/proc.c@allocproc).

+  p->usyscallframe->pid = p->pid;
+
   // Set up new context to start executing at forkret,
   // which returns to user space.
   memset(&p->context, 0, sizeof(p->context));

When a process is destroyed:

Free the physical page (kernel/proc.c@freeproc).

   if(p->trapframe)
     kfree((void*)p->trapframe);
+  if(p->usyscallframe)
+    kfree((void*)p->usyscallframe);
+  p->usyscallframe = 0; // avoid a double free when this proc slot is reused
   p->trapframe = 0;
   if(p->pagetable)
     proc_freepagetable(p->pagetable, p->sz);

Remove the mapping (kernel/proc.c@proc_freepagetable).

 void
 proc_freepagetable(pagetable_t pagetable, uint64 sz)
 {
+  uvmunmap(pagetable, USYSCALL, 1, 0);
   uvmunmap(pagetable, TRAMPOLINE, 1, 0);
   uvmunmap(pagetable, TRAPFRAME, 1, 0);
   uvmfree(pagetable, sz);

task 2: Print a page table (easy)

Problem description

Define a function called vmprint(). It should take a pagetable_t argument, and print that pagetable in the format described below. Insert if(p->pid==1) vmprint(p->pagetable) in exec.c just before the return argc, to print the first process’s page table. You receive full credit for this part of the lab if you pass the pte printout test of make grade.

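Print the user page table of the process with pid 1. The output must show the three-level structure of the page table, with one line per valid PTE, indented by depth.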

Approach

The implementation of the walk function is a useful reference:

pte_t *
walk(pagetable_t pagetable, uint64 va, int alloc)
{
  if(va >= MAXVA)
    panic("walk");

  for(int level = 2; level > 0; level--) {
    pte_t *pte = &pagetable[PX(level, va)];
    if(*pte & PTE_V) {
      pagetable = (pagetable_t)PTE2PA(*pte);
    } else {
      if(!alloc || (pagetable = (pde_t*)kalloc()) == 0)
        return 0;
      memset(pagetable, 0, PGSIZE);
      *pte = PA2PTE(pagetable) | PTE_V;
    }
  }
  return &pagetable[PX(0, va)];
}

The key code for locating one PTE in a page table is:

pte_t *pte = &pagetable[PX(level, va)];
if(*pte & PTE_V) {
  // ...
}
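
The lookup above relies on two bit manipulations: PX(level, va) selects a 9-bit index from the virtual address, and PTE2PA drops the 10 flag bits of a PTE to recover the physical address. The sketch below restates both as plain functions (`px` and `pte2pa` are hypothetical names for this example; the shift amounts follow the macros in kernel/riscv.h).

```c
#include <stdint.h>

#define PGSHIFT 12    // bits of offset within a page
#define PXMASK 0x1FF  // 9 bits per level

// Index into the page table at the given level (2 = root, 0 = leaf).
uint64_t px(int level, uint64_t va) {
  return (va >> (PGSHIFT + 9 * level)) & PXMASK;
}

// Physical address held in a PTE: remove the 10 flag bits,
// then shift the page number back into address position.
uint64_t pte2pa(uint64_t pte) {
  return (pte >> 10) << 12;
}
```

For example, the address 0x3FFFFFD000 splits into the indices 255, 511 and 509 at levels 2, 1 and 0, which is the path a three-level walk follows for that address.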

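With the PTE lookup code above, a depth-first search (DFS) does the job. It is purely an algorithm exercise; just pay attention to the output format.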
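
The traversal can also be tried outside the kernel. The following is a sketch under stated assumptions, not the lab solution: three static, page-aligned arrays stand in for pages from kalloc(), host pointers play the role of physical addresses, and `dfs_count` and `demo` are hypothetical names. The data page address 0x87f00000 is made up and never dereferenced.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pte_t;

#define PTE_V (1ULL << 0)
#define PA2PTE(pa) ((((uint64_t)(pa)) >> 12) << 10)
#define PTE2PA(pte) (((pte) >> 10) << 12)

// Page-aligned tables standing in for pages returned by kalloc().
static _Alignas(4096) pte_t root_tbl[512];
static _Alignas(4096) pte_t mid_tbl[512];
static _Alignas(4096) pte_t leaf_tbl[512];

// Depth-first walk. level is 2 at the root and 0 at the leaf level.
// Prints one line per valid PTE and returns how many were found.
int dfs_count(pte_t *table, int level) {
  int found = 0;
  for (int i = 0; i < 512; i++) {
    pte_t pte = table[i];
    if (!(pte & PTE_V))
      continue;
    for (int d = 2; d >= level; d--)  // one ".." per level of depth
      fputs(d == 2 ? ".." : " ..", stdout);
    printf("%d: pte 0x%llx pa 0x%llx\n", i,
           (unsigned long long)pte, (unsigned long long)PTE2PA(pte));
    found++;
    if (level > 0)  // non-leaf: descend into the next-level table
      found += dfs_count((pte_t *)(uintptr_t)PTE2PA(pte), level - 1);
  }
  return found;
}

// Build root[255] -> mid[511] -> leaf[509] and walk it.
int demo(void) {
  leaf_tbl[509] = PA2PTE(0x87f00000ULL) | PTE_V;  // made-up data page
  mid_tbl[511] = PA2PTE((uintptr_t)leaf_tbl) | PTE_V;
  root_tbl[255] = PA2PTE((uintptr_t)mid_tbl) | PTE_V;
  return dfs_count(root_tbl, 2);
}
```

Computing the indentation from the level, rather than looking up a fixed prefix string, is a design choice of this sketch; both produce the same nesting.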

Implementation

kernel/vm.c:

+
+void dfs_vmpt(pagetable_t pagetable, int step) {
+  char* pre = 0;
+  if (step == 0) pre = ".. .. ..";
+  else if (step == 1) pre = ".. ..";
+  else if (step == 2) pre = "..";
+  for (int i = 0; i < 512; ++i) {
+    pte_t pte = pagetable[i];
+    if (pte & PTE_V) {
+      printf("%s%d: pte %p pa %p\n", pre, i, pte, PTE2PA(pte));
+      if (step != 0) {
+        dfs_vmpt((pagetable_t)PTE2PA(pte), step - 1);
+      }
+    }
+  }
+}
+
+void vmprint(pagetable_t pagetable) {
+  printf("page table %p\n", pagetable);
+  dfs_vmpt(pagetable, 2);
+}

task 3: Detecting which pages have been accessed (hard)

Problem description

Some garbage collectors (a form of automatic memory management) can benefit from information about which pages have been accessed (read or write). In this part of the lab, you will add a new feature to xv6 that detects and reports this information to userspace by inspecting the access bits in the RISC-V page table. The RISC-V hardware page walker marks these bits in the PTE whenever it resolves a TLB miss.

Your job is to implement pgaccess(), a system call that reports which pages have been accessed. The system call takes three arguments. First, it takes the starting virtual address of the first user page to check. Second, it takes the number of pages to check. Finally, it takes a user address to a buffer to store the results into a bitmask (a datastructure that uses one bit per page and where the first page corresponds to the least significant bit). You will receive full credit for this part of the lab if the pgaccess test case passes when running pgtbltest.

Define the accessed bit in the PTE flags, which marks whether a page has been accessed, and implement the pgaccess system call.

The pgaccess system call reports, for a range of virtual pages, whether each page has been accessed since the previous pgaccess call.

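Input: the starting page address, the number of pages, and the address of the result buffer.
Output: the access state of the pages as a bitmap, copied to the result buffer address given as input.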

Approach

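First, note that we rely on the PTE flags. In the Sv39 PTE format the low bits hold the flags V, R, W, X, U, G, A and D (bits 0 to 7), so the hardware already provides an A (accessed) flag.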

The RISC-V specification [1] contains the following statement:

Each leaf PTE contains an accessed (A) and dirty (D) bit. The A bit indicates the virtual page has been read, written, or fetched from since the last time the A bit was cleared. The D bit indicates the virtual page has been written since the last time the D bit was cleared.

So bit 6 is the flag we need; call it PTE_A.

#define PTE_A (1L << 6)

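Now we only need to work out who sets and who clears this flag:

According to the lab text below, the flag is set (to 1) by the MMU hardware.

The RISC-V hardware page walker marks these bits in the PTE whenever it resolves a TLB miss.

According to the RISC-V specification [1], the flag must be cleared (to 0) by software, that is, by the operating system.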

Mandating that the PTE updates to be exact, atomic, and in program order simplifies the specification, and makes the feature more useful for system software. Simple implementations may instead generate page-fault exceptions.
The A and D bits are never cleared by the implementation. If the supervisor software does not rely on accessed and/or dirty bits, e.g. if it does not swap memory pages to secondary storage or if the pages are being used to map I/O space, it should always set them to 1 in the PTE to improve performance.

and

For non-leaf PTEs, the D, A, and U bits are reserved for future standard use and must be cleared by software for forward compatibility.

In short, PTE_A is set (to 1) by the MMU and cleared (to 0) by the operating system.

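With that, the task is straightforward: implement a system call that visits the PTEs in the given range, builds a bitmap from their PTE_A bits, and then clears PTE_A in each of them.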
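
The test-and-clear step can be illustrated on a plain array of PTE values. This is a simplified sketch, assuming the PTEs have already been located; `collect_accessed` is a hypothetical helper, and in the kernel each PTE would be found with walk() instead of array indexing.

```c
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_V (1ULL << 0)  // valid
#define PTE_A (1ULL << 6)  // accessed

// Scan up to 32 PTEs and report the accessed ones in a bitmask
// (bit i corresponds to page i, page 0 is the least significant bit).
// PTE_A is cleared on every reported page so the next scan starts fresh.
uint32_t collect_accessed(pte_t *ptes, int n) {
  uint32_t mask = 0;
  for (int i = 0; i < n && i < 32; i++) {
    if ((ptes[i] & PTE_V) && (ptes[i] & PTE_A)) {
      mask |= (uint32_t)1 << i;
      ptes[i] &= ~PTE_A;
    }
  }
  return mask;
}
```

Calling it twice in a row on the same PTEs yields a non-zero mask the first time and zero the second time, unless the pages were touched again in between.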

Implementation

First define PTE_A (kernel/riscv.h):

 #define PTE_W (1L << 2)
 #define PTE_X (1L << 3)
 #define PTE_U (1L << 4) // 1 -> user can access
+#define PTE_A (1L << 6)

Implement the system call, the interface between user programs and the kernel (kernel/sysproc.c):

 int
 sys_pgaccess(void)
 {
-  // lab pgtbl: your code here.
+  uint64 base, mask;
+  int len;
+  if(argaddr(0, &base) < 0)
+    return -1;
+  if(argint(1, &len) < 0)
+    return -1;
+  if(argaddr(2, &mask) < 0)
+    return -1;
+  // The result is a 32-bit mask, so at most 32 pages can be reported.
+  if(len < 0 || len > (int)(sizeof(int) * 8))
+    return -1;
+  int imask;
+  pgaccess(base, len, &imask);
+  if(copyout(myproc()->pagetable, mask, (char *)&imask, sizeof(int)) < 0)
+    return -1;
   return 0;
 }

Implement the function that builds the bitmap of accessed flags and clears PTE_A (kernel/proc.c):

     printf("\n");
   }
 }
+
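+int pgaccess(uint64 base, int len, int *mask) {
+  pte_t *pte;
+  uint ans = 0;  // unsigned, so that setting bit 31 is well defined
+  for (int i = 0; i < len; ++i) {
+    if (base >= MAXVA)  // walk() panics on addresses at or beyond MAXVA
+      break;
+    pte = walk(myproc()->pagetable, base, 0);
+    if (pte != 0 && (*pte & PTE_A)) {
+      ans |= 1U << i;
+      *pte &= ~PTE_A;  // clear the flag so the next call starts fresh
+    }
+    base += PGSIZE;
+  }
+  *mask = (int)ans;
+  return 0;
+}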

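Note that writing kernel data to user space must go through copyout. It treats the page table as an ordinary data structure and emulates the MMU in software, translating the virtual address into a physical address before reading or writing the data.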
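
The page-by-page structure of such a copy can be reproduced in a small simulation. This is a sketch under assumptions: `frame`, `vpn_to_frame`, `translate`, `sim_copyout` and `sim_peek` are hypothetical names, and a four-entry array replaces the real page table, with `translate` playing the role of walkaddr().

```c
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096ULL
#define PGROUNDDOWN(a) ((a) & ~(PGSIZE - 1))
#define NPAGES 4

// Hypothetical "physical memory": virtual page i is backed by
// frame[vpn_to_frame[i]]; the frames are deliberately out of order.
static uint8_t frame[NPAGES][PGSIZE];
static int vpn_to_frame[NPAGES] = {2, 0, 3, 1};

// Software translation of a page-aligned virtual address.
static uint8_t *translate(uint64_t va0) {
  uint64_t vpn = va0 / PGSIZE;
  if (vpn >= NPAGES)
    return 0;  // not mapped
  return frame[vpn_to_frame[vpn]];
}

// Same loop shape as copyout(): copy one page at a time, because
// consecutive virtual pages need not be physically contiguous.
int sim_copyout(uint64_t dstva, const char *src, uint64_t len) {
  while (len > 0) {
    uint64_t va0 = PGROUNDDOWN(dstva);
    uint8_t *pa0 = translate(va0);
    if (pa0 == 0)
      return -1;
    uint64_t n = PGSIZE - (dstva - va0);  // bytes left in this page
    if (n > len)
      n = len;
    memmove(pa0 + (dstva - va0), src, n);
    len -= n;
    src += n;
    dstva = va0 + PGSIZE;
  }
  return 0;
}

// Read one byte back through the same translation; -1 if unmapped.
int sim_peek(uint64_t va) {
  uint8_t *pa0 = translate(PGROUNDDOWN(va));
  return pa0 ? pa0[va - PGROUNDDOWN(va)] : -1;
}
```

A 4-byte copy starting at virtual address 4094 is split into two 2-byte chunks that land in different frames, yet reading the bytes back through the translation returns them in order.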

// Copy from kernel to user.
// Copy len bytes from src to virtual address dstva in a given page table.
// Return 0 on success, -1 on error.
int
copyout(pagetable_t pagetable, uint64 dstva, char *src, uint64 len)
{
  uint64 n, va0, pa0;

  while(len > 0){
    va0 = PGROUNDDOWN(dstva);
    pa0 = walkaddr(pagetable, va0);
    if(pa0 == 0)
      return -1;
    n = PGSIZE - (dstva - va0);
    if(n > len)
      n = len;
    memmove((void *)(pa0 + (dstva - va0)), src, n);

    len -= n;
    src += n;
    dstva = va0 + PGSIZE;
  }
  return 0;
}

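Trapping into the kernel does not change the per-CPU state in struct cpu (c->proc only changes when the scheduler switches processes), so kernel code can call myproc to reach the process that was running in user mode: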

// Per-CPU state.
struct cpu {
  struct proc *proc;          // The process running on this cpu, or null.
  struct context context;     // swtch() here to enter scheduler().
  int noff;                   // Depth of push_off() nesting.
  int intena;                 // Were interrupts enabled before push_off()?
};

extern struct cpu cpus[NCPU];

// ...

// Return this CPU's cpu struct.
// Interrupts must be disabled.
struct cpu*
mycpu(void) {
  int id = cpuid();
  struct cpu *c = &cpus[id];
  return c;
}

// Return the current struct proc *, or zero if none.
struct proc*
myproc(void) {
  push_off();
  struct cpu *c = mycpu();
  struct proc *p = c->proc;
  pop_off();
  return p;
}