
Article link: https://blog.csdn.net/ahundredmile/article/details/125577852

6.S081 Lab5 Copy-on-Write Fork for xv6

Contents

6.S081 Lab5 Copy-on-Write Fork for xv6
0. Background and approach
1. Requirements
3. Implementation
4. Summary

0. Background and approach

(1) Background:

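This lab implements the idea from section 4 of my course notes, "6.S081-6缺页异常Page Fault" (page faults).
When the shell runs a command it calls fork, and the first thing the shell's child does is call exec to run the requested command. If fork made a complete copy of the shell's address space, only for exec to throw that copy away immediately, the copy would be wasted work. (As an earlier lecture explained, exec normally does not return: it completely replaces the current process's memory, so the old program image no longer exists and there is nowhere left to return to.)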

(2) Idea (step by step):

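After fork, the parent's physical pages are simply shared: the child's PTEs point to the same physical pages as the parent's. To preserve isolation, the PTEs of both parent and child are marked read-only.
When either process later tries to modify that memory, a page fault is raised (because it is writing through a read-only PTE).
Handling the page fault:
1. Allocate a new physical page.
2. Copy the contents of the faulting physical page into the new page.
3. Map the new page into the faulting process (the child, in this example).
4. Note: the PTE for the newly allocated page is set to read-write, and the PTE for the original page can be set back to read-write once only one process still refers to it.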
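The steps above can be sketched as a small user-space simulation. This is only an illustrative model, not xv6 code: struct frame, struct sim_pte, sim_fork and sim_write are hypothetical names, malloc stands in for kalloc(), and the tiny PAGE_SIZE is chosen for readability. The sketch also assumes a per-page reference count, so that the last remaining owner regains write access without copying.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 16   /* tiny page size, for the simulation only */

/* A simulated physical page with a reference count. */
struct frame {
  int refs;
  char data[PAGE_SIZE];
};

/* A simulated PTE: a pointer to a frame plus a writable bit. */
struct sim_pte {
  struct frame *frame;
  int writable;
};

/* "fork": share the parent's frame and make both mappings read-only. */
static void sim_fork(struct sim_pte *parent, struct sim_pte *child) {
  parent->writable = 0;
  child->frame = parent->frame;
  child->writable = 0;
  parent->frame->refs++;
}

/* Write one byte. A write through a read-only mapping plays the role of
 * the page fault: copy the frame if it is still shared, then enable writes.
 * Returns 0 on success, -1 if no memory is available for the copy. */
static int sim_write(struct sim_pte *pte, int off, char c) {
  if (!pte->writable) {
    if (pte->frame->refs > 1) {
      struct frame *copy = malloc(sizeof(*copy));
      if (copy == NULL)
        return -1;
      memcpy(copy->data, pte->frame->data, PAGE_SIZE);
      copy->refs = 1;
      pte->frame->refs--;      /* drop our reference to the shared frame */
      pte->frame = copy;
    }
    pte->writable = 1;         /* sole owner now: writing is safe */
  }
  pte->frame->data[off] = c;
  return 0;
}
```

For instance, after sim_fork(&parent, &child), a call to sim_write(&child, 0, 'b') gives the child its own frame and leaves the parent's data untouched; a later write by the parent finds a reference count of 1 and simply turns the writable bit back on.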

(3) This lab draws on

The post "Lab6: Copy-on-Write Fork for xv6 详解" (a detailed walkthrough), especially its reference-count part

1. Requirements

The parts marked in red are the key points.

Virtual memory provides a level of indirection: the kernel can intercept memory references by marking PTEs invalid or read-only, leading to page faults, and can change what addresses mean by modifying PTEs. There is a saying in computer systems that any systems problem can be solved with a level of indirection. The lazy allocation lab provided one example. This lab explores another example: copy-on write fork.

To start the lab, switch to the cow branch:

$ git fetch
$ git checkout cow
$ make clean

(1) The problem

The fork() system call in xv6 copies all of the parent process’s user-space memory into the child. If the parent is large, copying can take a long time. Worse, the work is often largely wasted; for example, a fork() followed by exec() in the child will cause the child to discard the copied memory, probably without ever using most of it. On the other hand, if both parent and child use a page, and one or both writes it, a copy is truly needed.

(2) The solution

The goal of copy-on-write (COW) fork() is to defer allocating and copying physical memory pages for the child until the copies are actually needed, if ever.

COW fork() creates just a pagetable for the child, with PTEs for user memory pointing to the parent’s physical pages. COW fork() marks all the user PTEs in both parent and child as not writable. When either process tries to write one of these COW pages, the CPU will force a page fault. The kernel page-fault handler detects this case, allocates a page of physical memory for the faulting process, copies the original page into the new page, and modifies the relevant PTE in the faulting process to refer to the new page, this time with the PTE marked writeable. When the page fault handler returns, the user process will be able to write its copy of the page.

COW fork() makes freeing of the physical pages that implement user memory a little trickier. A given physical page may be referred to by multiple processes’ page tables, and should be freed only when the last reference disappears.

(3) Implement copy-on-write (hard)

Your task is to implement copy-on-write fork in the xv6 kernel. You are done if your modified kernel executes both the cowtest and usertests programs successfully.

To help you test your implementation, we’ve provided an xv6 program called cowtest (source in user/cowtest.c). cowtest runs various tests, but even the first will fail on unmodified xv6. Thus, initially, you will see:

$ cowtest
simple: fork() failed
$

The “simple” test allocates more than half of available physical memory, and then fork()s. The fork fails because there is not enough free physical memory to give the child a complete copy of the parent’s memory.

When you are done, your kernel should pass all the tests in both cowtest and usertests. That is:

$ cowtest
simple: ok
simple: ok
three: zombie!
ok
three: zombie!
ok
three: zombie!
ok
file: ok
ALL COW TESTS PASSED
$ usertests
...
ALL TESTS PASSED
$

Here’s a reasonable plan of attack.

Modify uvmcopy() to map the parent’s physical pages into the child, instead of allocating new pages. Clear PTE_W in the PTEs of both child and parent.
Modify usertrap() to recognize page faults. When a page-fault occurs on a COW page, allocate a new page with kalloc(), copy the old page to the new page, and install the new page in the PTE with PTE_W set.
Ensure that each physical page is freed when the last PTE reference to it goes away – but not before. A good way to do this is to keep, for each physical page, a “reference count” of the number of user page tables that refer to that page. Set a page’s reference count to one when kalloc() allocates it. Increment a page’s reference count when fork causes a child to share the page, and decrement a page’s count each time any process drops the page from its page table. kfree() should only place a page back on the free list if its reference count is zero. It’s OK to keep these counts in a fixed-size array of integers. You’ll have to work out a scheme for how to index the array and how to choose its size. For example, you could index the array with the page’s physical address divided by 4096, and give the array a number of elements equal to highest physical address of any page placed on the free list by kinit() in kalloc.c.
Modify copyout() to use the same scheme as page faults when it encounters a COW page.
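The indexing scheme suggested in the plan can be checked with a few lines of arithmetic. The sketch below assumes the stock xv6 memory layout (KERNBASE at 0x80000000 and 128 MB of RAM, as defined in kernel/memlayout.h); ref_index and ref_index_compact are hypothetical helper names. The second variant is an optional refinement, not something the lab requires: it subtracts KERNBASE so the array only needs to cover RAM.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, mirroring the stock xv6 memory layout. */
#define PGSIZE   4096ULL
#define KERNBASE 0x80000000ULL
#define PHYSTOP  (KERNBASE + 128ULL * 1024 * 1024)

/* Scheme from the lab text: index by physical address / page size.
 * The array then needs PHYSTOP / PGSIZE entries. */
static uint64_t ref_index(uint64_t pa) {
  return pa / PGSIZE;
}

/* Optional variant: skip the unused range below KERNBASE, so only
 * (PHYSTOP - KERNBASE) / PGSIZE entries are needed. */
static uint64_t ref_index_compact(uint64_t pa) {
  return (pa - KERNBASE) / PGSIZE;
}
```

Under these assumptions the simple scheme needs 557056 counters while the compact one needs 32768. With 4-byte integers that is roughly 2 MB versus 128 KB, and either is acceptable for the lab.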

Some hints:

The lazy page allocation lab has likely made you familiar with much of the xv6 kernel code that’s relevant for copy-on-write. However, you should not base this lab on your lazy allocation solution; instead, please start with a fresh copy of xv6 as directed above.
It may be useful to have a way to record, for each PTE, whether it is a COW mapping. You can use the RSW (reserved for software) bits in the RISC-V PTE for this.
usertests explores scenarios that cowtest does not test, so don’t forget to check that all tests pass for both.
Some helpful macros and definitions for page table flags are at the end of kernel/riscv.h.
If a COW page fault occurs and there’s no free memory, the process should be killed.
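The RSW hint can be illustrated with plain bit manipulation. In an Sv39 PTE, bits 8 and 9 are reserved for software, so bit 8 is free to mark a COW mapping. The sketch below is a stand-alone model rather than kernel code: PTE_COW, pte_mark_cow and pte_clear_cow are hypothetical names (the implementation later in this post calls the flag PTE_F), and the other flag values are copied from kernel/riscv.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define PTE_V   (1L << 0)  /* valid */
#define PTE_R   (1L << 1)
#define PTE_W   (1L << 2)
#define PTE_X   (1L << 3)
#define PTE_U   (1L << 4)  /* user can access */
#define PTE_COW (1L << 8)  /* hypothetical name: first RSW bit marks a COW page */

/* At fork time: a writable page becomes read-only and is tagged as COW.
 * Pages that were never writable (e.g. program text) are left unchanged,
 * so a later write to them is still treated as a real error. */
static pte_t pte_mark_cow(pte_t pte) {
  if (pte & PTE_W)
    pte = (pte | PTE_COW) & ~PTE_W;
  return pte;
}

/* After the fault is resolved: restore write permission, drop the tag. */
static pte_t pte_clear_cow(pte_t pte) {
  if (pte & PTE_COW)
    pte = (pte | PTE_W) & ~PTE_COW;
  return pte;
}
```

Tagging only pages that were originally writable is what lets the fault handler tell a COW fault apart from an illegal write to a genuinely read-only page.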

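3. Implementation

(Note: newly added functions such as krefcnt must be declared in defs.h. The walk function also needs a change: remove its static qualifier and declare it in defs.h as well, so that it can be called later. Remember to change this back afterwards, or discard the branch.)

(1) Modify uvmcopy()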

First add a COW flag for PTEs (in riscv.h, add #define PTE_F (1L << 8) // copy on write fork)

#define PTE_V (1L << 0) // valid
#define PTE_R (1L << 1)
#define PTE_W (1L << 2)
#define PTE_X (1L << 3)
#define PTE_U (1L << 4) // 1 -> user can access

#define PTE_F (1L << 8) // copy on write fork

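Modify uvmcopy() following step 1 of the plan (note that, following step 3 of the plan, reference-count functions are also added to kalloc.c). The original uvmcopy() is as follows 👇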

// Given a parent process's page table, copy its memory into a child's page table.
// Copies both the page table and the physical memory.
// returns 0 on success, -1 on failure.
// frees any allocated pages on failure.
int
uvmcopy(pagetable_t old, pagetable_t new, uint64 sz)
{
  pte_t *pte;
  uint64 pa, i;
  uint flags;
  char *mem;

  for(i = 0; i < sz; i += PGSIZE){
    if((pte = walk(old, i, 0)) == 0)
      panic("uvmcopy: pte should exist");
    if((*pte & PTE_V) == 0)
      panic("uvmcopy: page not present");   
    pa = PTE2PA(*pte);
    flags = PTE_FLAGS(*pte);
    if((mem = kalloc()) == 0)
      goto err;
    memmove(mem, (char*)pa, PGSIZE);
    if(mappages(new, i, PGSIZE, (uint64)mem, flags) != 0){
      kfree(mem);
      goto err;
    }
  }
  return 0;

 err:
  uvmunmap(new, 0, i, 1);
  return -1;
}

The modified uvmcopy() is as follows 👇 (0. clear the write permission; 1. only map with mappages, without calling kalloc)

int
uvmcopy(pagetable_t old, pagetable_t new, uint64 sz)
{
  pte_t *pte;
  uint64 pa, i;
  uint flags;
  // char *mem;

  for(i = 0; i < sz; i += PGSIZE){
    if((pte = walk(old, i, 0)) == 0)
      panic("uvmcopy: pte should exist");
    if((*pte & PTE_V) == 0)
      panic("uvmcopy: page not present");   

    pa = PTE2PA(*pte);
    flags = PTE_FLAGS(*pte);

    //added for copy on write fork()
    if (flags & PTE_W) {
      // clear write permission and set the PTE_F (COW) flag
      flags = (flags | PTE_F) & ~PTE_W;
      *pte = PA2PTE(pa) | flags;
    }


    // if((mem = kalloc()) == 0)
    //   goto err;
    // memmove(mem, (char*)pa, PGSIZE);
    // if(mappages(new, i, PGSIZE, (uint64)mem, flags) != 0){
    //   kfree(mem);
    //   goto err;
    // }

    if (mappages(new, i, PGSIZE, pa, flags) != 0) {
      uvmunmap(new, 0, i / PGSIZE, 1);
      return -1;
    }
  }
  return 0;

//  err:
//   uvmunmap(new, 0, i, 1);
//   return -1;
}

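(2) Add the reference count

Idea: set ref_cnt = 1 when a page is allocated; ref_cnt += 1 in uvmcopy; ref_cnt -= 1 in kfree, and only actually free the page when ref_cnt == 0.

The lock implementation here follows "Lab6: Copy-on-Write Fork for xv6 详解".

a. Add a lock 🔒 (in kalloc.c), following "Lab6: Copy-on-Write Fork for xv6 详解"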

struct ref_stru {
  struct spinlock lock;
  int cnt[PHYSTOP / PGSIZE];  // reference counts: one slot per physical page, indexed by pa / PGSIZE
} ref;

int krefcnt(void* pa) {  // return the reference count of a physical page
  return ref.cnt[(uint64)pa / PGSIZE];
}

Initialize the lock: in kinit, initialize ref's spinlock with initlock(&ref.lock, "ref"); // the lock needs a name

void
kinit()
{
  initlock(&kmem.lock, "kmem");
  initlock(&ref.lock, "ref"); // the lock needs a name
  freerange(end, (void*)PHYSTOP);
}

b. Increment (kalloc) and decrement (kfree) ref_cnt.

(1). Initialize the count to 1 on allocation

First, the original kalloc() source in kalloc.c:

// Allocate one 4096-byte page of physical memory.
// Returns a pointer that the kernel can use.
// Returns 0 if the memory cannot be allocated.
void *
kalloc(void)
{
  struct run *r;

  acquire(&kmem.lock);
  r = kmem.freelist;
  if(r)
    kmem.freelist = r->next;
  release(&kmem.lock);

  if(r)
    memset((char*)r, 5, PGSIZE); // fill with junk
  return (void*)r;
}

- kalloc.c is modified as follows (ref_cnt = 1 when the page is first handed out); only the first if block changes

  if(r) {
    kmem.freelist = r->next;
    acquire(&ref.lock);
    ref.cnt[(uint64)r / PGSIZE] = 1;  // initialize the reference count to 1
    release(&ref.lock);
  }
  release(&kmem.lock);

(2) ref_cnt++ on fork:

- First, the function that increments the count

int kaddrefcnt(void* pa) { 
  if(((uint64)pa % PGSIZE) != 0 || (char*)pa < end || (uint64)pa >= PHYSTOP)
    return -1;
  acquire(&ref.lock);
  ++ref.cnt[(uint64)pa / PGSIZE];
  release(&ref.lock);
  return 0;
}

- In uvmcopy(), add one call after mappages() succeeds:

kaddrefcnt((void*)pa);

(3) cnt-- in kfree; the page goes back on kmem.freelist only when cnt == 0

- Original kfree code

// Free the page of physical memory pointed at by v,
// which normally should have been returned by a
// call to kalloc().  (The exception is when
// initializing the allocator; see kinit above.)
void
kfree(void *pa)
{
  struct run *r;

  if(((uint64)pa % PGSIZE) != 0 || (char*)pa < end || (uint64)pa >= PHYSTOP){
    panic("kfree");
  }

  // Fill with junk to catch dangling refs.
  memset(pa, 1, PGSIZE);

  r = (struct run*)pa;

  acquire(&kmem.lock);
  r->next = kmem.freelist;
  kmem.freelist = r;
  release(&kmem.lock);
}

- Only the part that frees the memory changes; the modified code is as follows

void
kfree(void *pa)
{
  struct run *r;

  if(((uint64)pa % PGSIZE) != 0 || (char*)pa < end || (uint64)pa >= PHYSTOP)
    panic("kfree");

  // decrement the count; free the page only when it reaches 0,
  // otherwise other page tables still refer to it
  acquire(&ref.lock);
  if(--ref.cnt[(uint64)pa / PGSIZE] == 0) {
    release(&ref.lock);

    r = (struct run*)pa;

    // Fill with junk to catch dangling refs.
    memset(pa, 1, PGSIZE);

    acquire(&kmem.lock);
    r->next = kmem.freelist;
    kmem.freelist = r;
    release(&kmem.lock);
  } else {
    release(&ref.lock);
  }
}

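(4) One more initialization detail, following "Lab6: Copy-on-Write Fork for xv6 详解"

kinit also calls freerange(end, (void*)PHYSTOP);, which calls kfree on every page. Initialization frees each page once, so each page's ref_cnt must be set to 1 beforehand; otherwise the decrement in the modified kfree would take the count from 0 to -1 and the page would never reach the free list. The source is as follows 👇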

void
kinit()
{
  initlock(&kmem.lock, "kmem");
  initlock(&ref.lock, "ref"); // the lock needs a name
  freerange(end, (void*)PHYSTOP);
}

void
freerange(void *pa_start, void *pa_end)
{
  char *p;
  p = (char*)PGROUNDUP((uint64)pa_start);
  for(; p + PGSIZE <= (char*)pa_end; p += PGSIZE)
    kfree(p);
}

Modify freerange as follows

void
freerange(void *pa_start, void *pa_end)
{
  char *p;
  p = (char*)PGROUNDUP((uint64)pa_start);
  for(; p + PGSIZE <= (char*)pa_end; p += PGSIZE) {
    ref.cnt[(uint64)p / PGSIZE] = 1;  // kfree below decrements this to 0
    kfree(p);
  }
}
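Why the count has to start at 1 before the first kfree can be seen in a small stand-alone model. This is a hedged sketch, not kernel code: NPAGES, refcnt, nfree, sim_kfree and sim_freerange are hypothetical names, and a counter stands in for the free list.

```c
#include <assert.h>

#define NPAGES 8            /* hypothetical tiny "physical memory" */

static int refcnt[NPAGES];  /* one reference count per simulated page */
static int nfree;           /* number of pages on the simulated free list */

/* Mirrors the modified kfree: only a count that drops to exactly 0
 * puts the page on the free list. */
static void sim_kfree(int page) {
  if (--refcnt[page] == 0)
    nfree++;
}

/* Mirrors freerange: set each count to `initial`, then free the page once.
 * Returns how many pages ended up on the free list. */
static int sim_freerange(int initial) {
  nfree = 0;
  for (int p = 0; p < NPAGES; p++) {
    refcnt[p] = initial;
    sim_kfree(p);
  }
  return nfree;
}
```

Calling sim_freerange(1) puts all NPAGES pages on the free list, while sim_freerange(0) leaves every count at -1 and the free list empty, which in the real kernel would mean kalloc() fails from the very first call.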

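(3) Add trap handling (how to handle the page fault)

This is very similar to the previous lab, 6.S081 Lab4 Lazy allocation.

In usertrap() (kernel/trap.c), again add an else if (r_scause() == 13 || r_scause() == 15) branch. The commented-out part is the setup from the lazy allocation lab, kept here for reference.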

  else if(r_scause() == 13 || r_scause() == 15)
  {
    // added by levi
    // printf("usertrap(): unexpected scause %p pid=%d\n", r_scause(), p->pid);
    // printf("            sepc=%p stval=%p\n", r_sepc(), r_stval());

    // added by levi
    // uint64 va = r_stval();
    // printf("page fault %p\n", va);
    // uint64 ka = (uint64)kalloc();
    // if (ka == 0) {
    //   p -> killed  = 1;
    // } else {
    //   memset((void *)ka, 0, PGSIZE);
    //   va = PGROUNDDOWN(va);
      // if (mappages(p -> pagetable, va, PGSIZE, ka, PTE_W | PTE_U | PTE_R) != 0) {
    //     kfree((void *)ka);
    //     p -> killed = 1;
    //   }
    // }
    // uvmalloc(p->pagetable, PGROUNDDOWN(r_stval()), PGROUNDDOWN(r_stval()) + 4096);
    

    // added by levi for copy on write fork
    uint64 va = r_stval();
    printf("page fault %p\n", va);
    if (va >= p -> sz || is_cowpage(p -> pagetable, va) == 0 || cowalloc(p -> pagetable, PGROUNDDOWN(va)) == 0) {
      p -> killed  = 1;
    }
  }

The is_cowpage and cowalloc functions used above are as follows (their names say what they do, so they are not explained in detail here)

// added by levi for copy on write fork
int is_cowpage(pagetable_t pg_tb, uint64 va) {
  if (va >= MAXVA) return 0;
  pte_t *pte = walk(pg_tb, va, 0);
  if (pte == 0) return 0;
  if((*pte & PTE_V) == 0) return 0;
  
  // 1 if the PTE carries the COW flag, 0 otherwise
  return (*pte & PTE_F) ? 1 : 0;
}
void* cowalloc(pagetable_t pg_tb, uint64 va) {
  if (va % PGSIZE) return 0;
  
  // get pa
  uint64 pa = walkaddr(pg_tb, va) ;
  if (!pa) return 0;

  // get pte
  pte_t *pte = walk(pg_tb, va, 0);

  if (krefcnt((char*)pa) == 1) {
    // only one process still refers to this physical page: just restore write permission in the PTE
    *pte |= PTE_W;
    *pte &= ~PTE_F;
    return (void*)pa;
  } else  {
    char* mem = kalloc();
    if (!mem) return 0;

    // copy from old page
    memmove(mem, (char*)pa, PGSIZE);
    // clear PTE_V so that mappages() does not panic with "remap"
    *pte &= ~PTE_V;
    // new map
    if(mappages(pg_tb, va, PGSIZE, (uint64)mem, (PTE_FLAGS(*pte) | PTE_W) & ~PTE_F) != 0) {
      kfree(mem);
      *pte |= PTE_V;
      return 0;
    }

    // drop one reference to the original physical page
    kfree((char*)PGROUNDDOWN(pa));
    return mem;
  }
}

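(4) Modify copyout()

This is needed because a copy from kernel to user space does not go through usertrap: copyout() writes through the physical address returned by walkaddr(), so the write never raises a page fault and COW pages need explicit handling. This part follows "Lab6: Copy-on-Write Fork for xv6 详解".

The original copyout() is as follows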

// Copy from kernel to user.
// Copy len bytes from src to virtual address dstva in a given page table.
// Return 0 on success, -1 on error.
int
copyout(pagetable_t pagetable, uint64 dstva, char *src, uint64 len)
{
  uint64 n, va0, pa0;

  while(len > 0){
    va0 = PGROUNDDOWN(dstva);
    pa0 = walkaddr(pagetable, va0);
    if(pa0 == 0)
      return -1;
    n = PGSIZE - (dstva - va0);
    if(n > len)
      n = len;
    memmove((void *)(pa0 + (dstva - va0)), src, n);

    len -= n;
    src += n;
    dstva = va0 + PGSIZE;
  }
  return 0;
}

The change: check whether the destination is a COW page; if it is, obtain a private page for pa0 before writing

while(len > 0){
    va0 = PGROUNDDOWN(dstva);
    pa0 = walkaddr(pagetable, va0);
    
    // COW page: get a private, writable copy before writing to it
    if (is_cowpage(pagetable, va0)) {
        pa0 = (uint64)cowalloc(pagetable, va0);
    }
    
    if(pa0 == 0)
      return -1;
    n = PGSIZE - (dstva - va0);
    if(n > len)
      n = len;
    memmove((void *)(pa0 + (dstva - va0)), src, n);

    len -= n;
    src += n;
    dstva = va0 + PGSIZE;
  }

(5) Results

$ cowtest
simple: ok
...
file: ok
ALL COW TESTS PASSED
$ usertests
usertests starting
test execout: OK
test pgbug: OK
test sbrkbugs: usertrap(): unexpected scause 0x000000000000000c pid=3251
            sepc=0x000000000000555e stval=0x000000000000555e
usertrap(): unexpected scause 0x000000000000000c pid=3252
            sepc=0x000000000000555e stval=0x000000000000555e
OK
test badarg: OK
... 
ALL TESTS PASSED

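4. Summary

This lab and lazy allocation share the same idea: do not allocate a physical page until it is actually used, which keeps page allocation to a minimum. The cost is paid at write time, since a page has to be allocated on the spot before the write can proceed. Between the two there is also Zero Fill On Demand (initially only one all-zero page is allocated, and every all-zero page maps to that same physical address). All three are forms of lazy allocation.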

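After fork, the parent's physical pages are simply shared: the child's PTEs point to the same physical pages as the parent's. To preserve isolation, the PTEs of both parent and child are marked read-only.
When either process later tries to modify that memory, a page fault is raised (because it is writing through a read-only PTE).
Handling the page fault:
1. Allocate a new physical page.
2. Copy the contents of the faulting physical page into the new page.
3. Map the new page into the faulting process (the child, in this example).
4. Note: the PTE for the newly allocated page is set to read-write, and the PTE for the original page can be set back to read-write once only one process still refers to it.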