[Lab Notes] MIT 6.S081 Lab10 mmap

士全

Last modified 2023-04-08 16:16:38


First published 2023-02-23 16:25:22

Article link: https://blog.csdn.net/qq_34872231/article/details/129184260


#Lab10: mmap

I. Source
II. My Code
III. Motivation
- M1 - The classic use of virtual memory
- M2 - Mapping files into virtual memory
IV. Solution
V. Result
VI. Reference

I. Source

II. My Code

III. Motivation

Lab10: mmap is mainly meant to familiarize us with another use of virtual memory: mapping an on-disk file object into a process's virtual memory.

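Before starting the lab, I recommend reading the paper Virtual memory primitives for user programs. The virtual memory primitives it describes may help in understanding mmap, and it also covers several applications of virtual memory, shared virtual memory in particular; these techniques are essential when building systems. See also my notes, MIT-6.S081 Paper Virtual memory primitives for user programs.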

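As for the ordinary use of virtual memory, we probably know it already: in every lab since Lab3: page tables we have been dealing with virtual memory (virtual addresses va and physical addresses pa).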

M1 - The classic use of virtual memory

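For a process to run, it needs memory to hold its data. Before virtual memory was introduced, data was stored directly in physical memory (think of real mode). That causes a lot of trouble. For example: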

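Suppose there are two processes, A and B, each able to access every address in physical memory. Process A updates variable i and stores it at addr64; process B updates variable j and stores it at addr128. Nothing wrong so far; this seems like exactly how data should be stored.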

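Process A keeps running and wants to update i again. The instruction should be write addr64, but the programmer slipped and wrote write addr128. Now we are in trouble! Process A has reached into process B's pocket and modified B's data. When process B later uses j, the value has been tampered with, so every computation that follows goes wrong. What a mess!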

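The root cause of this kind of error is the lack of isolation: another process can touch data that belongs to me, which must never be allowed! To deal with this, virtual memory was proposed: every process gets its own "plot of land" and can do whatever it likes there, because nothing it does can spill over onto other processes.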

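That was the original intent, and so virtual memory was born! With a "plot of land" per process, the next question is how big the plot should be. Hardware today is 32-bit or 64-bit, so for this discussion let's take the 32-bit case and make the virtual address space 2³² bytes: large enough, and convenient for the hardware to address. 2³² B is the familiar 4 GB, so from here on we treat a process's virtual memory as 4 GB. (xv6 itself runs on 64-bit RISC-V with Sv39 paging, where virtual addresses are 39 bits and xv6 uses only the lower 2³⁸ bytes, up to MAXVA.)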

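Now another question: virtual memory is said to be 4 GB, but my machine only has 512 MB of physical memory. How can it hold a 4 GB process? Not to mention that the OS runs many processes at once!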

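Believe it or not, it fits, because the machine also has a disk, and the disk is large. We can pull the disk in and count it as part of memory, as an extension of physical memory. That part is just slower to access, which is tolerable; if the speed cannot keep up, treat the disk as a warehouse. When data is needed, copy it from disk into memory; when it is no longer in use, write it back from memory to disk.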

This is a nice idea. Suppose the disk is 128 GB; then memory is effectively extended to more than 128 GB, and the OS can hold more than 32 processes of 4 GB each at the same time (128 GB / 4 GB = 32).

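Modern OSes adopted this idea, and it became the classic use of virtual memory (swapping in and out). To summarize: the disk extends memory, creating the illusion that every process owns a full 4 GB. See the figure below: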

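This is the layout of a process's virtual address space in xv6. text holds the program's code and is normally not written at run time, while data holds its global variables; the user stack and heap are for the user program; the trapframe saves the process's state when it traps into the kernel; and the trampoline holds the code for switching into and out of the kernel, and is also not writable. Having looked at virtual memory, let's look at how it relates to physical memory: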

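The link is a mapping that translates virtual addresses into physical addresses. When data is needed it is copied from disk into memory, because the CPU can only operate on memory.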
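To make the idea of address translation concrete, here is a minimal sketch in plain C. It is a toy: a single-level table with eight pages, and the names toy_pagetable, toy_translate and NOT_MAPPED are made up for illustration; real RISC-V Sv39 hardware walks a three-level page table.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE     4096u       /* bytes per page, as in xv6 */
#define NPAGES     8u          /* toy address space: 8 virtual pages (assumption) */
#define NOT_MAPPED UINT64_MAX  /* marker for "no physical page behind this one" */

/* Toy single-level page table: virtual page number -> physical page base. */
static uint64_t toy_pagetable[NPAGES] = {
    0x80001000, NOT_MAPPED, 0x80005000, NOT_MAPPED,
    NOT_MAPPED, NOT_MAPPED, NOT_MAPPED, 0x80002000,
};

/* Translate a virtual address. Returns NOT_MAPPED when the page is absent,
   which is the situation that raises a page fault on real hardware. */
uint64_t toy_translate(uint64_t va)
{
    uint64_t vpn = va / PGSIZE;   /* which virtual page */
    uint64_t off = va % PGSIZE;   /* byte offset inside that page */

    if (vpn >= NPAGES || toy_pagetable[vpn] == NOT_MAPPED)
        return NOT_MAPPED;
    return toy_pagetable[vpn] + off;
}
```

The offset inside the page is carried over unchanged; only the page number is looked up.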

At this point, I think the classic use of virtual memory is clear!

M2 - Mapping files into virtual memory

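We now know that every process has a large virtual address space, up to 4 GB, for holding its run-time data. In the real world, lots of files such as .mp4 or .pdf live on disk. To access them, the only way is to pull them from disk into memory, because the CPU can only access memory (a consequence of the von Neumann architecture).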

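How do we pull a file from disk into memory? That is the focus of Lab10: mmap. Say process A needs to access ReadMe.pdf. We cannot just copy the .pdf into memory and call it done. That would be like goods arriving at a shop and being dumped on the floor instead of put on the shelves: the restocking has not actually been achieved.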

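When pulling in the .pdf we must know which process needs it. In the shop, that means knowing which shelf needs restocking, and that shelf must clear or reserve some space for the new goods. File mapping is similar: process A has to tidy up its virtual address space and reserve room for the .pdf (marking which address range is set aside for it). See the mapping diagram: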

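The process reserves a free region in the heap for the file, the gray mapped region in the diagram. The whole file can be mapped into virtual memory, or just a part of it: offset and len in the diagram say it all. All of this is configured through mmap.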
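For comparison, this is roughly how offset and len are used with the mmap of an ordinary POSIX system (not xv6). The sketch is my own illustration: the function name read_second_page_byte and the scratch file name are hypothetical. It writes two pages to a file and maps only the second one.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write two pages ('A' x page, then 'B' x page) to a scratch file, map only
   the second page by passing a non-zero offset, and return the first mapped
   byte. Returns -1 on any failure. */
int read_second_page_byte(void)
{
    const char *path = "mmap_demo.tmp";   /* hypothetical scratch file */
    char buf[65536];
    int result = -1;

    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0 || pg > (long)sizeof(buf))
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    memset(buf, 'A', (size_t)pg);
    if (write(fd, buf, (size_t)pg) == pg) {
        memset(buf, 'B', (size_t)pg);
        if (write(fd, buf, (size_t)pg) == pg) {
            /* len = one page, offset = one page (must be page-aligned) */
            char *p = mmap(NULL, (size_t)pg, PROT_READ, MAP_PRIVATE, fd, (off_t)pg);
            if (p != MAP_FAILED) {
                result = p[0];
                munmap(p, (size_t)pg);
            }
        }
    }

    close(fd);
    unlink(path);
    return result;
}
```

Calling read_second_page_byte() should yield 'B', since the mapping starts one page into the file.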

After this diagram, the motivation of Lab10: mmap should be clear: pull a file from disk into memory, which requires reserving a place for the file in the process's virtual memory.

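At this point you may not think the mmap mechanism is anything special: doesn't it just drag a file into memory? That would be a big mistake. In Linux nearly every object is exposed as a file, devices included; everything is a file. Seen that way, mmap suddenly looks a lot more impressive.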

IV. Solution

Now that it is clear what Lab10: mmap wants from us, we can start designing the mmap mechanism.

S1 - Declaring mmap() and munmap()

Following the hints on the Lab10: mmap lab page, we first declare the two system calls mmap() and munmap() in user/user.h:

void* mmap(void *addr, int len, int prot, int flags, int fd, uint offset);
int munmap(void *addr, int len);

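Let me paraphrase. mmap()'s main job is to reserve a free region in the heap for the file, recording some basic information about the region up front. The first parameter, addr, is where in virtual memory the mapped content should start; len is the length of the mapped content; prot is the protection of the mapped region; flags says whether updates to the mapped memory should be written back to disk, with MAP_PRIVATE meaning no and MAP_SHARED meaning yes; fd is simply the file descriptor; offset is the offset of the mapped content within the file. In the words of the Lab10: mmap lab page: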

mmap can be called in many ways, but this lab requires only a subset of its features relevant to memory-mapping a file. You can assume that addr will always be zero, meaning that the kernel should decide the virtual address at which to map the file. mmap returns that address, or 0xffffffffffffffff if it fails. length is the number of bytes to map; it might not be the same as the file’s length. prot indicates whether the memory should be mapped readable, writeable, and/or executable; you can assume that prot is PROT_READ or PROT_WRITE or both. flags will be either MAP_SHARED, meaning that modifications to the mapped memory should be written back to the file, or MAP_PRIVATE, meaning that they should not. You don’t have to implement any other bits in flags. fd is the open file descriptor of the file to map. You can assume offset is zero (it’s the starting point in the file at which to map).

munmap() reclaims the region. It is much simpler, with just two parameters: the start address of the range to unmap and its size. In the words of the Lab10: mmap lab page:

munmap(addr, length) should remove mmap mappings in the indicated address range. If the process has modified the memory and has it mapped MAP_SHARED, the modifications should first be written to the file. An munmap call might cover only a portion of an mmap-ed region, but you can assume that it will either unmap at the start, or at the end, or the whole region (but not punch a hole in the middle of a region).

We also need to add mmaptest to the build in the Makefile:

UPROGS=\
	$U/_cat\
	...
	$U/_zombie\
	$U/_mmaptest\

As in Lab2: system calls, add the entry lines to user/usys.pl:

entry("mmap");
entry("munmap");

Add the numbers of the two system calls in kernel/syscall.h:

#define SYS_mmap   22
#define SYS_munmap 23

Finally, register the two system calls in the dispatch table in kernel/syscall.c:

extern uint64 sys_mmap(void);
extern uint64 sys_munmap(void);

static uint64 (*syscalls[])(void) = {
...
[SYS_mmap]    sys_mmap,
[SYS_munmap]  sys_munmap,
};
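The table above relies on C designated initializers. A stand-alone sketch of the same dispatch pattern, with made-up handlers (demo_mmap, demo_munmap and demo_dispatch are not xv6 names), might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYS_mmap   22
#define SYS_munmap 23
#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Stand-in handlers; the real ones live in kernel/sysfile.c. */
static uint64_t demo_mmap(void)   { return 0x1000; }
static uint64_t demo_munmap(void) { return 0; }

/* Designated initializers place each handler at its syscall number;
   every slot not named here is a null pointer. */
static uint64_t (*demo_syscalls[])(void) = {
    [SYS_mmap]   = demo_mmap,
    [SYS_munmap] = demo_munmap,
};

/* Look up and run a handler; unknown numbers yield -1 (all bits set). */
uint64_t demo_dispatch(int num)
{
    if (num > 0 && (size_t)num < NELEM(demo_syscalls) && demo_syscalls[num])
        return demo_syscalls[num]();
    return (uint64_t)-1;
}
```

Because the array is sized by its largest index, forgetting an entry does not shift the others; the missing number simply dispatches to nothing.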

S2 - Defining struct vma

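As mentioned in S1 - Declaring mmap() and munmap(), using mmap requires recording some basic information about the mapped region: start address, length, permissions, file offset and so on. In code, this metadata becomes a struct that holds it.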

For Lab10: mmap we define struct vma, short for virtual memory area, in kernel/proc.h:

struct vma {
  uint64 addr;
  int len;
  int prot;
  int flags;
  int fd;
  int offset;
  struct file *file;
};

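The fields match the parameters of the mmap prototype given on the Lab10: mmap lab page, plus a pointer to the struct file. Also, an xv6 process can map more than one file. In the lab page's words: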

Since the xv6 kernel doesn’t have a memory allocator in the kernel, it’s OK to declare a fixed-size array of VMAs and allocate from that array as needed. A size of 16 should be sufficient.

So struct proc also needs an array of struct vma. Modify kernel/proc.h as follows:

#define NVMA 16

// Per-process state
struct proc {
	...
  char name[16];               // Process name (debugging)
  struct vma vmas[NVMA];
};

The macro NVMA is the number of virtual memory areas, so a process can hold at most 16 mappings at a time.
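Allocating from that fixed array can be sketched as a simple linear scan. This is a user-space illustration, assuming, as the rest of this post does, that addr == 0 marks a free slot; demo_vma and demo_vma_alloc are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NVMA 16

struct demo_vma {
    uint64_t addr;   /* 0 means "slot unused" (convention assumed here) */
    int len;
};

/* Return the first unused slot, or NULL when all NVMA slots are taken. */
struct demo_vma *demo_vma_alloc(struct demo_vma table[NVMA])
{
    for (int i = 0; i < NVMA; i++) {
        if (table[i].addr == 0)
            return &table[i];
    }
    return NULL;
}
```

The convention only works if a slot's addr is reset to 0 once its mapping is completely gone; otherwise slots are never reused.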

S3 - Designing & implementing the mmap mechanism

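With struct vma and the related data structures defined, we can design and implement the mmap mechanism. First define sys_mmap() in kernel/sysfile.c. I will show the complete flow first and explain afterwards: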

uint64
sys_mmap(void)
{
  uint64 addr;
  int len, prot, flags, fd, offset;
  struct file* file;
  struct vma* vma = 0;

  if(argaddr(0, &addr)<0 || argint(1, &len)<0
    || argint(2, &prot)<0 || argint(3, &flags)<0
    || argfd(4, &fd, &file)<0 || argint(5, &offset)<0)
    return -1;

  /** protection conflict: a read-only file cannot back a writable MAP_SHARED mapping */
  if(!file->writable && (prot & PROT_WRITE) && flags==MAP_SHARED)
    return -1;

  struct proc* p = myproc();
  len = PGROUNDUP(len);

  if(p->sz+len > MAXVA)
    return -1;

  if(offset<0 || offset%PGSIZE)
    return -1;

  for(int i=0; i<NVMA; i++) {
    if(p->vmas[i].addr)
      continue;

    vma = &p->vmas[i];
    break;
  }

  if(!vma) /** no free VMA slot is left for this mapping */
    return -1;

  if(addr == 0) 
    vma->addr = p->sz;
  else  /** the caller specified the start address of the mapping */
    vma->addr = addr;
  
  vma->len = len;
  vma->prot = prot;
  vma->flags = flags;
  vma->fd = fd;
  vma->offset = offset;
  vma->file = file;
  filedup(file);
  p->sz += len;

  return vma->addr;
}

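In sys_mmap(), the caller's arguments arrive in registers (the same routine as in Lab2: system calls). After reading them, we check whether the file's access mode conflicts with the permissions the caller asks for: the file itself is not writable, yet its mapped region in memory is to be writable, and the caller also requires that changes to the in-memory copy be written back to the on-disk file. Think that sequence through and you will see how absurd it is!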
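The conflict check can be isolated into a small predicate. The sketch below is illustrative only: mmap_request_ok is a hypothetical helper, and the XV6_-prefixed constants are assumed to have the same values as PROT_* and MAP_* in the lab's kernel/fcntl.h.

```c
#include <assert.h>

/* Assumed to mirror kernel/fcntl.h in the lab's xv6 tree. */
#define XV6_PROT_READ   0x1
#define XV6_PROT_WRITE  0x2
#define XV6_MAP_SHARED  0x01
#define XV6_MAP_PRIVATE 0x02

/* Returns 1 if the request is allowed, 0 if it must be rejected. */
int mmap_request_ok(int file_writable, int prot, int flags)
{
    int wants_write = (prot & XV6_PROT_WRITE) != 0;
    int shared      = (flags & XV6_MAP_SHARED) != 0;

    /* Only "writable + shared" needs a writable file: writes to a
       private mapping never reach the disk. */
    if (wants_write && shared && !file_writable)
        return 0;
    return 1;
}
```

A writable MAP_PRIVATE mapping of a read-only file is therefore fine, which is a case mmaptest exercises.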

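Next come the bounds checks: the mapping must not run past the top of the address space (MAXVA), and offset must be a non-negative multiple of PGSIZE, since the file is read one page at a time. As the Lab10: mmap lab page puts it: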

read 4096 bytes of the relevant file into that page, and map it into the user address space.
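The page arithmetic behind these checks is easy to try in isolation. A minimal sketch, assuming PGSIZE is 4096 as in xv6; pg_round_up, pg_round_down and offset_ok are stand-ins I made up for PGROUNDUP, PGROUNDDOWN and the offset test in sys_mmap():

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096

/* Round n up to the next multiple of PGSIZE (PGSIZE is a power of two). */
uint64_t pg_round_up(uint64_t n)
{
    return (n + PGSIZE - 1) & ~(uint64_t)(PGSIZE - 1);
}

/* Round n down to the start of the page that contains it. */
uint64_t pg_round_down(uint64_t n)
{
    return n & ~(uint64_t)(PGSIZE - 1);
}

/* A file offset is acceptable when it is non-negative and page-aligned. */
int offset_ok(int offset)
{
    return offset >= 0 && offset % PGSIZE == 0;
}
```

So a request for 1 byte reserves a whole page, and a request for 4097 bytes reserves two.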

Then we reserve the mapping region. Step one: is there a free slot? In code, a for loop walks the struct vma array and picks out the first unused entry. In the lab page's words:

Implement mmap: find an unused region in the process’s address space in which to map the file, and add a VMA to the process’s table of mapped regions.

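After that we simply record the metadata: where the region starts in virtual memory, how much of the file is mapped, and so on. Note that if the caller passes 0 for addr, the choice of region is left to the kernel, which here places the mapping at the current top of the process's memory (p->sz); otherwise the caller's address is used.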

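After recording, call filedup() to increase the file's reference count, which shows how many pairs of eyes are watching the file, so that it is not released prematurely. In the lab page's words: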

mmap should increase the file’s reference count so that the structure doesn’t disappear when the file is closed (hint: see filedup).

That completes the rough design of mmap. It looks a lot like Lab5: lazy page allocation: we only reserve the region (record metadata), without doing any real memory allocation or data copying.

The allocation and copying are not urgent; doing them when the memory is actually used is soon enough! In the lab page's words:

Fill in the page table lazily, in response to page faults. That is, mmap should not allocate physical memory or read the file. Instead, do that in page fault handling code in (or called by) usertrap, as in the lazy page allocation lab. The reason to be lazy is to ensure that mmap of a large file is fast, and that mmap of a file larger than physical memory is possible.

S4 - Page fault: allocating memory & copying data

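What happens when the process first accesses some part of the file? The process first looks in memory for the corresponding content of the file, and of course the answer is NO! For the detailed reason, see the Lazy allocation part of Lab5: lazy page allocation.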

A page fault is raised, and control enters kernel/trap.c:usertrap():

//
// handle an interrupt, exception, or system call from user space.
// called from trampoline.S
//
void
usertrap(void)
{
 	...
  if(r_scause() == 8){
    // system call
		...
  } else if((which_dev = devintr()) != 0){
    // ok
  } else if(r_scause()==13 || r_scause()==15) { /** page fault */
    uint64 va = r_stval();
    struct vma* vma = 0;

    if(va>=p->sz || va<=p->trapframe->sp) /** va must lie in the heap, see xv6 book Figure 3.4 */
      goto killing;
    
    for(int i=0; i<NVMA; i++) {
      if(va>=p->vmas[i].addr && va<p->vmas[i].addr+p->vmas[i].len) {
        vma = &p->vmas[i];
        break;
      }
    }

    if(!vma)
      goto killing;

    /** found the mapped file region that contains the faulting address */
    va = PGROUNDDOWN(va);
    
    /** try to allocate a physical page to hold the file content */
    char* mem = kalloc();
    if(mem == 0)
      goto killing;
    
    memset(mem, 0, PGSIZE);
    /** copy this page's file content from disk into the new page */
    ilock(vma->file->ip);
    readi(vma->file->ip, 0, (uint64)mem, va-vma->addr+vma->offset, PGSIZE);
    iunlock(vma->file->ip);

    /** derive the PTE permissions from prot */
    int flags = PTE_U;
    if(vma->prot & PROT_READ) 
      flags |= PTE_R;
    if(vma->prot & PROT_WRITE)
      flags |= PTE_W;
    if(vma->prot & PROT_EXEC)
      flags |= PTE_X;
    
    if(mappages(p->pagetable, va, PGSIZE, (uint64)mem, flags) != 0)
      goto freeing;

    /** page fault handled successfully */
    goto rest;

  freeing:
    kfree(mem);

  killing:
    p->killed = 1;
  
  rest:
    ;
  } else {
   ...
  }

 	...
}

From Lab5: lazy page allocation we know that a page fault shows up as scause 13 (load) or 15 (store). After the fault, we first fetch the faulting virtual address va with r_stval() and do a quick bounds check.

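If all goes well, we locate the mapped region containing va. Then we allocate memory and copy data. The copy reads the on-disk file content, so it calls readi(), with the inode locked before and unlocked after, as the Lab10: mmap lab page explains: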

Read the file with readi, which takes an offset argument at which to read in the file (but you will have to lock/unlock the inode passed to readi). Don’t forget to set the permissions correctly on the page.

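It also reminds us to set the permissions of the mapped page correctly. Finally, mappages() installs the mapping from the process's virtual page to the physical page.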

That completes the memory allocation and data copying triggered by a page fault.
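Stripped of the kernel calls, the fault handler does three pieces of bookkeeping: find the VMA, compute the file offset of the faulting page, and turn prot into PTE flags. A user-space sketch of just that arithmetic, with hypothetical helper names (vma_lookup, fault_file_offset, prot_to_pte_flags) and protection values assumed to match kernel/fcntl.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096
#define NVMA   16

/* RISC-V PTE permission bits, as defined in xv6's kernel/riscv.h. */
#define PTE_R (1L << 1)
#define PTE_W (1L << 2)
#define PTE_X (1L << 3)
#define PTE_U (1L << 4)

/* Protection bits, assumed to mirror kernel/fcntl.h. */
#define XV6_PROT_READ  0x1
#define XV6_PROT_WRITE 0x2
#define XV6_PROT_EXEC  0x4

struct demo_vma { uint64_t addr; int len; int prot; int offset; };

/* Find the VMA whose [addr, addr+len) range contains va, or NULL. */
struct demo_vma *vma_lookup(struct demo_vma t[NVMA], uint64_t va)
{
    for (int i = 0; i < NVMA; i++) {
        if (t[i].addr && va >= t[i].addr && va < t[i].addr + (uint64_t)t[i].len)
            return &t[i];
    }
    return NULL;
}

/* File offset of the page that contains va. */
uint64_t fault_file_offset(const struct demo_vma *v, uint64_t va)
{
    uint64_t page = va & ~(uint64_t)(PGSIZE - 1);   /* same as PGROUNDDOWN */
    return page - v->addr + (uint64_t)v->offset;
}

/* Translate mmap protection bits into PTE flags for a user page. */
long prot_to_pte_flags(int prot)
{
    long flags = PTE_U;
    if (prot & XV6_PROT_READ)  flags |= PTE_R;
    if (prot & XV6_PROT_WRITE) flags |= PTE_W;
    if (prot & XV6_PROT_EXEC)  flags |= PTE_X;
    return flags;
}
```

Keeping this arithmetic in small helpers makes it easy to check by hand before wiring it to kalloc(), readi() and mappages().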

S5 - munmap(): removing a mapping

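In S3 - Designing & implementing the mmap mechanism and S4 - Page fault: allocating memory & copying data we covered at length how to establish a mapping and how to allocate and copy. Whatever is established must also be torn down; the two go together, or resources will run short.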

munmap() does something fairly simple: it releases the mapped region in virtual memory, given addr. In the lab page's words:

Implement munmap: find the VMA for the address range and unmap the specified pages (hint: use uvmunmap). If munmap removes all pages of a previous mmap, it should decrement the reference count of the corresponding struct file. If an unmapped page has been modified and the file is mapped MAP_SHARED, write the page back to the file.

My definition is as follows:

uint64
sys_munmap(void)
{
  uint64 addr;
  int len;
  struct vma* vma = 0;
  struct proc* p = myproc();

  if(argaddr(0, &addr)<0 || argint(1, &len)<0)
    return -1;

  addr = PGROUNDDOWN(addr);
  len = PGROUNDUP(len);

  for(int i=0; i<NVMA; i++) {
    if(p->vmas[i].addr && addr>=p->vmas[i].addr 
      && addr+len<=p->vmas[i].addr+p->vmas[i].len) {
      vma = &p->vmas[i];
      break;
    }
  }

  if(!vma)
    return -1;

  if(addr != vma->addr)
    return -1;

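  /** release the file's pages mapped in this range, one by one */
  vma->addr += len;
  vma->len -= len;
  vma->offset += len; /** keep file offsets of the remaining pages correct */
  if(vma->flags & MAP_SHARED)
    filewrite(vma->file, addr, len);
  uvmunmap(p->pagetable, addr, len/PGSIZE, 1);

  /** whole mapping removed: drop the file reference and free the slot */
  if(vma->len == 0) {
    fileclose(vma->file);
    vma->file = 0;
    vma->addr = 0;
  }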

  return 0;  
}

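It locates the mapped region from addr and then releases the file's pages mapped in virtual memory one by one. Note that if the mapping's flags include MAP_SHARED, the in-memory updates must be written back to disk.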

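As for MAP_SHARED, think of the file as shared: the file itself must then hold the latest content, so if a copy has been modified, all other copies and the original must be brought up to date. This is mentioned in the shared virtual memory part of my notes, MIT-6.S081 Paper Virtual memory primitives for user programs.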
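The lab allows munmap to remove the start, the end, or the whole of a region. As a sketch of that bookkeeping only (no page-table work, no write-back), here is a hypothetical helper vma_trim that handles all three cases. Note that trimming from the start also advances the file offset, so later page faults still read the right part of the file. This is an illustration under those assumptions, not the code used in this post.

```c
#include <assert.h>
#include <stdint.h>

struct demo_vma { uint64_t addr; uint64_t len; uint64_t offset; };

/* Remove [addr, addr+len) from v. Both arguments are assumed page-aligned.
   Returns 0 on success, -1 if the range is outside v or would punch a hole. */
int vma_trim(struct demo_vma *v, uint64_t addr, uint64_t len)
{
    uint64_t end = v->addr + v->len;

    if (addr < v->addr || len > v->len || addr + len > end)
        return -1;                 /* range is not inside this VMA */

    if (addr == v->addr) {         /* unmap at the start, or the whole region */
        v->addr   += len;
        v->offset += len;          /* remaining pages sit further into the file */
        v->len    -= len;
    } else if (addr + len == end) {
        v->len    -= len;          /* unmap at the end */
    } else {
        return -1;                 /* hole in the middle: not required by the lab */
    }
    return 0;
}
```

When len reaches 0 the caller would drop its file reference and mark the slot free.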

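Along the way we also have to tolerate invalid PTEs, because a page that has not been allocated yet necessarily has an invalid PTE. The corresponding changes are in kernel/vm.c:uvmunmap() and uvmcopy():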

void
uvmunmap(pagetable_t pagetable, uint64 va, uint64 npages, int do_free)
{
  ...
  for(a = va; a < va + npages*PGSIZE; a += PGSIZE){
    ...
    if((*pte & PTE_V) == 0)
      continue;
      // panic("uvmunmap: not mapped");
    ...
  }
}

int
uvmcopy(pagetable_t old, pagetable_t new, uint64 sz)
{
  ..
  for(i = 0; i < sz; i += PGSIZE){
    ..
    if((*pte & PTE_V) == 0)
      continue;
      // panic("uvmcopy: page not present");
    ...
  }
 	...
}

For details, see the Lazy allocation part of Lab5: lazy page allocation.

S6 - fork() and exit(): finishing what we started

When forking a child, besides recording the parent's pid, the size of its memory and other key information, we must also copy the parent's mapping information. In kernel/proc.c:fork():

int
fork(void)
{
 	...
  np->state = RUNNABLE;

  /** the child copies the parent's vmas */
  for(int i=0; i<NVMA; i++) {
    memmove(&np->vmas[i], &p->vmas[i], sizeof(p->vmas[i]));
    if(p->vmas[i].file)
      filedup(p->vmas[i].file);
  }

 	...
}

The loop also takes another reference to each file the parent has mapped. In the lab page's words:

Modify fork to ensure that the child has the same mapped regions as the parent. Don’t forget to increment the reference count for a VMA’s struct file.
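The reference counting can be pictured with a toy model. demo_file, demo_filedup and demo_fileclose are stand-ins for xv6's struct file, filedup() and fileclose(); the sketch only tracks the counter.

```c
#include <assert.h>

struct demo_file { int ref; };

/* Take one more reference, like filedup(). */
struct demo_file *demo_filedup(struct demo_file *f)
{
    assert(f->ref > 0);   /* only a live file can be duplicated */
    f->ref++;
    return f;
}

/* Drop one reference, like fileclose(). Returns 1 when the last reference
   was dropped (the real kernel would release the file here), otherwise 0. */
int demo_fileclose(struct demo_file *f)
{
    assert(f->ref > 0);
    f->ref--;
    return f->ref == 0;
}
```

With this model, open gives ref 1, mmap raises it to 2, and a fork that copies the VMA raises it to 3; the file is released only after three matching closes.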

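For exit, when the process ends, we insert a few lines into kernel/proc.c:exit() that detach the mapped regions from physical memory, as if to say: I will never use these mappings again, please reclaim them as you see fit!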

void
exit(int status)
{
  ...

  // Close all open files.
  for(int fd = 0; fd < NOFILE; fd++){
    ...
  }

  for(int i=0; i<NVMA; i++) {
    uvmunmap(p->pagetable, p->vmas[i].addr, p->vmas[i].len/PGSIZE, 1);
  }

 	...
}