XV6 Lab1: System Call

M号攻城狮

Last modified: 2022-12-18 15:27:28

First published: 2022-10-23 16:52:41

Article link: https://blog.csdn.net/weixin_53215555/article/details/127477065

Lab1: System Calls

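[Note on p->trapframe: in xv6 on RISC-V this is a per-process page in which the trap entry code saves the user registers whenever an interrupt, system call, or exception enters the kernel. It is a dedicated page, not something built on the stack.]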

The task in this lab is to add system calls, and through that to understand the xv6 kernel and how system calls work.

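Reading guide:
xv6 book: Chapter 2 and Chapter 4 (4.3, 4.4)
user/user.h: user-space declarations of the system call wrappers and related data structures, plus the general-purpose functions implemented in ulib.c
user/usys.pl: Perl script that generates usys.S, the stubs for the system calls
kernel/syscall.h: system call numbers (#define)
kernel/proc.h: process-related data structure definitions
kernel/proc.c: process-related functions (for example fork/wait/kill/exit…)
kernel/syscall.c: the system call dispatch function syscall() and related data structures
kernel/sysproc.c: the system call functions that manage processes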

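[Walkthrough] (Thanks to the reference articles whose explanations helped me understand the details of the system call path; I am sharing them here.)
Adding a system call involves the following steps:
1. Add the system call number to kernel/syscall.h.
2. Register the system call in kernel/syscall.c, which contains the entry function syscall().
3. Add an entry to user/usys.pl. This Perl script generates the assembly file usys.S, the user-space system call interface: each stub loads the system call number into register a7 and then executes ecall to enter the kernel. The syscall() function reads the number back from a7 and calls the corresponding function, so this is the interface where a system call switches from user mode to kernel mode.
4. Add a declaration to user/user.h so that user programs compile.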

xv6 book:
4.3 Code: Calling system calls

{
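Chapter 2 ended with initcode.S invoking the exec system call (user/initcode.S:11). Let's look at how the user call reaches the implementation of exec in the kernel.

User code places the arguments for exec in registers a0 and a1, and puts the system call number in a7. System call numbers match the entries in the syscalls array, a table of function pointers (kernel/syscall.c:108). The ecall instruction traps into the kernel, which runs uservec, usertrap, and then syscall().

syscall (kernel/syscall.c:133) reads the system call number from the saved a7 in the trapframe and uses it as an index into syscalls to find the corresponding function. (For the first system call, exec, a7 contains SYS_exec (kernel/syscall.h:8), so syscall calls sys_exec, the function that implements exec.)

When the system call function returns, syscall records its return value in p->trapframe->a0. The user-space call to exec() then returns that value, because the C calling convention on RISC-V places return values in a0. By convention a system call returns a negative number to indicate an error, and zero or a positive number for success. If the system call number is invalid, syscall prints an error and returns -1.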
}
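
To make the dispatch step concrete, here is a minimal user-space sketch of the same idea. It is not xv6 code: struct trapframe is cut down to three registers, and the sim_* functions and their return values are hypothetical stand-ins for the real handlers. Only the system call numbers are taken from kernel/syscall.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

// Hypothetical stand-in for the xv6 trapframe; only the registers used here.
struct trapframe { uint64_t a0, a1, a7; };

// System call numbers as in kernel/syscall.h.
enum { SYS_fork = 1, SYS_exit = 2, SYS_getpid = 11 };

// Made-up handlers with fixed return values, for illustration only.
static uint64_t sim_fork(void)   { return 4; }  // pretend the child pid is 4
static uint64_t sim_exit(void)   { return 0; }
static uint64_t sim_getpid(void) { return 3; }  // pretend the caller's pid is 3

// Sparse table indexed by system call number, like syscalls[] in the kernel.
static uint64_t (*sim_syscalls[])(void) = {
  [SYS_fork]   = sim_fork,
  [SYS_exit]   = sim_exit,
  [SYS_getpid] = sim_getpid,
};

// Same shape as syscall(): read the number from a7, dispatch through the
// table, and store the result in a0 (or -1 for an unknown number).
void sim_syscall(struct trapframe *tf)
{
  assert(tf != NULL);
  uint64_t num = tf->a7;
  if (num > 0 && num < NELEM(sim_syscalls) && sim_syscalls[num])
    tf->a0 = sim_syscalls[num]();
  else
    tf->a0 = (uint64_t)-1;
}
```

Setting a7 to SYS_getpid and calling sim_syscall() leaves 3 in a0, while an unused number such as 5 leaves -1 there, because the unlisted slots of the table are null pointers.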

4.4 Code: System call arguments

{
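The kernel's system call implementations need to find the arguments passed by user code.

Because user code calls the system call wrapper functions, the arguments are initially in registers, which is where the C calling convention places them.
The kernel trap code saves the user registers to the current process's trap frame, where kernel code can find them.
The functions argint, argaddr, and argfd retrieve the n-th system call argument from the trap frame as an integer, a pointer, or a file descriptor. They all call argraw to read the saved user register (kernel/syscall.c:35).

Some system calls pass pointers as arguments, and the kernel must use those pointers to read or write user memory. The exec system call, for example, passes the kernel an array of pointers to strings in user space. These pointers pose two challenges. First, the user program may be buggy or malicious, and may pass the kernel an invalid pointer, or a pointer intended to trick the kernel into accessing kernel memory instead of user memory. Second, the xv6 kernel page table mappings are not the same as the user page table mappings, so the kernel cannot use ordinary instructions to load from or store to user-supplied addresses.

The kernel implements functions that safely copy data to and from user-supplied addresses. fetchstr is one example (kernel/syscall.c:25). File system calls such as exec use fetchstr to retrieve string file-name arguments from user space. fetchstr calls copyinstr to do the hard work.

copyinstr (kernel/vm.c:406) copies a string from virtual address srcva in the user page table pagetable to dst, up to a caller-specified maximum number of bytes. It uses walkaddr (which calls walk) to emulate the paging hardware in software and determine the physical address pa0 for srcva. walkaddr (kernel/vm.c:95) checks that the user-supplied virtual address is part of the process's user address space, so programs cannot trick the kernel into reading other memory. A similar function, copyout, copies data from the kernel to a user-supplied address.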
}
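
The two properties that matter in copyinstr, a bounds check on the user address and a limit on the number of bytes copied, can be illustrated with a small user-space sketch. This is only a model under loud assumptions: "user memory" is one flat buffer and a "user address" is an offset into it, whereas the real function translates each page through the page table with walkaddr. The name sim_copyinstr is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Copy a NUL-terminated string from offset srcva of the simulated user
// memory (umem, usz bytes long) into dst, copying at most max bytes.
// Returns 0 on success, -1 if the address leaves user memory or if no
// terminator is found within max bytes.
int sim_copyinstr(const char *umem, size_t usz,
                  char *dst, uint64_t srcva, size_t max)
{
  assert(umem != NULL && dst != NULL);
  for (size_t i = 0; i < max; i++) {
    if (srcva >= usz || i >= usz - srcva)  // outside the simulated user memory
      return -1;
    dst[i] = umem[srcva + i];
    if (dst[i] == '\0')
      return 0;                            // whole string copied, terminator included
  }
  return -1;                               // ran out of room before the terminator
}
```

The caller must treat dst as invalid whenever -1 is returned, just as sys_exec gives up when fetchstr fails.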

1. System call tracing

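Goal: implement a new system call, trace, that traces selected system calls (this feature can help when debugging later labs).
Description:
The system call takes one argument, an integer "mask" whose bits specify which system calls to trace (for example, to trace the fork system call, a program calls trace(1 << SYS_fork), where SYS_fork is a syscall number from kernel/syscall.h).
Modify the xv6 kernel so that, for every system call whose bit is set in the mask, it prints one line just before that system call returns. The line should contain the process ID, the name of the system call, and the return value; the system call arguments do not need to be printed.
The trace system call should enable tracing for the process that calls it and for all children it subsequently forks, but should not affect other processes.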
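
As a quick illustration of how the mask works, the sketch below builds a mask from a list of system call numbers and tests whether a given call is selected. The helper names trace_mask_of and is_traced are hypothetical; the numbers are the ones defined in kernel/syscall.h. For instance, a mask of 32 is 1 << SYS_read and selects only read.

```c
#include <assert.h>

// System call numbers as defined in kernel/syscall.h.
enum { SYS_fork = 1, SYS_read = 5, SYS_exec = 7, SYS_write = 16 };

// Build a mask with one bit set for each listed system call number.
int trace_mask_of(const int *nums, int n)
{
  int mask = 0;
  for (int i = 0; i < n; i++) {
    assert(nums[i] > 0 && nums[i] < 31);  // keep the shift inside a signed int
    mask |= 1 << nums[i];
  }
  return mask;
}

// Nonzero if system call number num is selected by mask.
int is_traced(int mask, int num)
{
  return num > 0 && num < 31 && ((mask >> num) & 1);
}
```

Passing {SYS_fork, SYS_read} gives 2 + 32 = 34, which is the number a user would type as the first argument of the trace program to follow both calls.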

Preparation (adding the new system call):

Add $U/_trace to UPROGS in Makefile.
(The Makefile invokes the Perl script user/usys.pl, which produces user/usys.S, the actual system call stubs, which use the RISC-V ecall instruction to transition to the kernel.)
Add a prototype for the system call to user/user.h:

int trace(int); // for lab1:syscall:trace

Add a stub to user/usys.pl:

entry("trace");     # for lab1:syscall:trace

Add a syscall number to kernel/syscall.h:

#define SYS_trace 22    // for lab1:syscall:trace

Main work:

Add a sys_trace() function in kernel/sysproc.c that implements the new system call by remembering its argument in a new variable in the proc structure (see kernel/proc.h). The functions to retrieve system call arguments from user space are in kernel/syscall.c, and you can see examples of their use in kernel/sysproc.c.
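In other words: add the new system call sys_trace() to kernel/sysproc.c. It stores its argument (the mask of system calls to trace) in a new field of the proc structure, here named trace_mask and declared in struct proc in kernel/proc.h. The argument is fetched from user space with argint(), one of the helper functions in kernel/syscall.c: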

/*
New system call:
  - remembers its argument in a new variable in the proc structure
    (see kernel/proc.h); the field I added is trace_mask.
  - retrieves the argument from user space with argint(), one of the
    functions in kernel/syscall.c (see kernel/sysproc.c for other uses).
*/
uint64
sys_trace(void)
{
  int mask;  // the trace "mask"

  // Retrieve the system call argument from user space.
  if(argint(0, &mask) < 0)
    return -1;

  // Remember the argument in the new variable in the proc structure.
  myproc()->trace_mask = mask;
  return 0;
}

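argint() is defined in kernel/syscall.c. It fetches the n-th system call argument as an integer by calling argraw(), which reads the saved user registers from the trapframe: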

static uint64
argraw(int n)
{
  struct proc *p = myproc();
  switch (n) {
  case 0:
    return p->trapframe->a0;
  case 1:
    return p->trapframe->a1;
  case 2:
    return p->trapframe->a2;
  case 3:
    return p->trapframe->a3;
  case 4:
    return p->trapframe->a4;
  case 5:
    return p->trapframe->a5;
  }
  panic("argraw");
  return -1;
}

// Fetch the nth 32-bit system call argument.
int
argint(int n, int *ip)
{
  *ip = argraw(n);
  return 0;
}

Modify fork() (see kernel/proc.c) to copy the trace mask from the parent to the child process.
The parent copies its mask to the child, so that the selected system calls are traced in the child as well:

np->trace_mask = p->trace_mask; // copy the parent's trace mask to the child

Modify the syscall() function in kernel/syscall.c to print the trace output.
You will need to add an array of syscall names to index into.
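Create a table that maps system call numbers to names, and print the trace information in syscall(). The new handlers also need extern declarations (for example extern uint64 sys_trace(void);) next to the existing ones in kernel/syscall.c: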

static uint64 (*syscalls[])(void) = {
[SYS_fork]    sys_fork,
[SYS_exit]    sys_exit,
...
[SYS_trace]   sys_trace,  // for Lab1:syscall:trace
[SYS_sysinfo]    sys_sysinfo,   // for Lab1:syscall:sysinfo  
};

char* syscalls_name[30] = 
{"", "fork", "exit", "wait", "pipe", "read", "kill", "exec",
"fstat", "chdir", "dup", "getpid", "sbrk", "sleep", "uptime",
"open", "write", "mknod", "unlink", "link", "mkdir", "close", 
"trace", "sysinfo",};

// [kernel/syscall.c]
void
syscall(void)   // entry point for every system call
{
  int num;
  struct proc *p = myproc();  // current process

  num = p->trapframe->a7;     // system call number
  if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {   // valid, implemented system call
    // Look up the handler by system call number; the return value is
    // stored in p->trapframe->a0.
    p->trapframe->a0 = syscalls[num]();
    // If the bit for num is set in trace_mask, print the trace line.
    if((1 << num) & p->trace_mask) {
      printf("%d: syscall %s -> %d\n",
             p->pid, syscalls_name[num], p->trapframe->a0);
    }
  } else {
    printf("%d %s: unknown sys call %d\n",
            p->pid, p->name, num);
    p->trapframe->a0 = -1;
  }
}
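
The trace line itself can be tried out in user space. The sketch below is hypothetical (the names sim_names, sim_syscall_name, and sim_trace_line are mine, and the name table is truncated to the first five calls); it shows two details of the kernel version: the name lookup is guarded against out-of-range numbers, and the 64-bit value saved in a0 is printed as a signed int so that a failed call shows up as -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Truncated, illustrative name table; index = system call number.
static const char *sim_names[] = { "", "fork", "exit", "wait", "pipe", "read" };
#define SIM_NNAMES (sizeof(sim_names) / sizeof(sim_names[0]))

// Return the name for num, or "unknown" if num is outside the table.
const char *sim_syscall_name(int num)
{
  if (num <= 0 || (size_t)num >= SIM_NNAMES)
    return "unknown";
  return sim_names[num];
}

// Format one trace line into buf; returns what snprintf returns.
// a0 is the saved 64-bit register value; it is narrowed to int so that
// an error return prints as -1 (assumes the usual two's-complement ABI).
int sim_trace_line(char *buf, size_t cap, int pid, int num, uint64_t a0)
{
  assert(buf != NULL && cap > 0);
  return snprintf(buf, cap, "%d: syscall %s -> %d\n",
                  pid, sim_syscall_name(num), (int)a0);
}
```

With pid 3, system call number 5, and a saved a0 of 1023, the buffer holds the line "3: syscall read -> 1023" followed by a newline.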

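That is all the work required. The user program trace.c is shown below, with comments that walk through the whole path from the command line to the system call implementation: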

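// A user program that uses the trace system call (sys_trace() in the kernel) to trace system calls.
// (The comments below follow the whole path, which starts in the user program and continues under the operating system's control.)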

#include "kernel/param.h"
#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
  int i;
  char *nargv[MAXARG];

  if(argc < 3 || (argv[1][0] < '0' || argv[1][0] > '9')){   // expected form of the command line
    fprintf(2, "Usage: %s mask command\n", argv[0]);
    exit(1);
  }

  if (trace(atoi(argv[1])) < 0) {   
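    /*
    (trace() is declared in user/user.h, so it is visible to this program.)
    Every system call first goes through syscall() in kernel/syscall.c, the
    system call entry function. The trace call here reaches syscall(), which
    dispatches to sys_trace() in kernel/sysproc.c; sys_trace() reads the mask
    and stores it in the current process's proc structure, then syscall() returns.
    From then on, after each system call's handler returns inside syscall(),
    the kernel compares the stored mask with the system call number to decide
    whether that call should be traced and printed.

    Note that system calls made by child processes must be traced too, so
    fork() in kernel/proc.c makes the child inherit (copy) the parent's mask.
    */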
    fprintf(2, "%s: trace failed\n", argv[0]);
    exit(1);
  }
  
  for(i = 2; i < argc && i < MAXARG; i++){  // collect the command that follows "trace mask"
    nargv[i-2] = argv[i];
  }
  exec(nargv[0], nargv);
  // exec system call: run the command that follows "trace mask", with the selected system calls traced.
  exit(0);
}

2. Sysinfo

Add a system call, sysinfo, that collects information about the running system.
The system call takes one argument: a pointer to a struct sysinfo.

// [kernel/sysinfo.h]
struct sysinfo {
  uint64 freemem;   // amount of free memory (bytes)
  uint64 nproc;     // number of process
};

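The kernel should fill in the fields of this struct: freemem should be set to the number of bytes of free memory, and nproc to the number of processes whose state is not UNUSED.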

Preparation:

Add $U/_sysinfotest to UPROGS in Makefile
Add the system call sysinfo, following the same steps as in the previous assignment.
The steps for creating the system call are the same as for trace above.
To declare the prototype for sysinfo() in user/user.h, you need to predeclare the existence of struct sysinfo:
struct sysinfo;
int sysinfo(struct sysinfo *);

Main work:

sysinfo needs to copy a struct sysinfo back to user space; see sys_fstat() (kernel/sysfile.c) and filestat() (kernel/file.c) for examples of how to do that using copyout().

// [kernel/sysproc.c]
// Needs #include "sysinfo.h" in this file, and prototypes for collect_mem()
// and collect_proc_num() in kernel/defs.h.
uint64
sys_sysinfo(void)
{
  uint64 addr;
  struct sysinfo mysysinfo;   // kernel-side copy of the struct
  struct proc *p = myproc();  // current process

  // Fetch the system call argument: the user virtual address that the
  // sysinfo data must be copied to.
  if(argaddr(0, &addr) < 0)
    return -1;

  // Both helper functions are implemented further below.
  mysysinfo.freemem = collect_mem();      // [kernel/kalloc.c]
  mysysinfo.nproc = collect_proc_num();   // [kernel/proc.c]

  // Copy the struct sysinfo from the kernel back to the user virtual address.
  if(copyout(p->pagetable, addr, (char*)&mysysinfo, sizeof(mysysinfo)) < 0)
    return -1;
  return 0;
}

Walkthrough: copyout() in kernel/vm.c copies data from kernel memory to a virtual address in the user address space, one page at a time:

// Copy from kernel to user.
// Copy len bytes from src to virtual address dstva in a given page table.
// Return 0 on success, -1 on error.
int
copyout(pagetable_t pagetable, uint64 dstva, char *src, uint64 len)
{
  uint64 n, va0, pa0;

  while(len > 0){
    va0 = PGROUNDDOWN(dstva);
    pa0 = walkaddr(pagetable, va0);
    if(pa0 == 0)
      return -1;
    n = PGSIZE - (dstva - va0);
    if(n > len)
      n = len;
    memmove((void *)(pa0 + (dstva - va0)), src, n);

    len -= n;
    src += n;
    dstva = va0 + PGSIZE;
  }
  return 0;
}
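
The only subtle part of copyout() is the per-page arithmetic, so here is a user-space sketch that reproduces just that part. It does not copy anything; it only records the chunk sizes the loop would use. The function name copyout_chunks and its output array are hypothetical, while PGSIZE and PGROUNDDOWN follow the xv6 definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096
#define PGROUNDDOWN(a) ((a) & ~(uint64_t)(PGSIZE - 1))

// Record in sizes[] the length of each per-page chunk that copyout()
// would copy for a destination dstva and a total length len.
// Returns the number of chunks; cap is the capacity of sizes[].
int copyout_chunks(uint64_t dstva, uint64_t len, uint64_t *sizes, int cap)
{
  int count = 0;
  while (len > 0) {
    uint64_t va0 = PGROUNDDOWN(dstva);       // start of the current page
    uint64_t n = PGSIZE - (dstva - va0);     // bytes left in this page
    if (n > len)
      n = len;
    assert(sizes != NULL && count < cap);
    sizes[count++] = n;
    len -= n;
    dstva = va0 + PGSIZE;                    // continue at the next page
  }
  return count;
}
```

For example, a 16-byte struct sysinfo written at address 4090 crosses a page boundary and is split into a 6-byte chunk and a 10-byte chunk, each translated separately by walkaddr in the real function.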

argaddr() is defined in kernel/syscall.c; it reads the n-th system call argument as a pointer, using the same argraw() helper as argint():

// Retrieve an argument as a pointer.
// Doesn't check for legality, since
// copyin/copyout will do that.
int
argaddr(int n, uint64 *ip)
{
  *ip = argraw(n);
  return 0;
}

To collect the amount of free memory, add a function to kernel/kalloc.c
Compute the current amount of free memory, freemem:

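// [kernel/kalloc.c]
// To collect the amount of free memory, add a function to kernel/kalloc.c
// kmem.freelist is a linked list whose nodes are the free pages themselves;
// every page is PGSIZE (4096) bytes, so walking the list gives the amount of free memory.
uint64
collect_mem(void){
  struct run* p;
  uint64 blocks = 0;

  acquire(&kmem.lock);    // keep the list stable while it is walked
  for(p = kmem.freelist; p; p = p->next)
    blocks++;
  release(&kmem.lock);
  return blocks * PGSIZE;
}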
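
The free-list idea is easy to try outside the kernel. In the sketch below, which is a user-space model and not xv6 code, struct run matches the kernel's definition, but sim_kfree and sim_free_bytes are hypothetical names and the "pages" are ordinary variables rather than 4096-byte frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096

// Same shape as struct run in kernel/kalloc.c: a free page stores the
// pointer to the next free page in its own first bytes.
struct run { struct run *next; };

// Push a page on the front of the list, as kfree() does; returns the new head.
struct run *sim_kfree(struct run *head, struct run *page)
{
  assert(page != NULL);
  page->next = head;
  return page;
}

// Count the nodes and convert the count to bytes.
uint64_t sim_free_bytes(const struct run *head)
{
  uint64_t pages = 0;
  for (const struct run *r = head; r != NULL; r = r->next)
    pages++;
  return pages * PGSIZE;
}
```

Freeing three pages into an empty list and then calling sim_free_bytes() gives 3 * 4096 = 12288 bytes.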

To collect the number of processes, add a function to kernel/proc.c
Count the processes whose state is not UNUSED:

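// [kernel/proc.c]
// To collect the number of processes, add a function to kernel/proc.c
// proc[NPROC] holds every process slot; scan the array and count the slots whose state is not UNUSED.
uint64
collect_proc_num(void){
  uint64 num = 0;
  for(int i = 0; i < NPROC; i++){
    acquire(&proc[i].lock);   // p->lock protects p->state
    if(proc[i].state != UNUSED)
      num++;
    release(&proc[i].lock);
  }
  return num;
}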
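
The counting rule can be checked in user space with a cut-down process table. This is a sketch under assumptions: struct sim_proc and sim_count_procs are hypothetical, and the state names follow the enum procstate in kernel/proc.h of recent xv6-riscv versions (older versions have no USED state). Note that every state other than UNUSED counts, including ZOMBIE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Process states, modeled on enum procstate in kernel/proc.h.
enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

// Cut-down process slot: only the field that the count depends on.
struct sim_proc { enum procstate state; };

// Count the slots whose state is anything other than UNUSED.
uint64_t sim_count_procs(const struct sim_proc *tbl, int n)
{
  assert(tbl != NULL && n >= 0);
  uint64_t num = 0;
  for (int i = 0; i < n; i++) {
    if (tbl[i].state != UNUSED)
      num++;
  }
  return num;
}
```

A table holding one UNUSED, one RUNNING, one SLEEPING, and one ZOMBIE slot yields a count of 3.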