MIT 6.S081 Operating System--Lecture 1 (Homework Notes)

Artintel

Article link: https://blog.csdn.net/qq_39274501/article/details/117263047

Table of contents

Homework is below
open
Redirection
- question
- answer
fork
exec
- question
- answer
forkexec.c
question & answer
work1 - sleep
work2 - pingpong
work3 - primes

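Homework is below

As an aside, I recommend an environment setup walkthrough by a Bilibili uploader. I followed it myself and it worked without problems.
Video title: "MIT 6S081 开发环境搭建全过程" (MIT 6S081 development environment setup from start to finish)

A system call is a system API invoked by an application. It looks much like an ordinary function call in our own programs, but it is actually the way an application enters the kernel, that is, switches into the operating system's kernel mode.

A few commonly used system call APIs: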

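fd = open("out", 1)

This system call opens the file named "out"; the 1 means I want to write. It returns a file descriptor, fd.

write(fd, "hello\n", 6)

This system call passes in the memory address of a buffer, here the string "hello\n", and 6 is the buffer length in bytes.

pid = fork();

All of these look like function calls, but each one is a way for the application to enter kernel mode.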
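The open and write calls can be tried on an ordinary Linux or macOS host. The following is a minimal sketch in POSIX C rather than xv6 C, so it assumes the POSIX flag spelling O_CREAT and a permission mode argument. The helper name write_hello and the path passed to it are made up for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) the file at `path`, write the
   six bytes of "hello\n" into it, and return how many bytes were written,
   or -1 on error. */
long write_hello(const char *path) {
    /* POSIX spells the flag O_CREAT and expects a permission mode. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    long n = write(fd, "hello\n", 6);   /* buffer address, then length in bytes */
    close(fd);
    return n;
}
```

Called with a writable path, write_hello returns 6, the number of bytes handed to write.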

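question1: What is special about a system call jumping into the kernel, compared with a standard function call that jumps to another function?

answer: The kernel is a piece of code that is always resident and privileged. It is loaded when the machine boots and can access hardware directly. Entering the kernel through a system call makes it possible to modify sensitive, protected hardware resources such as the disk, which ordinary user code cannot touch. An ordinary function call carries no such hardware privileges.

While studying operating systems we need to understand how these system calls interact with each other, for example:

fd = open(...);
pid = fork();

The fork() system call creates a child process that is a copy of the parent, and the child copy also has every fd that the parent had opened beforehand.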
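The interaction between open and fork can be observed directly. This is a small POSIX C sketch meant for a Linux or macOS host, not xv6; the helper name shared_fd_demo is hypothetical. The parent opens a file, the child writes through the descriptor it inherited, and the parent then writes through the same descriptor.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the file offset after both processes have
   written through the same descriptor, or -1 on error. */
long shared_fd_demo(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: fd is valid here because fork copied the descriptor table. */
        _exit(write(fd, "child\n", 6) == 6 ? 0 : 1);
    }
    waitpid(pid, NULL, 0);               /* let the child finish first */
    long size = -1;
    if (write(fd, "parent\n", 7) == 7)
        size = (long)lseek(fd, 0, SEEK_CUR);   /* 6 + 7 bytes */
    close(fd);
    return size;
}
```

The returned offset is 13 rather than 7 because the parent and child share one file offset: the parent's write lands after the child's instead of overwriting it.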

open

// open.c
#include "kernel/types.h"
#include "user/user.h"
#include "kernel/fcntl.h"

int main(){
	int fd = open("output.txt", O_WRONLY | O_CREATE);
	write(fd, "ooo\n", 4);
	exit(0);
}

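open.c uses the open system call and passes it a file named output.txt. The O_ flags in the second argument tell the kernel's open implementation to create a file with that name and open it for writing. open returns the newly allocated file descriptor.

File descriptors
A file descriptor is essentially an index into a table kept in the kernel. The kernel maintains state for every running process, including one such table per process, keyed by file descriptor. The table tells the kernel what each descriptor actually refers to. The key point is that every process has its own independent file descriptor space. If two different programs run as two different processes and each opens a file, they may well get the same descriptor number, but because the kernel keeps a separate table for each process, the same number can refer to different files.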
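Because a descriptor is only an index into the per-process table, the numbers get recycled. The POSIX C sketch below (host system, not xv6; the helper name fd_number_is_reused is hypothetical) relies on the POSIX rule that open returns the lowest-numbered free descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path`, close it, open it again, and report
   whether both opens returned the same descriptor number (1 = yes). */
int fd_number_is_reused(const char *path) {
    int first = open(path, O_RDONLY);
    if (first < 0)
        return -1;
    close(first);                 /* the table slot becomes free again */
    int second = open(path, O_RDONLY);
    if (second < 0)
        return -1;
    close(second);
    return first == second;
}
```

In a single-threaded program this returns 1 for any readable path such as /dev/null, which shows that the number identifies a table slot and not a particular file.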

Redirection

ls > out runs the ls program with its output redirected to the file out; the result can be viewed with cat out.

grep x < out tells the shell to redirect grep's input so that it comes from the file out.
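One common way for a shell to implement output redirection is to fork, point descriptor 1 at the file inside the child, and then exec the program, which inherits that descriptor unchanged. The sketch below is POSIX C for a host system, not the xv6 shell's actual code; the helper name run_echo_redirected is hypothetical, and it assumes an echo program is on the PATH.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: roughly what a shell does for `echo hello > path`.
   Returns the child's exit status, or -1 on error. */
int run_echo_redirected(const char *path) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: point descriptor 1 (standard output) at the file. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(1);
        if (fd != 1) {
            if (dup2(fd, 1) < 0)
                _exit(1);
            close(fd);
        }
        char *argv[] = {"echo", "hello", NULL};
        execvp("echo", argv);     /* echo inherits descriptor 1 unchanged */
        _exit(127);               /* reached only if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The echo program itself knows nothing about the file. It writes to descriptor 1 as usual, and the redirection was set up entirely between fork and exec.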


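question

How does the compiler handle system calls? Does the generated assembly make a procedure call into some code segment defined by the operating system?

answer

There is a special RISC-V instruction that a program can execute to transfer control to the kernel. When you write C code that makes a system call such as open or write, open looks like a C function in the C library, but it is not really written in C: it is a small stub implemented in assembly. On RISC-V the special instruction is called ecall. It transfers control to the kernel, and the kernel then looks at the process's memory and registers to work out what the arguments are.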
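The split between the library wrapper and the trap into the kernel can be made visible on Linux, where the C library also exposes a generic syscall() function that takes the system call number directly. This sketch is Linux-specific and is not xv6 code; the helper name raw_getpid_matches is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask for the process ID twice, once through the
   C library wrapper and once through the generic syscall() entry point,
   and report whether the answers agree (1 = yes). */
int raw_getpid_matches(void) {
    long raw = syscall(SYS_getpid);   /* Linux-specific: enters the kernel by call number */
    return raw == (long)getpid();
}
```

Both paths end in the same kernel code, so the function returns 1. The wrapper only puts the call number and arguments where the kernel expects them before executing the trap instruction.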

fork

#include "kernel/types.h"
#include "user/user.h"

int main(){
	int pid;
	pid = fork();
	printf("fork() returned %d\n", pid);
	if(pid == 0){
		printf("child\n");
	}else{
		printf("parent\n");
	}
	exit(0);
}

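fork creates a copy of the calling process's instruction and data memory. The fork system call returns in both processes: in the parent it returns the ID of the new process, an integer greater than zero, and in the new process it returns 0.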

exec

When the shell runs a command, it actually creates a new process to carry it out.

#include "kernel/types.h"
#include "user/user.h"

int main() {
    char* argv[] = {"echo", "this", "is", "echo", 0};
    exec("echo", argv);
    printf("exec failed!\n");
    exit(0);
}

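This code runs the exec system call, which reads instructions from the named file, loads them, and replaces the calling process's instructions with them. In a sense this discards the calling process's memory and starts executing the newly loaded instructions. So the exec call above has the following effect:
The operating system loads the instructions from the file named echo into the current process, replacing the current process's memory, and then starts executing them. You can also pass command-line arguments: exec takes an array of arguments, which in C is an array of pointers. In the code above, char* argv[] sets up an array of character pointers, which is essentially an array of strings.
So this is equivalent to running the echo command with the three arguments "this is echo".

About the exec system call

The exec system call keeps the current file descriptor table. Any file descriptors that were open before exec, such as 0, 1, and 2, refer to the same things in the new program.
Normally exec does not return. It completely replaces the current process's memory, so the program that called it no longer exists and there is nowhere to return to. It returns only on error, for example when the program does not exist.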

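question

What is the final 0 in argv for?

answer

It marks the end of the array. C is a low-level language, close to the machine, and its arrays do not carry their length, so there is no way to find out how long an array is from the array alone. To tell the kernel where the array ends, we put 0 in as the last pointer. Each string in argv is really a pointer to a block of memory holding the characters, but the fifth element is 0. Pointer 0 is conventionally the NULL pointer, and here it only signals the end. The kernel code walks the array until it finds a pointer whose value is 0.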
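The loop described in the answer is short enough to write out. This is a plain C sketch of the idea, not the xv6 kernel's code; the function name count_args is hypothetical.

```c
/* Count the entries of a NULL-terminated argv-style array. No length is
   passed in, so the only way to find the end is to look for the 0 pointer. */
int count_args(char *const argv[]) {
    int n = 0;
    while (argv[n] != 0)
        n++;
    return n;
}
```

For the array {"echo", "this", "is", "echo", 0} it returns 4. Without the trailing 0 the loop would run past the end of the array.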

The shell, however, must not let itself be replaced by the program it runs, so it calls fork and runs the program in the child process.

forkexec.c

#include "kernel/types.h"
#include "user/user.h"
int main(){
	int pid, status;
	pid = fork();
	if(pid == 0){
		char* argv[] = {"echo", "THIS", "IS", "ECHO", 0};
		exec("echo", argv);
		printf("exec failed\n");
		exit(1);
	}
	else{
		printf("parent waiting\n");
		wait(&status);
		printf("the child exited with status %d\n", status);
	}
	exit(0);
}

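wait waits for a previously created child process to exit. When we run a command on the command line, we generally want the shell to wait until the command finishes, so the wait system call lets a parent wait for any one of its children to return. The status argument to wait is a way for the exiting child to communicate with the waiting parent in the form of an integer. If the child calls exit(1), the operating system passes that 1 from the exiting child to wait, that is, to the waiting parent. &status passes the address of status to the kernel, and the kernel writes the argument the child gave to exit into that address.

The Unix convention is that a program that exits successfully passes 0 to exit, and one that hits an error passes 1.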
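The same hand-off can be tried on a Linux or macOS host, with one difference worth labeling: POSIX packs the exit code into the status word together with other information, so the parent extracts it with WEXITSTATUS, whereas the xv6 code in this post prints status directly. The helper name child_exit_status is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` and return
   the code the parent receives through wait, or -1 on error. */
int child_exit_status(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                      /* child: hand `code` to the kernel */
    int status;
    if (waitpid(pid, &status, 0) < 0)     /* kernel writes into &status */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

child_exit_status(1) returns 1 and child_exit_status(0) returns 0, matching the convention that 0 means success.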

question & answer

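question

We just said that exec goes over to the echo program completely and never returns to the child process created by fork, so can the code ever reach
printf("exec failed\n"); exit(1);?

answer

Not here, because the call really does run echo. But if I modify the code, it can get there. First, run the original version of the program.

The output shows that the program ran echo with the given arguments. The child exited with status 0, which means echo exited successfully, and the parent was waiting for the child.

Next I modify the code. This time I will run a command that does not exist.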

#include "kernel/types.h"
#include "user/user.h"

int main(){
	int pid, status;
	pid = fork();
	if(pid == 0){
		char* argv[] = {"echo", "THIS", "IS", "ECHO", 0};
		exec("xklsdsjoijecho", argv);
		printf("exec failed\n");
		exit(1);
	}
	else{
		printf("parent waiting\n");
		wait(&status);
		printf("the child exited with status %d\n", status);
	}
	exit(0);
}

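This time, because the command we try to execute does not exist, the exec system call returns. We see the "exec failed" output, and the argument 1 of exit(1) is passed to the parent, which prints the child's exit code. So exec returns to the calling process only when something goes wrong.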
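The failure path can also be checked without fork, since a failed exec leaves the calling program running. This is a POSIX C sketch for a host system; the helper name exec_returns_on_failure is hypothetical, and the path is assumed not to exist.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: try to exec a program that is assumed not to
   exist and report whether exec returned -1 with ENOENT (1 = yes). */
int exec_returns_on_failure(void) {
    char *argv[] = {"gr-no-such-program", NULL};
    int r = execv("/nonexistent/gr-no-such-program", argv);
    /* Still running the old program, so the exec must have failed. */
    return r == -1 && errno == ENOENT;
}
```

Had the exec succeeded, the return statement would never run, which is why code placed after an exec call is error handling by construction.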

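There is a common idiom here: call fork first, then call exec in the child. This is somewhat wasteful. fork first copies the entire parent process, but exec then throws the copy away and replaces the memory contents with the file you want to run. In a sense the copy is wasted, because all the copied memory is discarded and replaced by exec. In large programs the effect is noticeable. If you run a program that uses several gigabytes and it calls fork, all of that memory really is copied, which might take nearly a second, and that can be a problem.

Later in the course you will implement some optimizations, such as copy-on-write fork. This removes almost all of the obvious inefficiency of fork and copies only the memory that is needed to get to exec. It takes quite a few tricks in the virtual memory system. You can build a fork that copies memory lazily. Since fork is usually followed immediately by exec, you never have to do the actual copy, because the child does not use most of the memory. I think you will find it an interesting lab.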

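question

Why did the parent print "parent waiting" before the child called exec?

answer

That is just a coincidence. The parent's output can be interleaved with the child's, as we saw earlier in the fork example; it simply did not happen here. Nothing guarantees the output above, and seeing a different order would not be surprising. I suspect the reason is that exec is expensive: it has to access the file system and the disk, allocate memory, and read the contents of the echo file from disk into that memory, and allocating memory may itself have to wait for memory to be freed. So there is a lot of work behind exec, evidently enough that the parent can finish its output before exec starts running echo. Does that make sense?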

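question

Can a child process wait for its parent?

answer

Unix has no direct way for a child to wait for its parent. The wait system call can only wait for children of the current process. The way wait works is that if the current process has any children and one of them has exited, wait returns. If the current process has no children at all, as when the child in this simple example calls wait while having no children of its own, wait returns -1 immediately to indicate an error: the current process has no children.

In short, a child cannot use wait to wait for its parent to exit.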
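The -1 result is easy to confirm on a host system. In this POSIX C sketch (not xv6; the helper name child_wait_fails is hypothetical) the child calls wait even though it has no children, and reports what happened through its exit status. On POSIX the error code for this case is ECHILD.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if wait() failed inside a child that
   has no children of its own, 0 if it did not, -1 on error. */
int child_wait_fails(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: it has no children, so wait cannot succeed. */
        pid_t r = wait(NULL);
        _exit(r == -1 && errno == ECHILD ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```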

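question

When we say the child copies all of the parent's memory, what exactly does that mean? Does the child need to redefine its variables, for instance?

answer

After compilation, your C program is just a set of instructions sitting in memory. They can be copied because they are simply bytes in memory, and bytes can be copied elsewhere. With some virtual memory tricks, the child's memory can be made identical to the parent's. What actually happens is that the parent's memory image is copied to the child, and the child executes in that copy.

In fact, when you look at a C program you should think of it as machine instructions, and those machine instructions are just data in memory, so they can be copied.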
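One consequence of the copy is that the child does not redefine anything: it starts with the same variables holding the same values, and later changes on one side are invisible to the other. A short POSIX C sketch for a host system; the helper name parent_value_after_child_write is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child overwrites x in its own copy of memory;
   the function returns the value the parent still sees, or -1 on error. */
int parent_value_after_child_write(void) {
    int x = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        x = 42;                     /* changes only the child's copy */
        _exit(x == 42 ? 0 : 1);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return x;                       /* the parent's copy was never touched */
}
```

The function returns 1. This holds whether the kernel copies memory eagerly or lazily, because lazy copying is only an optimization and the separation is what programs observe.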

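question

If a parent has several children, does wait return as soon as the first child finishes? In that case there are still children running interleaved with the parent, so are multiple wait calls needed to make sure all the children have finished?

answer

Yes. If a process calls fork twice and wants to wait for both children to exit, it has to call wait twice. Each wait returns as soon as one child exits. You do not need to know in advance which child that will be, but wait returns the child's process ID, so after it returns you can tell which child exited.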
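The two-children case can be sketched as follows in POSIX C for a host system (not xv6; the helper name wait_for_two_children is hypothetical). Each wait call reaps one child and returns its process ID.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork two children, then call wait twice.
   Returns how many of our own children were reaped, or -1 on error. */
int wait_for_two_children(void) {
    pid_t kids[2];
    for (int i = 0; i < 2; i++) {
        kids[i] = fork();
        if (kids[i] < 0)
            return -1;
        if (kids[i] == 0)
            _exit(i);                 /* child: exit immediately */
    }
    int reaped = 0;
    for (int i = 0; i < 2; i++) {
        pid_t done = wait(NULL);      /* PID of whichever child exited */
        if (done == kids[0] || done == kids[1])
            reaped++;
    }
    return reaped;
}
```

The order in which the two PIDs come back is not fixed, which is why the sketch compares each result against both children.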

work1 - sleep

The early assignments are quite simple, so do try them yourself!

Implement the UNIX program sleep for xv6; your sleep should pause for a user-specified number of ticks. A tick is a notion of time defined by the xv6 kernel, namely the time between two interrupts from the timer chip. Your solution should be in the file user/sleep.c.

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"


int
main(int argc, char* argv[]){
    if(argc <= 1){
        fprintf(2, "usage: sleep <ticks>\n");
        exit(1);
    }
    int ticks = atoi(argv[1]);   // xv6 sleep counts timer ticks, not seconds
    printf("%d\n", ticks);
    sleep(ticks);
    /*
        Call path: sleep() is declared in user/user.h;
        user/usys.pl generates usys.S, the assembly stubs that
        jump from user code into the kernel;
        the kernel side is sys_sleep in kernel/sysproc.c.
    */
    exit(0);
}

Test: ./grade-lab-util sleep

== Test sleep, no arguments == sleep, no arguments: OK (2.8s)
== Test sleep, returns == sleep, returns: OK (1.0s)
== Test sleep, makes syscall == sleep, makes syscall: OK (0.7s)

work2 - pingpong

Write a program that uses UNIX system calls to "ping-pong" a byte between two processes over a pair of pipes, one for each direction. The parent should send a byte to the child; the child should print "<pid>: received ping", where <pid> is its process ID, write the byte on the pipe to the parent, and exit; the parent should read the byte from the child, print "<pid>: received pong", and exit. Your solution should be in the file user/pingpong.c.

Some hints:

Use pipe to create a pipe.
Use fork to create a child.
Use read to read from the pipe, and write to write to the pipe.
Use getpid to find the process ID of the calling process.
Add the program to UPROGS in Makefile.

User programs on xv6 have a limited set of library functions available to them. You can see the list in user/user.h; the source (other than for system calls) is in user/ulib.c, user/printf.c, and user/umalloc.c.

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(){
    int p1[2];   // child -> parent
    int p2[2];   // parent -> child
    char buf[8];
    if(pipe(p1) < 0 || pipe(p2) < 0){
        fprintf(2, "pingpong: pipe failed\n");
        exit(1);
    }
    int pid = fork();
    if(pid < 0){
        fprintf(2, "pingpong: fork failed\n");
        exit(1);
    }
    if(pid == 0){    // child
        close(p1[0]);
        close(p2[1]);
        read(p2[0], buf, sizeof(buf));
        printf("%d: received %s\n", getpid(), buf);
        write(p1[1], "pong", 5);    // 5 bytes: "pong" plus the terminating NUL
        close(p1[1]);
        close(p2[0]);
        exit(0);
    }
    // parent
    close(p1[1]);
    close(p2[0]);
    write(p2[1], "ping", 5);        // 5 bytes: "ping" plus the terminating NUL
    read(p1[0], buf, sizeof(buf));
    printf("%d: received %s\n", getpid(), buf);
    close(p1[0]);
    close(p2[1]);
    wait(0);
    exit(0);
}


work3 - primes

Write a concurrent version of prime sieve using pipes. This idea is due to Doug McIlroy, inventor of Unix pipes. The picture halfway down this page and the surrounding text explain how to do it. Your solution should be in the file user/primes.c.

Your goal is to use pipe and fork to set up the pipeline. The first process feeds the numbers 2 through 35 into the pipeline. For each prime number, you will arrange to create one process that reads from its left neighbor over a pipe and writes to its right neighbor over another pipe. Since xv6 has limited number of file descriptors and processes, the first process can stop at 35.

Some hints:

Be careful to close file descriptors that a process doesn’t need, because otherwise your program will run xv6 out of resources before the first process reaches 35.
Once the first process reaches 35, it should wait until the entire pipeline terminates, including all children, grandchildren, &c. Thus the main primes process should only exit after all the output has been printed, and after all the other primes processes have exited.
Hint: read returns zero when the write-side of a pipe is closed.
It’s simplest to directly write 32-bit (4-byte) ints to the pipes, rather than using formatted ASCII I/O.
You should create the processes in the pipeline only as they are needed.
Add the program to UPROGS in Makefile.
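The pipeline described in the assignment can be sketched as follows. This is one possible shape in POSIX C for a Linux or macOS host, not the author's solution and not xv6 code (on xv6 the headers, printf, and exit calls differ). The names sieve_stage and run_primes are hypothetical. Each stage treats the first number it reads as a prime, prints it, and forwards the numbers that prime does not divide to a new stage.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One stage of the pipeline. The first number arriving from the left is
   prime; every later number it does not divide is passed to the right.
   This function never returns: each stage is its own process. */
static void sieve_stage(int left) {
    int prime, n;
    if (read(left, &prime, sizeof prime) != sizeof prime)
        _exit(0);                     /* left side closed: pipeline ends here */
    printf("prime %d\n", prime);
    fflush(stdout);                   /* flush before fork so output is not duplicated */

    int right[2];
    if (pipe(right) < 0)
        _exit(1);
    pid_t pid = fork();
    if (pid < 0)
        _exit(1);
    if (pid == 0) {
        close(left);                  /* the next stage only needs its own input */
        close(right[1]);
        sieve_stage(right[0]);
    }
    close(right[0]);
    while (read(left, &n, sizeof n) == sizeof n)
        if (n % prime != 0 && write(right[1], &n, sizeof n) != sizeof n)
            _exit(1);
    close(left);
    close(right[1]);                  /* lets the next stage see end-of-file */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        _exit(1);
    _exit(WIFEXITED(status) ? WEXITSTATUS(status) : 1);
}

/* Hypothetical driver: feed 2..limit into the first stage and return the
   pipeline's exit status (0 on success), or -1 on error. */
int run_primes(int limit) {
    int p[2];
    fflush(stdout);
    if (pipe(p) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(p[1]);
        sieve_stage(p[0]);
    }
    close(p[0]);
    for (int i = 2; i <= limit; i++)
        if (write(p[1], &i, sizeof i) != sizeof i)
            break;
    close(p[1]);                      /* end of input for the first stage */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

run_primes(35) prints one line per prime from "prime 2" to "prime 31" and returns 0 once every stage has exited. For simplicity the sketch forks one trailing stage that only sees end-of-file, which is slightly more than the hint about creating processes only as needed asks for.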
