(4) 6.828 Operating System Lab 3: User Environments

Introduction


In this lab you will implement the basic kernel facilities required to get a protected user-mode environment (i.e., "process") running. You will enhance the JOS kernel to set up the data structures to keep track of user environments, create a single user environment, load a program image into it, and start it running. You will also make the JOS kernel capable of handling any system calls the user environment makes and handling any other exceptions it causes.

Note: In this lab, the terms environment and process are interchangeable - both refer to an abstraction that allows you to run a program. We introduce the term "environment" instead of the traditional term "process" in order to stress the point that JOS environments and UNIX processes provide different interfaces, and do not provide the same semantics.

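  In Lab 3 we implement the kernel facilities needed to run a protected user-mode process: we add data structures to the JOS kernel to track user processes, create a user process, load a program image into it, and start it running. The kernel must also be able to handle the process's system calls and exceptions.

  Note that "environment" and "process" mean the same thing throughout this document.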

 

Part A: User Environments and Exception Handling

The new include file inc/env.h contains basic definitions for user environments in JOS. Read it now. The kernel uses the Env data structure to keep track of each user environment. In this lab you will initially create just one environment, but you will need to design the JOS kernel to support multiple environments; lab 4 will take advantage of this feature by allowing a user environment to fork other environments.

As you can see in kern/env.c, the kernel maintains three main global variables pertaining to environments:

struct Env *envs = NULL;		// All environments
struct Env *curenv = NULL;		// The current env
static struct Env *env_free_list;	// Free environment list

Once JOS gets up and running, the envs pointer points to an array of Env structures representing all the environments in the system. In our design, the JOS kernel will support a maximum of NENV simultaneously active environments, although there will typically be far fewer running environments at any given time. (NENV is a constant #define'd in inc/env.h.) Once it is allocated, the envs array will contain a single instance of the Env data structure for each of the NENV possible environments.

The JOS kernel keeps all of the inactive Env structures on the env_free_list. This design allows easy allocation and deallocation of environments, as they merely have to be added to or removed from the free list.

The kernel uses the curenv symbol to keep track of the currently executing environment at any given time. During boot up, before the first environment is run, curenv is initially set to NULL.

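  The new header file inc/env.h contains JOS's basic definitions for user processes; the kernel uses the Env data structure to record the state of each one. In this lab we first create a single process, but the kernel must be designed to support several. Lab 4 builds on this by letting a process create other processes.

  As kern/env.c shows, the kernel keeps three main global variables for user processes:

  • struct Env *envs = NULL; // All environments
  • struct Env *curenv = NULL; // The current env
  • static struct Env *env_free_list; // Free environment list

  Once JOS is up and running, envs points to an array of Env structures that represents every process in the system. In this design the kernel supports at most NENV simultaneously active processes, although far fewer are usually running (NENV is a constant defined in inc/env.h). Once the array is allocated, it holds one Env instance for each of the NENV possible processes.

  The kernel keeps all inactive Env structures on env_free_list, so allocating or freeing a process only means removing it from or adding it to that list.

  The kernel uses curenv to track the process that is currently executing. During boot, before the first user process runs, curenv is NULL.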

 

Key point: Environment State

The Env structure is defined in inc/env.h as follows (although more fields will be added in future labs):

struct Env {
	struct Trapframe env_tf;	// Saved registers
	struct Env *env_link;		// Next free Env
	envid_t env_id;			// Unique environment identifier
	envid_t env_parent_id;		// env_id of this env's parent
	enum EnvType env_type;		// Indicates special system environments
	unsigned env_status;		// Status of the environment
	uint32_t env_runs;		// Number of times environment has run

	// Address space
	pde_t *env_pgdir;		// Kernel virtual address of page dir
};

Here's what the Env fields are for:

env_tf:

This structure, defined in inc/trap.h, holds the saved register values for the environment while that environment is not running: i.e., when the kernel or a different environment is running. The kernel saves these when switching from user to kernel mode, so that the environment can later be resumed where it left off.

env_link:

This is a link to the next Env on the env_free_list. env_free_list points to the first free environment on the list.

env_id:

The kernel stores here a value that uniquely identifies the environment currently using this Env structure (i.e., using this particular slot in the envs array). After a user environment terminates, the kernel may re-allocate the same Env structure to a different environment - but the new environment will have a different env_id from the old one even though the new environment is re-using the same slot in the envs array.

env_parent_id:

The kernel stores here the env_id of the environment that created this environment. In this way the environments can form a “family tree,” which will be useful for making security decisions about which environments are allowed to do what to whom.

env_type:

This is used to distinguish special environments. For most environments, it will be ENV_TYPE_USER. We'll introduce a few more types for special system service environments in later labs.

env_status:

This variable holds one of the following values:

ENV_FREE:

Indicates that the Env structure is inactive, and therefore on the env_free_list.

ENV_RUNNABLE:

Indicates that the Env structure represents an environment that is waiting to run on the processor.

ENV_RUNNING:

Indicates that the Env structure represents the currently running environment.

ENV_NOT_RUNNABLE:

Indicates that the Env structure represents a currently active environment, but it is not currently ready to run: for example, because it is waiting for an interprocess communication (IPC) from another environment.

ENV_DYING:

Indicates that the Env structure represents a zombie environment. A zombie environment will be freed the next time it traps to the kernel. We will not use this flag until Lab 4.

env_pgdir:

This variable holds the kernel virtual address of this environment's page directory.

Like a Unix process, a JOS environment couples the concepts of "thread" and "address space". The thread is defined primarily by the saved registers (the env_tf field), and the address space is defined by the page directory and page tables pointed to by env_pgdir. To run an environment, the kernel must set up the CPU with both the saved registers and the appropriate address space.

Our struct Env is analogous to struct proc in xv6. Both structures hold the environment's (i.e., process's) user-mode register state in a Trapframe structure. In JOS, individual environments do not have their own kernel stacks as processes do in xv6. There can be only one JOS environment active in the kernel at a time, so JOS needs only a single kernel stack.

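  What each Env field means:

  env_tf: a structure defined in inc/trap.h that holds the process's saved register values (its execution context) while the process is not running, that is, while the kernel or another process is running. The kernel saves these registers on the switch from user mode to kernel mode so that the process can later resume where it left off.

  env_link: a pointer to the next Env on env_free_list; env_free_list itself points to the first free Env on the list.

  env_id: the process ID, a value stored by the kernel that uniquely identifies the process currently using this Env structure (that is, this slot in the envs array). After a user process terminates, the kernel may give the same Env structure to another process, but the new process gets a different env_id even though it occupies the same slot in envs.

  env_parent_id: the env_id of the process that created this one. These IDs let processes form a family tree, which is useful for security decisions about which processes may do what to whom.

  env_type: distinguishes special processes. For most processes it is ENV_TYPE_USER; later labs introduce more types for special system service processes.

  env_status: holds one of the following values:

  •   ENV_FREE: the Env structure is inactive and therefore on env_free_list.
  •   ENV_RUNNABLE: the Env structure represents a process that is ready and waiting to run on the processor.
  •   ENV_RUNNING: the Env structure represents the process that is currently running.
  •   ENV_NOT_RUNNABLE: the Env structure represents an active process that is not ready to run, for example because it is waiting for an IPC from another process.
  •   ENV_DYING: the Env structure represents a zombie process, which is freed the next time it traps to the kernel. This state is not used until Lab 4.

  env_pgdir: holds the kernel virtual address of the process's page directory.

  Like a Unix process, a JOS process couples a "thread" and an "address space". The thread is defined mainly by the saved registers (env_tf), and the address space by the page directory and page tables that env_pgdir points to. To run a process, the kernel must load the CPU with both the saved registers and the matching address space.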
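
  The point that a reused slot receives a fresh env_id can be illustrated with a small user-space sketch. It is modeled on the generation scheme that env_alloc() in kern/env.c appears to use; ENVGENSHIFT, ENVX and the helper name next_env_id are assumptions made for this illustration, not a quotation of the lab code.

```c
#include <assert.h>
#include <stdint.h>

#define LOG2NENV    10
#define NENV        (1 << LOG2NENV)
#define ENVGENSHIFT 12                     /* assumed; must be >= LOG2NENV */
#define ENVX(envid) ((envid) & (NENV - 1)) /* low bits: index into envs[] */

typedef int32_t envid_t;

/*
 * Hypothetical helper: compute the id for a new environment that
 * reuses slot `slot`, given the id last stored in that slot
 * (0 if the slot has never been used).
 */
static envid_t
next_env_id(envid_t old_id, int slot)
{
	assert(slot >= 0 && slot < NENV);

	/* Bump the generation counter kept in the high bits. */
	envid_t generation = (old_id + (1 << ENVGENSHIFT)) & ~(NENV - 1);
	if (generation <= 0)                   /* never produce a non-positive id */
		generation = 1 << ENVGENSHIFT;

	/* The low bits still name the slot, the high bits the generation. */
	return generation | slot;
}
```

  Calling next_env_id(0, 5) and then feeding the result back in for the same slot yields two different ids whose ENVX() value is 5 in both cases, which is the behaviour the env_id description asks for.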

 

Key point: Allocating the Environments Array

In lab 2, you allocated memory in mem_init() for the pages[] array, which is a table the kernel uses to keep track of which pages are free and which are not. You will now need to modify mem_init() further to allocate a similar array of Env structures, called envs.

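  In lab 2, mem_init() allocated memory for the pages array, which the kernel uses to record which pages are free and which are in use. mem_init() must now also allocate an array of Env structures, called envs.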

Exercise 1

Exercise 1. Modify mem_init() in kern/pmap.c to allocate and map the envs array. This array consists of exactly NENV instances of the Env structure allocated much like how you allocated the pages array. Also like the pages array, the memory backing envs should also be mapped user read-only at UENVS (defined in inc/memlayout.h) so user processes can read from this array.

You should run your code and make sure check_kern_pgdir() succeeds.

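  Modify mem_init() to allocate and map the envs array. The array holds NENV Env instances and is allocated just like the pages array. Also like pages, the memory backing envs must be mapped read-only for user code at UENVS, so that user processes can read the array.

  Start by locating the key definitions: the NENV constant and the Env structure, both in inc/env.h.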

#define LOG2NENV		10
#define NENV			(1 << LOG2NENV)


struct Env {
	struct Trapframe env_tf;	// Saved registers
	struct Env *env_link;		// Next free Env
	envid_t env_id;			// Unique environment identifier
	envid_t env_parent_id;		// env_id of this env's parent
	enum EnvType env_type;		// Indicates special system environments
	unsigned env_status;		// Status of the environment
	uint32_t env_runs;		// Number of times environment has run

	// Address space
	pde_t *env_pgdir;		// Kernel virtual address of page dir
};

  By analogy with pages, allocate physical memory for the array of Env structures and point envs at its first element.

	//
	// Make 'envs' point to an array of size 'NENV' of 'struct Env'.
	// LAB 3: Your code here.
	// boot_alloc returns the kernel virtual address of the start of the newly allocated region
	envs = (struct Env *) boot_alloc(NENV * sizeof(struct Env));
	memset(envs, 0, NENV * sizeof(struct Env));

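  Then use boot_map_region() to map the virtual address UENVS to the physical memory backing envs, that is, PADDR(envs), with user read-only permission (PTE_U without PTE_W).

  The result: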

check_kern_pgdir() succeeded!

Key point: Creating and Running Environments

You will now write the code in kern/env.c necessary to run a user environment. Because we do not yet have a filesystem, we will set up the kernel to load a static binary image that is embedded within the kernel itself. JOS embeds this binary in the kernel as an ELF executable image.

The Lab 3 GNUmakefile generates a number of binary images in the obj/user/ directory. If you look at kern/Makefrag, you will notice some magic that "links" these binaries directly into the kernel executable as if they were .o files. The -b binary option on the linker command line causes these files to be linked in as "raw" uninterpreted binary files rather than as regular .o files produced by the compiler. (As far as the linker is concerned, these files do not have to be ELF images at all - they could be anything, such as text files or pictures!) If you look at obj/kern/kernel.sym after building the kernel, you will notice that the linker has "magically" produced a number of funny symbols with obscure names like _binary_obj_user_hello_start, _binary_obj_user_hello_end, and _binary_obj_user_hello_size. The linker generates these symbol names by mangling the file names of the binary files; the symbols provide the regular kernel code with a way to reference the embedded binary files.

In i386_init() in kern/init.c you'll see code to run one of these binary images in an environment. However, the critical functions to set up user environments are not complete; you will need to fill them in.

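  Now we write the code in kern/env.c that runs a process. Since there is no file system yet, the kernel loads a static binary image embedded in the kernel itself, in the form of an ELF executable. The Lab 3 GNUmakefile generates a set of binary images in obj/user/. The -b binary option on the linker command line links these files in as raw, uninterpreted binaries rather than as regular .o files produced by the compiler.

  In i386_init() in kern/init.c there is code that runs one of these binary images in a process. The key functions for setting up user processes are not complete yet, however, and have to be filled in.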

 

Exercise 2

Exercise 2. In the file env.c, finish coding the following functions:

env_init()

Initialize all of the Env structures in the envs array and add them to the env_free_list. Also calls env_init_percpu, which configures the segmentation hardware with separate segments for privilege level 0 (kernel) and privilege level 3 (user).

env_setup_vm()

Allocate a page directory for a new environment and initialize the kernel portion of the new environment's address space.

region_alloc()

Allocates and maps physical memory for an environment

load_icode()

You will need to parse an ELF binary image, much like the boot loader already does, and load its contents into the user address space of a new environment.

env_create()

Allocate an environment with env_alloc and call load_icode to load an ELF binary into it.

env_run()

Start a given environment running in user mode.

As you write these functions, you might find the new cprintf verb %e useful -- it prints a description corresponding to an error code. For example,

	r = -E_NO_MEM;
	panic("env_alloc: %e", r);

will panic with the message "env_alloc: out of memory".

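  Complete the functions listed above in env.c.

  The first function is env_init(). It initializes every Env structure in the envs array and adds it to env_free_list. It also calls env_init_percpu(), which makes the CPU load the GDT and the segment registers.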

// Mark all environments in 'envs' as free, set their env_ids to 0,
// and insert them into the env_free_list.
// Make sure the environments are in the free list in the same order
// they are in the envs array (i.e., so that the first call to
// env_alloc() returns envs[0]).
//
void
env_init(void)
{
	// Set up envs array
	// LAB 3: Your code here.
	// The Envs must appear on env_free_list in the same order as in envs,
	// so insert at the head of the list while walking the array backwards.
	for (int i = NENV - 1; i >= 0; --i) {
		envs[i].env_id = 0;
		envs[i].env_status = ENV_FREE;
		envs[i].env_link = env_free_list;
		env_free_list = &envs[i];
	}

	// Per-CPU part of the initialization
	env_init_percpu();
}
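
  Why head insertion in reverse order leaves the list in array order can be checked with a small user-space sketch. The types and names below (DemoEnv, demo_env_init, demo_env_alloc, NENV_DEMO) are hypothetical stand-ins for the kernel structures, not part of JOS.

```c
#include <assert.h>
#include <stddef.h>

#define NENV_DEMO 8   /* hypothetical small table, standing in for NENV */

struct DemoEnv {
	int id;
	struct DemoEnv *link;   /* next free entry, like env_link */
};

static struct DemoEnv demo_envs[NENV_DEMO];
static struct DemoEnv *demo_free_list;

/* Push every slot onto the list head, walking the array backwards. */
static void
demo_env_init(void)
{
	demo_free_list = NULL;
	for (int i = NENV_DEMO - 1; i >= 0; --i) {
		demo_envs[i].id = 0;
		demo_envs[i].link = demo_free_list;
		demo_free_list = &demo_envs[i];
	}
}

/* Pop the first free slot, or return NULL if the list is empty. */
static struct DemoEnv *
demo_env_alloc(void)
{
	struct DemoEnv *e = demo_free_list;
	if (e != NULL)
		demo_free_list = e->link;
	return e;
}
```

  After demo_env_init(), successive calls to demo_env_alloc() hand out demo_envs[0], demo_envs[1], and so on, because the last element pushed (index 0) ends up at the head of the list.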

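  The second function is env_setup_vm(). It allocates a page directory for the new process and initializes it. A process's page directory is almost identical to the kernel's; only the UVPT entry differs, so the kernel directory can be copied with memcpy and that one entry set afterwards.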

//
// Initialize the kernel virtual memory layout for environment e.
// Allocate a page directory, set e->env_pgdir accordingly,
// and initialize the kernel portion of the new environment's address space.
// Do NOT (yet) map anything into the user portion
// of the environment's virtual address space.
//
// Returns 0 on success, < 0 on error.  Errors include:
//	-E_NO_MEM if page directory or table could not be allocated.
//
static int
env_setup_vm(struct Env *e)
{
	int i;
	struct PageInfo *p = NULL;

	// Allocate a page for the page directory
	if (!(p = page_alloc(ALLOC_ZERO)))
		return -E_NO_MEM;

	// Now, set e->env_pgdir and initialize the page directory.
	//
	// Hint:
	//    - The VA space of all envs is identical above UTOP
	//	(except at UVPT, which we've set below).
	//	See inc/memlayout.h for permissions and layout.
	//	Can you use kern_pgdir as a template?  Hint: Yes.
	//	(Make sure you got the permissions right in Lab 2.)
	//    - The initial VA below UTOP is empty.
	//    - You do not need to make any more calls to page_alloc.
	//    - Note: In general, pp_ref is not maintained for
	//	physical pages mapped only above UTOP, but env_pgdir
	//	is an exception -- you need to increment env_pgdir's
	//	pp_ref for env_free to work correctly.
	//    - The functions in kern/pmap.h are handy.

	// LAB 3: Your code here.
	e->env_pgdir = page2kva(p);
	memcpy(e->env_pgdir, kern_pgdir, PGSIZE);	// use kern_pgdir as the template for the new directory
	p->pp_ref++;


	// UVPT maps the env's own page table read-only.
	// Permissions: kernel R, user R
	e->env_pgdir[PDX(UVPT)] = PADDR(e->env_pgdir) | PTE_P | PTE_U;

	return 0;
}
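
  The copy-then-patch idea can be tried outside the kernel. The sketch below works on plain arrays; demo_setup_pgdir and the self_pa parameter (a stand-in for PADDR(e->env_pgdir)) are hypothetical, and the constants are assumed to match inc/mmu.h and inc/memlayout.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NPDENTRIES 1024
#define PDXSHIFT   22
#define PDX(va)    ((((uint32_t) (va)) >> PDXSHIFT) & 0x3FF)
#define UVPT       0xef400000u   /* assumed value from inc/memlayout.h */
#define PTE_P      0x001         /* present */
#define PTE_U      0x004         /* user-accessible */

typedef uint32_t pde_t;

/*
 * Copy the template directory, then point the UVPT slot at the new
 * directory itself so the environment can read its own page tables.
 * No PTE_W is set, which keeps the self-mapping read-only.
 */
static void
demo_setup_pgdir(pde_t *env_pgdir, const pde_t *template_pgdir, uint32_t self_pa)
{
	assert((self_pa & 0xFFF) == 0);   /* page directories are page-aligned */
	memcpy(env_pgdir, template_pgdir, NPDENTRIES * sizeof(pde_t));
	env_pgdir[PDX(UVPT)] = self_pa | PTE_P | PTE_U;
}
```

  Every entry except the one at PDX(UVPT) is inherited from the template, which is why the kernel portion of each address space looks the same in every process.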

  The third function is region_alloc(). It allocates physical memory for a process and maps it into the process's address space.

//
// Allocate len bytes of physical memory for environment env,
// and map it at virtual address va in the environment's address space.
// Does not zero or otherwise initialize the mapped pages in any way.
// Pages should be writable by user and kernel.
// Panic if any allocation attempt fails.
//
static void
region_alloc(struct Env *e, void *va, size_t len)
{
	// LAB 3: Your code here.
	// (But only if you need it for load_icode.)
	//
	// Hint: It is easier to use region_alloc if the caller can pass
	//   'va' and 'len' values that are not page-aligned.
	//   You should round va down, and round (va + len) up.
	//   (Watch out for corner-cases!)
	// Round va down and (va + len) up, so that an unaligned range
	// that straddles a page boundary still gets every page it touches.
	uintptr_t va_start = ROUNDDOWN((uintptr_t) va, PGSIZE);
	uintptr_t va_end = ROUNDUP((uintptr_t) va + len, PGSIZE);
	struct PageInfo *pp = NULL;
	cprintf("Allocate size: %d, Start from: %08x\n", len, va);

	for (uintptr_t addr = va_start; addr < va_end; addr += PGSIZE) {
		pp = page_alloc(0);		// allocate one physical page
		if (!pp) {
			panic("region_alloc failed!\n");
		}
		// map the page at addr, writable by user and kernel
		int r = page_insert(e->env_pgdir, pp, (void *) addr, PTE_U | PTE_W);
		if (r) {
			panic("page_insert failed!\n");
		}
	}
}
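
  The hint in region_alloc() asks for va to be rounded down and va + len to be rounded up. The user-space sketch below shows why rounding only len is not enough: a short range that straddles a page boundary touches two pages. The helper names are hypothetical, and the macros are simplified versions of the ones in inc/types.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u
#define ROUNDDOWN(a, n) ((a) - (a) % (n))
#define ROUNDUP(a, n)   ROUNDDOWN((a) + (n) - 1, (n))

/* Number of pages touched by the byte range [va, va + len). */
static size_t
pages_spanned(uintptr_t va, size_t len)
{
	uintptr_t start = ROUNDDOWN(va, PGSIZE);
	uintptr_t end = ROUNDUP(va + len, PGSIZE);
	return (end - start) / PGSIZE;
}

/* The tempting shortcut: round only the length and ignore va's offset. */
static size_t
pages_by_length_only(size_t len)
{
	return ROUNDUP(len, PGSIZE) / PGSIZE;
}
```

  For va = 0x800ff0 and len = 0x20 the range ends at 0x801010, so pages_spanned() reports two pages while pages_by_length_only() reports one; the second page would be left unmapped and the later memcpy in load_icode() would fault.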

  The fourth function is load_icode(). It parses the ELF binary image, much as the boot loader does, and loads its contents into the process's user address space.

 

//
// Set up the initial program binary, stack, and processor flags
// for a user process.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
//
// This function loads all loadable segments from the ELF binary image
// into the environment's user memory, starting at the appropriate
// virtual addresses indicated in the ELF program header.
// At the same time it clears to zero any portions of these segments
// that are marked in the program header as being mapped
// but not actually present in the ELF file - i.e., the program's bss section.
//
// All this is very similar to what our boot loader does, except the boot
// loader also needs to read the code from disk.  Take a look at
// boot/main.c to get ideas.
//
// Finally, this function maps one page for the program's initial stack.
//
// load_icode panics if it encounters problems.
//  - How might load_icode fail?  What might be wrong with the given input?
//
static void
load_icode(struct Env *e, uint8_t *binary)
{
	// Hints:
	//  Load each program segment into virtual memory
	//  at the address specified in the ELF segment header.
	//  You should only load segments with ph->p_type == ELF_PROG_LOAD.
	//  Each segment's virtual address can be found in ph->p_va
	//  and its size in memory can be found in ph->p_memsz.
	//  The ph->p_filesz bytes from the ELF binary, starting at
	//  'binary + ph->p_offset', should be copied to virtual address
	//  ph->p_va.  Any remaining memory bytes should be cleared to zero.
	//  (The ELF header should have ph->p_filesz <= ph->p_memsz.)
	//  Use functions from the previous lab to allocate and map pages.
	//
	//  All page protection bits should be user read/write for now.
	//  ELF segments are not necessarily page-aligned, but you can
	//  assume for this function that no two segments will touch
	//  the same virtual page.
	//
	//  You may find a function like region_alloc useful.
	//
	//  Loading the segments is much simpler if you can move data
	//  directly into the virtual addresses stored in the ELF binary.
	//  So which page directory should be in force during
	//  this function?
	//
	//  You must also do something with the program's entry point,
	//  to make sure that the environment starts executing there.
	//  What?  (See env_run() and env_pop_tf() below.)

	// LAB 3: Your code here.
	// Modeled on boot/main.c
	struct Proghdr *ph, *eph;	
	struct Elf *elf_header = (struct Elf*)binary;

	// is this a valid ELF?
	if (elf_header->e_magic != ELF_MAGIC) panic("this is not a valid ELF\n");

	// load each program segment (ignores ph flags)
	ph = (struct Proghdr *) ((uint8_t *) elf_header + elf_header->e_phoff);
	eph = ph + elf_header->e_phnum;

	// Switch page directories with lcr3(physical address of the page directory),
	// so that the segments' virtual addresses can be written directly.
	lcr3(PADDR(e->env_pgdir));
	// Walk every program header, allocating memory for and loading
	// each segment of type ELF_PROG_LOAD.
	for (; ph < eph; ph++) {
		if (ph->p_type == ELF_PROG_LOAD) {
			if (ph->p_filesz > ph->p_memsz) {
				panic("file size is greater than memory size\n");
			}
			// allocate and map the segment's pages
			region_alloc(e, (void *) ph->p_va, ph->p_memsz);
			memcpy((void *) ph->p_va, binary + ph->p_offset, ph->p_filesz);
			memset((void *) ph->p_va + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
		}
	}
	// Set e->env_tf.tf_eip to the ELF entry point; env_pop_tf() loads it
	// when the environment first runs.
	e->env_tf.tf_eip = elf_header->e_entry;


	// Now map one page for the program's initial stack
	// at virtual address USTACKTOP - PGSIZE.

	// LAB 3: Your code here.
	// Switch back to the kernel page directory, then allocate the user stack
	lcr3(PADDR(kern_pgdir));
	region_alloc(e, (void *) (USTACKTOP - PGSIZE), PGSIZE);
}
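
  The per-segment work in load_icode(), copying p_filesz bytes and zeroing the rest up to p_memsz, can be exercised on ordinary buffers. In the sketch below, DemoSegment and demo_load_segment are hypothetical names; the struct keeps only the Proghdr fields the loop reads, and errors are returned instead of panicking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct Proghdr. */
struct DemoSegment {
	size_t offset;   /* where the segment's bytes start in the image */
	size_t filesz;   /* bytes present in the image */
	size_t memsz;    /* bytes the segment occupies once loaded */
};

/*
 * Copy filesz bytes from the image into dst, then zero the remaining
 * memsz - filesz bytes (the bss part of the segment).
 * Returns 0 on success, -1 if the header is inconsistent.
 */
static int
demo_load_segment(uint8_t *dst, const uint8_t *image, size_t image_len,
		  const struct DemoSegment *seg)
{
	if (seg->filesz > seg->memsz)
		return -1;   /* file part cannot exceed the in-memory size */
	if (seg->offset > image_len || seg->filesz > image_len - seg->offset)
		return -1;   /* segment would read past the end of the image */

	memcpy(dst, image + seg->offset, seg->filesz);
	memset(dst + seg->filesz, 0, seg->memsz - seg->filesz);
	return 0;
}
```

  The second check is not in the listing above; it is one answer to the question in the function's header comment about how load_icode() might fail on bad input.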

  The fifth function is env_create(). It calls env_alloc() to allocate a new process and load_icode() to load the ELF binary into it.

//
// Allocates a new env with env_alloc, loads the named elf
// binary into it with load_icode, and sets its env_type.
// This function is ONLY called during kernel initialization,
// before running the first user-mode environment.
// The new env's parent ID is set to 0.
//
void
env_create(uint8_t *binary, enum EnvType type)
{
	// LAB 3: Your code here.
	struct Env *e;
	int r = env_alloc(&e,0);
	if(r<0){
		panic("env_create failed\n");
	}
	e->env_type = type;
	load_icode(e,binary);
}

  The sixth function is env_run(). It starts a process running in user mode.

//
// Context switch from curenv to env e.
// Note: if this is the first call to env_run, curenv is NULL.
//
// This function does not return.
//
void
env_run(struct Env *e)
{
	// Step 1: If this is a context switch (a new environment is running):
	//	   1. Set the current environment (if any) back to
	//	      ENV_RUNNABLE if it is ENV_RUNNING (think about
	//	      what other states it can be in),
	//	   2. Set 'curenv' to the new environment,
	//	   3. Set its status to ENV_RUNNING,
	//	   4. Update its 'env_runs' counter,
	//	   5. Use lcr3() to switch to its address space.
	// Step 2: Use env_pop_tf() to restore the environment's
	//	   registers and drop into user mode in the
	//	   environment.

	// Hint: This function loads the new environment's state from
	//	e->env_tf.  Go back through the code you wrote above
	//	and make sure you have set the relevant parts of
	//	e->env_tf to sensible values.

	// LAB 3: Your code here.
	// panic("env_run not yet implemented");

	// If another process is currently running, this is a context switch
	if (curenv && curenv->env_status == ENV_RUNNING) {
		curenv->env_status = ENV_RUNNABLE;
	}
	// cprintf("[%08x] new env %08x and pgdir address %x\n", curenv ? curenv->env_id : 0, e->env_id,e->env_pgdir);
	// Install the new process
	curenv = e;
	e->env_status = ENV_RUNNING;
	e->env_runs++;


	lcr3(PADDR(e->env_pgdir));


	env_pop_tf(&(e->env_tf));

}
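
  The bookkeeping part of env_run() is a small state machine, and it can be sketched without the address-space switch or env_pop_tf(). The names below (SchedEnv, sched_switch_to, and the SCHED_ status values) are hypothetical; the sketch only mirrors the status and counter updates.

```c
#include <assert.h>
#include <stddef.h>

enum SchedStatus {
	SCHED_FREE,
	SCHED_DYING,
	SCHED_RUNNABLE,
	SCHED_RUNNING,
	SCHED_NOT_RUNNABLE
};

struct SchedEnv {
	enum SchedStatus status;
	unsigned runs;          /* like env_runs */
};

static struct SchedEnv *sched_curenv;   /* like curenv; NULL before the first run */

/* Mirror of env_run()'s bookkeeping, minus lcr3() and env_pop_tf(). */
static void
sched_switch_to(struct SchedEnv *e)
{
	assert(e != NULL);

	/* Only a RUNNING env goes back to RUNNABLE; other states are kept. */
	if (sched_curenv != NULL && sched_curenv->status == SCHED_RUNNING)
		sched_curenv->status = SCHED_RUNNABLE;

	sched_curenv = e;
	e->status = SCHED_RUNNING;
	e->runs++;
}
```

  The status check matters because the outgoing environment may already have been marked as something other than running, for example not runnable or dying, and overwriting that with RUNNABLE would lose the information.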

  Running the kernel at this point gives:

Triple fault.  Halting for inspection via QEMU monitor.

Below is a call graph of the code up to the point where the user code is invoked. Make sure you understand the purpose of each step.

  • start (kern/entry.S)
  • i386_init (kern/init.c)
    • cons_init
    • mem_init
    • env_init
    • trap_init (still incomplete at this point)
    • env_create
    • env_run
      • env_pop_tf

  The list above shows the order in which the functions are called.

Once you are done you should compile your kernel and run it under QEMU. If all goes well, your system should enter user space and execute the hello binary until it makes a system call with the int instruction. At that point there will be trouble, since JOS has not set up the hardware to allow any kind of transition from user space into the kernel. When the CPU discovers that it is not set up to handle this system call interrupt, it will generate a general protection exception, find that it can't handle that, generate a double fault exception, find that it can't handle that either, and finally give up with what's known as a "triple fault". Usually, you would then see the CPU reset and the system reboot. While this is important for legacy applications (see this blog post for an explanation of why), it's a pain for kernel development, so with the 6.828 patched QEMU you'll instead see a register dump and a "Triple fault." message.

We'll address this problem shortly, but for now we can use the debugger to check that we're entering user mode. Use make qemu-gdb and set a GDB breakpoint at env_pop_tf, which should be the last function you hit before actually entering user mode. Single step through this function using si; the processor should enter user mode after the iret instruction. You should then see the first instruction in the user environment's executable, which is the cmpl instruction at the label start in lib/entry.S. Now use b *0x... to set a breakpoint at the int $0x30 in sys_cputs() in hello (see obj/user/hello.asm for the user-space address). This int is the system call to display a character to the console. If you cannot execute as far as the int, then something is wrong with your address space setup or program loading code; go back and fix it before continuing.

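  Once the functions above are written and debugged, the system enters user space and runs the hello binary until it makes a system call with the int instruction. That is where things go wrong, because JOS has not yet set up the hardware for the transition from user mode to kernel mode. When the CPU finds that it cannot handle the system call interrupt, it raises a general protection exception; it cannot handle that either, so it raises a double fault; it cannot handle the double fault, so it gives up with a "triple fault". In other words, once a process has switched to user mode the kernel cannot regain control, because JOS does not yet support the switch from user mode back to kernel mode.

  We will fix this later. For now, use the debugger to check that we reach user mode: set a breakpoint at env_pop_tf, the last function called before entering user mode, and single-step through it with si. The processor should enter user mode after the iret instruction, and the first instruction executed in the user process should be the cmpl at the label start in lib/entry.S. Then use b *0x... to set a breakpoint at the int $0x30 in sys_cputs() (the address is in obj/user/hello.asm). This int is the system call that prints a character to the console. If execution cannot get as far as the int, something in the earlier code is wrong; fix it and try again.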

void
sys_cputs(const char *s, size_t len)
{
  800a1c:	55                   	push   %ebp
  800a1d:	89 e5                	mov    %esp,%ebp
  800a1f:	57                   	push   %edi
  800a20:	56                   	push   %esi
  800a21:	53                   	push   %ebx
	//
	// The last clause tells the assembler that this can
	// potentially change the condition codes and arbitrary
	// memory locations.

	asm volatile("int %1\n"
  800a22:	b8 00 00 00 00       	mov    $0x0,%eax
  800a27:	8b 4d 0c             	mov    0xc(%ebp),%ecx
  800a2a:	8b 55 08             	mov    0x8(%ebp),%edx
  800a2d:	89 c3                	mov    %eax,%ebx
  800a2f:	89 c7                	mov    %eax,%edi
  800a31:	89 c6                	mov    %eax,%esi
  800a33:	cd 30                	int    $0x30

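  Once execution reaches 0x800a33, stepping with si makes the system restart repeatedly and the QEMU console keeps printing Triple fault, as shown below: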

(gdb) si
=> 0x800a2f:	mov    %eax,%edi
0x00800a2f in ?? ()
(gdb) si
=> 0x800a31:	mov    %eax,%esi
0x00800a31 in ?? ()
(gdb) si
=> 0x800a33:	int    $0x30
0x00800a33 in ?? ()
(gdb) si
=> 0x800a33:	int    $0x30
0x00800a33 in ?? ()
(gdb) si
=> 0x800a33:	int    $0x30
0x00800a33 in ?? ()
(gdb) si
=> 0x800a33:	int    $0x30
0x00800a33 in ?? ()
(gdb) 

 

Key point: Handling Interrupts and Exceptions

At this point, the first int $0x30 system call instruction in user space is a dead end: once the processor gets into user mode, there is no way to get back out. You will now need to implement basic exception and system call handling, so that it is possible for the kernel to recover control of the processor from user-mode code. The first thing you should do is thoroughly familiarize yourself with the x86 interrupt and exception mechanism.

In this lab we generally follow Intel's terminology for interrupts, exceptions, and the like. However, terms such as exception, trap, interrupt, fault and abort have no standard meaning across architectures or operating systems, and are often used without regard to the subtle distinctions between them on a particular architecture such as the x86. When you see these terms outside of this lab, the meanings might be slightly different.

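  At this point, the first int $0x30 system call executed in user space is a dead end: once the processor is in user mode there is no way back to kernel mode. We now need to implement basic exception and system call handling so that the kernel can take back control of the processor from user-mode code. The first step is to become familiar with the x86 interrupt and exception mechanism.

  The interrupt and exception terminology in this lab follows Intel's usage; other architectures and operating systems may use the same words slightly differently.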

Exercise 3

  Exercise 3. Read Chapter 9, Exceptions and Interrupts in the 80386 Programmer's Manual (or Chapter 5 of the IA-32 Developer's Manual), if you haven't already.

  Read up on the concepts behind interrupts and exceptions.

Key point: Basics of Protected Control Transfer

Exceptions and interrupts are both "protected control transfers," which cause the processor to switch from user to kernel mode (CPL=0) without giving the user-mode code any opportunity to interfere with the functioning of the kernel or other environments. In Intel's terminology, an interrupt is a protected control transfer that is caused by an asynchronous event usually external to the processor, such as notification of external device I/O activity. An exception, in contrast, is a protected control transfer caused synchronously by the currently running code, for example due to a divide by zero or an invalid memory access.

In order to ensure that these protected control transfers are actually protected, the processor's interrupt/exception mechanism is designed so that the code currently running when the interrupt or exception occurs does not get to choose arbitrarily where the kernel is entered or how. Instead, the processor ensures that the kernel can be entered only under carefully controlled conditions. On the x86, two mechanisms work together to provide this protection:

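  Exceptions and interrupts are both protected control transfers: they switch the processor from user mode to kernel mode without giving user-mode code any chance to interfere with the kernel or with other processes.

  An interrupt is a protected control transfer caused by an asynchronous event, usually external to the processor, such as a request from an I/O device.

  An exception is a protected control transfer caused synchronously by the code that is currently running, for example a divide by zero or an invalid memory access.

  To make sure these transfers really are protected, the interrupt/exception mechanism does not let the code that was running choose where or how the kernel is entered; the processor guarantees that the kernel is entered only under carefully controlled conditions.

  The x86 provides two mechanisms that work together to guarantee this: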

  1. The Interrupt Descriptor Table. The processor ensures that interrupts and exceptions can only cause the kernel to be entered at a few specific, well-defined entry-points determined by the kernel itself, and not by the code running when the interrupt or exception is taken.

    The x86 allows up to 256 different interrupt or exception entry points into the kernel, each with a different interrupt vector. A vector is a number between 0 and 255. An interrupt's vector is determined by the source of the interrupt: different devices, error conditions, and application requests to the kernel generate interrupts with different vectors. The CPU uses the vector as an index into the processor's interrupt descriptor table (IDT), which the kernel sets up in kernel-private memory, much like the GDT. From the appropriate entry in this table the processor loads:

    • the value to load into the instruction pointer (EIP) register, pointing to the kernel code designated to handle that type of exception.
    • the value to load into the code segment (CS) register, which includes in bits 0-1 the privilege level at which the exception handler is to run. (In JOS, all exceptions are handled in kernel mode, privilege level 0.)
  2. The Task State Segment. The processor needs a place to save the old processor state before the interrupt or exception occurred, such as the original values of EIP and CS before the processor invoked the exception handler, so that the exception handler can later restore that old state and resume the interrupted code from where it left off. But this save area for the old processor state must in turn be protected from unprivileged user-mode code; otherwise buggy or malicious user code could compromise the kernel.

    For this reason, when an x86 processor takes an interrupt or trap that causes a privilege level change from user to kernel mode, it also switches to a stack in the kernel's memory. A structure called the task state segment (TSS) specifies the segment selector and address where this stack lives. The processor pushes (on this new stack) SS, ESP, EFLAGS, CS, EIP, and an optional error code. Then it loads the CS and EIP from the interrupt descriptor, and sets the ESP and SS to refer to the new stack.

    Although the TSS is large and can potentially serve a variety of purposes, JOS only uses it to define the kernel stack that the processor should switch to when it transfers from user to kernel mode. Since "kernel mode" in JOS is privilege level 0 on the x86, the processor uses the ESP0 and SS0 fields of the TSS to define the kernel stack when entering kernel mode. JOS doesn't use any other TSS fields.

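  1. The Interrupt Descriptor Table
  The processor ensures that interrupts and exceptions enter the kernel only at a few specific, well-defined entry points. These entry points are chosen by the kernel itself, not by the code that was running when the interrupt or exception occurred.

  The x86 supports up to 256 interrupt or exception entry points into the kernel, each with its own interrupt vector, a number from 0 to 255. The vector is determined by the source of the interrupt: different devices, error conditions, and requests from applications produce different vectors. The CPU uses the vector as an index into the interrupt descriptor table (IDT), which the kernel sets up in kernel-private memory, much like the GDT (global descriptor table). From the matching entry the processor loads:

  • the value to load into the instruction pointer (EIP), which points to the kernel code that handles this type of exception.
  • the value to load into the code segment register (CS), whose two lowest bits give the privilege level at which the handler runs. (Because those two bits do not take part in selecting a segment, only 14 of a selector's 16 bits do, which is why the x86 logical address space is usually quoted as 2^46 rather than 2^48 bytes.) In JOS all exceptions are handled in kernel mode, privilege level 0; user mode is level 3.

  2. The Task State Segment

  When an interrupt or exception occurs, the processor has to save its old state, including the original EIP and CS values, so that the interrupted code can be resumed later. The save area must itself be protected from unprivileged user-mode code, otherwise buggy or malicious code could compromise the kernel.

  When an x86 processor takes an interrupt or exception that switches from user mode to kernel mode, it also switches automatically to a stack in kernel memory. A structure called the task state segment (TSS) specifies the segment selector and address of this stack. The processor pushes SS, ESP, EFLAGS, CS, EIP, and an optional error code onto the new stack, then loads CS and EIP from the interrupt descriptor and sets ESP and SS to refer to the new stack.

  Although the TSS is large and can serve many purposes, JOS uses it only to define the kernel stack that the processor switches to on the transition from user mode to kernel mode. Because kernel mode in JOS is privilege level 0, the processor takes the kernel stack from the ESP0 and SS0 fields of the TSS; JOS uses no other TSS fields.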

 

Key concept: Types of Exceptions and Interrupts

All of the synchronous exceptions that the x86 processor can generate internally use interrupt vectors between 0 and 31, and therefore map to IDT entries 0-31. For example, a page fault always causes an exception through vector 14. Interrupt vectors greater than 31 are only used by software interrupts, which can be generated by the int instruction, or asynchronous hardware interrupts, caused by external devices when they need attention.

In this section we will extend JOS to handle the internally generated x86 exceptions in vectors 0-31. In the next section we will make JOS handle software interrupt vector 48 (0x30), which JOS (fairly arbitrarily) uses as its system call interrupt vector. In Lab 4 we will extend JOS to handle externally generated hardware interrupts such as the clock interrupt.

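  All synchronous exceptions that the x86 processor generates internally use interrupt vectors 0 to 31 and therefore map to IDT entries 0 to 31. For example, a page fault always raises an exception through vector 14. Vectors above 31 are used only by software interrupts (the int instruction) or by asynchronous hardware interrupts (requests from external devices).

  In this section we extend JOS to handle the exceptions in vectors 0 to 31. In the next section JOS will handle software interrupt vector 48 (0x30), which it uses as its system call vector. Lab 4 extends JOS to handle external hardware interrupts.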
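
The vector ranges above can be summarized as a small classification function. This is a sketch: IRQ_OFFSET = 32 and T_SYSCALL = 48 are assumed to match inc/trap.h, and the enum and function names are invented for this example.

```c
#include <assert.h>

/* Assumed JOS constants: device IRQs occupy vectors 32-47, syscalls use 48. */
enum { IRQ_OFFSET = 32, T_SYSCALL = 48 };

enum vector_kind { VEC_EXCEPTION, VEC_DEVICE_IRQ, VEC_SYSCALL, VEC_OTHER };

static enum vector_kind classify_vector(int vec)
{
	assert(vec >= 0 && vec <= 255);	/* the IDT has at most 256 entries */
	if (vec < 32)
		return VEC_EXCEPTION;	/* processor-generated, handled in this section */
	if (vec < IRQ_OFFSET + 16)
		return VEC_DEVICE_IRQ;	/* external hardware, added in a later lab */
	if (vec == T_SYSCALL)
		return VEC_SYSCALL;	/* software interrupt used by JOS system calls */
	return VEC_OTHER;
}
```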

An Example

Let's put these pieces together and trace through an example. Let's say the processor is executing code in a user environment and encounters a divide instruction that attempts to divide by zero.

  As an example, suppose the processor is executing a user environment and encounters an instruction that divides by zero.

  1. The processor switches to the stack defined by the SS0 and ESP0 fields of the TSS, which in JOS will hold the values GD_KD and KSTACKTOP, respectively.
  2. The processor pushes the exception parameters on the kernel stack, starting at address KSTACKTOP:
                         +--------------------+ KSTACKTOP             
                         | 0x00000 | old SS   |     " - 4
                         |      old ESP       |     " - 8
                         |     old EFLAGS     |     " - 12
                         | 0x00000 | old CS   |     " - 16
                         |      old EIP       |     " - 20 <---- ESP 
                         +--------------------+             
    	
  3. Because we're handling a divide error, which is interrupt vector 0 on the x86, the processor reads IDT entry 0 and sets CS:EIP to point to the handler function described by the entry.
  4. The handler function takes control and handles the exception, for example by terminating the user environment.

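  1. The processor switches to the kernel stack defined by the SS0 and ESP0 fields of the TSS; in JOS these hold GD_KD and KSTACKTOP respectively.

  2. The processor pushes the exception parameters onto the kernel stack, starting at KSTACKTOP: SS, ESP, EFLAGS, CS, and EIP. The low 16 bits of the SS word (the stack segment selector) together with ESP identify the interrupted stack, EFLAGS holds the flags at that moment, and CS and EIP identify the code that was executing. With this information the interrupted state can be restored once the handler finishes.

  3. Because this is a divide error, which is interrupt vector 0 on the x86, the processor reads IDT entry 0 and sets CS:EIP to point to the handler for that error.

  4. The handler takes control and handles the exception, for example by terminating the environment.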

For certain types of x86 exceptions, in addition to the "standard" five words above, the processor pushes onto the stack another word containing an error code. The page fault exception, number 14, is an important example. See the 80386 manual to determine for which exception numbers the processor pushes an error code, and what the error code means in that case. When the processor pushes an error code, the stack would look as follows at the beginning of the exception handler when coming in from user mode:

                     +--------------------+ KSTACKTOP             
                     | 0x00000 | old SS   |     " - 4
                     |      old ESP       |     " - 8
                     |     old EFLAGS     |     " - 12
                     | 0x00000 | old CS   |     " - 16
                     |      old EIP       |     " - 20
                     |     error code     |     " - 24 <---- ESP
                     +--------------------+             

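  For certain x86 exceptions the processor pushes, in addition to the five standard words, one more word called the error code. The page fault exception (vector 14) is an important example.

Key concept: Nested Exceptions and Interrupts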

The processor can take exceptions and interrupts both from kernel and user mode. It is only when entering the kernel from user mode, however, that the x86 processor automatically switches stacks before pushing its old register state onto the stack and invoking the appropriate exception handler through the IDT. If the processor is already in kernel mode when the interrupt or exception occurs (the low 2 bits of the CS register are already zero), then the CPU just pushes more values on the same kernel stack. In this way, the kernel can gracefully handle nested exceptions caused by code within the kernel itself. This capability is an important tool in implementing protection, as we will see later in the section on system calls.

If the processor is already in kernel mode and takes a nested exception, since it does not need to switch stacks, it does not save the old SS or ESP registers. For exception types that do not push an error code, the kernel stack therefore looks like the following on entry to the exception handler:

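  The processor can take exceptions and interrupts in both user mode and kernel mode. Only when entering the kernel from user mode does the x86 processor switch stacks automatically before pushing the old register state and invoking the handler through the IDT. If the processor is already in kernel mode when the interrupt or exception occurs, the CPU simply pushes the state onto the same kernel stack without switching. This lets the kernel handle "nested exceptions" caused by its own code, which is an important tool for implementing protection.

  If the processor is already in kernel mode and takes a nested exception, it does not need to switch stacks and therefore does not save the old SS and ESP. For exceptions without an error code, the kernel stack looks like this on entry to the handler: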

                     +--------------------+ <---- old ESP
                     |     old EFLAGS     |     " - 4
                     | 0x00000 | old CS   |     " - 8
                     |      old EIP       |     " - 12
                     +--------------------+             

For exception types that push an error code, the processor pushes the error code immediately after the old EIP, as before.

There is one important caveat to the processor's nested exception capability. If the processor takes an exception while already in kernel mode, and cannot push its old state onto the kernel stack for any reason such as lack of stack space, then there is nothing the processor can do to recover, so it simply resets itself. Needless to say, the kernel should be designed so that this can't happen.

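  For exceptions that do have an error code, the error code is pushed right after the old EIP. If the CPU cannot push its state onto the kernel stack while handling a nested exception (for lack of stack space, for example), it has no way to recover and simply resets itself. The kernel must of course be designed so that this cannot happen.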
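
The four stack pictures in this section (from user mode or nested, with or without an error code) differ only in how many words the processor pushes. A minimal sketch that computes ESP on entry to the handler, assuming KSTACKTOP is 0xf0000000 as in JOS; the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define KSTACKTOP 0xf0000000u	/* assumed JOS value */

/*
 * Returns ESP as the handler sees it on entry.
 * esp_before is the kernel ESP at the time of the trap; it is only
 * used for a nested trap, because a trap from user mode starts at
 * the stack top taken from the TSS (ESP0).
 */
static uint32_t esp_on_handler_entry(bool from_user, bool has_error_code,
				     uint32_t esp_before)
{
	uint32_t esp = from_user ? KSTACKTOP : esp_before;

	if (from_user)
		esp -= 2 * 4;	/* old SS, old ESP */
	esp -= 3 * 4;		/* old EFLAGS, old CS, old EIP */
	if (has_error_code)
		esp -= 4;	/* error code */
	return esp;
}
```

A page fault taken from user mode gives KSTACKTOP - 24, and a nested exception without an error code gives the old ESP minus 12, matching the diagrams.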

Setting Up the IDT

You should now have the basic information you need in order to set up the IDT and handle exceptions in JOS. For now, you will set up the IDT to handle interrupt vectors 0-31 (the processor exceptions). We'll handle system call interrupts later in this lab and add interrupts 32-47 (the device IRQs) in a later lab.

The header files inc/trap.h and kern/trap.h contain important definitions related to interrupts and exceptions that you will need to become familiar with. The file kern/trap.h contains definitions that are strictly private to the kernel, while inc/trap.h contains definitions that may also be useful to user-level programs and libraries.

Note: Some of the exceptions in the range 0-31 are defined by Intel to be reserved. Since they will never be generated by the processor, it doesn't really matter how you handle them. Do whatever you think is cleanest.

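  You now have the basic information needed to set up the IDT and handle exceptions in JOS. For now the IDT handles vectors 0 to 31; the other vectors are added later. The header files inc/trap.h and kern/trap.h contain the definitions related to interrupts and exceptions: kern/trap.h holds definitions private to the kernel, while inc/trap.h holds definitions that are also useful to user-level programs and libraries.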

The overall flow of control that you should achieve is depicted below:

      IDT                   trapentry.S         trap.c
   
+----------------+                        
|   &handler1    |---------> handler1:          trap (struct Trapframe *tf)
|                |             // do stuff      {
|                |             call trap          // handle the exception/interrupt
|                |             // ...           }
+----------------+
|   &handler2    |--------> handler2:
|                |            // do stuff
|                |            call trap
|                |            // ...
+----------------+
       .
       .
       .
+----------------+
|   &handlerX    |--------> handlerX:
|                |             // do stuff
|                |             call trap
|                |             // ...
+----------------+

Each exception or interrupt should have its own handler in trapentry.S and trap_init() should initialize the IDT with the addresses of these handlers. Each of the handlers should build a struct Trapframe (see inc/trap.h) on the stack and call trap() (in trap.c) with a pointer to the Trapframe. trap() then handles the exception/interrupt or dispatches to a specific handler function.

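  Every exception or interrupt should have its own handler in trapentry.S, and trap_init() should install the addresses of these handlers in the IDT. Each handler builds a struct Trapframe (see inc/trap.h) on the stack and calls trap() with a pointer to it; trap() then handles the exception or dispatches it to a more specific handler.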

 

Exercise 4

Exercise 4. Edit trapentry.S and trap.c and implement the features described above. The macros TRAPHANDLER and TRAPHANDLER_NOEC in trapentry.S should help you, as well as the T_* defines in inc/trap.h. You will need to add an entry point in trapentry.S (using those macros) for each trap defined in inc/trap.h, and you'll have to provide _alltraps which the TRAPHANDLER macros refer to. You will also need to modify trap_init() to initialize the idt to point to each of these entry points defined in trapentry.S; the SETGATE macro will be helpful here.

Your _alltraps should:

  1. push values to make the stack look like a struct Trapframe
  2. load GD_KD into %ds and %es
  3. pushl %esp to pass a pointer to the Trapframe as an argument to trap()
  4. call trap (can trap ever return?)

Consider using the pushal instruction; it fits nicely with the layout of the struct Trapframe.

Test your trap handling code using some of the test programs in the user directory that cause exceptions before making any system calls, such as user/divzero. You should be able to get make grade to succeed on the divzero, softint, and badsegment tests at this point.

  Edit trapentry.S and trap.c to implement the features described above.

  First look at the TRAPHANDLER and TRAPHANDLER_NOEC macros defined in trapentry.S.

#define TRAPHANDLER(name, num)						\
	.globl name;		/* define global symbol for 'name' */	\
	.type name, @function;	/* symbol type is function */		\
	.align 2;		/* align function definition */		\
	name:			/* function starts here */		\
	pushl $(num);							\
	jmp _alltraps

/* Use TRAPHANDLER_NOEC for traps where the CPU doesn't push an error code.
 * It pushes a 0 in place of the error code, so the trap frame has the same
 * format in either case.
 */
#define TRAPHANDLER_NOEC(name, num)					\
	.globl name;							\
	.type name, @function;						\
	.align 2;							\
	name:								\
	pushl $0;							\
	pushl $(num);							\
	jmp _alltraps

.text

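  The macro bodies do the following: .globl declares name as a global symbol so that other files can refer to it; .type marks the symbol as a function (as opposed to an object); .align sets the alignment of the entry point (.align 2 is 2-byte alignment here); the handler then pushes the trap number and jumps to _alltraps. The two macros differ only in the error code: use TRAPHANDLER for exceptions where the CPU itself pushes an error code, and TRAPHANDLER_NOEC for those where it does not, in which case the macro pushes a 0 in its place so that the Trapframe has the same layout either way.

  Based on the trap types found in inc/trap.h, generate the corresponding entry points in trapentry.S with these macros.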
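
Which macro to use depends on whether the processor pushes an error code for the vector. The rule can be written as a small lookup. This is a sketch covering only vectors 0 to 19, the ones inc/trap.h names; the function name is hypothetical. Vector 17 (alignment check) is included because processors from the 486 onward push an error code for it, although the 80386 manual the lab points to predates that exception.

```c
#include <stdbool.h>

/* True if the CPU itself pushes an error code for this vector,
 * i.e. the entry point should be generated with TRAPHANDLER. */
static bool cpu_pushes_error_code(int vec)
{
	switch (vec) {
	case 8:		/* T_DBLFLT: double fault */
	case 10:	/* T_TSS: invalid task state segment */
	case 11:	/* T_SEGNP: segment not present */
	case 12:	/* T_STACK: stack exception */
	case 13:	/* T_GPFLT: general protection fault */
	case 14:	/* T_PGFLT: page fault */
	case 17:	/* T_ALIGN: alignment check (486 and later) */
		return true;
	default:
		return false;	/* use TRAPHANDLER_NOEC, which pushes a 0 */
	}
}
```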

/*
 * Lab 3: Your code here for generating entry points for the different traps.
 * Interrupt vectors 8, 10, 11, 12, 13, 14 and 17 push an error code.
 */
TRAPHANDLER_NOEC(divide_handler, T_DIVIDE);
TRAPHANDLER_NOEC(debug_handler, T_DEBUG);
TRAPHANDLER_NOEC(nmi_handler, T_NMI);
TRAPHANDLER_NOEC(brkpt_handler, T_BRKPT);
TRAPHANDLER_NOEC(oflow_handler, T_OFLOW);
TRAPHANDLER_NOEC(bound_handler, T_BOUND);
TRAPHANDLER_NOEC(illop_handler, T_ILLOP);
TRAPHANDLER_NOEC(device_handler, T_DEVICE);

TRAPHANDLER(dblflt_handler, T_DBLFLT);
TRAPHANDLER(tss_handler, T_TSS);
TRAPHANDLER(segnp_handler, T_SEGNP);
TRAPHANDLER(stack_handler, T_STACK);
TRAPHANDLER(gpflt_handler, T_GPFLT);
TRAPHANDLER(pgflt_handler, T_PGFLT);

TRAPHANDLER_NOEC(fperr_handler, T_FPERR);
/* Alignment check (vector 17) pushes an error code on 486 and later CPUs. */
TRAPHANDLER(align_handler, T_ALIGN);
TRAPHANDLER_NOEC(mchk_handler, T_MCHK);
TRAPHANDLER_NOEC(simderr_handler, T_SIMDERR);
TRAPHANDLER_NOEC(syscall_handler, T_SYSCALL);

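  This part mainly defines the entry-point functions.

   Next, complete _alltraps. Its job is to finish building the Trapframe and then call trap() with a pointer to that Trapframe as the argument.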

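/*
 * Lab 3: Your code here for _alltraps
 * On entry the stack already holds the upper part of the Trapframe (see trap.h):
 * the hardware pushed tf_ss and tf_esp (only when coming from user mode), then
 * tf_eflags, tf_cs, tf_eip and, for some vectors, tf_err; the TRAPHANDLER macros
 * pushed tf_trapno (and a 0 in place of tf_err where needed). What remains is
 * tf_ds, tf_es and tf_regs.
 * Function arguments are pushed right to left; likewise the struct fields are
 * pushed from last to first, because the stack grows from high to low addresses.
 */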
_alltraps:

	/*make the stack look like a struct Trapframe*/
	pushl %ds
	pushl %es
	pushal 

	/*load GD_KD into %ds and %es*/
	movl $GD_KD, %eax
	movl %eax, %ds
	movl %eax, %es

	/*push %esp as an argument to trap()*/
	pushl %esp 

	call trap
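
To see why this push order produces a valid Trapframe, it helps to look at the struct layout. The sketch below re-declares the layout in plain C for illustration: field names follow inc/trap.h, but uint32_t replaces uintptr_t so that it also compiles on a 64-bit host, and trapframe_base() is a hypothetical helper. The field pushed last (edi, from pushal) ends up at the lowest address, which is the start of the struct.

```c
#include <stddef.h>
#include <stdint.h>

struct PushRegs {
	/* pushal pushes eax first and edi last, so edi is lowest in memory */
	uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
	uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};

struct Trapframe {
	struct PushRegs tf_regs;	/* pushed by pushal in _alltraps */
	uint16_t tf_es, tf_padding1;	/* pushed by _alltraps */
	uint16_t tf_ds, tf_padding2;	/* pushed by _alltraps */
	uint32_t tf_trapno;		/* pushed by the TRAPHANDLER macros */
	uint32_t tf_err;		/* pushed by hardware or as 0 by the macro */
	uint32_t tf_eip;		/* from here up: pushed by hardware */
	uint16_t tf_cs, tf_padding3;
	uint32_t tf_eflags;
	uint32_t tf_esp;		/* only on a switch from user mode */
	uint16_t tf_ss, tf_padding4;
};

/* Address of a full Trapframe built downward from the given stack top. */
static uint32_t trapframe_base(uint32_t stack_top)
{
	return stack_top - (uint32_t)sizeof(struct Trapframe);
}
```

With a stack top of 0xf0000000 (the assumed KSTACKTOP) the frame starts at 0xefffffbc, which is the address reported by "Incoming TRAP frame at 0xefffffbc" in the run log later in this post.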

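  Finally, rewrite trap_init() to initialize the IDT so that each index points to the corresponding entry point in trapentry.S; the SETGATE() macro fills in each gate descriptor. Note that the handlers in trapentry.S are global symbols, but they still have to be declared before they can be used in the C file.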

void
trap_init(void)
{
	extern struct Segdesc gdt[];

	// LAB 3: Your code here.
	void divide_handler();
	void debug_handler();
	void nmi_handler();    
	void brkpt_handler(); 
	void oflow_handler();
	void bound_handler();
	void illop_handler();
	void device_handler();

	void dblflt_handler();
	void tss_handler();
	void segnp_handler();
	void stack_handler();
	void gpflt_handler();
	void pgflt_handler();

	void fperr_handler();
	void align_handler();
	void mchk_handler();
	void simderr_handler();
	void syscall_handler();

	// #define SETGATE(gate, istrap, sel, off, dpl)
	// gate:   the IDT entry to fill in
	// istrap: 1 for a trap (exception) gate, 0 for an interrupt gate
	// sel:    code segment selector for the handler
	// off:    address of the handler within that segment
	// dpl:    privilege level required for software to raise this vector with int
	SETGATE(idt[T_DIVIDE], 1, GD_KT, divide_handler, 0);
	SETGATE(idt[T_DEBUG], 1, GD_KT, debug_handler, 0);
	SETGATE(idt[T_NMI], 1, GD_KT, nmi_handler, 0);
	SETGATE(idt[T_BRKPT], 1, GD_KT, brkpt_handler, 0);
	SETGATE(idt[T_OFLOW], 1, GD_KT, oflow_handler, 0);
	SETGATE(idt[T_BOUND], 1, GD_KT, bound_handler, 0);
	SETGATE(idt[T_ILLOP], 1, GD_KT, illop_handler, 0);
	SETGATE(idt[T_DEVICE], 1, GD_KT, device_handler, 0);

	SETGATE(idt[T_DBLFLT], 1, GD_KT, dblflt_handler, 0);
	SETGATE(idt[T_TSS], 1, GD_KT, tss_handler, 0);
	SETGATE(idt[T_SEGNP], 1, GD_KT, segnp_handler, 0);
	SETGATE(idt[T_STACK], 1, GD_KT, stack_handler, 0);
	SETGATE(idt[T_GPFLT], 1, GD_KT, gpflt_handler, 0);
	SETGATE(idt[T_PGFLT], 1, GD_KT, pgflt_handler, 0);

	SETGATE(idt[T_FPERR], 1, GD_KT, fperr_handler, 0);
	SETGATE(idt[T_ALIGN], 1, GD_KT, align_handler, 0);
	SETGATE(idt[T_MCHK], 1, GD_KT, mchk_handler, 0);
	SETGATE(idt[T_SIMDERR], 1, GD_KT, simderr_handler, 0);

	// interrupt gate, callable from user mode (dpl 3)
	SETGATE(idt[T_SYSCALL], 0, GD_KT, syscall_handler, 3);

	// Per-CPU setup 
	trap_init_percpu();
}
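
What SETGATE() stores in an IDT entry can be sketched as an ordinary C function. The bit-field layout below is meant to mirror struct Gatedesc in inc/mmu.h and the gate type values STS_IG32 / STS_TG32; treat both as assumptions. set_gate() and gate_offset() are hypothetical names used only here.

```c
#include <stdint.h>

#define STS_IG32 0xE	/* 32-bit interrupt gate */
#define STS_TG32 0xF	/* 32-bit trap gate */

struct Gatedesc {
	unsigned gd_off_15_0 : 16;	/* low 16 bits of handler offset */
	unsigned gd_sel : 16;		/* code segment selector */
	unsigned gd_args : 5;		/* unused for interrupt/trap gates */
	unsigned gd_rsv1 : 3;		/* reserved */
	unsigned gd_type : 4;		/* STS_IG32 or STS_TG32 */
	unsigned gd_s : 1;		/* must be 0 (system descriptor) */
	unsigned gd_dpl : 2;		/* privilege needed to raise it with int */
	unsigned gd_p : 1;		/* present */
	unsigned gd_off_31_16 : 16;	/* high 16 bits of handler offset */
};

static void set_gate(struct Gatedesc *gate, int istrap, uint16_t sel,
		     uint32_t off, unsigned dpl)
{
	gate->gd_off_15_0 = off & 0xffff;
	gate->gd_sel = sel;
	gate->gd_args = 0;
	gate->gd_rsv1 = 0;
	gate->gd_type = istrap ? STS_TG32 : STS_IG32;
	gate->gd_s = 0;
	gate->gd_dpl = dpl;
	gate->gd_p = 1;
	gate->gd_off_31_16 = off >> 16;
}

/* Reassemble the handler address the CPU would load into EIP. */
static uint32_t gate_offset(const struct Gatedesc *gate)
{
	return ((uint32_t)gate->gd_off_31_16 << 16) | gate->gd_off_15_0;
}
```

The handler address is split across two fields, so SETGATE() writes into the descriptor rather than returning anything.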

Questions

Answer the following questions in your answers-lab3.txt:

  1.What is the purpose of having an individual handler function for each exception/interrupt? (i.e., if all exceptions/interrupts were delivered to the same handler, what feature that exists in the current implementation could not be provided?)

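  Separate handler functions exist because different interrupts and exceptions need different treatment: after an exception the kernel may retry the instruction or terminate the environment, whereas most interrupts come from external requests and execution resumes once the request is served. More concretely, the hardware does not push the vector number and pushes an error code only for some vectors, so a single shared entry point could neither tell which trap occurred nor build a uniform Trapframe.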

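  2.Did you have to do anything to make the user/softint program behave correctly? The grade script expects it to produce a general protection fault (trap 13), but softint's code says int $14. Why should this produce interrupt vector 13? What happens if the kernel actually allows softint's int $14 instruction to invoke the kernel's page fault handler (which is interrupt vector 14)?

  softint runs in user mode (privilege level 3), but the IDT gate for vector 14 was installed with DPL 0, so user code is not allowed to raise it with the int instruction; the processor raises a general protection fault (trap 13) instead. If the kernel allowed it (DPL 3), int $14 would enter the page fault handler without any real fault: the processor pushes no error code for a software interrupt and CR2 holds no meaningful address, so user code could feed the kernel bogus page faults.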
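
The privilege check behind this answer can be modelled in a few lines. This is a sketch of the rule for software-generated interrupts only (hardware exceptions are not subject to the gate DPL check); the function name is hypothetical and the trap numbers are assumed to match inc/trap.h.

```c
#include <assert.h>

enum { T_BRKPT = 3, T_GPFLT = 13, T_PGFLT = 14, T_SYSCALL = 48 };

/*
 * Which vector the CPU actually delivers when code running at
 * privilege level cpl executes `int $vec` and the gate for vec
 * was installed with the given DPL.
 */
static int software_int_vector(unsigned cpl, int vec, unsigned gate_dpl)
{
	assert(cpl <= 3 && gate_dpl <= 3);
	/* A software interrupt requires CPL <= gate DPL (numerically). */
	if (cpl > gate_dpl)
		return T_GPFLT;
	return vec;
}
```

With the page fault gate at DPL 0, user code (CPL 3) executing int $14 ends up in vector 13, which is what the grade script expects.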

 

Part B: Page Faults, Breakpoints Exceptions, and System Calls


Now that your kernel has basic exception handling capabilities, you will refine it to provide important operating system primitives that depend on exception handling.

  Now that the kernel can handle basic exceptions, it can be refined further.

Handling Page Faults

The page fault exception, interrupt vector 14 (T_PGFLT), is a particularly important one that we will exercise heavily throughout this lab and the next. When the processor takes a page fault, it stores the linear (i.e., virtual) address that caused the fault in a special processor control register, CR2. In trap.c we have provided the beginnings of a special function, page_fault_handler(), to handle page fault exceptions.

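  The page fault exception (vector 14) is a particularly important one. When the processor takes a page fault, it stores the virtual address that caused the fault in a special processor control register, CR2. trap.c already provides the beginnings of a handler, page_fault_handler(), for page fault exceptions.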

Exercise 5

Exercise 5. Modify trap_dispatch() to dispatch page fault exceptions to page_fault_handler(). You should now be able to get make grade to succeed on the faultread, faultreadkernel, faultwrite, and faultwritekernel tests. If any of them don't work, figure out why and fix them. Remember that you can boot JOS into a particular user program using make run-x or make run-x-nox.  

  Modify trap_dispatch() to dispatch page fault exceptions to page_fault_handler().

static void
trap_dispatch(struct Trapframe *tf)
{
	// Handle processor exceptions.
	// LAB 3: Your code here.
	switch(tf->tf_trapno){
	
	case T_PGFLT:
		page_fault_handler(tf);
		break;

	default:
		// Unexpected trap: The user process or the kernel has a bug.
		print_trapframe(tf);
		if (tf->tf_cs == GD_KT)
			panic("unhandled trap in kernel");
		else {
			env_destroy(curenv);
			return;
		}
	}
}

  Page fault handling is refined further later on.

The Breakpoint Exception

The breakpoint exception, interrupt vector 3 (T_BRKPT), is normally used to allow debuggers to insert breakpoints in a program's code by temporarily replacing the relevant program instruction with the special 1-byte int3 software interrupt instruction. In JOS we will abuse this exception slightly by turning it into a primitive pseudo-system call that any user environment can use to invoke the JOS kernel monitor. This usage is actually somewhat appropriate if we think of the JOS kernel monitor as a primitive debugger. The user-mode implementation of panic() in lib/panic.c, for example, performs an int3 after displaying its panic message.

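  The breakpoint exception (vector 3) lets a debugger set a breakpoint in running code by temporarily replacing the instruction at that point with the 1-byte int3 software interrupt instruction. In JOS we abuse this exception slightly and turn it into a pseudo system call that any user environment can use to enter the JOS kernel monitor. This is reasonable if we think of the kernel monitor as a primitive debugger. The user-mode implementation of panic() in lib/panic.c, for example, executes int3 after printing its panic message.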

void
_panic(const char *file, int line, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);

	// Print the panic message
	cprintf("[%08x] user panic in %s at %s:%d: ",
		sys_getenvid(), binaryname, file, line);
	vcprintf(fmt, ap);
	cprintf("\n");

	// Cause a breakpoint exception
	while (1)
		asm volatile("int3");
}

Exercise 6

Exercise 6. Modify trap_dispatch() to make breakpoint exceptions invoke the kernel monitor. You should now be able to get make grade to succeed on the breakpoint test.

  kern/monitor.c provides monitor(), the function that enters the kernel monitor, defined as follows:

void
monitor(struct Trapframe *tf)
{
	char *buf;

	cprintf("Welcome to the JOS kernel monitor!\n");
	cprintf("Type 'help' for a list of commands.\n");

	if (tf != NULL)
		print_trapframe(tf);

	while (1) {
		buf = readline("K> ");
		if (buf != NULL)
			if (runcmd(buf, tf) < 0)
				break;
	}
}

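  Modify trap_dispatch() to dispatch breakpoint exceptions to monitor().

  The privilege level passed to SETGATE() also has to change so that user environments are allowed to raise this vector.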

    SETGATE(idt[T_BRKPT], 1, GD_KT, brkpt_handler, 3);
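
The dispatch logic after Exercises 5 and 6 can be sketched as a pure function that returns which action the kernel takes. This is a user-space model, not the kernel code: the enum, the function name and its parameters are hypothetical, and the constants are assumed to match inc/trap.h and inc/memlayout.h. In the real trap_dispatch() the T_BRKPT case would call monitor(tf).

```c
#include <stdint.h>

enum { T_BRKPT = 3, T_PGFLT = 14 };
enum { GD_KT = 0x08, GD_UT = 0x18 };

enum action {
	ACT_PAGE_FAULT,		/* page_fault_handler(tf) */
	ACT_MONITOR,		/* monitor(tf) */
	ACT_PANIC,		/* unhandled trap in kernel */
	ACT_DESTROY_ENV		/* env_destroy(curenv) */
};

static enum action dispatch_action(uint32_t trapno, uint16_t cs)
{
	switch (trapno) {
	case T_PGFLT:
		return ACT_PAGE_FAULT;
	case T_BRKPT:
		return ACT_MONITOR;
	default:
		/* Same test as the default branch of trap_dispatch(). */
		return cs == GD_KT ? ACT_PANIC : ACT_DESTROY_ENV;
	}
}
```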

  Result:

faultread: OK (1.9s) 
faultreadkernel: OK (2.1s) 
faultwrite: OK (1.9s) 
faultwritekernel: OK (2.1s) 
breakpoint: OK (1.9s) 

Questions

  1.The break point test case will either generate a break point exception or a general protection fault depending on how you initialized the break point entry in the IDT (i.e., your call to SETGATE from trap_init). Why? How do you need to set it up in order to get the breakpoint exception to work as specified above and what incorrect setup would cause it to trigger a general protection fault?

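  Which exception is raised depends on the privilege level (DPL) given to the IDT entry by SETGATE(). With DPL 0 only the kernel may raise the vector, so an int3 executed by a user environment produces a general protection fault. With DPL 3 user environments may raise it as well, and the breakpoint exception is delivered.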

What do you think is the point of these mechanisms, particularly in light of what the user/softint test program does?

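  The point is that the gate privilege level decides which vectors user code may raise with int, so it must be set correctly: user/softint shows that without this check a user program could enter arbitrary kernel handlers, such as the page fault handler, at will.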

System calls

User processes ask the kernel to do things for them by invoking system calls. When the user process invokes a system call, the processor enters kernel mode, the processor and the kernel cooperate to save the user process's state, the kernel executes appropriate code in order to carry out the system call, and then resumes the user process. The exact details of how the user process gets the kernel's attention and how it specifies which call it wants to execute vary from system to system.

In the JOS kernel, we will use the int instruction, which causes a processor interrupt. In particular, we will use int $0x30 as the system call interrupt. We have defined the constant T_SYSCALL to 48 (0x30) for you. You will have to set up the interrupt descriptor to allow user processes to cause that interrupt. Note that interrupt 0x30 cannot be generated by hardware, so there is no ambiguity caused by allowing user code to generate it.

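The application will pass the system call number and the system call arguments in registers. This way, the kernel won't need to grub around in the user environment's stack or instruction stream. The system call number will go in %eax, and the arguments (up to five of them) will go in %edx, %ecx, %ebx, %edi, and %esi, respectively. The kernel passes the return value back in %eax. The assembly code to invoke a system call has been written for you, in syscall() in lib/syscall.c. You should read through it and make sure you understand what is going on.

  User processes ask the kernel to do things for them by invoking system calls. When a user process invokes a system call, the processor enters kernel mode, the processor and the kernel cooperate to save the user process's state, the kernel runs the appropriate code to carry out the call, and then the user process resumes.

  In the JOS kernel we use the int instruction to cause a processor interrupt, specifically int $0x30 as the system call interrupt. The constant T_SYSCALL is already defined as this vector; the interrupt descriptor has to be set up so that user processes are allowed to raise it. Note that interrupt 48 cannot be generated by hardware, so letting user code generate it causes no ambiguity.

  The application puts the system call number and arguments in registers, so the kernel does not have to dig through the user stack or instruction stream. The system call number goes in %eax and the arguments (up to five) go in %edx, %ecx, %ebx, %edi, and %esi, in that order. When the call finishes, the kernel passes the return value back in %eax. The assembly code is already written in syscall() in lib/syscall.c; read it first to understand what happens during a system call.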

static inline int32_t
syscall(int num, int check, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
	int32_t ret;

	// Generic system call: pass system call number in AX,
	// up to five parameters in DX, CX, BX, DI, SI.
	// Interrupt kernel with T_SYSCALL.
	//
	// The "volatile" tells the assembler not to optimize
	// this instruction away just because we don't use the
	// return value.
	//
	// The last clause tells the assembler that this can
	// potentially change the condition codes and arbitrary
	// memory locations.

	asm volatile("int %1\n"
		     : "=a" (ret)
		     : "i" (T_SYSCALL),
		       "a" (num),
		       "d" (a1),
		       "c" (a2),
		       "b" (a3),
		       "D" (a4),
		       "S" (a5)
		     : "cc", "memory");

	if(check && ret > 0)
		panic("syscall %d returned %d (> 0)", num, ret);

	return ret;
}

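  This is the generic template for system calls: each specific call (for example sys_cputs or sys_cgetc) invokes syscall() with different arguments. asm volatile is GCC's inline assembly syntax; volatile keeps the compiler from optimizing the statement away.

  We can ask the compiler to insert a function's code at the place where the function is called; such a function is an inline function. Unlike a macro, which is plain text substitution by the preprocessor, an inline function is still a real function with type-checked arguments, and the compiler copies its body into the call site.

With the generic "r" constraint, gcc may keep an operand in any available general-purpose register (GPR). To name a particular register you must use a register-specific constraint instead: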

+------------+--------------------+
| Constraint |    Register(s)     |
+------------+--------------------+
| a          |   %eax, %ax, %al   |
| b          |   %ebx, %bx, %bl   |
| c          |   %ecx, %cx, %cl   |
| d          |   %edx, %dx, %dl   |
| S          |   %esi, %si        |
| D          |   %edi, %di        |
+------------+--------------------+

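Some other constraints:

  1. "m" : A memory operand is allowed, with any kind of address that the machine supports in general.
  2. "o" : A memory operand is allowed, but only if the address is offsettable, i.e. adding a small offset to the address gives a valid address.
  3. "V" : A memory operand that is not offsettable. In other words, anything that would fit the "m" constraint but not the "o" constraint.
  4. "i" : An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time.
  5. "n" : An immediate integer operand with a known numeric value is allowed. Many systems cannot support assembly-time constants for operands less than a word wide; constraints for such operands should use 'n' rather than 'i'.
  6. "g" : Any register, memory or immediate integer operand is allowed, except for registers that are not general registers.

  So this inline assembly raises an int interrupt with vector T_SYSCALL, after placing the system call number and arguments in the registers named by the constraints.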

Exercise 7

Exercise 7. Add a handler in the kernel for interrupt vector T_SYSCALL. You will have to edit kern/trapentry.S and kern/trap.c's trap_init(). You also need to change trap_dispatch() to handle the system call interrupt by calling syscall() (defined in kern/syscall.c) with the appropriate arguments, and then arranging for the return value to be passed back to the user process in %eax. Finally, you need to implement syscall() in kern/syscall.c. Make sure syscall() returns -E_INVAL if the system call number is invalid. You should read and understand lib/syscall.c (especially the inline assembly routine) in order to confirm your understanding of the system call interface. Handle all the systems calls listed in inc/syscall.h by invoking the corresponding kernel function for each call.

Run the user/hello program under your kernel (make run-hello). It should print "hello, world" on the console and then cause a page fault in user mode. If this does not happen, it probably means your system call handler isn't quite right. You should also now be able to get make grade to succeed on the testbss test.

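From Exercise 7, the steps of a JOS system call are:

  1. The user process calls one of the interfaces exposed under inc/
  2. The function in lib/syscall.c places the system call number and the arguments in registers and raises an int $0x30 interrupt
  3. kern/trap.c catches the interrupt and calls the handling function, passing the register state recorded in the Trapframe as arguments
  4. kern/syscall.c carries out the system call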
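
The steps above can be traced end to end with a small user-space model in which a struct stands in for the saved registers. Everything here is a mock: the function names are hypothetical, the mock kernel implements only two calls with made-up results, and E_INVAL = 3 is assumed to match inc/error.h.

```c
#include <stdint.h>

enum { SYS_cputs = 0, SYS_cgetc, SYS_getenvid, SYS_env_destroy, NSYSCALLS };
#define E_INVAL 3	/* assumed value from inc/error.h */

/* The registers that carry a system call, as saved in the Trapframe. */
struct regs { uint32_t eax, edx, ecx, ebx, edi, esi; };

/* Step 2: the user-side stub loads the number and arguments into registers. */
static struct regs user_stub(uint32_t num, uint32_t a1, uint32_t a2,
			     uint32_t a3, uint32_t a4, uint32_t a5)
{
	struct regs r = { num, a1, a2, a3, a4, a5 };
	return r;
}

/* Step 4: mock of the kernel-side syscall(); results are made up. */
static int32_t mock_kern_syscall(uint32_t num, uint32_t a1, uint32_t a2,
				 uint32_t a3, uint32_t a4, uint32_t a5)
{
	(void)a1; (void)a3; (void)a4; (void)a5;
	switch (num) {
	case SYS_cputs:
		return 0;		/* pretend a2 bytes were printed */
	case SYS_getenvid:
		return 0x1000;		/* pretend id of the first environment */
	default:
		return -E_INVAL;	/* unknown or unimplemented in this mock */
	}
}

/* Step 3: the trap side unpacks the registers and stores the result in eax. */
static void mock_trap_dispatch(struct regs *r)
{
	r->eax = (uint32_t)mock_kern_syscall(r->eax, r->edx, r->ecx,
					     r->ebx, r->edi, r->esi);
}
```

Writing the result back into the saved eax is what makes the value appear as the return value of the user-side stub once the environment resumes.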


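  First add a handler for interrupt vector T_SYSCALL: install the entry in trap_init() (kern/trap.c) with privilege level 3, and generate the entry point in kern/trapentry.S.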

SETGATE(idt[T_SYSCALL], 0, GD_KT, syscall_handler, 3);
TRAPHANDLER_NOEC(syscall_handler, T_SYSCALL);

  Modify trap_dispatch() to handle the system call case by calling syscall(). First look at the definitions involved.

  inc/syscall.h defines the system call numbers:

/* system call numbers */
enum {
	SYS_cputs = 0,
	SYS_cgetc,
	SYS_getenvid,
	SYS_env_destroy,
	NSYSCALLS
};

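  lib/syscall.c defines the user-side stub analysed above. On the kernel side, trap_dispatch() in kern/trap.c gains the following case:

	case T_SYSCALL:
		// Arguments arrive in the order a, d, c, b, D, S,
		// i.e. eax, edx, ecx, ebx, edi, esi.
		// The kernel puts the return value in %eax.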
		tf->tf_regs.reg_eax = syscall(tf->tf_regs.reg_eax,tf->tf_regs.reg_edx,
				tf->tf_regs.reg_ecx,tf->tf_regs.reg_ebx,tf->tf_regs.reg_edi,tf->tf_regs.reg_esi);
		break;

  kern/trap.c calls the syscall() function in kern/syscall.c, which dispatches on the system call number.

// Dispatches to the correct kernel function, passing the arguments.
int32_t
syscall(uint32_t syscallno, uint32_t a1, uint32_t a2, uint32_t a3, uint32_t a4, uint32_t a5)
{
	// Call the function corresponding to the 'syscallno' parameter.
	// Return any appropriate return value.
	// LAB 3: Your code here.

	// panic("syscall not implemented");
	int32_t ret = 0;

	switch (syscallno) {
	case SYS_cputs:
		sys_cputs((const char*)a1,a2);
		break;
	case SYS_cgetc:
		ret = sys_cgetc();
		break;
	case SYS_env_destroy:
		ret = sys_env_destroy(a1);
		break;
	case SYS_getenvid:
		ret = sys_getenvid();
		break;

	default:
		ret = -E_INVAL;
	}
	return ret;
}

  Result:

HemingbeardeMacBook-Pro:lab hemingbear$ make run-hello
+ cc kern/init.c
+ ld obj/kern/kernel
+ mk obj/kern/kernel.img
qemu-system-i386 -drive file=obj/kern/kernel.img,index=0,media=disk,format=raw -serial mon:stdio -gdb tcp::25501 -D qemu.log 
6828 decimal is 15254 octal!
Physical memory: 131072K available, base = 640K, extended = 130432K
check_page_free_list() succeeded!
check_page_alloc() succeeded!
check_page() succeeded!
check_kern_pgdir() succeeded!
check_page_free_list() succeeded!
check_page_installed_pgdir() succeeded!
[00000000] new env 00001000 and pgdir address f03bc000
Incoming TRAP frame at 0xefffffbc
hello, world
Incoming TRAP frame at 0xefffffbc

User-mode startup

A user program starts running at the top of lib/entry.S. After some setup, this code calls libmain(), in lib/libmain.c. You should modify libmain() to initialize the global pointer thisenv to point at this environment's struct Env in the envs[] array. (Note that lib/entry.S has already defined envs to point at the UENVS mapping you set up in Part A.) Hint: look in inc/env.h and use sys_getenvid.

libmain() then calls umain, which, in the case of the hello program, is in user/hello.c. Note that after printing "hello, world", it tries to access thisenv->env_id. This is why it faulted earlier. Now that you've initialized thisenv properly, it should not fault. If it still faults, you probably haven't mapped the UENVS area user-readable (back in Part A in pmap.c; this is the first time we've actually used the UENVS area).

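  A user program starts running at the top of lib/entry.S. After some setup it calls libmain() in lib/libmain.c, which has to be modified to initialize the global pointer thisenv so that it points at this environment's Env in the envs array.

  libmain() then calls umain (the user program's main function), which for the hello program lives in user/hello.c. Note that after printing "hello, world" the program accesses thisenv->env_id, which is why the page fault appeared in Exercise 7. Once libmain() initializes thisenv correctly, the fault goes away. If it still faults, the UENVS area was probably not mapped user-readable back in Part A.

  Modify libmain() as instructed.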

	// set thisenv to point at our Env structure in envs[].
	// LAB 3: Your code here.
	// thisenv = 0;
	thisenv = &envs[ENVX(sys_getenvid())];
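
The ENVX() macro used here keeps only the low bits of the environment id, which index the envs array. A sketch of the arithmetic, assuming LOG2NENV = 10 as in inc/env.h; env_index() is a hypothetical helper that stands in for the pointer expression.

```c
#include <stdint.h>

#define LOG2NENV 10			/* assumed value from inc/env.h */
#define NENV (1 << LOG2NENV)		/* 1024 environments at most */
#define ENVX(envid) ((envid) & (NENV - 1))

typedef int32_t envid_t;

/* Index into envs[] for a given environment id. */
static unsigned env_index(envid_t id)
{
	return ENVX(id);
}
```

The first environment has id 0x00001000 (see "new env 00001000" in the run log above), so its low 10 bits are 0 and thisenv ends up pointing at envs[0].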

  Running this still produced an error, so an earlier piece of code had to be fixed as well:

//
// Allocate len bytes of physical memory for environment env,
// and map it at virtual address va in the environment's address space.
// Does not zero or otherwise initialize the mapped pages in any way.
// Pages should be writable by user and kernel.
// Panic if any allocation attempt fails.
//
static void
region_alloc(struct Env *e, void *va, size_t len)
{
	// LAB 3: Your code here.
	// (But only if you need it for load_icode.)
	//
	// Hint: It is easier to use region_alloc if the caller can pass
	//   'va' and 'len' values that are not page-aligned.
	//   You should round va down, and round (va + len) up.
	//   (Watch out for corner-cases!)
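	// Paging hands out memory in whole pages; it does not carve out a
	// page-sized block starting at an arbitrary address. If the start address
	// falls in the middle of a page, a page-sized request spans two pages.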

	uintptr_t va_start = ROUNDDOWN((uintptr_t)va,PGSIZE);		//round va down
	uintptr_t va_end = ROUNDUP((uintptr_t)va+len,PGSIZE);
	size_t pg_num = (va_end-va_start)/PGSIZE;
	struct PageInfo *pp = NULL;

	for(size_t i=0;i<pg_num;++i){
		pp = page_alloc(0);		// allocate a physical page with page_alloc
		if(!pp){
			panic("region_alloc failed!\n");
		}
		int r = page_insert(e->env_pgdir,pp,(void*)va_start,PTE_U | PTE_W);		// map va_start to physical page pp
		if(r){
			panic("page_insert failed!\n");
		}
		va_start += PGSIZE;
	}
}
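
The rounding in region_alloc() decides how many pages are allocated. A stand-alone sketch of that arithmetic, with hypothetical helper names and PGSIZE assumed to be 4096; it ignores wraparound near the top of the 32-bit address space, which is one of the corner cases the hint warns about.

```c
#include <stdint.h>

#define PGSIZE 4096u

static uint32_t round_down(uint32_t a, uint32_t n) { return a - a % n; }
static uint32_t round_up(uint32_t a, uint32_t n) { return round_down(a + n - 1, n); }

/* Number of pages covering [va, va + len); does not handle overflow. */
static uint32_t pages_needed(uint32_t va, uint32_t len)
{
	uint32_t start = round_down(va, PGSIZE);
	uint32_t end = round_up(va + len, PGSIZE);
	return (end - start) / PGSIZE;
}
```

A page-sized request that starts in the middle of a page, for example va = 0x1800 with len = 0x1000, needs two pages.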

Key concept: Page faults and memory protection

Memory protection is a crucial feature of an operating system, ensuring that bugs in one program cannot corrupt other programs or corrupt the operating system itself.

Operating systems usually rely on hardware support to implement memory protection. The OS keeps the hardware informed about which virtual addresses are valid and which are not. When a program tries to access an invalid address or one for which it has no permissions, the processor stops the program at the instruction causing the fault and then traps into the kernel with information about the attempted operation. If the fault is fixable, the kernel can fix it and let the program continue running. If the fault is not fixable, then the program cannot continue, since it will never get past the instruction causing the fault.

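  Memory protection is a crucial feature of an operating system: it ensures that a bug in one program cannot corrupt other programs or the operating system itself.

  Operating systems usually rely on hardware support to implement memory protection. The OS tells the hardware which addresses are valid and which are not. If a program accesses an invalid address, or one it has no permission for, the processor stops the program at the faulting instruction and traps into the kernel. If the fault is fixable, the kernel fixes it and lets the program continue; if not, the program is terminated.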

As an example of a fixable fault, consider an automatically extended stack. In many systems the kernel initially allocates a single stack page, and then if a program faults accessing pages further down the stack, the kernel will allocate those pages automatically and let the program continue. By doing this, the kernel only allocates as much stack memory as the program needs, but the program can work under the illusion that it has an arbitrarily large stack.

System calls present an interesting problem for memory protection. Most system call interfaces let user programs pass pointers to the kernel. These pointers point at user buffers to be read or written. The kernel then dereferences these pointers while carrying out the system call. There are two problems with this:

  1. A page fault in the kernel is potentially a lot more serious than a page fault in a user program. If the kernel page-faults while manipulating its own data structures, that's a kernel bug, and the fault handler should panic the kernel (and hence the whole system). But when the kernel is dereferencing pointers given to it by the user program, it needs a way to remember that any page faults these dereferences cause are actually on behalf of the user program.
  2. The kernel typically has more memory permissions than the user program. The user program might pass a pointer to a system call that points to memory that the kernel can read or write but that the program cannot. The kernel must be careful not to be tricked into dereferencing such a pointer, since that might reveal private information or destroy the integrity of the kernel.

For both of these reasons the kernel must be extremely careful when handling pointers presented by user programs.

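  As an example of a fixable fault, consider an automatically extended stack. In many systems the kernel initially allocates a single stack page; if the program then accesses the stack beyond that page, the kernel allocates another page and lets it continue. By adding pages on demand, the kernel allocates only about as much stack as the program needs, while the program sees an arbitrarily large stack.

  System calls raise an interesting problem for memory protection. Most system call interfaces let user programs pass pointers to the kernel; these pointers refer to user buffers to be read or written, and the kernel dereferences them while carrying out the call. This causes two problems:

  1.   A page fault in the kernel is much more serious than a page fault in a user program. If the kernel faults while manipulating its own data structures, that is a kernel bug and the system should panic. But when the kernel dereferences a pointer supplied by a user process, it needs a way to remember that any resulting page fault is on behalf of the user program, so that the whole system does not go down.
  2.   The kernel clearly has more memory permissions than user programs. A user program may pass a pointer to memory that the kernel can read or write but the program cannot. The kernel must not be tricked into dereferencing such a pointer, since that could leak private kernel data or destroy the kernel's integrity.

  For both reasons the kernel must be extremely careful when handling pointers from user processes.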

You will now solve these two problems with a single mechanism that scrutinizes all pointers passed from userspace into the kernel. When a program passes the kernel a pointer, the kernel will check that the address is in the user part of the address space, and that the page table would allow the memory operation.

Thus, the kernel will never suffer a page fault due to dereferencing a user-supplied pointer. If the kernel does page fault, it should panic and terminate.

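  We now build a single mechanism that checks every pointer passed from user space into the kernel: the address must lie in the user part of the address space, and the page table must permit the operation. With that check in place, the kernel no longer page-faults when dereferencing a user-supplied pointer.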

Exercise 9

Exercise 9. Change kern/trap.c to panic if a page fault happens in kernel mode.

Hint: to determine whether a fault happened in user mode or in kernel mode, check the low bits of the tf_cs.

Read user_mem_assert in kern/pmap.c and implement user_mem_check in that same file.

Change kern/syscall.c to sanity check arguments to system calls.

Boot your kernel, running user/buggyhello. The environment should be destroyed, and the kernel should not panic. You should see:

	[00001000] user_mem_check assertion failure for va 00000001
	[00001000] free env 00001000
	Destroyed the only environment - nothing more to do!
	

Finally, change debuginfo_eip in kern/kdebug.c to call user_mem_check on usd, stabs, and stabstr. If you now run user/breakpoint, you should be able to run backtrace from the kernel monitor and see the backtrace traverse into lib/libmain.c before the kernel panics with a page fault. What causes this page fault? You don't need to fix it, but you should understand why it happens.

  Modify kern/trap.c to check whether the page fault came from kernel mode, and panic if it did:

	// Handle kernel-mode page faults.

	// LAB 3: Your code here.
	// check the low bits of the tf_cs
	if((tf->tf_cs & 3)==0) panic("kernel-mode page faults\n");

  Read user_mem_assert() in kern/pmap.c, then implement user_mem_check():

//
// Checks that environment 'env' is allowed to access the range
// of memory [va, va+len) with permissions 'perm | PTE_U | PTE_P'.
// If it can, then the function simply returns.
// If it cannot, 'env' is destroyed and, if env is the current
// environment, this function will not return.
//
void
user_mem_assert(struct Env *env, const void *va, size_t len, int perm)
{
	if (user_mem_check(env, va, len, perm | PTE_U) < 0) {
		cprintf("[%08x] user_mem_check assertion failure for "
			"va %08x\n", env->env_id, user_mem_check_addr);
		env_destroy(env);	// may not return
	}
}

//
// Check that an environment is allowed to access the range of memory
// [va, va+len) with permissions 'perm | PTE_P'.
// Normally 'perm' will contain PTE_U at least, but this is not required.
// 'va' and 'len' need not be page-aligned; you must test every page that
// contains any of that range.  You will test either 'len/PGSIZE',
// 'len/PGSIZE + 1', or 'len/PGSIZE + 2' pages.
//
// A user program can access a virtual address if (1) the address is below
// ULIM, and (2) the page table gives it permission.  These are exactly
// the tests you should implement here.
//
// If there is an error, set the 'user_mem_check_addr' variable to the first
// erroneous virtual address.
//
// Returns 0 if the user program can access this range of addresses,
// and -E_FAULT otherwise.
//
int
user_mem_check(struct Env *env, const void *va, size_t len, int perm)
{
	// LAB 3: Your code here.
	// Walk every page that overlaps [va, va+len).
	uintptr_t start = ROUNDDOWN((uintptr_t)va, PGSIZE);
	uintptr_t end = ROUNDUP((uintptr_t)va + len, PGSIZE);
	perm |= PTE_P;
	for (uintptr_t cur = start; cur < end; cur += PGSIZE) {
		// Reject addresses at or above ULIM before touching the page table.
		pte_t *pte = cur >= ULIM ? NULL : pgdir_walk(env->env_pgdir, (void *)cur, 0);
		if (pte == NULL || (*pte & perm) != perm) {
			// On the first page the offending address is va itself,
			// not ROUNDDOWN(va, PGSIZE).
			user_mem_check_addr = (cur == start) ? (uintptr_t)va : cur;
			return -E_FAULT;
		}
	}
	return 0;
}

  Modify kern/syscall.c to call user_mem_assert, which checks that the user environment may read the buffer and destroys the environment if it may not.

// Print a string to the system console.
// The string is exactly 'len' characters long.
// Destroys the environment on memory errors.
static void
sys_cputs(const char *s, size_t len)
{
	// Check that the user has permission to read memory [s, s+len).
	// Destroy the environment if not.

	// LAB 3: Your code here.
	user_mem_assert(curenv,s,len,PTE_U);

	// Print the string supplied by the user.
	cprintf("%.*s", len, s);
}

  In debuginfo_eip in kern/kdebug.c, call user_mem_check:

		// Make sure this memory is valid.
		// Return -1 if it is not.  Hint: Call user_mem_check.
		// LAB 3: Your code here.
		if(user_mem_check(curenv, (void *)usd, sizeof(struct UserStabData), PTE_U)<0) 
			return -1;

		stabs = usd->stabs;
		stab_end = usd->stab_end;
		stabstr = usd->stabstr;
		stabstr_end = usd->stabstr_end;

		// Make sure the STABS and string table memory is valid.
		// LAB 3: Your code here.
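		// stabs and stab_end are struct Stab pointers, so their difference
		// counts entries; convert to a byte length before checking.
		if(user_mem_check(curenv, (void *)stabs, (uintptr_t)stab_end-(uintptr_t)stabs, PTE_U)<0 || 
			  user_mem_check(curenv, (void *)stabstr, stabstr_end-stabstr, PTE_U)<0) 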
			return -1;

 

If you now run user/breakpoint, you should be able to run backtrace from the kernel monitor and see the backtrace traverse into lib/libmain.c before the kernel panics with a page fault. What causes this page fault? You don't need to fix it, but you should understand why it happens.

    Run make run-breakpoint and type backtrace at the monitor prompt:

K> backtrace
Stack backtrace:
ebp efffff10 eip f0100a5e args 00000001 efffff28 f0226000 00000000 f01e4a40

     kern/monitor.c:194: monitor+260
ebp efffff80 eip f0103fb8 args f0226000 efffffbc 00000000 00000000 00000000

     kern/trap.c:202: trap+187
ebp efffffb0 eip f01040d8 args efffffbc 00000000 00000000 eebfdfd0 efffffdc

     kern/syscall.c:69: syscall+0
ebp eebfdfd0 eip 0080007b args 00000000 00000000 00000000 00000000 00000000

     lib/libmain.c:28: libmain+63
ebp eebfdff0 eip 00800031 args 00000000 00000000Incoming TRAP frame at 0xeffffe7c
kernel panic at kern/trap.c:277: kernel-mode page faults

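    The likely cause is the backtrace itself reading past the top of the user stack (note the last ebp). The monitor prints five arguments per frame, read from ebp+8 through ebp+24. In the last frame ebp is eebfdff0, so the third argument sits at eebfe000, which is USTACKTOP in JOS's memory layout. Nothing is mapped there, so the kernel page-faults after printing only two arguments.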

  Results:

boot block is 382 bytes (max 510)
+ mk obj/kern/kernel.img
divzero: OK (2.0s) 
softint: OK (1.4s) 
badsegment: OK (2.2s) 
Part A score: 30/30

faultread: OK (1.9s) 
faultreadkernel: OK (2.0s) 
faultwrite: OK (2.1s) 
faultwritekernel: OK (1.9s) 
breakpoint: OK (2.0s) 
testbss: OK (2.0s) 
hello: OK (2.0s) 
buggyhello: OK (2.0s) 
    (Old jos.out.buggyhello failure log removed)
buggyhello2: OK (2.0s) 
    (Old jos.out.buggyhello2 failure log removed)
evilhello: OK (1.9s) 
    (Old jos.out.evilhello failure log removed)
Part B score: 50/50

Score: 80/80

 
