xv6 init process

Notes on how xv6 starts its first process, init.
Source code: xv6/xv6-public

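From xv6 lab 1 we know that the kernel entry point is entry (in entry.S). Booting from there involves enabling paging, initializing the kernel, starting the other processors, and starting the init process. The analysis below is organized around a few questions.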

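1. At which virtual address does the kernel start?
The linker script answers this question, because it defines the virtual addresses. kernel.ld contains the following: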

//xv6/xv6-public/kernel.ld:
OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386)
ENTRY(_start)

SECTIONS                                                                                                                        
{
    /* Link the kernel at this address: "." means the current address */
        /* Must be equal to KERNLINK */
    . = 0x80100000;

    .text : AT(0x100000) {
        *(.text .stub .text.* .gnu.linkonce.t.*)
    }   

    PROVIDE(etext = .); /* Define the 'etext' symbol to this value */

    .rodata : { 
        *(.rodata .rodata.* .gnu.linkonce.r.*)
    }   

    /* Include debugging information in kernel memory */
    .stab : { 
        PROVIDE(__STAB_BEGIN__ = .); 
        *(.stab);
        PROVIDE(__STAB_END__ = .); 
        BYTE(0)     /* Force the linker to allocate space
                   for this section */
    }

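ENTRY(_start): the kernel's entry point is _start.
. = 0x80100000: the kernel is linked starting at virtual address 0x80100000.
.text : AT(0x100000): the kernel text is loaded at physical address 0x100000.
. = ALIGN(0x1000): later in kernel.ld (past the excerpt above), this aligns the start of the data segment to a 4 KB page boundary.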

#memlayout.h  
#define KERNBASE 0x80000000         // First kernel virtual address  
#define EXTMEM  0x100000                // Start of extended memory
#define KERNLINK (KERNBASE+EXTMEM)  // Address where kernel is linked 
#define V2P_WO(x) ((x) - KERNBASE)

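V2P_WO converts a kernel virtual address to a physical address. The kernel's virtual address 0x80100000 corresponds to physical address 0x100000. The addresses are related as follows:

virtual address of an instruction = 0x80100000 + offset
physical address of an instruction = 0x100000 + offset
physical address = 0x100000 + virtual address - 0x80100000 = virtual address - 0x80000000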
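
The arithmetic above can be checked with a small user-space sketch. The constants are taken from memlayout.h as quoted above; the function names v2p and p2v are hypothetical stand-ins for the V2P_WO macro and its inverse, not xv6 code.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0x80000000u            /* first kernel virtual address */
#define EXTMEM   0x100000u              /* start of extended memory */
#define KERNLINK (KERNBASE + EXTMEM)    /* address where the kernel is linked */

/* Same arithmetic as V2P_WO: kernel virtual address -> physical address. */
static uint32_t v2p(uint32_t va)
{
    assert(va >= KERNBASE);             /* only meaningful for kernel addresses */
    return va - KERNBASE;
}

/* Inverse mapping: physical address -> kernel virtual address. */
static uint32_t p2v(uint32_t pa)
{
    return pa + KERNBASE;
}
```

For example, v2p(KERNLINK) yields 0x100000, the physical load address of the kernel text, and p2v(0x100000) yields 0x80100000 again.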

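2. How does xv6 set up paging?

First, the relevant paging registers:
[Figure: paging registers]
Before enabling paging, xv6 must build a page table and load its address into cr3, and then set the PG bit in cr0. x86 also supports pages of different sizes, which is controlled through cr4.

xv6 uses two page sizes, 4 KB and 4 MB. The 4 MB mapping is only a transitional page table, so the kernel image must fit within 4 MB. Strictly speaking it must fit within 4 MB - 1 MB, because the kernel is loaded starting at physical address 0x100000 (1 MB). The transitional page table exists so that the kernel, chiefly the memory allocator code, can run right after paging is turned on. The kernel replaces it with a new page table later.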

[Figure: address translation with 4 KB pages]
[Figure: address translation with 4 MB pages]

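Steps to set up paging:

Set the PSE flag in cr4 (bit 4, mask 0x10). With PSE set, a page directory entry whose PS bit is set maps a 4 MB page; with PSE clear, all pages are 4 KB.
Load the physical address of the page directory into cr3.
Set the PG bit in cr0 to enable paging.
Set up a 4 KB kernel stack for the boot CPU. When the other CPUs are started later, each one gets its own stack.
Jump to main and continue executing.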
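
The register updates in these steps are plain bit operations. The following is a minimal user-space sketch of that arithmetic, assuming the flag values from xv6's mmu.h; the helper names enable_pse, enable_paging, and paging_enabled are hypothetical, and nothing here touches real control registers.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE  0x00000001u   /* CR0 bit 0: protected mode enabled */
#define CR0_WP  0x00010000u   /* CR0 bit 16: write protect */
#define CR0_PG  0x80000000u   /* CR0 bit 31: paging */
#define CR4_PSE 0x00000010u   /* CR4 bit 4: page size extension */

/* Value written back to cr4 by "orl $(CR4_PSE), %eax". */
static uint32_t enable_pse(uint32_t cr4)
{
    return cr4 | CR4_PSE;
}

/* Value written back to cr0 by "orl $(CR0_PG|CR0_WP), %eax". */
static uint32_t enable_paging(uint32_t cr0)
{
    assert(cr0 & CR0_PE);     /* paging requires protected mode */
    return cr0 | CR0_PG | CR0_WP;
}

static int paging_enabled(uint32_t cr0)
{
    return (cr0 & CR0_PG) != 0;
}
```

Starting from CR0 = 0x00000011, enable_paging returns 0x80010011, which is the value that appears in the QEMU register dump later in this article.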

Code analysis:

#entry.S 
.globl _start
_start = V2P_WO(entry)
# Entering xv6 on boot processor, with paging off.
.globl entry
entry:
  # Turn on page size extension for 4Mbyte pages
  movl    %cr4, %eax
  orl     $(CR4_PSE), %eax
  movl    %eax, %cr4              //cr4=0x10
                                                                                                        
  # Set page directory
  movl    $(V2P_WO(entrypgdir)), %eax   //entrypgdir=0x80109000
  movl    %eax, %cr3                                //cr3=0x109000
  
  # Turn on paging.
  movl    %cr0, %eax
  orl     $(CR0_PG|CR0_WP), %eax
  movl    %eax, %cr0               //cr0= 0x80010011
  
  # Set up the stack pointer
  movl $(stack + KSTACKSIZE), %esp   //esp=0x8010b5c0 (esp now holds a high address)
  
  # Enter the high address (2GB above)
  mov $main, %eax
  jmp *%eax       //jmp 0x80102e00

# common symbol
.comm stack, KSTACKSIZE

#param.h
#define KSTACKSIZE 4096  // size of per-process kernel stack

The physical address of entrypgdir is stored in CR3. This array is the page directory, and it contains two mappings:

//main.c  
pde_t entrypgdir[NPDENTRIES] = {
  // Map VA's [0, 4MB) to PA's [0, 4MB)
  [0] = (0) | PTE_P | PTE_W | PTE_PS,
  // Map VA's [KERNBASE, KERNBASE+4MB), i.e. [2GB, 2GB+4MB), to PA's [0, 4MB)
  [KERNBASE>>PDXSHIFT] = (0) | PTE_P | PTE_W | PTE_PS,
};

After macro expansion:
unsigned int entrypgdir[1024] = {
    [0] = 0 | 0x001 | 0x002 | 0x080,               // 0x083 = 0000 1000 0011
    [0x80000000 >> 22] = 0 | 0x001 | 0x002 | 0x080  // 0x083
};

Entry 0 maps virtual addresses 0:0x400000 to physical addresses 0:0x400000. This mapping is required as long as entry is executing at low addresses, but will eventually be removed.
Entry 512 maps virtual addresses KERNBASE:KERNBASE+0x400000 to physical addresses 0:0x400000. This entry will be used by the kernel after entry has finished; it maps the high virtual addresses at which the kernel expects to find its instructions and data to the low physical addresses where the boot loader loaded them. This mapping restricts the kernel instructions and data to 4 Mbytes.

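This page directory has only two entries, 0 and 512. Entry 0 maps virtual addresses 0:0x400000 to physical addresses 0:0x400000, and entry 512 maps KERNBASE:KERNBASE+0x400000 to the same physical range. (PTE_PS in a page directory entry enables 4 MB pages.) All other entries are left unset. Because neither entry sets PTE_U, the 4 MB region starting at physical address 0 is accessible only in supervisor mode.
This is only a temporary page table. It guarantees that the kernel can keep executing right after paging is turned on. Soon after entering main, the kernel builds a new page table, and that final page table uses fine-grained 4 KB pages.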
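
To make the two mappings concrete, here is a minimal user-space model of a lookup through this directory. It is a sketch, not xv6 code: the array entrypgdir_model and the function translate_4mb are hypothetical names, and the translation rule (top 10 bits of the entry as the frame, low 22 bits of the address as the offset) is the standard x86 rule for 4 MB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KERNBASE   0x80000000u
#define NPDENTRIES 1024
#define PDXSHIFT   22
#define PTE_P      0x001u   /* present */
#define PTE_W      0x002u   /* writeable */
#define PTE_PS     0x080u   /* 4 MB page when set in a directory entry */

/* Model of entrypgdir: only entries 0 and 512 are set, both pointing at PA 0. */
static const uint32_t entrypgdir_model[NPDENTRIES] = {
    [0]                    = 0 | PTE_P | PTE_W | PTE_PS,
    [KERNBASE >> PDXSHIFT] = 0 | PTE_P | PTE_W | PTE_PS,
};

/* Translate va through a directory of 4 MB pages.
   Returns 1 and stores the physical address in *pa, or 0 if va is unmapped. */
static int translate_4mb(const uint32_t *pgdir, uint32_t va, uint32_t *pa)
{
    uint32_t pde = pgdir[va >> PDXSHIFT];
    assert(pa != NULL);
    if (!(pde & PTE_P) || !(pde & PTE_PS))
        return 0;
    *pa = (pde & 0xFFC00000u) | (va & 0x003FFFFFu);
    return 1;
}
```

Both 0x00102e00 and 0x80102e00 translate to physical address 0x102e00 in this model, while 0x80400000 falls into entry 513 and is unmapped, which is why the kernel image has to stay below the 4 MB limit at this stage.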

QEMU includes a built-in monitor that can inspect and modify the machine state in useful ways. To enter the monitor, press Ctrl-a c in the terminal running QEMU. Press Ctrl-a c again to switch back to the serial console.

Documents/work/code/xv6/xv6-public$ make qemu-gdb 
*** Now run 'gdb'.
qemu-system-i386 -serial mon:stdio -drive file=fs.img,index=1,media=disk,format=raw -drive file=xv6.img,index=0,media=disk,format=raw -smp 2 -m 512  -S -gdb tcp::26000
QEMU 2.3.0 monitor - type 'help' for more information
(qemu)

As shown above, run make qemu-gdb, then press Ctrl-a c to enter the QEMU monitor.

In another terminal, run:

Documents/work/code/xv6/xv6-public$gdb
(gdb) b *0x7c00
(gdb) c
(gdb) b *0x0010000c //before paging is enabled
(gdb) c
(gdb) si

With the breakpoint set at 0x0010000c and execution stopped there, inspect the paging state and the registers:

(qemu) info mem
PG disabled
(qemu) info pg
PG disabled
(qemu) info registers
...
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000010

After single-stepping with si past the instruction that enables paging, check info mem and info pg again:

(qemu) info mem
0000000000000000-0000000000400000 0000000000400000 -rw
0000000080000000-0000000080400000 0000000000400000 -rw
(qemu) info pg
VPN range     Entry         Flags        Physical page
[00000-003ff]  PDE[000]     --S-A---WP 00000-003ff
[80000-803ff]  PDE[200]     --SDA---WP 00000-003ff
(qemu) info registers
...
CR0=80010011 CR2=00000000 CR3=00109000 CR4=00000010
...

Note:

info mem – Display mapped virtual memory and permissions. It tells us that the 0x400000 bytes of memory from 0x0000000000000000 to 0x0000000000400000 are mapped read/write and are only kernel-accessible, and that the memory from 0x0000000080000000 to 0x0000000080400000 is mapped the same way.
info pg – Display the current page table structure. The output is similar to info mem, but distinguishes page directory entries and page table entries and gives the permissions for each separately.

Finally, jump to main:

# Enter the high address space (above 2GB)
mov $main, %eax
jmp *%eax

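Why use an indirect jump?
The indirect jump makes it possible to move from the low addresses to the high ones. A plain jmp main is assembled as a jump relative to the current position, and since the code is still executing at low addresses, the jump would land at a low address as well.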

1.jmp mylabel          //eb 03
2.jmp 0x8048377        //e9 03 00 00 00
 
3.jmp   *%eax        //ff e0
4.jmp   *(%ebx)      //ff 23
5.jmp   *0x80494A8   //ff 25 a8 94 04 08

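Forms 1 and 2 are direct, PC-relative jumps. The operand is a label or an address with no asterisk (a label is just an address), and the encoded instruction holds a displacement that is added to the address of the next instruction. Their opcodes are eb (8-bit displacement) or e9 (32-bit displacement).
Forms 3, 4, and 5 are indirect jumps, marked by the asterisk. The program counter is loaded with an absolute address taken from a register or from memory instead of having a displacement added to it. Their encodings begin with ff.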
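
The difference matters here because the kernel is linked at high virtual addresses but is still executing at low physical addresses. The sketch below models both kinds of jump in user space. The function names are hypothetical, and the instruction address used in the usage note (0x8010002f) is an assumed value for illustration only; the address of main (0x80102e00) is the one from this article.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL8_LEN  2u   /* eb xx */
#define JMP_REL32_LEN 5u   /* e9 xx xx xx xx */

/* Displacement the linker encodes for "jmp target" (e9 form), computed
   from the link-time address of the jump instruction. */
static int32_t encode_rel32(uint32_t link_addr, uint32_t target)
{
    return (int32_t)(target - (link_addr + JMP_REL32_LEN));
}

/* Where a relative jump lands: the displacement is added to the address
   of the next instruction at the place the CPU is actually executing. */
static uint32_t relative_target(uint32_t run_addr, uint32_t insn_len, int32_t rel)
{
    assert(insn_len == JMP_REL8_LEN || insn_len == JMP_REL32_LEN);
    return run_addr + insn_len + (uint32_t)rel;
}

/* Where "jmp *%eax" lands: the register value itself. */
static uint32_t indirect_target(uint32_t eax)
{
    return eax;
}
```

If the jump is linked at 0x8010002f but runs at 0x0010002f, relative_target gives 0x00102e00, a low address, whereas indirect_target(0x80102e00) gives the high address regardless of where the code is running.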

With the indirect jump (jmp *%eax), execution continues at 0x80102e00.
With jmp main instead, execution continues at 0x102e00:

# mov $main, %eax
# jmp *%eax
  jmp main

Debugging with gdb:

b *0x7c00
c
// entry in kernel.asm: 8010000c <entry>:
b *0x10000c 
c

A brief introduction to jmp:

unconditional jump
conditional jump

Unconditional jumps fall into three categories:

short jump (relative jump)
- 2-byte instruction that allows jumps or branches to memory locations within +127 and –128 bytes from the address following the jump. When the microprocessor executes a short jump, the displacement is sign-extended and added to the instruction pointer (IP/EIP) to generate the jump address within the current code segment.
near jump
- 3-byte instruction that allows a branch or jump within ±32K bytes from the instruction in the current code segment. Remember that segments are cyclic in nature, which means that one location above offset address FFFFH is offset address 0000H. These sizes describe 16-bit code; in 32-bit code such as xv6, a near jump carries a 32-bit displacement (opcode e9, 5 bytes in total).
far jump
- 5-byte instruction that allows a jump to any memory location within the real memory system. A far jump obtains a new segment and offset address to accomplish the jump. Bytes 2 and 3 contain the new offset address; bytes 4 and 5 contain the new segment address.

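3. Kernel initialization
Up to this point only the boot CPU is running; the other CPUs have to be started by it. Before starting them, the boot CPU performs the necessary initialization.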

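3.1. Reinitializing memory
main calls kinit1 and kinit2 to initialize physical memory. kinit1 marks the memory from the kernel's end address up to 4 MB as free, that is, virtual addresses 0x801154a8–0x80400000. It calls freerange to add those pages to the free list, and freerange does this by calling kfree on each page. Allocating memory removes a page from the list, and freeing memory puts a page back on it.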

kinit1(end, P2V(4*1024*1024));

xv6 defines the data structures of the physical memory allocator as follows:

struct run {
  struct run *next;
};

struct {
  struct spinlock lock;
  int use_lock;
  struct run *freelist;
} kmem;
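
The free list can be modeled in user space with a few lines. This is a simplified sketch of the idea, not the xv6 implementation: the names page_free, page_alloc, and range_free are hypothetical stand-ins for kfree, kalloc, and freerange, and the spinlock is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096

struct run {
    struct run *next;
};

static struct run *freelist;   /* head of the free-page list */

/* Put one page on the free list (the role kfree plays). */
static void page_free(void *page)
{
    struct run *r = (struct run *)page;
    assert(((uintptr_t)page % PGSIZE) == 0);   /* must be page aligned */
    memset(page, 1, PGSIZE);                   /* junk fill to expose dangling references */
    r->next = freelist;
    freelist = r;
}

/* Take one page off the free list (the role kalloc plays); NULL if empty. */
static void *page_alloc(void)
{
    struct run *r = freelist;
    if (r != NULL)
        freelist = r->next;
    return r;
}

/* Free every whole page inside [start, end) (the role freerange plays).
   Returns the number of pages added to the list. */
static int range_free(char *start, char *end)
{
    int n = 0;
    uintptr_t a = ((uintptr_t)start + PGSIZE - 1) & ~(uintptr_t)(PGSIZE - 1);
    for (; a + PGSIZE <= (uintptr_t)end; a += PGSIZE, n++)
        page_free((void *)a);
    return n;
}
```

Each free page stores the list pointer in its own first bytes, so the allocator needs no separate bookkeeping memory. The list behaves as a stack: the most recently freed page is the next one handed out.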

After the boot CPU has started the other CPUs, it calls kinit2() to initialize the remaining memory, from 0x80400000 to 0x8e000000 (P2V(PHYSTOP)).

 kinit2(P2V(4*1024*1024), P2V(PHYSTOP));

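[Figure: memory allocation layout]
After kinit1, main calls kvmalloc, which creates and switches to a page table that holds the mappings above KERNBASE that the kernel needs to run. kvmalloc builds this page table by calling setupkvm.

What setupkvm does:

setupkvm first allocates a page of memory for the page directory, then calls mappages to install the mappings the kernel needs, which are described in the kmap array. These mappings cover the kernel's instructions and data, physical memory up to PHYSTOP, and the memory ranges occupied by I/O devices. Note that setupkvm does not install any mappings for user memory.
mappages installs a mapping from a range of virtual addresses to a range of physical addresses in a page table. It works at page granularity, one page at a time. For each virtual address to be mapped, mappages calls walkpgdir to find the address of the PTE for that address, then initializes the PTE to hold the corresponding physical page number and permissions.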
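
The walk-then-fill pattern described above can be sketched as a user-space model. This is an illustration under simplifying assumptions, not the xv6 code: the directory stores host pointers to page tables instead of physical addresses, and the names pgdir_model, walk, and map_pages are hypothetical stand-ins for the page directory, walkpgdir, and mappages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PGSIZE         4096u
#define NPDENTRIES     1024
#define NPTENTRIES     1024
#define PTE_P          0x001u
#define PDX(va)        (((uint32_t)(va) >> 22) & 0x3FF)   /* directory index */
#define PTX(va)        (((uint32_t)(va) >> 12) & 0x3FF)   /* table index */
#define PGROUNDDOWN(a) ((uint32_t)(a) & ~(PGSIZE - 1))

/* Simplified directory: host pointers to page tables, NULL if absent. */
struct pgdir_model {
    uint32_t *tables[NPDENTRIES];
};

/* Find the PTE slot for va, allocating the page table on demand. */
static uint32_t *walk(struct pgdir_model *pd, uint32_t va, int alloc)
{
    uint32_t **slot = &pd->tables[PDX(va)];
    if (*slot == NULL) {
        if (!alloc || (*slot = calloc(NPTENTRIES, sizeof(uint32_t))) == NULL)
            return NULL;
    }
    return &(*slot)[PTX(va)];
}

/* Map [va, va+size) to [pa, pa+size) one page at a time. Returns 0 or -1. */
static int map_pages(struct pgdir_model *pd, uint32_t va, uint32_t size,
                     uint32_t pa, uint32_t perm)
{
    uint32_t a = PGROUNDDOWN(va);
    uint32_t last = PGROUNDDOWN(va + size - 1);
    assert(size > 0);
    for (;;) {
        uint32_t *pte = walk(pd, a, 1);
        if (pte == NULL)
            return -1;
        assert(!(*pte & PTE_P));   /* mapping a page twice is a bug */
        *pte = pa | perm | PTE_P;
        if (a == last)
            break;
        a += PGSIZE;
        pa += PGSIZE;
    }
    return 0;
}
```

Mapping three pages at 0x80100000 to physical 0x100000 fills three consecutive PTEs in the table selected by directory index 512, and a later walk with alloc set to 0 finds them without allocating anything.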

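3.2. Starting the other CPUs

xv6 first boots on the BSP (bootstrap processor). After entering main(), the BSP performs a series of initialization steps, including mpinit(), which detects the CPUs and records them in a global array. It then calls startothers(), which starts the APs (application processors, the non-boot CPUs) by sending them interrupts, and finally runs mpmain().

What startothers() does:

Copy the startup code to 0x7000. This code plays the same role for an AP that the boot sector code plays for the boot CPU.
Allocate a 4 KB stack for each AP.
Tell each AP where to enter the kernel (the mpenter function).
Tell each AP which page directory to use (entrypgdir).
Use the local APIC for inter-processor communication to start the other CPUs one by one. Once started, each CPU runs mpenter() and then enters scheduler() to start running processes.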

3.3. Starting the init process

Unix operating systems have a process that is responsible for setting up the environment that the user sees (starting up terminals, etc.). This process is called “init”. It is the very first user level process and is assigned the special PID 1. initcode.S and the userinit function start up this user level process.

//Makefile:
 1 initcode: initcode.S
 2         $(CC) $(CFLAGS) -nostdinc -I. -c initcode.S
 3         $(LD) $(LDFLAGS) -N -e start -Ttext 0 -o initcode.out initcode.o
 4         $(OBJCOPY) -S -O binary initcode.out initcode
 5         $(OBJDUMP) -S initcode.o > initcode.asm
 6
 7 kernel: $(OBJS) bootother initcode
 8         $(LD) $(LDFLAGS) -Ttext 0x100000 -e main -o kernel $(OBJS) -b binary initcode bootother
 9         $(OBJDUMP) -S kernel > kernel.asm
10         $(OBJDUMP) -t kernel | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > kernel.sym

Lines 1-5 above generate the initcode used in the function userinit. On the second line, initcode.S is compiled and on lines 3 and 4 packaged as a binary file called initcode. Notice that on line 3, the default entry point is “start” and that the code is linked starting at address 0. This means that the initcode binary expects to be loaded so that the “start” function is at address 0.

After initcode is compiled, it is linked into the kernel so that it can be used by userinit during runtime. The linking takes place on line 8; notice initcode at the end of the line. The symbols _binary_initcode_start, _binary_initcode_size and _binary_initcode_end are added by the linker on line 8 and the symbols can be found by looking at the kernel.sym file.

xv6 has a mechanism to take an existing user process and fork a new user process. This new process can continue executing the code or call exec to load a new application off of disk and run it. However, before init is started, there are no user processes at all; init is the very first user process. Therefore, we cannot use fork and exec to start the init process, because fork and exec need to be called from an active process. To start init, xv6 actually needs to create a user process first. How do we do that? We can set up the address space (using segments), create a stack, a heap and an area for code. We can then load some code into the code area and execute it. (This is exactly what userinit does.)

Init process does the following:

Prepare the Kernel Memory Space ( userinit(), setupkvm() )
Prepare Kernel Stack( allocproc() )
Prepare User Memory Space( userinit(), inituvm() )
Set up Trap Frame ( allocproc(), userinit() )
Set up Context ( allocproc() )

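The main job of userinit is to allocate memory and fill in the struct proc. userinit calls allocproc to allocate a struct proc and set its fields. xv6 keeps an array of struct proc; when creating a new process it looks for an unused element of the table to hold that process's struct proc. If none is found, allocproc returns 0 (a null pointer). If a free element is found, allocproc fills it in: it sets the pid and the process state, then allocates memory for the kernel stack and initializes it.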

static struct proc*
allocproc(void)
{
  ...
  // Allocate kernel stack.
  if((p->kstack = kalloc()) == 0){
    p->state = UNUSED;
    return 0;
  }
  sp = p->kstack + KSTACKSIZE;

  // Leave room for trap frame.
  sp -= sizeof *p->tf;
  p->tf = (struct trapframe*)sp;

  // Set up new context to start executing at forkret,
  // which returns to trapret.
  sp -= 4;
  *(uint*)sp = (uint)trapret;

  sp -= sizeof *p->context;
  p->context = (struct context*)sp;
  memset(p->context, 0, sizeof *p->context);
  p->context->eip = (uint)forkret;

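User stack and kernel stack:
Every process has a user stack and a kernel stack. User instructions run on the user stack; when the process enters the kernel through a system call or an interrupt, execution switches from the user stack to the kernel stack.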

When the process is executing user instructions, only its user stack is in use, and its kernel stack is empty. When the process enters the kernel (for a system call or interrupt), the kernel code executes on the process’s kernel stack; while a process is in the kernel, its user stack still contains saved data, but isn’t actively used. A process’s thread alternates between actively using its user stack and its kernel stack. The kernel stack is separate (and protected from user code) so that the kernel can execute even if a process has wrecked its user stack.

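From the highest address downward, the newly created kernel stack holds three parts:

struct trapframe: the state that must be saved when a system call or interrupt occurs
trapret: the return address that forkret returns to
struct context: the state saved across a context switch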


                  /   +---------------+ <-- stack base(= p->kstack + KSTACKSIZE)
                  |   | ss            |                           
                  |   +---------------+                           
                  |   | esp           |                           
                  |   +---------------+                           
                  |   | eflags        |                           
                  |   +---------------+                           
                  |   | cs            |                           
                  |   +---------------+                           
                  |   | eip           | <-- everything from here up is popped into registers automatically by iret; just point %esp here
                  |   +---------------+    
                  |   | err           |  
                  |   +---------------+  
                  |   | trapno        |  
                  |   +---------------+                       
                  |   | ds            |                           
                  |   +---------------+                           
                  |   | es            |                           
                  |   +---------------+                           
                  |   | fs            |                           
 struct trapframe |   +---------------+                           
                  |   | gs            |                           
                  |   +---------------+   
                  |   | eax           |   
                  |   +---------------+   
                  |   | ecx           |   
                  |   +---------------+   
                  |   | edx           |   
                  |   +---------------+   
                  |   | ebx           |   
                  |   +---------------+                        
                  |   | oesp          |   
                  |   +---------------+   
                  |   | ebp           |   
                  |   +---------------+   
                  |   | esi           |   
                  |   +---------------+   
                  |   | edi           |   
                  \   +---------------+ <-- p->tf                 
                      | trapret       |                           
                  /   +---------------+ <-- forkret will return to
                  |   | eip(=forkret) | <-- return addr           
                  |   +---------------+                           
                  |   | ebp           |                           
                  |   +---------------+                           
   struct context |   | ebx           |                           
                  |   +---------------+                           
                  |   | esi           |                           
                  |   +---------------+                           
                  |   | edi           |                           
                  \   +-------+-------+ <-- p->context            
                      |       |       |                           
                      |       v       |                           
                      |     empty     |                           
                      +---------------+ <-- p->kstack             
 */
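
The offsets in this layout follow directly from the structure sizes. The sketch below recomputes them in user space. The structures are models written to match the field order of the diagram, with hypothetical names (trapframe_model, context_model, layout_kstack); they are meant to mirror xv6's definitions but are not copied from its headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KSTACKSIZE 4096

/* Field order follows the diagram, lowest address first. */
struct trapframe_model {
    uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax;   /* general registers */
    uint16_t gs, pad1, fs, pad2, es, pad3, ds, pad4;    /* segment registers */
    uint32_t trapno;
    uint32_t err, eip;
    uint16_t cs, pad5;
    uint32_t eflags;
    uint32_t esp;
    uint16_t ss, pad6;
};

struct context_model {
    uint32_t edi, esi, ebx, ebp, eip;
};

/* Byte offsets from p->kstack. */
struct kstack_layout {
    size_t tf, trapret, context;
};

/* Repeat the pointer arithmetic of allocproc using offsets instead of pointers. */
static struct kstack_layout layout_kstack(void)
{
    struct kstack_layout l;
    size_t sp = KSTACKSIZE;                  /* sp = p->kstack + KSTACKSIZE */
    sp -= sizeof(struct trapframe_model);    /* leave room for the trap frame */
    l.tf = sp;
    sp -= 4;                                 /* slot holding the address of trapret */
    l.trapret = sp;
    sp -= sizeof(struct context_model);      /* saved context, eip = forkret */
    l.context = sp;
    assert(sp < KSTACKSIZE);
    return l;
}
```

With a 76-byte trap frame and a 20-byte context, the trap frame starts at offset 4020, the trapret slot at 4016, and the context at 3996, leaving the rest of the 4 KB page empty for the kernel to use.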

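Next, userinit calls setupkvm to create a page table and map the kernel into the process's address space. It then calls inituvm, which allocates physical memory, copies the program into it, and creates the page table mapping for the user program. The program is mapped starting at virtual address 0, so the address of its first instruction is 0. After that, userinit initializes the struct trapframe, mainly the segment registers, the registers related to the user stack, the flags register, and eip. Finally it sets the process state to RUNNABLE. The process then waits until scheduler(), called from mpmain, picks it and gives it a CPU.

In inituvm: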
mappages(pgdir, 0, PGSIZE, V2P(mem), PTE_W|PTE_U);

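The process scheduler:
Up to this point, all code run by the CPUs has been kernel code, including the process creation code above. Next, each CPU runs a scheduler that finds a RUNNABLE process and switches from the kernel scheduler thread to that process so that it can run.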

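mpmain calls scheduler to run the scheduler. The scheduler is an infinite loop: it scans the proc array for a runnable process, sets that process's state to RUNNING, and switches from the scheduler's context to the process.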
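
One pass of that loop can be sketched in user space as follows. This is a simplified model under stated assumptions: proc_model and pick_next are hypothetical names, locking is omitted, and the context switch itself is only indicated by a comment.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC 64   /* size of the process table */

enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

struct proc_model {
    enum procstate state;
    int pid;
};

/* One pass of the scheduler loop: pick the first RUNNABLE entry, mark it
   RUNNING, and return it. Returns NULL if nothing is runnable. */
static struct proc_model *pick_next(struct proc_model *table, size_t n)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].state != RUNNABLE)
            continue;
        table[i].state = RUNNING;   /* the real scheduler then calls swtch() */
        return &table[i];
    }
    return NULL;
}
```

Right after userinit, the only RUNNABLE entry is the init process with pid 1, so the first successful pass selects it.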

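scheduler calls swtch. Because swtch is an ordinary function call, its arguments and the return address (eip) are pushed onto the kernel stack before control transfers to the code in swtch.S.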

//proc.c:
// the context switch to the first process to run
swtch(&(c->scheduler), p->context);

#swtch.S:
.globl swtch
swtch:
  movl 4(%esp), %eax
  movl 8(%esp), %edx

  # Save old callee-saved registers
  pushl %ebp
  pushl %ebx
  pushl %esi
  pushl %edi

  # Switch stacks
  movl %esp, (%eax)
  movl %edx, %esp

  # Load new callee-saved registers
  popl %edi
  popl %esi
  popl %ebx
  popl %ebp
  ret
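
For the first process, the "new" stack that swtch switches to is the one allocproc prepared by hand. The sketch below replays the four pops and the ret on an array that stands in for that stack. It is a user-space model with hypothetical names (regs_model, replay_swtch_restore), and the addresses in the usage note are placeholders, not real kernel addresses.

```c
#include <assert.h>
#include <stdint.h>

struct regs_model {
    uint32_t edi, esi, ebx, ebp, eip;
};

/* Replay "Load new callee-saved registers" and ret on a stack image.
   stack[0] is the word at the new %esp (p->context).
   Returns the number of words consumed, i.e. the new stack position. */
static int replay_swtch_restore(const uint32_t *stack, int nwords,
                                struct regs_model *r)
{
    int sp = 0;
    assert(nwords >= 5);
    r->edi = stack[sp++];   /* popl %edi */
    r->esi = stack[sp++];   /* popl %esi */
    r->ebx = stack[sp++];   /* popl %ebx */
    r->ebp = stack[sp++];   /* popl %ebp */
    r->eip = stack[sp++];   /* ret pops the return address into %eip */
    return sp;
}
```

If the image is four zero words, then a placeholder for the address of forkret, then a placeholder for the address of trapret, the replay ends with eip set to the forkret placeholder and the stack position on the trapret slot. That is why forkret, when it returns, continues in trapret.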

[Figure: memory map of the address space]