MIT JOS Study Notes 01: Environment Setup and Boot Loader (2016.10.22)

weixin_33951761

Original link: http://www.cnblogs.com/LostChristmas/p/5987381.html

Reproducing this article in any form without permission is not allowed!

I. Environment Setup

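There are already plenty of tutorials online on setting up the JOS environment used in the MIT course, so I will not cover that here; my own setup is Ubuntu 16.04 + qemu. Also note that all code posted in this article is unmodified JOS source code, and some details in it are left for students to implement as part of the MIT course.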

II. Boot Loader Code Analysis

1. boot.S (AT&T assembly syntax)

#include <inc/mmu.h>

# Start the CPU: switch to 32-bit protected mode, jump into C.
# The BIOS loads this code from the first sector of the hard disk into
# memory at physical address 0x7c00 and starts executing in real mode
# with %cs=0 %ip=7c00.

.set PROT_MODE_CSEG, 0x8         # kernel code segment selector
.set PROT_MODE_DSEG, 0x10        # kernel data segment selector
.set CR0_PE_ON,      0x1         # protected mode enable flag

.globl start
start:
  .code16                     # Assemble for 16-bit mode
  cli                         # Disable interrupts
  cld                         # String operations increment

  # Set up the important data segment registers (DS, ES, SS).
  xorw    %ax,%ax             # Segment number zero
  movw    %ax,%ds             # -> Data Segment
  movw    %ax,%es             # -> Extra Segment
  movw    %ax,%ss             # -> Stack Segment

  # Enable A20:
  #   For backwards compatibility with the earliest PCs, physical
  #   address line 20 is tied low, so that addresses higher than
  #   1MB wrap around to zero by default.  This code undoes this.
seta20.1:
  inb     $0x64,%al               # Wait for not busy
  testb   $0x2,%al
  jnz     seta20.1

  movb    $0xd1,%al               # 0xd1 -> port 0x64
  outb    %al,$0x64

seta20.2:
  inb     $0x64,%al               # Wait for not busy
  testb   $0x2,%al
  jnz     seta20.2

  movb    $0xdf,%al               # 0xdf -> port 0x60
  outb    %al,$0x60

  # Switch from real to protected mode, using a bootstrap GDT
  # and segment translation that makes virtual addresses
  # identical to their physical addresses, so that the
  # effective memory map does not change during the switch.
  lgdt    gdtdesc
  movl    %cr0, %eax
  orl     $CR0_PE_ON, %eax
  movl    %eax, %cr0

  # Jump to next instruction, but in 32-bit code segment.
  # Switches processor into 32-bit mode.
  ljmp    $PROT_MODE_CSEG, $protcseg

  .code32                     # Assemble for 32-bit mode
protcseg:
  # Set up the protected-mode data segment registers
  movw    $PROT_MODE_DSEG, %ax    # Our data segment selector
  movw    %ax, %ds                # -> DS: Data Segment
  movw    %ax, %es                # -> ES: Extra Segment
  movw    %ax, %fs                # -> FS
  movw    %ax, %gs                # -> GS
  movw    %ax, %ss                # -> SS: Stack Segment

  # Set up the stack pointer and call into C.
  movl    $start, %esp
  call bootmain

  # If bootmain returns (it shouldn't), loop.
spin:
  jmp spin

# Bootstrap GDT
.p2align 2                                # force 4 byte alignment
gdt:
  SEG_NULL                # null seg
  SEG(STA_X|STA_R, 0x0, 0xffffffff)    # code seg
  SEG(STA_W, 0x0, 0xffffffff)            # data seg

gdtdesc:
  .word   0x17                            # sizeof(gdt) - 1
  .long   gdt                             # address gdt

boot.S

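The boot.S code is shown above. Its job is to switch the processor from real mode to protected mode, after which the rest of the Boot Loader goes on to load the kernel. Why does the processor need to be switched to protected mode? The answer lies in how a PC's physical memory is laid out. Taking a machine with a 4GB physical address space as an example, the layout is roughly as follows, from low to high addresses: conventional low memory (the first 640KB), then the VGA display memory, expansion ROMs and BIOS ROM up to the 1MB mark, then extended memory, and finally a region at the top of the address space reserved for 32-bit memory-mapped devices.

On the early 16-bit 8088 processor the address bus was 20 bits wide, giving an addressable space of 2^20 bytes, so only the lowest 1MB of memory could be accessed. As Intel processors later moved to 32-bit addressing, the layout and usage of the lowest 1MB of PC physical memory were preserved for compatibility with existing hardware and software, and the memory above it became extended memory (although part of the high address range is reserved for 32-bit devices, as noted above). The processor can only reach this extended memory when it runs in protected mode (see other write-ups on protected mode for details).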
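To make the 1MB limit concrete: real mode forms an address as (segment << 4) + offset, and with only 20 address lines (or with A20 masked, as the boot.S comments describe) anything above bit 19 is dropped. The following is a minimal sketch in C; the function names are my own for illustration and do not appear in JOS.

```c
#include <stdint.h>

/* Real-mode address formation: the 16-bit segment value is shifted
 * left by 4 bits and added to the 16-bit offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* With only 20 address lines (or with A20 masked), address bits
 * 20 and above are dropped, so addresses wrap around at 1MB. */
uint32_t wrap_20bit(uint32_t linear)
{
    return linear & 0xFFFFFu;
}
```

For example, segment 0x0000 with offset 0x7C00 and segment 0x07C0 with offset 0x0000 both name the address 0x7C00 where the BIOS loads the boot sector, while 0xFFFF:0x0010 computes to 0x100000 and wraps to 0 on a 20-bit bus.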

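At the top of boot.S, the .set assembler directive defines the segment selectors for the Boot Loader's code and data segments, along with the flag CR0_PE_ON (this flag is involved in switching to protected mode and is covered in detail below). Next comes a series of initialization steps: disabling interrupts (cli), clearing the direction flag DF (cld), and zeroing the DS, ES and SS segment registers. There is also a part that exchanges bytes with I/O ports 0x64 and 0x60 through in and out instructions; according to the comments in the source, it enables address line A20 so that addresses above 1MB no longer wrap around to zero. I am not familiar with the hardware ports and how they work, so a detailed analysis of that part may be added later. After that comes the key code that switches from real mode to protected mode: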

  lgdt    gdtdesc
  movl    %cr0, %eax
  orl     $CR0_PE_ON, %eax
  movl    %eax, %cr0

The first instruction, lgdt, loads a GDT (Global Descriptor Table) that was defined in advance; its contents are at the end of boot.S:

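# Bootstrap GDT
.p2align 2                                # force 4 byte alignment
gdt:
  SEG_NULL                # null seg
  SEG(STA_X|STA_R, 0x0, 0xffffffff)    # code seg
  SEG(STA_W, 0x0, 0xffffffff)            # data seg

gdtdesc:
  .word   0x17                            # sizeof(gdt) - 1
  .long   gdt                             # address gdt

Note that the operand of lgdt is 6 bytes long: the low 2 bytes hold the limit of the GDT (its size in bytes minus 1, here 3 * 8 - 1 = 0x17), and the high 4 bytes hold the 32-bit base address of the GDT. The data at gdtdesc has exactly this layout. The lgdt instruction is executed here to set up the descriptor table before the switch from real mode to protected mode. Next, let's look at the structure of the GDT itself (the three entries under the gdt label above). SEG_NULL and SEG are both macros defined in "inc/mmu.h":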

#define SEG_NULL                        \
    .word 0, 0;                        \
    .byte 0, 0, 0, 0
#define SEG(type,base,lim)                    \
    .word (((lim) >> 12) & 0xffff), ((base) & 0xffff);    \
    .byte (((base) >> 16) & 0xff), (0x90 | (type)),        \
        (0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)

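Judging from the macro definitions, SEG_NULL defines a null segment (entry 0 of the GDT is reserved by the x86 architecture as the null descriptor and is never used to access memory). Next,

SEG(STA_X|STA_R, 0x0, 0xffffffff)    # code seg

defines a code segment that is executable (STA_X) and readable (STA_R), with base address 0x0 and limit 0xffffffff (that is, it spans the entire 4GB address space), while

SEG(STA_W, 0x0, 0xffffffff)            # data seg

defines a data segment that is writable (STA_W; without STA_X the segment is not executable), with base address 0x0 and limit 0xffffffff (as above). That is the complete content of the GDT predefined in boot.S.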
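To see which bytes the SEG macro actually emits, the same bit manipulation can be written in C. This is only a sketch: seg_encode and gdt_limit are hypothetical helper names of mine, while the STA_* values are the ones I understand "inc/mmu.h" to define.

```c
#include <stdint.h>

/* Type bits, assumed to match the values in JOS inc/mmu.h. */
#define STA_X 0x8   /* executable segment */
#define STA_W 0x2   /* writable (non-executable segments) */
#define STA_R 0x2   /* readable (executable segments) */

/* Encode one 8-byte descriptor with the same layout as the SEG macro. */
void seg_encode(uint8_t out[8], uint8_t type, uint32_t base, uint32_t lim)
{
    uint16_t limit_lo = (lim >> 12) & 0xffff;  /* limit bits 15:0, in 4KB units */
    uint16_t base_lo  = base & 0xffff;         /* base bits 15:0 */

    out[0] = limit_lo & 0xff;
    out[1] = limit_lo >> 8;
    out[2] = base_lo & 0xff;
    out[3] = base_lo >> 8;
    out[4] = (base >> 16) & 0xff;              /* base bits 23:16 */
    out[5] = 0x90 | type;                      /* present, DPL 0, code/data segment */
    out[6] = 0xC0 | ((lim >> 28) & 0xf);       /* 4KB granularity, 32-bit, limit 19:16 */
    out[7] = (base >> 24) & 0xff;              /* base bits 31:24 */
}

/* Limit field of the lgdt operand for a table of n_entries descriptors. */
uint16_t gdt_limit(unsigned n_entries)
{
    return (uint16_t)(n_entries * 8 - 1);
}
```

With these definitions, the code segment entry encodes to the bytes ff ff 00 00 00 9a cf 00, the data segment entry differs only in the access byte (92 instead of 9a), and gdt_limit(3) gives the 0x17 stored at gdtdesc.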

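Now back to the last three instructions of the mode-switch code, the movl/orl/movl sequence (note that AT&T and Intel assembly syntax put the source and destination operands in opposite order). These three instructions use the eax register to OR the current value of control register CR0 with the flag CR0_PE_ON (in effect setting bit 0 of CR0 to 1) and write the result back, which switches the processor from real mode to protected mode. Why are these three instructions enough to switch modes? To see that, we need to know what each bit of CR0 means: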

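Meaning of the CR0 bits
Bit	Abbreviation	Full name	Description
0	PE	Protected Mode Enable	Enables protected mode: PE=1 means the CPU is in protected mode, PE=0 means it is in real mode
1	MP	Monitor Coprocessor	Coprocessor monitoring: when MP=1 and TS=1, executing WAIT/FWAIT raises a device-not-available exception (#NM)
2	EM	Emulation	Coprocessor emulation: EM=1 means no x87 coprocessor is used and floating-point instructions trap (#NM) so that software can emulate them
3	TS	Task Switched	Set to 1 by the processor on every hardware task switch, so the OS can defer saving coprocessor state until it is actually needed
4	ET	Extension Type	Coprocessor type on the 386: ET=0 means 80287, ET=1 means 80387 (fixed to 1 on later processors)
5	NE	Numeric Error	Controls x87 error reporting: NE=1 reports errors through the internal exception (#MF), NE=0 through an external interrupt
16	WP	Write Protect	WP=1 means that writes to read-only pages cause a page fault even from supervisor-level code
18	AM	Alignment Mask	AM=1 allows alignment checking, AM=0 disables it
29	NW	Not Write-through	Together with CD controls the CPU's internal cache; NW=0 and CD=0 means the cache is enabled, see the Intel manual for other combinations
30	CD	Cache Disable	Same as above
31	PG	Paging	Enables paging: PG=1 means paging is on, PG=0 means it is off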

As the table shows, the CPU's operating mode is controlled by the PE bit of CR0, so switching to protected mode only requires setting that bit to 1.
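The read-modify-write on CR0 can be mirrored on an ordinary integer. A minimal sketch in C, operating on a plain value rather than the real register; the names are hypothetical and the sample value 0x60000010 is just an illustrative CR0 value with PE clear.

```c
#include <stdint.h>

#define CR0_PE 0x00000001u   /* bit 0: protected mode enable */

/* Mirror of the movl/orl/movl sequence: take the old value,
 * set the PE bit, and return the value to be written back. */
uint32_t cr0_enable_protected_mode(uint32_t cr0)
{
    return cr0 | CR0_PE;
}

/* Nonzero if the given CR0 value has the PE bit set. */
int cr0_in_protected_mode(uint32_t cr0)
{
    return (cr0 & CR0_PE) != 0;
}
```

Using OR rather than a plain store matters because all the other CR0 bits keep their previous values; only bit 0 changes.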

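After the CPU has switched from real mode to protected mode, boot.S uses an ljmp instruction to jump to the first instruction to be executed in the new mode: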

  # Jump to next instruction, but in 32-bit code segment.
  # Switches processor into 32-bit mode.
  ljmp    $PROT_MODE_CSEG, $protcseg

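Remember the two segment selectors at the top of boot.S? PROT_MODE_CSEG is the one that identifies the code segment. Before explaining why PROT_MODE_CSEG is defined as 0x8, we first need to look at how addressing differs between real mode and protected mode.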

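In real mode, addressing a memory location requires the 16-bit value in a segment register and an offset within that segment, and the physical address is computed as (segment << 4) + offset. In protected mode, the segment register no longer holds a base address; it holds a segment selector, which picks out a descriptor in a descriptor table, and that descriptor supplies the segment's base address and limit. A segment selector is a 16-bit value made up of three fields:

RPL (bits 0-1) is the Requested Privilege Level, TI (bit 2) is the table indicator (it distinguishes the GDT from the LDT), and Index (bits 3-15) is the index of the descriptor within the descriptor table (counting from 0). Now consider again why PROT_MODE_CSEG is defined as 0x8. Written out in binary as a segment selector, PROT_MODE_CSEG is 0000 0000 0000 1000, and splitting it into these fields gives:

RPL = 00 (0), TI = 0, Index = 0000 0000 0000 1 (1)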
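The same split can be done with a few shifts and masks. A minimal sketch in C; the function names are made up for illustration and are not part of JOS.

```c
#include <stdint.h>

/* Bits 0-1: Requested Privilege Level. */
unsigned selector_rpl(uint16_t sel)
{
    return sel & 0x3;
}

/* Bit 2: table indicator (0 = GDT, 1 = LDT). */
unsigned selector_ti(uint16_t sel)
{
    return (sel >> 2) & 0x1;
}

/* Bits 3-15: index of the descriptor in the table. */
unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}

/* Byte offset of the descriptor in the table (8 bytes per entry). */
unsigned selector_table_offset(uint16_t sel)
{
    return selector_index(sel) * 8;
}
```

For PROT_MODE_CSEG (0x8) these return RPL 0, TI 0 and index 1; for PROT_MODE_DSEG (0x10) the index is 2.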

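This means the selector refers to entry 1 of the GDT with a requested privilege level of 0. So which segment is that? Let's look back at the GDT predefined in boot.S: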

gdt:
  SEG_NULL                # null seg
  SEG(STA_X|STA_R, 0x0, 0xffffffff)    # code seg
  SEG(STA_W, 0x0, 0xffffffff)            # data seg

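Entry 1 is the code segment (remember that counting starts from 0). The other selector, PROT_MODE_DSEG (0x10), can be analyzed the same way and refers to entry 2, the data segment. At this point the purpose of the ljmp instruction above is clear: it jumps to the code entry point given by the segment selector PROT_MODE_CSEG and the offset protcseg. The code it jumps to, which is assembled as 32-bit code because of the .code32 directive, initializes the segment registers with the data segment selector, then sets esp to the address of the start label and calls the bootmain() function in main.c (bootmain() takes no arguments, so nothing is pushed onto the stack for it). Since start is at 0x7c00 and the stack grows downward, the stack occupies the memory just below the Boot Loader code.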

  # Set up the stack pointer and call into C.
  movl    $start, %esp
  call bootmain

That is all boot.S does.

2. main.c

#include <inc/x86.h>
#include <inc/elf.h>

/**********************************************************************
 * This a dirt simple boot loader, whose sole job is to boot
 * an ELF kernel image from the first IDE hard disk.
 *
 * DISK LAYOUT
 *  * This program(boot.S and main.c) is the bootloader.  It should
 *    be stored in the first sector of the disk.
 *
 *  * The 2nd sector onward holds the kernel image.
 *
 *  * The kernel image must be in ELF format.
 *
 * BOOT UP STEPS
 *  * when the CPU boots it loads the BIOS into memory and executes it
 *
 *  * the BIOS intializes devices, sets of the interrupt routines, and
 *    reads the first sector of the boot device(e.g., hard-drive)
 *    into memory and jumps to it.
 *
 *  * Assuming this boot loader is stored in the first sector of the
 *    hard-drive, this code takes over...
 *
 *  * control starts in boot.S -- which sets up protected mode,
 *    and a stack so C code then run, then calls bootmain()
 *
 *  * bootmain() in this file takes over, reads in the kernel and jumps to it.
 **********************************************************************/

#define SECTSIZE    512
#define ELFHDR        ((struct Elf *) 0x10000) // scratch space

void readsect(void*, uint32_t);
void readseg(uint32_t, uint32_t, uint32_t);

void
bootmain(void)
{
    struct Proghdr *ph, *eph;

    // read 1st page off disk
    readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);

    // is this a valid ELF?
    if (ELFHDR->e_magic != ELF_MAGIC)
        goto bad;

    // load each program segment (ignores ph flags)
    ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
    eph = ph + ELFHDR->e_phnum;
    for (; ph < eph; ph++)
        // p_pa is the load address of this segment (as well
        // as the physical address)
        readseg(ph->p_pa, ph->p_memsz, ph->p_offset);

    // call the entry point from the ELF header
    // note: does not return!
    ((void (*)(void)) (ELFHDR->e_entry))();

bad:
    outw(0x8A00, 0x8A00);
    outw(0x8A00, 0x8E00);
    while (1)
        /* do nothing */;
}

// Read 'count' bytes at 'offset' from kernel into physical address 'pa'.
// Might copy more than asked
void
readseg(uint32_t pa, uint32_t count, uint32_t offset)
{
    uint32_t end_pa;

    end_pa = pa + count;

    // round down to sector boundary
    pa &= ~(SECTSIZE - 1);

    // translate from bytes to sectors, and kernel starts at sector 1
    offset = (offset / SECTSIZE) + 1;

    // If this is too slow, we could read lots of sectors at a time.
    // We'd write more to memory than asked, but it doesn't matter --
    // we load in increasing order.
    while (pa < end_pa) {
        // Since we haven't enabled paging yet and we're using
        // an identity segment mapping (see boot.S), we can
        // use physical addresses directly.  This won't be the
        // case once JOS enables the MMU.
        readsect((uint8_t*) pa, offset);
        pa += SECTSIZE;
        offset++;
    }
}

void
waitdisk(void)
{
    // wait for disk reaady
    while ((inb(0x1F7) & 0xC0) != 0x40)
        /* do nothing */;
}

void
readsect(void *dst, uint32_t offset)
{
    // wait for disk to be ready
    waitdisk();

    outb(0x1F2, 1);        // count = 1
    outb(0x1F3, offset);
    outb(0x1F4, offset >> 8);
    outb(0x1F5, offset >> 16);
    outb(0x1F6, (offset >> 24) | 0xE0);
    outb(0x1F7, 0x20);    // cmd 0x20 - read sectors

    // wait for disk to be ready
    waitdisk();

    // read a sector
    insl(0x1F0, dst, SECTSIZE/4);
}

main.c

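Before analyzing what main.c does, let me introduce three structs that will come up: Elf (the header of an Executable and Linkable Format file), Proghdr (Program Header) and Secthdr (Section Header). They are defined in "inc/elf.h" as follows (for details see https://en.wikipedia.org/wiki/Executable_and_Linkable_Format):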

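#define ELF_MAGIC 0x464C457FU    /* "\x7FELF" in little endian */

struct Elf {
    uint32_t e_magic;      // must equal ELF_MAGIC, otherwise this is not a valid ELF file
    uint8_t e_elf[12];     // rest of the ELF identification bytes (class, data encoding, version, padding)
    uint16_t e_type;       // object file type (relocatable, executable, ...)
    uint16_t e_machine;    // target instruction set architecture
    uint32_t e_version;    // ELF version
    uint32_t e_entry;      // entry address where the kernel starts executing
    uint32_t e_phoff;      // file offset of the Program Header table
    uint32_t e_shoff;      // file offset of the Section Header table
    uint32_t e_flags;      // processor-specific flags
    uint16_t e_ehsize;     // size of the ELF header
    uint16_t e_phentsize;  // size of one Program Header table entry
    uint16_t e_phnum;      // number of entries in the Program Header table
    uint16_t e_shentsize;  // size of one Section Header table entry
    uint16_t e_shnum;      // number of entries in the Section Header table
    uint16_t e_shstrndx;   // Section Header table index of the section name string table
};

struct Proghdr {
    uint32_t p_type;       // type of the segment
    uint32_t p_offset;     // offset of the segment in the file image
    uint32_t p_va;         // virtual address of the segment in memory
    uint32_t p_pa;         // physical address of the segment, on systems where physical addressing is relevant
    uint32_t p_filesz;     // size of the segment in the file image (in bytes, may be 0)
    uint32_t p_memsz;      // size of the segment in memory (in bytes, may be 0)
    uint32_t p_flags;      // segment permission flags (read/write/execute)
    uint32_t p_align;      // required alignment of the segment
};

struct Secthdr {
    uint32_t sh_name;      // offset of the section's name in the section name string table
    uint32_t sh_type;      // type of the section
    uint32_t sh_flags;     // section attribute flags
    uint32_t sh_addr;      // virtual address of the section in memory (for sections that are loaded)
    uint32_t sh_offset;    // offset of the section in the file image
    uint32_t sh_size;      // size of the section in the file image (in bytes, may be 0)
    uint32_t sh_link;      // index of an associated section (meaning depends on the section type)
    uint32_t sh_info;      // extra information (meaning depends on the section type)
    uint32_t sh_addralign; // required alignment of the section
    uint32_t sh_entsize;   // entry size, for sections that hold fixed-size entries
};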
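The comment on ELF_MAGIC says the constant is "\x7FELF" in little endian. A small sketch in C shows why: reading the four bytes 0x7F, 'E', 'L', 'F' as a little-endian 32-bit word, which is how an x86 load of e_magic sees them, yields 0x464C457F. The helper names load_le32 and elf_magic_ok are mine, not from JOS.

```c
#include <stdint.h>

#define ELF_MAGIC 0x464C457FU    /* "\x7FELF" in little endian */

/* Assemble four bytes into a 32-bit value, least significant byte
 * first, the way a 32-bit load on x86 would. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Nonzero if the image starts with the ELF magic bytes. */
int elf_magic_ok(const uint8_t *image)
{
    return load_le32(image) == ELF_MAGIC;
}
```

The byte 'F' (0x46) ends up in the most significant position and 0x7F in the least significant one, which is why the constant reads "backwards" compared with the file contents.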

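Next, let's go through main.c function by function. At the top of the file there are two function prototypes, readsect(void *, uint32_t) and readseg(uint32_t, uint32_t, uint32_t), followed by the body of bootmain(), the function called from boot.S. We will look at these three functions in turn: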

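First, readsect(void *dst, uint32_t offset). The helper functions it calls (outb, insl and, through waitdisk(), inb) use the in and out assembly instructions indirectly via __asm __volatile() (see "inc/x86.h"). With them it reads the disk sector with the given number (the sector-number parameter is named offset in this function) and writes the data it reads into the memory pointed to by dst.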
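The port writes in readsect spread the sector number over four one-byte registers, and waitdisk polls two bits of the status register. The sketch below reproduces just that arithmetic in C so it can be checked without hardware; the struct and function names are hypothetical, and the bit meanings in the comments (BSY and DRDY, LBA mode in 0xE0) reflect my understanding of the IDE interface rather than anything stated in the JOS source.

```c
#include <stdint.h>

/* Values that readsect writes to ports 0x1F3..0x1F6. */
struct lba_regs {
    uint8_t low;     /* port 0x1F3: sector number bits 7:0 */
    uint8_t mid;     /* port 0x1F4: sector number bits 15:8 */
    uint8_t high;    /* port 0x1F5: sector number bits 23:16 */
    uint8_t device;  /* port 0x1F6: 0xE0 | sector number bits 27:24 */
};

/* Split a sector number the same way the outb calls do; each outb
 * keeps only the low 8 bits of its argument. */
struct lba_regs lba_split(uint32_t sector)
{
    struct lba_regs r;

    r.low    = (uint8_t)sector;
    r.mid    = (uint8_t)(sector >> 8);
    r.high   = (uint8_t)(sector >> 16);
    r.device = (uint8_t)((sector >> 24) | 0xE0);
    return r;
}

/* The waitdisk condition: bit 7 (busy) clear and bit 6 (ready) set. */
int disk_ready(uint8_t status)
{
    return (status & 0xC0) == 0x40;
}
```

For sector 1, the first sector of the kernel image, this gives 0x01, 0x00, 0x00 and 0xE0.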

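Next, readseg(uint32_t pa, uint32_t count, uint32_t offset). Judging by its interface, this function reads count bytes of data located at byte offset offset within the kernel image on disk into the address pa. In its implementation, the data is read through readsect(void *dst, uint32_t offset) one sector at a time: it computes the disk sector from offset (adding 1, because the kernel image starts at sector 1), rounds pa down to a sector boundary, and then reads whole sectors until all the requested data is in memory, so it may read more than was asked for.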
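The rounding and sector arithmetic in readseg is easy to trace by hand once it is separated from the disk access. The following sketch in C repeats the same computations but only records what would be read; struct read_plan and readseg_plan are names I introduced for illustration.

```c
#include <stdint.h>

#define SECTSIZE 512

/* What a readseg call would do, without touching the disk. */
struct read_plan {
    uint32_t first_pa;      /* pa rounded down to a sector boundary */
    uint32_t first_sector;  /* disk sector passed to the first readsect */
    uint32_t nsectors;      /* number of readsect calls */
};

struct read_plan readseg_plan(uint32_t pa, uint32_t count, uint32_t offset)
{
    struct read_plan plan;
    uint32_t end_pa = pa + count;

    /* round down to sector boundary */
    pa &= ~(uint32_t)(SECTSIZE - 1);
    plan.first_pa = pa;

    /* translate from bytes to sectors; the kernel starts at sector 1 */
    plan.first_sector = offset / SECTSIZE + 1;

    /* one readsect call per iteration of the original loop */
    plan.nsectors = 0;
    while (pa < end_pa) {
        plan.nsectors++;
        pa += SECTSIZE;
    }
    return plan;
}
```

For the first call in bootmain(), readseg_plan(0x10000, 512 * 8, 0) reports a start address of 0x10000, first sector 1 and 8 sectors. For an unaligned request such as pa = 0x100010, count = 100, offset = 0x1010, the start address is rounded down to 0x100000 and a single whole sector (number 9) is read.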

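Finally, let's look at bootmain():

At the start of bootmain(), the call

readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);

reads one page of data (512 * 8 = 4096 bytes, i.e. 8 sectors) from offset 0 of the kernel image into the address that the ELFHDR pointer refers to (defined by a macro in main.c as 0x10000). It then checks whether the kernel that was read is in ELF format by comparing e_magic with ELF_MAGIC. If the check fails, it writes specific words to a specific port and enters an infinite loop (the code at the bad label, which also uses the out instruction indirectly via __asm __volatile()). If the check passes, it loads all program segments of the ELF file using two Proghdr * pointers, ph and eph: ph starts at the first entry of the Program Header table, eph points just past its last entry, each increment of ph moves to the next entry, and the number of entries is stored in ELFHDR->e_phnum. More specifically, for each entry ph in the table (that is, each program segment), the call

readseg(ph->p_pa, ph->p_memsz, ph->p_offset);

reads the program segment at offset ph->p_offset with size ph->p_memsz from disk into the physical memory at ph->p_pa. As I see it, this part of the code essentially deploys the whole kernel program from disk into memory, ready to be executed. The final line

((void (*)(void)) (ELFHDR->e_entry))();

is a function call through a function pointer. The address in ELFHDR->e_entry is the address of the kernel's first instruction, as introduced above, so once this C statement executes, the Boot Loader has handed control over to the kernel. Since the kernel is not supposed to return, this is the last statement the Boot Loader executes (if the kernel did return, something would be seriously wrong with the operating system, and execution would fall through to the code at the bad label).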
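The overall flow of bootmain() (check the magic number, walk the Program Header table, copy each segment to its load address, return the entry point) can be tried out on ordinary memory buffers. The sketch below is a simplified stand-in, not the JOS code: struct mini_elf and struct mini_ph are hypothetical trimmed-down layouts holding only the fields the loop uses, memcpy replaces readseg, and the bounds checks are my own addition, which the real Boot Loader does not perform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC 0x464C457FU

/* Hypothetical trimmed-down headers, not the real inc/elf.h layouts. */
struct mini_elf {
    uint32_t e_magic;
    uint32_t e_entry;
    uint32_t e_phoff;
    uint16_t e_phnum;
};

struct mini_ph {
    uint32_t p_offset;
    uint32_t p_pa;
    uint32_t p_memsz;
};

/* Nonzero if [start, start + len) fits inside a buffer of size total. */
static int in_range(size_t start, size_t len, size_t total)
{
    return start <= total && len <= total - start;
}

/* Copy every segment of image into mem, mimicking the bootmain loop.
 * Returns the entry address, or 0 if the image is rejected. */
uint32_t mini_load(const uint8_t *image, size_t image_size,
                   uint8_t *mem, size_t mem_size)
{
    struct mini_elf hdr;

    if (!in_range(0, sizeof hdr, image_size))
        return 0;
    memcpy(&hdr, image, sizeof hdr);
    if (hdr.e_magic != ELF_MAGIC)
        return 0;                       /* corresponds to "goto bad" */

    for (uint32_t i = 0; i < hdr.e_phnum; i++) {
        struct mini_ph ph;
        size_t ph_at = (size_t)hdr.e_phoff + i * sizeof ph;

        if (!in_range(ph_at, sizeof ph, image_size))
            return 0;
        memcpy(&ph, image + ph_at, sizeof ph);
        if (!in_range(ph.p_offset, ph.p_memsz, image_size) ||
            !in_range(ph.p_pa, ph.p_memsz, mem_size))
            return 0;
        /* stands in for readseg(ph->p_pa, ph->p_memsz, ph->p_offset) */
        memcpy(mem + ph.p_pa, image + ph.p_offset, ph.p_memsz);
    }
    return hdr.e_entry;
}
```

In the real Boot Loader the returned address is not handed back to a caller but called directly through the function pointer cast shown above.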

That is all main.c does, and with that the whole Boot Loader has finished its work.

Reposted from: https://www.cnblogs.com/LostChristmas/p/5987381.html