MIT 6.828 Lab2

Lab 2

@[6.828|OS kernel|MIT2014]
Course page: http://pdos.csail.mit.edu/6.828/2014/schedule.html

Lec 5 note

Isolation mechanisms

  • OS design driven by isolation, multiplexing, and sharing

  • What is isolation
    the process is the unit of isolation
    prevent process X from wrecking or spying on process Y
    memory, cpu, FDs, resource exhaustion
    prevent a process from wrecking the operating system itself
    i.e. from preventing kernel from enforcing isolation
    in the face of bugs or malice
    e.g. a bad process may try to trick the h/w(hardware) or kernel

  • what are all the mechanisms that keep processes isolated?
    user/kernel mode flag
    address spaces
    timeslicing
    system call interface

  • the foundation of xv6’s isolation: user/kernel mode flag
    controls whether instructions can access privileged h/w
    called CPL on the x86, bottom two bits of %cs
    CPL=0 – kernel mode – privileged
    CPL=3 – user mode – no privilege
    x86 CPL protects everything relevant to isolation
    writes to %cs (to defend CPL)
    every memory read/write
    I/O port accesses
    control register accesses (eflags, %cr3, …)
    every serious microprocessor has something similar
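
Where does the CPL actually live in practice? A minimal sketch of reading it from C (assumes GCC-style inline assembly; read_cpl is a hypothetical helper, not something defined in xv6 or JOS):

static inline int
read_cpl(void)
{
    unsigned short cs;
    // %cs cannot be loaded directly by user code, but it can be read;
    // its bottom two bits are the current privilege level.
    asm volatile("movw %%cs,%0" : "=r"(cs));
    return cs & 3;      // 0 in the kernel, 3 in a user program
}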

  • user/kernel mode flag is not enough
    protects only against direct attacks on the hardware
    kernel must configure control regs, page tables, &c to protect other stuff
    e.g. kernel memory

  • how to do a system call – switching CPL
    Q: would this be an OK design for user programs to make a system call:
    set CPL=0
    jmp sys_open
    bad: user-specified instructions with CPL=0
    Q: how about a combined instruction that sets CPL=0,
    but requires an immediate jump to someplace in the kernel?
    bad: user might jump somewhere awkward in the kernel
    the x86 answer:
    there are only a few permissible kernel entry points
    INT instruction sets CPL=0 and jumps to an entry point
    but user code can’t otherwise modify CPL or jump anywhere else in kernel
    system call return sets CPL=3 before returning to user code
    also a combined instruction (can’t separately set CPL and jmp)
    but kernel is allowed to jump anywhere in user code

  • the result: well-defined notion of user vs kernel
    either CPL=3 and executing user code
    or CPL=0 and executing from entry point in kernel code
    not:
    CPL=0 and executing user
    CPL=0 and executing anywhere in kernel the user pleases

  • how to isolate process memory?
    idea: “address space”
    give each process some memory it can access
    for its code, variables, heap, stack
    prevent it from accessing other memory (kernel or other processes)

  • how to create isolated address spaces?
    xv6 uses x86 “paging hardware”
    MMU translates (or “maps”) every address issued by program
    VA -> PA
    instruction fetch, data load/store
    for kernel and user
    there’s no way for any instruction to directly use a PA
    MMU array w/ entry for each 4k range of “virtual” address space
    refers to phy address for that “page”
    this is the page table
    o/s tells h/w to switch page table when switching process
    why isolated?
    each page table entry (PTE) has a bit saying if user-mode instructions can use
    kernel only sets the bit for the memory in current process’s address space
    paging h/w used in many ways, not just isolation
    e.g. copy-on-write fork(), see Lab 4
    note: you don’t need paging to isolate memory
    type safety, JVM, Singularity
    but paging is the most popular plan
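
To make the VA -> PA translation concrete, here is a minimal software model of the two-level walk the x86 paging hardware performs on every access. It is only a sketch: it ignores 4MB pages, accessed/dirty bits, and the fact that the hardware follows physical addresses, which software could not simply dereference like this:

#include <stdint.h>

#define PTE_P  0x001        // present
#define PTE_U  0x004        // user-mode access allowed

// Model of the two-level lookup; returns the PA for va, or -1 on a fault.
static uint32_t
translate(uint32_t *pgdir, uint32_t va, int user)
{
    uint32_t pde = pgdir[va >> 22];                // top 10 bits: directory index
    if (!(pde & PTE_P) || (user && !(pde & PTE_U)))
        return (uint32_t)-1;
    uint32_t *pt = (uint32_t *)(pde & ~0xFFFu);    // page table base (simplified)
    uint32_t pte = pt[(va >> 12) & 0x3FF];         // middle 10 bits: table index
    if (!(pte & PTE_P) || (user && !(pte & PTE_U)))
        return (uint32_t)-1;
    return (pte & ~0xFFFu) | (va & 0xFFF);         // frame number | page offset
}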

  • how to isolate CPU?
    prevent a process from hogging the CPU, e.g. buggy infinite loop
    how to force uncooperative process to yield
    h/w provides a periodic “clock interrupt”
    forcefully suspends current process
    jumps into kernel
    which can switch to a different process
    kernel must save/restore process state (registers)
    totally transparent, even to cooperative processes
    called “pre-emptive context switch”
    note: traditional, but maybe not perfect; see exokernel paper

  • back to system calls
    i’ve talked a lot about how o/s isolates processes
    but need user/kernel to cooperate! user needs kernel services.
    what should user/kernel interaction look like?
    can’t let user r/w kernel mem (well, you can, later…)
    kernel can r/w user mem
    but don’t want to do this too much!
    so style of system call interface is pretty simple
    integers, strings (copying only), user-allocated buffers
    no objects, data structures, &c
    never any doubt about who owns memory
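
A sketch of what “strings (copying only)” means on the kernel side. This is not actual xv6 or JOS code; user_addr_ok() is a hypothetical check standing in for however the kernel decides an address is valid for the calling process:

#define MAXPATH 128

// Hypothetical validity check: in a real kernel this would consult the
// process's page tables / address-space bounds.
static int
user_addr_ok(unsigned long va)
{
    return va < 0x80000000UL;       // placeholder policy for the sketch
}

// Kernel-side handling of a string argument: copy it byte by byte into a
// kernel-owned buffer, validating every address, so there is never any
// doubt about who owns the memory the kernel actually works on.
static int
copy_user_string(char *dst, const char *user_src)
{
    int i;
    for (i = 0; i < MAXPATH; i++) {
        if (!user_addr_ok((unsigned long)(user_src + i)))
            return -1;              // bad user pointer: fail the system call
        dst[i] = user_src[i];       // safe to read: user memory is mapped in
        if (dst[i] == '\0')
            return 0;
    }
    return -1;                      // unterminated or too long
}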

Some things that matter

  • On .bss: “There is another section called the .bss. This section is like the data section, except that it doesn’t take up space in the executable.” The .text and .data sections live in the executable file (in embedded systems they are typically burned into the image) and are loaded from it by the system; the .bss section is not stored in the executable and is zero-initialized by the system at load time.
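
A small C illustration (not JOS-specific) of which section each kind of definition lands in:

int ticks = 42;                     // initialized global: .data, takes space in the executable
int page_refs[1024];                // uninitialized global: .bss, only its size is recorded in
                                    // the file; the loader/startup code zeroes it
const char banner[] = "booting";    // constant data: .rodata, stored in the file alongside .text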

LAB part

Part 1: Physical Page Management

Exercise 1

In the file kern/pmap.c, you must implement code for the following functions (probably in the order given).

boot_alloc()
mem_init() (only up to the call to check_page_free_list(1))
page_init()
page_alloc()
page_free()

check_page_free_list() and check_page_alloc() test your physical page allocator. You should boot JOS and see whether check_page_alloc() reports success. Fix your code so that it passes. You may find it helpful to add your own assert()s to verify that your assumptions are correct.

page granularity: physical memory is managed in units of whole 4KB pages (PGSIZE).


Before writing any code, read memlayout.h, mmu.h, and pmap.c carefully.


Code written according to the comments in the source:

  • 1. boot_alloc()

static void *
boot_alloc(uint32_t n)
{
    static char *nextfree;  // virtual address of next byte of free memory
    char *result;

    // Initialize nextfree if this is the first time.
    // 'end' is a magic symbol automatically generated by the linker,
    // which points to the end of the kernel's bss segment:
    // the first virtual address that the linker did *not* assign
    // to any kernel code or global variables.
    if (!nextfree) {
        extern char end[];
        nextfree = ROUNDUP((char *) end, PGSIZE);
    }

    // Allocate a chunk large enough to hold 'n' bytes, then update
    // nextfree.  Make sure nextfree is kept aligned
    // to a multiple of PGSIZE.
    //
    // LAB 2: Your code here.

    result = nextfree;
    nextfree += ROUNDUP(n, PGSIZE);     // keep nextfree page-aligned
    return result;
}
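
Note that with this implementation boot_alloc(0) simply returns the current value of nextfree without advancing it; page_init() below relies on exactly that to find the first physical page not occupied by the kernel image and the pages[] array.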
  • 2. mem_init()
void
mem_init(void)
{
    // Added for Exercise 1: allocate an array of npages 'struct PageInfo'
    // with boot_alloc, store it in 'pages', and zero it.
    pages = (struct PageInfo *) boot_alloc(npages * sizeof(struct PageInfo));
    memset(pages, 0, npages * sizeof(struct PageInfo));
}
  • 3. page_init()
void
page_init(void)
{
    // The example code here marks all physical pages as free.
    // However this is not truly the case.  What memory is free?
    //  1) Mark physical page 0 as in use.
    //     This way we preserve the real-mode IDT and BIOS structures
    //     in case we ever need them.  (Currently we don't, but...)
    //  2) The rest of base memory, [PGSIZE, npages_basemem * PGSIZE)
    //     is free.
    //  3) Then comes the IO hole [IOPHYSMEM, EXTPHYSMEM), which must
    //     never be allocated.
    //  4) Then extended memory [EXTPHYSMEM, ...).
    //     Some of it is in use, some is free. Where is the kernel
    //     in physical memory?  Which pages are already in use for
    //     page tables and other data structures?
    //
    // Change the code to reflect this.
    // NB: DO NOT actually touch the physical memory corresponding to
    // free pages!
    size_t i;

    // boot_alloc(0) returns the first free virtual address after the kernel
    // image and the pages[] array; convert it to a physical address.
    uint32_t first_free_pa = (uint32_t) boot_alloc(0) - KERNBASE;

    // Pages in [lower_p, upper_p) cover the IO hole plus the kernel and the
    // structures boot_alloc() has handed out so far; they must never be freed.
    int lower_p = PGNUM(IOPHYSMEM);
    int upper_p = PGNUM(ROUNDUP(first_free_pa, PGSIZE));

    // pages[] was zeroed in mem_init(), so pages[1].pp_link is already NULL;
    // start the free list with physical page 1.
    page_free_list = &pages[1];

    for (i = 0; i < npages; i++) {
        if (i == 0 || i == 1) {
            // Page 0 stays in use (real-mode IDT and BIOS structures);
            // page 1 is already the tail of the free list.
            pages[i].pp_ref = 0;
            continue;
        }
        if (i >= 2 && i < npages_basemem) {
            // The rest of base memory is free.
            pages[i].pp_ref = 0;
            pages[i].pp_link = page_free_list;
            page_free_list = &pages[i];
            continue;
        }
        if (lower_p <= i && i < upper_p) {
            // IO hole and kernel: in use, kept off the free list.
            pages[i].pp_ref = 1;
            continue;
        }
        // Remaining extended memory is free.
        pages[i].pp_ref = 0;
        pages[i].pp_link = page_free_list;
        page_free_list = &pages[i];
    }
}
  • 4. page_alloc()
struct PageInfo *
page_alloc(int alloc_flags)
{
    // Fill this function in
    if (page_free_list == NULL)
        return NULL;

    struct PageInfo *res = page_free_list;
    page_free_list = page_free_list->pp_link;
    res->pp_link = NULL;                // the allocated page leaves the free list

    if (alloc_flags & ALLOC_ZERO)
        // memset needs a kernel virtual address; page2kva converts the
        // page's physical address to the corresponding virtual address.
        memset(page2kva(res), '\0', PGSIZE);
    return res;
}
  • 5. page_free()
void
page_free(struct PageInfo *pp)
{
    // Fill this function in
    // Hint: You may want to panic if pp->pp_ref is nonzero or
    // pp->pp_link is not NULL.
    //cprintf("page_free\r\n");
    assert(pp->pp_ref == 0);
    pp->pp_link = page_free_list;
    page_free_list = pp;

}

Part 1 result screenshot: (image not preserved)


Part 2: Virtual Memory

Exercise 2

Look at chapters 5 and 6 of the Intel 80386 Reference Manual, if you haven’t done so already. Read the sections about page translation and page-based protection closely (5.2 and 6.4). We recommend that you also skim the sections about segmentation; while JOS uses paging for virtual memory and protection, segment translation and segment-based protection cannot be disabled on the x86, so you will need a basic understanding of it.

Virtual, Linear, and Physical Addresses

Exercise 3

While GDB can only access QEMU’s memory by virtual address, it’s often useful to be able to inspect physical memory while setting up virtual memory. Review the QEMU monitor commands from the lab tools guide, especially the xp command, which lets you inspect physical memory. To access the QEMU monitor, press Ctrl-a c in the terminal (the same binding returns to the serial console).

Use the xp command in the QEMU monitor and the x command in GDB to inspect memory at corresponding physical and virtual addresses and make sure you see the same data.

Our patched version of QEMU provides an info pg command that may also prove useful: it shows a compact but detailed representation of the current page tables, including all mapped memory ranges, permissions, and flags. Stock QEMU also provides an info mem command that shows an overview of which ranges of virtual memory are mapped and with what permissions.
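
For example, once the kernel is running, xp/8x 0x100000 in the QEMU monitor and x/8x 0xf0100000 in GDB should print the same eight words, since the kernel is loaded at physical address 0x100000 and mapped at KERNBASE + 0x100000 = 0xf0100000.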

  • All C pointer values are virtual addresses.
  • pp_ref (in struct PageInfo): in general, this count should equal the number of times the physical page appears below UTOP (0xeec00000) in all page tables (the mappings above UTOP are mostly set up at boot time by the kernel and should never be freed, so there’s no need to reference count them).
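
For reference, the connection between pp_ref and the allocator is page_decref(), which is already provided in kern/pmap.c and looks roughly like this:

void
page_decref(struct PageInfo *pp)
{
    // Drop one reference; once no mapping uses the page any more,
    // hand it back to the free list.
    if (--pp->pp_ref == 0)
        page_free(pp);
}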

Page Table Management

Exercise 4

In the file kern/pmap.c, you must implement code for the following functions.

        pgdir_walk()
        boot_map_region()
        page_lookup()
        page_remove()
        page_insert()

check_page(), called from mem_init(), tests your page table management routines. You should make sure it reports success before proceeding.

pte_t *
pgdir_walk(pde_t *pgdir, const void *va, int create)
{
    // Fill this function in
    pte_t *pt;      // kernel virtual address of the page table

    if (!(pgdir[PDX(va)] & PTE_P)) {
        if (!create)
            return NULL;

        struct PageInfo *page = page_alloc(ALLOC_ZERO);
        if (page == NULL)
            return NULL;

        page->pp_ref++;
        // The page directory entry must hold the *physical* address of the
        // new page table, because the MMU walks these structures using
        // physical addresses.
        pgdir[PDX(va)] = page2pa(page) | PTE_P | PTE_W | PTE_U;
    }

    // PTE_ADDR strips the flag bits, leaving the page table's physical
    // address; KADDR turns it into a kernel virtual address we can index.
    pt = (pte_t *) KADDR(PTE_ADDR(pgdir[PDX(va)]));
    return &pt[PTX(va)];
}

boot_map_region():

static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
    // Fill this function in
    size_t offset;
    pte_t *pte;

    // Map the region one page at a time; size is assumed to be a
    // multiple of PGSIZE.
    for (offset = 0; offset < size; offset += PGSIZE) {
        pte = pgdir_walk(pgdir, (void *) va, 1);
        if (pte == NULL)
            panic("boot_map_region: out of memory allocating page tables");
        *pte = pa | perm | PTE_P;
        va += PGSIZE;
        pa += PGSIZE;
    }
}

page_lookup()

struct PageInfo *
page_lookup(pde_t *pgdir, void *va, pte_t **pte_store)
{
    // Fill this function in
    pte_t *pte = pgdir_walk(pgdir, va, 0);      // don't create missing page tables

    if (pte == NULL)
        return NULL;

    if (pte_store != NULL)
        *pte_store = pte;

    if (*pte & PTE_P)
        return pa2page(PTE_ADDR(*pte));

    return NULL;
}

page_remove():

void
page_remove(pde_t *pgdir, void *va)
{
    // Fill this function in
    pte_t *pte = NULL;
    struct PageInfo *pp = page_lookup(pgdir, va, &pte);

    if (pp == NULL)
        return;                 // nothing mapped at va: silently do nothing

    page_decref(pp);            // frees the physical page if pp_ref drops to 0
    *pte = 0;                   // clear the page table entry
    // tlb_invalidate() is provided in kern/pmap.c: it executes invlpg so the
    // TLB drops any cached translation for va; without it the CPU could keep
    // using the stale mapping we just removed.
    tlb_invalidate(pgdir, va);
}

page_insert():

int
page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
    // Fill this function in
    pte_t *pte = pgdir_walk(pgdir, va, 1);      // create the page table if needed
    if (pte == NULL)
        return -E_NO_MEM;

    if (PTE_ADDR(*pte) != page2pa(pp)) {
        // A different page (or nothing) is mapped at va: remove it first.
        page_remove(pgdir, va);
    } else {
        // Corner case: the same pp is being re-inserted at the same va.
        // Drop the ref here so the ++ below leaves the count unchanged,
        // and skip page_remove so the page is not accidentally freed.
        pp->pp_ref--;
    }

    *pte = page2pa(pp) | perm | PTE_P;
    pp->pp_ref++;
    tlb_invalidate(pgdir, va);
    return 0;
}

Call relationships among these functions (diagram): http://leanote.com/file/outputImage?fileId=55e5c4eb38f41128cb000202


Part 3: Kernel Address Space

Permissions and Fault Isolation

Initializing the Kernel Address Space

Exercise 5

Fill in the missing code in mem_init() after the call to check_page().

Your code should now pass the check_kern_pgdir() and check_page_installed_pgdir() checks.

Code added to mem_init():

    //////////////////////////////////////////////////////////////////////
    // Map 'pages' read-only by the user at linear address UPAGES
    // Permissions:
    //    - the new image at UPAGES -- kernel R, user R
    //      (ie. perm = PTE_U | PTE_P)
    //    - pages itself -- kernel RW, user NONE
    // Your code goes here:
    boot_map_region(kern_pgdir,UPAGES,PTSIZE,PADDR((uintptr_t *) pages), PTE_U);

    //////////////////////////////////////////////////////////////////////
    // Use the physical memory that 'bootstack' refers to as the kernel
    // stack.  The kernel stack grows down from virtual address KSTACKTOP.
    // We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
    // to be the kernel stack, but break this into two pieces:
    //     * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
    //     * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
    //       the kernel overflows its stack, it will fault rather than
    //       overwrite memory.  Known as a "guard page".
    //     Permissions: kernel RW, user NONE
    // Your code goes here:
    boot_map_region(kern_pgdir,KSTACKTOP - KSTKSIZE,KSTKSIZE,PADDR((uintptr_t *) bootstack),PTE_W);
    //boot_map_region(kern_pgdir,KSTACKTOP - PTSIZE,PTSIZE - KSTKSIZE,0,0);

    //////////////////////////////////////////////////////////////////////
    // Map all of physical memory at KERNBASE.
    // Ie.  the VA range [KERNBASE, 2^32) should map to
    //      the PA range [0, 2^32 - KERNBASE)
    // We might not have 2^32 - KERNBASE bytes of physical memory, but
    // we just set up the mapping anyway.
    // Permissions: kernel RW, user NONE
    // Your code goes here:
    boot_map_region(kern_pgdir,KERNBASE,0xffffffff - KERNBASE,(physaddr_t) 0,PTE_W);
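
A quick cross-check for Question 1 below: the KERNBASE mapping alone covers 2^32 - KERNBASE = 0x10000000 bytes = 64 chunks of PTSIZE, filling page directory entries 960 (PDX(KERNBASE)) through 1023; the UPAGES mapping fills entry PDX(0xef000000) = 956; the kernel stack falls in entry 959; and the UVPT self-mapping installed by the provided code earlier in mem_init() uses entry 957.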

Questions & Answers:

1 What entries (rows) in the page directory have been filled in at this point? What addresses do they map and where do they point? In other words, fill out this table as much as possible:

Entry    Base Virtual Address    Points to (logically):
1023     ?                       Page table for top 4MB of phys memory
1022     ?                       ?
 .       ?                       ?
 .       ?                       ?
 .       ?                       ?
2        0x00800000              ?
1        0x00400000              ?
0        0x00000000              [see next question]

2 We have placed the kernel and user environment in the same address space. Why will user programs not be able to read or write the kernel’s memory? What specific mechanisms protect the kernel memory?
3 What is the maximum amount of physical memory that this operating system can support? Why?
4 How much space overhead is there for managing memory, if we actually had the maximum amount of physical memory? How is this overhead broken down?
5 Revisit the page table setup in kern/entry.S and kern/entrypgdir.c. Immediately after we turn on paging, EIP is still a low number (a little over 1MB). At what point do we transition to running at an EIP above KERNBASE? What makes it possible for us to continue executing at a low EIP between when we enable paging and when we begin running at an EIP above KERNBASE? Why is this transition necessary?


  • 1. In the “Points to” column, the logical addresses pointed to start at UPAGES (0xef000000) and go upward, each entry’s value 4 bytes higher than the previous one; the Base Virtual Address column increases by one PTSIZE (0x00400000) per entry.

  • 2. For safety: user programs obviously must not be able to touch kernel data. The mechanism is the protection machinery described in the notes above: the CPL/DPL privilege checks plus the user bit in each page table entry.

  • 3. Naively 4GB: a page directory (PDT) has 1024 entries, each of which can point to a page table. But the PTSIZE region above UPAGES can only hold roughly 1/3 M struct PageInfo entries, and each one describes a 4KB page, so the OS can support only about 1.3GB of physical memory.

  • 4. One page directory plus 1024 page tables. Although the page directory itself is only 4KB, the entire PTSIZE (4MB) region above UVPT (0xef400000) holds nothing but that single page directory, so the total is 1025 × 4KB, roughly 4MB.

  • 5. entry.S contains the following piece of assembly code:

.globl entry
entry:
    movw    $0x1234,0x472          # warm boot

    # We haven't set up virtual memory yet, so we're running from
    # the physical address the boot loader loaded the kernel at: 1MB
    # (plus a few bytes).  However, the C code is linked to run at
    # KERNBASE+1MB.  Hence, we set up a trivial page directory that
    # translates virtual addresses [KERNBASE, KERNBASE+4MB) to
    # physical addresses [0, 4MB).  This 4MB region will be
    # sufficient until we set up our real page table in mem_init
    # in lab 2.

    # Load the physical address of entry_pgdir into cr3.  entry_pgdir
    # is defined in entrypgdir.c.
    movl    $(RELOC(entry_pgdir)), %eax
    movl    %eax, %cr3
    # Turn on paging.
    movl    %cr0, %eax
    orl $(CR0_PE|CR0_PG|CR0_WP), %eax
    movl    %eax, %cr0

    # Now paging is enabled, but we're still running at a low EIP
    # (why is this okay?).  Jump up above KERNBASE before entering
    # C code.
    mov $relocated, %eax
    jmp *%eax
relocated:
  • The physical address of entry_pgdir is loaded into %cr3 and the paging flags in %cr0 are turned on; then the final jmp *%eax moves EIP up to the new high address (the relocated label) and execution continues there.
  • As for the question in the comment “# (why is this okay?)”: my first thought was that it works because the virtual addresses we are executing at happen to map to exactly this physical region, so nothing conflicts, but I was not confident in that answer. Looking into it further: entrypgdir.c predefines two arrays. entry_pgtable[] holds the preset contents of a second-level page table (figure omitted); each element’s value is the starting address of one physical page together with its flag bits.

    entry_pgdir[] holds the preset page directory contents, as follows:

    __attribute__((__aligned__(PGSIZE)))
    pde_t entry_pgdir[NPDENTRIES] = {
    // Map VA's [0, 4MB) to PA's [0, 4MB)
    [0]
        = ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P,
    // Map VA's [KERNBASE, KERNBASE+4MB) to PA's [0, 4MB)
    [KERNBASE>>PDXSHIFT]
        = ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P + PTE_W
    };
    

    That is, entry 0 of entry_pgdir[] is already set up so that virtual addresses [0, 0x400000) map to physical addresses [0, 0x400000); likewise, entry 960 ([KERNBASE>>PDXSHIFT]) maps virtual addresses [KERNBASE, KERNBASE+0x400000) to physical addresses [0, 0x400000). It is this second entry that will be used once the code in entry.S finishes: it maps the high virtual addresses at which the kernel’s instructions and data are linked to run down to the low physical addresses where the boot loader actually loaded them (0x100000). This mapping limits the kernel’s instructions plus data to 4MB (in practice about 3MB, since the kernel starts at the 1MB mark).

  • This transition is necessary because placing the kernel at the high end of the virtual address space is the established convention.