kvm: hardware assisted paging

Latest recommended article published 2024-09-18 18:16:42

gudujianjsk


CPU vendors began adding hardware support for virtualizing the memory management unit (vMMU) circa 2009, as an addition to Intel's VT-x (vmx flag). Historically, guest physical addresses (gpa) were translated to host physical addresses (hpa) in software, using shadow page tables. These tables are kept synchronized with the guest's page tables and are one of the main sources of overhead in virtual machines, as they incur expensive VM exits. A common way of keeping the shadow pages up to date is to write-protect the guest's page tables, so that any change triggers a page fault that is intercepted by the VMM, which emulates the access and updates the shadow tables accordingly. This, of course, is transparent to the guest.

Another major problem is that TLB semantics require flushes upon context switching, as a newly scheduled process needs the TLB to be empty so that it caches only entries belonging to that process's address space. To overcome this, CPUs now incorporate tags into the TLB, also known as VPIDs, which associate cached translations with their owner and thus reduce the number of flushes.
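The effect of tagging can be illustrated with a toy model. The sketch below is not how any real CPU or KVM implements its TLB; the structure, the 8-entry direct-mapped layout, and every function name are hypothetical and exist only to contrast a flush on every switch with a tag change that leaves cached entries in place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 8  /* arbitrary size chosen for the illustration */

/* One cached translation, labelled with the address space that owns it. */
struct tlb_entry {
    bool     valid;
    uint16_t tag;   /* hypothetical owner tag, in the spirit of a VPID */
    uint64_t vpn;   /* virtual page number */
    uint64_t pfn;   /* physical frame number */
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    uint16_t current_tag;  /* tag of the address space currently running */
    unsigned flushes;      /* number of full flushes performed */
};

/* Untagged behaviour: every switch invalidates all cached translations. */
static void tlb_switch_untagged(struct tlb *t, uint16_t new_tag)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = false;
    t->current_tag = new_tag;
    t->flushes++;
}

/* Tagged behaviour: only the current tag changes; entries stay cached. */
static void tlb_switch_tagged(struct tlb *t, uint16_t new_tag)
{
    t->current_tag = new_tag;
}

/* Cache a translation for the current owner (direct-mapped by vpn). */
static void tlb_insert(struct tlb *t, uint64_t vpn, uint64_t pfn)
{
    struct tlb_entry *slot = &t->e[vpn % TLB_ENTRIES];
    slot->valid = true;
    slot->tag = t->current_tag;
    slot->vpn = vpn;
    slot->pfn = pfn;
}

/* A hit requires a valid entry whose tag and vpn both match. */
static bool tlb_lookup(const struct tlb *t, uint64_t vpn, uint64_t *pfn)
{
    const struct tlb_entry *slot = &t->e[vpn % TLB_ENTRIES];
    if (!slot->valid || slot->tag != t->current_tag || slot->vpn != vpn)
        return false;
    *pfn = slot->pfn;
    return true;
}
```

With tagged switches, an entry inserted under one tag misses while another tag is current and hits again once the original owner is switched back in, with no flush counted. With the untagged switch, everything is lost on each switch.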

With hardware vMMUs, in order to avoid the VMM overhead of shadow paging, the guest is left alone to update its own page tables, while the hardware maintains a second set of page tables that maps gpa to hpa. Intel calls these Extended Page Tables (EPT). Having two sets of page tables means that when a guest translates an address, both must be walked (sometimes referred to as a 2D page walk). http://blog.chinaunix.net/uid-1858380-id-3205061.html

So, for programs with poor locality that are cache-unfriendly, hardware support can cost more than its software equivalent. When a TLB miss occurs and the guest does a page walk, the entire EPT must be walked as well at each hierarchical level to obtain the hpa. This is worse for 64-bit guests than for 32-bit ones, as the 64-bit address space requires more levels of translation (PML4, PDP, PD, PT).
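To put a rough number on that cost, the sketch below counts memory references for a single translation after a TLB miss. It assumes the simplest possible model: no paging-structure caches, every guest table pointer is a gpa that needs a full EPT walk before the guest entry can be read, and the final gpa needs one more EPT walk. The function names are made up for this illustration.

```c
/*
 * Memory references for one nested (2D) translation after a TLB miss.
 *
 * For each guest level, the hardware first walks all host (EPT) levels
 * to locate the guest table, then reads the guest entry itself:
 * host_levels + 1 references. The gpa produced by the last guest level
 * must finally be translated too: host_levels more references.
 *
 * Simplifying assumption: no paging-structure caches absorb any step.
 */
unsigned nested_walk_refs(unsigned guest_levels, unsigned host_levels)
{
    unsigned per_guest_level = host_levels + 1;
    unsigned final_gpa_walk = host_levels;

    return guest_levels * per_guest_level + final_gpa_walk;
}

/* A native (non-virtualized) walk reads one entry per level. */
unsigned native_walk_refs(unsigned levels)
{
    return levels;
}
```

Under this model, a 4-level guest on a 4-level EPT needs 4 * 5 + 4 = 24 references instead of 4 natively, while a 2-level 32-bit guest on the same EPT needs 2 * 5 + 4 = 14.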

KVM's implementation of EPT is quite distinctive and uses both the guest's tables and the hardware's to translate addresses. When a guest virtual address needs to be translated to a guest physical one, the gva_to_gpa() function is called:

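/* FNAME() expands to a per-paging-mode function name (see paging_tmpl.h). */
static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr)
{
        struct guest_walker walker;
        gpa_t gpa = UNMAPPED_GVA;       /* returned if the walk fails */
        int r;

        /* Walk the guest's own page tables for vaddr. */
        r = FNAME(walk_addr)(&walker, vcpu, vaddr, 0, 0, 0);

        if (r) {
                /* Frame number to address, then add the in-page offset. */
                gpa = gfn_to_gpa(walker.gfn);
                gpa |= vaddr & ~PAGE_MASK;
        }

        return gpa;
}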
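The last two assignments in that function are plain address arithmetic: the guest frame number is shifted into a page-aligned address, and the low bits of the virtual address are carried over as the offset within the page. A minimal standalone sketch of that step follows, assuming 4 KiB pages. The typedefs and macros mirror the kernel's names but are redefined locally so the snippet compiles in user space, and compose_gpa() is a hypothetical helper, not a KVM function.

```c
#include <stdint.h>

/* Local stand-ins for the kernel definitions; 4 KiB pages are assumed. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

typedef uint64_t gpa_t;  /* guest physical address */
typedef uint64_t gva_t;  /* guest virtual address */
typedef uint64_t gfn_t;  /* guest frame number */

/* Frame number to page-aligned guest physical address. */
static gpa_t gfn_to_gpa(gfn_t gfn)
{
    return (gpa_t)gfn << PAGE_SHIFT;
}

/* Hypothetical helper: combine the frame found by the walk with the
 * in-page offset taken from the virtual address. */
static gpa_t compose_gpa(gfn_t gfn, gva_t vaddr)
{
    return gfn_to_gpa(gfn) | (vaddr & ~PAGE_MASK);
}
```

For example, frame 0x1234 and a virtual address ending in 0xeef give the guest physical address 0x1234eef.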

If the walk cannot complete because a mapping is not present, a fault is raised, and tdp_page_fault() (tdp stands for two-dimensional paging) is invoked through the EPT violation handler, handle_ept_violation(), to translate the gpa to an hpa. A new page table entry is created by reusing the shadow page code through mmu_set_spte(), and it is added to the beginning of the page list through pte_list_add(). This way, the next time the guest virtual address is accessed, the mapping will already be in place, walk_addr() will succeed, and the gpa can be returned without further ado.
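The fault-then-retry pattern described above can be shown with a toy model of lazy population. Nothing here is KVM code: the flat array standing in for the second-level tables, the frame allocator, and all names are hypothetical. The point is only that the first access to a frame takes the slow path and later accesses find the entry already installed.

```c
#include <stdint.h>

#define TOY_PAGES  16          /* size of the toy guest physical space */
#define NOT_MAPPED UINT64_MAX  /* marker for a missing mapping */

/* Toy stand-in for the gpa-to-hpa tables: one slot per guest frame. */
struct toy_ept {
    uint64_t pfn[TOY_PAGES];
    uint64_t next_free_pfn;    /* trivial bump allocator for host frames */
    unsigned violations;       /* how many times the slow path ran */
};

void toy_ept_init(struct toy_ept *ept)
{
    for (unsigned i = 0; i < TOY_PAGES; i++)
        ept->pfn[i] = NOT_MAPPED;
    ept->next_free_pfn = 0x100;
    ept->violations = 0;
}

/* Hypothetical fault handler: pick a host frame and install the entry. */
static void toy_handle_violation(struct toy_ept *ept, uint64_t gfn)
{
    ept->violations++;
    ept->pfn[gfn] = ept->next_free_pfn++;
}

/* Translate a guest frame, populating the mapping on first use. */
uint64_t toy_translate(struct toy_ept *ept, uint64_t gfn)
{
    if (gfn >= TOY_PAGES)
        return NOT_MAPPED;             /* outside the toy address space */
    if (ept->pfn[gfn] == NOT_MAPPED)
        toy_handle_violation(ept, gfn); /* slow path, taken once per frame */
    return ept->pfn[gfn];
}
```

Translating the same frame twice runs the handler only once; the second call returns the cached mapping directly.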