KVM: hardware-assisted paging

CPU vendors began adding hardware support for a virtual memory management unit (vMMU) circa 2009, as an extension to Intel's VT-x (vmx flag). Historically, guest physical addresses (gpa) were translated to host physical addresses (hpa) in software, using shadow page tables. These tables must be kept synchronized with the guest's page tables, and they are one of the main sources of overhead in virtual machines because they cause expensive VM exits. A common way of keeping the shadow pages up to date is to write-protect the guest's page-table pages, so that any change to them triggers a page fault that is intercepted by the VMM, which emulates the access and updates the shadow tables accordingly. This, of course, is transparent to the guest. Another major problem is that TLB semantics require a flush on every context switch, since a newly scheduled process must start with a TLB that caches only entries belonging to its own address space. To overcome this, CPUs now add tags to TLB entries (also known as VPID), which associate cached translations with the address space they belong to and thus reduce the number of flushes.
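The effect of tagging can be illustrated with a toy model. The sketch below is not how any real CPU or KVM implements VPID; the structure layout, the `tlb_*` names and the 8-entry size are all assumptions made for illustration. It only shows why a lookup keyed on (tag, virtual page) lets entries from different address spaces coexist without a flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TLB_ENTRIES 8   /* arbitrary toy size */

/* One toy TLB entry; "tag" plays the role of a VPID-like identifier. */
struct tlb_entry {
        bool     valid;
        uint16_t tag;
        uint64_t vpn;   /* virtual page number */
        uint64_t pfn;   /* physical frame number */
};

struct tlb {
        struct tlb_entry e[TLB_ENTRIES];
        unsigned next;  /* round-robin replacement cursor */
};

/* Invalidate everything: what an untagged TLB needs on every switch. */
void tlb_flush(struct tlb *t)
{
        memset(t, 0, sizeof(*t));
}

/* Cache a translation for the address space identified by "tag". */
void tlb_insert(struct tlb *t, uint16_t tag, uint64_t vpn, uint64_t pfn)
{
        struct tlb_entry *e = &t->e[t->next];

        t->next = (t->next + 1) % TLB_ENTRIES;
        e->valid = true;
        e->tag = tag;
        e->vpn = vpn;
        e->pfn = pfn;
}

/* Hit only if both the tag and the virtual page number match. */
bool tlb_lookup(const struct tlb *t, uint16_t tag, uint64_t vpn, uint64_t *pfn)
{
        assert(pfn != NULL);
        for (unsigned i = 0; i < TLB_ENTRIES; i++) {
                const struct tlb_entry *e = &t->e[i];

                if (e->valid && e->tag == tag && e->vpn == vpn) {
                        *pfn = e->pfn;
                        return true;
                }
        }
        return false;
}
```

With this model, inserting the same virtual page number under tags 1 and 2 yields two independent entries, so switching between the two address spaces does not require calling `tlb_flush()`.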

With hardware vMMUs, in order to avoid the VMM overhead of shadow paging, the guest is left alone to update its own page tables, while the hardware walks a second set of page tables, maintained by the VMM, that maps gpa to hpa. Intel calls these Extended Page Tables (EPT). Having two sets of page tables means that when a guest translates an address, two dimensions must be walked (sometimes referred to as a 2D page walk). http://blog.chinaunix.net/uid-1858380-id-3205061.html

So, for cache-unfriendly programs with poor locality, hardware support can come at a greater cost than its software equivalent. When a TLB miss occurs and the guest's page tables are walked, for each hierarchical level the entire EPT must be walked as well to obtain the hpa. For 64-bit guests this is worse than for 32-bit ones, as the 64-bit address space requires more levels of translation (PML4, PDP, PD, PTE).
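To put rough numbers on this, the sketch below uses the commonly cited worst-case model for a 2D walk with no paging-structure caches: each guest level produces a gpa that needs a full nested walk plus the read of the guest entry itself, and the final gpa needs one more nested walk. The function names are hypothetical, and real hardware usually does better thanks to caching.

```c
#include <assert.h>

/*
 * Worst-case memory references for one TLB miss under nested paging:
 *   g guest levels, n nested (EPT) levels
 *   total = g * (n + 1) + n = (g + 1) * (n + 1) - 1
 */
unsigned nested_walk_refs(unsigned guest_levels, unsigned ept_levels)
{
        assert(guest_levels > 0 && ept_levels > 0);
        return (guest_levels + 1) * (ept_levels + 1) - 1;
}

/* Native (non-virtualized) walk: one memory reference per level. */
unsigned native_walk_refs(unsigned levels)
{
        return levels;
}
```

Under this model a 4-level guest on a 4-level EPT needs up to 24 references per miss, against 4 natively, while a 2-level 32-bit guest on the same EPT needs up to 14.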


KVM's implementation of EPT is somewhat unusual in that it uses both the guest's tables and the hardware's to translate addresses. When a guest virtual address needs to be translated to a guest physical one, the gva_to_gpa() function is called:
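static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr)
{
        struct guest_walker walker;
        gpa_t gpa = UNMAPPED_GVA;       /* returned if the walk fails */
        int r;

        /* Walk the guest's own page tables for vaddr. */
        r = FNAME(walk_addr)(&walker, vcpu, vaddr, 0, 0, 0);

        if (r) {
                /* Frame number -> frame base address, then add the page offset. */
                gpa = gfn_to_gpa(walker.gfn);
                gpa |= vaddr & ~PAGE_MASK;
        }

        return gpa;
}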
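To make the last two steps of that function concrete, here is a small standalone sketch of how a guest frame number and the page offset of the virtual address combine into a gpa. It assumes 4 KiB pages, and the typedefs, macros and `gfn_to_gpa()` defined here are local stand-ins that mirror the kernel names, not the kernel's own definitions; `compose_gpa()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86 without large pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

typedef uint64_t gfn_t;   /* guest frame number */
typedef uint64_t gpa_t;   /* guest physical address */
typedef uint64_t gva_t;   /* guest virtual address */

/* Local stand-in: a frame number becomes the base address of its page. */
gpa_t gfn_to_gpa(gfn_t gfn)
{
        return (gpa_t)gfn << PAGE_SHIFT;
}

/* Hypothetical helper: frame base ORed with the offset within the page. */
gpa_t compose_gpa(gfn_t gfn, gva_t vaddr)
{
        gpa_t gpa = gfn_to_gpa(gfn);

        assert((gpa & ~PAGE_MASK) == 0);   /* frame base is page aligned */
        gpa |= vaddr & ~PAGE_MASK;         /* keep only the low 12 bits */
        return gpa;
}
```

For example, `compose_gpa(0x1234, 0x7fffdeadbeef)` keeps the offset `0xeef` from the virtual address and yields `0x1234eef`.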

If the guest's walk fails and the gva-gpa mapping is not present, a page fault is raised, and tdp_page_fault() (TDP stands for two-dimensional paging) is invoked through an EPT violation, via handle_ept_violation(), to translate the gpa to an hpa. A new page table entry is created by reusing the shadow page code through mmu_set_spte(), and it is added to the beginning of the page list through pte_list_add(). This way, the next time the guest virtual address is accessed, it will already be in the guest's pages, walk_addr() will succeed, and the gpa can be returned without further ado.
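The fault-then-populate pattern described above can be sketched with a toy model. Nothing here is KVM code: the `toy_*` names, the fixed 16-frame table and the way host frames are handed out are all assumptions, chosen only to show that the first access to an unmapped frame takes a "violation" and installs the mapping, while later accesses find it in place.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_GFNS    16           /* toy guest with 16 page frames */
#define INVALID_PFN UINT64_MAX   /* marks an unmapped guest frame */

struct toy_ept {
        uint64_t pfn[TOY_GFNS];  /* gfn -> host pfn */
        unsigned violations;     /* how many faults were taken */
        uint64_t next_free_pfn;  /* trivial host frame allocator */
};

void toy_ept_init(struct toy_ept *ept)
{
        for (unsigned i = 0; i < TOY_GFNS; i++)
                ept->pfn[i] = INVALID_PFN;
        ept->violations = 0;
        ept->next_free_pfn = 0x100;   /* arbitrary starting host frame */
}

/* Stand-in for the fault handler: install the missing mapping. */
void toy_handle_violation(struct toy_ept *ept, uint64_t gfn)
{
        ept->violations++;
        ept->pfn[gfn] = ept->next_free_pfn++;
}

/* Translate gfn to a host pfn, faulting the mapping in on first use. */
uint64_t toy_access(struct toy_ept *ept, uint64_t gfn)
{
        assert(gfn < TOY_GFNS);
        if (ept->pfn[gfn] == INVALID_PFN)
                toy_handle_violation(ept, gfn);
        return ept->pfn[gfn];
}
```

Calling `toy_access()` twice on the same frame increments `violations` only once, which is the property the paragraph above relies on.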




