Performance Evaluation of Intel EPT Hardware Assist


gudujianjsk

Category: Kernel and Drivers. Tags: performance, translation, system, nested, numbers, table


Introduction

For the majority of common workloads, performance in a virtualized environment is close to that in a native environment. Virtualization does create some overhead, however, which comes from virtualizing the CPU, the MMU (Memory Management Unit), and the I/O devices. In some of their recent x86 processors, AMD and Intel have begun to provide hardware extensions to help bridge this performance gap. In 2006, both vendors introduced their first-generation hardware support for x86 virtualization with AMD-Virtualization™ (AMD-V™) and Intel® VT-x technologies. Recently Intel introduced its second generation of hardware support, which incorporates MMU virtualization, called Extended Page Tables (EPT).
We evaluated EPT performance by comparing it to the performance of our software-only shadow page table technique on an EPT-enabled Intel system. From our studies we conclude that EPT-enabled systems can improve performance compared to using shadow paging for MMU virtualization. EPT provides performance gains of up to 48% for MMU-intensive benchmarks and up to 600% for MMU-intensive microbenchmarks. We have also observed that although EPT increases memory access latencies for a few workloads, this cost can be reduced by effectively using large pages in the guest and the hypervisor.

        NOTE Many of the workloads presented in this paper are similar to those used in our recent paper about AMD RVI performance (Performance Evaluation of AMD RVI Hardware Assist). Because the papers used different ESX versions, however, the results are not directly comparable.

Background
Prior to the introduction of hardware support for virtualization, the VMware® virtual machine monitor (VMM) used software techniques for virtualizing x86 processors. We used binary translation (BT) for instruction set virtualization, shadow paging for MMU virtualization, and device emulation for device virtualization.
With the advent of Intel-VT in 2006, the VMM used VT for instruction set virtualization on Intel processors that supported this feature. Because older CPUs lacked hardware support for MMU virtualization, the VMM still used shadow paging for MMU virtualization. The shadow page tables stored information about the physical location of guest memory. Under shadow paging, in order to provide transparent MMU virtualization, the VMM intercepted guest page table updates to keep the shadow page tables coherent with the guest page tables. This caused some overhead in the virtual execution of those applications for which the guest had to frequently update its page table structures.
With the introduction of EPT, the VMM can now rely on hardware to eliminate the need for shadow page tables. This removes much of the overhead otherwise incurred to keep the shadow page tables up-to-date. We describe these various paging methods in more detail in the next section and describe our experimental methodologies, benchmarks, and results in subsequent sections. Finally, we conclude by providing a summary of our performance experience with EPT.
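
The shadow paging scheme described above can be illustrated with a minimal sketch. It is not the ESX implementation: the class name `ShadowPager`, its methods, and the flat dictionaries standing in for multi-level page tables are all assumptions made for illustration. The sketch assumes the shadow table holds direct logical-to-machine mappings and shows that every guest page table update costs one interception by the VMM.

```python
class ShadowPager:
    """Toy model of shadow paging (hypothetical names, flat tables)."""

    def __init__(self, ppn_to_mpn):
        self.ppn_to_mpn = ppn_to_mpn  # VMM-private map: guest-physical -> machine page
        self.guest_pt = {}            # guest page table: logical -> guest-physical page
        self.shadow_pt = {}           # shadow page table: logical -> machine page
        self.intercepts = 0           # how many guest updates the VMM had to intercept

    def guest_write_pte(self, lpn, ppn):
        """The guest updates a page table entry; the VMM intercepts the write."""
        self.intercepts += 1
        self.guest_pt[lpn] = ppn
        # Keep the shadow entry coherent with the guest entry.
        self.shadow_pt[lpn] = self.ppn_to_mpn[ppn]

    def guest_clear_pte(self, lpn):
        """The guest unmaps a page; the shadow entry must be dropped as well."""
        self.intercepts += 1
        self.guest_pt.pop(lpn, None)
        self.shadow_pt.pop(lpn, None)


pager = ShadowPager({5: 90, 6: 91})
pager.guest_write_pte(1, 5)   # map logical page 1 to guest-physical page 5
pager.guest_write_pte(1, 6)   # remap it; the shadow entry follows
pager.guest_clear_pte(1)      # unmap; both tables lose the entry
```

A guest that rewrites its page tables frequently drives `intercepts` up in proportion, which is the overhead that hardware MMU virtualization aims to remove.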

MMU Architecture and Performance
In a native system the operating system maintains a mapping of logical page numbers (LPNs) to physical page numbers (PPNs) in page table structures (see Figure 1). When a logical address is accessed, the hardware walks these page tables to determine the corresponding physical address. For faster memory access the x86 hardware caches the most recently used LPN->PPN mappings in its translation lookaside buffer (TLB).
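
As a rough illustration of the native case, the sketch below models a page table as a flat dictionary and the TLB as a small least-recently-used cache. The names (`ToyMMU`, `translate`), the 4 KB page size, the 4-entry TLB, and the LRU replacement policy are assumptions made for the example; real x86 page tables are multi-level and real TLB replacement policies differ.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                      # assumed 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class ToyMMU:
    """Flat LPN->PPN page table with a small LRU cache standing in for the TLB."""

    def __init__(self, page_table, tlb_entries=4):
        self.page_table = page_table  # dict: LPN -> PPN
        self.tlb = OrderedDict()      # cached LPN -> PPN mappings, oldest first
        self.tlb_entries = tlb_entries
        self.walks = 0                # number of page table walks performed

    def translate(self, logical_addr):
        lpn = logical_addr >> PAGE_SHIFT
        offset = logical_addr & PAGE_MASK
        if lpn in self.tlb:
            self.tlb.move_to_end(lpn)         # TLB hit: mark as most recently used
            ppn = self.tlb[lpn]
        else:
            self.walks += 1                   # TLB miss: walk the page table
            ppn = self.page_table[lpn]
            self.tlb[lpn] = ppn
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)  # evict the least recently used entry
        return (ppn << PAGE_SHIFT) | offset


mmu = ToyMMU({0x10: 0x2A, 0x11: 0x07})
first = mmu.translate(0x10123)    # miss: the page table is walked
second = mmu.translate(0x10456)   # same page: served from the TLB
```

Both accesses fall in logical page 0x10, so only the first one pays for a walk.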

In a virtualized system the guest operating system maintains page tables just as the operating system in a native system does, but in addition the VMM maintains a mapping of PPNs to machine page numbers (MPNs), as described in the following two sections, “Software MMU” and “Hardware MMU.”

Hardware MMU
Using EPT, the guest operating system continues to maintain LPN->PPN mappings in the guest page tables, but the VMM maintains PPN->MPN mappings in an additional level of page tables, called nested page tables (see Figure 3). In this case both the guest page tables and the nested page tables are exposed to the hardware.
When a logical address is accessed, the hardware walks the guest page tables as in the case of native execution, but for every PPN accessed during the guest page table walk, the hardware also walks the nested page tables to determine the corresponding MPN. This composite translation eliminates the need to maintain shadow page tables and synchronize them with the guest page tables. However, the extra operation also increases the cost of a page walk, thereby impacting the performance of applications that stress the TLB. This cost can be reduced by using large pages, thus reducing the stress on the TLB for applications with good spatial locality. For optimal performance the ESX VMM and VMkernel aggressively try to use large pages for their own memory when EPT is used.
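
The cost of the composite walk and the benefit of large pages can be made concrete with some back-of-the-envelope arithmetic. The sketch below is not taken from the paper: the function names are hypothetical, the 4-level tables assume standard x86-64 paging with 4 KB pages (3 levels with 2 MB pages), the 64-entry TLB is an illustrative figure, and the reference count is a worst case that ignores the paging-structure caches real processors use to shorten walks.

```python
def composite_translate(lpn, guest_pt, nested_pt):
    """LPN -> PPN via the guest page table, then PPN -> MPN via the nested one."""
    ppn = guest_pt[lpn]
    return nested_pt[ppn]


def nested_walk_references(guest_levels=4, nested_levels=4):
    """Worst-case memory references for one TLB miss under nested paging.

    Each guest page table entry sits at a guest-physical address, so reaching
    it takes a full nested walk plus the read of the entry itself. The final
    guest-physical address of the data needs one more nested walk.
    """
    return guest_levels * (nested_levels + 1) + nested_levels


def tlb_reach_bytes(entries, page_size):
    """Amount of memory the TLB can map without a miss."""
    return entries * page_size


KB, MB = 1024, 1024 * 1024
mpn = composite_translate(0x10, {0x10: 0x2A}, {0x2A: 0x900})
refs_4k = nested_walk_references(4, 4)   # 4 KB pages in guest and nested tables
refs_2m = nested_walk_references(3, 3)   # 2 MB pages remove one level from each
reach_4k = tlb_reach_bytes(64, 4 * KB)   # hypothetical 64-entry TLB
reach_2m = tlb_reach_bytes(64, 2 * MB)
```

Under these assumptions a miss costs up to 24 references with 4 KB pages against 4 for a native 4-level walk, and 15 with 2 MB pages at both levels. The same 64 entries cover 256 KB with 4 KB pages and 128 MB with 2 MB pages, so large pages reduce both how often a miss occurs and how much each one costs.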