Chapter 9-01

gaoxiangnumber1

Category: Computer Systems: A Programmer's Perspective (深入理解计算机系统)

Article link: https://blog.csdn.net/gaoxiangnumber1/article/details/50650127

Please indicate the source: http://blog.csdn.net/gaoxiangnumber1.

l Virtual memory is an elegant interaction of hardware exceptions, hardware address translation, main memory, disk files, and kernel software that provides each process with a large, uniform and private address space.

l Virtual memory provides three important capabilities.

1. It uses main memory efficiently by treating it as a cache for an address space stored on disk, keeping only the active areas in main memory, and transferring data back and forth between disk and memory as needed.

2. It simplifies memory management by providing each process with a uniform address space.

3. It protects the address space of each process from corruption by other processes.

9.1 Physical and Virtual Addressing

l The main memory of a computer system is organized as an array of M contiguous byte-sized cells. Each byte has a unique physical address (PA). The first byte has an address of 0, the next byte an address of 1 and so on.

l The most natural way for a CPU to access memory would be to use physical addresses. We call this approach physical addressing.

l Figure 9.1 shows an example of physical addressing in the context of a load instruction that reads the word starting at physical address 4. When the CPU executes the load instruction, it generates an effective physical address and passes it to main memory over the memory bus. The main memory fetches the 4-byte word starting at physical address 4 and returns it to the CPU, which stores it in a register.

l Modern processors use a form of addressing known as virtual addressing, as shown in Figure 9.2.

l With virtual addressing, the CPU accesses main memory by generating a virtual address (VA), which is converted to the appropriate physical address before being sent to the memory. The task of converting a virtual address to a physical one is known as address translation. Dedicated hardware on the CPU chip called the memory management unit (MMU) translates virtual addresses on the fly, using a look-up table stored in main memory whose contents are managed by the operating system.

9.2 Address Spaces

l An address space is an ordered set of nonnegative integer addresses {0, 1, 2, . . .}. If the integers in the address space are consecutive, then we say that it is a linear address space. We will always assume linear address spaces.

l In a system with virtual memory, the CPU generates virtual addresses from an address space of N = 2ⁿ addresses called the virtual address space: {0, 1, 2, . . . , N − 1}.

l The size of an address space is characterized by the number of bits that are needed to represent the largest address. For example, a virtual address space with N = 2ⁿ addresses is called an n-bit address space. Modern systems typically support either 32-bit or 64-bit virtual address spaces.

l A system has a physical address space that corresponds to the M bytes of physical memory in the system: {0, 1, 2, . . . , M − 1}. M is not required to be a power of two, but we will assume that M = 2^m.

l The concept of an address space makes a clean distinction between data objects (bytes) and their attributes (addresses). So we can generalize and allow each data object to have multiple independent addresses, each chosen from a different address space. That is, each byte of main memory has a virtual address chosen from the virtual address space, and a physical address chosen from the physical address space.
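The relation between the number of address bits n and the size N = 2^n of an address space can be checked with a few lines of C. This is only a sketch: the function names address_space_size and address_bits are made up for illustration, and n is limited to values below 64 so that the result fits in a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* N = 2^n: number of addresses in an n-bit address space.
   Hypothetical helper; n < 64 so the result fits in a uint64_t. */
uint64_t address_space_size(unsigned n)
{
    assert(n < 64);
    return (uint64_t)1 << n;
}

/* Number of bits needed to represent `largest`, the largest address
   in the space. This is the n that characterizes the space's size. */
unsigned address_bits(uint64_t largest)
{
    unsigned bits = 0;
    while (largest != 0) {
        bits++;
        largest >>= 1;
    }
    return bits;
}
```

For a 32-bit space, the largest address is address_space_size(32) - 1, and address_bits applied to that value gives back 32.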

9.3 VM as a Tool for Caching

l A virtual memory is organized as an array of N contiguous byte-sized cells stored on disk. Each byte has a unique virtual address that serves as an index into the array. The contents of the array on disk are cached in main memory. As with any other cache in the memory hierarchy, the data on disk (the lower level) is partitioned into blocks that serve as the transfer units between the disk and the main memory (the upper level).

l VM systems handle this by partitioning the virtual memory into fixed-sized blocks called virtual pages (VPs). Each virtual page is P = 2^p bytes in size. Similarly, physical memory is partitioned into physical pages (PPs), also P bytes in size. Physical pages are also referred to as page frames.
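Because P = 2^p, finding the page that holds a given address is a shift, and finding the byte's position inside that page is a mask. The sketch below assumes p = 12 (4 KB pages); that value and the names PAGE_SHIFT, vpn_of, offset_of, and page_count are illustrative choices, not something the notes prescribe.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                          /* assumed p = 12 */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)  /* P = 2^p = 4096 bytes */

/* Virtual page number: which fixed-size block the address falls in. */
uint64_t vpn_of(uint64_t va) { return va >> PAGE_SHIFT; }

/* Page offset: position of the byte inside its page. */
uint64_t offset_of(uint64_t va) { return va & (PAGE_SIZE - 1); }

/* Number of pages in a space of `bytes` bytes (must be a multiple of P). */
uint64_t page_count(uint64_t bytes)
{
    assert(bytes % PAGE_SIZE == 0);
    return bytes / PAGE_SIZE;
}
```

The same split applies to physical addresses, since physical pages are also P bytes in size.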

l At any point in time, the set of virtual pages is partitioned into three disjoint subsets:

1. Unallocated: Pages that have not yet been allocated (or created) by the VM system. Unallocated blocks do not have any data associated with them, and thus do not occupy any space on disk.

2. Cached: Allocated pages that are currently cached in physical memory.

3. Uncached: Allocated pages that are not cached in physical memory.

l The example in Figure 9.3 shows a small virtual memory with eight virtual pages. Virtual pages 0 and 3 have not been allocated yet, and thus do not exist on disk. Virtual pages 1, 4, and 6 are cached in physical memory. Pages 2, 5, and 7 are allocated, but are not currently cached in main memory.

9.3.1 DRAM Cache Organization

l To help keep the different caches in the memory hierarchy straight, we will use the term SRAM (Static Random Access Memory) cache to denote the L1, L2, and L3 cache memories between the CPU and main memory, and the term DRAM (Dynamic Random Access Memory) cache to denote the VM system’s cache that caches virtual pages in main memory.

l Recall that DRAM is at least 10 times slower than SRAM and that disk is about 100,000 times slower than DRAM. Thus, misses in DRAM caches are very expensive compared to misses in SRAM caches, because DRAM cache misses are served from disk, while SRAM cache misses are usually served from DRAM-based main memory. Further, reading the first byte from a disk sector is about 100,000 times slower than reading successive bytes in the sector. The bottom line is that the organization of the DRAM cache is driven entirely by the enormous cost of misses.

l Because of the large miss penalty and the expense of accessing the first byte, virtual pages tend to be large, typically 4 KB to 2 MB. Due to the large miss penalty, DRAM caches are fully associative, that is, any virtual page can be placed in any physical page.

l The replacement policy on misses also assumes greater importance, because the penalty associated with replacing the wrong virtual page is so high. Thus, operating systems use more sophisticated replacement algorithms for DRAM caches than the hardware does for SRAM caches. (These replacement algorithms are beyond our scope here.)

l Finally, because of the large access time of disk, DRAM caches always use write-back instead of write-through.

9.3.2 Page Tables

l As with any cache, the VM system must be able to determine whether a virtual page is cached somewhere in DRAM and, if so, which physical page it is cached in. If there is a miss, the system must determine where the virtual page is stored on disk, select a victim page in physical memory, and copy the virtual page from disk to DRAM, replacing the victim page.

l These capabilities are provided by a combination of operating system software, address translation hardware in the MMU (memory management unit), and a data structure stored in physical memory known as a page table that maps virtual pages to physical pages.

l The address translation hardware reads the page table each time it converts a virtual address to a physical address. The operating system is responsible for maintaining the contents of the page table and transferring pages back and forth between disk and DRAM.

l A page table is an array of page table entries (PTEs). Each page in the virtual address space has a PTE at a fixed offset in the page table.

l Assume that each PTE consists of a valid bit and an n-bit address field. The valid bit indicates whether the virtual page is currently cached in DRAM.

² If the valid bit is set, the address field indicates the start of the corresponding physical page in DRAM where the virtual page is cached.

² If the valid bit is not set, then a null address indicates that the virtual page has not yet been allocated. Otherwise, the address points to the start of the virtual page on disk.

l Figure 9.4 shows a page table for a system with eight virtual pages and four physical pages. Four virtual pages (VP1, VP2, VP4, and VP7) are currently cached in DRAM. Two pages (VP0 and VP5) have not yet been allocated, and the rest (VP3 and VP6) have been allocated, but are not currently cached.

l Note that because the DRAM cache is fully associative, any physical page can contain any virtual page.
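The simplified PTE described above (a valid bit plus an address field) is enough to tell the three page states apart. The following is a sketch under stated assumptions: the struct layout, the use of address 0 as the null address, and the concrete physical and disk addresses in FIG_9_4 are invented for illustration; only which pages are cached, uncached, or unallocated comes from the description of Figure 9.4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum vp_state { VP_UNALLOCATED, VP_CACHED, VP_UNCACHED };

/* Simplified PTE: a valid bit plus an address field. */
struct pte {
    bool     valid;  /* set: the page is cached in DRAM */
    uint64_t addr;   /* valid: start of the physical page;
                        otherwise disk location, 0 meaning null */
};

enum vp_state classify(const struct pte *e)
{
    assert(e != NULL);
    if (e->valid)
        return VP_CACHED;
    return e->addr == 0 ? VP_UNALLOCATED : VP_UNCACHED;
}

/* Count how many of the n entries are in state s. */
size_t count_in_state(const struct pte *table, size_t n, enum vp_state s)
{
    assert(table != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (classify(&table[i]) == s)
            count++;
    return count;
}

/* Page states as described for Figure 9.4; addresses are made up. */
static const struct pte FIG_9_4[8] = {
    { false, 0      },  /* VP0: unallocated      */
    { true,  0x0000 },  /* VP1: cached           */
    { true,  0x1000 },  /* VP2: cached           */
    { false, 0x9000 },  /* VP3: on disk only     */
    { true,  0x3000 },  /* VP4: cached           */
    { false, 0      },  /* VP5: unallocated      */
    { false, 0xA000 },  /* VP6: on disk only     */
    { true,  0x2000 },  /* VP7: cached           */
};
```

Note that the valid bit is tested first, so a cached page that happens to start at physical address 0 is still classified as cached.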

9.3.3 Page Hits

l Consider what happens when the CPU reads a word of virtual memory contained in VP2, which is cached in DRAM (Figure 9.5).

l The address translation hardware uses the virtual address as an index to locate PTE 2 and read it from memory. Since the valid bit is set, the address translation hardware knows that VP2 is cached in memory. So it uses the physical memory address in the PTE (which points to the start of the cached page in PP1) to construct the physical address of the word.
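The page-hit path can be modeled in software as a table lookup followed by an addition. This is a rough sketch of what the hardware does, not a description of any real MMU: the 4 KB page size, the eight-entry table, and the names translate, demo_translate, and NO_TRANSLATION are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT     12u                       /* assumed 4 KB pages */
#define PAGE_MASK      ((1u << PAGE_SHIFT) - 1u)
#define NUM_VPS        8u
#define NO_TRANSLATION UINT32_MAX

struct pte {
    bool     valid;      /* page is cached in DRAM */
    uint32_t page_base;  /* start of the physical page when valid */
};

/* On a page hit, store the physical address in *pa and return true.
   Return false if the PTE is not valid (a real system would raise a
   page fault here) or the address lies outside the table. */
bool translate(const struct pte *table, uint32_t va, uint32_t *pa)
{
    assert(table != NULL && pa != NULL);
    uint32_t vpn = va >> PAGE_SHIFT;          /* index of the PTE */
    if (vpn >= NUM_VPS || !table[vpn].valid)
        return false;
    *pa = table[vpn].page_base + (va & PAGE_MASK);
    return true;
}

/* Example from the text: VP2 is cached in PP1, nothing else is. */
uint32_t demo_translate(uint32_t va)
{
    struct pte table[NUM_VPS] = {0};
    table[2].valid = true;
    table[2].page_base = 1u << PAGE_SHIFT;    /* start of PP1 */
    uint32_t pa;
    return translate(table, va, &pa) ? pa : NO_TRANSLATION;
}
```

With this setup, virtual address 0x2010 (offset 0x10 in VP2) maps to physical address 0x1010 (offset 0x10 in PP1), while any address in VP3 has no translation.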

9.3.4 Page Faults

l In virtual memory terminology, a DRAM cache miss is known as a page fault.

l Figure 9.6 shows the state of the page table before the fault. The CPU has referenced a word in VP3, which is not cached in DRAM. The address translation hardware reads PTE 3 from memory, infers from the valid bit that VP3 is not cached, and triggers a page fault exception.

l The page fault exception invokes a page fault exception handler in the kernel, which selects a victim page, in this case VP4 stored in PP3. If VP4 has been modified, then the kernel copies it back to disk. In either case, the kernel modifies the page table entry for VP4 to reflect the fact that VP4 is no longer cached in main memory.

l Next, the kernel copies VP3 from disk to PP3 in memory, updates PTE 3, and then returns.

l When the handler returns, it restarts the faulting instruction, which resends the faulting virtual address to the address translation hardware. Now VP3 is cached in main memory, and the page hit is handled normally by the address translation hardware.
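The fault-handling steps above can be mimicked with a toy model. This is a simulation sketch, not kernel code: disk transfers are replaced by counters, the victim is passed in by the caller because the replacement policy is out of scope, and every name (struct vm, handle_fault, access_page, the demo_ helpers) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VPS 8

struct pte { bool valid; bool dirty; int ppn; };

struct vm {
    struct pte pt[NUM_VPS];
    int writebacks;  /* modified victims copied back to disk */
    int page_ins;    /* pages copied from disk into DRAM */
};

/* Evict `victim`, then bring `vpn` into the freed physical page. */
void handle_fault(struct vm *vm, int vpn, int victim)
{
    assert(vm != NULL && vm->pt[victim].valid && !vm->pt[vpn].valid);
    int frame = vm->pt[victim].ppn;
    if (vm->pt[victim].dirty)
        vm->writebacks++;            /* stand-in for the copy to disk */
    vm->pt[victim].valid = false;    /* victim is no longer cached */
    vm->pt[victim].dirty = false;

    vm->page_ins++;                  /* stand-in for the copy from disk */
    vm->pt[vpn].valid = true;
    vm->pt[vpn].dirty = false;
    vm->pt[vpn].ppn = frame;
}

/* Reference a page; on a miss, handle the fault and then retry. */
int access_page(struct vm *vm, int vpn, int victim)
{
    if (!vm->pt[vpn].valid)
        handle_fault(vm, vpn, victim);
    return vm->pt[vpn].ppn;          /* now a page hit */
}

/* State before the fault in the text: VP4 in PP3, VP3 not cached. */
static struct vm demo_vm(bool vp4_dirty)
{
    struct vm vm = {0};
    vm.pt[4].valid = true;
    vm.pt[4].dirty = vp4_dirty;
    vm.pt[4].ppn = 3;
    return vm;
}

int demo_frame_for_vp3(void)
{
    struct vm vm = demo_vm(false);
    return access_page(&vm, 3, 4);
}

int demo_writebacks(bool vp4_dirty)
{
    struct vm vm = demo_vm(vp4_dirty);
    access_page(&vm, 3, 4);
    return vm.writebacks;
}
```

After the fault, VP3 occupies PP3, and a write-back is counted only when the victim VP4 had been modified.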

l In virtual memory terminology, blocks are known as pages. The activity of transferring a page between disk and memory is known as swapping or paging. Pages are swapped/paged in from disk to DRAM, and swapped/paged out from DRAM to disk.

l The strategy of waiting until the last moment to swap in a page, when a miss occurs, is known as demand paging. Other approaches, such as trying to predict misses and swap pages in before they are actually referenced, are possible. But all modern systems use demand paging.

9.3.5 Allocating Pages

l Figure 9.8 shows the effect on our example page table when the operating system allocates a new page of virtual memory, for example, as a result of calling malloc.

l In the example, VP5 is allocated by creating room on disk and updating PTE 5 to point to the newly created page on disk.

9.3.6 Locality to the Rescue Again

l Given the large miss penalties, one might expect paging to destroy program performance. In practice, virtual memory works well, mainly because of locality.

l Although the total number of distinct pages that programs reference during an entire run might exceed the total size of physical memory, the principle of locality promises that at any point in time they will tend to work on a smaller set of active pages known as the working set or resident set. After an initial overhead where the working set is paged into memory, subsequent references to the working set result in hits, with no additional disk traffic.

l But not all programs exhibit good temporal locality. If the working set size exceeds the size of physical memory, then the program can produce a situation known as thrashing (literally, moving back and forth), where pages are swapped in and out continuously.
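The difference between a working set that fits in memory and one that does not can be seen in a small simulation. This sketch counts page faults for repeated sweeps over a set of pages. It uses FIFO replacement purely because it is short to write; the notes do not say which policy real systems use, and count_faults and MAX_FRAMES are made-up names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 64

/* Count page faults for `rounds` sequential sweeps over pages
   0 .. working_set-1, given `frames` physical pages and FIFO
   replacement (an assumed policy, chosen for simplicity). */
size_t count_faults(size_t working_set, size_t frames, size_t rounds)
{
    assert(frames > 0 && frames <= MAX_FRAMES);
    size_t resident[MAX_FRAMES];
    size_t used = 0, next_victim = 0, faults = 0;

    for (size_t r = 0; r < rounds; r++) {
        for (size_t page = 0; page < working_set; page++) {
            int hit = 0;
            for (size_t i = 0; i < used; i++) {
                if (resident[i] == page) {
                    hit = 1;
                    break;
                }
            }
            if (hit)
                continue;
            faults++;
            if (used < frames) {
                resident[used++] = page;       /* free frame available */
            } else {
                resident[next_victim] = page;  /* evict the oldest page */
                next_victim = (next_victim + 1) % frames;
            }
        }
    }
    return faults;
}
```

With four frames, ten sweeps over a four-page working set fault only four times (the initial page-ins). Growing the working set to five pages makes every one of the fifty references fault, which is the thrashing behavior described above.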

9.4 VM as a Tool for Memory Management

l So far, we have assumed a single page table that maps a single virtual address space to the physical address space. In fact, operating systems provide a separate page table, and thus a separate virtual address space, for each process.

l In Figure 9.9, the page table for process i maps VP1 to PP2 and VP2 to PP7; the page table for process j maps VP1 to PP7 and VP2 to PP10. Multiple virtual pages can be mapped to the same shared physical page.
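The per-process mappings in this example can be written down as two small tables and searched for a physical page that both processes map. This is an illustrative sketch: the four-entry table, the UNMAPPED marker, and the names struct process, shared_physical_page, and demo_shared_page are assumptions; only the four VP-to-PP mappings come from the text.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VPS  4
#define UNMAPPED (-1)

/* One page table per process: index is the VP, value is the PP. */
struct process { int page_table[NUM_VPS]; };

/* Return a physical page mapped by both processes, or UNMAPPED. */
int shared_physical_page(const struct process *a, const struct process *b)
{
    assert(a != NULL && b != NULL);
    for (int i = 0; i < NUM_VPS; i++) {
        if (a->page_table[i] == UNMAPPED)
            continue;
        for (int j = 0; j < NUM_VPS; j++)
            if (a->page_table[i] == b->page_table[j])
                return a->page_table[i];
    }
    return UNMAPPED;
}

/* Mappings from the text: process i maps VP1->PP2 and VP2->PP7,
   process j maps VP1->PP7 and VP2->PP10. */
int demo_shared_page(void)
{
    struct process pi = { { UNMAPPED, 2, 7, UNMAPPED } };
    struct process pj = { { UNMAPPED, 7, 10, UNMAPPED } };
    return shared_physical_page(&pi, &pj);
}
```

Here PP7 is reachable as VP2 in process i and as VP1 in process j, so the same physical page appears at different virtual addresses in the two processes.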

l VM simplifies linking and loading, the sharing of code and data, and allocating memory to applications.

Simplifying linking

² A separate address space allows each process to use the same basic format for its memory image, regardless of where the code and data actually reside in physical memory.

² Every process on a given Linux system has a similar memory format. For conventionally linked (non-position-independent) executables, the .text section starts at virtual address 0x08048000 (for 32-bit address spaces), or at address 0x400000 (for 64-bit address spaces). The .data and .bss sections follow the text section. The stack occupies the highest portion of the process address space and grows downward.

² Such uniformity greatly simplifies the design and implementation of linkers, allowing them to produce fully linked executables that are independent of the ultimate location of the code and data in physical memory.

Simplifying loading

² Recall from Chapter 7 that the .text and .data sections in ELF executables are contiguous. To load these sections into a newly created process, the Linux loader allocates a contiguous chunk of virtual pages starting at address 0x08048000 (32-bit address spaces) or 0x400000 (64-bit address spaces), marks them as invalid (i.e., not cached), and points their page table entries to the appropriate locations in the object file.

² The loader never copies any data from disk into memory. The data is paged in automatically and on demand by the virtual memory system the first time each page is referenced, either by the CPU when it fetches an instruction, or by an executing instruction when it references a memory location.

² This notion of mapping a set of contiguous virtual pages to an arbitrary location in an arbitrary file is known as memory mapping. Unix provides a system call called mmap that allows application programs to do their own memory mapping.
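As a small illustration of application-level memory mapping, the sketch below maps a file read-only with mmap and sums its bytes without ever calling read. The function names sum_file_bytes and write_demo_file are invented for this example, error handling is kept minimal, and a POSIX system is assumed.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum the bytes of a file through a read-only private mapping.
   Returns -1 on error or if the file is empty (a zero-length
   mapping is not allowed). */
long sum_file_bytes(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping outlives the fd */
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];                 /* first touch of a page pages it in */

    munmap(p, (size_t)st.st_size);
    return sum;
}

/* Helper for trying the example: create a small file with `text`. */
int write_demo_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

The loop never copies the file explicitly. Each page of the mapping is brought into memory by the virtual memory system the first time one of its bytes is read, which mirrors how the loader relies on demand paging.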

Simplifying sharing

² In general, each process has its own private code, data, heap, and stack areas that are not shared with any other process. In this case, the operating system creates page tables that map the corresponding virtual pages to disjoint physical pages.

² In some instances it is desirable for processes to share code and data. For example, every process must call the same operating system kernel code, and every C program makes calls to routines in the standard C library such as printf. Rather than including separate copies of the kernel and standard C library in each process, the operating system can arrange for multiple processes to share a single copy of this code by mapping the appropriate virtual pages in different processes to the same physical pages.

Simplifying memory allocation

² When a program running in a user process requests additional heap space (e.g., as a result of calling malloc), the operating system allocates an appropriate number, say, k, of contiguous virtual memory pages, and maps them to k arbitrary physical pages located anywhere in physical memory.

² Because of the way page tables work, there is no need for the operating system to locate k contiguous pages of physical memory. The pages can be scattered randomly in physical memory.
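The point that k contiguous virtual pages need not be backed by contiguous physical pages can be shown with a toy allocator. This is a sketch only: the table sizes, the first-free search, and the names struct address_space, map_contiguous, and demo_frame_for are assumptions made for illustration, not how any particular operating system allocates memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VPS  16
#define NUM_PPS  8
#define UNMAPPED (-1)

struct address_space {
    int  page_table[NUM_VPS];  /* VP -> PP, or UNMAPPED */
    bool frame_used[NUM_PPS];  /* which physical pages are taken */
};

/* Map k contiguous virtual pages starting at first_vp onto whichever
   physical pages are free. Returns false and changes nothing if the
   range is invalid, already mapped, or there are too few free frames. */
bool map_contiguous(struct address_space *as, int first_vp, int k)
{
    assert(as != NULL);
    if (first_vp < 0 || k <= 0 || first_vp + k > NUM_VPS)
        return false;

    int free_frames = 0;
    for (int pp = 0; pp < NUM_PPS; pp++)
        if (!as->frame_used[pp])
            free_frames++;
    if (free_frames < k)
        return false;
    for (int vp = first_vp; vp < first_vp + k; vp++)
        if (as->page_table[vp] != UNMAPPED)
            return false;

    int pp = 0;
    for (int vp = first_vp; vp < first_vp + k; vp++) {
        while (as->frame_used[pp])   /* skip frames already in use */
            pp++;
        as->frame_used[pp] = true;
        as->page_table[vp] = pp;
    }
    return true;
}

/* Demo: PP0, PP2, and PP5 are already in use; map VP4..VP6. */
int demo_frame_for(int vp)
{
    struct address_space as;
    for (int i = 0; i < NUM_VPS; i++)
        as.page_table[i] = UNMAPPED;
    for (int i = 0; i < NUM_PPS; i++)
        as.frame_used[i] = (i == 0 || i == 2 || i == 5);
    if (!map_contiguous(&as, 4, 3))
        return UNMAPPED;
    return as.page_table[vp];
}
```

In the demo, the contiguous virtual pages VP4, VP5, and VP6 end up in PP1, PP3, and PP4, which are not adjacent in physical memory.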

9.5 VM as a Tool for Memory Protection

l A user process should not be allowed to:
modify its read-only .text section;
read or modify any of the code and data structures in the kernel;
read or write the private memory of other processes;
modify any virtual pages that are shared with other processes, unless all parties explicitly allow it (via calls to explicit interprocess communication system calls).

l In the example of Figure 9.10, three permission bits have been added to each PTE.

l The SUP bit indicates whether processes must be running in kernel (supervisor) mode to access the page. Processes running in kernel mode can access any page, but processes running in user mode are only allowed to access pages for which SUP is 0.

l The READ and WRITE bits control read and write access to the page.

l For example, if process i is running in user mode, then it has permission to read VP0 and to read or write VP1. But it is not allowed to access VP2.

l If an instruction violates these permissions, then the CPU triggers a general protection fault that transfers control to an exception handler in the kernel. Unix shells typically report this exception as a “segmentation fault.”
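The permission check described above amounts to a few comparisons per access. The sketch below follows the rules as stated in these notes (kernel mode may access any page; user mode needs SUP = 0 plus the matching READ or WRITE bit). The struct and function names are hypothetical, and the READ and WRITE settings chosen for VP2 are an assumption, since the text only says that user-mode access to VP2 is denied.

```c
#include <assert.h>
#include <stdbool.h>

struct pte_perm { bool sup, read, write; };

enum access_kind { ACCESS_READ, ACCESS_WRITE };

/* Return true if the access is permitted. A false result corresponds
   to the protection fault described in the text. */
bool access_allowed(struct pte_perm p, enum access_kind kind,
                    bool kernel_mode)
{
    if (kernel_mode)
        return true;              /* kernel mode can access any page */
    if (p.sup)
        return false;             /* supervisor-only page */
    return kind == ACCESS_READ ? p.read : p.write;
}

/* Permissions for process i's pages as described in the text:
   VP0 read-only, VP1 read/write, VP2 supervisor-only. */
struct pte_perm demo_vp(int vp)
{
    assert(vp >= 0 && vp <= 2);
    static const struct pte_perm perms[3] = {
        { false, true, false },   /* VP0 */
        { false, true, true  },   /* VP1 */
        { true,  true, true  },   /* VP2: READ/WRITE values assumed */
    };
    return perms[vp];
}
```

In user mode, this allows reading VP0, reading or writing VP1, and nothing on VP2, matching the example; the same VP2 access succeeds when the process runs in kernel mode.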
