OSTEP Note: Ch13-23, Memory Virtualization

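These are my reading notes for the second part of OSTEP; for my earlier notes, see the previous post.

This part explains how the OS gives each process the illusion of its own private memory, moving from base and bounds to segmentation and finally to paging.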

Memory Virtualization

The Abstraction: Address Space

Address Space: an abstraction of physical memory; the running program's view of memory in the system.

Goals of memory virtualization:
  • transparency: virtualization is invisible to the running program, which behaves as if it had its own private physical memory
  • efficiency: in terms of both time and space, helped by hardware features such as the TLB
  • protection: a process cannot access memory outside its own address space
Memory API
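  1. Library calls: malloc(), free()

  2. Forgetting to free() matters less in a short-lived program: when a process exits, the OS reclaims all of its memory, so nothing leaks past the process's lifetime.

  3. Tools for finding memory bugs: valgrind, purify

  4. System calls the allocator is built on:

    • brk(addr) - set the end of the heap (the program break) to addr
    • sbrk(increment) - move the end of the heap by increment bytes
    • mmap() - obtain anonymous memory, which is associated with swap space rather than with any file
  5. Other library calls

    • calloc(): allocates and also zeroes the memory
    • realloc(): resizes an allocation (e.g., to a larger region), moving and copying it if necessary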
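The library calls above can be combined as in the following minimal sketch. `make_and_grow` is a hypothetical helper. It shows that calloc() returns zeroed memory, and that realloc() may move the block, so the old pointer must be kept until realloc() succeeds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n zeroed ints, fill them, then grow the
 * array to new_n ints. Returns the grown array (caller must free()) or NULL. */
int *make_and_grow(size_t n, size_t new_n) {
    if (new_n < n)
        return NULL;                        /* this sketch only grows */
    int *a = calloc(n, sizeof *a);          /* like malloc(), but zeroed */
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i] = (int)i;

    /* realloc() may move the block; keep `a` until it succeeds */
    int *b = realloc(a, new_n * sizeof *b);
    if (b == NULL) {
        free(a);                            /* old block is still valid on failure */
        return NULL;
    }
    /* the first n elements are preserved; the new tail is uninitialized */
    memset(b + n, 0, (new_n - n) * sizeof *b);
    return b;
}
```

Every successful call must be matched by one free() of the returned pointer, never of the original `a`.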

Mechanism: Address Translation

Efficiency: hardware support (TLB, page table)

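Control: the OS, with hardware help, ensures that no process can access memory outside its own address space (protection)

Flexibility: programs can use their address spaces however they like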

Dynamic Relocation (Base and Bounds)

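physical address = virtual address + base

  • The hardware first checks that the virtual address is within bounds (0 ≤ virtual address < bounds); otherwise it raises an exception

The memory management unit (MMU) is the part of the processor that performs this address translation.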
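The base-and-bounds translation can be sketched as below. `mmu_regs_t` and `translate` are hypothetical names standing in for the MMU's two registers and its per-access logic. Returning `false` models the exception the hardware would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process base/bounds registers held by the MMU. */
typedef struct {
    uint32_t base;    /* physical address where the address space starts */
    uint32_t bounds;  /* size of the address space in bytes */
} mmu_regs_t;

/* Returns false where real hardware would raise an exception
 * (the OS would then typically terminate the process). */
bool translate(const mmu_regs_t *r, uint32_t va, uint32_t *pa) {
    if (va >= r->bounds)
        return false;          /* out of bounds: protection fault */
    *pa = r->base + va;        /* physical = virtual + base */
    return true;
}
```

For example, with a process loaded at 32 KB (base = 32768) and a 16 KB address space, virtual address 128 maps to physical address 32896.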

Hardware Support
  • Privileged (kernel) mode and user mode
    • Only kernel mode may set the base/bounds registers
  • Base and bounds registers, part of the MMU (all translation is done by the MMU)
    • Translate each address (add base, check against bounds)
    • Raise an exception when an address is out of bounds
  • Privileged instructions for modifying the base and bounds registers (kernel mode only)
OS Support
  • Track free physical memory (e.g., with a free list) to find a slot for a new process
  • Base/bounds management
    • Save and restore the base and bounds registers on each context switch (stored in the PCB)
  • Exception handling
    • Typically terminate the offending process
  • Drawback, internal fragmentation: the space between the heap and stack inside the allocated slot goes unused when both are small.
    • Besides, an address space cannot be larger than physical memory.

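Segmentation: Generalized Base/Bounds

Instead of one base and bounds pair for the whole address space, keep one base + bounds pair per logical segment (code, heap, stack), so only the parts of the address space that are used occupy physical memory.

The top 2 bits of the virtual address select the segment; the remaining bits are the offset within that segment.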
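The top-bits scheme can be sketched as follows, assuming a tiny 14-bit virtual address: 2 segment bits and a 12-bit offset. The segment table values and names are hypothetical. For simplicity every segment grows upward, whereas a real stack segment grows downward and needs an extra "direction" bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 14-bit virtual address: top 2 bits select the segment, low 12 bits are the offset. */
#define SEG_MASK    0x3000u
#define SEG_SHIFT   12
#define OFFSET_MASK 0x0FFFu

typedef struct { uint32_t base, bounds; } seg_t;

/* Hypothetical segment table: 0 = code, 1 = heap, 2 = stack, 3 = unused (bounds 0). */
bool seg_translate(const seg_t segs[4], uint32_t va, uint32_t *pa) {
    uint32_t seg    = (va & SEG_MASK) >> SEG_SHIFT;
    uint32_t offset = va & OFFSET_MASK;
    if (offset >= segs[seg].bounds)
        return false;                     /* segmentation fault */
    *pa = segs[seg].base + offset;        /* base of that segment + offset */
    return true;
}
```

For example, virtual address 4200 has segment bits 01 (heap) and offset 104, so with the heap based at 34816 it maps to 34920.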

Support for Sharing

Read-only segments (e.g., code) can be shared by several processes; the hardware needs protection bits per segment to enforce this.

OS Support
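  • Set up the segment registers (base and bounds for each segment) on a context switch
  • Allocate more space when a segment grows, and update its bounds register
  • Manage free physical memory (e.g., with a free list)
  • Problem, external fragmentation: segments have different sizes, so physical memory becomes littered with small free holes between segments that may be too small to satisfy new requests.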

Detour: Free-Space Management

Important whenever memory is managed in variable-sized units, e.g., a malloc() heap or physical memory under segmentation.

Problem: how to minimize external fragmentation?

Low-level Mechanism
Splitting and Coalescing


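  • Free space is tracked as a list of free chunks
  • Splitting: when a request is smaller than a free chunk, the chunk is split in two; one part is returned to the caller and the other stays on the list
  • Coalescing: when a chunk is freed, check whether its left and right neighbors are free; if so, merge them into one larger free chunk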
Tracking the size of Allocated Region
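  • A header in front of each allocated chunk records its size, so free() needs only the pointer
    • It also holds a magic number for a sanity check
  • When the user asks for n bytes, the allocator must find a free chunk of n + sizeof(header) bytes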
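Embedding a Free List

The free list is built inside the free space itself: each free chunk begins with a node recording the chunk's size and a pointer to the next free chunk.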
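The header and embedded free list can be combined into the following toy allocator: a minimal sketch of first fit with splitting, but no coalescing. It carves memory out of a static 4 KB array that stands in for the heap. `heap_init`, `my_malloc`, and `my_free` are hypothetical names, and the 16-byte alignment is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 4096
#define ALIGN      16
#define MAGIC      0x1234567u

/* Free chunks start with a node; allocated chunks start with a header.
 * Both record the chunk's total size, including the node/header itself. */
typedef struct node { size_t size; struct node *next; } node_t;
typedef struct { size_t size; size_t magic; } header_t;
_Static_assert(sizeof(node_t) == sizeof(header_t), "node and header must match");

static _Alignas(ALIGN) unsigned char arena[ARENA_SIZE];   /* stands in for the heap */
static node_t *free_head;

void heap_init(void) {
    free_head = (node_t *)arena;          /* the list lives inside the free space */
    free_head->size = ARENA_SIZE;
    free_head->next = NULL;
}

void *my_malloc(size_t n) {
    size_t need = (n + sizeof(header_t) + ALIGN - 1) & ~(size_t)(ALIGN - 1);
    node_t **link = &free_head;
    for (node_t *cur = free_head; cur != NULL; link = &cur->next, cur = cur->next) {
        if (cur->size < need)
            continue;                      /* first fit: skip chunks that are too small */
        if (cur->size - need >= sizeof(node_t) + ALIGN) {
            /* split: the tail of this chunk becomes a smaller free chunk */
            node_t *rest = (node_t *)((unsigned char *)cur + need);
            rest->size = cur->size - need;
            rest->next = cur->next;
            *link = rest;
        } else {
            need = cur->size;              /* too small to split: hand out all of it */
            *link = cur->next;
        }
        header_t *h = (header_t *)cur;     /* overwrite the node with a header */
        h->size = need;
        h->magic = MAGIC;
        return h + 1;                      /* user memory starts after the header */
    }
    return NULL;                           /* a real allocator might call sbrk() here */
}

void my_free(void *p) {
    header_t *h = (header_t *)p - 1;
    assert(h->magic == MAGIC);             /* sanity check */
    size_t size = h->size;
    node_t *nd = (node_t *)h;              /* reuse the chunk's first bytes as a node */
    nd->size = size;
    nd->next = free_head;                  /* push onto the front; no coalescing */
    free_head = nd;
}
```

Because freed chunks are only pushed onto the list, repeated malloc/free cycles fragment the arena. Adding coalescing of adjacent chunks (easiest with an address-ordered list) would fix that.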


Growing the Heap

If the allocator runs out of free space, it can call sbrk() to ask the OS to grow the heap.

Basic Strategies
Best Fit

Search the whole list and return the smallest chunk that is at least as large as the request

  • +Reduces wasted space
  • -Performance penalty for the exhaustive search
Worst Fit

Always split the largest free chunk

  • +Leaves big chunks free instead of many small ones
  • -High overhead (exhaustive search)
  • -Studies show it performs badly (excess fragmentation)
First Fit

Return the first chunk that is big enough

  • +Speed (no need to search exhaustively)
  • -Can pollute the beginning of the list with small chunks, so the order of the list matters!
    • Address-based ordering
      • +simple coalescing
      • +reduced fragmentation
Next Fit

Keep a pointer to where the last search ended, and start the next search from the following chunk

  • +Spreads the searches for free space throughout the list more uniformly, avoiding splintering at the beginning of the list
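The four policies differ only in which chunk they pick. The following minimal sketch shows just that choice. It assumes a hypothetical free list given as an array of chunk sizes and does not split or update chunks after allocating.

```c
#include <assert.h>
#include <stddef.h>

/* Each function returns the index of the chosen free chunk for a request of
 * `req` bytes, or -1 if none fits. `sizes` is a hypothetical free list. */
int best_fit(const size_t *sizes, int n, size_t req) {
    int best = -1;
    for (int i = 0; i < n; i++)     /* exhaustive search for the smallest fit */
        if (sizes[i] >= req && (best < 0 || sizes[i] < sizes[best]))
            best = i;
    return best;
}

int worst_fit(const size_t *sizes, int n, size_t req) {
    int worst = -1;
    for (int i = 0; i < n; i++)     /* exhaustive search for the largest chunk */
        if (sizes[i] >= req && (worst < 0 || sizes[i] > sizes[worst]))
            worst = i;
    return worst;
}

int first_fit(const size_t *sizes, int n, size_t req) {
    for (int i = 0; i < n; i++)     /* stop at the first chunk that is big enough */
        if (sizes[i] >= req)
            return i;
    return -1;
}

/* Next fit: like first fit, but start where the previous search stopped. */
int next_fit(const size_t *sizes, int n, size_t req, int *cursor) {
    for (int k = 0; k < n; k++) {
        int i = (*cursor + k) % n;  /* wrap around the list */
        if (sizes[i] >= req) {
            *cursor = (i + 1) % n;  /* next search starts at the following chunk */
            return i;
        }
    }
    return -1;
}
```

With free chunks of 10, 30, and 20 bytes and a 15-byte request, best fit picks the 20-byte chunk, while worst fit and first fit both pick the 30-byte chunk.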
Other Approaches
Segregated Lists
  • Assumption: a few request sizes are very popular


  • A dedicated region of memory serves requests of one particular (popular) size, e.g., the kernel's slab allocator for common kernel objects

  • All other requests go to a more general memory allocator
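The dedicated per-size list can be sketched as a fixed-size object pool. The 64-byte object size and the pool length are hypothetical. Because every slot has the same size, allocation and free are constant time, with no searching, splitting, or coalescing.

```c
#include <assert.h>
#include <stddef.h>

/* A dedicated pool for one popular object size (hypothetical values). */
#define OBJ_SIZE  64
#define POOL_OBJS 8

typedef union slot { union slot *next; unsigned char bytes[OBJ_SIZE]; } slot_t;

static slot_t pool[POOL_OBJS];
static slot_t *pool_free;           /* free list threaded through the free slots */

void pool_init(void) {
    for (int i = 0; i < POOL_OBJS - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[POOL_OBJS - 1].next = NULL;
    pool_free = &pool[0];
}

void *pool_alloc(void) {
    slot_t *s = pool_free;
    if (s != NULL)
        pool_free = s->next;        /* pop the first free slot */
    return s;                       /* NULL: pool exhausted, use the general allocator */
}

void pool_free_obj(void *p) {
    slot_t *s = p;
    s->next = pool_free;            /* push the slot back onto the list */
    pool_free = s;
}
```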

Buddy Allocation

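Makes coalescing simple!

Free memory starts as one big block of 2^N bytes; a request is served by recursively halving blocks until the next split would be too small. When a block is freed, check whether its buddy is free; if so, coalesce the two, and repeat up the tree.

Beauty: the addresses of a block and its buddy differ in only one bit, determined by the block's level in the buddy tree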
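The one-bit property means finding a buddy is a single XOR. In this minimal sketch, offsets are relative to the start of the power-of-two region and `size` is the block's power-of-two size. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A block and its buddy differ only in the bit equal to the block size. */
uintptr_t buddy_of(uintptr_t offset, uintptr_t size) {
    return offset ^ size;
}

/* After coalescing, the merged block starts at the lower of the two
 * offsets (that bit cleared) and is twice as large. */
uintptr_t parent_of(uintptr_t offset, uintptr_t size) {
    return offset & ~size;
}
```

For example, the 16-byte block at offset 48 (binary 110000) has its buddy at 32 (100000), and the merged 32-byte block starts at 32.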

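Other ideas:
  1. Scaling: searching lists is slow, so advanced allocators use more complex data structures (balanced binary trees, splay trees, partially-ordered trees)
  2. Multithreading: the allocator must be thread-safe and perform well on multiprocessors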


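Paging: Introduction

Divide the address space into fixed-size units (pages); physical memory is viewed as an array of page frames of the same size.

  1. Flexibility: no assumptions about how the process uses its address space (e.g., which direction the heap and stack grow)
  2. Simplicity: free physical memory is managed with a simple free list of pages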
Page Table
  • Each process needs its own page table

  • A 32-bit address space with 4 KB pages has a 20-bit VPN (virtual page number) and a 12-bit offset; with 4-byte PTEs, a linear page table takes 2^20 × 4 B = 4 MB

    • Very large! -> stored in memory (actually in OS virtual memory, so it can even be swapped to disk)
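The VPN/offset split and the 4 MB figure can be checked with a short sketch. It assumes 32-bit virtual addresses and 4 KB pages. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes 32-bit virtual addresses and 4 KB pages: 12 offset bits, 20 VPN bits. */
#define PAGE_SHIFT  12
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

uint32_t vpn_of(uint32_t va)    { return va >> PAGE_SHIFT; }   /* top 20 bits */
uint32_t offset_of(uint32_t va) { return va & OFFSET_MASK; }   /* low 12 bits */

/* Bytes needed by a linear page table: one PTE per virtual page. */
uint64_t linear_pt_bytes(unsigned va_bits, unsigned page_shift, unsigned pte_bytes) {
    return ((uint64_t)1 << (va_bits - page_shift)) * pte_bytes;
}
```

For example, address 0x12345678 is VPN 0x12345 with offset 0x678, and `linear_pt_bytes(32, 12, 4)` gives 4 MB per process.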
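What is stored in a PTE?

  • valid bit: whether the translation is valid, e.g., the unused space between stack and heap is invalid
    • Unused pages need no physical frames, which avoids the wasted space between heap and stack seen with base and bounds
  • protection bits: readable / writable / executable
  • present bit: whether the page is in physical memory or on disk (swapped out)
    • allows address spaces larger than physical memory
  • dirty bit: whether the page has been modified since it was brought into memory
  • reference (accessed) bit: whether the page has been accessed
    • useful for page replacement

Paging is slow!
  • The page-table base register holds the physical address of the page table
    • Every memory reference needs one extra memory reference to fetch the PTE first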
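The PTE checks and the extra PTE fetch can be sketched as one lookup against a linear page table, again assuming 4 KB pages. The `pte_t` layout, the result codes, and `pt_translate` are hypothetical; real hardware packs the bits into a single word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

/* Hypothetical PTE layout. */
typedef struct {
    unsigned valid : 1, present : 1, writable : 1, dirty : 1, referenced : 1;
    uint32_t pfn;
} pte_t;

typedef enum { XLATE_OK, SEG_FAULT, PROT_FAULT, PAGE_FAULT } xlate_t;

/* `ptbr` plays the page-table base register; reading ptbr[vpn] is the
 * extra memory reference on every access. */
xlate_t pt_translate(pte_t *ptbr, uint32_t va, bool write, uint32_t *pa) {
    pte_t *pte = &ptbr[va >> PAGE_SHIFT];
    if (!pte->valid)
        return SEG_FAULT;              /* unused part of the address space */
    if (write && !pte->writable)
        return PROT_FAULT;             /* protection bits forbid the access */
    if (!pte->present)
        return PAGE_FAULT;             /* page is swapped out to disk */
    pte->referenced = 1;               /* hardware sets these on access */
    if (write)
        pte->dirty = 1;
    *pa = (pte->pfn << PAGE_SHIFT) | (va & OFFSET_MASK);
    return XLATE_OK;
}
```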
Faster Translations: TLBs (Translation Lookaside Buffers)

A hardware cache of address translations, part of the MMU

Caches the PTEs (page table entries) of recently referenced VPNs, so most translations avoid the page-table lookup.

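TLB Miss?
  • Hardware-managed (typical of CISC, e.g., x86)
    • Hardware walks the page table (found via the page-table base register), installs the PTE in the TLB, and retries the instruction
  • Software-managed (typical of RISC)
    • The miss raises an exception; the OS trap handler looks up the PTE, installs it in the TLB, and returns from the trap, retrying the faulting instruction
    • Be careful not to cause an infinite chain of TLB misses: keep the miss handler in unmapped physical memory or in permanently valid ("wired") TLB entries
    • +simplicity +flexibility (the OS may use any page-table structure)
    • Modifying the TLB requires privileged instructions!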

Aside: CISC vs RISC

CISC: powerful, high-level instructions; compact code

RISC: the instruction set is a compiler target, so it provides a few simple primitives -> simple, uniform, and fast

What is stored in a TLB entry?

valid | VPN (tag) | PFN | other bits (similar to those in a PTE, e.g., protection, dirty)

  • The valid bit indicates whether the TLB entry holds a valid translation
TLB issue: Context Switch

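TLB entries belong to the process that was running: after a context switch, the same VPN may map to a different PFN.

Solutions:
  1. Flush the TLB on every context switch (set all valid bits to 0, done by hardware or the OS); simple, but each switch is followed by a burst of misses

  2. Share the TLB across context switches (no flush, faster)

    • Tag each entry with an ASID (address space identifier, usually 8 bits, far fewer than the 32 bits of a PID) so the TLB can tell which process an entry belongs to
    • Two processes may share a physical page, so entries with different ASIDs and VPNs can map to the same PFN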
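The TLB entry layout, miss handling, and the two context-switch options can be sketched together as a small fully associative TLB. The 4-entry size, round-robin replacement, and all names are hypothetical. An entry hits only if both VPN and ASID match, so two processes can keep entries for the same VPN at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_SIZE 4

/* valid | ASID | VPN (tag) | PFN */
typedef struct { bool valid; uint8_t asid; uint32_t vpn, pfn; } tlb_entry_t;

typedef struct {
    tlb_entry_t e[TLB_SIZE];
    int next_victim;          /* hypothetical round-robin replacement */
} tlb_t;

/* Fully associative lookup: compare against every entry. */
bool tlb_lookup(const tlb_t *t, uint8_t asid, uint32_t vpn, uint32_t *pfn) {
    for (int i = 0; i < TLB_SIZE; i++)
        if (t->e[i].valid && t->e[i].asid == asid && t->e[i].vpn == vpn) {
            *pfn = t->e[i].pfn;
            return true;
        }
    return false;
}

/* On a miss, the hardware or the OS trap handler walks the page table
 * and installs the translation here. */
void tlb_insert(tlb_t *t, uint8_t asid, uint32_t vpn, uint32_t pfn) {
    t->e[t->next_victim] = (tlb_entry_t){ true, asid, vpn, pfn };
    t->next_victim = (t->next_victim + 1) % TLB_SIZE;
}

/* The alternative to ASIDs: invalidate everything on a context switch. */
void tlb_flush(tlb_t *t) {
    for (int i = 0; i < TLB_SIZE; i++)
        t->e[i].valid = false;
}
```

For example, VPN 10 can map to PFN 100 for the process with ASID 1 and to PFN 170 for ASID 2 without any flush.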
Paging: Smaller Tables

How to make the page table smaller?

Bigger Pages

Making the page size larger makes the page table smaller: fewer pages means fewer PTEs.

-More internal fragmentation: waste within each (big) page

+Fewer TLB misses, since each TLB entry covers more memory
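A worked comparison, assuming the same 32-bit address space and 4-byte PTEs as above, with the page size grown from 4 KB to 16 KB:

```latex
\begin{aligned}
\text{4 KB pages: } & 2^{32-12} \times 4\,\text{B} = 2^{20} \times 4\,\text{B} = 4\,\text{MB} \\
\text{16 KB pages: } & 2^{32-14} \times 4\,\text{B} = 2^{18} \times 4\,\text{B} = 1\,\text{MB}
\end{aligned}
```

Each doubling of the page size removes one VPN bit and halves the linear page table.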

Multi-level Page Tables

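Page directory: one entry per page of the page table, telling where that page of the page table is, or that it contains no valid pages

PDE (page directory entry): valid bit + PFN

A PDE with valid bit = 0 means no page mapped by that page of the page table is valid, so that page of the page table need not be allocated at all.

+Page-table space is allocated in proportion to the address space actually used, which suits sparse address spaces

+Each piece of the page table fits within a page, which makes it easy to manage (no large contiguous region needed)

-On a TLB miss, an extra memory reference is needed to fetch the PDE before the PTE (a time-space trade-off)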
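A two-level walk can be sketched as below, assuming the classic 32-bit split of a 10-bit directory index, a 10-bit page-table index, and a 12-bit offset. A NULL directory slot models a PDE with valid = 0. The types and `walk` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PD_SHIFT 22
#define PT_SHIFT 12
#define IDX_MASK 0x3FFu
#define OFF_MASK 0xFFFu

typedef struct { bool valid; uint32_t pfn; } pte_t;
typedef struct { pte_t *pt[1024]; } page_dir_t;   /* NULL = invalid PDE */

bool walk(const page_dir_t *pd, uint32_t va, uint32_t *pa) {
    const pte_t *pt = pd->pt[(va >> PD_SHIFT) & IDX_MASK];   /* 1st reference: PDE */
    if (pt == NULL)
        return false;                  /* that page of the page table does not exist */
    const pte_t *pte = &pt[(va >> PT_SHIFT) & IDX_MASK];     /* 2nd reference: PTE */
    if (!pte->valid)
        return false;
    *pa = (pte->pfn << PT_SHIFT) | (va & OFF_MASK);
    return true;
}
```

A sparse process needs only the directory plus the page-table pages it actually touches, e.g., one 4 KB page of PTEs covering its first 4 MB.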

Hashing: Inverted Page Tables

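Instead of one page table per process, keep a single table with one entry per physical page, recording which process and VPN use that page. A linear scan to find an entry is slow, so a hash table is built over the table to map (process, VPN) to PFN quickly.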
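The inverted table and its hash can be sketched as follows, with chaining through the frame entries. The table sizes, the hash function, and all names are hypothetical, and unmapping is omitted. The key point is that the PFN is simply the index of the matching entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFRAMES  16         /* hypothetical: one entry per physical frame */
#define NBUCKETS 8

/* Entry i describes physical frame i: which process/VPN occupies it. */
typedef struct { bool used; int pid; uint32_t vpn; int next; } ipte_t;

static ipte_t frames[NFRAMES];
static int bucket[NBUCKETS];   /* first frame in each hash chain, -1 if empty */

static unsigned hash(int pid, uint32_t vpn) { return ((unsigned)pid * 31u + vpn) % NBUCKETS; }

void ipt_init(void) {
    for (int i = 0; i < NBUCKETS; i++) bucket[i] = -1;
    for (int i = 0; i < NFRAMES; i++) frames[i].used = false;
}

void ipt_map(int pid, uint32_t vpn, int pfn) {
    unsigned h = hash(pid, vpn);
    frames[pfn] = (ipte_t){ true, pid, vpn, bucket[h] };   /* prepend to the chain */
    bucket[h] = pfn;
}

int ipt_lookup(int pid, uint32_t vpn) {
    for (int i = bucket[hash(pid, vpn)]; i != -1; i = frames[i].next)
        if (frames[i].used && frames[i].pid == pid && frames[i].vpn == vpn)
            return i;          /* the index is the PFN */
    return -1;                 /* not resident */
}
```

The table's size depends on physical memory, not on the number of processes or the size of their address spaces.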

Physical Memory

What if physical memory is not large enough?

Mechanism: Swap Space

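Reserved space on disk for moving pages back and forth between memory and disk

  • present bit: tracks whether a page is in physical memory

    • Page not present -> page fault
      • The disk location of the page can be stored in the PTE
      • Handled by the OS in software, even with a hardware-managed TLB: the disk access is so slow that the software overhead hardly matters
  • Full control flow on a memory reference: check the TLB; on a miss, look up the PTE; if the page is not valid, raise a segmentation fault; if it is valid but not present, raise a page fault, and the OS finds a free frame (evicting a page if necessary), reads the page in from swap, updates the PTE, and retries the instruction

  • When does page replacement happen?

    • When fewer than low watermark (LW) pages are free, a background thread (the swap daemon or page daemon) starts evicting
    • It evicts until more than high watermark (HW) pages are free
      • Clustering: writing many pages to disk at once reduces disk overhead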
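The present-bit check and page-fault path can be simulated on a toy machine. The sketch assumes 4 virtual pages, 2 frames, and an in-memory array standing in for swap space. Victims are chosen round-robin, and every victim is written back because this sketch has no dirty bit. All names and sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES  4
#define NFRAMES 2
#define PAGE_SZ 16

typedef struct { bool present; int pfn; int disk_block; } pte_t;

static char swap_space[NPAGES][PAGE_SZ];   /* stands in for the disk */
static char phys_mem[NFRAMES][PAGE_SZ];
static int frame_owner[NFRAMES];           /* VPN held by each frame, -1 if free */
static pte_t page_table[NPAGES];
static int next_frame, page_faults;

void vm_init(void) {
    for (int v = 0; v < NPAGES; v++)
        page_table[v] = (pte_t){ false, -1, v };   /* disk location kept in the PTE */
    for (int f = 0; f < NFRAMES; f++)
        frame_owner[f] = -1;
    next_frame = page_faults = 0;
}

/* Page-fault handler: pick a frame, evict its page to swap if needed,
 * read the faulting page in from swap, and update the PTE. */
static void handle_page_fault(int vpn) {
    page_faults++;
    int f = next_frame;                        /* simple round-robin victim choice */
    next_frame = (next_frame + 1) % NFRAMES;
    int victim = frame_owner[f];
    if (victim != -1) {
        memcpy(swap_space[page_table[victim].disk_block], phys_mem[f], PAGE_SZ);
        page_table[victim].present = false;    /* later accesses will fault */
    }
    memcpy(phys_mem[f], swap_space[page_table[vpn].disk_block], PAGE_SZ);
    page_table[vpn] = (pte_t){ true, f, page_table[vpn].disk_block };
    frame_owner[f] = vpn;
}

/* Every access checks the present bit; the access proceeds after the fault is handled. */
char *access_byte(int vpn, int offset) {
    if (!page_table[vpn].present)
        handle_page_fault(vpn);
    return &phys_mem[page_table[vpn].pfn][offset];
}
```

With only two frames, touching a third page evicts page 0, and touching page 0 again faults and brings its saved contents back from swap.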


Policy: which page to replace when physical memory is full.


Average Memory Access Time (AMAT)
$$AMAT = T_M + (P_{Miss} \cdot T_D)$$
where T_M is the cost of accessing memory, T_D the cost of accessing disk, and P_Miss the probability of not finding the page in memory (a miss).
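A worked example of why the miss rate dominates. It assumes T_M = 100 ns and T_D = 10 ms:

```latex
\begin{aligned}
P_{Miss} = 0.1:&\quad AMAT = 100\,\text{ns} + 0.1 \cdot 10\,\text{ms} = 100\,\text{ns} + 1\,\text{ms} \approx 1.0001\,\text{ms} \\
P_{Miss} = 0.001:&\quad AMAT = 100\,\text{ns} + 0.001 \cdot 10\,\text{ms} = 100\,\text{ns} + 10\,\mu\text{s} = 10.1\,\mu\text{s}
\end{aligned}
```

Even a 0.1% miss rate makes the average access about 100 times slower than a memory hit.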

The Optimal Replacement Policy

Replace the page that will be accessed furthest in the future

  • Impossible to implement in general (the future is unknown), but useful as a benchmark for other policies

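3 types of cache misses (the 3 C's):

Compulsory (cold-start), Capacity, Conflict (conflict misses do not arise in the OS page cache, which is fully associative)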


FIFO

+simple to implement

-cannot tell how important a page is: a frequently accessed page is still evicted if it was brought in first


Random

Evict a random page; performance depends on luck!


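LRU (Least Recently Used) & LFU (Least Frequently Used)

Evict the least recently used or least frequently used page, using history to predict the future (principle of locality)

-Perfect LRU is very expensive: finding the least recently used page means scanning the recorded access times of all pages in physical memory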
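The optimal, FIFO, and LRU policies can be compared with a tiny simulator that counts hits on a reference string. Random is left out because its result depends on the random seed. The names and the `MAX_FRAMES` limit are hypothetical. OPT looks ahead in the reference string, which only a simulator can do.

```c
#include <assert.h>

#define MAX_FRAMES 8   /* nframes passed to simulate() must not exceed this */

typedef enum { OPT, FIFO, LRU } policy_t;

/* Index of the next reference to page p after time t, or n if never. */
static int next_use(const int *refs, int n, int t, int p) {
    for (int i = t + 1; i < n; i++)
        if (refs[i] == p)
            return i;
    return n;
}

/* Returns the number of hits for reference string refs[0..n-1]. */
int simulate(policy_t policy, const int *refs, int n, int nframes) {
    int page[MAX_FRAMES], loaded_at[MAX_FRAMES], used_at[MAX_FRAMES];
    int used = 0, hits = 0;
    for (int t = 0; t < n; t++) {
        int slot = -1;
        for (int i = 0; i < used; i++)
            if (page[i] == refs[t])
                slot = i;
        if (slot >= 0) {                      /* hit: only LRU's timestamp changes */
            hits++;
            used_at[slot] = t;
            continue;
        }
        if (used < nframes) {                 /* free frame available */
            slot = used++;
        } else {                              /* pick a victim by policy */
            slot = 0;
            for (int i = 1; i < used; i++) {
                if (policy == FIFO && loaded_at[i] < loaded_at[slot]) slot = i;
                if (policy == LRU && used_at[i] < used_at[slot]) slot = i;
                if (policy == OPT &&
                    next_use(refs, n, t, page[i]) > next_use(refs, n, t, page[slot]))
                    slot = i;
            }
        }
        page[slot] = refs[t];
        loaded_at[slot] = used_at[slot] = t;
    }
    return hits;
}
```

On the stream 0 1 2 0 1 3 0 3 1 2 1 with 3 frames, OPT and LRU each get 6 hits out of 11 references, while FIFO gets only 4.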

Approximate LRU

Clock algorithm

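Pages are arranged in a circular list, and each has a reference (use) bit that the hardware sets on access. When a page must be evicted, the clock hand sweeps the ring: if the current page's bit is 1, clear it and move on; if it is 0, evict that page. A page referenced since the hand last passed it therefore survives one more round.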
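The sweep can be sketched as a victim-selection function over a hypothetical 4-frame ring. It always terminates within two rounds, because the first round clears every reference bit it passes.

```c
#include <assert.h>
#include <stdbool.h>

#define NFRAMES 4

typedef struct {
    int page[NFRAMES];   /* which virtual page each frame holds */
    bool ref[NFRAMES];   /* reference (use) bit, set by hardware on access */
    int hand;            /* current clock-hand position */
} clock_ring_t;

/* Advance the hand until a frame with ref == 0 is found, clearing
 * reference bits along the way (each referenced page gets a second chance). */
int clock_pick_victim(clock_ring_t *c) {
    for (;;) {
        int i = c->hand;
        c->hand = (c->hand + 1) % NFRAMES;
        if (!c->ref[i])
            return i;
        c->ref[i] = false;
    }
}
```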

Considering Dirty Bit for Replacement Policy

Intuition: prefer to evict clean pages; a dirty page must be written back to disk first.

Modified clock algorithm

First look for pages that are both unused and clean; only if there are none, evict an unused dirty page.
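One way to add the dirty-bit preference is a two-pass variant, sketched below. This is only one of several possible formulations. Pass 1 looks for a frame that is both unreferenced and clean, without touching any bits. Pass 2 falls back to the ordinary clock sweep, which may return a dirty page. The `ring_t` type and function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NFRAMES 4

typedef struct { bool ref[NFRAMES], dirty[NFRAMES]; int hand; } ring_t;

int clock_pick_clean_first(ring_t *c) {
    /* Pass 1: unused and clean -> evict without any disk write. */
    for (int k = 0; k < NFRAMES; k++) {
        int i = (c->hand + k) % NFRAMES;
        if (!c->ref[i] && !c->dirty[i]) {
            c->hand = (i + 1) % NFRAMES;
            return i;
        }
    }
    /* Pass 2: ordinary clock sweep; the victim may be dirty. */
    for (;;) {
        int i = c->hand;
        c->hand = (c->hand + 1) % NFRAMES;
        if (!c->ref[i])
            return i;
        c->ref[i] = false;
    }
}
```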


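Q1. When should a page be brought into physical memory?

A: On demand (demand paging), possibly with prefetching of pages that are likely to be used soon.

Q2. What if memory is oversubscribed and the system thrashes?

A: Some processes are not run (admission control) or are killed (e.g., Linux's out-of-memory killer), so that the rest fit in memory.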

Putting It All Together: Complete Virtual Memory Systems

VAX/ VMS Virtual Memory

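VMS was the operating system for the VAX-11 minicomputers of Digital Equipment Corporation (DEC)

  • Page size: 512 bytes (small, so linear page tables would be huge)

  • The user address space is split into two segments (P0 and P1), each with its own page table

    • Saves page-table space: the unused region between the segments needs no PTEs
  • User page tables live in kernel virtual memory

    • Translation may need one more page-table lookup, in the kernel's (system) page table
    • The OS can swap page tables to disk under memory pressure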
  • More classic VM!

A Real Address Space


  • Page 0 is marked invalid, to help catch null-pointer dereferences
  • The kernel virtual address space (segment S) is part of every process's virtual address space
    • On a context switch, the P0 and P1 page-table registers are changed to point to the new process's page tables, but those for S stay the same (the same kernel is mapped into every process)
    • +Kernel memory (e.g., user page tables) can be paged out to disk
    • +Moving data between the kernel and user space is simple (same address space)
      • e.g., the kernel can directly dereference a pointer handed to it by a user program
Page Replacement

The VAX PTE has valid, protection, and dirty bits, but no reference bit

Segmented FIFO: each process may keep at most a fixed number of pages in memory (its resident set size, RSS); when it exceeds this, its first-in page is evicted

Second-chance lists: pages evicted from a process's FIFO are placed on global second-chance lists

  • There are separate lists for clean and dirty pages
  • The lists are also FIFO; if a process touches a page again while it is still on a list, it reclaims the page without any disk I/O

Clustering: dirty pages are grouped and written to disk in large batches, which makes swapping more efficient

Other Tricks
  • Demand zeroing: defer zeroing a page until it is actually used
    • When a page is added to an address space, the OS only marks it inaccessible in the page table; on the first read or write it traps, and only then does the OS take a free physical page, zero it, and map it in
  • Copy-on-write (COW): when a page must be copied from one address space to another, the OS instead maps it into both, marked read-only
    • If either process writes to the page, it traps and the OS gives that process its own private copy
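The behavior copy-on-write must preserve can be seen through POSIX fork(). On Linux and most modern Unix-like systems, fork() shares the parent's pages copy-on-write instead of copying them up front. A user program cannot observe COW directly, only its semantics: a write in one process is invisible to the other. `cow_demo` is a hypothetical helper.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the child saw after writing (via its exit status)
 * and stores the parent's view of the same variable in *parent_view. */
int cow_demo(int *parent_view) {
    static int value = 1;                 /* lives in a page both processes map */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 42;                       /* write -> child gets a private copy */
        _exit(value);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    *parent_view = value;                 /* still 1 in the parent */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```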

The Linux Virtual Memory System

Focus on Linux on x86

The Linux Address Space


  • Kernel address space
    • Two types: kernel logical and kernel virtual addresses
      • Kernel logical addresses
        • Allocated with kmalloc
        • Cannot be swapped to disk
        • Mapped directly to physical memory (0xC0000000 -> 0x00000000 on 32-bit x86), so memory that is contiguous here is also physically contiguous; suitable for operations that need physically contiguous memory, such as DMA (Direct Memory Access)
      • Kernel virtual addresses
        • Allocated with vmalloc
        • Not necessarily physically contiguous; lets a 32-bit kernel address more than about 1 GB of memory
Page Table Structure

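On x86 the page table is hardware-managed: a multi-level page table walked by hardware, one page table per process

  • A 32-bit address space covers only 4 GB; as memory sizes grew, this was no longer enough, which drove the move to 64-bit addresses

4-level page table with 4 KB pages (currently only the low 48 bits of the 64-bit virtual address are used)

  • Large page support: processes can choose to use huge pages (2 MB or 1 GB)
    • +better TLB efficiency (fewer misses)
    • -internal fragmentation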
The Page Cache

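Linux keeps popular data pages in memory, in a unified page cache, to avoid going to disk.

  • sources:
    • memory-mapped files, and file data and metadata
    • anonymous memory (process heap/stack pages, backed by swap space)
  • Keeps track of dirty pages in the cache
    • They are written back to the backing store (the file, or swap space) after a while or when too many pages are dirty
  • 2Q replacement policy
    • One active list and one inactive list: a page is put on the inactive list when first accessed and promoted to the active list when re-referenced
    • Pages at the bottom of the active list are periodically moved to the inactive list; replacement candidates come from the inactive list
    • This keeps a one-time scan of a large file from flushing out frequently used pages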
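Security Issues
Buffer Overflow

An attacker can inject code into a process's address space through a buffer overflow, e.g., by passing an over-long input to a function that copies it into a fixed-size stack buffer without checking its length.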

#include <string.h>
int some_function(char *input) {
  char dest_buffer[100];
  strcpy(dest_buffer, input); // oops, unbounded copy! input of 100+ chars overflows the stack buffer
  return 0;
}
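For contrast with the unbounded strcpy() above, here is a minimal sketch of a bounded copy. `some_function_bounded` is a hypothetical name. snprintf() never writes more than `sizeof dest_buffer` bytes, including the terminating '\0', so an over-long input is truncated instead of overwriting the stack.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the copied length, or -1 if the input had to be truncated. */
int some_function_bounded(const char *input) {
    char dest_buffer[100];
    int needed = snprintf(dest_buffer, sizeof dest_buffer, "%s", input);
    /* snprintf returns the length it wanted to write; >= 100 means truncation */
    return needed >= (int)sizeof dest_buffer ? -1 : (int)strlen(dest_buffer);
}
```

Bounded copies remove this particular bug. The defenses that follow (non-executable stacks, ASLR) limit the damage when such bugs slip through anyway.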

Defense: do not allow code on the stack to execute (e.g., the NX bit marks pages as non-executable).

Return-Oriented Programming

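Instead of injecting code, the attacker overwrites return addresses on the stack so that execution jumps into existing code: short instruction sequences ending in a return (gadgets), chained together to perform arbitrary computation.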

Defense: Address Space Layout Randomization (ASLR), which randomizes where the stack, heap, and code are placed in the address space, so attackers cannot predict where the gadgets are.

Meltdown and Spectre

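Modern CPUs speculatively execute instructions before knowing whether they are needed; these attacks exploit side effects of speculation (e.g., in the caches) to read memory that should be protected, including kernel memory mapped into the process's address space. One defense is kernel page-table isolation (KPTI), which removes most of the kernel from each process's address space.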
