OSTEP Note: Ch13-23, Memory Virtualization

These are my reading notes on the second part of OSTEP; the notes on the earlier part are in a previous post.

This part explains how the OS gives each process the illusion of having its own private memory, moving from base and bounds to segmentation and finally to paging.

Memory Virtualization

The Abstraction: Address Space

Address Space: abstraction of physical memory, running program’s view of memory in the system.

Goal
  • transparency: the virtualization is invisible to the running program -> the process behaves as if it had its own private physical memory
  • efficiency: in terms of both time and space, relying on hardware support such as TLBs
  • protection: process cannot touch memory outside its address space
Memory API
  1. Library calls: malloc(), free()

  2. Memory leaked by a process does not outlive it: the OS reclaims all of its memory on exit.

  3. Tools: valgrind, purify

  4. System call:

    • brk(addr) - change the end of the heap to new addr
    • sbrk(increment) - move the end of the heap by increment bytes
    • mmap() - can create an anonymous memory region, which is not associated with any file but with swap space
  5. Other library calls

    • calloc(): also zeros it
    • realloc(): allocate a larger region, copy the old contents into it, and return a pointer to the new region
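
The library calls above can be combined as in the following minimal sketch. The helper names `make_counters` and `grow_counters` are made up for illustration; only `calloc`, `realloc`, `memset`, and `free` are real library functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns an array of n ints, all zero, or NULL. */
int *make_counters(size_t n) {
    return calloc(n, sizeof(int));          /* calloc also zeroes the memory */
}

/* Hypothetical helper: grows the array from old_n to new_n ints.
 * realloc may move the block, so the caller must use the returned pointer.
 * On failure it returns NULL and the old block is still valid. */
int *grow_counters(int *a, size_t old_n, size_t new_n) {
    int *b = realloc(a, new_n * sizeof(int));
    if (b == NULL)
        return NULL;
    if (new_n > old_n)                      /* realloc does not zero the new tail */
        memset(b + old_n, 0, (new_n - old_n) * sizeof(int));
    return b;
}
```

The caller releases the array with `free()` when done; forgetting to do so is the classic leak that tools like valgrind report.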

Mechanism: Address Translation

Efficiency: hardware support (TLB, page table)

Control: protection

Flexibility: process can use the address space at will

Dynamic Relocation (Base and Bounds)

physical address = virtual address + base

  • The hardware first checks that the virtual address is within bounds

The memory management unit (MMU) is the part of the processor that performs address translation
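
As a rough software model of what the MMU does here, the sketch below checks the bound and then adds the base. The type `mmu_t` and the function `translate` are illustrative names, and real hardware raises an exception instead of returning `false`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the two MMU registers. */
typedef struct {
    uint32_t base;    /* where the address space starts in physical memory */
    uint32_t bounds;  /* size of the address space in bytes */
} mmu_t;

/* Returns true and stores the physical address if vaddr is legal;
 * returns false where real hardware would raise an exception. */
bool translate(const mmu_t *mmu, uint32_t vaddr, uint32_t *paddr) {
    if (vaddr >= mmu->bounds)
        return false;
    *paddr = vaddr + mmu->base;
    return true;
}
```

For example, with a base of 32 KB (32768) and bounds of 16 KB, virtual address 128 maps to physical address 32896, while virtual address 16384 is rejected.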

Hardware Support
  • privileged mode and user mode
    • Only privileged (kernel) mode may set the base/bounds registers
  • base and bounds registers - part of MMU (All translation is done by MMU)
    • translate (add base, check within bound)
    • raise exception when outside of bound
  • Privileged instructions for modifying the base and bounds registers (kernel mode only)
OS Support
  • Track free physical memory (free list)
  • Base/bounds management
    • Save and restore the base and bounds registers on a context switch (the values are stored in the PCB)
  • Exception handling
    • Terminate offending process
Problem
  • Internal fragmentation: the process gets one contiguous slot, and the space between its heap and stack is allocated but unused while both are small.
    • Besides, the address space cannot be larger than physical memory.

Segmentation: Generalized Base/ Bounds

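One base/bounds pair per logical segment (code, heap, stack)

top 2 bits of the virtual address: which segment the address refers to; the remaining bits are the offset into that segment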
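
A minimal sketch of this translation, assuming a 14-bit virtual address (2 segment bits plus a 12-bit offset). The names `segment_t` and `seg_translate` are made up for illustration, and the backward-growing stack segment is left out to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_SHIFT   12        /* assumed: 12-bit offset */
#define OFFSET_MASK 0xFFFu

/* One base/bounds pair per segment (illustrative layout). */
typedef struct {
    uint32_t base;
    uint32_t size;
} segment_t;

/* The top 2 bits select the segment, the low 12 bits are the offset. */
bool seg_translate(const segment_t table[4], uint32_t vaddr, uint32_t *paddr) {
    uint32_t seg    = (vaddr >> SEG_SHIFT) & 0x3u;
    uint32_t offset = vaddr & OFFSET_MASK;
    if (offset >= table[seg].size)
        return false;         /* hardware would raise a segmentation fault */
    *paddr = table[seg].base + offset;
    return true;
}
```

With a code segment at 32 KB (size 2 KB) and a heap segment at 34 KB (size 3 KB), virtual address 4200 has segment bits 01 and offset 104, so it maps to 34816 + 104 = 34920.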

Support for Sharing

A read-only segment (e.g., code) can be shared by several processes; per-segment protection bits make this safe

OS Support
  • Set up base and bound register
  • Allocate more space when segments grow, update bound register
  • Track free physical memory with a free list
Problem
  1. external fragmentation: segments have different sizes, so physical memory fills up with small holes between them that are hard to reuse.

Detour: Free-Space Management

Important for managing memory with variable-sized units.

(heap/ segmentation)

Problem: how to minimize external fragmentation?

Low-level Mechanism
Splitting and Coalescing


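  • The free space is represented as a list of free chunks
  • Splitting: when a request is smaller than a free chunk, the chunk is split in two; one part is returned to the caller and the other stays on the list
  • Coalescing: when a chunk is freed, check whether its left and right neighbors are also free, and if so merge them into one larger free chunk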
Tracking the size of Allocated Region
  • header block for tracking the size
    • Magic number is for sanity check
  • When the user asks for n bytes, the allocator has to find a free chunk of n + sizeof(header) bytes
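
The header idea can be sketched as follows. This is a toy under stated assumptions: `toy_alloc`, `toy_size`, and `toy_free` are invented names, the magic value is arbitrary, and the real `malloc` stands in for the free-list search.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAGIC 1234567            /* arbitrary value used for the sanity check */

/* The header sits immediately before the region handed to the user. */
typedef struct {
    size_t size;
    int    magic;
} header_t;

void *toy_alloc(size_t n) {
    /* Ask for n + sizeof(header) bytes (malloc stands in for the free list). */
    header_t *h = malloc(sizeof(header_t) + n);
    if (h == NULL)
        return NULL;
    h->size  = n;
    h->magic = MAGIC;
    return (void *)(h + 1);      /* the user pointer starts right after the header */
}

/* Recovers the size from the header; returns 0 if the magic number is wrong. */
size_t toy_size(void *ptr) {
    header_t *h = (header_t *)ptr - 1;
    return h->magic == MAGIC ? h->size : 0;
}

void toy_free(void *ptr) {
    if (ptr == NULL)
        return;
    header_t *h = (header_t *)ptr - 1;
    if (h->magic != MAGIC)
        abort();                 /* corrupted header or invalid pointer */
    h->magic = 0;
    free(h);
}
```

This is why `free()` needs no size argument: the size is found by stepping back from the user pointer to the header.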
Embedding a Free List

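The free list is built inside the free space itself: each free chunk starts with a node that holds its size and a pointer to the next free chunk.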

Growing the Heap

Calling sbrk() might be needed to grow the heap

Basic Strategies
Best Fit

Smallest chunk that is at least as large as the requested size

  • +Reduce wasted space
  • -performance penalty for exhaustive search
Worst Fit

Always divide the largest chunk

  • +leave several big chunks
  • -high overhead (still needs an exhaustive search)
  • -research shows the performance is bad
First Fit

Return the first chunk that is big enough

  • +Speed (no need to search exhaustively)
  • Order is important!
    • address-based ordering
      • +simple coalescing
      • +reduced fragmentation
Next Fit

Remember where the last search ended, and start the next search from there

  • +spreads the searches for free space throughout the list more uniformly
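
To compare the strategies, the sketch below runs them over an array of free chunk sizes instead of a real linked list (a simplification chosen for brevity). Each function returns the index of the chosen chunk, or -1 if nothing fits; the function names are illustrative.

```c
#include <stddef.h>

/* First fit: the first chunk that is big enough. */
int first_fit(const size_t *sizes, int n, size_t need) {
    for (int i = 0; i < n; i++)
        if (sizes[i] >= need)
            return i;
    return -1;
}

/* Best fit: the smallest chunk that is big enough (exhaustive search). */
int best_fit(const size_t *sizes, int n, size_t need) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (sizes[i] >= need && (best < 0 || sizes[i] < sizes[best]))
            best = i;
    return best;
}

/* Worst fit: the largest chunk that is big enough (exhaustive search). */
int worst_fit(const size_t *sizes, int n, size_t need) {
    int worst = -1;
    for (int i = 0; i < n; i++)
        if (sizes[i] >= need && (worst < 0 || sizes[i] > sizes[worst]))
            worst = i;
    return worst;
}
```

With free chunks of sizes 10, 30, and 20 and a request for 15, best fit picks the chunk of size 20, while first fit and worst fit both pick the chunk of size 30.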
Other Approaches
Segregated Lists
  • Assumption: the application makes many requests of one (or a few) popular sizes

Method:

  • A dedicated region of memory serves only requests of that particular size.

  • All other requests go to a general-purpose allocator.

Buddy Allocation

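Makes coalescing simple! Free memory is treated as one block of size 2^N, which is split in half recursively until a block just big enough for the request is found.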

When a chunk is freed, check whether its buddy is free, if so, do the coalescing.

Beauty: the addresses of a block and its buddy differ in exactly one bit
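
The one-bit property means the buddy can be found with a single XOR. The sketch below assumes offsets are measured from the start of the managed region, block sizes are powers of two, and each block is aligned to its own size; `buddy_of` is an illustrative name.

```c
#include <stddef.h>

/* Offset of the buddy of the block at `offset` with power-of-two `size`.
 * Flipping the bit that equals the block size toggles between the two
 * halves of the parent block. */
size_t buddy_of(size_t offset, size_t size) {
    return offset ^ size;
}
```

For example, the 8-byte blocks at offsets 16 and 24 are buddies (16 ^ 8 = 24 and 24 ^ 8 = 16); once both are free they merge into the 16-byte block at offset 16, whose buddy is at offset 0.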

Problem
  1. scaling: searching a list is slow, so advanced allocators use more complex data structures (balanced binary tree, splay tree, partially-ordered tree)
  2. multithread: allocator should be thread-safe

Paging

Divide the address space into fixed-size units (pages); physical memory is divided into page frames of the same size

Advantage

  1. flexibility: no assumptions are needed about how the process uses its address space
  2. simplicity: free space is managed with a free list of fixed-size pages
Page Table
Size
  • We need 1 page table per process

  • On a 32-bit machine with 4 KB pages, the VPN has 20 bits, so a linear page table with 4-byte PTEs needs 2^20 * 4 B = 4 MB per process

    • Very large! Too big for on-chip hardware, so it is stored in memory (actually in OS virtual memory, so it can even be swapped to disk)
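
The size arithmetic can be checked with a small helper. The function name and parameters are illustrative, and it only models a linear (single-level) page table.

```c
#include <stdint.h>

/* Bytes needed for a linear page table:
 * one PTE for each of the 2^(va_bits - page_bits) virtual pages. */
uint64_t linear_page_table_bytes(unsigned va_bits, unsigned page_bits,
                                 unsigned pte_bytes) {
    if (va_bits < page_bits || va_bits - page_bits >= 64)
        return 0;                               /* out of range for this sketch */
    return (UINT64_C(1) << (va_bits - page_bits)) * pte_bytes;
}
```

`linear_page_table_bytes(32, 12, 4)` gives 4 MB, matching the calculation above; raising the page size to 16 KB (`page_bits` = 14) shrinks the table to 1 MB.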
What is stored?

  • valid bit: whether the translation is valid. e.g., the space between stack and heap is invalid.
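    • unused pages need no physical frames, which supports sparse address spaces
  • protection bit: readable/ writable/ executable
  • present bit: whether the page is in physical memory or on disk
    • allows the OS to support address spaces larger than physical memory
  • dirty bit: whether the page has been modified since it was brought into memory
  • referenced bit: whether the page has been accessed
    • used for page replacement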
Problem
  • Slow!
    • The page-table base register holds the location of the page table in memory
      • Every memory access needs one extra memory reference to fetch the PTE first
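
A sketch of the lookup with a linear page table, assuming 4 KB pages and a made-up PTE layout (PFN in the upper 20 bits, valid flag in bit 0). The `page_table` argument plays the role of the page-table base register.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12                        /* assumed 4 KB pages */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)
#define PTE_VALID   0x1u                      /* hypothetical PTE layout */
#define PFN_SHIFT   12

bool pt_translate(const uint32_t *page_table, uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t pte = page_table[vpn];           /* the extra memory reference */
    if (!(pte & PTE_VALID))
        return false;                         /* hardware would raise an exception */
    uint32_t pfn = pte >> PFN_SHIFT;
    *paddr = (pfn << PAGE_SHIFT) | (vaddr & OFFSET_MASK);
    return true;
}
```

Every load or store pays for the read of `page_table[vpn]` before the actual access, which roughly doubles the number of memory references.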
Faster Translations (TLBs, translation-lookaside buffer)

address-translation cache in MMU

Caches the translations (PTEs, page table entries) of recently referenced VPNs, so most translations do not touch the page table.

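TLB Miss?
  • hardware-managed (e.g., x86)
    • Hardware walks the page table, located via the page-table base register, loads the PTE into the TLB, and retries the instruction
  • software-managed (e.g., MIPS)
    • Hardware raises an exception; the OS trap handler looks up the PTE and installs it in the TLB
    • Return-from-trap must retry the instruction that missed, not the next one
    • Be careful not to cause an infinite chain of TLB misses: the miss handler itself must always be reachable, e.g., kept in unmapped physical memory or covered by permanently wired TLB entries
    • +simplicity + flexibility
    • The TLB is modified with privileged instructions only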

Aside: CISC vs RISC

CISC: many powerful, high-level instructions; code is compact

RISC: a small set of simple, uniform instructions that compilers can target easily and hardware can run fast

What is stored?

valid | VPN (tag) | PFN | other bits (similar to what's in a PTE)

  • valid bit: indicates whether the TLB entry holds a valid translation (not the same as the valid bit of a PTE)
TLB issue: Context Switch

The TLB entries of the previous process are meaningless to the next process after a context switch

Solution:

  1. flush the TLB on a context switch (set all valid bits to 0, done by hardware or by the OS); simple, but every switch is followed by a burst of TLB misses

  2. share the TLB across processes (no flush on a context switch, faster)

    • use an ASID (address space identifier, usually 8 bits, far fewer than the 32 bits of a PID) to check whether a TLB entry belongs to the current process
    • Two entries of different processes may point to the same PFN, e.g., when they share a code page
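
Both solutions can be modeled in a few lines. The sketch assumes a tiny fully associative TLB searched in a loop (real hardware compares all entries in parallel); the struct layout and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 4                  /* tiny, for illustration only */

typedef struct {
    bool     valid;
    uint8_t  asid;                     /* which address space owns the entry */
    uint32_t vpn;
    uint32_t pfn;
} tlb_entry_t;

/* Hit only if the entry is valid AND belongs to the current address space. */
bool tlb_lookup(const tlb_entry_t *tlb, uint8_t asid, uint32_t vpn,
                uint32_t *pfn) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;
        }
    }
    return false;                      /* TLB miss */
}

/* Solution 1: invalidate everything on a context switch. */
void tlb_flush(tlb_entry_t *tlb) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        tlb[i].valid = false;
}
```

With ASIDs, the same VPN 10 can map to PFN 100 for ASID 1 and to PFN 170 for ASID 2 at the same time, so no flush is needed when switching between the two processes.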
Paging: Smaller Tables

How to make the page table smaller?

Bigger Pages

Making pages larger makes the page table smaller (fewer pages, so fewer PTEs).

-internal fragmentation: more of each page is wasted

+better TLB efficiency: each entry covers more memory

Multi-level Page Tables

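page directory: one entry (PDE) per page of the page table

PDE: valid bit + PFN

valid bit = 0 means that this page of the page table holds no valid PTEs and is not allocated at all.

+space is proportional to the part of the address space in use, which benefits sparse address spaces

+each piece of the page table (and the page directory) fits in one page, so no large contiguous allocation is needed

-time-space trade-off: a TLB miss now costs two memory references (page directory, then PTE)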
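
A two-level walk can be sketched like this, assuming 32-bit virtual addresses, 4 KB pages, and 10 index bits per level. In this model a C pointer stands in for the PFN stored in the PDE, and all names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    uint32_t pfn;
} pte_t;

typedef struct {
    bool   valid;
    pte_t *table;      /* stands in for the PFN of one page of the page table */
} pde_t;

bool walk(const pde_t *dir, uint32_t vaddr, uint32_t *paddr) {
    uint32_t pd_index = (vaddr >> 22) & 0x3FFu;   /* top 10 bits */
    uint32_t pt_index = (vaddr >> 12) & 0x3FFu;   /* next 10 bits */

    const pde_t *pde = &dir[pd_index];            /* memory reference 1 */
    if (!pde->valid)
        return false;                             /* whole page of PTEs unused */

    const pte_t *pte = &pde->table[pt_index];     /* memory reference 2 */
    if (!pte->valid)
        return false;

    *paddr = (pte->pfn << 12) | (vaddr & 0xFFFu);
    return true;
}
```

Only the page-table pages that contain at least one valid PTE need to exist; every other directory entry is simply marked invalid.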

Hashing: Inverted Page Tables

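Keep a single table with one entry per physical page, recording which process and virtual page use it; a hash table over it makes the VPN -> PFN lookup fast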

Beyond Physical Memory

What if physical memory is not large enough?

Mechanism: Swap Space

Reserved space on the disk for moving pages back and forth

  • present bit: track whether a page is in physical memory

    • page not present -> page fault
      • The disk location of the page can be stored in the PTE
      • Usually handled by the OS in software (the page-fault handler), even when TLB misses are handled in hardware: the disk is so slow that the extra software overhead does not matter
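  • Full control flow on a memory access:
    • TLB hit: use the cached translation
    • TLB miss, PTE valid and present: load the translation into the TLB and retry the instruction
    • TLB miss, PTE valid but not present: page fault; the OS brings the page in from disk, updates the PTE, and the instruction is retried
    • TLB miss, PTE invalid: illegal access; the OS typically terminates the process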

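  • When does page replacement happen?

    • When fewer than low watermark (LW) pages are available in physical memory, a background thread (the swap daemon) starts evicting
    • It evicts until at least high watermark (HW) pages are available
      • Evicting many pages at once lets the OS cluster the writes to disk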

Policy

The replacement policy decides which page to evict when physical memory is full.

Metrics

Average memory access Time (AMAT)
$AMAT = T_M + (P_{Miss} \cdot T_D)$
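
To get a feel for the formula, here is a one-line helper, where T_M is the cost of accessing memory, T_D the cost of accessing disk, and P_Miss the probability of a miss. The function name and the use of seconds as the unit are choices made for this sketch.

```c
/* Average memory access time; all times in seconds, p_miss in [0, 1]. */
double amat(double t_m, double p_miss, double t_d) {
    return t_m + p_miss * t_d;
}
```

With T_M = 100 ns and T_D = 10 ms, a 10% miss rate gives about 1.0001 ms per access, and even a 0.1% miss rate still gives about 10.1 microseconds, roughly 100 times slower than memory alone. The disk cost dominates unless the miss rate is tiny.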

The Optimal Replacement Policy

Replace the one that will be accessed furthest in the future

  • Impossible to implement in general, since the future is unknown; useful as a baseline for comparison

Three types of cache misses: compulsory (cold-start), capacity, and conflict

FIFO

+simple to implement

-cannot determine the importance of a page

Random

Evicts a random page; how well it does depends on luck!

+simple

LRU & LFU(Least frequently used)

Evict the least recently used or the least frequently used page

-perfect LRU is very expensive: bookkeeping must be updated on every memory reference, and finding the victim means scanning all pages in physical memory
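
A small simulation makes the cost visible. The sketch counts cache hits for a reference trace under perfect LRU; it keeps a timestamp per frame and scans all frames on every eviction, which is exactly the overhead noted above. The names and the `MAX_FRAMES` limit are illustrative.

```c
#include <stddef.h>

#define MAX_FRAMES 16   /* arbitrary limit for this sketch */

/* Returns the number of hits for `trace` with `frames` page frames,
 * or -1 if `frames` is out of range. */
int lru_hits(const int *trace, size_t n, int frames) {
    int  page[MAX_FRAMES];
    long last_used[MAX_FRAMES];
    int  used = 0, hits = 0;

    if (frames <= 0 || frames > MAX_FRAMES)
        return -1;

    for (size_t t = 0; t < n; t++) {
        int slot = -1;
        for (int i = 0; i < used; i++)
            if (page[i] == trace[t]) { slot = i; break; }

        if (slot >= 0) {
            hits++;                          /* page already resident */
        } else if (used < frames) {
            slot = used++;                   /* free frame available */
        } else {
            slot = 0;                        /* evict the least recently used */
            for (int i = 1; i < used; i++)
                if (last_used[i] < last_used[slot])
                    slot = i;
        }
        page[slot]      = trace[t];
        last_used[slot] = (long)t;           /* updated on EVERY reference */
    }
    return hits;
}
```

For the trace 0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 1 with three frames, this yields 6 hits out of 11 references.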

Approximate LRU

Clock algorithm

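All pages are arranged in a ring, with one reference (use) bit per page that the hardware sets when the page is accessed. When an eviction is needed, the clock hand walks the ring: a page whose bit is 1 gets its bit cleared and is skipped; the first page whose bit is 0 is evicted. A page therefore survives if it was referenced since the hand last passed it.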
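
The clock walk can be sketched as below. The use bits live in a plain array, `hand` is the clock-hand position kept between calls, and `nframes` is assumed to be greater than zero; all names are illustrative.

```c
#include <stdbool.h>

/* Returns the index of the frame to evict and advances the hand.
 * Terminates within two sweeps, because every bit it passes is cleared. */
int clock_pick_victim(bool *use_bit, int nframes, int *hand) {
    for (;;) {
        int i = *hand;
        *hand = (*hand + 1) % nframes;
        if (use_bit[i])
            use_bit[i] = false;   /* recently used: clear the bit, give a second chance */
        else
            return i;             /* not used since the last pass: evict */
    }
}
```

Unlike perfect LRU, nothing has to be updated in software on a memory reference; the hardware only sets a bit, and the work happens when an eviction is needed.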

Considering Dirty Bit for Replacement Policy

Intuition: Prefer to evict a clean page! Evicting a dirty page requires writing it back to disk first.

Modified clock algorithm

Scan first for pages that are both unused and clean; failing that, evict an unused dirty page.

Question

Q1. When to bring page into physical memory?

A: demand paging + prefetching

Q2. What if memory is oversubscribed (thrashing)?

A: Run fewer processes (admission control), or kill a memory-intensive process (as the Linux out-of-memory killer does).

Put everything together: Complete Virtual Memory Systems

VAX/ VMS Virtual Memory

VMS: the OS for the VAX-11 architecture from Digital Equipment Corporation (DEC)

  • Page size: 512B

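  • Segmenting user address space

    • The user address space is split into two segments, P0 (code and heap) and P1 (stack), each with its own page table, so no page-table space is wasted on the unused region between heap and stack
  • Kernel virtual address space

    • User page tables live in kernel virtual memory, so a translation may first need a lookup in the system page table
    • This lets the OS swap process page tables to disk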
  • More classic VM!

A Real Address Space


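  • page 0 is marked invalid, which helps catch null-pointer dereferences
  • the kernel virtual address space (segment S) is mapped into every process's virtual address space
    • On a context switch, the P0 and P1 registers are switched to the page tables of the new process, while the S registers stay the same (same kernel)
    • +kernel memory, including page tables, can be swapped to disk (hard if the kernel lived entirely in physical memory)
    • +moving data between user and kernel is simple, since both are in the same address space
      • The kernel can simply dereference a pointer passed in from user space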
Page Replacement

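The PTE has a valid bit, protection bits, and a dirty bit, but no reference bit

segmented FIFO replacement policy: each process has a maximum number of pages it may keep in memory (its resident set size, RSS); its pages are kept on a FIFO list, and the first-in page is evicted when the limit is exceeded.

second-chance lists: a page evicted from a process's resident set is put on a global second-chance list

  • dirty and clean pages are kept on separate lists
  • the lists are also in FIFO order; if the process faults on the page before it is reused, the page is reclaimed from the list without any disk access

clustering: dirty pages are grouped into large batches and written to disk together, which makes swapping more efficient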

Other Tricks
  • demand zeroing: defer zeroing a page until it is first read or written
    • When a page is added to the address space, its PTE is marked inaccessible; only on first use does the OS take a free physical page, zero it, and map it
  • copy-on-write: instead of copying a page into another address space, map the same page into both and mark it read-only
    • If a process writes to that page, it traps into the OS and gets its own private copy

The Linux Virtual Memory System

Focus on Linux on x86

The Linux Address Space


  • Kernel address space
    • Two types: logical and virtual
      • Kernel logical address
        • Allocated with kmalloc
        • Cannot be swapped to disk
        • Mapped to physical memory directly (0xc0000000 -> 0x00000000), so contiguous logical addresses are contiguous in physical memory; suitable for operations that need contiguous physical memory, such as Direct Memory Access (DMA)
      • Kernel virtual address
        • Allocated with vmalloc
        • Usually not physically contiguous, so easier to allocate for large buffers; also lets a 32-bit kernel address more than about 1 GB of memory
Page Table Structure

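for x86: hardware-managed, multi-level page table, one page table per process

  • A 32-bit address space covers only 4 GB; as memory sizes grew this was no longer enough, so x86 moved to 64-bit addresses

64-bit x86 uses a four-level page table with 4 KB pages; only the low 48 bits of a virtual address are currently used (four 9-bit indices plus a 12-bit offset)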

  • Large Page Support: process can choose to use huge pages(2MB or 1GB)
    • +better TLB efficiency
    • -internal fragmentation
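
To illustrate how a four-level table is indexed, the sketch below splits a 64-bit virtual address under the usual x86-64 layout with 4 KB pages: a 12-bit offset and four 9-bit indices in the low 48 bits. The function names and the level numbering (0 = lowest-level page table, 3 = top level) are choices made for this example.

```c
#include <stdint.h>

/* Index into the page-table level `level` (0..3); each index has 9 bits. */
unsigned pt_index(uint64_t vaddr, int level) {
    return (unsigned)((vaddr >> (12 + 9 * level)) & 0x1FFu);
}

/* Offset within a 4 KB page: the low 12 bits. */
unsigned page_offset(uint64_t vaddr) {
    return (unsigned)(vaddr & 0xFFFu);
}
```

A 2 MB huge page is mapped one level higher, so the level-0 index becomes part of a 21-bit offset, and one TLB entry then covers 512 times more memory than a 4 KB entry.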
The Page Cache

Keeps frequently used pages cached in otherwise unused memory, to avoid going to persistent storage.

  • sources:
    • memory-mapped files
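    • Anonymous memory (process heap/stack), backed by swap space
  • Keeps track of dirty pages in the cache
    • Dirty data is periodically written back to the backing file or to swap space
  • 2Q replacement policy
    • Two lists, inactive and active: a page goes on the inactive list when first accessed and is promoted to the active list when referenced again
    • Pages at the bottom of the active list are periodically moved back to the inactive list; replacement candidates are taken from the inactive list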
Security Issue
Buffer Overflow

An attacker overflows a buffer to inject code into the target's address space and to overwrite the return address so that it jumps to the injected code.

#include <string.h>

int some_function(char *input) {
  char dest_buffer[100];
  strcpy(dest_buffer, input); // oops, unbounded copy! input longer than 99 chars overflows the stack buffer
  return 0;
}

Solution: do not execute code on the stack; mark such pages non-executable (NX bit)

Return-Oriented Programming

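Overwrite the return address so that it points to existing code ("gadgets", short instruction sequences ending in a return); chaining gadgets lets the attacker run arbitrary code without injecting any.

Solution: address space layout randomization (ASLR) - randomize the placement of code, stack, and heap in the address space, so the attacker cannot know the gadget addresses.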

Meltdown and Spectre

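CPUs speculatively execute instructions; speculation leaves traces in caches and branch predictors that can leak the contents of memory, even memory that the MMU protects.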
