OS Three Easy Pieces Ⅱ: virtualization(of memory)

kinobhu

Last modified: 2024-04-02 22:52:41


Category: System Capabilities. Tags: linux, hardware architecture, notes, course design

First published: 2024-03-31 19:52:43

Article link: https://blog.csdn.net/kino2004/article/details/137201387


To be continued

notes for Operating System: Three Easy Pieces, part 2

Note that content marked FYI (for your information) was written by the author of these notes and is unconfirmed.

Abstraction: address space & memory layout

The early-stage OS mem layout: | OS | SINGLE_PROGRAM|, simple

Supporting time sharing:

keep the old memory layout: swap the suspended process out to disk, and load the process to run from disk?
the CONTEXT being CPU state + main memory state
TOO SLOW! restoring CPU state is fast, but restoring memory state is extremely slow because it goes through the disk
so from now on, memory has to be shared between multiple processes!

Virtualize the Physical Memory

Shared memory, but an illusion of simple, infinite address space: need virtualization

the program can access any address it wants inside its illusion: the VIRTUAL ADDRESS
the physical memory has a fixed, limited range of addresses
every process has to share the physical memory with the others
physically, the program can only access a PHYSICAL ADDRESS
needs a mapping from VA space to PA space: the MMU

Our requirements for the mapping process:

main target: from virtual address space to physical address space
transparency: user programs know nothing about it (they live in the illusion)
protection: provide isolation between processes, and between user programs and the OS
efficiency: minimize time and space costs

reminder: separate mechanisms and policies

Mechanisms

from

CPU -> VA -> MEM(the early-stage OS)

CPU -> VA -> MMU -> PA -> MEM(our new requirement)

*interposition is powerful: add the new mechanism between the existing client and server

1st try: dynamic relocation(or base and bound)

details

"translate" (shift) the virtual address space, which starts at 0, by an offset:

PA_SPACE= MMU(VA_SPACE)

PA = MMU(VA) = VA + OFFSET

the user program: user -> VA -> MMU(hardware) -> PA -> memory

transparent!
who sets up the BASE & BOUND?
- done by software: the OS
- through privileged instructions: FYI, e.g. rmmu $MMUREG, $REG and wmmu $MMUREG, $REG etc.

protection: EXCEED flag

inform the CPU of the exception(internal error)
Note: the bound can be checked either before or after the addition.
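
FYI, a minimal Python sketch of the base-and-bound translation described above. The class and exception names are made up for illustration, and the register values are arbitrary; real hardware does this in circuitry, not in software.

```python
class BoundExceeded(Exception):
    """Stands in for the exception the MMU raises toward the CPU."""


class BaseBoundMMU:
    def __init__(self):
        # Hypothetical MMU registers; only the OS may write them.
        self.base = 0
        self.bound = 0

    def translate(self, va):
        # Check the bound before the addition (it could equally be done after).
        if va < 0 or va >= self.bound:
            raise BoundExceeded(f"VA {va:#x} outside bound {self.bound:#x}")
        return self.base + va            # PA = VA + OFFSET


mmu = BaseBoundMMU()
mmu.base, mmu.bound = 0x8000, 0x4000     # set by the OS for the running process
print(hex(mmu.translate(0x0100)))        # 0x8100
```

The user program only ever sees `va`; the base and bound stay invisible to it, which is the transparency requirement.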

(MMU, FYI)

the procedure: basics for base-bound method

take UNIX for example: loading(startup) of a process(mem operations only)

the shell FORKs a subprocess;
- the OS looks at current memory usage and finds a proper offset
  - OS needs overview of the memory usage - the free list(this is a policy for later discussions)
- the OS writes the determined base(offset) and bound value in the PCB(process control block) for later use
- setup other info in the PCB …
- add the new process to a ready-to-run list(process management)
now the subprocess is ready, waiting for a chance to run!
- the scheduler decides to run the subprocess - perform context switch:
  - save current process' base and bound in its PCB
    - FYI, rmmu $base, $t0, rmmu $bound, $t1, sw $t0, addr_PCB_base, sw $t1, addr_PCB_bound
  - save other states of the current process …
  - load the subprocess' base and bound from its PCB to the MMU hardware
    - FYI, lw $t0, addr_PCB_base, lw $t1, addr_PCB_bound, wmmu $base, $t0, wmmu $bound, $t1
  - load other states of the subprocess …
- the context has been switched!
the subprocess runs EXEC system call;
- the EXEC syscall runs inside the subprocess, which has its own illusion of the mem layout (virtual address space)
  - the mapping behind this illusion was already set up by the FORK procedure
  - the EXEC then overwrites the current mem content with the content of the executable file (ELF) on disk
    - disk access is provided by the File System abstraction
  - the EXEC then sets the PC register to the starting point defined in the executable file*

*note: for a C program, it actually jumps to _start in crt1.o, provided by the C standard library, which calls the program's main() later

now the control has been handed to the new process!
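
FYI, the memory part of the context switch above can be sketched in Python. `PCB`, `MMU` and `context_switch` are hypothetical names, and the two assignments stand in for the rmmu/sw and lw/wmmu instruction pairs.

```python
from dataclasses import dataclass


@dataclass
class PCB:
    # Only the memory-related fields of a process control block.
    name: str
    base: int = 0
    bound: int = 0


class MMU:
    def __init__(self):
        self.base = 0
        self.bound = 0


def context_switch(mmu, current, nxt):
    # Save the running process's registers in its PCB (rmmu + sw).
    current.base, current.bound = mmu.base, mmu.bound
    # Load the next process's registers from its PCB (lw + wmmu).
    mmu.base, mmu.bound = nxt.base, nxt.bound


mmu = MMU()
shell = PCB("shell", base=0x1000, bound=0x2000)
child = PCB("child", base=0x8000, bound=0x4000)
mmu.base, mmu.bound = shell.base, shell.bound    # the shell is running
context_switch(mmu, shell, child)                # now the child's mapping is active
```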

2nd try: dynamic relocation based on segmentation

details

problems with the 1st try: internal and external frags

internal fragmentation: unused parts within allocated memory blocks
external fragmentation: gaps between allocated blocks that are too small or too scattered to serve another allocation

we can optimize internal frags first

1st try: dynamic relocation based on whole processes
smaller units may minimize internal frags
give each logical segment a pair of base-size regs

a hardware-friendly implementation

VA = #SEG || IN_SEG_OFFSET
- fixed-size segments in the virtual address space
use #SEG to select the matching base-size pair
PA = base[#SEG] + IN_SEG_OFFSET
compare IN_SEG_OFFSET directly with size for the excess check

for fixed-size segments

only memory of its "size" is needed

for dynamically-growing segments

FYI: back only a small part of its virtual space with memory at first
- most segs don't grow that much - the pre-allocated space satisfies most needs
- the OS makes use of the over-boundary exception to extend the seg when it wants more space
- if the segment doesn't grow, its memory never needs an extension
for segments growing backwards(e.g. the stack)
- hardware support: reverse flag
- if reversed: PA = base[#SEG] + IN_SEG_OFFSET - MAX_SEG_OFFSET(not the value of size!)
- excess check for reversely growing segs: MAX_SEG_OFFSET - IN_SEG_OFFSET <= size
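
FYI, a small Python sketch of the segment translation above, including the reverse flag. The 14-bit address split (2-bit #SEG, 12-bit offset) and the segment values are assumptions chosen to resemble the textbook's running example; the names are hypothetical.

```python
from dataclasses import dataclass

OFFSET_BITS = 12                     # assumed: 14-bit VA = 2-bit #SEG || 12-bit IN_SEG_OFFSET
MAX_SEG_OFFSET = 1 << OFFSET_BITS    # 4096


class SegmentationFault(Exception):
    """Stands in for the MMU's over-boundary exception."""


@dataclass
class Segment:
    base: int
    size: int
    grows_down: bool = False         # the "reverse flag"


def translate(segments, va):
    seg = segments.get(va >> OFFSET_BITS)      # #SEG selects the base-size pair
    offset = va & (MAX_SEG_OFFSET - 1)         # IN_SEG_OFFSET
    if seg is None:
        raise SegmentationFault(hex(va))
    if seg.grows_down:
        if MAX_SEG_OFFSET - offset > seg.size:  # excess check, reversed case
            raise SegmentationFault(hex(va))
        return seg.base + offset - MAX_SEG_OFFSET
    if offset >= seg.size:                      # excess check, forward case
        raise SegmentationFault(hex(va))
    return seg.base + offset


KB = 1024
segments = {
    0: Segment(base=32 * KB, size=2 * KB),                   # code
    1: Segment(base=34 * KB, size=3 * KB),                   # heap
    3: Segment(base=28 * KB, size=2 * KB, grows_down=True),  # stack
}
print(translate(segments, 4200))      # heap:  34 KB + 104 = 34920
print(translate(segments, 15 * KB))   # stack: 28 KB - 1 KB = 27648
```

Note that for the reversed segment, `base` is the address just past the top of the stack, so valid physical addresses lie below it.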

(address mapping in reversed growing seg)

(MMU, writing circuits skipped, FYI)

from coarse-grained to fine-grained

more flexible for the OS to manage the mem
many segs -> many base-size register pairs -> too expensive!
put the base and size info in memory: the segment table
- tradeoff: pay access time to save hardware cost
- FYI, to win back time efficiency: an SRAM buffer for the segment table etc.

supporting memory sharing: more protection bits

e.g. sharable code seg(reentrant code): r-x
if an access violates the protection bits, the MMU raises an exception with an error code for the CPU (various types of error)
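
FYI, the protection check can be sketched as a bit mask test. The bit encoding (R=4, W=2, X=1, like Unix permission bits) and the names are assumptions for illustration only.

```python
R, W, X = 4, 2, 1          # assumed bit encoding


class ProtectionFault(Exception):
    """Stands in for the MMU exception carrying an error code."""


def check_access(seg_prot, access):
    # The access is allowed only if every requested bit is set for the segment.
    if access & ~seg_prot:
        raise ProtectionFault(f"requested {access:03b}, segment allows {seg_prot:03b}")


CODE_PROT = R | X               # sharable, reentrant code segment: r-x
check_access(CODE_PROT, R)      # reading constants from the code: fine
check_access(CODE_PROT, X)      # instruction fetch: fine
```

A write to the same segment, `check_access(CODE_PROT, W)`, raises `ProtectionFault`.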

the procedure: what's new?

allocation unit: the whole process VS a single segment

less internal frags: e.g. the gap between heap and stack no longer takes up physical memory

reentrant code supported

the FORK first tries to share the code
the EXEC tries to modify the code
the MMU raises an exception, and the OS copies the code

the OS needs more complex structures to manage a process

for segmentation with segment table:
- PCB: the segment table's base address is required
- one segment table for each process held in memory

the context switch

reload the register that holds the segment table's address
if there's an SRAM buffer, more operations …

Better Mechanism: Paging

recall previous methods

base-bound relocation: unit = the whole process's VA space

you'll know it if you've written pure machine instructions (lines of hex numbers) for a bare machine without an OS
the programmer decides everything:
- length & location of the code
- number of data items and the location of each
- the pre-allocated space for stack and heap, their locations, and their directions of growth
- …
the OS only relocates the process as a whole in the physical memory
- if you leave some unused space in your program's VA space, the OS won't optimize it away
so the programmers have to write compact programs to avoid internal frags
- each part (the later segments) starts at the end of the previous one
as a result, programs differ in size

segmentation relocation: unit = a segment

the programmers still determine the size of segments, but don't care about their physical location
the OS extracts segments from the VA space and views them as single units
now programmers can fix the segments at specified locations in VA space!
also reduces external frags in a way: smaller units, more flexibility in using free spaces

we want to eliminate external frags

early attempts: the OS supports huge numbers of "segments" in a process
- extremely small, configurable "segment" units
free-list management algorithms are always imperfect
- if there are a thousand ways, none of them is perfect
the basic problem: the units aren't all the same size
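
FYI, a tiny Python illustration of why variable-sized units leave external frags. First-fit is used here only as one example free-list policy (the notes don't name a specific one), and the hole layout is made up.

```python
def first_fit(free_list, size):
    """Return the start of the first hole that fits, or None.

    free_list holds (start, length) holes and is modified in place.
    """
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                free_list.pop(i)                              # hole used up entirely
            else:
                free_list[i] = (start + size, length - size)  # shrink the hole
            return start
    return None


holes = [(0, 3), (10, 4), (20, 3)]                   # three scattered holes
total_free = sum(length for _, length in holes)
print(total_free, first_fit(holes, 5))               # 10 None
```

10 units are free in total, yet a request for 5 fails because no single hole is large enough. With equal-sized units this situation cannot arise.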

attempts towards paging

if the sizes are fixed, there are no external frags

now look back on the internal frags:

our first attempt: fixed size, unit = program

assume that PA_SIZE = N * VA_SIZE
when loading, the whole VA space is a unit
- a substitute for base: a number n (0 <= n < N) is enough to describe the process's physical location
- no need for bound (size): every process's size = size of VA space
PA = n * VA_SIZE + VA (compare: PA = base + VA)
lots of internal frags however
- all of the VA space is loaded, though much of it may be unused

our second attempt: fixed size, unit = segment

let every segment have the same size (SEG_SIZE): the full range that IN_SEG_OFFSET can address
assume that PA_SIZE = N * SEG_SIZE
when loaded, each segment is a unit
- base -> the number n of the segment's physical slot
- no need for size
PA = n * SEG_SIZE + IN_SEG_OFFSET (compare: PA = base[#SEG] + IN_SEG_OFFSET)
still lots of internal frags

how does traditional segmentation minimize internal frags?

by loading the segments in a compact manner
- unused VA space won't be loaded at all

our loading unit is the fixed-size segment, called a "page"

& we want the unused VA space to not be loaded as well
so: create pages that contain only unused parts of VA_SPACE, so they can be ignored when loading
if the pages are small enough, the used and unused parts fall into separate pages

our final attempt: fixed size, unit = a "page" small enough

assume that PA_SIZE = N * PAGE_SIZE
when loading, each page is a unit
- n = number of the page's location
- no need for size
PA = n * PAGE_SIZE + IN_PAGE_OFFSET
VALID flag: a totally unused page is invalid, and thus doesn't have to be loaded into memory
size of each internal frag < PAGE_SIZE: acceptable!
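
FYI, the final attempt as a Python sketch. The 4096-byte page size and the list-of-tuples page table are assumptions for illustration.

```python
PAGE_SIZE = 4096     # assumed page size


class InvalidPage(Exception):
    """Access to a page with VALID=0."""


def translate(page_table, va):
    vpn, offset = divmod(va, PAGE_SIZE)     # VA = virtual page number || IN_PAGE_OFFSET
    if vpn >= len(page_table):
        raise InvalidPage(hex(va))
    n, valid = page_table[vpn]
    if not valid:
        raise InvalidPage(hex(va))
    return n * PAGE_SIZE + offset           # PA = n * PAGE_SIZE + IN_PAGE_OFFSET


# One (n, VALID) entry per virtual page; the middle page is unused.
page_table = [(5, True), (None, False), (2, True)]
print(hex(translate(page_table, 0x0123)))   # 0x5123
```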

(VALID_FLAG & VM usage relationship, FYI)

basic implementation

loading the program

the OS looks at a free-page list, and allocates unused physical pages to the new process
- only a used page (VALID=1) will be allocated a physical page
the numbers of the allocated pages are stored in the process's page table (page map)
- the page table entry: PhysicalPageNumber || Valid
the page table offset (PTO) is stored in the process's PCB
the EXEC loads the ELF content into physical mem by accessing VM
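
FYI, the loading steps above as a Python sketch. `load_process` and its arguments are hypothetical; a real OS would also record the table's location in the PCB.

```python
def load_process(free_pages, used_vpns, num_vpages):
    """Build a page table with one (PhysicalPageNumber, Valid) entry per virtual page."""
    if len(used_vpns) > len(free_pages):
        raise MemoryError("not enough free physical pages")
    page_table = []
    for vpn in range(num_vpages):
        if vpn in used_vpns:
            page_table.append((free_pages.pop(0), True))   # allocate a physical page
        else:
            page_table.append((None, False))               # VALID=0: nothing allocated
    return page_table


free_pages = [7, 3, 9, 4]                    # the OS's free-page list
table = load_process(free_pages, used_vpns={0, 1, 7}, num_vpages=8)
print(table[0], table[2], table[7])          # (7, True) (None, False) (9, True)
```

Only three physical pages are consumed for an eight-page virtual address space; one page stays on the free list.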

switching context

load the process's PTO value to PTO register

illusion of bigger memory: paging(swapping)

what if the physical memory is full?

there are too many processes, whose valid pages take up all the physical space
the OS finds no free space, so a new process can't be created until some process ends
we can select a page and put it somewhere else for now, so that the new process can use that space

where to put the swapped out pages?

disks are big, use a certain part of one - the swapping area
copy the pages of the old process to the swapping area
the physical pages are free to use now

now load the new process

write the new process's page table - fill in these physical pages
load the program from the new process's ELF on disk (not in the swapping area) into the physical pages
store the new page table offset in its PCB

new process ended, context switch, old process runs again

the old process accesses the swapped-out area in VM, but gets the data of the new process - WRONG!
where's its data? in the swapping area!
need a PRESENT bit
- if PRESENT, the page is in the memory
- if not PRESENT, use the DISK_ADDR in our page table entry to find the page in the swapping area
- we can reuse the bits for #PPage as DISK_ADDR
  - PRESENT tells whether they hold #PPage or DISK_ADDR
in the context switch, find all pages with PRESENT=0
using the page table's DISK_ADDR, reload the pages into free pages
set PRESENT = 1, set #PPage
now able to run the old process again
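
FYI, swapping one page out and back in, as a Python sketch. `PTE`, `swap_out` and `swap_in` are hypothetical names; memory is modelled as a dict and the swapping area as a list that only grows, both simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class PTE:
    number: int            # #PPage if present, DISK_ADDR otherwise (same bits)
    present: bool = True


def swap_out(pte, memory, swap_area, free_pages):
    disk_addr = len(swap_area)
    swap_area.append(memory[pte.number])     # copy the page to the swapping area
    free_pages.append(pte.number)            # the physical page is free to use now
    pte.number, pte.present = disk_addr, False


def swap_in(pte, memory, swap_area, free_pages):
    ppn = free_pages.pop()
    memory[ppn] = swap_area[pte.number]      # reload the page from DISK_ADDR
    pte.number, pte.present = ppn, True


memory, swap_area, free_pages = {0: b"old data"}, [], []
old_pte = PTE(number=0)
swap_out(old_pte, memory, swap_area, free_pages)

ppn = free_pages.pop()                       # the new process takes the page
memory[ppn] = b"new data"
free_pages.append(ppn)                       # the new process ended

swap_in(old_pte, memory, swap_area, free_pages)   # the old process can run again
```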

lazy optimization: demand paging

being lazy is faster

so far we initialize all the pages of a process from its file upon startup
and we reload all the swapped-out pages of a process when it is switched in
some pages may never be used, but we still loaded them

defer the loading of a page until it's accessed

when trying to access a page with PRESENT=0, raise a page fault
- this exception also occurs when accessing a page with VALID=0, but that simply causes a great segmentation fault O(∩_∩)O
- there's another reason for the page fault (in later discussions)
the OS's page fault handler reads the DISK_ADDR, loads the page into physical memory, sets PRESENT=1
jump back to the memory-accessing instruction and run it again
access a physical page successfully!
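
FYI, the fault-and-retry loop of demand paging as a Python sketch. All names are hypothetical, page table entries are plain dicts, and the try/except stands in for the hardware trapping into the OS handler.

```python
class PageFault(Exception):
    def __init__(self, vpn):
        super().__init__(vpn)
        self.vpn = vpn


def mmu_read(page_table, memory, vpn, offset):
    pte = page_table[vpn]
    if not pte["present"]:
        raise PageFault(vpn)
    return memory[pte["number"]][offset]


def handle_page_fault(page_table, memory, disk, free_pages, vpn):
    pte = page_table[vpn]
    ppn = free_pages.pop()
    memory[ppn] = disk[pte["number"]]          # read DISK_ADDR, load the page
    pte["number"], pte["present"] = ppn, True


def read(page_table, memory, disk, free_pages, vpn, offset):
    try:
        return mmu_read(page_table, memory, vpn, offset)
    except PageFault as fault:
        handle_page_fault(page_table, memory, disk, free_pages, fault.vpn)
        return mmu_read(page_table, memory, vpn, offset)   # run the access again


disk = {100: b"hello", 101: b"world"}
page_table = [{"number": 100, "present": False},
              {"number": 101, "present": False}]
memory, free_pages = {}, [0, 1]
value = read(page_table, memory, disk, free_pages, vpn=0, offset=1)
```

After this single access only virtual page 0 has been loaded; page 1 is still on disk because nothing touched it.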

use the disk for more than the swapping area: mapping the disk

until now the OS has been loading the ELF by accessing the disk directly

we can map an area of VM to an area on disk
reading disk -> reading VM

Linux uses objects on disk

an object can be a file
a regular file is divided into page-sized sections
recall DISK_ADDR in swapping area - do the same!
- fill the DISK_ADDR of the file's pages into the new process's page table
- set PRESENT = 0
EXEC jumps to the start
- fetch the first instruction - page fault!
- the OS loads the page
- only the pages actually used will be loaded
  - these pages are called the working set/resident set
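
FYI, user programs can ask for the same "reading disk -> reading VM" mapping themselves. The sketch below uses Python's standard `mmap` module on a temporary two-page file. Whether a given page is actually brought in lazily on first touch is up to the kernel and is not observable from this code.

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    # A two-page file: the first page is all "A", the second all "B".
    f.write(b"A" * mmap.PAGESIZE + b"B" * mmap.PAGESIZE)
    f.flush()
    # Length 0 maps the whole file; the file is now readable like memory.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        first = view[0]                   # an access inside the first page
        second = view[mmap.PAGESIZE]      # an access inside the second page
        mapped_length = len(view)
```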

conclusion: meaning of PRESENT = 0

the page has not been initialized from its file yet
the page has been swapped out to the swapping area on the disk

we're actually using the physical memory as a cache for the swapping area!

space optimization: sharing pages between processes

imagine multiple processes accessing the same file

the file initialized some pages in process_0
process_1 wants to use the file
process_1 uses the file to initialize another PPage
2 pages with the same contents!

if the processes want to share a file

process_0 creates the file pages
the pages are recorded by the OS in an object list
process_1 searches for the file name in the object list
process_1 finds the PPages of the file already in use
process_1 fills the #PPages into its page table
they both read/modify the same file now
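
FYI, the object-list lookup as a Python sketch. `map_file`, the dict-based object list and the file name `data.bin` are all made up for illustration.

```python
def map_file(object_list, free_pages, name, num_pages):
    """Return the #PPages backing a file, creating them only on first use."""
    if name not in object_list:
        # First process to use the file: allocate its file pages.
        object_list[name] = [free_pages.pop(0) for _ in range(num_pages)]
    return list(object_list[name])       # numbers to fill into the page table


object_list, free_pages = {}, [4, 5, 6, 7]
pages_of_process_0 = map_file(object_list, free_pages, "data.bin", 2)
pages_of_process_1 = map_file(object_list, free_pages, "data.bin", 2)
print(pages_of_process_0, pages_of_process_1, free_pages)   # [4, 5] [4, 5] [6, 7]
```

Both page tables end up pointing at the same physical pages, so no second copy of the file is made.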

what if a process wants to modify a page without sharing the change?

the process_0 calls FORK
recall: the OS used to find free space in physical memory, fill in the new process's page table …
now the OS creates a page table identical to the old one
- the same mapping of VA -> PA!
however, the pages are set to read-only mode in both the old and the new page table
- our page table entry now: #PPage || VALID || PRESENT || W
- W = 0 indicates a read-only page
when either of the two processes tries to modify the page
- page fault raised
- the OS looks at the exception cause, and knows it's a write to a read-only page of a private object
- the OS now searches its free space for a free page, and copies the page there
- the OS changes the process's page table, substituting the copied page for the old one, and sets both pages writable
  - we're actually being lazy again
- after returning from the handler, the write instruction runs again
  - successfully modified the new page!
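
FYI, the copy-on-write steps above as a Python sketch for exactly two processes. `fork` and `store` are hypothetical names, and the `if` branch plays the role of the page fault handler. A real kernel has to track how many processes still share a page before making it writable again; that bookkeeping is left out here.

```python
def fork(parent_table):
    # Mark every page read-only in the parent, then copy the table for the child.
    for pte in parent_table:
        pte["w"] = 0
    return [dict(pte) for pte in parent_table]


def store(table, other_table, memory, free_pages, vpn, offset, value):
    pte = table[vpn]
    if not pte["w"]:                                       # write to W=0: page fault
        new_ppn = free_pages.pop()
        memory[new_ppn] = bytearray(memory[pte["ppn"]])    # copy the page
        pte["ppn"] = new_ppn                               # substitute the copy
        pte["w"] = other_table[vpn]["w"] = 1               # both pages writable
    memory[pte["ppn"]][offset] = value                     # the write runs again


memory = {0: bytearray(b"abcd")}
free_pages = [1]
parent = [{"ppn": 0, "w": 1}]
child = fork(parent)                       # same VA -> PA mapping, both read-only
store(child, parent, memory, free_pages, vpn=0, offset=0, value=ord("X"))
print(bytes(memory[0]), bytes(memory[1]))  # b'abcd' b'Xbcd'
```

The copy happens only when the child actually writes; if neither process ever writes, the page is never duplicated.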