To be continued
notes for Operating System: Three Easy Pieces, part 2
Note that content marked FYI (for your information) was created by the author and is unconfirmed.
Abstraction: address space & memory layout
The early-stage OS mem layout: | OS | SINGLE_PROGRAM|, simple
Supporting time sharing:
- keep the same memory layout: swap the suspended process out to disk, and load the next process to run from disk?
- the CONTEXT being CPU states + main mem states
- TOO SLOW! restoring CPU state is fast, but restoring mem state is extremely slow because it requires disk access
- so memory needs to be shared among multiple processes from now on!
Virtualize the Physical Memory
Shared memory, but an illusion of simple, infinite address space: need virtualization
- the program can access the desired address in its illusion: the VIRTUAL ADDRESS
- the physical memory has a fixed, limited address range
- each process has to share the physical memory with the others
- physically, the program can only access the PHYSICAL ADDRESS
- needs a mapping from VA space to PA space: MMU
Our requirements for the mapping process:
- main target: from virtual address space to physical address space
- transparency: the user program knows nothing (it lives in its illusion)
- protection: provide isolation between multiple processes & user and the OS
- efficiency: minimize time and space costs
reminder: separate mechanisms and policies
Mechanisms
from
CPU -> VA -> MEM(the early-stage OS)
to
CPU -> VA -> MMU -> PA -> MEM(our new requirement)
*interposition is powerful: add the new mechanism between the existing client and server
1st try: dynamic relocation(or base and bound)
details
"translate" (i.e. shift) the virtual address space from 0 to an offset:
- PA_SPACE= MMU(VA_SPACE)
PA = MMU(VA) = VA + OFFSET
the user program: user -> VA -> MMU(hardware) -> PA -> memory
- transparent!
- who sets up the BASE & BOUND?
- done by software: the OS
- privileged instructions: FYI, e.g. rmmu $MMUREG, $REG and wmmu $MMUREG, $REG etc.
protection: EXCEED flag
- inform the CPU of the exception(internal error)
- Note: the bound can be checked either before or after the addition.
(MMU, FYI)
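A minimal sketch of the translation above, assuming the bound is a size that is checked before the addition. The names `translate` and `AddressFault` and the numbers are hypothetical, chosen only for illustration.

```python
class AddressFault(Exception):
    """Raised when a virtual address falls outside the bound (the EXCEED case)."""


def translate(va: int, base: int, bound: int) -> int:
    """Base-and-bound translation: check the bound, then add the base."""
    if va < 0 or va >= bound:
        raise AddressFault(f"virtual address {va:#x} exceeds bound {bound:#x}")
    return base + va


# Hypothetical process: a 16 KB address space relocated to physical address 32 KB.
BASE, BOUND = 32 * 1024, 16 * 1024
print(hex(translate(0x0100, BASE, BOUND)))  # 0x8100
```

The program only ever sees 0x0100; the offset is applied behind its back, which is the transparency requirement.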
the procedure: basics for base-bound method
take UNIX for example: loading(startup) of a process(mem operations only)
- the shell FORKs a subprocess;
- the OS looks at current memory usage and finds a proper offset
- OS needs overview of the memory usage - the free list(this is a policy for later discussions)
- the OS writes the determined base(offset) and bound value in the PCB(process control block) for later use
- setup other info in the PCB …
- add the new process to a ready-to-run list(process management)
- now the subprocess is ready, waiting for a chance to run!
- the scheduler decides to run the subprocess - perform context switch:
- save current process' base and bound in its PCB
- FYI, rmmu $base, $t0, rmmu $bound, $t1, sw $t0, addr_PCB_base, sw $t1, addr_PCB_bound
- save other states of the current process …
- load the subprocess' base and bound from its PCB to the MMU hardware
- FYI, lw $t0, addr_PCB_base, lw $t1, addr_PCB_bound, wmmu $base, $t0, wmmu $bound, $t1
- load other states of the subprocess …
- the context has been switched!
- the subprocess runs EXEC system call;
- the EXEC syscall, running in the subprocess, sees the subprocess's own illusion of mem-layout(virtual address space)
- the mapping for this illusion was already set up during the FORK procedure
- the EXEC then overwrites the current mem content with that of the executable file(ELF) on disk
- disk accessing is provided by the File System abstraction
- the EXEC then sets the PC register to the starting point defined in the executable file*
*note: for a C program, it actually jumps to _start() in crt1.o, provided by the C standard library, which later calls the program's main()
- now the control has been handed to the new process!
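A minimal sketch of the memory part of the context switch described above. The `PCB` and `MMU` classes are hypothetical stand-ins for the process control block and the hardware registers; real hardware would use privileged instructions instead of attribute assignments.

```python
from dataclasses import dataclass


@dataclass
class PCB:
    """Hypothetical process control block holding only the memory context."""
    pid: int
    base: int
    bound: int


class MMU:
    """Stand-in for the hardware base and bound registers."""

    def __init__(self) -> None:
        self.base = 0
        self.bound = 0

    def translate(self, va: int) -> int:
        if not 0 <= va < self.bound:
            raise MemoryError(f"address {va:#x} out of bounds")
        return self.base + va


def context_switch(mmu: MMU, current: PCB, nxt: PCB) -> None:
    # Save the running process's registers into its PCB (rmmu + sw in the notes).
    current.base, current.bound = mmu.base, mmu.bound
    # Load the next process's registers from its PCB (lw + wmmu in the notes).
    mmu.base, mmu.bound = nxt.base, nxt.bound


shell = PCB(pid=1, base=0x4000, bound=0x1000)
child = PCB(pid=2, base=0x8000, bound=0x2000)
mmu = MMU()
mmu.base, mmu.bound = shell.base, shell.bound

before = mmu.translate(0x10)    # the shell's view of virtual address 0x10
context_switch(mmu, shell, child)
after = mmu.translate(0x10)     # the same virtual address, now the child's
print(hex(before), hex(after))  # 0x4010 0x8010
```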
2nd try: dynamic relocation based on segmentation
details
problems with the 1st try: internal and external frags
- internal fragmentation: unused parts within allocated memory blocks
- external fragmentation: free gaps between allocated blocks that are too small or too scattered to satisfy another allocation
we can optimize internal frags first
- 1st try: dynamic relocation based on whole processes
- smaller units may minimize internal frags
- give each logical segment a pair of base-size regs
a hardware-friendly implementation
- VA = #SEG || IN_SEG_OFFSET
- fixed-size segments in the virtual address space
- use #SEG to find out the specified base-size pairs
- PA = base[#SEG] + IN_SEG_OFFSET
- compare IN_SEG_OFFSET directly with size for excess check
for fixed-size segments
- only physical memory of its "size" is needed
for dynamically-growing segments
- FYI: allocate a small part of its virtual space
- most segs don't grow that much - the pre-allocated space satisfies most needs
- the OS makes use of the over-boundary exception to extend the seg if it wants more space
- if the segment doesn't grow, the memory never needs an extension
- for segments growing backwards(e.g. the stack)
- hardware support: reverse flag
- if reversed: PA = base[#SEG] + IN_SEG_OFFSET - MAX_SEG_OFFSET(not the value of size!)
- excess check for reversely growing segs: MAX_SEG_OFFSET - IN_SEG_OFFSET <= size
(address mapping in reversed growing seg)
(MMU, writing circuits skipped, FYI)
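A minimal sketch of the segment translation above, including the reverse flag. The 14-bit virtual address (2-bit #SEG, 12-bit offset) and the segment table values are hypothetical; `MAX_SEG_SIZE` plays the role of MAX_SEG_OFFSET in the formulas.

```python
from dataclasses import dataclass

OFFSET_BITS = 12                    # assumed: 14-bit VA = 2-bit #SEG + 12-bit offset
MAX_SEG_SIZE = 1 << OFFSET_BITS     # 4 KB, the MAX_SEG_OFFSET of the notes


@dataclass
class Segment:
    base: int
    size: int
    grows_down: bool = False        # the "reverse" flag


class SegmentationFault(Exception):
    pass


def translate(va: int, table: list) -> int:
    seg = table[va >> OFFSET_BITS]          # #SEG selects the base-size pair
    offset = va & (MAX_SEG_SIZE - 1)        # IN_SEG_OFFSET
    if seg.grows_down:
        # The distance below the top of the segment must fit within size.
        if MAX_SEG_SIZE - offset > seg.size:
            raise SegmentationFault(hex(va))
        return seg.base + offset - MAX_SEG_SIZE
    if offset >= seg.size:
        raise SegmentationFault(hex(va))
    return seg.base + offset


KB = 1024
table = [
    Segment(base=32 * KB, size=2 * KB),                   # 0: code
    Segment(base=34 * KB, size=3 * KB),                   # 1: heap
    Segment(base=0, size=0),                              # 2: unused
    Segment(base=28 * KB, size=2 * KB, grows_down=True),  # 3: stack
]
print(translate(4200, table), translate(15 * KB, table))  # 34920 27648
```

For the stack, `base` is the address just past the top of the segment, so a virtual address 1 KB below the top of the VA space lands 1 KB below `base`.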
from coarse-grained to fine-grained
- more flexible for OS to manage the mem
- many segs -> many base-size register pairs -> too expensive!
- put the base and size info in memory: the segment table
- tradeoff: slower access in exchange for lower hardware cost
- FYI, optimize time efficiency: SRAM buffer for segment table etc.
supporting memory sharing: more protection bits
- e.g. sharable code seg(reentrant code): r-x
- on a protection violation, the MMU raises an exception with an error code for the CPU(various types of error)
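A minimal sketch of a segment table with protection bits. The table layout, the permission strings, and the `ProtectionFault` error kinds are assumptions made for illustration, not a real MMU interface.

```python
class ProtectionFault(Exception):
    def __init__(self, kind: str) -> None:
        super().__init__(kind)
        self.kind = kind            # stands in for the error code passed to the CPU


# Hypothetical segment table: #SEG -> (base, size, permissions).
SEGMENT_TABLE = {
    0: (0x8000, 0x0800, "r-x"),     # code: readable, executable, sharable
    1: (0x9000, 0x0C00, "rw-"),     # heap: readable, writable
}


def access(seg_no: int, offset: int, op: str) -> int:
    """Return the physical address if `op` ('r', 'w' or 'x') is allowed."""
    base, size, perms = SEGMENT_TABLE[seg_no]
    if offset >= size:
        raise ProtectionFault("out-of-bounds")
    if op not in perms:
        raise ProtectionFault(f"{op}-not-permitted")
    return base + offset


print(hex(access(0, 4, "x")))  # 0x8004
```

Because the code segment is never writable, several processes can map the same physical copy of it safely.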
the procedure: what's new?
allocation unit: the whole process VS a single segment
- fewer internal frags: e.g. the unused gap between heap and stack no longer takes up physical memory
reentrant code supported
- the FORK first tries to share the code
- the EXEC tries to modify the code
- the MMU raises an exception, and the OS makes a copy of the code
the OS needs more complex structures to manage a process
- for segmentation with segment table:
- PCB: must hold the segment table's base address
- one segment table for each process held in memory
the context switch
- reload the register that holds the segment table's base address
- if there's an SRAM buffer, more operations …
Better Mechanism: Paging
recall previous methods
base-bound relocation: unit = the whole process's VA space
- this will be familiar if you've written raw machine instructions(lines of hex numbers) for a bare machine without an OS
- the programmer decides everything:
- length & location of the code
- the number of data items and the location of each
- the pre-allocated space for stack and heap, and the location, and the direction of growth
- …
- the OS only relocates the process as a whole in the physical memory
- if you leave gaps in your program's VA space, the OS won't optimize them away
- so the programmers have to write compact programs to avoid internal frags
- each part(the later segments) starts at the end of the previous one
- this is why programs differ in size
segmentation relocation: unit = a segment
- the programmers still determine the size of segments, but don't care about their physical location
- the OS extracts segments from the VA space and views each as a single unit
- now programmers can place the segments at fixed locations in the VA space!
- this also reduces external frags in a way: smaller units, more flexibility in using free space
we want to eliminate external frags
- early attempts: the OS supports a huge number of "segments" per process
- extremely small, configurable "segment" units
- free-list management algorithms are always imperfect
- if there are a thousand ways to do it, none of them is perfect
- the basic problem: the units differ in size
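A minimal sketch of why variable-size units lead to external fragmentation, using a first-fit free list. The function `first_fit` and the hole layout are hypothetical; first fit is just one of the many possible policies.

```python
from typing import List, Optional, Tuple


def first_fit(free_list: List[Tuple[int, int]], size: int) -> Optional[int]:
    """Allocate `size` units from a list of (start, length) holes; return the start."""
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                free_list.pop(i)                          # hole used up entirely
            else:
                free_list[i] = (start + size, length - size)
            return start
    return None                                           # no single hole is big enough


# Hypothetical memory with three scattered holes, 24 units free in total.
holes = [(0, 8), (20, 8), (40, 8)]
total_free = sum(length for _, length in holes)
print(total_free, first_fit(holes, 12))  # 24 None
```

A request for 12 units fails although 24 units are free, because no hole is contiguous enough. With equal-size units every hole fits every request.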
attempts towards paging
if all units have the same fixed size, there are no external frags
- now look back on the internal frags:
our first attempt: fixed size, unit = program
- assume that PA_SIZE = N * VA_SIZE
- when loading, the whole VA space is a unit
- a substitute for base: a number n (0 <= n < N) is enough to describe the process's physical location
- no need for bound(size): every process's size = size of VA space
- PA = n * VA_SIZE + VA (compare: PA = base + VA)
- lots of internal frags however
- all of VA space is loaded, though may be unused
our second attempt: fixed size, unit = segment
- let every segment have the same size(SEG_SIZE), namely the size of its IN_SEG_OFFSET space
- assume that PA_SIZE = N * SEG_SIZE
- when loaded, each segment is a unit
- base -> n, the number of the slot where the segment is placed
- no need for size
- PA = n * SEG_SIZE + IN_SEG_OFFSET (compare: PA = base[#SEG] + IN_SEG_OFFSET)
- still lots of internal frags
how does traditional segmentation minimize internal frags?
- by loading the segments in a compact manner
- unused VA space won't be loaded at all
our loading unit is the fixed-size segment, called a "page"
- & we want the unused VA space to not be loaded as well
- so: create pages that contain only unused parts of VA_SPACE, so they can be ignored when loading
- if the pages are small enough, the used and unused parts fall into separate pages
our final attempt: fixed size, unit = a "page" small enough
- assume that PA_SIZE = N * PAGE_SIZE
- when loading, each page is a unit
- n = the number of the physical page where the page is placed
- no need for size
- PA = n * PAGE_SIZE + IN_PAGE_OFFSET
- VALID flag: a totally unused page is invalid, and thus doesn't have to be loaded into memory
- size of each internal frag < PAGE_SIZE: acceptable!
(VALID_FLAG & VM usage relationship, FYI)
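A minimal sketch of the page translation above. The 4 KB page size and the page table contents are assumptions; each entry is the pair (PhysicalPageNumber, VALID).

```python
PAGE_SIZE = 4096                    # assumed page size
OFFSET_BITS = 12                    # log2(PAGE_SIZE)


class PageFault(Exception):
    pass


# Hypothetical page table: index = virtual page number, entry = (#PPage, VALID).
page_table = [(5, True), (0, False), (0, False), (2, True)]


def translate(va: int) -> int:
    vpn, offset = va >> OFFSET_BITS, va & (PAGE_SIZE - 1)
    if vpn >= len(page_table):
        raise PageFault(f"no entry for virtual page {vpn}")
    ppn, valid = page_table[vpn]
    if not valid:
        raise PageFault(f"virtual page {vpn} is not valid")
    return ppn * PAGE_SIZE + offset     # PA = n * PAGE_SIZE + IN_PAGE_OFFSET


print(hex(translate(0x3010)))  # 0x2010
```

Only two of the four virtual pages occupy physical memory; the invalid ones cost a table entry and nothing else.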
basic implementation
loading the program
- the OS looks at a free-page list, and allocates unused physical pages for the new process
- only a used page(VALID=1) will be allocated a physical page
- the numbers of the allocated pages are stored in the process's page table(page map)
- the page table entry: PhysicalPageNumber || Valid
- the page table offset(PTO) is stored in the process PCB
- the EXEC loads the ELF content into physical mem by accessing VM
switching context
- load the process's PTO value to PTO register
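A minimal sketch of building a page table at load time from a free-page list. `build_page_table`, the free-page numbers, and the process layout are hypothetical.

```python
def build_page_table(used_pages, free_pages):
    """Give every used virtual page a physical page taken from the free list."""
    table = []
    for used in used_pages:
        if used:
            if not free_pages:
                raise MemoryError("no free physical page left")
            table.append((free_pages.pop(0), True))    # (#PPage, VALID=1)
        else:
            table.append((0, False))                   # VALID=0, nothing allocated
    return table


free_pages = [7, 3, 9, 4]           # hypothetical free-page list
# Hypothetical process: code in page 0, heap in page 1, stack in page 5.
table = build_page_table([True, True, False, False, False, True], free_pages)
print(table[5], free_pages)  # (9, True) [4]
```

The allocated physical pages need not be contiguous or ordered, since every page is translated independently.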
illusion of bigger memory: paging(swapping)
what if the physical memory is full?
- there are too many processes, whose valid pages take up all the physical space
- the OS finds no free space, so a new process can't be created until some process ends
- we can select a page and put it somewhere else for now, so that the new process can use that space
where to put the swapped out pages?
- disks are big, so reserve a certain part of one - the swapping area
- copy the pages of the old process to the swapping area
- the physical pages are free to use now
now load the new process
- write the new process's page table - fill in these physical pages
- load the program from the new process's ELF on the disk(not in the swapping area) into the physical pages
- store the new page table offset in its PCB
the new process ends, context switch, the old process runs again
- the old process accesses the swapped-out region of its VM, but gets the data of the new process - WRONG!
- where's its data? in the swapping area!
- need PRESENT bit
- if PRESENT, the page is in the memory
- if not PRESENT, use the DISK_ADDR in our page table entry to find the page in swapping area
- we can reuse the bits for #PPage as DISK_ADDR
- PRESENT tells whether the field holds #PPage or DISK_ADDR
- in context switch, find all pages with PRESENT=0
- using the page table's DISK_ADDR, reload these pages into free physical pages
- set PRESENT = 1, set #PPage
- now able to run the old process again
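A minimal sketch of a page table entry whose address field is shared between #PPage and DISK_ADDR. The bit positions and helper names are assumptions for illustration only.

```python
VALID, PRESENT = 0b10, 0b01         # assumed flag bit positions
FLAG_BITS = 2


def make_pte(field: int, valid: bool, present: bool) -> int:
    """Pack #PPage (if present) or DISK_ADDR (if not) together with the flags."""
    return (field << FLAG_BITS) | (VALID if valid else 0) | (PRESENT if present else 0)


def decode_pte(pte: int):
    """Interpret the shared field according to VALID and PRESENT."""
    field = pte >> FLAG_BITS
    if not pte & VALID:
        return ("invalid", 0)
    return ("ppage", field) if pte & PRESENT else ("disk", field)


resident = make_pte(5, valid=True, present=True)      # page lives in physical page 5
swapped = make_pte(42, valid=True, present=False)     # page lives at swap slot 42
print(decode_pte(resident), decode_pte(swapped))  # ('ppage', 5) ('disk', 42)
```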
lazy optimization: demand paging
being lazy is faster
- so far, we initialize all the pages of a process from its file upon startup
- and we reload all of a process's swapped-out pages before it runs again
- some pages may never be used, but we still load them
defer the loading of a page until it's accessed
- when trying to access a page with PRESENT=0, raise a page fault
- this exception also occurs when accessing a page with VALID=0, but that simply causes a segmentation fault O(∩_∩)O
- there's another reason for the page fault(in later discussions)
- the OS's page fault handler reads the DISK_ADDR, loads the page into physical memory, and sets PRESENT=1
- jump back to the memory-accessing instruction and run it again
- access a physical page successfully!
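A minimal sketch of demand paging with a page fault handler. The tiny page size, the `disk` dictionary, and all names are hypothetical; the point is only that a page is loaded on its first access.

```python
PAGE_SIZE = 4                               # tiny pages keep the example readable

disk = {10: b"code", 11: b"data"}           # hypothetical DISK_ADDR -> page contents
memory = {}                                 # #PPage -> page contents
free_pages = [0, 1]
stats = {"faults": 0}

# "addr" holds #PPage when present, DISK_ADDR otherwise.
page_table = [
    {"valid": True, "present": False, "addr": 10},
    {"valid": True, "present": False, "addr": 11},
    {"valid": False, "present": False, "addr": 0},
]


def page_fault_handler(vpn: int) -> None:
    entry = page_table[vpn]
    if not entry["valid"]:
        raise MemoryError("segmentation fault")
    ppage = free_pages.pop(0)
    memory[ppage] = disk[entry["addr"]]     # load the page from disk
    entry["addr"], entry["present"] = ppage, True
    stats["faults"] += 1


def read_byte(va: int) -> int:
    vpn, offset = divmod(va, PAGE_SIZE)
    if not page_table[vpn]["present"]:
        page_fault_handler(vpn)             # then the access is retried
    return memory[page_table[vpn]["addr"]][offset]


print(chr(read_byte(5)), chr(read_byte(6)), stats["faults"])  # a t 1
```

Virtual page 0 is never touched here, so it is never loaded.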
use disk more than swapping area: mapping the disk
until now we've been loading the ELF by directly accessing the disk in OS
- we can map an area on VM to an area on disk
- reading disk -> reading VM
Linux uses objects on disks
- an object can be a file
- a regular file's contents are divided into page-sized chunks
- recall DISK_ADDR in swapping area - do the same!
- fill the DISK_ADDR of the pages in file in the new process's page table
- set the PRESENT = 0
- EXEC jumps to the start
- access the first code - page fault!
- the OS loads the page
- only the pages that are actually used will be loaded
- these pages are called working set/resident set
conclusion: meaning of PRESENT = 0
- the page has not been initialized by a file
- the page has been swapped out to the swapping area on the disk
we're actually using the physical memory as a cache for the disk(the swapping area and mapped files)!
space optimization: sharing pages between processes
imagine multiple processes accessing the same file
- the file initialized some pages in process_0
- process_1 wants to use the file
- process_1 uses the file to initialize another PPage
- 2 pages of the same contents!
if the processes want to share a file
- process_0 creates the file pages
- the pages are recorded by the OS in an object list
- process_1 searches for the file name in the object list
- process_1 finds the PPages of the file already in use
- process_1 fills the #PPages in its page table
- they both read/modify the same file now
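A minimal sketch of the object list lookup above. `map_file`, the file name `libc.so`, and the page numbers are hypothetical, and reading the file contents is left out.

```python
free_pages = [3, 4, 5, 6]           # hypothetical free physical pages
object_list = {}                    # file name -> #PPages that hold the file


def map_file(page_table: list, name: str, n_pages: int) -> None:
    """Append the file's physical pages to a page table, reusing them if recorded."""
    if name not in object_list:
        # First user: allocate pages and record them in the object list.
        object_list[name] = [free_pages.pop(0) for _ in range(n_pages)]
    page_table.extend(object_list[name])


process_0, process_1 = [], []
map_file(process_0, "libc.so", 2)
map_file(process_1, "libc.so", 2)
print(process_0, process_1, free_pages)  # [3, 4] [3, 4] [5, 6]
```

The second mapping consumes no free pages: both page tables point at the same physical pages.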
what if a process doesn't want its writes to modify the shared copy? (private objects, copy-on-write)
- the process_0 calls FORK
- recall: the OS used to find free space in physical memory, fill in the new process's page table …
- now the OS creates a page table identical to the old one
- the same mapping of VA -> PA!
- however, the pages are marked read-only in both the old and the new page table
- our page table entry now: #PPage || VALID || PRESENT || W
- W = 0 indicates a read-only page
- when either of the two processes tries to modify the page
- a page fault is raised
- the OS looks at the exception cause, and knows it's a write to a read-only page in a private object
- the OS now searches its free space for a free page, and copies the page there
- the OS changes the faulting process's page table, substituting the copied page for the old one, and sets both pages writable
- we're actually being lazy again
- after returning from the handler, the write instruction runs again
- successfully modified the new page!
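A minimal sketch of copy-on-write after FORK. The per-page `refcount` is an assumption added here so the OS can tell when a page has only one user left and can simply be made writable again; all names and values are hypothetical.

```python
memory = {0: bytearray(b"abcd")}    # #PPage -> contents
refcount = {0: 1}                   # assumed bookkeeping: users per physical page
free_pages = [1, 2]


def fork(parent: dict) -> dict:
    """Duplicate the page table; both copies map the same #PPage with W=0."""
    for entry in parent.values():
        entry["w"] = False
        refcount[entry["ppage"]] += 1
    return {vpn: dict(entry) for vpn, entry in parent.items()}


def write(table: dict, vpn: int, offset: int, value: int) -> None:
    entry = table[vpn]
    if not entry["w"]:
        # Page fault: a write to a read-only page of a private object.
        old = entry["ppage"]
        if refcount[old] > 1:
            new = free_pages.pop(0)
            memory[new] = bytearray(memory[old])     # the lazy copy happens now
            refcount[old] -= 1
            refcount[new] = 1
            entry["ppage"] = new
        entry["w"] = True
    memory[entry["ppage"]][offset] = value           # the retried write succeeds


parent = {0: {"ppage": 0, "w": True}}
child = fork(parent)
write(child, 0, 0, ord("X"))
print(memory[parent[0]["ppage"]], memory[child[0]["ppage"]])
# bytearray(b'abcd') bytearray(b'Xbcd')
```

When the parent writes later, its page has a single user left, so the handler only flips W back to 1 without copying.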
(our MMU, FYI. my computer's OS has started swapping out memory now/(ㄒoㄒ)/~~)