File System Review Note - Operating System

weixin_30776545

Original link: http://www.cnblogs.com/yxcindy/p/10223522.html

Cindy Chen's notes: https://docs.google.com/document/d/14qvHObfoNGrAA6SB6C688ErOzDby8A-5wxqLD_97-p8/edit?usp=sharing

Disk I/O

Understand the memory hierarchy concept, locality
Physical disk structure
1. platters
2. surfaces
3. tracks
4. sectors
5. cylinders
6. arms
7. heads

disk scheduling
1. FCFS
  1. reasonable when load is low
  2. long waiting times for long request queues
2. SSTF (shortest seek time first)
  1. minimize arm movement (seek time), maximize request rate
  2. favors middle blocks
3. SCAN (elevator)
  1. service requests in one direction until done, then reverse
4. C-SCAN
  1. like SCAN, but only services requests in one direction, then returns to the start
5. in general, unless there are request queues, disk scheduling does not have much impact
6. modern disks often do the disk scheduling themselves
  1. disks know their layout better than the OS, so they can optimize better
  2. the disk ignores or undoes any scheduling done by the OS
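
The scheduling policies above can be compared with a small simulation. This is a minimal sketch, not a real driver: the function names, the starting head position, and the request queue are all made up for illustration, and `scan` reverses at the last pending request instead of travelling to the edge of the disk (a variant often called LOOK).

```python
def fcfs(start, requests):
    """FCFS: service requests in arrival order."""
    return list(requests)

def sstf(start, requests):
    """SSTF: always pick the pending request closest to the current head position."""
    pending, pos, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda cyl: abs(cyl - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

def scan(start, requests):
    """SCAN (elevator): sweep upward, then reverse and sweep downward."""
    up = sorted(c for c in requests if c >= start)
    down = sorted((c for c in requests if c < start), reverse=True)
    return up + down

def c_scan(start, requests):
    """C-SCAN: sweep upward only, then jump back and sweep upward again."""
    up = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)
    return up + wrapped

def head_movement(start, order):
    """Total number of cylinders the arm travels for a given service order."""
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total

if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical cylinder numbers
    for policy in (fcfs, sstf, scan, c_scan):
        print(policy.__name__, head_movement(53, policy(53, queue)))
```

With this particular queue, SSTF moves the arm far less than FCFS, which matches the note that scheduling only matters once a queue of requests has built up.
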
Disk interface
1. How does the OS make requests to the disk?
Disk performance
1. What steps determine disk request performance?
2. What are seek, rotation, transfer?
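
As a rough worked example of how seek, rotation, and transfer add up, the sketch below estimates the time for one request. The drive parameters (4 ms average seek, 7200 RPM, 100 MB/s transfer rate) are assumptions chosen for illustration, not figures from the notes, and average rotational delay is taken as half a rotation.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_bytes):
    """Estimate one disk request as seek + average rotation + transfer (in ms)."""
    rotation_ms = 0.5 * 60_000 / rpm  # on average, wait half a rotation
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotation_ms + transfer_ms

if __name__ == "__main__":
    # Hypothetical drive: 4 ms seek, 7200 RPM, 100 MB/s, one 4 KB block.
    print(round(access_time_ms(4, 7200, 100, 4096), 3), "ms")
```

Under these assumptions the mechanical delays (seek and rotation) dominate, and the transfer of a single 4 KB block is a small fraction of the total.
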

File systems

1. Topics
  1. files
  2. directories
  3. sharing
  4. protection
  5. layouts
  6. buffer cache
2. What is a file system?
  1. implement an abstraction (files) for secondary storage
  2. organize files logically (directories)
  3. permit sharing of data between processes, people and machines
  4. protect data from unwanted access (security)
3. Why are file systems useful (why do we have them)?

Files and directories

What is a file?

1. What characteristics do they have?
  1. data with some properties
  2. can have a type
    1. understood by file system: block, character, device, portal, link, etc
    2. understood by other parts of the OS or runtime libraries
  3. a file's type can be encoded in its name or contents
2. What are file access methods?
  1. sequential access
  2. direct access
  3. indexed sequential access
What is a directory?
1. What are they used for?
  1. for users, they provide a structured way to organize files
  2. for the file system, they provide a convenient naming interface that allows the implementation to separate logical file organization from physical file placement on the disk
2. How are they implemented?
  1. <name, location>
  2. name is just the name of the file or directory
  3. location depends upon how file is represented on disk
3. What is a directory entry?
  1. a struct of inode and filename: just enough information to translate from a filename to an inode and get to the actual file
How are directories used to do path name translation?
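
One way to picture path name translation is as a walk over directory entries, one path component at a time. The sketch below is a toy model under stated assumptions: the in-memory `inodes` table, the inode numbers, the file names, and the choice of inode 2 for the root directory are all hypothetical.

```python
ROOT_INO = 2  # assumption: the root directory lives at a well-known inode number

# Toy inode table: directories map names to inode numbers, files hold data.
inodes = {
    2: {"type": "dir", "entries": {"home": 5}},
    5: {"type": "dir", "entries": {"cindy": 9}},
    9: {"type": "dir", "entries": {"notes.txt": 12}},
    12: {"type": "file", "data": b"file system review"},
}

def resolve(path):
    """Translate an absolute path to an inode number, one component at a time."""
    ino = ROOT_INO
    for name in path.strip("/").split("/"):
        if not name:
            continue  # "/" alone resolves to the root directory
        node = inodes[ino]
        if node["type"] != "dir":
            raise NotADirectoryError(name)
        if name not in node["entries"]:
            raise FileNotFoundError(path)
        ino = node["entries"][name]  # directory entry: <name, inode>
    return ino

if __name__ == "__main__":
    print(resolve("/home/cindy/notes.txt"))
```

Each loop iteration stands for reading a directory's inode and data block to find the next component.
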

Protection

What is file protection used for?
1. a protection system dictates whether a given action performed by a given subject on a given object should be allowed
How is it implemented?
What are access control lists (ACLs)?
1. for each object, maintain a list of subjects and their permitted actions
What are capabilities?
1. for each subject, maintain a list of objects and their permitted actions
What are the advantages/disadvantages of each?
1. capabilities: compact enough to fit in just a few bytes; not very expressive
2. access control list: a per-file list that tells who can access that file: highly expressive; harder to represent in a compact way
3. Which one is easier for deleting an object (file)? - ACL
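
The two representations hold the same access matrix, indexed differently. The sketch below uses made-up subjects, objects, and helper names to show why deleting an object is a single step with ACLs but requires visiting every subject with capabilities.

```python
# ACL view: for each object, the subjects and their permitted actions.
acl = {
    "notes.txt": {"alice": {"read", "write"}, "bob": {"read"}},
    "grades.db": {"alice": {"read"}},
}

# Capability view: for each subject, the objects and their permitted actions.
capabilities = {
    "alice": {"notes.txt": {"read", "write"}, "grades.db": {"read"}},
    "bob": {"notes.txt": {"read"}},
}

def acl_allows(subject, obj, action):
    return action in acl.get(obj, {}).get(subject, set())

def cap_allows(subject, obj, action):
    return action in capabilities.get(subject, {}).get(obj, set())

def acl_delete_object(obj):
    """One lookup: the object's list goes away with the object."""
    acl.pop(obj, None)

def cap_delete_object(obj):
    """Every subject's capability list has to be searched."""
    for caps in capabilities.values():
        caps.pop(obj, None)
```
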

File system layouts

What are file system layouts used for? - to store files
1. file systems define a block size (like 4 KB)
  1. disk space is allocated in granularity of blocks
2. a "master block" determines location of root directory
3. a free map determines which blocks are free, allocated
  1. usually a bitmap, one bit per block on the disk
  2. also stored on disk, cached in memory for performance
4. remaining disk blocks used to store files
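
A free map with one bit per block can be sketched as follows. The class name `FreeMap` and the first-fit search are illustrative assumptions; real file systems use smarter placement policies.

```python
class FreeMap:
    """Bitmap with one bit per disk block: 0 = free, 1 = allocated."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Return the lowest-numbered free block and mark it allocated."""
        for block in range(self.num_blocks):
            if not self.is_allocated(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block):
        """Clear the block's bit so it can be reused."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```
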
What are the general strategies?
1. contiguous allocation
  1. like memory
  2. fast, simplifies directory access
  3. inflexible, causes fragmentation, needs compaction
2. linked structure
  1. each block points to the next, directory points to the first
  2. good for sequential access, bad for all others
3. indexed structure (indirection, hierarchy)
  1. an "index block" contains pointers to many other blocks
  2. handles random better, still good for sequential
  3. may need multiple index blocks
What are the tradeoffs for those strategies?
1. contiguous allocation
  1. pros:
    1. simple
    2. performance
  2. cons:
    1. fragmentation
    2. usability
    3. hard for a file to dynamically grow
  3. used in CD-ROMs, DVDs
2. linked list allocation
  1. pros:
    1. no space lost to external fragmentation
    2. the directory only needs to record the first block of each file
  2. cons:
    1. random access is costly
    2. overheads of pointers
MS-DOS file system
1. file allocation table (FAT)
2. take pointers away from blocks, store in this table
3. pros:
  1. entire block is available for data
  2. random access is faster than linked list
4. cons:
  1. many file seeks unless entire FAT is in memory
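
The FAT idea, where the "next block" pointers live in one table instead of inside the data blocks, can be sketched like this. The table size, block numbers, file name, and the `FREE`/`EOF` marker values are assumptions for illustration and do not reflect the real on-disk FAT format.

```python
FREE, EOF = -2, -1  # hypothetical marker values for this sketch

# fat[b] holds the number of the block that follows block b in its file.
fat = [FREE] * 12
for block, nxt in [(4, 7), (7, 2), (2, 10), (10, EOF)]:
    fat[block] = nxt

directory = {"notes.txt": 4}  # the directory stores only the first block

def file_blocks(name):
    """Follow the chain in the table to list every block of a file."""
    blocks, block = [], directory[name]
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks

def nth_block(name, n):
    """Random access: walk n table entries, with no data blocks read."""
    block = directory[name]
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("offset past end of file")
    return block
```

Random access still walks the chain, but if the table is cached in memory the walk costs no disk seeks, which is the advantage over pointers stored in the blocks themselves.
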
How do those strategies reflect file access methods?
What is an inode?
1. How are inodes different from directories?
  1. unix inodes implement an indexed structure for files
  2. also store metadata info (protection, timestamps, length, ref count, ...)
  3. each inode contains 15 block pointers
  4. first 12 are direct blocks, then single, double, and triple indirect
2. How are inodes and directories used to do path resolution, find files?
  1. read inode for ...
  2. read data block for ...
  3. read inode for ...
  4. read data block for ...
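
To see what 12 direct pointers plus single, double, and triple indirect pointers buy, the sketch below maps a file block index to the pointer level that covers it. The 4 KB block size and 4-byte block pointers are assumptions; actual values depend on the file system.

```python
BLOCK_SIZE = 4096                    # assumption: 4 KB blocks
PTR_SIZE = 4                         # assumption: 4-byte block pointers
PTRS = BLOCK_SIZE // PTR_SIZE        # pointers that fit in one index block
NUM_DIRECT = 12

def pointer_level(n):
    """Return which inode pointer covers the n-th block of a file (0-based)."""
    if n < NUM_DIRECT:
        return "direct"
    n -= NUM_DIRECT
    if n < PTRS:
        return "single indirect"
    n -= PTRS
    if n < PTRS ** 2:
        return "double indirect"
    n -= PTRS ** 2
    if n < PTRS ** 3:
        return "triple indirect"
    raise ValueError("block index beyond maximum file size")

def max_file_bytes():
    """Largest file addressable with this pointer layout."""
    return (NUM_DIRECT + PTRS + PTRS ** 2 + PTRS ** 3) * BLOCK_SIZE
```

Under these assumptions the direct pointers cover the first 48 KB, so small files need no index blocks at all, while the triple indirect pointer pushes the maximum file size to roughly 4 TiB.
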

A hard link creates a new directory entry that refers to the same underlying inode. Deleting the original file only decrements the inode's reference count by one, so the newly created hard link still works.

A soft link is just a link to another path name. If the original file is removed, that path no longer exists and the soft link becomes a dangling pointer.
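
The difference can be observed with Python's standard `os.link` and `os.symlink` calls. This sketch assumes a POSIX system and a file system that supports both link types; the file names are made up.

```python
import os
import tempfile

def link_demo():
    """Create a hard link and a soft link, delete the original, report what survives."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(original, "w") as f:
            f.write("hello")

        os.link(original, hard)      # second name for the same inode
        os.symlink(original, soft)   # new file that stores a path name

        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
        links_before = os.stat(original).st_nlink

        os.remove(original)          # drops one reference to the inode
        links_after = os.stat(hard).st_nlink
        with open(hard) as f:
            hard_content = f.read()
        # The symlink itself still exists, but its target path does not.
        soft_dangling = os.path.islink(soft) and not os.path.exists(soft)
        return same_inode, links_before, links_after, hard_content, soft_dangling

if __name__ == "__main__":
    print(link_demo())
```
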

shared files

if B wants to share a file owned by C
1. one solution: copy disk addresses in B's directory entry
2. problem: modification by one not reflected in other user's view

File buffer cache

What is the file buffer cache, and why do operating systems use one?
1. applications exhibit significant locality for reading and writing files
2. cache file blocks in memory to capture locality
3. this in-memory cache is the file buffer cache
4. cache is system wide, used and shared by all processes
5. reading from the cache makes a disk perform like memory
6. even a 4 MB cache can be very effective
What is the difference between caching reads and caching writes?
1. on a write, some applications assume that data makes it through the buffer cache and onto the disk
2. as a result, writes are often slow even with caching
3. ways to compensate:
  1. write-behind
    1. maintain a queue of uncommitted blocks
    2. periodically flush the queue to disk
    3. unreliable
  2. battery backed-up RAM (NVRAM)
    1. as with write-behind, but maintain queue in NVRAM
    2. expensive
  3. log-structured file system
    1. always write next block after last block written
    2. complicated
4. read ahead
  1. predicts that the process will request the next block
  2. FS goes ahead and requests it from the disk
  3. this can happen while the process is computing on the previous block
  4. when the process requests the block, it will already be in the cache
  5. complements the disk cache, which is also doing read ahead
  6. big win for sequentially accessed files
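
Read ahead can be sketched as a cache that also fetches block n + 1 whenever block n is read. The class below is a toy model: the "disk" is a Python list, the class and attribute names are hypothetical, and the prefetch happens synchronously, whereas a real file system would issue it asynchronously while the process computes.

```python
class ReadAheadCache:
    def __init__(self, disk):
        self.disk = disk          # list of block contents standing in for the disk
        self.cache = {}
        self.demand_misses = 0    # reads the process actually had to wait for

    def _fetch(self, n):
        """Bring block n into the cache if it exists and is not cached yet."""
        if 0 <= n < len(self.disk) and n not in self.cache:
            self.cache[n] = self.disk[n]

    def read(self, n):
        if n not in self.cache:
            self.demand_misses += 1
            self._fetch(n)
        self._fetch(n + 1)        # read ahead: predict the next sequential block
        return self.cache[n]
```

For a purely sequential scan, only the very first read misses in this model; every later block is already cached by the time it is requested.
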
What are the tradeoffs of using memory for a file buffer cache vs. VM?
1. file buffer competes with VM
2. like VM, it has limited size
3. need replacement algorithms again (LRU)
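
Because the cache has limited size, some block must be evicted when it fills up. A minimal LRU sketch using `collections.OrderedDict` is shown below; the class name `BufferCache` and the `read_block` callback standing in for the disk are assumptions for illustration.

```python
from collections import OrderedDict

class BufferCache:
    """Fixed-size block cache with least-recently-used replacement."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # function that reads a block from disk
        self.blocks = OrderedDict()    # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, n):
        if n in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(n)         # mark as most recently used
            return self.blocks[n]
        self.misses += 1
        data = self.read_block(n)
        self.blocks[n] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used
        return data
```
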

Advanced topics

1. Distributed/parallel systems
  1. benefits of distributed systems
    1. performance: parallelism across multiple nodes
      1. Google File System, Bigtable, MapReduce, Hadoop, etc.
    2. reliability and fault tolerance
      1. redundancy
      2. e.g. the Google search engine
    3. scalability by adding more nodes
  2. benefits of RPC (remote procedure calls)
    1. most common model for communication in distributed applications
    2. RPC is essentially language support for distributed programming
    3. relies upon a stub compiler to automatically generate client/server stubs from the IDL server descriptions
      1. these stubs do the marshalling/unmarshalling, message sending/receiving/replying
    4. NFS uses RPC to implement remote file systems
2. Big data and cloud
  1. main goals/objectives of Hadoop/MapReduce
  2. main challenges
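
To make the RPC stub idea from the list above concrete, the sketch below writes by hand what a stub compiler would normally generate. Everything here is an assumption for illustration: the `add` procedure, JSON as the wire format, and a direct function call standing in for the network transport.

```python
import json

# --- server side ---
def add(a, b):
    return a + b

PROCEDURES = {"add": add}  # hypothetical table of exported procedures

def server_handle(request_bytes):
    """Server stub: unmarshal the request, call the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

# --- client side ---
class ClientStub:
    """Client stub: looks like a local object, but sends messages instead."""

    def __init__(self, transport):
        self.transport = transport  # function: request bytes -> reply bytes

    def add(self, a, b):
        request = json.dumps({"proc": "add", "args": [a, b]}).encode("utf-8")
        reply = self.transport(request)
        return json.loads(reply.decode("utf-8"))["result"]

if __name__ == "__main__":
    stub = ClientStub(server_handle)  # in-process stand-in for the network
    print(stub.add(2, 3))
```

The caller only sees `stub.add(2, 3)`; the marshalling, message sending, and unmarshalling are hidden inside the stubs.
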

Reposted from: https://www.cnblogs.com/yxcindy/p/10223522.html
