Unix Filesystem Organization

最新推荐文章于 2021-12-01 20:36:19 发布

weixin_33826268

最新推荐文章于 2021-12-01 20:36:19 发布

阅读量201

点赞数

原文链接：http://blog.51cto.com/11259454/1764181

版权

File allocation methods(linked,indexed)

http://www2.cs.uregina.ca/~hamilton/courses/330/notes/allocate/allocate.html

File Allocation Table (FAT) is a computer file system architecture and a family of industry-standard file systems utilizing it. The FAT file system is a legacy file system which is simple and robust

Inode is the data structure that describes the attributes of a file, including the layout of its data on disk.

Two version of the inode

Disk copy : store the inode information when file is not in use

In-core copy : record the information about active files.

ialloc/ifree : assignment of a disk inode

iget/iput : allocation of in-core inodes

bmap : locate disk blocks of a file, according to byte offset

Directories : files that correlate file name components to inode numbers

namei : convert file names to inodes

alloc/free : assignment of new disk blocks to a file

Definition of Inodes

Every file has a unique inode

Contain the information necessary for a process to access a file

Exist in a static form on disk

Kernel reads them into an in-core inode to manipulate them.

Contents of Disk Inodes

File owner identifier (individual/group owner)

File type (regular, directory,..)

File access permission (owner,group,other)

File access time

Number of links to the file (chap5)

Table of contents for the disk address of data in a file (byte stream vs discontiguous disk blocks)

File size

* Inode does not specify the path name that access the file

Distinction Between Writing Inode and File

File change only when writing it.

Inode change when changing the file, or when changing its owner, permisson,or link settings.

Changing a file implies a change to the inode,

But, changing the inode does not imply that the file change.

Contents of The In-core copy of The Inode

Fields of the disk inode

Status of the in-core inode, indicating whether

Inode is locked

Process is waiting for the inode to become unlocked

Differ from the disk copy as a result of a change to the data in the inode

Differ from the disk copy as a result of a change to the file data

File is a mount point

Logical device number of the file system

Inode number (linear array on disk, disk inode not need this field)

Pointers to other in-core inodes

Reference count

Directories

A directory is a file

Its data is a sequence of entries, each consisting of an inode number and the name of a file contained in the directory

Path name is a null terminated character string divided by “/”

Each component except the last must be the name of a directory, last component may be a non-directory file

Inode Assignment to a New File

File system contains a linear list of inodes

Inode is free : its type field is zero (0)

Super block contains an array to cache the numbers of free inodes in the file system (to improve performance)

Algorithm for Assigning New Inodes

Algorithm ialloc /* allocate inode */

Input : file system

Output : locked inode

{

while(not done){

if(super block locked) {

sleep(event super block becomes free); continue;

}

if(inode list in super block is empty){

lock super block;

get remembered inode for free inode search;

search disk for free inodes until super block full,

or no more free inodes (bread and brelese);

unlock super block;

wake up (event super block becomes free);

if(no free inodes found on disk) return (no inode);

set remembered inode for next free inode search;

}

/* there are inodes in super block inode list */

get inode number from super block inode list;

get inode (algorithm iget);

if(inode not free after all) {

write inode to disk;

release inode (algorithm iput);

continue; /* while loop */

}

/* inode is free */

initialize inode;

write inode to disk;

decrement file system free inode count;

return (inode);

} // end of while

}

。。。。。

Algorithm alloc /* file system block allocation */

Input : file system number

Output : buffer for new block

{

while(super block locked) sleep (event super block not locked);

remove block from super block free list;

if(removed last block from free list){

lock super block;

read block just taken from free list (algorithm bread);

copy block numbers in block into super block;

release block buffer (algorithm brelse);

unlock super block;

wake up processes (event super block not locked);

}

…

get buffer form block removed from super block list (algorithm getblk);

zero buffer contents;

decrement total count of free blocks;

mark super block modified;

return buffer;

}

Requesting and Freeing Disk Blocks

Other File Types

Pipe

fifo(first-in-first-out)

Its data is transient: once data is read from a pipe, it cannot be read again

Use only direct block (not the indirect block)

Special file

block device, character device

The inode contains the major and minor device number

Major number indicates a device type such as terminal or disk

Minor number indicates the unit number of the device

``Old'' (Original) file system

In the original Unix file system, Unix divided physical disks into logical disks called partitions. Each partition is a standalone file system. We will use the term ``file system'' when referring to a single partition.

Each disk device is given its own major device number, and each partition has an associated minor device number which the device driver uses to access the raw file system.

The major/minor device number combination serves as a handle into the device switch table. That is, the major number acts as an index, and the minor number is passed as an argument to the driver routines so that they can recognize the specific instance of a device.

Each filesystem contains:

1.
a boot block located in the first few sectors of a file system. The boot block contains the initial bootstrap program used to load the operating system.
Typically, the first sector contains a bootstrap program that reads in a larger bootstrap program from the next few sectors, and so forth.
2.
a super block describes the state of the file system: the total size of the partition, the block size, pointers to a list of free blocks, the inode number of the root directory, magic number, etc.
3.
a linear array of inodes (short for ``index nodes''). There is a one to one mapping of files to inodes and vice versa. An inode is identified by its ``inode number'', which contains the information needed to find the inode itself on the disk
Thus, while users think of files in terms of file names, Unix thinks of files in terms of inodes.
4.
data blocks blocks containing the actual contents of files

---------------------------------------------------------------
|          |           | | | | | | | |        |  |  |  |      |
| B. B.    | S. B.     | Inodes  | | | ...    |  Data Blocks  |
|          |           | | | | | | | |        |  |  |  |      |
---------------------------------------------------------------

An inode is the ``handle'' to a file and contains the following information:

file ownership indicationfile type (e.g., regular, directory, special device, pipes, etc.)file access permissions. May have setuid (sticky) bit set.time of last access, and modificationnumber of links (aliases) to the filepointers to the data blocks for the filesize of the file in bytes (for regular files), major and minor device numbers for special devices.

An integral number of inodes fits in a single data block.

Information the inode does not contain:

path (short or full) name of file

Example

Look at /cs/bin/

< wpi /cs/bin 1 >ls -l
total 516
drwx------   4 mvoorhis csadmin     4096 Jan 10 09:53 archives/
-rws--x---   1 root     771        32768 Jan 18  1999 csquotamgr*
-rwx------   1 csadmin  csadmin      162 Jan 12  1998 genQuota*
-rwx------   1 csadmin  csadmin       46 Feb 16  1998 generic*
-rwx------   1 mvoorhis csadmin      672 Jan 20 16:16 list1*
-rwx------   1 mvoorhis csadmin      859 Jan 20 16:17 list2*
-rwx------   1 csadmin  646          140 Jan 10 14:43 reclaim*
-rwxrwx---   1 csadmin  csadmin     1635 Sep 26  1995 stp_create_system.pl*
-rwxrwxr-x   1 csadmin  csadmin      725 Sep 26  1995 stp_default_system.pl*
-rw-rw-r--   1 csadmin  csadmin      114 Feb 10  1995 stp_setup
drwx------  32 mvoorhis 1067        4096 Mar 17 13:25 tDir/
-rw-------   1 mvoorhis csadmin   314057 Jan 20 12:23 this
-rwsr-xr-x   1 mvoorhis 1067      131072 Jan 20 13:22 turnin*
drwxr-xr-x   2 root     771         4096 May 26  1999 utility/

Internally, Unix stores directories in files. The file type (of the inode) is marked ``directory'', and the file contains pairs of name/inode numbers.

For example, when a user issues open(``/etc/passwd'', ...) the kernel performs the following operations:

1.
because the file name is a full path name, find the inode of the root directory (found in superblock) and search the corresponding file for the entry ``etc''
2.、
when the entry ``etc'' is found, fetch its corresponding inode and check that it is of type directory
3.
scan the file associated with ``/etc'' looking for ``passwd''
4.
finally, fetch the inode associated with passwd's directory entry, verify that it is a regular file, and start accessing the file.

Note: What would the system do when opening ``/dev/tty01''?

Eventually, the system would find the inode corresponding to the device, and note that its file type was ``special''. Thus, it would extract the major/minor device number pair from the length field of the inode, and use the device number as an index into the device switch table.

Getwd()

How to get string of current directory? Have only the inode of the current directory.

get current inode
while (inode != root inode) {
    get inode of parent from ..
    search parent's directory file to match our inode number

Where should a file's data blocks be physically located?

to improve performance, we might want to place a file's data blocks in contiguous sectors on disk. However, this leads to inefficiencies in allocating space, or forces the user to specify the size of the file at creation time.

The Unix file system allocates data blocks (blocks that contain a file's contents) one at a time from a pool of free blocks. Unix uses 4K blocks. Moreover, a file's blocks are scattered randomly within the physical disk.

Inodes include pointers to the data blocks. Each inode contains 15 pointers:

the first 12 pointers point directly to data blocksthe 13th pointer points to an indirect block, a block containing pointers to data blocksthe 14th pointer points to a doubly-indirect block, a block containing 128 addresses of singly indirect blocksthe 15th pointer points to a triply indirect block (which contains pointers to doubly indirect blocks, etc.)

-------------------------------
| | | | | | | | | | | | | | | |
-------------------------------
 | | | | | | | | | | | | | | |--------------------------
      data blocks        | |-----------|               |
                         |             |               |
                       -----         -----           -----
                       |   |         |   |           |   |
                       -----         -----           -----
                        |||           |||             |||
                        data         -----           -----
                                     |   |           |   |
                                     -----           -----
                                      |||             |||
                                      data           -----
                                                     |   |
                                                     -----
                                                      |||
                                                      data
with 4K blocks:
direct 12x4K = 48K
indirect 1024x4K = 4MB
double indirect 1024x1024x4K = 4GB
triple indirect 1024x1024x1024x4K = 4TB

Advantages:

data in small files can be accessed directly from the inode That is, one read operation fetches the inode, and another read fetches the first data block.larger files can be accessed efficiently, because an indirect block points to many data blocksdisk can be filled completely, with little wasted space (ignoring partially-filled blocks)

Disadvantages:

because inode information is kept separately from data, access of data often requires a long seek when file is initially accessedinodes of files in a common directory not kept together, leading to low performance when searching directoriesoriginal file system only used 512-byte blocks, an inefficient transfer sizedata blocks of a file are not stored together, leading to poor performance when accessing files sequentially.free list quickly becomes scrambled increasing overhead of finding free blocks (seek for each new block)original file system used as little as 2% of the available disk bandwidth

The Berkeley Fast File System

The Berkeley Fast File System used the following principles to improve the performance (and reliability) of the file system:

duplicate the super block, so that it can easily be recovered after a disk crashuse a large block size to improve throughputadd the block size to the superblock, so that different file systems could be accessed using different block sizesstore related data blocks within cylinder groups, one or more consecutive cylinders on the disk. Blocks within in a cylinder group can be accessed with only a small seek (if any)because large blocks leads to fragmentation, small files (or the remaining bytes of a file) should be stored in fragments, where an integral number of fragments (e.g., 4 or 8) fits in a single block. Use 4K/512 or 8K/1K block/fragment size combinations. Inode stores 8-bit mask for fragment use in last data block of the file.

A data structure within each cylinder group contains status information about the blocks stored within that group:

1.
a bit map of blocks and fragments indicates which blocks and fragments are free
2.
a list of inodes within the cylinder group (why?)
3.
duplicate copy of the superblock (stored on a different platter for each cylinder group. why?)

When allocating space, the fast file system uses a global policy to determine where to place new directories and files. For example:

place inodes of files in the same directory in the same cylinder group (makes programs like ls faster)place new directories in a cylinder group that has a higher than average number of free inodes and the smallest number of directories already in ittry to place all data blocks for a file in the same cylinder groupmove to a new cylinder group when a file exceeds 48kb, and every megabyte thereafter.

The fast file system also uses a local policy to allocate blocks at the lower levels. For instance:

when adding a data block to an existing file, pick the next block to be rotationally closesttry to allocate blocks out of the same cylinder, before looking at other blocks in the same cylinder group

The new file system increased the throughput to as much as 30% of the raw bandwidth.

Linux ext2fs

The 2nd extended file system. Same standard file system as Unix.

Similar to the Berkeley fast file system, but does not use fragments. Rather it uses smaller block sizes (1K, but can be 2K or 4K).

Tries to cluster disk blocks so that a single I/O request can read multiple blocks.

Modern disk technologies pack sectors onto disks at different densities--Linux uses variable size block groups (like cylinder groups in BSD FFS).

Allocation:

Tries to allocate data blocks in same group as inode.Tries to allocate nondirectory inodes in same group as parent directoryTries to allocate directory inodes in different group than parent directory

Also has a proc file system to allow access to process information through the file system interface.

Also supports other file systems such as FAT and NTFS.

File Mounting

When the system initially boots, the only file system Unix knows about is the root partition from which the system was booted. A special system call:

mount(special, path_name, options)

mounts the filesystem given by special at the point path_name in the root filesystem, thus allowing multiple file systems to be merged into a single global tree.

Internally, the kernel maintains a mount table that keeps information about the mounted file systems. Each entry in the table contains:

the device number of partition that has been mounteda pointer to the buffer containing the super block for the file systema pointer to the root inode of the file systema pointer to the inode of the directory in which the file system is mounted (e.g., a pointer to the parent directory)

As the kernel is translating a path name, it consults the mount table as needed.

In Memory Data Structures

The kernel maintains a system-wide file table that describes open files. Each entry in the file table contains:

the read/write mark that indicates from where the next data block is to be reada pointer to an entry in the active inode table, so that the access times can be modified efficiently

Finally, each process maintains a user file table that describes the files opened by the process. Entries in the user file table point to the system-wide file table.

Thus, a process can have its own private read/write mark (the default when a file is initially opened), or it can share a read/write mark (as is the case when a new process is created via fork).

Caching

Unix relies heavily on caching to improve performance. It keeps in memory:

recently accessed disk blocksthe inode of each open file in the systema cache of recent name-to-inode mappingsa directory offset cache. If a process requests a file name in the same directory as its previous request, the search through the directory is started where the previous search ended. Improves the performance of applications that read all files in a directory (like ls).

转载于:https://blog.51cto.com/11259454/1764181

weixin_33826268

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Unix Filesystem Organization

File allocation methods(linked,indexed)http://www2.cs.uregina.ca/~hamilton/courses/330/notes/allocate/allocate.htmlFile Allocation Table(FAT) is a computerfile systemarchitecture and a f...
复制链接

扫一扫