Understanding the Linux Kernel - ch01

1.4

  • Unix is a multiprocessing operating system with preemptable processes.

  • Several system processes monitor the peripheral devices.

  • Unix-like operating systems adopt a process/kernel model. Each process has the illusion that it’s the only process on the machine, and it has exclusive access to the operating system services.

  • In contrast, microkernel operating systems demand a very small set of functions from the kernel, generally including a few synchronization primitives, a simple scheduler, and an interprocess communication mechanism.

  • A module is an object file whose code can be linked to (and unlinked from) the kernel at runtime. The object code usually consists of a set of functions that implements a filesystem, a device driver, or other features at the kernel’s upper layer.

  • Advantages of using modules:

    • modularized approach
    • platform independence
    • frugal main memory usage
    • no performance penalty

1.5 An Overview of the Unix Filesystem

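[Figure: an example directory tree. The root directory / contains dev, home, bin, and usr; dev contains device files such as fd0 and hda; bin contains programs such as ls and cp.]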

  • While specifying filenames, the notations “.” and “..” are also used. They denote the current working directory and its parent directory, respectively. If the current working directory is the root directory, “.” and “..” coincide.

  • Unix associates a current working directory with each process
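The behavior of "." and ".." can be checked from user space. The following is only a small sketch in Python, whose `os` module wraps the underlying system calls; the function names `root_dot_dotdot_coincide` and `parent_of_cwd` are made up for this illustration.

```python
import os


def root_dot_dotdot_coincide() -> bool:
    """Return True if "/." and "/.." both name the root directory itself."""
    return os.path.samefile("/.", "/") and os.path.samefile("/..", "/")


def parent_of_cwd() -> str:
    """Resolve ".." relative to this process's current working directory."""
    return os.path.realpath("..")


if __name__ == "__main__":
    # Each process has its own current working directory.
    print("cwd:   ", os.getcwd())
    print("parent:", parent_of_cwd())
    print("root coincides:", root_dot_dotdot_coincide())
```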

ln p1 p2

creates a hard link that has the pathname p2 for a file identified by the pathname p1.

* It's not possible to create hard links for directories
* Links can be created only among files included in the same filesystem
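A hard link is simply a second name for the same inode. The sketch below (the helper name `hard_link_demo` is hypothetical) uses Python's `os.link`, which has the same effect as `ln p1 p2`, and checks that both names share one inode whose link count becomes 2. It assumes the temporary directory lives on a filesystem that supports hard links.

```python
import os
import tempfile


def hard_link_demo():
    """Create p1, hard-link it as p2, and return (same_inode, link_count)."""
    with tempfile.TemporaryDirectory() as d:
        p1 = os.path.join(d, "p1")
        p2 = os.path.join(d, "p2")
        with open(p1, "w") as f:
            f.write("hello")
        os.link(p1, p2)  # same effect as: ln p1 p2
        s1, s2 = os.stat(p1), os.stat(p2)
        # Both pathnames resolve to one inode; its hard-link counter is now 2.
        return s1.st_ino == s2.st_ino, s1.st_nlink


if __name__ == "__main__":
    print(hard_link_demo())
```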

soft links (also called symbolic links)

```sh
ln -s p1 p2
```

Symbolic links are short files that contain an arbitrary pathname of another file. The pathname may refer to any file or directory located in any filesystem; it may even refer to a nonexistent file.
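To illustrate that a symbolic link only stores a pathname, here is a minimal sketch (the function name and the file names are invented for the example) that creates a link to a file that does not exist:

```python
import os
import tempfile


def symlink_demo():
    """Create a dangling symbolic link and inspect it."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "does-not-exist")
        link = os.path.join(d, "p2")
        os.symlink(target, link)  # same effect as: ln -s target link
        stores_pathname = os.readlink(link) == target
        is_link = os.path.islink(link)
        # exists() follows the link, so it is False for a dangling link.
        target_exists = os.path.exists(link)
        return stores_pathname, is_link, target_exists


if __name__ == "__main__":
    print(symlink_demo())
```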

1.5.3 File Types

Regular file
Directory
Symbolic link 
Block-oriented device file
Character-oriented device file
Pipe and named pipe (also called FIFO) 
Socket

1.5.4 File Descriptor and Inode

Unix makes a clear distinction between the contents of a file and the information about a file.

With the exception of device files and files of special filesystems, each file consists of a sequence of bytes. The file does not include any control information, such as its length or an end-of-file (EOF) delimiter.

All information needed by the filesystem to handle a file is included in a data structure called an inode. Each file has its own inode, which the filesystem uses to identify the file.

An inode contains at least the following attributes:
* File type
* Number of hard links associated with the file
* File length in bytes
* Device ID
* Inode number that identifies the file within the filesystem
* UID of the file owner
* User group ID of the file
* Several timestamps that specify the inode status change time, the last access time, and the last modify time
* Access rights and file mode
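Most of these inode attributes are visible from user space through the `stat` family of system calls. The sketch below maps each item in the list to a field of Python's `os.lstat` result; the dictionary keys and the helper names `inode_info` and `demo` are made up for this illustration.

```python
import os
import stat
import tempfile


def inode_info(path):
    """Return the inode attributes of `path` (symbolic links are not followed)."""
    st = os.lstat(path)
    return {
        "file_type": stat.S_IFMT(st.st_mode),
        "hard_links": st.st_nlink,
        "length": st.st_size,
        "device_id": st.st_dev,
        "inode_number": st.st_ino,
        "owner_uid": st.st_uid,
        "group_id": st.st_gid,
        # inode status change time, last access time, last modify time
        "timestamps": (st.st_ctime, st.st_atime, st.st_mtime),
        "access_rights_and_mode": stat.S_IMODE(st.st_mode),
    }


def demo():
    """Inspect a freshly created 5-byte regular file."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"12345")
        f.flush()
        return inode_info(f.name)


if __name__ == "__main__":
    for key, value in demo().items():
        print(key, value)
```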

1.5.5 Access Rights and File Mode

Three classes of users:

  1. The owner of the file
  2. Users who belong to the same group as the file, excluding the owner
  3. All remaining users (others)

There are three types of access rights – read, write, and execute for each of these three classes. Thus, the set of access rights associated with a file consists of nine different binary flags. Three additional flags, called suid (Set User ID), sgid (Set Group ID), and sticky, define the file mode.

suid: A process executing a file normally keeps the User ID (UID) of the process owner. However, if the executable file has the suid flag set, the process gets the UID of the file owner.

sgid: A process executing a file keeps the user group ID of the process group. However, if the executable file has the sgid flag set, the process gets the user group ID of the file.

sticky: An executable file with the sticky flag set corresponds to a request to the kernel to keep the program in memory after its execution terminates.
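The nine access-right flags and the three mode flags are packed into a single integer. As a rough sketch, the function below (`describe_mode` is a hypothetical name) decodes such a value with the constants from Python's standard `stat` module:

```python
import stat


def describe_mode(mode: int) -> dict:
    """Decode the access rights and the suid/sgid/sticky flags of a mode value."""
    return {
        "string": stat.filemode(mode),  # ls-style rendering of the mode
        "owner": (bool(mode & stat.S_IRUSR), bool(mode & stat.S_IWUSR), bool(mode & stat.S_IXUSR)),
        "group": (bool(mode & stat.S_IRGRP), bool(mode & stat.S_IWGRP), bool(mode & stat.S_IXGRP)),
        "others": (bool(mode & stat.S_IROTH), bool(mode & stat.S_IWOTH), bool(mode & stat.S_IXOTH)),
        "suid": bool(mode & stat.S_ISUID),
        "sgid": bool(mode & stat.S_ISGID),
        "sticky": bool(mode & stat.S_ISVTX),
    }


if __name__ == "__main__":
    # A regular file with suid set and rwxr-xr-x access rights.
    print(describe_mode(stat.S_IFREG | 0o4755))
```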

1.5.6 File-Handling System Calls

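When a user accesses the contents of a regular file or a directory, they are actually accessing data stored on a hardware block device.
In this sense, a filesystem is a user-level view of the physical organization of a hard disk partition.
Because a process in User Mode cannot interact directly with the low-level hardware, every file operation must be performed in Kernel Mode.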

1.5.6.1 Opening a file

/*
path: The pathname (relative or absolute) of the file to be opened.
flag: Specifies how the file must be opened (e.g., read, write, read/write, append). It also can specify whether a nonexisting file should be created.
mode: Specifies the access rights of a newly created file.
*/
fd = open(path, flag, mode)

This system call creates an “open file” object and returns an identifier called a file descriptor. An open file object contains:

  • Some file-handling data structures, such as a set of flags specifying how the file has been opened, an offset field that denotes the current position in the file from which the next operation will take place (the so-called file pointer), and so on.
  • Some pointers to kernel functions that the process can invoke. The set of permitted functions depends on the value of the flag parameter.
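As a hedged illustration of the three parameters, the sketch below calls `open()` through Python's `os.open` wrapper. The file name `notes.txt`, the chosen flags, and the helper name `open_demo` are assumptions made for the example. The permission bits actually stored are the requested mode filtered by the process umask.

```python
import os
import stat
import tempfile


def open_demo():
    """Create a file with open(path, flag, mode) and write to it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "notes.txt")
        # flag: write-only, create if missing, append; mode: rw-r----- before umask
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o640)
        try:
            written = os.write(fd, b"first line\n")
        finally:
            os.close(fd)
        perm = stat.S_IMODE(os.stat(path).st_mode)
        return fd, written, perm


if __name__ == "__main__":
    fd, written, perm = open_demo()
    print(f"fd={fd} bytes={written} perm={oct(perm)}")
```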

1.5.6.2 Accessing an opened file

Sequential access is implicitly assumed: the read( ) and write( ) system calls always refer to the position of the current file pointer. To modify the value, a program must explicitly invoke the lseek( ) system call. When a file is opened, the kernel sets the file pointer to the position of the first byte in the file (offset 0).
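The movement of the file pointer can be observed with `lseek()`. This is a minimal sketch (the helper name `lseek_demo` and the sample data are invented), again using Python's thin wrappers around the system calls:

```python
import os
import tempfile


def lseek_demo():
    """Track the file pointer across write(), lseek() and read()."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            start = os.lseek(fd, 0, os.SEEK_CUR)        # pointer right after open
            os.write(fd, b"0123456789")                 # advances the pointer
            after_write = os.lseek(fd, 0, os.SEEK_CUR)
            os.lseek(fd, 4, os.SEEK_SET)                # explicit repositioning
            chunk = os.read(fd, 3)                      # reads from offset 4
            after_read = os.lseek(fd, 0, os.SEEK_CUR)
        finally:
            os.close(fd)
        return start, after_write, chunk, after_read


if __name__ == "__main__":
    print(lseek_demo())
```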

1.6 An Overview of Unix Kernels

1.6.1 The Process/Kernel Model

A CPU can run in either User Mode or Kernel Mode.

  • When a program is executed in User Mode, it cannot directly access the kernel data structures or the kernel programs. When an application executes in Kernel Mode, however, these restrictions no longer apply.

  • Each CPU model provides special instructions to switch from User Mode to Kernel Mode and vice versa. A program usually executes in User Mode and switches to Kernel Mode only when requesting a service provided by the kernel.

The kernel itself is not a process but a process manager.

Besides user processes, Unix systems include a few privileged processes called kernel threads with the following characteristics:
* They run in Kernel Mode in the kernel address space
* They do not interact with users, and thus do not require terminal devices
* They are usually created during system startup and remain alive until the system is shut down.

1.6.2 Process Implementation

To let the kernel manage processes, each process is represented by a process descriptor that includes information about the current state of the process.

When the kernel stops the execution of a process, it saves the current contents of several processor registers in the process descriptor.
* The program counter (PC) and stack pointer (SP) registers
* The general purpose registers
* The floating point registers
* The processor control registers (Processor Status Word) containing information about the CPU state
* The memory management registers used to keep track of the RAM accessed by the process

When the kernel decides to resume executing a process, it uses the proper process descriptor fields to load the CPU registers.

1.6.3 Reentrant Kernels

All Unix kernels are reentrant. This means that several processes may be executing in Kernel Mode at the same time.

Of course, on uniprocessor systems, only one process can progress, but many can be blocked in Kernel Mode when waiting for the CPU or the completion of some I/O operation.

An interrupt notifies the kernel when the device has satisfied the read, so the former process can resume the execution.

One way to provide reentrancy is to write functions so that they modify only local variables and do not alter global data structures. Such functions are called reentrant functions.

Instead, the kernel can include nonreentrant functions and use locking mechanisms to ensure that only one process can execute a nonreentrant function at a time.

1.6.4 Process Address Space

Each process runs in its private address space. A process running in User Mode refers to private stack, data, and code areas. When running in Kernel Mode, the process addresses the kernel data and code areas and uses another private stack.

Linux supports the mmap( ) system call, which allows part of a file or the information stored on a block device to be mapped into a part of a process address space.
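A small sketch of a file mapping, using Python's standard `mmap` module (the helper name `mmap_demo` and the sample contents are assumptions). The file is mapped shared and writable, so a store into the mapped memory ends up in the file itself:

```python
import mmap
import tempfile


def mmap_demo():
    """Map a file into the address space and modify it through memory."""
    with tempfile.TemporaryFile() as f:
        f.write(b"hello kernel")
        f.flush()
        # Length 0 maps the whole file; on Unix the default is a shared,
        # readable and writable mapping.
        with mmap.mmap(f.fileno(), 0) as m:
            m[0:5] = b"HELLO"  # an ordinary memory write, no write() call
            m.flush()
        f.seek(0)
        return f.read()


if __name__ == "__main__":
    print(mmap_demo())
```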

1.6.5 Synchronization and Critical Regions

Implementing a reentrant kernel requires the use of synchronization.

When the outcome of a computation depends on how two or more processes are scheduled, the code is incorrect. We say that there is a race condition.

In general, safe access to a global variable is ensured by using atomic operations. In the previous example, data corruption is not possible if the two control paths read and decrease V with a single, noninterruptible operation. However, kernels contain many data structures that cannot be accessed with a single operation. For example, it usually isn’t possible to remove an element from a linked list with a single operation, because the kernel needs to access at least two pointers at once. Any section of code that should be finished by each process that begins it before another process can enter it is called a critical region.

1.6.5.1 Kernel preemption disabling

Nonpreemptive kernel: when a process executes in Kernel Mode, it cannot be arbitrarily suspended and substituted with another process.

Of course, a process in Kernel Mode can voluntarily relinquish the CPU, but in this case, it must ensure that all data structures are left in a consistent state. Moreover, when it resumes its execution, it must recheck the value of any previously accessed data structures that could be changed.

A synchronization mechanism applicable to preemptive kernels consists of disabling kernel preemption before entering a critical region and reenabling it right after leaving the region.

1.6.5.2 Interrupt disabling

1.6.5.3 Semaphores

A semaphore is simply a counter associated with a data structure; it is checked by all kernel threads before they try to access the data structure. Each semaphore may be viewed as an object composed of:
- An integer variable
- A list of waiting processes
- Two atomic methods: down() and up()
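The three components listed above can be sketched in user space. This is not kernel code, only a toy model built on Python threads: the class name, the `_guard` lock that makes `down()` and `up()` atomic, and the `run_workers` helper are all assumptions of this example. A negative counter value means that many threads are waiting.

```python
import threading
from collections import deque


class Semaphore:
    """Toy counting semaphore: an integer, a wait list, and down()/up()."""

    def __init__(self, value=1):
        self._value = value
        self._waiters = deque()          # list of waiting threads
        self._guard = threading.Lock()   # makes down() and up() atomic

    def down(self):
        with self._guard:
            self._value -= 1
            if self._value >= 0:
                return                   # resource was free: proceed
            gate = threading.Event()
            self._waiters.append(gate)   # enqueue ourselves
        gate.wait()                      # block until up() wakes us

    def up(self):
        with self._guard:
            self._value += 1
            if self._value <= 0:         # somebody is waiting: wake one
                self._waiters.popleft().set()


def run_workers(n_threads=4, n_iters=500):
    """Protect a shared counter with the semaphore; return (counter, value)."""
    sem = Semaphore(1)
    shared = {"v": 0}

    def worker():
        for _ in range(n_iters):
            sem.down()
            shared["v"] += 1             # critical region
            sem.up()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["v"], sem._value


if __name__ == "__main__":
    print(run_workers())
```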

1.6.5.4 Spin locks

In multiprocessor systems, semaphores are not always the best solution to the synchronization problems. Some kernel data structures should be protected from being concurrently accessed by kernel control paths that run on different CPUs.

In these cases, multiprocessor operating systems use spin locks. A spin lock is very similar to a semaphore, but it has no process list; when a process finds the lock closed by another process, it “spins” around repeatedly, executing a tight instruction loop until the lock becomes open.

1.6.5.5 Avoiding deadlocks

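1.6.6 Signals and Interprocess Communication

Unix signals provide a mechanism for notifying processes of system events. Each event has its own signal number, which is usually referred to by a symbolic constant such as SIGTERM.
* Asynchronous notifications: for example, the user presses Ctrl-C at the terminal and the foreground process receives the SIGINT interrupt signal
* Synchronous notifications: for example, the kernel sends SIGSEGV to a process that accesses an invalid memory address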

The POSIX standard defines about 20 different signals, 2 of which are user-definable and may be used as a primitive mechanism for communication and synchronization among processes in User Mode. In general, a process may react to a signal delivery in two possible ways:
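1. Ignore the signal.
2. Asynchronously execute a specified procedure (the signal handler).

If the process does not specify one of these alternatives, the kernel performs a default action that depends on the signal number. The five possible default actions are: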
1. Terminate the process
2. Write the execution context and the contents of the address space in a file (core dump) and terminate the process.
3. Ignore the signal.
4. Suspend the process.
5. Resume the process’s execution, if it was stopped.
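The two ways of reacting to a signal can be tried with the two user-definable signals, SIGUSR1 and SIGUSR2. The following is a sketch only: it uses Python's standard `signal` module, the helper name `signal_demo` is made up, and it assumes it runs in the main thread of the process (a CPython requirement for installing handlers).

```python
import os
import signal
import time


def signal_demo():
    """Handle SIGUSR1 with a handler and ignore SIGUSR2; return what was caught."""
    received = []

    def handler(signum, frame):
        received.append(signum)          # runs asynchronously on delivery

    old_usr1 = signal.signal(signal.SIGUSR1, handler)
    old_usr2 = signal.signal(signal.SIGUSR2, signal.SIG_IGN)
    try:
        os.kill(os.getpid(), signal.SIGUSR2)   # ignored: nothing happens
        os.kill(os.getpid(), signal.SIGUSR1)   # triggers the handler
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)                   # give the handler time to run
    finally:
        # Restore the previous dispositions (fall back to the default action).
        signal.signal(signal.SIGUSR1, old_usr1 if old_usr1 is not None else signal.SIG_DFL)
        signal.signal(signal.SIGUSR2, old_usr2 if old_usr2 is not None else signal.SIG_DFL)
    return received


if __name__ == "__main__":
    print(signal_demo())
```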

System V IPC provides three further mechanisms for communication among User Mode processes: semaphores, message queues, and shared memory.

Message queues allow processes to exchange messages by using the msgsnd( ) and msgrcv( ) system calls, which insert a message into a specific message queue and extract a message from it, respectively.

The POSIX standard also defines an IPC mechanism based on message queues, usually known as POSIX message queues.

1.6.7 Process Management

fork( ) and _exit( ): create a new process and terminate a process, respectively.

exec( )-like system calls: load a new program into the calling process.

The process that invokes a fork( ) is the parent, while the new process is its child.

A naive implementation of the fork( ) would require both the parent’s data and the parent’s code to be duplicated and the copies assigned to the child. This would be quite time consuming. Current kernels that can rely on hardware paging units follow the Copy-On-Write approach, which defers page duplication until the last moment (i.e., until the parent or the child is required to write into a page). We shall describe how Linux implements this technique in the section “Copy On Write” in Chapter 9.

The _exit( ) system call terminates a process. The kernel handles this system call by releasing the resources owned by the process and sending the parent process a SIGCHLD signal, which is ignored by default.
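A minimal sketch of `fork()` and `_exit()` through Python's `os` wrappers (the function name `fork_demo` and the sample list are assumptions). Copy-On-Write itself is invisible to the program; what can be observed is that the child's write lands in a private copy and leaves the parent's data untouched.

```python
import os


def fork_demo():
    """Fork a child that modifies its copy of `data`; return (child_exit, parent_len)."""
    data = [1, 2, 3]
    pid = os.fork()
    if pid == 0:
        # Child: this write affects only the child's private copy of the page.
        data.append(4)
        os._exit(len(data))              # terminate immediately with status 4
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), len(data)


if __name__ == "__main__":
    print(fork_demo())
```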

1.6.7.1 Zombie processes

The wait4( ) system call allows a process to wait until one of its children terminates; it returns the process ID (PID) of the terminated child.

When executing this system call, the kernel checks whether a child has already terminated. A special zombie process state is introduced to represent terminated processes: a process remains in that state until its parent process executes a wait4( ) system call on it.

If no child process has already terminated when the wait4( ) system call is executed, the kernel usually puts the process in a wait state until a child terminates.

Many kernels also implement a waitpid( ) system call, which allows a process to wait for a specific child process.

But suppose the parent process terminates without issuing that call? The information about its terminated children would then take up valuable memory slots that could be used to serve living processes.

When a process terminates, the kernel changes the appropriate process descriptor pointers of all the existing children of the terminated process to make them become children of init. This process monitors the execution of all its children and routinely issues wait4( ) system calls, whose side effect is to get rid of all orphaned zombies.
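The zombie state can be observed on Linux by reading `/proc/<pid>/stat` between the child's termination and the parent's `wait4()`. This is a sketch under stated assumptions: the helper names are invented, `/proc` must be mounted (otherwise the state is reported as `None`), and `waitid()` with `WNOWAIT` is used only to wait for the child's exit without reaping it.

```python
import os


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format: "pid (comm) state ..."; comm may contain spaces.
            return f.read().rsplit(")", 1)[1].split()[0]
    except (OSError, IndexError):
        return None


def zombie_demo():
    """Return (state before wait4, whether wait4 reaped the child, state after)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                      # child terminates at once
    # Block until the child has exited, but leave it unreaped (a zombie).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)
    reaped_pid, _status, _rusage = os.wait4(pid, 0)   # reaps the zombie
    state_after = proc_state(pid)
    return state_before, reaped_pid == pid, state_after


if __name__ == "__main__":
    print(zombie_demo())
```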

1.6.7.2 Process groups and login sessions

Modern Unix operating systems introduce the notion of process groups to represent a “job” abstraction.

For example, when executing the command line

ls | sort | more

a shell that supports process groups, such as bash, creates a new group for the three processes corresponding to ls, sort, and more.

In this way, the shell acts on the three processes as if they were a single entity (the job, to be precise). Each process descriptor includes a field containing the process group ID.
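What a job-control shell does for each job can be sketched with `setpgid()`. In this hypothetical example (`process_group_demo` is an invented name), a child places itself in a new process group, becoming its leader, and reports through its exit status whether its process group ID now equals its own PID:

```python
import os


def process_group_demo():
    """Fork a child that creates its own process group; return its exit status."""
    pid = os.fork()
    if pid == 0:
        os.setpgid(0, 0)                 # new group whose ID is the child's PID
        ok = os.getpgrp() == os.getpid()
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(process_group_demo())
```

A shell would place every process of a pipeline into the same group, so that one signal can be sent to the whole job.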

Modern Unix kernels also introduce login sessions.

1.6.8 Memory Management

1.6.8.1 Virtual memory

All recent Unix systems provide a useful abstraction called virtual memory. Virtual memory acts as a logical layer between the application memory requests and the hardware Memory Management Unit (MMU).

The main ingredient of a virtual memory subsystem is the notion of virtual address space.

When a process uses a virtual address, the kernel and the MMU cooperate to find the actual physical location of the requested memory item.

1.6.8.2 Random access memory usage

All Unix operating systems clearly distinguish between two portions of the random access memory (RAM). A few megabytes are dedicated to storing the kernel image (i.e., the kernel code and the kernel static data structures). The remaining portion of RAM is usually handled by the virtual memory system and is used in three possible ways:

  1. To satisfy kernel requests for buffers, descriptors, and other dynamic kernel data structures
  2. To satisfy process requests for generic memory areas and for memory mapping of files
  3. To get better performance from disks and other buffered devices by means of caches

1.6.8.3 Kernel Memory Allocator

Kernel Memory Allocator (KMA) is a subsystem that tries to satisfy the requests for memory areas from all parts of the system.

1.6.8.4 Process virtual address space handling

The address space of a process contains all the virtual memory addresses that the process is allowed to reference. The kernel usually stores a process virtual address space as a list of memory area descriptors.

For example, when a process starts the execution of some program via an exec( )-like system call, the kernel assigns to the process a virtual address space that comprises memory areas for:
1. The executable code of the program
2. The initialized data of the program
3. The uninitialized data of the program
4. The initial program stack
5. The executable code and data of needed shared libraries
6. The heap

In a similar fashion, when the process dynamically requires memory by using malloc( ), or the brk( ) system call (which is invoked internally by malloc( )), the kernel just updates the size of the heap memory region of the process. A page frame is assigned to the process only when it generates an exception by trying to reference one of its virtual memory addresses.

1.6.8.5 Caching

The sync( ) system call forces disk synchronization by writing all of the “dirty” buffers (i.e., all the buffers whose contents differ from that of the corresponding disk blocks) to disk.
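As a rough sketch of the layers involved (the helper name `sync_demo` is made up): data written by a program first sits in a user-space buffer, then in the kernel's cache as dirty buffers, and only reaches the disk when it is synchronized. `os.fsync` is the per-file counterpart of `sync()` and is included here as an extra for comparison.

```python
import os
import tempfile


def sync_demo():
    """Write data, then force it from the kernel caches to disk."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"dirty data")
        f.flush()                # user-space buffer -> kernel cache (now dirty)
        os.fsync(f.fileno())     # flush this file's dirty buffers only
        os.sync()                # flush all dirty buffers system-wide
        return os.path.getsize(f.name)


if __name__ == "__main__":
    print(sync_demo())
```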

1.6.9 Device Drivers
