Please indicate the source: http://blog.csdn.net/gaoxiangnumber1
Welcome to my github: https://github.com/gaoxiangnumber1
1.1 Introduction
1.2 UNIX Architecture
- An operating system can be defined as the software that controls the hardware resources of the computer and provides an environment under which programs can run. Generally, we call this software the kernel. Figure 1.1 shows a diagram of the UNIX System architecture.
- The interface to the kernel is a layer of software called the system calls (the shaded portion in Figure 1.1). Libraries of common functions are built on top of the system call interface, but applications are free to use both. The shell is a special application that provides an interface for running other applications.
1.3 Logging In
Login Name
- When we log in to a UNIX system, we enter our login name, followed by our password. The system then looks up our login name in its password file, usually the file /etc/passwd. Our entry in the password file is composed of seven colon-separated fields: the login name, encrypted password, numeric user ID (205), numeric group ID (105), a comment field, home directory (/home/sar), and shell program (/bin/ksh). For example:
sar:x:205:105:Stephen:/home/sar:/bin/ksh
- All contemporary systems have moved the encrypted password to a different file.
Shells
- Once we log in, some system information messages are typically displayed, and then we can type commands to the shell program. A shell is a command-line interpreter that reads user input and executes commands. The user input to a shell is normally from the terminal (an interactive shell) or sometimes from a file (called a shell script). The common shells in use are summarized in Figure 1.2.
- The system knows which shell to execute for us based on the final field in our entry in the password file.
- The Bourne shell is provided with almost every UNIX system in existence.
- The C shell is provided with all the BSD releases. Its control flow looks more like the C language, and it supports additional features that weren’t provided by the Bourne shell: job control, a history mechanism, and command-line editing.
- The Korn shell is not as widespread as the other two shells. It is upward compatible with the Bourne shell and includes those features that made the C shell popular: job control, command-line editing, and so on.
- The Bourne-again shell is provided with all Linux systems. It was designed to be POSIX conformant, while remaining compatible with the Bourne shell. It supports features from both the C shell and the Korn shell.
- The TENEX C shell adds many features to the C shell and is often used as a replacement for the C shell.
- The shell was standardized in the POSIX 1003.2 standard. The specification was based on features from the Korn shell and Bourne shell.
- The default shell used by different Linux distributions varies. Some distributions use the Bourne-again shell. Others use the BSD replacement for the Bourne shell, called dash.
1.4 Files and Directories
File System
- The UNIX file system is a hierarchical arrangement of directories and files. Everything starts in the directory called root, whose name is the single character /.
- A directory is a file that contains directory entries. Logically, each directory entry contains a filename along with a structure of information describing the attributes of the file. The attributes of a file are such things as the type of file (regular file, directory), the size of the file, the owner of the file, permissions for the file (whether other users may access this file), and when the file was last modified. The stat and fstat functions return a structure of information containing all the attributes of a file.
- We make a distinction between the logical view of a directory entry and the way it is actually stored on disk. Most implementations of UNIX file systems don’t store attributes in the directory entries themselves, because of the difficulty of keeping them in sync when a file has multiple hard links (Chapter 4).
Filename
- The names in a directory are called filenames. The only two characters that cannot appear in a filename are the slash character (/) and the null character. The slash separates the filenames that form a pathname and the null character terminates a pathname.
- It’s good practice to restrict the characters in a filename to a subset of the normal printing characters. If we use some of the shell’s special characters in the filename, we have to use the shell’s quoting mechanism to reference the filename, and this can get complicated. For portability, POSIX.1 recommends restricting filenames to consist of the following characters: letters (a-z, A-Z), numbers (0-9), period (.), dash (-), and underscore ( _ ).
- Two filenames are automatically created whenever a new directory is created: . (called dot) and .. (called dot-dot). Dot refers to the current directory, and dot-dot refers to the parent directory. In the root directory, dot-dot is the same as dot.
- Almost all commercial UNIX file systems support at least 255-character filenames today.
Pathname
- A sequence of one or more filenames, separated by slashes and optionally starting with a slash, forms a pathname.
- A pathname that begins with a slash is called an absolute pathname; otherwise, it’s called a relative pathname. Relative pathnames refer to files relative to the current directory. The name for the root of the file system (/) is a special-case absolute pathname that has no filename component.
Example
- Figure 1.3 shows an implementation of the ls(1) command (listing the names of all the files in a directory).
- The notation ls(1) references a particular entry in the UNIX system manuals: the entry for ls in Section 1. The sections are normally numbered 1 through 8 (some systems add a Section 9 for kernel routines), and all the entries within each section are arranged alphabetically.
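Section | Content |
---|---|
1 | Commands and executable files that users can run from the shell |
2 | System calls (functions provided by the kernel) |
3 | Common library functions, mostly from the C library (libc) |
4 | Device files, usually the files under /dev |
5 | Configuration files and file formats |
6 | Games |
7 | Conventions and protocols, such as Linux file systems, network protocols, and ASCII codes |
8 | Administration commands for the system administrator |
9 | Kernel-related documentation |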
#include <stdio.h>      // printf(), fprintf()
#include <stdlib.h>     // exit()
#include <sys/types.h>
#include <dirent.h>     // DIR, struct dirent, opendir(), readdir(), closedir()
int main(int argc, char **argv)
{
    DIR *dp;              // Pointer to the directory stream
    struct dirent *dirp;  // Next directory entry in the stream pointed to by dp
    if(argc != 2)
    {
        fprintf(stderr, "usage: ls directory_name\n");
        exit(1);
    }
    if((dp = opendir(argv[1])) == NULL)  // <sys/types.h> <dirent.h>
    {
        fprintf(stderr, "can't open %s\n", argv[1]);
        exit(1);
    }
    while((dirp = readdir(dp)) != NULL)  // <dirent.h>
    {
        printf("%s\n", dirp->d_name);
    }
    closedir(dp);  // <sys/types.h> <dirent.h>
    exit(0);
}
- The opendir function returns a pointer to a DIR structure, and we pass this pointer to the readdir function. We then call readdir in a loop to read each directory entry. The readdir function returns a pointer to a dirent structure or, when it’s finished with the directory, a null pointer. All we examine in the dirent structure is the name of each directory entry (d_name).
- The function exit terminates a program. By convention, an argument of 0 means OK, and an argument between 1 and 255 means that an error occurred.
Working Directory
- Every process has a working directory, sometimes called the current working directory. This is the directory from which all relative pathnames are interpreted. A process can change its working directory with the chdir function.
Home Directory
- When we log in, the working directory is set to our home directory. Our home directory is obtained from our entry in the password file (Section 1.3).
1.5 Input and Output
File Descriptors
- File descriptors are small non-negative integers that the kernel uses to identify the files accessed by a process. Whenever it opens an existing file or creates a new file, the kernel returns a file descriptor that we use when we want to read or write the file.
Standard Input, Standard Output, and Standard Error
- By convention, all shells open three descriptors whenever a new program is run: standard input, standard output, and standard error. If nothing special is done, as in the command
ls
then all three are connected to the terminal.
- Most shells provide a way to redirect any or all of these three descriptors to any file. For example,
ls > file.list
executes the ls command with its standard output redirected to the file named file.list.
Unbuffered I/O
- Unbuffered I/O is provided by the functions open, read, write, lseek, and close. These functions all work with file descriptors.
#include <stdio.h>   // printf()
#include <stdlib.h>  // exit()
#include <unistd.h>  // read(), write(), STDIN_FILENO, STDOUT_FILENO
#define BUFFSIZE 4096
int main()
{
    ssize_t n;  // read() returns ssize_t, not int
    char buf[BUFFSIZE];
    while((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
    {
        if(write(STDOUT_FILENO, buf, n) != n)
        {
            printf("write error\n");
            exit(-1);
        }
    }
    if(n < 0)
    {
        printf("read error\n");
        exit(-1);
    }
    exit(0);
}
- The constants STDIN_FILENO and STDOUT_FILENO are defined in unistd.h and specify the file descriptors for standard input(0) and standard output(1).
- The read function returns the number of bytes that are read, and this value is used as the number of bytes to write. When the end of the input file is encountered, read returns 0 and the program stops. If a read error occurs, read returns −1.
- If we compile the program into the standard name (a.out) and execute it as
./a.out > data
standard input is the terminal, standard output is redirected to the file data, and standard error is also the terminal. If this output file doesn’t exist, the shell creates it by default. The program copies lines that we type to the standard output until we type the end-of-file character (usually Control-D).
- If we run
./a.out < infile > outfile
then the file named infile will be copied to the file named outfile.
Standard I/O
- The standard I/O functions provide a buffered interface to the unbuffered I/O functions. Using standard I/O relieves us from having to choose optimal buffer sizes, such as the BUFFSIZE constant in Figure 1.4.
#include <stdio.h>   // getc(), putc(), ferror(), printf()
#include <stdlib.h>  // exit()
int main()
{
    int c;
    while((c = getc(stdin)) != EOF)
    {
        if(putc(c, stdout) == EOF)
        {
            printf("output error\n");
            exit(-1);
        }
    }
    if(ferror(stdin))
    {
        printf("input error\n");
        exit(-1);
    }
    exit(0);
}
- The function getc reads one character at a time, and this character is written by putc. After the last byte of input has been read, getc returns the constant EOF (defined in <stdio.h>).
1.6 Programs and Processes
Program
- A program is an executable file residing on disk in a directory. A program is read into memory and is executed by the kernel as a result of one of the seven exec functions.
Processes and Process ID
- An executing instance of a program is called a process.
- The UNIX System guarantees that every process has a unique numeric identifier called the process ID which is a non-negative integer.
#include <stdio.h>   // printf()
#include <stdlib.h>  // exit()
#include <sys/types.h>
#include <unistd.h>  // getpid()
int main()
{
    printf("hello world from process ID %ld\n", (long)getpid());
    exit(0);
}
- getpid returns a pid_t data type. We don’t know its size, but the standards guarantee that it will fit in a long integer. Although most process IDs will fit in an int, using a long promotes portability.
Process Control
- There are three primary functions for process control: fork, exec, and waitpid. The exec function has seven variants, but we often refer to them collectively as the exec function.
#include <stdio.h>
#include <stdlib.h>  // exit()
#include <string.h>  // strlen()
#include <unistd.h>  // fork(), execlp()
#include <sys/types.h>
#include <sys/wait.h>  // waitpid()
#define MAXLINE 4096
int main()
{
    char buf[MAXLINE];
    pid_t pid;
    int status;
    printf("Input your command: ");
    while(fgets(buf, MAXLINE, stdin) != NULL)
    {
        if(buf[strlen(buf) - 1] == '\n')  // <string.h>
        {
            buf[strlen(buf) - 1] = 0;  // Replace newline with null
        }
        if((pid = fork()) < 0)  // <unistd.h>
        {
            printf("fork error\n");
            exit(-1);
        }
        else if(pid == 0)  // Child
        {
            execlp(buf, buf, (char *)0);  // <unistd.h>
            exit(127);  // Reached only if execlp failed
        }
        // Parent:
        if((pid = waitpid(pid, &status, 0)) < 0)  // <sys/types.h> <sys/wait.h>
        {
            printf("waitpid error\n");
            exit(-1);
        }
        printf("Input your command: ");
    }
    exit(0);
}
- We use the standard I/O function fgets to read one line at a time from the standard input. When we type the end-of-file character (Control-D) as the first character of a line, fgets returns a null pointer, the loop stops, and the process terminates.
- Because each line returned by fgets is terminated with a newline character, followed by a null byte, we use function strlen to calculate the length of the string, and then replace the newline with a null byte. We do this because the execlp function wants a null-terminated argument, not a newline-terminated argument.
- We call fork to create a new process, which is a copy of the caller. We say that the caller is the parent and that the newly created process is the child. Then fork returns the non-negative process ID of the new child process to the parent, and returns 0 to the child.
- In the child, we call execlp to execute the command that was read from the standard input. This replaces the child process with the new program file. The combination of fork followed by exec is called spawning a new process.
- Because the child calls execlp to execute the new program file, the parent wants to wait for the child to terminate. It does so by calling waitpid, specifying which process to wait for: the pid argument, which is the process ID of the child. The waitpid function also returns the termination status of the child (in the status variable). We could examine it to determine how the child terminated.
- The limitation of this program is that we can’t pass arguments to the command we execute. To allow arguments would require that we parse the input line, separating the arguments by some convention, probably spaces or tabs, and then pass each argument as a separate parameter to the execlp function.
Threads and Thread IDs
- All threads within a process share the same address space, file descriptors, stacks, and process-related attributes. Each thread executes on its own stack, although any thread can access the stacks of other threads in the same process. Because they can access the same memory, the threads need to synchronize access to shared data among themselves to avoid inconsistencies.
- Threads are identified by thread IDs which are local to a process. A thread ID from one process has no meaning in another process.
1.7 Error Handling
- When an error occurs in one of the UNIX System functions, a negative value is often returned, and the integer errno is usually set to a value that tells why. For example, the open function returns either a non-negative file descriptor if all is OK or −1 if an error occurs. Some functions use a convention other than returning a negative value. For example, most functions that return a pointer to an object return a null pointer to indicate an error.
- The file <errno.h> defines the symbol errno and constants for each value that errno may assume.
#include <string.h>
char *strerror(int errnum);
Returns: pointer to message string
- This function maps errnum(the errno value) into an error message string and returns a pointer to the string.
#include <stdio.h>
void perror(const char *msg);
- The perror function produces an error message on the standard error, based on the current value of errno, and returns. It outputs the string pointed to by msg, followed by a colon and a space, followed by the error message corresponding to the value of errno, followed by a newline.
#include <stdio.h>   // fprintf(), perror()
#include <string.h>  // strerror()
#include <errno.h>   // errno, EACCES, ENOENT
#include <stdlib.h>  // exit()
int main(int argc, char **argv)
{
    fprintf(stderr, "EACCES: %s\n", strerror(EACCES));  // EACCES: Permission denied
    errno = ENOENT;  // ENOENT: No such file or directory
    perror(argv[0]);
    exit(0);
}
Error Recovery
- The errors defined in errno.h can be divided into two categories: fatal and nonfatal.
- A fatal error has no recovery action. The best we can do is print an error message on the user’s screen or to a log file, and then exit.
- Nonfatal errors can be dealt with robustly. Most nonfatal errors are temporary, such as a resource shortage, and might not occur when there is less activity on the system.
- Resource-related nonfatal errors include EAGAIN, ENFILE, ENOBUFS, ENOLCK, ENOSPC, EWOULDBLOCK, and ENOMEM. EBUSY can be treated as nonfatal when it indicates that a shared resource is in use. Sometimes, EINTR can be treated as a nonfatal error when it interrupts a slow system call.
- The typical recovery action for a resource-related nonfatal error is to delay and retry later. Ultimately it is up to the application developer to determine the cases where an application can recover from an error.
1.8 User Identification
User ID
- The user ID from our entry in the password file is a numeric value that identifies us to the system. This user ID is assigned by the system administrator when our login name is assigned, and we cannot change it. The user ID is normally assigned to be unique for every user.
- We call the user whose user ID is 0 either root or the superuser. The entry in the password file normally has a login name of root, and we refer to the special privileges of this user as superuser privileges. The superuser has free rein over the system.
Group ID
- Our entry in the password file also specifies our numeric group ID which is also assigned by the system administrator when our login name is assigned. Typically, the password file contains multiple entries that specify the same group ID. Groups are normally used to collect users together into projects or departments which allows the sharing of resources among members of the same group. We can set the permissions on a file so that all members of a group can access the file, whereas others outside the group cannot.
- There is a group file(usually /etc/group) that maps group names into numeric group IDs.
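#include <stdio.h>   // printf()
#include <stdlib.h>  // exit()
#include <sys/types.h>  // uid_t, gid_t
#include <unistd.h>  // getuid(), getgid()
int main()
{
    // Cast to long: the exact integer types behind uid_t and gid_t vary.
    printf("uid = %ld, gid = %ld\n", (long)getuid(), (long)getgid());
    exit(0);
}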
Supplementary Group IDs
- In addition to the group ID specified in the password file for a login name, most versions of the UNIX System allow a user to belong to other groups.
- These supplementary group IDs are obtained at login time by reading the file /etc/group and finding the first 16 entries that list the user as a member. POSIX requires that a system support at least 8 supplementary groups per process, but most systems support at least 16.
1.9 Signals
- Signals are a technique used to notify a process that some condition has occurred. The process has three choices for dealing with the signal:
  - Ignore the signal. This option isn’t recommended for signals that denote a hardware exception, such as dividing by zero or referencing memory outside the address space of the process, as the results are undefined.
  - Let the default action occur. For a divide-by-zero condition, the default is to terminate the process.
  - Provide a function that is called when the signal occurs (catching the signal).
- Many conditions generate signals. Two terminal keys, called the interrupt key (the DELETE key or Control-C) and the quit key (Control-backslash), are used to interrupt the currently running process.
- Another way to generate a signal is by calling the kill function. We can call this function from a process to send a signal to another process. But we have to be the owner of the other process (or the superuser) to be able to send it a signal.
#include <stdio.h>
#include <stdlib.h>  // exit()
#include <string.h>  // strlen()
#include <signal.h>  // signal(), SIGINT, SIG_ERR
#include <unistd.h>  // fork(), execlp()
#include <sys/types.h>
#include <sys/wait.h>  // waitpid()
#define MAXLINE 4096
static void sig_int(int signo)  // Our signal-catching function
{
    printf("interrupt\n%% ");
}
int main()
{
    char buf[MAXLINE];
    pid_t pid;
    int status;
    if(signal(SIGINT, sig_int) == SIG_ERR)  // <signal.h>
    {
        printf("signal error\n");
        exit(1);
    }
    printf("Input your command: ");
    while(fgets(buf, MAXLINE, stdin) != NULL)
    {
        if(buf[strlen(buf) - 1] == '\n')  // <string.h>
        {
            buf[strlen(buf) - 1] = 0;  // Replace newline with null
        }
        if((pid = fork()) < 0)  // <unistd.h>
        {
            printf("fork error\n");
            exit(-1);
        }
        else if(pid == 0)  // Child
        {
            execlp(buf, buf, (char *)0);  // <unistd.h>
            exit(127);  // Reached only if execlp failed
        }
        // Parent:
        if((pid = waitpid(pid, &status, 0)) < 0)  // <sys/types.h> <sys/wait.h>
        {
            printf("waitpid error\n");
            exit(-1);
        }
        printf("Input your command: ");
    }
    exit(0);
}
- To catch signal SIGINT, the program needs to call the signal function, specifying the name of the function to call when the SIGINT signal is generated. The function is named sig_int; when it’s called, it prints a message and a new prompt. Adding 11 lines to the program in Figure 1.7 gives us the version in Figure 1.10.
1.10 Time Values
- UNIX systems have maintained two different time values:
  - Calendar time. This value counts the number of seconds since the Epoch: 00:00:00 January 1, 1970, Coordinated Universal Time (UTC). For example, these time values are used to record the time when a file was last modified. The primitive system data type time_t holds these time values.
  - Process time. This is also called CPU time and measures the central processor resources used by a process. Process time is measured in clock ticks, which have been 50, 60, or 100 ticks per second.
- The UNIX System maintains three values for a process:
  - Clock time
  - User CPU time
  - System CPU time
- The clock time, sometimes called wall clock time, is the amount of time the process takes to run, and its value depends on the number of other processes being run on the system. Whenever we report the clock time, the measurements are made with no other activities on the system.
- The user CPU time is the CPU time attributed to user instructions. The system CPU time is the CPU time attributed to the kernel when it executes on behalf of the process. For example, whenever a process executes a system service, such as read, the time spent within the kernel performing that system service is charged to the process. The sum of user CPU time and system CPU time is often called the CPU time.
- It is easy to measure the clock time, user time, and system time of any process: execute the time(1) command, with the argument to the time command being the command we want to measure. For example:
xiang :~ $ time date
Mon Jul 25 09:43:59 CST 2016
real 0m0.001s
user 0m0.000s
sys 0m0.001s
- The output format from the time command depends on the shell being used, because some shells don’t run /usr/bin/time, but instead have a separate built-in function to measure the time it takes commands to run.
1.11 System Calls and Library Functions
- All operating systems provide service points through which programs request services from the kernel. Linux provides a well-defined, limited number of entry points directly into the kernel called system calls.
- The system call interface has always been documented in Section 2 of the UNIX Programmer’s Manual. Its definition is in the C language, no matter which implementation technique is actually used on any given system to invoke a system call.
- The technique used on UNIX systems is for each system call to have a function of the same name in the standard C library. The user process calls this function, using the standard C calling sequence. This function then invokes the appropriate kernel service, using whatever technique is required on the system. For example, the function may put one or more of the C arguments into general registers and then execute some machine instruction that generates a software interrupt in the kernel.
- Section 3 of the UNIX Programmer’s Manual defines the general-purpose library functions available to programmers. These functions aren’t entry points into the kernel, although they may invoke one or more of the kernel’s system calls. For example, the printf function may use the write system call to output a string, but the atoi (convert ASCII to integer) function doesn’t involve the kernel at all.
- From an implementor’s point of view, the distinction between a system call and a library function is fundamental. From a user’s perspective, the difference is not as critical. We can replace the library functions whereas the system calls usually cannot be replaced.
- Consider the memory allocation function malloc as an example. There are many ways to do memory allocation and its associated garbage collection (best fit, first fit, and so on). The UNIX system call that handles memory allocation, sbrk(2), is not a general-purpose memory manager. It increases or decreases the address space of the process by a specified number of bytes. How that space is managed is up to the process. The memory allocation function, malloc(3), implements one particular type of allocation. If we don’t like its operation, we can define our own malloc function, which will probably use the sbrk system call.
- In fact, numerous software packages implement their own memory allocation algorithms with the sbrk system call. Figure 1.11 shows the relationship between the application, the malloc function, and the sbrk system call.
- Here we have a separation of duties: the system call in the kernel allocates an additional chunk of space on behalf of the process; the malloc library function manages this space from user level.
- Another example to illustrate the difference between a system call and a library function is the interface to determine the current time and date. The UNIX System provides a single system call that returns the number of seconds since the Epoch: midnight, January 1, 1970, Coordinated Universal Time. Any interpretation of this value, such as converting it to a human-readable time and date using the local time zone, is left to the user process. The standard C library provides routines to handle most cases. These library routines handle such details as the various algorithms for daylight saving time.
- An application can either make a system call or call a library routine. Also realize that many library routines invoke a system call. This is shown in Figure 1.12.
- Another difference between system calls and library functions is that system calls usually provide a minimal interface, whereas library functions often provide more elaborate functionality.
1.12 Summary
Exercises
1.1 Verify on your system that the directories dot and dot-dot are not the same, except in the root directory.
- Use two arguments for the ls(1) command: -i prints the i-node number of the file or directory, and -d prints information about a directory instead of information on all the files in the directory.
1.2 In the output from the program in Figure 1.6, what happened to process IDs 852 and 853?
- The UNIX System is a multiprogramming, or multitasking system. Other processes were running at the time this program was run.
1.3 In Section 1.7, the argument to perror is defined with the ISO C attribute const, whereas the integer argument to strerror isn’t defined with this attribute. Why?
- Since the msg argument to perror is a pointer, perror could modify the string that msg points to. The qualifier const says that perror does not modify what the pointer points to. On the other hand, the error number argument to strerror is an integer, and since C passes all arguments by value, the strerror function couldn’t modify this value even if it wanted to.
1.4 If the calendar time is stored as a signed 32-bit integer, in which year will it overflow? How can we extend the overflow point? Are these strategies compatible with existing applications?
- During the year 2038. We can solve the problem by making the time_t data type a 64-bit integer. If it is currently a 32-bit integer, applications will have to be recompiled to work properly. But the problem is worse. Some file systems and backup media store times in 32-bit integers. These would need to be updated as well, but we still need to be able to read the old format.
1.5 If the process time is stored as a signed 32-bit integer, and if the system counts 100 ticks per second, after how many days will the value overflow?
- Approximately 248 days.