APUE Reading Notes: Chapter 8, Process Control

mazinkaiser1991

Category: Reading Notes

Article link: https://blog.csdn.net/u012927281/article/details/51946020

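I finished Chapter 7 in just two days; it went quickly because I already had the background. Today I start the centerpiece of the book: processes. I once read the line "In Linux, everything except a process is a file." The statement is rather absolute, but after thinking it over I could not come up with a good counterargument. That is all beside the point, so on to today's topic: process control.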

8.2 Process Identifiers

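Every process has a unique process ID, a non-negative integer. Although process IDs are unique at any given moment, an ID can be reused once the process that held it has terminated and the ID has been reclaimed. The book also gives a few examples of special processes: process ID 0 is usually the scheduler process (often called the swapper), and process ID 1 is usually the init process, which the kernel invokes at the end of the bootstrap procedure. I definitely want to analyze the Linux kernel boot sequence when I get the chance.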

On Linux the following functions return these identifiers (all of them are system calls).

#include <unistd.h>
extern __pid_t getpid (void) __THROW; // returns the process ID
extern __pid_t getppid (void) __THROW; // returns the parent process ID
extern __uid_t getuid (void) __THROW; // returns the real user ID
extern __uid_t geteuid (void) __THROW; // returns the effective user ID
extern __gid_t getgid (void) __THROW; // returns the real group ID
extern __gid_t getegid (void) __THROW; // returns the effective group ID

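8.3 The fork Function

Now one of the most important functions in Linux programming takes the stage.

I have already done a simple analysis of how fork runs: http://blog.csdn.net/u012927281/article/details/51540447 and I may analyze it in more depth later.

An existing process can create a new process by calling fork. The prototype is: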

extern __pid_t fork (void) __THROWNL;

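The new process created by fork is called the child process. fork is called once but returns twice: the return value in the child is 0, whereas the return value in the parent is the process ID of the new child. The child is a copy of the parent; for example, the child gets a copy of the parent's data space, heap, and stack. Exactly which resources the child obtains from the parent is something I also want to analyze in detail when I have the chance.

The book gives an example that offers a first look at how fork is used. The source is: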

#include <stdio.h>
#include <unistd.h>

int globvar = 6;
static int stavar; /* uninitialized, so it is placed in the bss segment */
char buf[] = "a write to stdout\n";

int main(int argc,char* argv[])
{
	int var;
	pid_t pid;
	var = 88;
	if(write(STDOUT_FILENO,buf,sizeof(buf)-1)!=sizeof(buf)-1)
		perror(argv[0]);
	printf("before fork\n");

	if((pid=fork())<0) perror(argv[0]);
	else if(pid==0){
		++globvar;
		++stavar;
		++var;
	}else{
		sleep(2);
	}
	printf("pid=%d,glob=%d,sta=%d,var=%d\n",(int)getpid(),globvar,stavar,var);
	return 0;	
}

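I made one small change to the book's example: I added an uninitialized static variable so that the bss segment is covered as well. The output is: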

./test_fork
a write to stdout
before fork
pid=3449,glob=7,sta=1,var=89
pid=3448,glob=6,sta=0,var=88

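The output shows that the bss segment, the .data segment, and the stack are not shared between parent and child: the child's increments are invisible to the parent. Each process logically has its own copy, and the kernel implements this lazily with copy-on-write (COW), duplicating a page only when one of the two processes writes to it.

When standard output is redirected to a file, however, the result is different. Here is what ends up in the file: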

a write to stdout
before fork
pid=3530,glob=7,sta=1,var=89
before fork
pid=3529,glob=6,sta=0,var=88

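The reasons for this output are as follows:

Because write is unbuffered, calling it before fork writes the data to standard output exactly once.
Now look at printf. Standard output is line buffered when it is connected to a terminal device; otherwise it is fully buffered. When the program runs interactively, the line from that printf appears only once, because the newline flushes the standard output buffer. Once the output is redirected to a file, however, the stream is fully buffered. Recall what full buffering means: actual I/O takes place only when the standard I/O buffer fills up, or when fflush forces a flush. Back in our program, printf is called before fork, but the data has not been written to the file yet; it is still sitting in the buffer, so it is copied into the child along with the rest of the parent's data space. The printf before return then appends its data to that existing buffer. When each process terminates, the contents of its buffer are finally written to the file.

Given this, calling fflush right after the first printf fixes the problem. The output becomes: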

a write to stdout
before fork
pid=3618,glob=7,sta=1,var=89
pid=3617,glob=6,sta=0,var=88

This matches what we expected.
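
The effect of the flush can also be checked programmatically. The following is a minimal sketch, not the book's code: the helper name count_before_fork_lines and the scratch file it writes to are made up for illustration. It writes "before fork" through a fully buffered stream opened on a regular file, forks, and then counts how many copies of that line reached the file.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork/waitpid under strict -std= modes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: returns how many "before fork" lines end up in
   `path`, or -1 on error. A stream opened on a regular file is fully
   buffered, just like stdout after redirection. */
int count_before_fork_lines(const char *path, int flush_first)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    fprintf(fp, "before fork\n");   /* stays in the stdio buffer for now */
    if (flush_first)
        fflush(fp);                 /* empty the buffer before it is duplicated */

    pid_t pid = fork();
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {                 /* child: owns a copy of the buffer */
        fprintf(fp, "child\n");
        fclose(fp);                 /* flushes the child's copy */
        _exit(0);                   /* _exit: do not flush any other inherited stream */
    }

    waitpid(pid, NULL, 0);
    fprintf(fp, "parent\n");
    fclose(fp);                     /* flushes the parent's copy */

    /* Count the copies of the pre-fork line. */
    int count = 0;
    char line[64];
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL)
        if (strcmp(line, "before fork\n") == 0)
            count++;
    fclose(fp);
    return count;
}
```

Called with flush_first set to 0 the helper should report two copies, and with 1 only one, which mirrors the two redirected runs above.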

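Running the program above also shows that when the parent's standard output is redirected, the child's standard output is redirected too. We said earlier that fork copies resources from the parent into the child; evidently the file descriptors that the parent has open are duplicated into the child as well. Exactly which of the parent's resources the child copies deserves a blog post of its own.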

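8.4 The vfork Function

Like fork, vfork creates a child process, but the parent's address space is not copied into the child; the child runs directly in the parent's address space until it calls exec or exit. The other difference is that vfork guarantees the child runs first: the parent can be scheduled again only after the child calls exec or exit, and as soon as the child calls either of the two, the parent resumes.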

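8.5 The exit Functions

The main difference between exit and _exit/_Exit (on Linux, at least) is that exit flushes and closes all standard I/O streams, whereas _exit and _Exit do not flush them. From our earlier study of the C runtime we know that exit is always executed after main returns; that is dictated by the runtime's source code. At the end of exit, _exit is called, and the job of _exit is to invoke the exit system call. For details see: http://blog.csdn.net/u012927281/article/details/51356228

Based on this flow, my guess is that the standard I/O cleanup function is registered in the list of exit handlers during program startup, as the very first entry, so that it is the last thing exit runs. Whatever the mechanism, the ordering itself is required by the C standard: exit first calls the functions registered with atexit and only then flushes and closes the open streams.

We also need to distinguish the "exit status" from the "termination status". The exit status is the argument passed to one of the three exit functions, or the return value of main. The termination status is the status produced for the process whether it terminated normally or abnormally.

This section also introduces the "orphan process": a process whose parent has terminated while the process itself is still running; its parent becomes the init process. A related concept is the "zombie process": a process that has terminated but whose parent has not yet cleaned up after it. The two are mirror images: with a zombie the child is dead, whereas with an orphan the parent is dead.

8.6 The wait and waitpid Functions

When a process terminates, normally or abnormally, the kernel sends the SIGCHLD signal to its parent. The parent can ignore the signal or provide a function to be called when it occurs. The default action for SIGCHLD is to ignore it. If the parent wants to make sure its children do not remain zombies, it can call wait or waitpid. Here are the prototypes: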

#include <sys/wait.h>
#  define __WAIT_STATUS        int *
extern __pid_t wait (__WAIT_STATUS __stat_loc);
extern __pid_t waitpid (__pid_t __pid, int *__stat_loc, int __options);

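The wait.h file above is located in /usr/include/x86_64-linux-gnu/sys.

The basic behavior of wait is:

If all of its children are still running, the caller blocks until one of them terminates. If wait is called because SIGCHLD was received, it returns immediately.
If a child has terminated and is waiting for its termination status to be fetched, wait returns immediately with that child's termination status. If a process has several children, wait returns as soon as any one of them terminates.
If the caller has no children at all, wait returns immediately with an error.

The integer that wait stores through "__stat_loc" encodes the result: some bits hold the exit status (normal termination), while other bits hold the signal number (abnormal termination). The following four macros determine what a given status value means: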

#   define __WAIT_INT(status) \
  (__extension__ (((union { __typeof(status) __in; int __i; }) \
           { .__in = (status) }).__i))

# define WIFEXITED(status)	__WIFEXITED (__WAIT_INT (status))
#define    __WIFEXITED(status)    (__WTERMSIG(status) == 0) // defined in /usr/include/x86_64-linux-gnu/bits/waitstatus.h
#define    __WTERMSIG(status)    ((status) & 0x7f)

# define WIFSIGNALED(status)	__WIFSIGNALED (__WAIT_INT (status))
#define __WIFSIGNALED(status) \
  (((signed char) (((status) & 0x7f) + 1) >> 1) > 0) // defined in /usr/include/x86_64-linux-gnu/bits/waitstatus.h

# define WIFSTOPPED(status)	__WIFSTOPPED (__WAIT_INT (status))
#define    __WIFSTOPPED(status)    (((status) & 0xff) == 0x7f) // defined in /usr/include/x86_64-linux-gnu/bits/waitstatus.h

# ifdef __WIFCONTINUED
#  define WIFCONTINUED(status)	__WIFCONTINUED (__WAIT_INT (status))
# endif
#ifdef WCONTINUED // defined in /usr/include/x86_64-linux-gnu/bits/waitstatus.h
# define __WIFCONTINUED(status)    ((status) == __W_CONTINUED)
#endif

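The following table explains what each macro means.

Macro	Description
WIFEXITED(status)	True if status was returned for a child that terminated normally. In this case WEXITSTATUS(status) fetches the low-order 8 bits of the argument that the child passed to exit or _exit.
WIFSIGNALED(status)	True if status was returned for a child that terminated abnormally (by receiving a signal it did not catch). In this case WTERMSIG(status) fetches the number of the signal that terminated the child.
WIFSTOPPED(status)	True if status was returned for a child that is currently stopped. In this case WSTOPSIG(status) fetches the number of the signal that stopped the child.
WIFCONTINUED(status)	True if status was returned for a child that has been continued after a job-control stop.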

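From the description above:

"WIFEXITED(status)" pairs with "WEXITSTATUS(status)" to obtain the process's exit status.

"WIFSIGNALED(status)" pairs with "WTERMSIG(status)" to obtain the number of the signal that terminated the child.

"WIFSTOPPED(status)" pairs with "WSTOPSIG(status)" to obtain the number of the signal that stopped the child.

The book gives an example that demonstrates these macros: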

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

void pr_exit(int status);

int main(int argc,char* argv[])
{
	pid_t pid;
	int status;

	/*1*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0)
		exit(7);
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	/*2*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0){
		abort();
	}else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	/*3*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0)
		status/0;
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	return 0;
}

void pr_exit(int status)
{
	if(WIFEXITED(status))
		printf("normal termination,exit status = %d\n",WEXITSTATUS(status));
	else if(WIFSIGNALED(status))
		printf("abnormal termination,signal number = %d%s\n",WTERMSIG(status),
#ifdef WCOREDUMP
		WCOREDUMP(status)?"(core file generated)":"");
#else 
	"");
#endif
	else if(WIFSTOPPED(status))
		printf("child stopped,signal number = %d\n",WSTOPSIG(status));
}

My output was somewhat different, though:

normal termination,exit status = 7
abnormal termination,signal number = 6(core file generated)
normal termination,exit status = 0

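The first two lines match the book, but the third does not: on my machine, the division by zero was reported as a normal exit.

So here is a test case aimed specifically at the division by zero: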

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

void pr_exit(int status);

int main(int argc,char* argv[])
{
	pid_t pid;
	int status=1;

	/*1*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0)
		status/0; // changed to a division by zero right away
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	/*2*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0){
		abort();
	}else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	/*3*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0)
		status/0;
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}
}

void pr_exit(int status)
{
	if(WIFEXITED(status))
		printf("normal termination,exit status = %d\n",WEXITSTATUS(status));
	else if(WIFSIGNALED(status))
		printf("abnormal termination,signal number = %d%s\n",WTERMSIG(status),
#ifdef WCOREDUMP
		WCOREDUMP(status)?"(core file generated)":"");
#else 
	"");
#endif
	else if(WIFSTOPPED(status))
		printf("child stopped,signal number = %d\n",WSTOPSIG(status));
	else if(WIFCONTINUED(status))
		printf("child continued\n"); 
}

The output of this program is even stranger:

abnormal termination,signal number = 6(core file generated)
normal termination,exit status = 0
normal termination,exit status = 35
abnormal termination,signal number = 6(core file generated)
normal termination,exit status = 0

That is two lines more than expected. Let's go back to the simplest possible experiment:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

void pr_exit(int status);

int main(int argc,char* argv[])
{
	pid_t pid;
	int status=1;

	/*1*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0){
		status/0;
	}
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}
}

void pr_exit(int status)
{
	if(WIFEXITED(status))
		printf("normal termination,exit status = %d\n",WEXITSTATUS(status));
	else if(WIFSIGNALED(status))
		printf("abnormal termination,signal number = %d%s\n",WTERMSIG(status),
#ifdef WCOREDUMP
		WCOREDUMP(status)?"(core file generated)":"");
#else 
	"");
#endif
	else if(WIFSTOPPED(status))
		printf("child stopped,signal number = %d\n",WSTOPSIG(status));
	else if(WIFCONTINUED(status))
		printf("child continued\n"); 
}

The output is:

normal termination,exit status = 0

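This time the exit status is 0 again, and the exit status of 35 still has not been reproduced. I carefully compared the first and second programs: the second is missing the "return 0". Then I counted the characters in "normal termination,exit status = 0\n": exactly 35. That solves the puzzle. In the second program, the child is not terminated after the first status/0; it simply keeps executing the rest of main, which is why the following two lines appear: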

abnormal termination,signal number = 6(core file generated)
normal termination,exit status = 0

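Those two lines are produced by the child as it carries on through the remaining code. Because the program has no "return 0", the child's exit status ends up being the return value of its last printf call, that is, 35. (This relies on the pre-C99 behavior that my compiler apparently used; since C99, reaching the end of main is equivalent to returning 0.)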

normal termination,exit status = 35
abnormal termination,signal number = 6(core file generated)
normal termination,exit status = 0

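These three lines are the corresponding output of the parent.

These results suggest that the statement status/0 never raises an exception at run time; execution simply continues past it. The most likely reason is that the result of the expression is never used, so the compiler drops it instead of emitting a division instruction. One more experiment to confirm that the statement is skipped: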

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

void pr_exit(int status);

int main(int argc,char* argv[])
{
	pid_t pid;
	int status=1;

	/*1*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0){
		status/0;
		printf("after div zero\n");
	}
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}
}

void pr_exit(int status)
{
	if(WIFEXITED(status))
		printf("normal termination,exit status = %d\n",WEXITSTATUS(status));
	else if(WIFSIGNALED(status))
		printf("abnormal termination,signal number = %d%s\n",WTERMSIG(status),
#ifdef WCOREDUMP
		WCOREDUMP(status)?"(core file generated)":"");
#else 
	"");
#endif
	else if(WIFSTOPPED(status))
		printf("child stopped,signal number = %d\n",WSTOPSIG(status));
	else if(WIFCONTINUED(status))
		printf("child continued\n"); 
}

The output is:

./test_div_zero 
after div zero
normal termination,exit status = 15

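So can we make the program actually trigger a division-by-zero error? Of course. Modify the program slightly: divide by a variable whose value is zero and store the result.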

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

void pr_exit(int status);

int main(int argc,char* argv[])
{
	pid_t pid;
	int status=1;
	int zero = 0;

	/*1*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0)
		exit(7);
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	/*2*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0){
		abort();
	}else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	/*3*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0)
		status = status/zero;
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

}

void pr_exit(int status)
{
	if(WIFEXITED(status))
		printf("normal termination,exit status = %d\n",WEXITSTATUS(status));
	else if(WIFSIGNALED(status))
		printf("abnormal termination,signal number = %d%s\n",WTERMSIG(status),
#ifdef WCOREDUMP
		WCOREDUMP(status)?"(core file generated)":"");
#else 
	"");
#endif
	else if(WIFSTOPPED(status))
		printf("child stopped,signal number = %d\n",WSTOPSIG(status));
	else if(WIFCONTINUED(status))
		printf("child continued\n"); 
}

At last this produces the same output as the book.

./test_pr_exit 
normal termination,exit status = 7
abnormal termination,signal number = 6(core file generated)
abnormal termination,signal number = 8(core file generated)
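
The earlier `status/0;` statements most likely had no effect because their result was unused, so no division had to be emitted. The sketch below forces the division to happen at run time by going through volatile objects. The helper name divide_in_child is invented here, and the assumption that integer division by zero raises SIGFPE holds on x86; other architectures may not trap at all, in which case the child simply exits normally.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork/waitpid under strict -std= modes */
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: performs numerator / denominator in a child.
   Returns the number of the signal that killed the child, 0 if the child
   exited normally, or -1 if fork or waitpid failed. */
int divide_in_child(int numerator, int denominator)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* volatile keeps the compiler from folding or dropping the
           division, unlike the bare statement `status/0;`. */
        volatile int d = denominator;
        volatile int q = numerator / d;
        (void)q;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On an x86 Linux machine divide_in_child(1, 0) is expected to return SIGFPE (signal 8, as in the output above), while divide_in_child(6, 3) returns 0.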

Pressing on, let's try WIFSTOPPED(status) as well:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

void pr_exit(int status);

int main(int argc,char* argv[])
{
	pid_t pid;
	int status=1;
	int zero = 0;

	/*1*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0)
		exit(7);
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	/*2*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0){
		abort();
	}else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	/*3*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0)
		status = status/zero;
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

	/*4*/
	if((pid=fork())<0)
		perror(argv[0]);
	else if(pid==0){
		printf("child pid = %d\n",(int)getpid());
		while(1);
	}
	else{
		if(wait(&status)!=pid)
			perror(argv[0]);
		pr_exit(status);
	}

}

void pr_exit(int status)
{
	if(WIFEXITED(status))
		printf("normal termination,exit status = %d\n",WEXITSTATUS(status));
	else if(WIFSIGNALED(status))
		printf("abnormal termination,signal number = %d%s\n",WTERMSIG(status),
#ifdef WCOREDUMP
		WCOREDUMP(status)?"(core file generated)":"");
#else 
	"");
#endif
	else if(WIFSTOPPED(status))
		printf("child stopped,signal number = %d\n",WSTOPSIG(status));
	else if(WIFCONTINUED(status))
		printf("child continued\n"); 
}

The output is:

./test_pr_exit
normal termination,exit status = 7
abnormal termination,signal number = 6(core file generated)
abnormal termination,signal number = 8(core file generated)
child pid = pid // the program pauses here, waiting for a signal

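In another terminal, enter the following:

kill -STOP 5430 // no response from the original program
kill -9 5430 // sends SIGKILL; the original program prints its output in the first terminal

The output is: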

abnormal termination,signal number = 9

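Judging from the output, it looks as if the process ignored SIGSTOP and responded only to signal 9. Yet APUE states explicitly that SIGSTOP can be neither caught nor ignored. The real problem was that I had confused wait with waitpid, which is introduced next. The child was in fact stopped, but wait returns only for children that have terminated (zombies); it does not offer the job-control support that waitpid provides.

The analysis above also reveals a limitation of wait: if a process has several children, wait returns as soon as any one of them terminates. Is there a function that lets us wait for one specific process? Yes: waitpid does exactly that, mainly through its "__pid" argument, which is interpreted as follows:

pid == -1: wait for any child. In this case waitpid is equivalent to wait.
pid > 0: wait for the child whose process ID equals pid.
pid == 0: wait for any child whose process group ID equals that of the calling process.
pid < -1: wait for any child whose process group ID equals the absolute value of pid.

waitpid returns the process ID of the child that terminated and stores that child's termination status in the location pointed to by "__stat_loc".

The "__options" argument of waitpid gives further control over its operation. It is either 0 or the bitwise OR of the constants in the following table.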

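Constant	Description
WCONTINUED	If the implementation supports job control, return the status of any child specified by pid that has been continued after a stop and whose status has not yet been reported. ("Stopped" here means suspended, not terminated; I find the wording in the Chinese edition misleading on this point.)
WNOHANG	If a child specified by pid is not immediately available, waitpid does not block; its return value is then 0.
WUNTRACED	If the implementation supports job control, return the status of any child specified by pid that has stopped and whose status has not been reported since it stopped. The WIFSTOPPED macro determines whether the returned status corresponds to a stopped child.

After reading about these three options I was still confused and only half understood what they do, so I ran a few simple experiments. They are taken from: http://blog.csdn.net/fanbird2008/article/details/6593084

First, the WNOHANG option: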

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc,char* argv[])
{
	pid_t pid;
	int statloc;

	pid = fork();
	if(pid<0){
		perror(argv[0]);
	}else if(pid==0){
		while(1);
	}else{
		waitpid(pid,&statloc,WNOHANG);
		printf("waitpid has returned\n");
	}

	return 0;
}

The output is below; waitpid returns immediately.

waitpid has returned
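
In practice WNOHANG is usually used in a polling loop, with the return value of waitpid telling the cases apart. The sketch below is my own illustration rather than code from the book: the name poll_child and the 50 ms and 10 ms intervals are arbitrary choices. The child exits after a short delay and the parent keeps polling until waitpid reports it.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork/waitpid/nanosleep under strict -std= modes */
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical helper: forks a child that exits with `code` after about
   50 ms, polls for it without blocking, and returns its exit status.
   *polls receives the number of calls that returned 0 ("not yet").
   Returns -1 on error or abnormal termination. */
int poll_child(int code, int *polls)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        sleep_ms(50);
        _exit(code);
    }

    int status;
    *polls = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)            /* child has terminated and is now reaped */
            break;
        if (r < 0)               /* real error, e.g. ECHILD */
            return -1;
        ++*polls;                /* r == 0: child is still running */
        sleep_ms(10);            /* the parent could do useful work here */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Unlike the experiment above, the child here exits on its own, so no spinning process is left behind after the parent returns.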

Next, the second case: WUNTRACED. The source is the same except that the "WNOHANG" option is replaced with "WUNTRACED".

waitpid(pid,&statloc,WUNTRACED);

The output is:

./test_waitpid 
waitpid has returned // at this point waitpid has not returned yet

Now open another terminal and look up the PIDs of the processes:

ps aux | grep ./test_waitpid

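The parent's state is shown as "S+": S means it is sleeping, and + means it belongs to the foreground process group. The child's state is "R+", where R means it is running.

Send the STOP signal to the child, and the waitpid call in the parent returns.

kill -STOP pid (the child's PID)

As for ptrace, I have not figured out how it should be used yet, so I will not analyze it here.

Now the last case, WCONTINUED. Again, modify the source first: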

waitpid(pid,&statloc,WCONTINUED);

The output is:

./test_waitpid // the program does not return at this point

Look up the child's process ID in another terminal:

ps aux | grep test_waitpid

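Send the STOP signal; the program still does not return. Check the child's state again: it is "T+", which means the process is stopped. Then send the CONT signal: the child resumes running, and the parent returns and exits.

kill -STOP pid
kill -CONT pid // pid is still the child's process ID

After these three experiments with the options argument of waitpid, here is how I understand each option:

WNOHANG: if the child has not terminated yet, waitpid returns 0 immediately instead of waiting; otherwise it collects the child's status as usual.
WUNTRACED: waitpid blocks while the child is running and returns once the child has stopped (or terminated).
WCONTINUED: waitpid blocks while the child is running and returns once a stopped child has been resumed (or the child has terminated).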
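
The two job-control experiments can be combined into one program that sends the signals itself instead of relying on a second terminal. This is a sketch under my own assumptions: the helper name observe_job_control and its out-parameters are invented for illustration, and it presumes a system that supports WCONTINUED (as Linux does).

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork/kill/waitpid under strict -std= modes */
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: drives one child through stop -> continue -> kill
   and records what waitpid reports at each step. Returns 0 on success. */
int observe_job_control(int *stop_sig, int *continued, int *term_sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();                     /* idle until a signal arrives */
    }

    int status;

    kill(pid, SIGSTOP);                  /* cannot be caught or ignored */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        goto fail;
    *stop_sig = WSTOPSIG(status);

    kill(pid, SIGCONT);
    if (waitpid(pid, &status, WCONTINUED) != pid || !WIFCONTINUED(status))
        goto fail;
    *continued = 1;

    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) != pid || !WIFSIGNALED(status))
        goto fail;
    *term_sig = WTERMSIG(status);
    return 0;

fail:
    kill(pid, SIGKILL);                  /* do not leave the child behind */
    waitpid(pid, NULL, 0);
    return -1;
}
```

On success the three out-parameters should hold SIGSTOP, 1, and SIGKILL respectively. With plain wait in place of the first two waitpid calls, the parent would block until the final SIGKILL, which is the behavior observed in the earlier experiment.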

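Finally, a summary of the differences between wait and waitpid. I was not going to write this part, but having been bitten by it, I had better record it:

waitpid can wait for one specific process, whereas wait returns the status of whichever child terminates first.
waitpid provides a non-blocking version of wait, for when we want a child's status but do not want to block. Note that a non-blocking call reports only a state change that has already happened: for waitpid with WNOHANG to collect a child, the child must have terminated (or stopped) before the parent makes the call. And once the parent itself has terminated, it can no longer call waitpid at all.
waitpid supports job control through the WUNTRACED and WCONTINUED options. This deserves special attention, because wait has no such capability.

The book also gives an example of how to avoid zombie processes. I did not fully understand it at first, so start with the book's description: "If we want to write a process so that it forks a child but we don't want to wait for the child to complete and we don't want the child to become a zombie until we terminate". For this part I mainly relied on this article: http://blog.chinaunix.net/uid-20729605-id-1884370.html

If the parent still exists when a child terminates, and the parent neither installed a SIGCHLD handler that calls waitpid() to wait for the child nor explicitly ignored the signal before calling fork(), the child becomes a zombie that cannot go away on its own; at that point even kill -9 as root cannot kill it. The remedy is to kill the zombie's parent (a zombie's parent necessarily exists): the zombie then becomes an "orphan", is adopted by process 1 (init), and init always takes care of cleaning up zombies.

Back to the book's wording. "We don't want the child to become a zombie" means that the child should be cleaned up properly when it finishes instead of being left to turn into a zombie. "Until we terminate" refers to the fact that once the parent terminates, the zombie becomes an orphan and is handled by init. Taken together, the sentence means: we want the child's resources to be reclaimed as soon as it finishes, rather than having it linger as a zombie for as long as the parent runs.

Given what we have learned so far, the first idea is to call wait or waitpid, but the problem adds another constraint: "we don't want to wait for the child to complete". In other words, the parent should not be the one that has to clean up after this child.

The blog post mentioned above describes two other ways to clean up zombies; since they are off today's topic I will not go into them. Let's stay focused on the question at hand: for a child created by the parent, how can the cleanup be done without the parent's involvement?

The book's approach is to call fork twice. The source is: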

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdlib.h>

int main(void)   
{   
    pid_t pid;   
  
    if ((pid = fork()) < 0)   
    {   
        fprintf(stderr,"Fork error!\n");   
        exit(-1);   
    }   
    else if (pid == 0) /* first child */  
    {    
        if ((pid = fork()) < 0)   
        {    
            fprintf(stderr,"Fork error!\n");   
            exit(-1);   
        }   
        else if (pid > 0)   
            exit(0); 
        sleep(5);   
        printf("Second child, parent pid = %d\n", getppid());   
        exit(0);   
    }   
       
    if (waitpid(pid, NULL, 0) != pid)  
    {   
        fprintf(stderr,"Waitpid error!\n");   
        exit(-1);   
    }     
    exit(0);   
}

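I changed the sleep to sleep(5) because at first the output was always "Second child, parent pid = 1048", which made me think the output was wrong and that the second child had run before its parent (the first child) exited. Later I found out that this is because I am using Ubuntu; for an explanation see: http://www.cnblogs.com/chilumanxi/p/5136102.html

The idea is this: the first child is reaped by the parent, and because the first child has exited, the grandchild is taken over by init (the upstart process on my machine). The grandchild is thereby detached from the original parent.

The child cannot simply be left to become a zombie, because the stated requirement is precisely that we "don't want the child to become a zombie until we terminate". If the child finished before the parent without being waited for, it would stay in the zombie state.

That is where I will stop with Section 8.6. It covers a lot of material and left me with plenty of questions, especially about process states, so I wrote up that topic separately.

For more on process states see: http://blog.csdn.net/u012927281/article/details/52016191

Section 8.7 introduces the waitid function, which is similar to waitpid. Its prototype is: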

extern int waitid (idtype_t __idtype, __id_t __id, siginfo_t *__infop,
		   int __options);

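The "__idtype" argument, together with "__id", selects which children to wait for, for example one specific process. The "__infop" argument points to a siginfo_t structure that receives detailed information about the signal associated with the child's change of state.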
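
As a rough illustration of how the arguments fit together, the sketch below waits for one specific child and reads its exit status from the siginfo_t structure instead of decoding a status integer. The helper name waitid_exit_status is hypothetical; the constants P_PID, WEXITED, and CLD_EXITED are the standard POSIX ones.

```c
#define _POSIX_C_SOURCE 200809L  /* expose waitid and the CLD_* codes under strict -std= modes */
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: forks a child that exits with `code`, reaps it
   with waitid, and returns the exit status taken from siginfo_t.
   Returns -1 on error or if the child did not exit normally. */
int waitid_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    siginfo_t info;
    /* P_PID + pid selects one specific child; WEXITED asks for children
       that have terminated. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED) != 0)
        return -1;
    if (info.si_pid != pid || info.si_code != CLD_EXITED)
        return -1;
    return info.si_status;       /* the exit status when si_code is CLD_EXITED */
}
```

For a child killed by a signal, si_code would instead be CLD_KILLED or CLD_DUMPED and si_status would hold the signal number, so no WIFEXITED-style macros are needed.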

Section 8.8 introduces the wait3 and wait4 functions. Their prototypes are:

extern __pid_t wait3 (__WAIT_STATUS __stat_loc, int __options,
		      struct rusage * __usage) __THROWNL;
extern __pid_t wait4 (__pid_t __pid, __WAIT_STATUS __stat_loc, int __options,
              struct rusage *__usage) __THROWNL;

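First, the difference between wait3 and wait4. Their parameters show that wait3 cannot wait for one specific child: it returns when any child exits, much like wait. The purpose of wait4 then becomes obvious: it can wait for a specific child. What sets wait3 and wait4 apart from wait, waitpid, and waitid is that they can also return a summary of the resources used by the terminated process and all of its children. The statistics include things such as the total user CPU time, and they are delivered through struct rusage, defined as follows: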

struct rusage
  {
    /* Total amount of user time used.  */
    struct timeval ru_utime;
    /* Total amount of system time used.  */
    struct timeval ru_stime;
    /* Maximum resident set size (in kilobytes).  */
    __extension__ union
      {
	long int ru_maxrss;
	__syscall_slong_t __ru_maxrss_word;
      };
    /* Amount of sharing of text segment memory
       with other processes (kilobyte-seconds).  */
    /* Maximum resident set size (in kilobytes).  */
    __extension__ union
      {
	long int ru_ixrss;
	__syscall_slong_t __ru_ixrss_word;
      };
    /* Amount of data segment memory used (kilobyte-seconds).  */
    __extension__ union
      {
	long int ru_idrss;
	__syscall_slong_t __ru_idrss_word;
      };
    /* Amount of stack memory used (kilobyte-seconds).  */
    __extension__ union
      {
	long int ru_isrss;
	 __syscall_slong_t __ru_isrss_word;
      };
    /* Number of soft page faults (i.e. those serviced by reclaiming
       a page from the list of pages awaiting reallocation.  */
    __extension__ union
      {
	long int ru_minflt;
	__syscall_slong_t __ru_minflt_word;
      };
    /* Number of hard page faults (i.e. those that required I/O).  */
    __extension__ union
      {
	long int ru_majflt;
	__syscall_slong_t __ru_majflt_word;
      };
    /* Number of times a process was swapped out of physical memory.  */
    __extension__ union
      {
	long int ru_nswap;
	__syscall_slong_t __ru_nswap_word;
      };
    /* Number of input operations via the file system.  Note: This
       and `ru_oublock' do not include operations with the cache.  */
    __extension__ union
      {
	long int ru_inblock;
	__syscall_slong_t __ru_inblock_word;
      };
    /* Number of output operations via the file system.  */
    __extension__ union
      {
	long int ru_oublock;
	__syscall_slong_t __ru_oublock_word;
      };
    /* Number of IPC messages sent.  */
    __extension__ union
      {
	long int ru_msgsnd;
	__syscall_slong_t __ru_msgsnd_word;
      };
    /* Number of IPC messages received.  */
    __extension__ union
      {
	long int ru_msgrcv;
	__syscall_slong_t __ru_msgrcv_word;
      };
    /* Number of signals delivered.  */
    __extension__ union
      {
	long int ru_nsignals;
	__syscall_slong_t __ru_nsignals_word;
      };
    /* Number of voluntary context switches, i.e. because the process
       gave up the process before it had to (usually to wait for some
       resource to be available).  */
    __extension__ union
      {
	long int ru_nvcsw;
	__syscall_slong_t __ru_nvcsw_word;
      };
    /* Number of involuntary context switches, i.e. a higher priority process
       became runnable or the current process used up its time slice.  */
    __extension__ union
      {
	long int ru_nivcsw;
	__syscall_slong_t __ru_nivcsw_word;
      };
  };

This structure spells out the information that wait3 and wait4 can return.
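
A minimal sketch of reading these statistics with wait4 follows. The helper name reap_with_usage and the size of the busy loop are arbitrary choices of mine, made only so that the child consumes a measurable amount of CPU time.

```c
#define _DEFAULT_SOURCE          /* wait4 is a BSD extension, not part of POSIX */
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

/* Hypothetical helper: runs a short CPU-bound loop in a child, reaps it
   with wait4, and stores the child's resource usage in *usage.
   Returns the child's exit status, or -1 on error. */
int reap_with_usage(struct rusage *usage)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        volatile unsigned long sink = 0;   /* volatile: keep the loop from being optimized away */
        for (unsigned long i = 0; i < 5000000UL; i++)
            sink += i;
        _exit(0);
    }

    int status;
    if (wait4(pid, &status, 0, usage) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After the call, fields such as usage->ru_utime (user CPU time) and usage->ru_maxrss (peak resident set size) describe the child that was just reaped.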

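8.10 The exec Functions

"When a process calls one of the exec functions, that process is completely replaced by the new program, and the new program starts executing at its main function. The process ID does not change across an exec, because a new process is not created; exec merely replaces the current process (its text, data, heap, and stack segments) with a brand-new program from disk." This is quoted directly from APUE. The book does not mention the bss segment, though, so I verified its behavior with an experiment. The source is:

test_exec.c: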

#include <unistd.h>
#include <stdio.h>

int bss_data;

int main()
{
	bss_data = 5;
	printf("data from bss = %d , its address is %p\n",bss_data,&bss_data);
	execl("./test_bss","test_bss",(char*)0);
}

test_bss.c:

#include <stdio.h>

int bss_data;

int main()
{
	++bss_data;
	printf("data from bss = %d , its address is %p\n",bss_data,&bss_data);
	return 0;
}

The output is:

./test_exec 
data from bss = 5 , its address is 0x804a028
data from bss = 1 , its address is 0x804a024

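Both programs define the global variable bss_data. Because it is not initialized, it lives in the bss segment. The second line prints 1 rather than 6, so the value 5 stored before the exec did not survive: the new program started with its own zero-initialized bss. The two variables are also located at different addresses. The bss segment is therefore not shared across exec, and the description above should be amended to: "exec merely replaces the current process's text, data, heap, and stack segments, as well as its bss segment, with a new program from disk." The remaining segments are less important, so I will not examine them here.

The prototypes of the exec family are: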

#include <unistd.h>
extern int execl (const char *__path, const char *__arg, ...)
     __THROW __nonnull ((1, 2));
extern int execv (const char *__path, char *const __argv[])
     __THROW __nonnull ((1, 2));
extern int execle (const char *__path, const char *__arg, ...)
     __THROW __nonnull ((1, 2));
extern int execve (const char *__path, char *const __argv[],
           char *const __envp[]) __THROW __nonnull ((1, 2));
extern int execlp (const char *__file, const char *__arg, ...)
     __THROW __nonnull ((1, 2));
extern int execvp (const char *__file, char *const __argv[])
     __THROW __nonnull ((1, 2));
extern int fexecve (int __fd, char *const __argv[], char *const __envp[])
     __THROW __nonnull ((2));

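First, the difference between execl and execv: execl takes the new program's arguments as a list, whereas execv takes a pointer to an argument array. The same difference exists between execle and execve, and between execlp and execvp. Also note that the __path argument of execl, execv, execle, and execve must contain both the path and the name of the executable, and that, whether a list or an array is used, the first argument should by convention be the name of the executable.
Next, the difference between execl/execv and execle/execve. It is simple: the latter two take an additional final argument that specifies the environment. By default the new program inherits the caller's environment, so this argument is the way to give it a different one.
Finally, the difference between execl/execv and execlp/execvp: the latter two take a filename as their argument, and if the filename contains no slash, the executable is searched for in the directories listed in PATH.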
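
The "p" and "e" variants described above can be sketched as two small helpers. Both names (run_with_path_search and run_shell_with_env) are made up for this illustration, and the second one assumes that a POSIX shell is installed at /bin/sh.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork/exec/waitpid under strict -std= modes */
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Wait for `pid` and return its exit status, or -1 if it did not exit
   normally. */
static int exit_status_of(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* "p" variant: argv[0] is looked up along PATH when it contains no slash.
   Returns the command's exit status, or 127 if exec itself failed. */
int run_with_path_search(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* reached only when execvp fails */
    }
    return exit_status_of(pid);
}

/* "e" variant: the new program sees only the environment passed in envp.
   /bin/sh is given as a full pathname because execve does no PATH search. */
int run_shell_with_env(const char *script, char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = { "sh", "-c", (char *)script, NULL };
        execve("/bin/sh", argv, envp);
        _exit(127);              /* reached only when execve fails */
    }
    return exit_status_of(pid);
}
```

For example, passing an envp that contains only "DEMO_CODE=5" and the script "exit $DEMO_CODE" should make run_shell_with_env return 5, which shows that the new program received the environment supplied by the caller.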

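In an earlier post we also mentioned the FD_CLOEXEC flag and explored what it does. To restate: if the flag is set on a descriptor, the descriptor is closed when exec is called; otherwise it stays open. POSIX.1 specifically requires that open directory streams be closed across an exec.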
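
Setting the flag takes two fcntl calls, one to read the current descriptor flags and one to write them back. The helper names set_cloexec and has_cloexec below are hypothetical; F_GETFD, F_SETFD, and FD_CLOEXEC are the standard fcntl interface.

```c
#define _POSIX_C_SOURCE 200809L  /* expose the POSIX fcntl interface under strict -std= modes */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turns on FD_CLOEXEC for fd while preserving any
   other descriptor flags. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

A descriptor opened with a plain open call starts with the flag clear, so has_cloexec returns 0 until set_cloexec has been called on it.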

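The original post illustrates the relationship among these seven functions with a figure. The key point is that only execve is a system call; the other six are library functions built on top of it by calling one another.

8.11 The user ID and group ID can be changed with the following two functions: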

#include <unistd.h>
extern int setuid (__uid_t __uid) __THROW __wur;
extern int setgid (__gid_t __gid) __THROW __wur;

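setuid follows three rules:

If the process has superuser privileges, setuid sets the real user ID, the effective user ID, and the saved set-user-ID to __uid.
If the process does not have superuser privileges but __uid equals either the real user ID or the saved set-user-ID, setuid sets only the effective user ID to __uid. The real user ID and the saved set-user-ID are not changed.
If neither of these conditions holds, errno is set to EPERM and the function returns -1.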
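
One common use of these rules is to give up elevated privileges by setting the effective IDs back to the real IDs. The sketch below is only an illustration with a made-up helper name, drop_to_real_ids. Note that, by the second rule, the saved set-user-ID is left alone for an unprivileged process, so in that case the drop is not permanent.

```c
#define _POSIX_C_SOURCE 200809L  /* expose setuid/setgid under strict -std= modes */
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: resets the effective user and group IDs to the
   real ones. The group is changed first, because after setuid a process
   may no longer be allowed to change its group.
   Returns 0 on success, -1 on failure. */
int drop_to_real_ids(void)
{
    if (setgid(getgid()) != 0)
        return -1;
    if (setuid(getuid()) != 0)
        return -1;
    /* Verify the result instead of assuming the calls took effect. */
    if (geteuid() != getuid() || getegid() != getgid())
        return -1;
    return 0;
}
```

In an ordinary program, where the real and effective IDs already match, the call succeeds and changes nothing.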

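Here the "saved set-user-ID" is a copy of the effective user ID that exec saves when it runs a program. It allows the process to restore that effective user ID later, after switching away from it.

exec can change the effective user ID too, but only when the set-user-ID bit is set on the program file (note that this is just a single permission bit; do not confuse it with the act of setting the user ID). This ties in with the passwd command discussed earlier: the shell runs a command through the execve system call, and because the passwd program file (the executable, not the /etc/passwd data file) has its set-user-ID bit on, execve sets the effective user ID to the user ID of the file's owner.

The following two tables summarize how these three user IDs can be changed.

1. Changing the user IDs with setuid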

setuid	Superuser	Unprivileged user
Real user ID	Set to uid	Unchanged
Effective user ID	Set to uid	Set to uid
Saved set-user-ID	Set to uid	Unchanged

2. Changing the user IDs with exec

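exec	Set-user-ID bit off	Set-user-ID bit on
Real user ID	Unchanged	Unchanged
Effective user ID	Unchanged	Set to the user ID of the program file's owner
Saved set-user-ID	Copied from the effective user ID	Copied from the effective user ID

This section also gives an example of how the at program switches privileges, but I did not understand it.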

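8.12 Interpreter Files

An interpreter file is a text file whose first line has the form:

#! pathname [optional-arguments]

The most common interpreter files begin with the line:

#! /bin/sh

What the kernel actually executes on behalf of the process that called exec is not the interpreter file itself but the file specified by pathname on its first line. Be sure to distinguish the interpreter file (a text file that begins with #!) from the interpreter (the program named by pathname on the first line of that file).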

8.13 The system Function

The system function simply executes a shell command. Its prototype is:

#include <stdlib.h>
extern int system (const char *__command) __wur;

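It has three kinds of return value:

If fork fails or waitpid returns an error other than EINTR, system returns -1 and sets errno to indicate the error.
If exec fails (meaning the shell cannot be executed), the return value is as if the shell had executed exit(127).
Otherwise, if all three functions (fork, exec, and waitpid) succeed, the return value of system is the termination status of the shell.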
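
Because the value returned on success is a termination status, it has to be decoded with the same macros used for wait. A small sketch, with the hypothetical helper name run_command and return codes (-1, -2) chosen by me for illustration:

```c
#define _POSIX_C_SOURCE 200809L  /* expose the wait macros under strict -std= modes */
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: runs `command` through system() and reduces the
   result to one number: the shell's exit status (0-255) on normal
   termination, -1 if system() itself failed, or -2 if the shell was
   terminated by a signal. */
int run_command(const char *command)
{
    int status = system(command);
    if (status == -1)
        return -1;                    /* fork or waitpid failed; errno says why */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);   /* 127 here can mean the shell could not be executed */
    return -2;
}
```

For instance, run_command("exit 44") should return 44. A result of 127 is ambiguous, because the shell itself also conventionally exits with 127 when it cannot find the command it was asked to run.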

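One more thing to note: never call system from a set-user-ID program. The elevated effective user ID that such a program runs with is inherited by the child across fork and retained across the exec of the shell, because the shell's own file does not have the set-user-ID bit on and exec therefore leaves the effective user ID alone. So if the process calling system has special privileges, the shell and the command it runs have them as well, which is a security hole.

One way to fix this is to use fork and exec directly and, after fork but before exec, change the effective user ID back to the real user ID so that the privileges are dropped.

Sections 8.14 and 8.15 discuss process accounting and user identification. They do not seem very useful to me right now, so I will skip them and come back when I need them.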