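Overview
System calls for process management are usually not invoked directly by applications; they go through an intermediate layer such as the C standard library.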
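As a small illustration of that intermediate layer, the sketch below asks for the process ID twice: once through the generic syscall(2) entry point and once through the usual C library wrapper getpid(). The function names pid_via_raw_syscall and pid_via_libc are made up for this example, and a Linux system with glibc (or a compatible libc) is assumed.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the PID through the generic syscall(2) entry point. */
long pid_via_raw_syscall(void)
{
    return syscall(SYS_getpid);
}

/* Ask for the PID through the ordinary C library wrapper. */
long pid_via_libc(void)
{
    return (long)getpid();
}
```

Both functions should report the same PID; the wrapper mainly hides the system call number and the calling convention from the application.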
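The most important part of creating a process is copying the parent process into the child process.
Process duplication
Linux implements three system calls for duplicating a process.
fork: a heavyweight call that creates a complete copy of the parent process;
vfork: similar to fork, but it does not create a copy of the parent's data; the child shares the data with the parent instead. To make this safe, the kernel keeps the parent blocked until the child exits or starts a new program;
clone: creates threads and gives precise control over what is shared and what is copied between parent and child. The fine-grained resource allocation of clone extends the usual notion of a thread and, to some degree, allows a continuous transition between threads and processes. In Linux the distinction between threads and processes is in fact not that rigid, and the two terms are often used as synonyms;
Most importantly, Linux uses copy-on-write (COW): the parent's data is not copied to the child right away. Instead, the address spaces of parent and child point to the same physical memory, and those pages are marked read-only. When either process tries to write to such a page, the processor reports a page fault to the kernel, and the kernel creates a copy of that page private to the writing process, on which the write is then performed.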
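The effect of copy-on-write can be observed from user space, at least indirectly. The sketch below is only an illustration under simple assumptions (a POSIX system, a single-threaded caller); the names value and parent_value_after_child_write are invented for it. It shows the isolation that COW preserves, not the page fault itself, which stays invisible to the program.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in a data page that parent and child share right after fork(). */
static int value = 1;

/*
 * Fork a child that overwrites `value`, wait for it, and return what the
 * parent still sees. Returns -1 if fork() or waitpid() fails, or if the
 * child did not observe its own write.
 */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* The write makes the kernel give the child its own page copy. */
        value = 42;
        _exit(value == 42 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* The parent's page was never modified. */
    return value;
}
```

The parent still reads 1 after the child has written 42, because the child's write went to a private copy of the page.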
The entry points of these system calls are sys_fork, sys_vfork and sys_clone respectively. They are platform-specific; taking x86 as an example (in arch/x86/kernel/process_64.c):
asmlinkage long sys_fork(struct pt_regs *regs)
{
    return do_fork(SIGCHLD, regs->rsp, regs, 0, NULL, NULL);
}

asmlinkage long
sys_clone(unsigned long clone_flags, unsigned long newsp,
          void __user *parent_tid, void __user *child_tid, struct pt_regs *regs)
{
    if (!newsp)
        newsp = regs->rsp;
    return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid);
}

/*
 * This is trivial, and on the face of it looks like it
 * could equally well be done in user mode.
 *
 * Not so, for quite unobvious reasons - register pressure.
 * In user mode vfork() cannot have a stack frame, and if
 * done by calling the "clone()" system call directly, you
 * do not have enough call-clobbered registers to hold all
 * the information you need.
 */
asmlinkage long sys_vfork(struct pt_regs *regs)
{
    return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->rsp, regs, 0,
                   NULL, NULL);
}
In fact, all of them end up in the platform-independent function do_fork().
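Since the three entry points differ only in the flags they hand to do_fork(), the same flag combinations can be tried from user space through the glibc clone() wrapper. The following is a rough sketch, not the kernel's own path: the wrapper takes a function and a stack rather than the raw system call arguments, the helper names child_fn and counter_after_clone are hypothetical, and a downward-growing stack (as on x86) is assumed.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Written by the child; the parent checks whether the write is visible. */
static volatile int counter;

static int child_fn(void *arg)
{
    (void)arg;
    counter = 42;
    return 0;   /* becomes the child's exit status */
}

/*
 * Start a child with the given clone flags, wait for it, and return the
 * parent's view of `counter` afterwards (or -1 on error).
 */
int counter_after_clone(int flags)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    counter = 0;
    /* Pass the top of the buffer: the stack is assumed to grow downwards. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid)
        return -1;

    return counter;
}
```

With the flags sys_fork uses (SIGCHLD only) the child works on a copy, so the parent still sees 0. With the flags sys_vfork uses (CLONE_VFORK | CLONE_VM | SIGCHLD) the address space is shared and the parent sees 42.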
/*
 * Ok, this is the main fork-routine.
 *
 * It copies the process, and if successful kick-starts
 * it and waits for it to finish using the VM if required.
 */
long do_fork(unsigned long clone_flags,
             unsigned long stack_start,
             struct pt_regs *regs,
             unsigned long stack_size,
             int __user *parent_tidptr,
             int __user *child_tidptr)
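For the implementation of this function, refer to the book directly.
Kernel threads
A kernel thread is a process started directly by the kernel itself. It is created through the following interface: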
/*
 * create a kernel thread without removing it from tasklists
 */
extern long kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
Its implementation still calls do_fork underneath (the listing below is the IA-64 version, as the ia64_* helpers show):
pid_t
kernel_thread (int (*fn)(void *), void *arg, unsigned long flags)
{
    extern void start_kernel_thread (void);
    unsigned long *helper_fptr = (unsigned long *) &start_kernel_thread;
    struct {
        struct switch_stack sw;
        struct pt_regs pt;
    } regs;

    memset(&regs, 0, sizeof(regs));
    regs.pt.cr_iip = helper_fptr[0];    /* set entry point (IP) */
    regs.pt.r1 = helper_fptr[1];        /* set GP */
    regs.pt.r9 = (unsigned long) fn;    /* 1st argument */
    regs.pt.r11 = (unsigned long) arg;  /* 2nd argument */
    /* Preserve PSR bits, except for bits 32-34 and 37-45, which we can't read. */
    regs.pt.cr_ipsr = ia64_getreg(_IA64_REG_PSR) | IA64_PSR_BN;
    regs.pt.cr_ifs = 1UL << 63;         /* mark as valid, empty frame */
    regs.sw.ar_fpsr = regs.pt.ar_fpsr = ia64_getreg(_IA64_REG_AR_FPSR);
    regs.sw.ar_bspstore = (unsigned long) current + IA64_RBS_OFFSET;
    regs.sw.pr = (1 << PRED_KERNEL_STACK);
    return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs.pt, 0, NULL, NULL);
}
Another way to create a kernel thread is kthread_create:
/**
 * kthread_create - create a kthread.
 * @threadfn: the function to run until signal_pending(current).
 * @data: data ptr for @threadfn.
 * @namefmt: printf-style name for the thread.
 *
 * Description: This helper function creates and names a kernel
 * thread. The thread will be stopped: use wake_up_process() to start
 * it. See also kthread_run(), kthread_create_on_cpu().
 *
 * When woken, the thread will run @threadfn() with @data as its
 * argument. @threadfn() can either call do_exit() directly if it is a
 * standalone thread for which noone will call kthread_stop(), or
 * return when 'kthread_should_stop()' is true (which means
 * kthread_stop() has been called). The return value should be zero
 * or a negative error number; it will be passed to kthread_stop().
 *
 * Returns a task_struct or ERR_PTR(-ENOMEM).
 */
struct task_struct *kthread_create(int (*threadfn)(void *data),
                                   void *data,
                                   const char namefmt[],
                                   ...)
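Starting a new program
After a process has been duplicated, a new program is started by replacing the existing program with new code.
Linux uses the execve system call for this.
Likewise, the entry point of execve is the sys_execve function: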
long
sys_execve (char __user *filename, char __user * __user *argv, char __user * __user *envp,
            struct pt_regs *regs)
{
    char *fname;
    int error;

    fname = getname(filename);
    error = PTR_ERR(fname);
    if (IS_ERR(fname))
        goto out;
    error = do_execve(fname, argv, envp, regs);
    putname(fname);
out:
    return error;
}
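This function is platform-specific, whereas the do_execve it calls is platform-independent.
For the implementation of do_execve(), again refer to the book.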
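Seen from user space, the duplicate-then-replace sequence described in this section is the familiar fork plus execve pattern. The sketch below is one possible way to write it, assuming that /bin/sh exists on the system; the function name run_shell_command is invented for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run `/bin/sh -c command` in a child process and return its exit code,
 * or -1 if the child could not be created or did not exit normally.
 */
int run_shell_command(const char *command)
{
    pid_t pid = fork();                 /* step 1: duplicate the process */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char *const argv[] = { "sh", "-c", (char *)command, NULL };
        char *const envp[] = { NULL };  /* empty environment for the sketch */

        execve("/bin/sh", argv, envp);  /* step 2: replace the program */
        _exit(127);                     /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell_command("exit 7") returns 7: the child's image has been replaced by the shell, and the shell's exit code is what the parent collects.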
Exiting a process
A process exits through the exit system call, whose entry point is sys_exit:
asmlinkage long sys_exit(int error_code)
{
    do_exit((error_code&0xff)<<8);
}
It is platform-independent.
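The expression (error_code&0xff)<<8 in sys_exit can be checked from user space by looking at the raw status that waitpid() hands back. The following sketch assumes Linux and a single-threaded child, and calls the exit system call directly through syscall(2) so that sys_exit is the entry point actually used; the function name raw_wait_status_for_exit_code is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that terminates through the raw exit system call with the
 * given code, and return the unmodified wait status seen by the parent
 * (or -1 if fork() or waitpid() fails).
 */
int raw_wait_status_for_exit_code(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        syscall(SYS_exit, code);    /* enters the kernel at sys_exit */
        _exit(code);                /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

An exit code of 3 shows up as 3 << 8 in the raw status, and a code of 261 is truncated to its low byte (5) before being shifted, which is why WEXITSTATUS() only ever reports values from 0 to 255.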