Chapter 9. Blocking Processes
阻塞进程
Enter Sandman
当别人让你做一件你不能马上去做的事时,你会如何反映?如果你是人类的话,而且对方也是人类的话, 你只会说:“现在不行,我忙着在。闪开!”但是如果你是一个内核模块而且你被一个进程以同样的问题困扰, 你会有另外一个选择。你可以让该进程休眠直到你可以为它服务时。毕竟,这样的情况在内核中时时刻刻都在发生 (这就是系统让多进程在单CPU上同时运行的方法)。
这个内核模块就是一个这样的例子。文件(/proc/sleep))只可以在同一时刻被一个进程打开。 如果该文件已经被打开,内核模块将调用函数 wait_event_interruptible[1]。该函数修改task的状态(task是一个内核中的结构体数据结构, 其中保存着对应进程的信息和该进程正在调用的系统调用,如果有的话)为 TASK_INTERRUPTIBLE意味着改进程将不会继续运行直到被唤醒,然后被添加到系统的进程等待队列 WaitQ中,一个等待打开该文件的队列中。然后,该函数调用系统调度器去切换到另一个不同的 但有CPU运算请求的进程。
当一个进程处理完该文件并且关闭了该文件,module_close就被调用执行了。 该函数唤醒所有在等待队列中的进程(还没有只唤醒特定进程的机制)。然后该函数返回, 那个刚刚关闭文件的进程得以继续运行。及时的,进程调度器会判定该进程执行已执行完毕, 将CPU转让给别的进程。被提供CPU使用权的那个进程就恰好从先前系统调用 module_interruptible_sleep_on[2]后的地方开始继续执行。 它可以设置一个全局变量去通知别的进程该文件已被打开占用了。当别的请求该文件的进程获得CPU时间片时, 它们将检测该变量然后返回休眠。
更有趣的是,module_close并不垄断唤醒等待中的请求文件的进程的权力。一个信号,像Ctrl+c (SIGINT也能够唤醒别的进程 [3]。 在这种情况下,我们想立即返回-EINTR 。 这对用户很重要,举个例子来说,用户可以在某个进程接受到文件前终止该进程。
还有一点值得注意。有些时候进程并不愿意休眠,它们要么立即执行它们想做的, 要么被告知任务无法进行。这样的进程在打开文件时会使用标志O_NONBLOCK。 在别的进程被阻塞时内核应该做出的响应是返回错误代码-EAGAIN,像在本例中对该文件的请求的进程。程序 cat_noblock,在本章的源代码目录下可以找到,就能够使用标志位 O_NONBLOCK打开文件。
Example 9-1. sleep.c
/*
* sleep.c - create a /proc file, and if several processes try to open it at
* the same time, put all but one to sleep
*/
#include <linux/kernel.h> /* We're doing kernel work */
#include <linux/module.h> /* Specifically, a module */
#include <linux/proc_fs.h> /* Necessary because we use proc fs */
#include <linux/sched.h> /* For putting processes to sleep and
waking them up */
#include <asm/uaccess.h> /* for get_user and put_user */
/*
* The module's file functions
*/
/*
* Here we keep the last message received, to prove that we can process our
* input
*/
#define MESSAGE_LENGTH 80
static char Message[MESSAGE_LENGTH];
static struct proc_dir_entry *Our_Proc_File;
#define PROC_ENTRY_FILENAME "sleep"
/*
* Since we use the file operations struct, we can't use the special proc
* output provisions - we have to use a standard read function, which is this
* function
*/
static ssize_t module_output(struct file *file, /* see include/linux/fs.h */
char *buf, /* The buffer to put data to
(in the user segment) */
size_t len, /* The length of the buffer */
loff_t * offset)
{
static int finished = 0;
int i;
char message[MESSAGE_LENGTH + 30];
/*
* Return 0 to signify end of file - that we have nothing
* more to say at this point.
*/
if (finished) {
finished = 0;
return 0;
}
/*
* If you don't understand this by now, you're hopeless as a kernel
* programmer.
*/
sprintf(message, "Last input:%s\n", Message);
for (i = 0; i < len && message[i]; i++)
put_user(message[i], buf + i);
finished = 1;
return i; /* Return the number of bytes "read" */
}
/*
* This function receives input from the user when the user writes to the /proc
* file.
*/
static ssize_t module_input(struct file *file, /* The file itself */
const char *buf, /* The buffer with input */
size_t length, /* The buffer's length */
loff_t * offset)
{ /* offset to file - ignore */
int i;
/*
* Put the input into Message, where module_output will later be
* able to use it
*/
for (i = 0; i < MESSAGE_LENGTH - 1 && i < length; i++)
get_user(Message[i], buf + i);
/*
* we want a standard, zero terminated string
*/
Message[i] = '\0';
/*
* We need to return the number of input characters used
*/
return i;
}
/*
* 1 if the file is currently open by somebody
*/
int Already_Open = 0;
/*
* Queue of processes who want our file
*/
DECLARE_WAIT_QUEUE_HEAD(WaitQ);
/*
* Called when the /proc file is opened
*/
static int module_open(struct inode *inode, struct file *file)
{
/*
* If the file's flags include O_NONBLOCK, it means the process doesn't
* want to wait for the file. In this case, if the file is already
* open, we should fail with -EAGAIN, meaning "you'll have to try
* again", instead of blocking a process which would rather stay awake.
*/
if ((file->f_flags & O_NONBLOCK) && Already_Open)
return -EAGAIN;
/*
* This is the correct place for try_module_get(THIS_MODULE) because
* if a process is in the loop, which is within the kernel module,
* the kernel module must not be removed.
*/
try_module_get(THIS_MODULE);
/*
* If the file is already open, wait until it isn't
*/
while (Already_Open) {
int i, is_sig = 0;
/*
* This function puts the current process, including any system
* calls, such as us, to sleep. Execution will be resumed right
* after the function call, either because somebody called
* wake_up(&WaitQ) (only module_close does that, when the file
* is closed) or when a signal, such as Ctrl-C, is sent
* to the process
*/
wait_event_interruptible(WaitQ, !Already_Open);
/*
* If we woke up because we got a signal we're not blocking,
* return -EINTR (fail the system call). This allows processes
* to be killed or stopped.
*/
/*
* Emmanuel Papirakis:
*
* This is a little update to work with 2.2.*. Signals now are contained in
* two words (64 bits) and are stored in a structure that contains an array of
* two unsigned longs. We now have to make 2 checks in our if.
*
* Ori Pomerantz:
*
* Nobody promised me they'll never use more than 64 bits, or that this book
* won't be used for a version of Linux with a word size of 16 bits. This code
* would work in any case.
*/
for (i = 0; i < _NSIG_WORDS && !is_sig; i++)
is_sig =
current->pending.signal.sig[i] & ~current->
blocked.sig[i];
if (is_sig) {
/*
* It's important to put module_put(THIS_MODULE) here,
* because for processes where the open is interrupted
* there will never be a corresponding close. If we
* don't decrement the usage count here, we will be
* left with a positive usage count which we'll have no
* way to bring down to zero, giving us an immortal
* module, which can only be killed by rebooting
* the machine.
*/
module_put(THIS_MODULE);
return -EINTR;
}
}
/*
* If we got here, Already_Open must be zero
*/
/*
* Open the file
*/
Already_Open = 1;
return 0; /* Allow the access */
}
/*
* Called when the /proc file is closed
*/
int module_close(struct inode *inode, struct file *file)
{
/*
* Set Already_Open to zero, so one of the processes in the WaitQ will
* be able to set Already_Open back to one and to open the file. All
* the other processes will be called when Already_Open is back to one,
* so they'll go back to sleep.
*/
Already_Open = 0;
/*
* Wake up all the processes in WaitQ, so if anybody is waiting for the
* file, they can have it.
*/
wake_up(&WaitQ);
module_put(THIS_MODULE);
return 0; /* success */
}
/*
* This function decides whether to allow an operation (return zero) or not
* allow it (return a non-zero which indicates why it is not allowed).
*
* The operation can be one of the following values:
* 0 - Execute (run the "file" - meaningless in our case)
* 2 - Write (input to the kernel module)
* 4 - Read (output from the kernel module)
*
* This is the real function that checks file permissions. The permissions
* returned by ls -l are for reference only, and can be overridden here.
*/
static int module_permission(struct inode *inode, int op, struct nameidata *nd)
{
/*
* We allow everybody to read from our module, but only root (uid 0)
* may write to it
*/
if (op == 4 || (op == 2 && current->euid == 0))
return 0;
/*
* If it's anything else, access is denied
*/
return -EACCES;
}
/*
* Structures to register as the /proc file, with pointers to all the relevant
* functions.
*/
/*
* File operations for our proc file. This is where we place pointers to all
* the functions called when somebody tries to do something to our file. NULL
* means we don't want to deal with something.
*/
static struct file_operations File_Ops_4_Our_Proc_File = {
.read = module_output, /* "read" from the file */
.write = module_input, /* "write" to the file */
.open = module_open, /* called when the /proc file is opened */
.release = module_close, /* called when it's closed */
};
/*
* Inode operations for our proc file. We need it so we'll have somewhere to
* specify the file operations structure we want to use, and the function we
* use for permissions. It's also possible to specify functions to be called
* for anything else which could be done to an inode (although we don't bother,
* we just put NULL).
*/
static struct inode_operations Inode_Ops_4_Our_Proc_File = {
.permission = module_permission, /* check for permissions */
};
/*
* Module initialization and cleanup
*/
/*
* Initialize the module - register the proc file
*/
int init_module()
{
int rv = 0;
Our_Proc_File = create_proc_entry(PROC_ENTRY_FILENAME, 0644, NULL);
Our_Proc_File->owner = THIS_MODULE;
Our_Proc_File->proc_iops = &Inode_Ops_4_Our_Proc_File;
Our_Proc_File->proc_fops = &File_Ops_4_Our_Proc_File;
Our_Proc_File->mode = S_IFREG | S_IRUGO | S_IWUSR;
Our_Proc_File->uid = 0;
Our_Proc_File->gid = 0;
Our_Proc_File->size = 80;
if (Our_Proc_File == NULL) {
rv = -ENOMEM;
remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);
printk(KERN_INFO "Error: Could not initialize /proc/test\n" );
}
return rv;
}
/*
* Cleanup - unregister our file from /proc. This could get dangerous if
* there are still processes waiting in WaitQ, because they are inside our
* open function, which will get unloaded. I'll explain how to avoid removal
* of a kernel module in such a case in chapter 10.
*/
void cleanup_module()
{
remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);
}
Notes
[1] 最方便的保持某个文件被打开的方法是使用命令 tail -f打开该文件。
[2] 这就意味着该进程仍然在内核态中, 该进程已经调用了open的系统调用,但系统调用却没有返回。 在这段时间内该进程将不会得知别人正在使用CPU。
[3] 这是因为我们使用的是module_interruptible_sleep_on。我们也可以使用 module_sleep_on,但这样会导致一些十分愤怒的用户,因为他们的Ctrl+c将不起任何作用。
阻塞进程
Enter Sandman
当别人让你做一件你不能马上去做的事时,你会如何反映?如果你是人类的话,而且对方也是人类的话, 你只会说:“现在不行,我忙着在。闪开!”但是如果你是一个内核模块而且你被一个进程以同样的问题困扰, 你会有另外一个选择。你可以让该进程休眠直到你可以为它服务时。毕竟,这样的情况在内核中时时刻刻都在发生 (这就是系统让多进程在单CPU上同时运行的方法)。
这个内核模块就是一个这样的例子。文件(/proc/sleep))只可以在同一时刻被一个进程打开。 如果该文件已经被打开,内核模块将调用函数 wait_event_interruptible[1]。该函数修改task的状态(task是一个内核中的结构体数据结构, 其中保存着对应进程的信息和该进程正在调用的系统调用,如果有的话)为 TASK_INTERRUPTIBLE意味着改进程将不会继续运行直到被唤醒,然后被添加到系统的进程等待队列 WaitQ中,一个等待打开该文件的队列中。然后,该函数调用系统调度器去切换到另一个不同的 但有CPU运算请求的进程。
当一个进程处理完该文件并且关闭了该文件,module_close就被调用执行了。 该函数唤醒所有在等待队列中的进程(还没有只唤醒特定进程的机制)。然后该函数返回, 那个刚刚关闭文件的进程得以继续运行。及时的,进程调度器会判定该进程执行已执行完毕, 将CPU转让给别的进程。被提供CPU使用权的那个进程就恰好从先前系统调用 module_interruptible_sleep_on[2]后的地方开始继续执行。 它可以设置一个全局变量去通知别的进程该文件已被打开占用了。当别的请求该文件的进程获得CPU时间片时, 它们将检测该变量然后返回休眠。
更有趣的是,module_close并不垄断唤醒等待中的请求文件的进程的权力。一个信号,像Ctrl+c (SIGINT也能够唤醒别的进程 [3]。 在这种情况下,我们想立即返回-EINTR 。 这对用户很重要,举个例子来说,用户可以在某个进程接受到文件前终止该进程。
还有一点值得注意。有些时候进程并不愿意休眠,它们要么立即执行它们想做的, 要么被告知任务无法进行。这样的进程在打开文件时会使用标志O_NONBLOCK。 在别的进程被阻塞时内核应该做出的响应是返回错误代码-EAGAIN,像在本例中对该文件的请求的进程。程序 cat_noblock,在本章的源代码目录下可以找到,就能够使用标志位 O_NONBLOCK打开文件。
Example 9-1. sleep.c
/*
* sleep.c - create a /proc file, and if several processes try to open it at
* the same time, put all but one to sleep
*/
#include <linux/kernel.h> /* We're doing kernel work */
#include <linux/module.h> /* Specifically, a module */
#include <linux/proc_fs.h> /* Necessary because we use proc fs */
#include <linux/sched.h> /* For putting processes to sleep and
waking them up */
#include <asm/uaccess.h> /* for get_user and put_user */
/*
* The module's file functions
*/
/*
* Here we keep the last message received, to prove that we can process our
* input
*/
#define MESSAGE_LENGTH 80
static char Message[MESSAGE_LENGTH];
static struct proc_dir_entry *Our_Proc_File;
#define PROC_ENTRY_FILENAME "sleep"
/*
* Since we use the file operations struct, we can't use the special proc
* output provisions - we have to use a standard read function, which is this
* function
*/
static ssize_t module_output(struct file *file, /* see include/linux/fs.h */
char *buf, /* The buffer to put data to
(in the user segment) */
size_t len, /* The length of the buffer */
loff_t * offset)
{
static int finished = 0;
int i;
char message[MESSAGE_LENGTH + 30];
/*
* Return 0 to signify end of file - that we have nothing
* more to say at this point.
*/
if (finished) {
finished = 0;
return 0;
}
/*
* If you don't understand this by now, you're hopeless as a kernel
* programmer.
*/
sprintf(message, "Last input:%s\n", Message);
for (i = 0; i < len && message[i]; i++)
put_user(message[i], buf + i);
finished = 1;
return i; /* Return the number of bytes "read" */
}
/*
* This function receives input from the user when the user writes to the /proc
* file.
*/
static ssize_t module_input(struct file *file, /* The file itself */
const char *buf, /* The buffer with input */
size_t length, /* The buffer's length */
loff_t * offset)
{ /* offset to file - ignore */
int i;
/*
* Put the input into Message, where module_output will later be
* able to use it
*/
for (i = 0; i < MESSAGE_LENGTH - 1 && i < length; i++)
get_user(Message[i], buf + i);
/*
* we want a standard, zero terminated string
*/
Message[i] = '\0';
/*
* We need to return the number of input characters used
*/
return i;
}
/*
* 1 if the file is currently open by somebody
*/
int Already_Open = 0;
/*
* Queue of processes who want our file
*/
DECLARE_WAIT_QUEUE_HEAD(WaitQ);
/*
* Called when the /proc file is opened
*/
static int module_open(struct inode *inode, struct file *file)
{
/*
* If the file's flags include O_NONBLOCK, it means the process doesn't
* want to wait for the file. In this case, if the file is already
* open, we should fail with -EAGAIN, meaning "you'll have to try
* again", instead of blocking a process which would rather stay awake.
*/
if ((file->f_flags & O_NONBLOCK) && Already_Open)
return -EAGAIN;
/*
* This is the correct place for try_module_get(THIS_MODULE) because
* if a process is in the loop, which is within the kernel module,
* the kernel module must not be removed.
*/
try_module_get(THIS_MODULE);
/*
* If the file is already open, wait until it isn't
*/
while (Already_Open) {
int i, is_sig = 0;
/*
* This function puts the current process, including any system
* calls, such as us, to sleep. Execution will be resumed right
* after the function call, either because somebody called
* wake_up(&WaitQ) (only module_close does that, when the file
* is closed) or when a signal, such as Ctrl-C, is sent
* to the process
*/
wait_event_interruptible(WaitQ, !Already_Open);
/*
* If we woke up because we got a signal we're not blocking,
* return -EINTR (fail the system call). This allows processes
* to be killed or stopped.
*/
/*
* Emmanuel Papirakis:
*
* This is a little update to work with 2.2.*. Signals now are contained in
* two words (64 bits) and are stored in a structure that contains an array of
* two unsigned longs. We now have to make 2 checks in our if.
*
* Ori Pomerantz:
*
* Nobody promised me they'll never use more than 64 bits, or that this book
* won't be used for a version of Linux with a word size of 16 bits. This code
* would work in any case.
*/
for (i = 0; i < _NSIG_WORDS && !is_sig; i++)
is_sig =
current->pending.signal.sig[i] & ~current->
blocked.sig[i];
if (is_sig) {
/*
* It's important to put module_put(THIS_MODULE) here,
* because for processes where the open is interrupted
* there will never be a corresponding close. If we
* don't decrement the usage count here, we will be
* left with a positive usage count which we'll have no
* way to bring down to zero, giving us an immortal
* module, which can only be killed by rebooting
* the machine.
*/
module_put(THIS_MODULE);
return -EINTR;
}
}
/*
* If we got here, Already_Open must be zero
*/
/*
* Open the file
*/
Already_Open = 1;
return 0; /* Allow the access */
}
/*
* Called when the /proc file is closed
*/
int module_close(struct inode *inode, struct file *file)
{
/*
* Set Already_Open to zero, so one of the processes in the WaitQ will
* be able to set Already_Open back to one and to open the file. All
* the other processes will be called when Already_Open is back to one,
* so they'll go back to sleep.
*/
Already_Open = 0;
/*
* Wake up all the processes in WaitQ, so if anybody is waiting for the
* file, they can have it.
*/
wake_up(&WaitQ);
module_put(THIS_MODULE);
return 0; /* success */
}
/*
* This function decides whether to allow an operation (return zero) or not
* allow it (return a non-zero which indicates why it is not allowed).
*
* The operation can be one of the following values:
* 0 - Execute (run the "file" - meaningless in our case)
* 2 - Write (input to the kernel module)
* 4 - Read (output from the kernel module)
*
* This is the real function that checks file permissions. The permissions
* returned by ls -l are for reference only, and can be overridden here.
*/
static int module_permission(struct inode *inode, int op, struct nameidata *nd)
{
/*
* We allow everybody to read from our module, but only root (uid 0)
* may write to it
*/
if (op == 4 || (op == 2 && current->euid == 0))
return 0;
/*
* If it's anything else, access is denied
*/
return -EACCES;
}
/*
* Structures to register as the /proc file, with pointers to all the relevant
* functions.
*/
/*
* File operations for our proc file. This is where we place pointers to all
* the functions called when somebody tries to do something to our file. NULL
* means we don't want to deal with something.
*/
static struct file_operations File_Ops_4_Our_Proc_File = {
.read = module_output, /* "read" from the file */
.write = module_input, /* "write" to the file */
.open = module_open, /* called when the /proc file is opened */
.release = module_close, /* called when it's closed */
};
/*
* Inode operations for our proc file. We need it so we'll have somewhere to
* specify the file operations structure we want to use, and the function we
* use for permissions. It's also possible to specify functions to be called
* for anything else which could be done to an inode (although we don't bother,
* we just put NULL).
*/
static struct inode_operations Inode_Ops_4_Our_Proc_File = {
.permission = module_permission, /* check for permissions */
};
/*
* Module initialization and cleanup
*/
/*
* Initialize the module - register the proc file
*/
int init_module()
{
int rv = 0;
Our_Proc_File = create_proc_entry(PROC_ENTRY_FILENAME, 0644, NULL);
Our_Proc_File->owner = THIS_MODULE;
Our_Proc_File->proc_iops = &Inode_Ops_4_Our_Proc_File;
Our_Proc_File->proc_fops = &File_Ops_4_Our_Proc_File;
Our_Proc_File->mode = S_IFREG | S_IRUGO | S_IWUSR;
Our_Proc_File->uid = 0;
Our_Proc_File->gid = 0;
Our_Proc_File->size = 80;
if (Our_Proc_File == NULL) {
rv = -ENOMEM;
remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);
printk(KERN_INFO "Error: Could not initialize /proc/test\n" );
}
return rv;
}
/*
* Cleanup - unregister our file from /proc. This could get dangerous if
* there are still processes waiting in WaitQ, because they are inside our
* open function, which will get unloaded. I'll explain how to avoid removal
* of a kernel module in such a case in chapter 10.
*/
void cleanup_module()
{
remove_proc_entry(PROC_ENTRY_FILENAME, &proc_root);
}
Notes
[1] 最方便的保持某个文件被打开的方法是使用命令 tail -f打开该文件。
[2] 这就意味着该进程仍然在内核态中, 该进程已经调用了open的系统调用,但系统调用却没有返回。 在这段时间内该进程将不会得知别人正在使用CPU。
[3] 这是因为我们使用的是module_interruptible_sleep_on。我们也可以使用 module_sleep_on,但这样会导致一些十分愤怒的用户,因为他们的Ctrl+c将不起任何作用。