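Overview
The Linux kernel wait queue is an event wake-up mechanism. It is typically used to put a thread to sleep until an interrupt handler or another kernel thread wakes it. The most common interfaces are wake_up_interruptible, which wakes the waiters, and wait_event_interruptible, which sleeps until a condition holds.
Figure 1: Typical wait queue usage flow
Figure 1 shows the typical flow. Thread 1 enters the kernel through a system call and calls wait_event_interruptible to wait for an event; if the condition is not yet satisfied, thread 1 goes to sleep. When an interrupt or another thread makes the condition true, it calls wake_up_interruptible to wake thread 1, which is waiting on the wait_queue_head. The wait queue head, struct wait_queue_head, is a kernel structure that must be initialized before use.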
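Usage in the kernel
Initialization
Declare and initialize a struct wait_queue_head variable named test:
DECLARE_WAIT_QUEUE_HEAD(test);
Then declare a condition variable: int condition = 0;
The DECLARE_WAIT_QUEUE_HEAD macro is defined in include/linux/wait.h.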
Waiting for an event
In kernel context, the thread calls:
wait_event_interruptible(test,condition);
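If condition is non-zero, the call returns immediately without sleeping. If condition is 0, the thread is added to the wait queue's list and then goes to sleep.
Producing the event and waking the waiter
When the event occurs, for example when an interrupt fires, kernel code sets condition to a non-zero value and then calls: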
wake_up_interruptible(&test);
This wakes the threads that are sleeping on the test wait queue.
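The three steps above can be combined into one kernel-module fragment. This is a minimal sketch rather than code from the kernel tree: waiter_fn, example_event, example_init, example_exit, and the use of a kthread as the waiter are hypothetical choices made for illustration, and the fragment builds only as part of a kernel module.

```c
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(test);   /* step 1: declare and initialize */
static int condition;                   /* 0 means "event not yet seen" */
static struct task_struct *waiter;      /* hypothetical waiting thread */

/* Step 2: sleep until condition is set or the thread is asked to stop. */
static int waiter_fn(void *data)
{
    while (!kthread_should_stop()) {
        if (wait_event_interruptible(test,
                condition || kthread_should_stop()))
            continue;           /* -ERESTARTSYS: interrupted by a signal */
        if (condition) {
            condition = 0;      /* consume the event */
            pr_info("event handled\n");
        }
    }
    return 0;
}

/* Step 3: meant to be called from an interrupt handler or another thread. */
static void __maybe_unused example_event(void)
{
    condition = 1;                  /* set the condition first */
    wake_up_interruptible(&test);   /* then wake the waiters */
}

static int __init example_init(void)
{
    waiter = kthread_run(waiter_fn, NULL, "wq_example");
    return PTR_ERR_OR_ZERO(waiter);
}

static void __exit example_exit(void)
{
    kthread_stop(waiter);   /* wakes the thread; its condition is now true */
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
```

Setting condition before calling wake_up_interruptible matters: the woken thread re-checks the condition, so a wake-up that arrives before the flag is set sends the thread straight back to sleep.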
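Kernel data structures
include/linux/wait.h defines struct wait_queue_entry and struct wait_queue_head:
/*
 * A single wait-queue entry structure:
 */
struct wait_queue_entry {
    unsigned int flags;
    void *private;
    wait_queue_func_t func;
    struct list_head entry;
};

struct wait_queue_head {
    spinlock_t lock;
    struct list_head head;
};
typedef struct wait_queue_head wait_queue_head_t;
Figure 2: Data structure after struct wait_queue_head is initialized
Figure 3: Data structure after one thread is added to the wait queue
Figure 3 shows the data structure after one thread has been added. Each thread added to the queue appears as one more struct wait_queue_entry linked into struct wait_queue_head. In the entry, void *private records the thread's task_struct, and func records the callback used to wake the thread. Continuing the example above, every thread that calls wait_event_interruptible(test, condition) and has to sleep adds one struct wait_queue_entry to this list. On wake-up, the list is walked and the entries are woken one by one.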
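The linkage between the head and its entries can be reproduced in user space. The following is a simplified model under stated assumptions, not the kernel implementation: the spinlock and the func callback are omitted, private holds a name string instead of a task_struct, and the function names ending in _model, along with waiter_count and entry_of, are hypothetical.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's circular doubly linked list node. */
struct list_head { struct list_head *next, *prev; };

struct wait_queue_head { struct list_head head; };   /* spinlock omitted */

struct wait_queue_entry {
    unsigned int flags;
    void *private;              /* the waiting task; here just a name string */
    struct list_head entry;     /* links this entry into the head's list */
};

/* An empty list is a head whose next and prev point back to itself. */
void init_waitqueue_head_model(struct wait_queue_head *wq)
{
    wq->head.next = wq->head.prev = &wq->head;
}

/* Insert directly after the head, as a non-exclusive waiter would be. */
void add_wait_queue_model(struct wait_queue_head *wq, struct wait_queue_entry *e)
{
    e->entry.next = wq->head.next;
    e->entry.prev = &wq->head;
    wq->head.next->prev = &e->entry;
    wq->head.next = &e->entry;
}

/* Count the queued entries by walking the list until it wraps to the head. */
int waiter_count(const struct wait_queue_head *wq)
{
    int n = 0;
    for (const struct list_head *p = wq->head.next; p != &wq->head; p = p->next)
        n++;
    return n;
}

/* Recover the entry from its embedded list node (what container_of does). */
struct wait_queue_entry *entry_of(struct list_head *node)
{
    return (struct wait_queue_entry *)
        ((char *)node - offsetof(struct wait_queue_entry, entry));
}
```

After two calls to add_wait_queue_model, waiter_count returns 2 and the most recently added entry sits directly behind the head.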
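Key function implementations in the kernel
wait_event_interruptible
Figure 4: wait_event_interruptible code flow
Figure 5: The list after task 1 and task 2 have gone to sleep in wait_event_interruptible
In include/linux/wait.h:
#define wait_event_interruptible(wq_head, condition) \
({ \
    int __ret = 0; \
    might_sleep(); \
    if (!(condition)) \
        __ret = __wait_event_interruptible(wq_head, condition); \
    __ret; \
})
When condition evaluates to 0, the macro enters __wait_event_interruptible and the thread can go to sleep.
When condition is non-zero, the macro returns 0 immediately and the thread does not sleep.
#define __wait_event_interruptible(wq_head, condition) \
    ___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0, \
                  schedule())
___wait_event creates a struct wait_queue_entry named __wq_entry on the stack,
records the current thread in it, links it into the list in struct wait_queue_head, and then calls schedule() to give the CPU to other threads.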
#define ___wait_event(wq_head, condition, state, exclusive, ret, cmd) \
({ \
    __label__ __out; \
    struct wait_queue_entry __wq_entry; \
    long __ret = ret; /* explicit shadow */ \
    \
    init_wait_entry(&__wq_entry, exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
    for (;;) { \
        long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state); \
        \
        if (condition) \
            break; \
        \
        if (___wait_is_interruptible(state) && __int) { \
            __ret = __int; \
            goto __out; \
        } \
        \
        cmd; \
    } \
    finish_wait(&wq_head, &__wq_entry); \
__out: __ret; \
})
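The control flow of wait_event_interruptible and ___wait_event can be traced with a small user-space model. This is a sketch for reasoning about the loop, not kernel code: struct model, schedule_model, and wait_event_interruptible_model are hypothetical, the signal_pending field stands in for the result of prepare_to_wait_event, and each call to schedule_model represents one sleep followed by one wake-up.

```c
#include <stdbool.h>

#define MODEL_ERESTARTSYS 512   /* same numeric value as the kernel's ERESTARTSYS */

struct model {
    int condition;           /* the flag the waiter tests */
    int wakeups_until_set;   /* wake-ups that arrive before the flag is set */
    int sleeps;              /* how many times the waiter went to sleep */
    bool signal_pending;     /* stands in for signal_pending_state() */
};

/* Stand-in for schedule(): the task sleeps, then a wake-up resumes it. */
static void schedule_model(struct model *m)
{
    m->sleeps++;
    if (m->wakeups_until_set > 0 && --m->wakeups_until_set == 0)
        m->condition = 1;    /* the last wake-up comes with the flag set */
}

/*
 * Same ordering as the macros: fast path first, then a loop that prepares
 * to wait, re-checks the condition, bails out on a signal, and sleeps.
 * Note: with condition == 0, wakeups_until_set == 0 and no signal, this
 * loops forever, just as a waiter nobody wakes would sleep forever.
 */
int wait_event_interruptible_model(struct model *m)
{
    if (m->condition)
        return 0;                          /* never sleeps */
    for (;;) {
        int interrupted = m->signal_pending ? -MODEL_ERESTARTSYS : 0;

        if (m->condition)
            break;                         /* woken with the condition true */
        if (interrupted)
            return interrupted;            /* woken by a signal instead */
        schedule_model(m);                 /* condition still false: sleep */
    }
    return 0;
}
```

With wakeups_until_set set to 3, the model sleeps three times before returning 0, which shows that a wake-up without the condition being set only sends the waiter back to sleep.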
kernel/sched/wait.c
void init_wait_entry(struct wait_queue_entry *wq_entry, int flags)
{
    wq_entry->flags = flags;
    wq_entry->private = current;
    wq_entry->func = autoremove_wake_function;
    INIT_LIST_HEAD(&wq_entry->entry);
}
EXPORT_SYMBOL(init_wait_entry);
The function above records the waiting task (current) in the entry and sets the callback
that runs when the queue is woken: autoremove_wake_function.
long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state)
{
    unsigned long flags;
    long ret = 0;

    spin_lock_irqsave(&wq_head->lock, flags);
    /*
     * If state is TASK_INTERRUPTIBLE and the task has a pending signal, or the
     * state is killable and the task has received a fatal (KILL) signal,
     * return -ERESTARTSYS. The task does not sleep, so it needs no wake-up.
     */
    if (unlikely(signal_pending_state(state, current))) {
        /*
         * Exclusive waiter must not fail if it was selected by wakeup,
         * it should "consume" the condition we were waiting for.
         *
         * The caller will recheck the condition and return success if
         * we were already woken up, we can not miss the event because
         * wakeup locks/unlocks the same wq_head->lock.
         *
         * But we need to ensure that set-condition + wakeup after that
         * can't see us, it should wake up another exclusive waiter if
         * we fail.
         */
        list_del_init(&wq_entry->entry);
        ret = -ERESTARTSYS;
    } else {
        /* Add wq_entry to the list in wq_head if it is not already queued. */
        if (list_empty(&wq_entry->entry)) {
            if (wq_entry->flags & WQ_FLAG_EXCLUSIVE)
                __add_wait_queue_entry_tail(wq_head, wq_entry);
            else
                __add_wait_queue(wq_head, wq_entry);
        }
        /* Set the current task's state to the requested sleep state. */
        set_current_state(state);
    }
    spin_unlock_irqrestore(&wq_head->lock, flags);

    return ret;
}
EXPORT_SYMBOL(prepare_to_wait_event);
wake_up_interruptible
Figure 6: wake_up_interruptible flow
include/linux/wait.h
#define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
kernel/sched/wait.c
void __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
               int nr_exclusive, void *key)
{
    __wake_up_common_lock(wq_head, mode, nr_exclusive, 0, key);
}
EXPORT_SYMBOL(__wake_up);

static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
                            int nr_exclusive, int wake_flags, void *key,
                            wait_queue_entry_t *bookmark)
{
    wait_queue_entry_t *curr, *next;
    int cnt = 0;

    if (bookmark && (bookmark->flags & WQ_FLAG_BOOKMARK)) {
        curr = list_next_entry(bookmark, entry);

        list_del(&bookmark->entry);
        bookmark->flags = 0;
    } else
        curr = list_first_entry(&wq_head->head, wait_queue_entry_t, entry);

    if (&curr->entry == &wq_head->head)
        return nr_exclusive;

    list_for_each_entry_safe_from(curr, next, &wq_head->head, entry) {
        unsigned flags = curr->flags;
        int ret;

        if (flags & WQ_FLAG_BOOKMARK)
            continue;

        ret = curr->func(curr, mode, wake_flags, key);
        if (ret < 0)
            break;
        if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;

        if (bookmark && (++cnt > WAITQUEUE_WALK_BREAK_CNT) &&
                (&next->entry != &wq_head->head)) {
            bookmark->flags = WQ_FLAG_BOOKMARK;
            list_add_tail(&bookmark->entry, &next->entry);
            break;
        }
    }

    return nr_exclusive;
}

int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key)
{
    int ret = default_wake_function(wq_entry, mode, sync, key);

    if (ret)
        list_del_init(&wq_entry->entry);

    return ret;
}
EXPORT_SYMBOL(autoremove_wake_function);
kernel/sched/core.c
int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
                          void *key)
{
    return try_to_wake_up(curr->private, mode, wake_flags);
}
EXPORT_SYMBOL(default_wake_function);
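The main wake-up logic is in __wake_up_common, which __wake_up reaches through __wake_up_common_lock.
It walks the struct wait_queue_entry items linked into the list of struct wait_queue_head and runs each entry's callback, which here is autoremove_wake_function.
autoremove_wake_function calls default_wake_function, which in turn calls try_to_wake_up on the task_struct recorded in the struct wait_queue_entry. If the thread is woken successfully, its entry is removed from the list at the same time. If the thread is not woken, the entry stays in the list, so the next wake-up call tries again.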
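The traversal and the role of nr_exclusive can be illustrated with a simplified user-space model. This sketch makes several assumptions that differ from the kernel: the list is singly linked, locking and bookmarks are left out, a runnable flag replaces the task state, and struct waiter, autoremove_wake_model, and wake_up_model are hypothetical names.

```c
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

struct waiter {
    unsigned int flags;
    int runnable;           /* 0 = asleep, 1 = woken (stands in for task state) */
    int queued;             /* 1 while the entry is linked on the queue */
    struct waiter *next;    /* singly linked for brevity */
};

/* Stand-in for autoremove_wake_function(): wake the task, dequeue on success. */
static int autoremove_wake_model(struct waiter *w)
{
    if (w->runnable)
        return 0;           /* nothing to wake, so the entry stays queued */
    w->runnable = 1;
    w->queued = 0;
    return 1;
}

/*
 * Mirrors the loop in __wake_up_common(): call every entry's wake function
 * and stop only after nr_exclusive exclusive waiters have been woken.
 * Returns the number of waiters woken.
 */
int wake_up_model(struct waiter **head, int nr_exclusive)
{
    struct waiter **link = head;
    int woken = 0;

    while (*link) {
        struct waiter *curr = *link;
        unsigned int flags = curr->flags;
        int ret = autoremove_wake_model(curr);

        if (ret) {
            *link = curr->next;     /* unlink the woken entry */
            woken++;
        } else {
            link = &curr->next;     /* keep the entry and move on */
        }
        if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
    return woken;
}
```

With two non-exclusive waiters queued, wake_up_model(&head, 1) wakes both, because the limit of 1 applies only to entries that carry WQ_FLAG_EXCLUSIVE.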
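Questions and answers
1. What happens if condition is always non-zero in wait_event_interruptible(test, condition)?
Answer: The calling thread returns immediately and never sleeps. If the thread makes the call in an endless loop, CPU usage will be high.
2. Two threads are blocked in wait_event_interruptible(test, condition). An interrupt sets condition to 1 and calls wake_up_interruptible(&test). Are both threads woken?
Answer: Yes. Both waiters are non-exclusive, because wait_event_interruptible passes exclusive = 0, so __wake_up_common walks the whole list and wakes each of them. The nr_exclusive value of 1 only limits how many WQ_FLAG_EXCLUSIVE waiters are woken.
3. Two threads are blocked in wait_event_interruptible(test, condition). An interrupt calls wake_up_interruptible(&test) but does not set condition to 1. What happens?
Answer: Both threads are woken briefly, re-check condition inside the ___wait_event loop, find it still 0, queue themselves again, and go back to sleep. From the caller's point of view, wait_event_interruptible does not return.
4. The design is: thread 1 loops on wait_event_interruptible(test, condition); when the interrupt arrives, the handler sets condition to 1 and calls wake_up_interruptible(&test) to wake thread 1, which then continues its loop. If you suspect that the interrupt arrived but thread 1 was not woken, how do you investigate?
Answer: (1) Confirm that the interrupt really occurred.
(2) Confirm that thread 1 really was not woken.
(3) Check CPU load: the CPU may be so busy that thread 1 was woken but has not run yet.
(4) Add diagnostic code: when the interrupt arrives, check whether the wait_queue_head test holds any waiting struct wait_queue_entry, whether the state of the waiting task_struct is TASK_INTERRUPTIBLE, and whether its pid is the expected one. If not, the problem lies in the thread that calls wait_event_interruptible, and that thread needs further investigation.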
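Step (4) could be implemented with a small debug helper along the following lines. This is a sketch under several assumptions, not code from the kernel: dump_waiters is a hypothetical name; it assumes every queued entry was set up by init_wait_entry, so that private points to a task_struct; and it assumes task_state_to_char() from include/linux/sched.h is available in the target kernel version, where 'S' corresponds to TASK_INTERRUPTIBLE.

```c
#include <linux/list.h>
#include <linux/printk.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

/* Debug helper: print every task currently queued on a wait queue. */
static void dump_waiters(struct wait_queue_head *wq_head)
{
    struct wait_queue_entry *curr;
    unsigned long flags;

    /* Take the same lock that the wait and wake paths take. */
    spin_lock_irqsave(&wq_head->lock, flags);
    if (list_empty(&wq_head->head))
        pr_info("waitqueue %p: no waiters\n", wq_head);
    list_for_each_entry(curr, &wq_head->head, entry) {
        /* Assumption: private was set to current by init_wait_entry(). */
        struct task_struct *tsk = curr->private;

        if (!tsk)
            continue;   /* skip entries that carry no task */
        pr_info("waiter pid=%d comm=%s state=%c flags=%#x\n",
                tsk->pid, tsk->comm, task_state_to_char(tsk), curr->flags);
    }
    spin_unlock_irqrestore(&wq_head->lock, flags);
}
```

Calling dump_waiters(&test) just before wake_up_interruptible(&test) shows whether thread 1 is queued at all and whether its pid and state match expectations. Entries queued by other mechanisms, such as poll, store something other than a task_struct in private, so the helper is only suitable for queues whose waiters are known.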