Hello, I'd like to ask a question.
There is a piece of code in __down() that doesn't look entirely safe to me. First, here is the source of __down():
=============================================
void __down(struct semaphore * sem)
{
    struct task_struct *tsk = current;
    DECLARE_WAITQUEUE(wait, tsk); // define a wait-queue entry whose waiter is the current process
    tsk->state = TASK_UNINTERRUPTIBLE;
    add_wait_queue_exclusive(&sem->wait, &wait); // add the current process to this semaphore's wait queue
    spin_lock_irq(&semaphore_lock); // take the "big lock"
    sem->sleepers++;
    for (;;) {
        int sleepers = sem->sleepers;
        /*
         * Add "everybody else" into it. They aren't
         * playing, because we own the spinlock.
         */
        if (!atomic_add_negative(sleepers - 1, &sem->count)) { // one last try before going to sleep
            sem->sleepers = 0;
            break;
        }
        sem->sleepers = 1; /* us - see -1 above */
        spin_unlock_irq(&semaphore_lock);
        schedule(); // sleep
        tsk->state = TASK_UNINTERRUPTIBLE;
        spin_lock_irq(&semaphore_lock);
    }
    spin_unlock_irq(&semaphore_lock);
    remove_wait_queue(&sem->wait, &wait); // once we hold the semaphore, leave its wait queue
    tsk->state = TASK_RUNNING;
    wake_up(&sem->wait);
}
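I only started reading the Linux source a couple of days ago, so let me first share what I have understood so far. In the 2.4 kernel, struct semaphore has no lock field; all of semaphore.c shares a single file-scope big lock, the semaphore_lock defined in semaphore.c. Before touching a semaphore structure, the code first takes this "global lock".
But up() never touches this lock at any point, which makes me wonder whether the following bug could occur:
To simplify the analysis, assume that no process other than the current one is competing for this semaphore.
As I said, up() ignores the "big lock" throughout, so at any moment during __down() an up() may be running in parallel on another CPU. up() ultimately ends up calling wake_up_process().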
=====================================
inline void wake_up_process(struct task_struct * p)
{
    unsigned long flags;
    /*
     * We want the common case fall through straight, thus the goto.
     */
    spin_lock_irqsave(&runqueue_lock, flags);
    p->state = TASK_RUNNING;
    if (task_on_runqueue(p))
        goto out;
    add_to_runqueue(p);
    reschedule_idle(p);
out:
    spin_unlock_irqrestore(&runqueue_lock, flags);
}
===============================================
So, within this stretch of __down(),
    add_wait_queue_exclusive(&sem->wait, &wait);
    ...
    ...
    schedule();
that is, after current has joined the semaphore's wait queue and before it calls schedule(), we can be hit by wake_up_process() at any time.
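If wake_up_process() hits us before the line if (!atomic_add_negative(sleepers - 1, &sem->count)), it doesn't matter, because we will pass this if and get the semaphore anyway (since someone has called up(), there must be a ticket available).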
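To convince myself of the counter arithmetic, I wrote a small userspace model of it. This is only a sketch: plain ints stand in for the kernel's atomic_t, nothing really locks or sleeps, and every model_* name and tries_until_acquired() is made up for illustration, not taken from the kernel.

```c
#include <stdbool.h>

/* Userspace model only: plain ints replace atomic_t, no locking, no sleeping. */
struct model_sem {
    int count;     /* stands in for sem->count */
    int sleepers;  /* stands in for sem->sleepers */
};

/* Models atomic_add_negative(): add i to *v, report whether the result is negative. */
static bool model_add_negative(int i, int *v)
{
    *v += i;
    return *v < 0;
}

/* Models the fast path of down(): decrement, true means __down() must be entered. */
static bool model_down_fast(struct model_sem *s)
{
    s->count--;
    return s->count < 0;
}

/* Models up(): increment, true means __up() would call wake_up(). */
static bool model_up(struct model_sem *s)
{
    s->count++;
    return s->count <= 0;
}

/* Models one pass through the body of the for (;;) loop in __down(). */
static bool model_down_try(struct model_sem *s)
{
    int sleepers = s->sleepers;

    if (!model_add_negative(sleepers - 1, &s->count)) {
        s->sleepers = 0;
        return true;            /* semaphore acquired */
    }
    s->sleepers = 1;
    return false;               /* would go on to schedule() */
}

/*
 * One holder, one waiter, one up(). Returns how many passes through the
 * loop the waiter needs, or -1 if it still has not acquired after two.
 */
static int tries_until_acquired(bool up_before_first_try)
{
    struct model_sem s = { 1, 0 };

    model_down_fast(&s);        /* holder: count 1 -> 0 */
    model_down_fast(&s);        /* waiter: count 0 -> -1, enters __down() */
    s.sleepers++;               /* the sem->sleepers++ before the loop */

    if (up_before_first_try)
        model_up(&s);           /* up() lands before the if */
    if (model_down_try(&s))
        return 1;

    if (!up_before_first_try)
        model_up(&s);           /* up() lands after the if has failed */
    return model_down_try(&s) ? 2 : -1;
}
```

Calling tries_until_acquired(true) models the case described above, where the up() arrives before the if; tries_until_acquired(false) models the if failing first and then being retried after the up().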
But if this if fails, we are about to go to sleep. Suppose that in the gap before we call schedule() to sleep, i.e. while executing these lines:
    sem->sleepers = 1; /* us - see -1 above */
    spin_unlock_irq(&semaphore_lock);
    schedule();
we are hit by wake_up_process().
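What happens then? Judging from the source of wake_up_process(), it can't do much (because we are already on the run queue), but it does set our state to TASK_RUNNING.
In other words, we then call schedule() while in the TASK_RUNNING state.
Worse, we have in effect missed this up(), and nobody will ever wake us again.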
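To test this worry I tried modelling the window in userspace. The sketch below rests on one assumption that is not in the code I quoted: that schedule() takes the caller off the run queue only when its state is something other than TASK_RUNNING. That is my reading of the 2.4 schedule(), and I may be wrong about it. All model_* names and still_runnable_after_schedule() are invented for this illustration.

```c
#include <stdbool.h>

enum model_state { MODEL_RUNNING, MODEL_UNINTERRUPTIBLE };

struct model_task {
    enum model_state state;
    bool on_runqueue;
};

/* Mirrors the quoted wake_up_process(): set RUNNING, enqueue only if needed. */
static void model_wake_up_process(struct model_task *p)
{
    p->state = MODEL_RUNNING;
    if (p->on_runqueue)
        return;
    p->on_runqueue = true;
}

/*
 * ASSUMPTION (not from the quoted source): schedule() dequeues the caller
 * only when its state is not RUNNING; a RUNNING caller stays runnable.
 */
static void model_schedule(struct model_task *p)
{
    if (p->state != MODEL_RUNNING)
        p->on_runqueue = false;
}

/*
 * The task has set TASK_UNINTERRUPTIBLE and is about to call schedule().
 * Optionally a wake-up lands in that window. Returns whether the task is
 * still on the run queue after schedule().
 */
static bool still_runnable_after_schedule(bool wakeup_in_window)
{
    struct model_task t = { MODEL_UNINTERRUPTIBLE, true };

    if (wakeup_in_window)
        model_wake_up_process(&t);
    model_schedule(&t);
    return t.on_runqueue;
}
```

If the assumption holds, still_runnable_after_schedule(true) is true: the task stays runnable, so schedule() would eventually return and the for (;;) loop would retry the if instead of sleeping forever. Without the wake-up, the task is dequeued as expected.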
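That was long-winded. Put simply: why does up() ignore the semaphore_lock? It looks as though that must cause a bug.
I assume I have gone wrong somewhere; Linux surely wouldn't have a bug like this.
I'd be grateful for your advice.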
------------------------------------------------------------------------------
*
For convenience, here is all the relevant source:
== The entry functions down() and up() come from include/asm-i386/semaphore.h
static inline void down(struct semaphore * sem)
{
#if WAITQUEUE_DEBUG
    CHECK_MAGIC(sem->__magic);
#endif
    __asm__ __volatile__(
        "# atomic down operation\n\t"
        LOCK "decl %0\n\t" /* --sem->count */
        "js 2f\n"
        "1:\n"
        ".section .text.lock,\"ax\"\n"
        "2:\tcall __down_failed\n\t"
        "jmp 1b\n"
        ".previous"
        :"=m" (sem->count)
        :"c" (sem)
        :"memory");
}
static inline void up(struct semaphore * sem)
{
#if WAITQUEUE_DEBUG
    CHECK_MAGIC(sem->__magic);
#endif
    __asm__ __volatile__(
        "# atomic up operation\n\t"
        LOCK "incl %0\n\t" /* ++sem->count */
        "jle 2f\n"
        "1:\n"
        ".section .text.lock,\"ax\"\n"
        "2:\tcall __up_wakeup\n\t"
        "jmp 1b\n"
        ".previous"
        :"=m" (sem->count)
        :"c" (sem)
        :"memory");
}
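For anyone who finds the inline assembly hard to read, here is how I understand the two fast paths, rewritten as a userspace sketch with C11 atomics. The function names are made up, and this only models the decrement/increment and the sign test; it does not call anything like __down_failed or __up_wakeup.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Models "LOCK decl; js 2f": true means the slow path (__down_failed) is taken. */
static bool fast_down_needs_slow_path(atomic_int *count)
{
    /* atomic_fetch_sub returns the old value, so subtract 1 to get the new one. */
    return atomic_fetch_sub(count, 1) - 1 < 0;
}

/* Models "LOCK incl; jle 2f": true means the wake-up path (__up_wakeup) is taken. */
static bool fast_up_needs_wakeup(atomic_int *count)
{
    /* atomic_fetch_add returns the old value, so add 1 to get the new one. */
    return atomic_fetch_add(count, 1) + 1 <= 0;
}

/*
 * Runs down, down, up, up on a semaphore that starts at 1 and counts how
 * many of the four calls leave the fast path.
 */
static int count_slow_paths(void)
{
    atomic_int count = 1;
    int slow = 0;

    slow += fast_down_needs_slow_path(&count); /* 1 -> 0: fast path */
    slow += fast_down_needs_slow_path(&count); /* 0 -> -1: slow path */
    slow += fast_up_needs_wakeup(&count);      /* -1 -> 0: wake-up path */
    slow += fast_up_needs_wakeup(&count);      /* 0 -> 1: fast path */
    return slow;
}
```

In this model only the contended down() and the up() that follows it leave the fast path, which matches the idea that the uncontended case falls straight through.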
=== What happens when the fall-through fast path fails is all in arch/i386/kernel/semaphore.c
asm(
    ".align 4\n"
    ".globl __down_failed\n"
    "__down_failed:\n\t"
    "pushl %eax\n\t"
    "pushl %edx\n\t"
    "pushl %ecx\n\t"
    "call __down\n\t"
    "popl %ecx\n\t"
    "popl %edx\n\t"
    "popl %eax\n\t"
    "ret"
);
void __down(struct semaphore * sem)
{
    struct task_struct *tsk = current;
    DECLARE_WAITQUEUE(wait, tsk);
    tsk->state = TASK_UNINTERRUPTIBLE;
    add_wait_queue_exclusive(&sem->wait, &wait);
    spin_lock_irq(&semaphore_lock);
    sem->sleepers++;
    for (;;) {
        int sleepers = sem->sleepers;
        /*
         * Add "everybody else" into it. They aren't
         * playing, because we own the spinlock.
         */
        if (!atomic_add_negative(sleepers - 1, &sem->count)) {
            sem->sleepers = 0;
            break;
        }
        sem->sleepers = 1; /* us - see -1 above */
        spin_unlock_irq(&semaphore_lock);
        schedule();
        tsk->state = TASK_UNINTERRUPTIBLE;
        spin_lock_irq(&semaphore_lock);
    }
    spin_unlock_irq(&semaphore_lock);
    remove_wait_queue(&sem->wait, &wait);
    tsk->state = TASK_RUNNING;
    wake_up(&sem->wait);
}
void __up(struct semaphore *sem)
{
    wake_up(&sem->wait);
}
asm(
    ".align 4\n"
    ".globl __up_wakeup\n"
    "__up_wakeup:\n\t"
    "pushl %eax\n\t"
    "pushl %edx\n\t"
    "pushl %ecx\n\t"
    "call __up\n\t"
    "popl %ecx\n\t"
    "popl %edx\n\t"
    "popl %eax\n\t"
    "ret"
);
*
This material is around page 403.