The flashcache_md_write function is called whenever a cache block changes state from dirty to clean or from clean to dirty. This can be seen from this check in the code:
VERIFY(job->action == WRITEDISK || job->action == WRITECACHE ||
       job->action == WRITEDISK_SYNC);
It can also be seen in flashcache_md_write_kickoff, the function that actually performs the metadata write: a WRITECACHE job marks the block's metadata state dirty, while a WRITEDISK* job marks it clean again:
if (job->action == WRITECACHE) {
	/* DIRTY the cache block */
	md_block[INDEX_TO_MD_BLOCK_OFFSET(dmc, job->index)].cache_state =
		(VALID | DIRTY);
} else { /* job->action == WRITEDISK* */
	/* un-DIRTY the cache block */
	md_block[INDEX_TO_MD_BLOCK_OFFSET(dmc, job->index)].cache_state = VALID;
}
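A quick aside on the two index macros: INDEX_TO_MD_BLOCK_OFFSET above (and INDEX_TO_MD_BLOCK, used in the next snippet) locate the on-disk metadata slot that describes a given cache block. Here is a minimal user-space sketch of what they presumably compute, assuming the mapping is plain division/modulo over a fixed number of metadata slots per block; the slot count of 512 is an illustrative value, not taken from the source:

#include <stdio.h>

/* Illustrative value only: the real slot count depends on the metadata
 * block size and sizeof(struct flash_cacheblock). */
#define SLOTS_PER_MD_BLOCK 512

/* Assumed shape of the flashcache macros: a cache index maps to
 * (metadata block, slot within that block) by division and modulo. */
#define INDEX_TO_MD_BLOCK(index)        ((index) / SLOTS_PER_MD_BLOCK)
#define INDEX_TO_MD_BLOCK_OFFSET(index) ((index) % SLOTS_PER_MD_BLOCK)

int main(void)
{
	int index = 1500;

	printf("cache index %d -> md block %d, slot %d\n",
	       index, INDEX_TO_MD_BLOCK(index), INDEX_TO_MD_BLOCK_OFFSET(index));
	/* prints: cache index 1500 -> md block 2, slot 476 */
	return 0;
}

One consequence of this mapping matters below: every cache index that falls into the same metadata block shares one md_block_head, and therefore one wait queue.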
If a metadata write is already in progress for the metadata block covering this cache block, the new job is appended to the tail of that block's wait queue. This raises a question: when do the requests on the wait queue finally get processed?
md_block_head = &dmc->md_blocks_buf[INDEX_TO_MD_BLOCK(dmc, job->index)];
spin_lock_irqsave(&md_block_head->md_block_lock, flags);
/* If a write is in progress for this metadata sector, queue this update up */
if (md_block_head->nr_in_prog != 0) {
	struct kcached_job **nodepp;

	/* A MD update is already in progress, queue this one up for later */
	nodepp = &md_block_head->queued_updates;
	while (*nodepp != NULL)
		nodepp = &((*nodepp)->next);
	job->next = NULL;
	*nodepp = job;
	spin_unlock_irqrestore(&md_block_head->md_block_lock, flags);
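The while loop above is the classic pointer-to-pointer tail-append idiom: by always holding the address of the next pointer to patch, the empty-queue case needs no special handling (when the list is empty, *nodepp is the head pointer itself). A standalone user-space sketch of the same idiom, with a hypothetical node type:

#include <stdio.h>

struct node {
	int val;
	struct node *next;
};

/* Append n at the tail; works identically for an empty list. */
static void append(struct node **head, struct node *n)
{
	struct node **nodepp = head;

	while (*nodepp != NULL)
		nodepp = &((*nodepp)->next);
	n->next = NULL;
	*nodepp = n;
}

int main(void)
{
	struct node *head = NULL;
	struct node a = { 1, NULL }, b = { 2, NULL };
	struct node *p;

	append(&head, &a);	/* empty list: head itself is patched */
	append(&head, &b);	/* non-empty list: a.next is patched */
	for (p = head; p != NULL; p = p->next)
		printf("%d\n", p->val);
	return 0;
}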
The answer lies in what flashcache_md_write_done(struct kcached_job *job) does after the current metadata write has completed:
if (md_block_head->queued_updates != NULL) {
	/* peel off the first job from the pending queue and kick that off */
	job = md_block_head->queued_updates;
	md_block_head->queued_updates = job->next;
	spin_unlock(&md_block_head->md_block_lock);
	job->next = NULL;
	spin_unlock_irq(&cache_set->set_spin_lock);
	VERIFY(job->action == WRITEDISK || job->action == WRITECACHE ||
	       job->action == WRITEDISK_SYNC);
	flashcache_md_write_kickoff(job);
As you can see, at the end of this function, after the just-finished metadata write has been processed, the metadata block's wait queue is checked; if it is non-empty, the first queued job is peeled off and another metadata write is kicked off immediately. This keeps writes to the cache completing as quickly as possible (a write to the cache only counts as complete once its metadata has been written to the cache device).
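Putting the submit path and this completion path together, each metadata block behaves like a tiny "one write in flight, FIFO of waiters" lock. Here is a minimal user-space model of that protocol, with hypothetical names, the spinlock replaced by a pthread mutex, and the worker-thread handoff (shown in the next snippet) collapsed into a direct call:

#include <stdio.h>
#include <pthread.h>

struct job {
	struct job *next;
};

struct md_block_head {
	pthread_mutex_t lock;	/* stands in for md_block_lock */
	int nr_in_prog;		/* is a metadata write in flight? */
	struct job *queued;	/* stands in for queued_updates */
};

/* Stand-in for flashcache_md_write_kickoff: would issue the metadata IO. */
static void kickoff(struct job *job)
{
	printf("kickoff %p\n", (void *)job);
}

/* Submit path: start the write now, or queue behind the one in flight. */
static void md_write(struct md_block_head *h, struct job *job)
{
	pthread_mutex_lock(&h->lock);
	if (h->nr_in_prog != 0) {
		struct job **pp = &h->queued;

		while (*pp != NULL)
			pp = &(*pp)->next;
		job->next = NULL;
		*pp = job;
		pthread_mutex_unlock(&h->lock);
		return;
	}
	h->nr_in_prog = 1;
	pthread_mutex_unlock(&h->lock);
	kickoff(job);
}

/* Completion path: hand the in-flight slot to the first waiter, if any. */
static void md_write_done(struct md_block_head *h)
{
	struct job *next_job = NULL;

	pthread_mutex_lock(&h->lock);
	if (h->queued != NULL) {
		next_job = h->queued;
		h->queued = next_job->next;
		next_job->next = NULL;	/* nr_in_prog stays 1 */
	} else {
		h->nr_in_prog = 0;	/* queue drained, block is idle */
	}
	pthread_mutex_unlock(&h->lock);
	if (next_job != NULL)
		kickoff(next_job);
}

int main(void)
{
	struct md_block_head h = { PTHREAD_MUTEX_INITIALIZER, 0, NULL };
	struct job a = { NULL }, b = { NULL };

	md_write(&h, &a);	/* idle: a is kicked off immediately */
	md_write(&h, &b);	/* a in flight: b is queued */
	md_write_done(&h);	/* a completes: b is kicked off */
	md_write_done(&h);	/* b completes: block goes idle */
	return 0;
}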
So what happens if the wait queue was empty to begin with, i.e. no metadata write was in flight?
} else {
	md_block_head->nr_in_prog = 1;
	spin_unlock_irqrestore(&md_block_head->md_block_lock, flags);
	/*
	 * Always push to a worker thread. If the driver has
	 * a completion thread, we could end up deadlocking even
	 * if the context would be safe enough to write from.
	 * This could be executed from the context of an IO
	 * completion thread. Kicking off the write from that
	 * context could result in the IO completion thread
	 * blocking (eg on memory allocation). That can easily
	 * deadlock.
	 */
	push_md_io(job);
	schedule_work(&_kcached_wq);
}
Now look at the else clause. From the comment we learn that this code typically runs when metadata is written after a data IO completes, i.e. possibly in the context of an IO completion thread. To keep that thread from deadlocking (writing metadata may need to allocate memory, and if the allocation cannot be satisfied the thread would block right there), the metadata write is handed off to a work queue, and the completion context can finish the data IO immediately. This is what avoids the deadlock. I tested work queues with the module below; it shows that the work handler and the thread that calls schedule_work are two different threads, which is exactly why the deadlock is avoided:
#include <linux/module.h>
#include <linux/init.h>
#include <linux/workqueue.h>

#define N 1000

static struct work_struct work;

/* Deferred work: runs on a kernel worker thread, not in workqueue_init(). */
static void work_handler(struct work_struct *data)
{
	int i;

	for (i = 0; i < N; i++)
		printk("handle\n");
}

static int __init workqueue_init(void)
{
	int i;

	INIT_WORK(&work, work_handler);
	schedule_work(&work);	/* queue the work, then keep running */
	for (i = 0; i < N; i++)
		printk("init _just a demo for work queue.\n");
	return 0;
}

static void __exit workqueue_exit(void)
{
	flush_work(&work);	/* don't unload while the work is still pending */
	printk("exit work queue demo.\n");
}

MODULE_LICENSE("GPL");
module_init(workqueue_init);
module_exit(workqueue_exit);
The output prints all N "init _just a demo for work queue." lines first, and only then the N "handle" lines, with no interleaving. This shows that the work queue defers the task: schedule_work merely queues the work, and the handler runs later, on a separate worker thread.
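If you want direct evidence that the handler runs on a different thread, rather than inferring it from output ordering, you can print the executing task in both places. Here is a minimal variant of the module above (the module and function names are mine; current->comm and current->pid are the standard kernel task fields):

#include <linux/module.h>
#include <linux/init.h>
#include <linux/workqueue.h>
#include <linux/sched.h>

static struct work_struct work;

static void work_handler(struct work_struct *data)
{
	/* Expected to run on a kworker thread, not the insmod thread. */
	printk("handler: comm=%s pid=%d\n", current->comm, current->pid);
}

static int __init wq_who_init(void)
{
	INIT_WORK(&work, work_handler);
	printk("init: comm=%s pid=%d\n", current->comm, current->pid);
	schedule_work(&work);
	return 0;
}

static void __exit wq_who_exit(void)
{
	/* Don't let the handler outlive the module text. */
	flush_work(&work);
}

MODULE_LICENSE("GPL");
module_init(wq_who_init);
module_exit(wq_who_exit);

The two printk lines should show different comm/pid pairs (something like insmod vs kworker/...), which is the direct form of the two-threads observation above.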