Linux Kernel Processes Explained, Part 2: bdi-default

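bdi is short for backing device info. As the full name suggests, it refers to the descriptive information about a backing storage device. In the kernel it is represented by the structure backing_dev_info: http://lxr.linux.no/#linux+v2.6.38.8/include/linux/backing-dev.h#L62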

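Put simply, a backing device is a device that can store data, where the stored data survives when the computer's power is switched off. By that definition, floppy drives, optical drives, USB storage and hard disks are all backing devices (from here on simply called bdi), while memory clearly is not. For more background see: http://www.gordonschools.aberdeenshire.sch.uk/Departments/Computing/StandardGrade/SystemsWeb/6BackingStorage.htm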

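Compared with memory, bdi devices (the most common being the hard disk) are very slow to read and write. To improve overall system performance, Linux therefore caches what is read from and written to bdi devices. That data is kept temporarily in memory, so that not every access has to go to the device.

The cached dirty data must still be synchronized back to the bdi device at suitable moments, for example every 5 seconds or when dirty data reaches a certain ratio. Otherwise data that lingers in memory for a long time can be lost if the machine suddenly crashes or reboots.

The kernel process that performed this periodic synchronization used to be called pdflush. Later kernels (somewhere in the 2.6.2x/3x range; I did not check exactly which minor release made the change) reworked it. See for example http://kernelnewbies.org/Linux_2_6_35#head-57d43d498509746df08f48b1f040d20d88d2b984 and http://lwn.net/Articles/396757/. The rework produced several kernel processes, bdi-default and flush-x:y, and these are the subject of this two-part series.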
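To make the two triggers concrete, here is a minimal C sketch of a deliberately simplified policy: writeback is due once a periodic interval has elapsed, or once dirty pages exceed a percentage of total pages. The struct and function names are hypothetical. The real kernel logic (dirty_writeback_interval, the dirty background ratio, per-inode expiry) is considerably more involved. The interval is in centiseconds to match the kernel's dirty_writeback_interval unit, and an interval of 0 disables the timer trigger, as it does in the kernel.

```c
#include <stdbool.h>

/* Hypothetical, simplified writeback policy (not a kernel structure). */
struct writeback_policy {
    unsigned int interval_cs;      /* periodic interval in centiseconds; 0 = disabled */
    unsigned int dirty_ratio_pct;  /* write back once dirty pages exceed this percent */
};

/* Returns true if cached dirty data should be written back now. */
static bool should_writeback(const struct writeback_policy *p,
                             unsigned int cs_since_last_flush,
                             unsigned long dirty_pages,
                             unsigned long total_pages)
{
    /* Timer trigger: e.g. 500 cs = 5 s since the last flush */
    if (p->interval_cs != 0 && cs_since_last_flush >= p->interval_cs)
        return true;

    /* Ratio trigger: dirty/total > ratio%, computed without floating point */
    if (total_pages != 0 &&
        dirty_pages * 100 > total_pages * p->dirty_ratio_pct)
        return true;

    return false;
}
```

With a 500 cs interval and a 10% ratio, 499 cs after the last flush with 100 of 1000 pages dirty nothing happens yet. One more dirty page, or one more centisecond, triggers writeback.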

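I won't say more about the old pdflush. Here we discuss only bdi-default and flush-x:y. The relationship between these processes (there are in fact multiple flush-x:y processes) and the way they run resemble the standard parent/child daemon model used by lighttpd. Since many readers are not familiar with lighttpd's process model, the details are explained below.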

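A Linux system typically has many bdi devices attached. When a bdi device is registered (function: bdi_register(…)), it is added to a linked list headed by the global variable bdi_list.

One bdi device is special: the default bdi device (default_backing_dev_info). Besides being added to bdi_list, its registration also creates a new kernel process (a kernel thread), bdi-default, which is the subject of this article. The code follows; the key calls kthread_run and list_add_tail_rcu should catch your eye right away.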

struct backing_dev_info default_backing_dev_info = {
    .name         = "default",
    .ra_pages     = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
    .state        = 0,
    .capabilities = BDI_CAP_MAP_COPY,
    .unplug_io_fn = default_unplug_io_fn,
};
EXPORT_SYMBOL_GPL(default_backing_dev_info);

static inline bool bdi_cap_flush_forker(struct backing_dev_info *bdi)
{
    return bdi == &default_backing_dev_info;
}

int bdi_register(struct backing_dev_info *bdi, struct device *parent,
        const char *fmt, ...)
{
    va_list args;
    struct device *dev;

    if (bdi->dev)   /* The driver needs to use separate queues per device */
        return 0;

    va_start(args, fmt);
    dev = device_create_vargs(bdi_class, parent, MKDEV(0, 0), bdi, fmt, args);
    va_end(args);
    if (IS_ERR(dev))
        return PTR_ERR(dev);

    bdi->dev = dev;

    /*
     * Just start the forker thread for our default backing_dev_info,
     * and add other bdi's to the list. They will get a thread created
     * on-demand when they need it.
     */
    if (bdi_cap_flush_forker(bdi)) {
        struct bdi_writeback *wb = &bdi->wb;

        wb->task = kthread_run(bdi_forker_thread, wb, "bdi-%s",
                        dev_name(dev));
        if (IS_ERR(wb->task))
            return PTR_ERR(wb->task);
    }

    bdi_debug_register(bdi, dev_name(dev));
    set_bit(BDI_registered, &bdi->state);

    spin_lock_bh(&bdi_lock);
    list_add_tail_rcu(&bdi->bdi_list, &bdi_list);
    spin_unlock_bh(&bdi_lock);

    trace_writeback_bdi_register(bdi);
    return 0;
}
EXPORT_SYMBOL(bdi_register);
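The essential bookkeeping in bdi_register() can be modeled in userspace: every device is appended to the tail of one global list, and only the default bdi gets a thread eagerly. This is a minimal sketch with stated simplifications. A singly linked list replaces the kernel's list_head, there is no locking or RCU, and the names model_bdi, model_bdi_register and model_bdi_count are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct backing_dev_info */
struct model_bdi {
    const char *name;
    bool is_default;          /* plays the role of bdi_cap_flush_forker() */
    bool has_thread;          /* plays the role of bdi->wb.task != NULL */
    struct model_bdi *next;
};

static struct model_bdi *bdi_head;                 /* models the global bdi_list */
static struct model_bdi **bdi_tail = &bdi_head;    /* where the next entry goes */

static void model_bdi_register(struct model_bdi *bdi)
{
    /* Only the default bdi starts its (forker) thread at registration;
     * every other bdi gets a flush thread later, on demand. */
    bdi->has_thread = bdi->is_default;

    /* Append at the tail, like list_add_tail_rcu(): insertion order is kept */
    bdi->next = NULL;
    *bdi_tail = bdi;
    bdi_tail = &bdi->next;
}

static size_t model_bdi_count(void)
{
    size_t n = 0;
    for (const struct model_bdi *b = bdi_head; b != NULL; b = b->next)
        n++;
    return n;
}
```

Suppose you register the default bdi and then a disk bdi (block-device bdis are named after major:minor, e.g. "8:0"). Both end up on the list in registration order, but only the default one has a thread. The disk's flush-8:0 thread is created later by the forker, when there is dirty data.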

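Next, let's follow bdi_forker_thread, the function passed to kthread_run above and therefore the body of the bdi-default kernel process. The process name comes from the "bdi-%s" format applied to the device name of the default bdi, which is registered as "default".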

static int bdi_forker_thread(void *ptr)
{
    struct bdi_writeback *me = ptr;

    current->flags |= PF_SWAPWRITE;
    set_freezable();

    /*
     * Our parent may run at a different priority, just set us to normal
     */
    set_user_nice(current, 0);

    for (;;) {
        struct task_struct *task = NULL;
        struct backing_dev_info *bdi;
        enum {
            NO_ACTION,   /* Nothing to do */
            FORK_THREAD, /* Fork bdi thread */
            KILL_THREAD, /* Kill inactive bdi thread */
        } action = NO_ACTION;

        /*
         * Temporary measure, we want to make sure we don't see
         * dirty data on the default backing_dev_info
         */
        if (wb_has_dirty_io(me) || !list_empty(&me->bdi->work_list)) {
            del_timer(&me->wakeup_timer);
            wb_do_writeback(me, 0);
        }

        spin_lock_bh(&bdi_lock);
        set_current_state(TASK_INTERRUPTIBLE);

        list_for_each_entry(bdi, &bdi_list, bdi_list) {
            bool have_dirty_io;

            if (!bdi_cap_writeback_dirty(bdi) ||
                 bdi_cap_flush_forker(bdi))
                continue;

            WARN(!test_bit(BDI_registered, &bdi->state),
                 "bdi %p/%s is not registered!\n", bdi, bdi->name);

            have_dirty_io = !list_empty(&bdi->work_list) ||
                    wb_has_dirty_io(&bdi->wb);

            /*
             * If the bdi has work to do, but the thread does not
             * exist - create it.
             */
            if (!bdi->wb.task && have_dirty_io) {
                /*
                 * Set the pending bit - if someone will try to
                 * unregister this bdi - it'll wait on this bit.
                 */
                set_bit(BDI_pending, &bdi->state);
                action = FORK_THREAD;
                break;
            }

            spin_lock(&bdi->wb_lock);

            /*
             * If there is no work to do and the bdi thread was
             * inactive long enough - kill it. The wb_lock is taken
             * to make sure no-one adds more work to this bdi and
             * wakes the bdi thread up.
             */
            if (bdi->wb.task && !have_dirty_io &&
                time_after(jiffies, bdi->wb.last_active +
                        bdi_longest_inactive())) {
                task = bdi->wb.task;
                bdi->wb.task = NULL;
                spin_unlock(&bdi->wb_lock);
                set_bit(BDI_pending, &bdi->state);
                action = KILL_THREAD;
                break;
            }
            spin_unlock(&bdi->wb_lock);
        }
        spin_unlock_bh(&bdi_lock);

        /* Keep working if default bdi still has things to do */
        if (!list_empty(&me->bdi->work_list))
            __set_current_state(TASK_RUNNING);

        switch (action) {
        case FORK_THREAD:
            __set_current_state(TASK_RUNNING);
            task = kthread_create(bdi_writeback_thread, &bdi->wb,
                          "flush-%s", dev_name(bdi->dev));
            if (IS_ERR(task)) {
                /*
                 * If thread creation fails, force writeout of
                 * the bdi from the thread.
                 */
                bdi_flush_io(bdi);
            } else {
                /*
                 * The spinlock makes sure we do not lose
                 * wake-ups when racing with 'bdi_queue_work()'.
                 * And as soon as the bdi thread is visible, we
                 * can start it.
                 */
                spin_lock_bh(&bdi->wb_lock);
                bdi->wb.task = task;
                spin_unlock_bh(&bdi->wb_lock);
                wake_up_process(task);
            }
            break;

        case KILL_THREAD:
            __set_current_state(TASK_RUNNING);
            kthread_stop(task);
            break;

        case NO_ACTION:
            if (!wb_has_dirty_io(me) || !dirty_writeback_interval)
                /*
                 * There are no dirty data. The only thing we
                 * should now care about is checking for
                 * inactive bdi threads and killing them. Thus,
                 * let's sleep for longer time, save energy and
                 * be friendly for battery-driven devices.
                 */
                schedule_timeout(bdi_longest_inactive());
            else
                schedule_timeout(msecs_to_jiffies(dirty_writeback_interval * 10));
            try_to_freeze();
            /* Back to the main loop */
            continue;
        }

        /*
         * Clear pending bit and wakeup anybody waiting to tear us down.
         */
        clear_bit(BDI_pending, &bdi->state);
        smp_mb__after_clear_bit();
        wake_up_bit(&bdi->state, BDI_pending);
    }

    return 0;
}

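The code looks long, but the logic is quite simple. Everything runs inside an infinite for loop. On each pass, bdi-default first writes back any dirty data that has landed on the default bdi itself. Then list_for_each_entry walks every bdi device on bdi_list and checks three things: whether its flush-x:y kernel process exists, what state that process is in, and whether an action is needed (create it or kill it).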
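The decision part of one pass can be isolated from the locking and thread handling. The following is a minimal userspace sketch: the struct scan_bdi and the functions after() and forker_scan() are hypothetical, with fields named after the kernel conditions they stand in for. Like the kernel loop, it stops at the first bdi that needs an action. The after() helper uses the same wraparound-safe subtraction idea as the kernel's time_after().

```c
#include <stdbool.h>
#include <stddef.h>

enum forker_action { NO_ACTION, FORK_THREAD, KILL_THREAD };

/* Hypothetical snapshot of the per-bdi state the forker loop inspects */
struct scan_bdi {
    bool can_writeback;         /* bdi_cap_writeback_dirty(bdi) */
    bool is_forker;             /* bdi_cap_flush_forker(bdi): the default bdi */
    bool has_task;              /* bdi->wb.task != NULL */
    bool has_dirty_io;          /* work_list non-empty or dirty inodes */
    unsigned long last_active;  /* bdi->wb.last_active, in jiffies */
};

/* True if time a is after time b, tolerating counter wraparound */
static bool after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* One pass over the list; returns the first action found and stores the
 * index of the affected bdi in *idx (untouched for NO_ACTION). */
static enum forker_action forker_scan(const struct scan_bdi *bdis, size_t n,
                                      unsigned long now,
                                      unsigned long max_inactive,
                                      size_t *idx)
{
    for (size_t i = 0; i < n; i++) {
        const struct scan_bdi *b = &bdis[i];

        /* bdis that never get a flush-x:y thread */
        if (!b->can_writeback || b->is_forker)
            continue;

        /* Work to do but no thread: create one (the kernel then breaks) */
        if (!b->has_task && b->has_dirty_io) {
            *idx = i;
            return FORK_THREAD;
        }

        /* Thread exists, nothing to do, idle too long: stop it */
        if (b->has_task && !b->has_dirty_io &&
            after(now, b->last_active + max_inactive)) {
            *idx = i;
            return KILL_THREAD;
        }
    }
    return NO_ACTION;   /* the caller sleeps, then scans again */
}
```

Returning at the first match mirrors the `break` statements in the kernel loop: each pass handles at most one bdi, and the outer loop simply scans again.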

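Most bdi devices get a corresponding flush-x:y kernel process. The exceptions are special bdi devices, such as the default bdi device or some memory-based virtual bdi devices, as the first if check shows: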

if (!bdi_cap_writeback_dirty(bdi) ||
     bdi_cap_flush_forker(bdi))
    continue;

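What the flush-x:y kernel process actually does is left for the next article. What matters here is that when a bdi device has dirty data to synchronize, its flush-x:y kernel process is created, provided it does not already exist: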

have_dirty_io = !list_empty(&bdi->work_list) ||
        wb_has_dirty_io(&bdi->wb);

/*
 * If the bdi has work to do, but the thread does not
 * exist - create it.
 */
if (!bdi->wb.task && have_dirty_io) {
    /*
     * Set the pending bit - if someone will try to
     * unregister this bdi - it'll wait on this bit.
     */
    set_bit(BDI_pending, &bdi->state);
    action = FORK_THREAD;
    break;
}

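This sets action to FORK_THREAD, and the flush-x:y kernel process is actually created in the switch (action) body that follows. Note the break inside the if: it exits the list_for_each_entry loop, so each pass handles at most one bdi. In the FORK_THREAD and KILL_THREAD cases the outer for loop then goes straight back to scanning the list without sleeping.

If a bdi device has no dirty data to synchronize, and its flush-x:y kernel process has been inactive for a long time, that process is killed. "A long time" is judged by comparing the process's last activity time, last_active, with the current jiffies: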

/*
 * If there is no work to do and the bdi thread was
 * inactive long enough - kill it. The wb_lock is taken
 * to make sure no-one adds more work to this bdi and
 * wakes the bdi thread up.
 */
if (bdi->wb.task && !have_dirty_io &&
    time_after(jiffies, bdi->wb.last_active +
            bdi_longest_inactive())) {
    task = bdi->wb.task;
    bdi->wb.task = NULL;
    spin_unlock(&bdi->wb_lock);
    set_bit(BDI_pending, &bdi->state);
    action = KILL_THREAD;
    break;
}

/*
 * Calculate the longest interval (jiffies) bdi threads are allowed to be
 * inactive.
 */
static unsigned long bdi_longest_inactive(void)
{
    unsigned long interval;

    interval = msecs_to_jiffies(dirty_writeback_interval * 10);
    return max(5UL * 60 * HZ, interval);
}

unsigned int dirty_writeback_interval = 5 * 100; /* centiseconds */

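So by default "a long time" means 5 minutes. dirty_writeback_interval defaults to 500 centiseconds (tunable through /proc/sys/vm/dirty_writeback_centisecs). Multiplying by 10 gives 5000 ms, or 5 seconds, and max() against 5 minutes yields 5 minutes. The limit only exceeds 5 minutes if the interval is set above 30000 centiseconds.

In this case action is set to KILL_THREAD, and the flush-x:y kernel process is actually killed in the switch (action) body that follows.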
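The arithmetic in bdi_longest_inactive() can be checked for different HZ values with a small userspace re-computation. This is a sketch only. model_msecs_to_jiffies() is a hypothetical round-up approximation of the kernel's msecs_to_jiffies(), whose implementation differs in detail but agrees for the whole-second values used here. The interval parameter is in centiseconds, like dirty_writeback_interval.

```c
/* Approximates msecs_to_jiffies(): ms * HZ / 1000, rounded up */
static unsigned long model_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}

/* Mirrors bdi_longest_inactive(): max(5 minutes, interval), in jiffies */
static unsigned long model_longest_inactive(unsigned int interval_cs,
                                            unsigned long hz)
{
    unsigned long interval = model_msecs_to_jiffies(interval_cs * 10UL, hz);
    unsigned long min_jiffies = 5UL * 60 * hz;   /* 5 minutes */

    return interval > min_jiffies ? interval : min_jiffies;
}
```

With the default 500 cs and HZ=1000, the interval is 5000 jiffies and the result is the 300000-jiffy (5-minute) floor. Only an interval such as 60000 cs (10 minutes) pushes the result above 5 minutes.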

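If the scan of all bdi devices finishes without finding anything to do, bdi-default takes the NO_ACTION branch of the switch (action) body and sleeps. It sleeps for bdi_longest_inactive() (5 minutes by default) when the default bdi has no dirty data or periodic writeback is disabled, and otherwise for one dirty_writeback_interval. Because it sleeps in TASK_INTERRUPTIBLE state, an explicit wakeup can also end the sleep early. It then continues the loop and repeats the work above.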
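The choice of sleep length in the NO_ACTION branch can be written as a small pure function. The sketch below is hypothetical: forker_sleep_ms() works in milliseconds rather than jiffies, and it ignores early wakeups. It mirrors the two schedule_timeout() calls in bdi_forker_thread(), including the rule that an interval of 0 falls back to the long sleep.

```c
#include <stdbool.h>

/* How long bdi-default would sleep in the NO_ACTION case, in ms */
static unsigned long forker_sleep_ms(bool default_bdi_dirty,
                                     unsigned int interval_cs)
{
    unsigned long interval_ms = interval_cs * 10UL;
    unsigned long longest = 5UL * 60 * 1000;   /* 5-minute floor */

    if (interval_ms > longest)                 /* as in bdi_longest_inactive() */
        longest = interval_ms;

    /* No dirty data on the default bdi, or periodic writeback disabled */
    if (!default_bdi_dirty || interval_cs == 0)
        return longest;

    /* Otherwise wake again after one writeback interval */
    return interval_ms;
}
```

With the defaults, an idle system wakes bdi-default every 5 minutes (300000 ms). While the default bdi holds dirty data, it wakes every 5 seconds (5000 ms).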
