Linux Kernel Processes Explained, Part 2: bdi-default

Latest recommended article published on 2020-07-21 22:40:57

sannik

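bdi is short for "backing device info". As the full name suggests, it refers to the descriptive information about a backing storage device, which the kernel represents with the structure backing_dev_info: http://lxr.linux.no/#linux+v2.6.38.8/include/linux/backing-dev.h#L62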

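A backing storage device is, simply put, a device that can store data and keep that data when the computer is powered off. By that definition, floppy drives, optical drives, USB storage devices, and hard disks are all backing storage devices (referred to as bdi devices from here on), while main memory clearly is not. For details, see: http://www.gordonschools.aberdeenshire.sch.uk/Departments/Computing/StandardGrade/SystemsWeb/6BackingStorage.htm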

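Compared with memory, bdi devices (hard disks being the most common example) are very slow to read and write. To improve overall performance, Linux therefore caches reads and writes to bdi devices: the data is held temporarily in memory so that the device is not touched on every operation. The cached data must still be synced to the bdi device at suitable moments (for example every 5 seconds, or when dirty data reaches a certain ratio), because data that stays in memory for too long is easily lost (for example if the machine crashes or reboots unexpectedly). This periodic sync used to be done by a process named pdflush. Later, in Kernel 2.6.2x/3x (I did not note exactly which minor version made the change; see for example http://kernelnewbies.org/Linux_2_6_35#head-57d43d498509746df08f48b1f040d20d88d2b984 and http://lwn.net/Articles/396757/), the mechanism was reworked and several kernel processes took its place: bdi-default, flush-x:y, and so on. They are the subject of these two articles.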
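
The two triggers mentioned above (a periodic interval and a dirty-data ratio) can be sketched in user space roughly as follows. This is only an illustration of the idea: the function `needs_writeback`, its parameters, and the units are assumptions made for this example, not the kernel's actual logic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, not kernel code: decide whether cached
 * (dirty) data should be written back to the backing device.
 *
 * now_cs, last_flush_cs:    current time and time of the last flush,
 *                           both in centiseconds
 * interval_cs:              periodic flush interval in centiseconds
 *                           (0 disables the periodic trigger)
 * dirty_pages, total_pages: how much of the cache is dirty
 * ratio_percent:            dirty threshold, 0..100
 */
static bool needs_writeback(unsigned long now_cs, unsigned long last_flush_cs,
                            unsigned long interval_cs,
                            unsigned long dirty_pages,
                            unsigned long total_pages,
                            unsigned int ratio_percent)
{
    assert(ratio_percent <= 100);
    assert(dirty_pages <= total_pages);

    if (dirty_pages == 0)
        return false;               /* nothing to write back */

    /* Trigger 1: the periodic interval has elapsed. */
    if (interval_cs != 0 && now_cs - last_flush_cs >= interval_cs)
        return true;

    /* Trigger 2: the share of dirty pages reached the threshold. */
    return dirty_pages * 100 >= (unsigned long)ratio_percent * total_pages;
}
```

With an interval of 500 centiseconds, a single dirty page is flushed once 5 seconds have passed since the last flush, while a cache that is 20% dirty is flushed immediately if the threshold is 20.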

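We will not say more about the old pdflush here and will only discuss bdi-default and flush-x:y. The relationship between these two processes (in fact there can be several flush-x:y processes) and the way they run resemble the standard parent/child worker daemon model used by lighttpd. Since many readers may not be familiar with lighttpd's process model, the details follow.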

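Generally, a Linux system has many bdi devices attached. When a bdi device is registered (function: bdi_register(…)), it is linked into the global list bdi_list. One device is special: the default bdi device (default_backing_dev_info). Besides being added to bdi_list, its registration also creates the bdi-default kernel process, the subject of this article. The code is shown below; the key calls to notice are kthread_run and list_add_tail_rcu.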

 
    struct backing_dev_info default_backing_dev_info = {
        .name           = "default",
        .ra_pages       = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
        .state          = 0,
        .capabilities   = BDI_CAP_MAP_COPY,
        .unplug_io_fn   = default_unplug_io_fn,
    };
    EXPORT_SYMBOL_GPL(default_backing_dev_info);

    static inline bool bdi_cap_flush_forker(struct backing_dev_info *bdi)
    {
        return bdi == &default_backing_dev_info;
    }

    int bdi_register(struct backing_dev_info *bdi, struct device *parent,
                     const char *fmt, ...)
    {
        va_list args;
        struct device *dev;

        if (bdi->dev)    /* The driver needs to use separate queues per device */
            return 0;

        va_start(args, fmt);
        dev = device_create_vargs(bdi_class, parent, MKDEV(0, 0), bdi, fmt, args);
        va_end(args);
        if (IS_ERR(dev))
            return PTR_ERR(dev);

        bdi->dev = dev;

        /*
         * Just start the forker thread for our default backing_dev_info,
         * and add other bdi's to the list. They will get a thread created
         * on-demand when they need it.
         */
        if (bdi_cap_flush_forker(bdi)) {
            struct bdi_writeback *wb = &bdi->wb;

            wb->task = kthread_run(bdi_forker_thread, wb, "bdi-%s",
                                   dev_name(dev));
            if (IS_ERR(wb->task))
                return PTR_ERR(wb->task);
        }

        bdi_debug_register(bdi, dev_name(dev));
        set_bit(BDI_registered, &bdi->state);

        spin_lock_bh(&bdi_lock);
        list_add_tail_rcu(&bdi->bdi_list, &bdi_list);
        spin_unlock_bh(&bdi_lock);

        trace_writeback_bdi_register(bdi);
        return 0;
    }
    EXPORT_SYMBOL(bdi_register);
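
To make the registration flow easier to follow, here is a small user-space model of it. It is a sketch under stated assumptions, not kernel code: `struct model_bdi`, `struct model_bdi_list`, `model_is_forker`, and `model_register` are hypothetical names, a plain singly linked list stands in for the RCU list, and a boolean stands in for actually starting a thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the registration logic, not kernel code. */
struct model_bdi {
    const char *name;
    bool registered;        /* stands in for the BDI_registered bit */
    bool has_forker;        /* true once the "bdi-default" thread would exist */
    struct model_bdi *next; /* stands in for the bdi_list linkage */
};

struct model_bdi_list {
    struct model_bdi *head;
    struct model_bdi *tail;
    size_t count;
};

/* Same idea as bdi_cap_flush_forker(): only the default device forks. */
static bool model_is_forker(const struct model_bdi *bdi,
                            const struct model_bdi *default_bdi)
{
    return bdi == default_bdi;
}

/* Register a device; returns 0 on success or if it was already registered. */
static int model_register(struct model_bdi_list *list, struct model_bdi *bdi,
                          const struct model_bdi *default_bdi)
{
    assert(list != NULL && bdi != NULL);

    if (bdi->registered)    /* mirrors the early "if (bdi->dev) return 0" */
        return 0;

    /* Only the default device gets its thread at registration time. */
    if (model_is_forker(bdi, default_bdi))
        bdi->has_forker = true;

    bdi->registered = true;

    /* Append at the tail, as list_add_tail_rcu() does for bdi_list. */
    bdi->next = NULL;
    if (list->tail)
        list->tail->next = bdi;
    else
        list->head = bdi;
    list->tail = bdi;
    list->count++;
    return 0;
}
```

Registering the default device and then an ordinary disk leaves both on the list, but only the default device has a thread at that point. The disk gets one later, on demand.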

Next, follow the function bdi_forker_thread, which is the body of the bdi-default kernel process:

 
    static int bdi_forker_thread(void *ptr)
    {
        struct bdi_writeback *me = ptr;

        current->flags |= PF_SWAPWRITE;
        set_freezable();

        /*
         * Our parent may run at a different priority, just set us to normal
         */
        set_user_nice(current, 0);

        for (;;) {
            struct task_struct *task = NULL;
            struct backing_dev_info *bdi;
            enum {
                NO_ACTION,   /* Nothing to do */
                FORK_THREAD, /* Fork bdi thread */
                KILL_THREAD, /* Kill inactive bdi thread */
            } action = NO_ACTION;

            /*
             * Temporary measure, we want to make sure we don't see
             * dirty data on the default backing_dev_info
             */
            if (wb_has_dirty_io(me) || !list_empty(&me->bdi->work_list)) {
                del_timer(&me->wakeup_timer);
                wb_do_writeback(me, 0);
            }

            spin_lock_bh(&bdi_lock);
            set_current_state(TASK_INTERRUPTIBLE);

            list_for_each_entry(bdi, &bdi_list, bdi_list) {
                bool have_dirty_io;

                if (!bdi_cap_writeback_dirty(bdi) ||
                    bdi_cap_flush_forker(bdi))
                    continue;

                WARN(!test_bit(BDI_registered, &bdi->state),
                     "bdi %p/%s is not registered!\n", bdi, bdi->name);

                have_dirty_io = !list_empty(&bdi->work_list) ||
                                wb_has_dirty_io(&bdi->wb);

                /*
                 * If the bdi has work to do, but the thread does not
                 * exist - create it.
                 */
                if (!bdi->wb.task && have_dirty_io) {
                    /*
                     * Set the pending bit - if someone will try to
                     * unregister this bdi - it'll wait on this bit.
                     */
                    set_bit(BDI_pending, &bdi->state);
                    action = FORK_THREAD;
                    break;
                }

                spin_lock(&bdi->wb_lock);

                /*
                 * If there is no work to do and the bdi thread was
                 * inactive long enough - kill it. The wb_lock is taken
                 * to make sure no-one adds more work to this bdi and
                 * wakes the bdi thread up.
                 */
                if (bdi->wb.task && !have_dirty_io &&
                    time_after(jiffies, bdi->wb.last_active +
                                        bdi_longest_inactive())) {
                    task = bdi->wb.task;
                    bdi->wb.task = NULL;
                    spin_unlock(&bdi->wb_lock);
                    set_bit(BDI_pending, &bdi->state);
                    action = KILL_THREAD;
                    break;
                }
                spin_unlock(&bdi->wb_lock);
            }
            spin_unlock_bh(&bdi_lock);

            /* Keep working if default bdi still has things to do */
            if (!list_empty(&me->bdi->work_list))
                __set_current_state(TASK_RUNNING);

            switch (action) {
            case FORK_THREAD:
                __set_current_state(TASK_RUNNING);
                task = kthread_create(bdi_writeback_thread, &bdi->wb,
                                      "flush-%s", dev_name(bdi->dev));
                if (IS_ERR(task)) {
                    /*
                     * If thread creation fails, force writeout of
                     * the bdi from the thread.
                     */
                    bdi_flush_io(bdi);
                } else {
                    /*
                     * The spinlock makes sure we do not lose
                     * wake-ups when racing with 'bdi_queue_work()'.
                     * And as soon as the bdi thread is visible, we
                     * can start it.
                     */
                    spin_lock_bh(&bdi->wb_lock);
                    bdi->wb.task = task;
                    spin_unlock_bh(&bdi->wb_lock);
                    wake_up_process(task);
                }
                break;

            case KILL_THREAD:
                __set_current_state(TASK_RUNNING);
                kthread_stop(task);
                break;

            case NO_ACTION:
                if (!wb_has_dirty_io(me) || !dirty_writeback_interval)
                    /*
                     * There are no dirty data. The only thing we
                     * should now care about is checking for
                     * inactive bdi threads and killing them. Thus,
                     * let's sleep for longer time, save energy and
                     * be friendly for battery-driven devices.
                     */
                    schedule_timeout(bdi_longest_inactive());
                else
                    schedule_timeout(msecs_to_jiffies(dirty_writeback_interval * 10));
                try_to_freeze();
                /* Back to the main loop */
                continue;
            }

            /*
             * Clear pending bit and wakeup anybody waiting to tear us down.
             */
            clear_bit(BDI_pending, &bdi->state);
            smp_mb__after_clear_bit();
            wake_up_bit(&bdi->state, BDI_pending);
        }

        return 0;
    }

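The code looks long, but the logic is quite simple: inside an infinite for loop, list_for_each_entry walks every bdi device on bdi_list and checks whether its flush-x:y kernel process exists, what state it is in, and whether an action (kill or create) is needed.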

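Most bdi devices have a corresponding flush-x:y kernel process. The exceptions are a few special bdi devices, such as the default bdi device or other memory-backed virtual bdi devices, as the first if check shows: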

 
    if (!bdi_cap_writeback_dirty(bdi) ||
        bdi_cap_flush_forker(bdi))
        continue;
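
The skip condition can be modelled in user space as a simple predicate. This is a sketch only: `struct cap_dev`, `model_skip_device`, and the flag `MODEL_CAP_NO_WRITEBACK` (including its value) are placeholders invented for this example. The assumption is that a device without writeback capability carries such a flag in its capabilities field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder capability bit for this sketch; the value is an assumption. */
#define MODEL_CAP_NO_WRITEBACK 0x2u

/* Hypothetical per-device state, reduced to what the check needs. */
struct cap_dev {
    unsigned int capabilities;
    bool is_default;    /* true only for the default (forker) device */
};

/* True if the forker loop would pass over this device entirely. */
static bool model_skip_device(const struct cap_dev *dev)
{
    assert(dev != NULL);

    bool can_writeback = !(dev->capabilities & MODEL_CAP_NO_WRITEBACK);

    /* Skip devices that never write back, and the forker's own device. */
    return !can_writeback || dev->is_default;
}
```

An ordinary disk passes the check and is considered for a flush-x:y process. A memory-backed device flagged as having no writeback, or the default device itself, is skipped.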

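What the flush-x:y kernel process actually does is left for the next article. For now it is enough to know that if a bdi device currently has dirty data to sync, its flush-x:y kernel process is created (provided, of course, that it does not already exist):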

 
    have_dirty_io = !list_empty(&bdi->work_list) ||
                    wb_has_dirty_io(&bdi->wb);

    /*
     * If the bdi has work to do, but the thread does not
     * exist - create it.
     */
    if (!bdi->wb.task && have_dirty_io) {
        /*
         * Set the pending bit - if someone will try to
         * unregister this bdi - it'll wait on this bit.
         */
        set_bit(BDI_pending, &bdi->state);
        action = FORK_THREAD;
        break;
    }

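Here action is set to FORK_THREAD (note the break inside the if statement, which leaves the list_for_each_entry loop), and the flush-x:y kernel process is actually created afterwards in the body of switch (action).

If a bdi device currently has no dirty data to sync and its flush-x:y kernel process has been inactive for a long time (determined by comparing its last active time, last_active, with the current jiffies), the process is killed: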

 
    /*
     * If there is no work to do and the bdi thread was
     * inactive long enough - kill it. The wb_lock is taken
     * to make sure no-one adds more work to this bdi and
     * wakes the bdi thread up.
     */
    if (bdi->wb.task && !have_dirty_io &&
        time_after(jiffies, bdi->wb.last_active +
                            bdi_longest_inactive())) {
        task = bdi->wb.task;
        bdi->wb.task = NULL;
        spin_unlock(&bdi->wb_lock);
        set_bit(BDI_pending, &bdi->state);
        action = KILL_THREAD;
        break;
    }

    /*
     * Calculate the longest interval (jiffies) bdi threads are allowed to be
     * inactive.
     */
    static unsigned long bdi_longest_inactive(void)
    {
        unsigned long interval;

        interval = msecs_to_jiffies(dirty_writeback_interval * 10);
        return max(5UL * 60 * HZ, interval);
    }

    unsigned int dirty_writeback_interval = 5 * 100; /* centiseconds */

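As the code shows, "a long time" is 5 minutes by default: bdi_longest_inactive() returns the larger of 5 minutes and the writeback interval, and the default interval of 500 centiseconds is only 5 seconds. In this case action is set to KILL_THREAD, and the flush-x:y kernel process is actually killed in the body of switch (action) that follows.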
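
The arithmetic behind the 5-minute figure can be checked with a small user-space sketch. The assumptions are spelled out here: `MODEL_HZ` is fixed at 1000 ticks per second purely for illustration (the real HZ is a kernel configuration value), `model_msecs_to_jiffies` is a simplified stand-in for the kernel helper, and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed tick rate for this sketch; real kernels use their configured HZ. */
#define MODEL_HZ 1000UL

/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks. */
static unsigned long model_msecs_to_jiffies(unsigned long ms)
{
    assert(ms <= (ULONG_MAX - 999UL) / MODEL_HZ);   /* guard against overflow */
    return (ms * MODEL_HZ + 999UL) / 1000UL;
}

/* Same formula as bdi_longest_inactive(); the interval is in centiseconds. */
static unsigned long model_longest_inactive(unsigned int writeback_interval_cs)
{
    unsigned long interval =
        model_msecs_to_jiffies((unsigned long)writeback_interval_cs * 10UL);
    unsigned long five_minutes = 5UL * 60 * MODEL_HZ;

    return interval > five_minutes ? interval : five_minutes;
}

/* Wraparound-safe "a is later than b", the idea behind time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* True if a thread last active at last_active has been idle for too long. */
static bool model_idle_expired(unsigned long now, unsigned long last_active,
                               unsigned int writeback_interval_cs)
{
    return model_time_after(now, last_active +
                                 model_longest_inactive(writeback_interval_cs));
}
```

With the default interval of 500 centiseconds the interval term is 5000 ticks, so the 5-minute term (300000 ticks at the assumed tick rate) wins. The writeback interval only matters once it is raised above 5 minutes.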

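If the traversal finishes without any device needing a fork or a kill, the bdi-default kernel process runs the NO_ACTION branch of switch (action) and goes to sleep; once the timeout expires, continue starts the work above all over again.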
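
Putting the pieces together, one pass of the forker loop can be modelled in user space as a scan that stops at the first device needing attention, plus a choice of sleep length when nothing needs doing. This is a minimal sketch, not the kernel implementation: `struct flush_dev`, `model_scan`, and `model_sleep_ticks` are hypothetical names, booleans replace the real locks and task pointers, and all times are plain tick counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_action { MODEL_NO_ACTION, MODEL_FORK_THREAD, MODEL_KILL_THREAD };

/* Hypothetical per-device state, reduced to what the decision needs. */
struct flush_dev {
    bool skip;                  /* no writeback capability, or default device */
    bool has_thread;            /* stands in for bdi->wb.task != NULL */
    bool have_dirty_io;         /* queued work or dirty inodes */
    unsigned long last_active;  /* tick of the thread's last activity */
};

/*
 * Scan devices in order and stop at the first one needing attention,
 * like the break statements inside list_for_each_entry().
 * On FORK/KILL, *which is set to that device's index; for NO_ACTION
 * it is left unchanged.
 */
static enum model_action model_scan(const struct flush_dev *devs, size_t n,
                                    unsigned long now,
                                    unsigned long longest_inactive,
                                    size_t *which)
{
    assert(devs != NULL || n == 0);
    assert(which != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct flush_dev *d = &devs[i];

        if (d->skip)
            continue;

        /* Work to do but no thread yet: create one. */
        if (!d->has_thread && d->have_dirty_io) {
            *which = i;
            return MODEL_FORK_THREAD;
        }

        /* Wraparound-safe form of: now > last_active + longest_inactive */
        if (d->has_thread && !d->have_dirty_io &&
            (long)(d->last_active + longest_inactive - now) < 0) {
            *which = i;
            return MODEL_KILL_THREAD;
        }
    }
    return MODEL_NO_ACTION;
}

/* Sleep length chosen in the NO_ACTION case (all values in ticks). */
static unsigned long model_sleep_ticks(bool default_has_dirty_io,
                                       unsigned long writeback_interval,
                                       unsigned long longest_inactive)
{
    /* Nothing dirty, or periodic writeback disabled: take the long sleep. */
    if (!default_has_dirty_io || writeback_interval == 0)
        return longest_inactive;
    return writeback_interval;
}
```

Because the scan returns at the first match, at most one thread is created or killed per pass, and the remaining devices are handled on later iterations of the outer loop.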
