Linux Cache Writeback (Based on linux-4.15)

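This article looks at the cache writeback mechanism in the Linux kernel. It introduces the key data structures backing_dev_info, bdi_writeback and wb_writeback_work, and describes how the kernel sets up the default writeback thread at boot. It also examines the role of the delayed_work interface in writeback, in particular the workflow of the core functions wb_workfn(), wb_do_writeback() and writeback_sb_inodes(). Finally, a walk through ext4_writepages shows how a real file system carries out writeback.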

I have only just started reading this code myself, so please treat these notes as a reference only.

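1. The Linux kernel bdi subsystem

bdi is short for backing device info. It describes a backing storage device such as a disk. Compared with memory, I/O to backing storage is slow, so writes to disk are buffered in the page cache and written out later.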
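
The delay is visible from user space too. The sketch below is only an illustration, not part of the kernel code discussed here: write() normally returns once the data is in the page cache, while fsync() blocks until the kernel has written the dirty pages back. The function name and the temporary-file template are my own assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes to a fresh temporary file and force them to stable
 * storage. Returns 0 on success, -1 on any failure.
 * (Hypothetical helper, for illustration only.)
 */
int cached_write_then_sync(const char *buf, size_t len)
{
    char path[] = "/tmp/wb_sketch_XXXXXX";   /* assumed location for the temp file */
    int fd, rc = -1;
    ssize_t n;

    assert(buf != NULL);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    n = write(fd, buf, len);                 /* data lands in the page cache */
    if (n == (ssize_t)len && fsync(fd) == 0) /* wait for writeback of this file */
        rc = 0;

    close(fd);
    unlink(path);
    return rc;
}
```

Without the fsync() call the data would still reach the disk eventually; deciding when is the job of the writeback machinery described in the rest of this article.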

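Three main data structures are involved in the bdi_writeback mechanism:

1) backing_dev_info: describes everything about a backing device. A block device's request queue is normally associated with a backing_dev_info object.

2) bdi_writeback: wraps the writeback kernel thread and the inode queues it operates on.

3) wb_writeback_work: wraps a single unit of writeback work.

The first two structures are defined in include/linux/backing-dev-defs.h.

1.1 backing_dev_info

The backing_dev_info structure is defined as follows: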

struct backing_dev_info {
        struct list_head bdi_list;
        unsigned long ra_pages; /* max readahead in PAGE_CACHE_SIZE units */
        unsigned int capabilities; /* Device capabilities */
        congested_fn *congested_fn; /* Function pointer if device is md/dm */
        void *congested_data;    /* Pointer to aux data for congested func */
        char *name;
        unsigned int min_ratio;
        unsigned int max_ratio, max_prop_frac;
        atomic_long_t tot_write_bandwidth;
        struct bdi_writeback wb;  /* the root writeback info for this bdi */
        struct list_head wb_list; /* list of all wbs */
#ifdef CONFIG_CGROUP_WRITEBACK
        struct radix_tree_root cgwb_tree; /* radix tree of active cgroup wbs */
        struct rb_root cgwb_congested_tree; /* their congested states */
        atomic_t usage_cnt; /* counts both cgwbs and cgwb_contested's */
#else
        struct bdi_writeback_congested *wb_congested;
#endif
        wait_queue_head_t wb_waitq;
        struct device *dev;
        struct timer_list laptop_mode_wb_timer;
#ifdef CONFIG_DEBUG_FS
        struct dentry *debug_dir;
        struct dentry *debug_stats;
#endif
};

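1.2 bdi_writeback

The bdi_writeback object holds the queues of inodes that need processing. When the page cache/buffer cache needs to flush the dirty pages of an inode (the pages are tracked in the inode's radix tree), the inode is linked onto the b_dirty list of the writeback object and the writeback thread is woken up. During processing, the inode is moved to the b_io list and handled from there.

bdi_writeback is defined as follows: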

struct bdi_writeback {
        struct backing_dev_info *bdi; /* our parent bdi */
        unsigned long state;       /* Always use atomic bitops on this */
        unsigned long last_old_flush; /* last old data flush */
        struct list_head b_dirty;  /* dirty inodes */
        struct list_head b_io;             /* parked for writeback */
        struct list_head b_more_io;    /* parked for more writeback */
        struct list_head b_dirty_time; /* time stamps are dirty */
        spinlock_t list_lock;              /* protects the b_* lists */
        struct percpu_counter stat[NR_WB_STAT_ITEMS];
        struct bdi_writeback_congested *congested;
        unsigned long bw_time_stamp;     /* last time write bw is updated */
        unsigned long dirtied_stamp;
        unsigned long written_stamp; /* pages written at bw_time_stamp */
        unsigned long write_bandwidth;    /* the estimated write bandwidth */
        unsigned long avg_write_bandwidth; /* further smoothed write bw, > 0 */
        unsigned long dirty_ratelimit;
        unsigned long balanced_dirty_ratelimit;
        struct fprop_local_percpu completions;
        int dirty_exceeded;
        spinlock_t work_lock;           /* protects work_list & dwork scheduling */
        struct list_head work_list;
        struct delayed_work dwork;   /* work item used for writeback */
        struct list_head bdi_node;      /* anchored at bdi->wb_list */
#ifdef CONFIG_CGROUP_WRITEBACK
        struct percpu_ref refcnt; /* used only for !root wb's */
        struct fprop_local_percpu memcg_completions;
        struct cgroup_subsys_state *memcg_css; /* the associated memcg */
        struct cgroup_subsys_state *blkcg_css; /* and blkcg */
        struct list_head memcg_node; /* anchored at memcg->cgwb_list */
        struct list_head blkcg_node;   /* anchored at blkcg->cgwb_list */
        union {
                struct work_struct release_work;
                struct rcu_head rcu;
        };
#endif
};
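
To make the b_dirty to b_io movement described above concrete, here is a small user-space model. It is a sketch under my own assumptions and not kernel code: the fixed-size FIFO, the wb_sketch type and the function names are all hypothetical, and the real kernel additionally takes list_lock and filters inodes by age.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_INODES 16

/* Hypothetical stand-in for one of the b_* lists: a FIFO of inode numbers. */
struct inode_fifo {
    unsigned long ino[SKETCH_MAX_INODES];
    size_t count;
};

/* Hypothetical, heavily reduced model of struct bdi_writeback. */
struct wb_sketch {
    struct inode_fifo b_dirty;   /* inodes that have dirty data */
    struct inode_fifo b_io;      /* inodes picked for the current pass */
    int wakeups;                 /* how often the flusher was kicked */
};

static void fifo_push(struct inode_fifo *q, unsigned long ino)
{
    assert(q->count < SKETCH_MAX_INODES);
    q->ino[q->count++] = ino;
}

/* Dirtying path: park the inode on b_dirty and wake the flusher. */
void wb_sketch_mark_dirty(struct wb_sketch *wb, unsigned long ino)
{
    fifo_push(&wb->b_dirty, ino);
    wb->wakeups++;
}

/* Flusher path: move everything from b_dirty to b_io, keeping the order.
 * Returns the number of inodes moved. */
size_t wb_sketch_queue_io(struct wb_sketch *wb)
{
    size_t moved = wb->b_dirty.count;

    for (size_t i = 0; i < moved; i++)
        fifo_push(&wb->b_io, wb->b_dirty.ino[i]);
    wb->b_dirty.count = 0;
    return moved;
}
```

A caller would zero-initialize a struct wb_sketch, call wb_sketch_mark_dirty() for each dirtied inode, and then call wb_sketch_queue_io() from the flusher side; afterwards b_dirty is empty and b_io holds the inodes in the order they were dirtied.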

1.3 wb_writeback_work

The wb_writeback_work structure is defined in fs/fs-writeback.c. It wraps a unit of writeback work and looks like this:

struct wb_writeback_work {
        long nr_pages;
        struct super_block *sb;
        unsigned long *older_than_this;
        enum writeback_sync_modes sync_mode;
        unsigned int tagged_writepages:1;
        unsigned int for_kupdate:1;
        unsigned int range_cyclic:1;
        unsigned int for_background:1;
        unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */
        unsigned int auto_free:1; /* free on completion */
        enum wb_reason reason;       /* why was writeback initiated? */
        struct list_head list;        /* pending work list */
        struct wb_completion *done; /* set if the caller waits */
};

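The wb_writeback_work structure wraps a writeback task, and different tasks can use different flushing policies. wb_writeback_work items are what the writeback thread processes. If the queue of writeback work is empty, the kernel thread can go to sleep.

nr_pages: the number of pages to write back;

sb: the super_block this writeback task belongs to;

for_background: 1 for background writeback, 0 otherwise.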
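
The "process work items, sleep when the queue is empty" behaviour can be modelled in a few lines of user-space C. This is a sketch under my own assumptions: work_sketch keeps only the fields explained above, and the queue type and function names are hypothetical rather than taken from fs/fs-writeback.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced copy of wb_writeback_work. */
struct work_sketch {
    long nr_pages;                  /* pages this task wants written back */
    unsigned int for_background:1;  /* 1 = background writeback */
    struct work_sketch *next;       /* stands in for the kernel's list_head */
};

/* Hypothetical pending-work queue (FIFO). */
struct work_queue_sketch {
    struct work_sketch *head, *tail;
};

/* Append a work item to the tail of the queue. */
void work_sketch_queue(struct work_queue_sketch *q, struct work_sketch *w)
{
    assert(w != NULL);
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/* Take the oldest pending item, or NULL if nothing is queued. */
struct work_sketch *work_sketch_next(struct work_queue_sketch *q)
{
    struct work_sketch *w = q->head;

    if (!w)
        return NULL;
    q->head = w->next;
    if (!q->head)
        q->tail = NULL;
    return w;
}

/* Process every queued item and return the total pages requested.
 * *may_sleep is set to true once the queue is empty. */
long work_sketch_drain(struct work_queue_sketch *q, bool *may_sleep)
{
    long pages = 0;
    struct work_sketch *w;

    while ((w = work_sketch_next(q)) != NULL)
        pages += w->nr_pages;
    *may_sleep = (q->head == NULL);
    return pages;
}
```

Here each caller owns its work item and the worker only unlinks it; in the real structure the auto_free and done fields decide who frees the item and whether anyone waits for completion.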

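1.4 The bdi-default kernel thread

1.4.1 default_bdi_init

When the Linux kernel boots, it runs the bdi module's default_bdi_init(), which is defined in mm/backing-dev.c. Its main jobs are:

1) Create the worker named writeback. As far as I can tell, in 4.15 this is a workqueue created with alloc_workqueue() rather than a dedicated thread; its work is woken by a timer.

2) Call bdi_init to initialize the default data structure noop_backing_dev_info.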
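
"Woken by a timer" is the essence of a delayed work item: a callback plus an expiry time. The following user-space model is only a sketch of that idea. The type, the function names and the tick-driven loop are my assumptions and not the kernel's delayed_work API, and it ignores counter wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a delayed work item. */
struct delayed_work_sketch {
    void (*fn)(void *arg);   /* what to run when the timer fires */
    void *arg;
    unsigned long expires;   /* simulated tick at which fn becomes due */
    bool pending;
};

/* Arm the item so that it runs `delay` ticks after `now`. */
void dwork_sketch_schedule(struct delayed_work_sketch *dw,
                           unsigned long now, unsigned long delay)
{
    assert(dw->fn != NULL);
    dw->expires = now + delay;
    dw->pending = true;
}

/* Call on every simulated timer tick. Runs the callback once it is due
 * and returns true if it ran on this tick. */
bool dwork_sketch_tick(struct delayed_work_sketch *dw, unsigned long now)
{
    if (!dw->pending || now < dw->expires)
        return false;
    dw->pending = false;
    dw->fn(dw->arg);
    return true;
}

/* Example callback: counts how often the "flusher" ran. */
void flush_sketch(void *arg)
{
    int *runs = arg;

    (*runs)++;
}
```

With fn set to flush_sketch and the item scheduled at tick 100 with a delay of 5, ticks before 105 do nothing, tick 105 runs the callback once, and later ticks do nothing until the item is scheduled again.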
