Understanding struct address_space

Latest recommended article published 2023-02-15 10:00:54

杨枫_mind


Category: Linux kernel. Tags: page cache

Article link: https://blog.csdn.net/ytfy339784578/article/details/104556979


First, the page cache

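The address_space operations map portions of a "file" into pages in the Linux page cache. The page cache holds data from some physical device (for example, a disk) that has been mapped into memory. The backing device is usually a disk, but it does not have to be. In this way the page cache contains entire pages from recently accessed "files". During a page I/O operation such as read(), the kernel checks whether the data already resides in the page cache. If it does, the kernel can return the requested page quickly without reading from disk.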

address_space

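A single physical page may be made up of several noncontiguous disk blocks. Because the blocks behind a page need not be contiguous, checking whether particular data is already cached becomes harder. As a result, the page cache cannot be indexed by device name and block number alone, which would otherwise be the simplest solution.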

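For example, on the x86 architecture a physical page is 4KB, while a disk block on most filesystems can be as small as 512 bytes. One page can therefore hold 8 blocks. Those blocks need not be contiguous, because the file itself may be scattered across the disk.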
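The arithmetic above can be sketched in a few lines of user-space C. This is only an illustration: the constants EX_PAGE_SIZE and EX_BLOCK_SIZE and the helper names are hypothetical and come from the 4KB / 512-byte figures in the text, not from any kernel header.

```c
#include <assert.h>

/* Illustrative values from the text: 4 KB pages, 512-byte blocks (assumed). */
#define EX_PAGE_SIZE  4096u
#define EX_BLOCK_SIZE 512u

/* Number of disk blocks needed to fill one page. */
static unsigned int blocks_per_page(unsigned int page_size, unsigned int block_size)
{
    assert(block_size != 0 && page_size % block_size == 0);
    return page_size / block_size;
}

/* Index of the page (within the file) that holds byte `offset`. */
static unsigned long page_index(unsigned long long offset)
{
    return (unsigned long)(offset / EX_PAGE_SIZE);
}

/* Which of that page's blocks (0-based) holds byte `offset`. */
static unsigned int block_in_page(unsigned long long offset)
{
    return (unsigned int)((offset % EX_PAGE_SIZE) / EX_BLOCK_SIZE);
}
```

With these values, blocks_per_page(EX_PAGE_SIZE, EX_BLOCK_SIZE) gives 8, and byte 10000 of a file falls in page 2, block 3 of that page. Where each of those 8 blocks actually lives on disk is a separate question, which is why a (device, block) key is not enough.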

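Moreover, the Linux page cache is quite general in what it can cache. The original page cache introduced in System V Release 4 cached only filesystem data, so the SVR4 page cache used its equivalent of the file object, struct vnode, to manage the cache. The Linux page cache aims to cache any page-based object, which includes many forms of files and memory mappings. To stay generic, it uses the address_space structure to identify pages in the page cache. The structure is defined in <linux/fs.h>: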

    struct address_space {
        struct inode *host;                 /* owner: inode, block_device */
        struct radix_tree_root page_tree;   /* radix tree of all pages */
        spinlock_t tree_lock;               /* and lock protecting it */
        atomic_t i_mmap_writable;           /* count VM_SHARED mappings */
        struct rb_root_cached i_mmap;       /* tree of private and shared mappings */
        struct rw_semaphore i_mmap_rwsem;   /* protect tree, count, list */
        /* Protected by tree_lock together with the radix tree */
        unsigned long nrpages;              /* number of total pages */
        /* number of shadow or DAX exceptional entries */
        unsigned long nrexceptional;
        pgoff_t writeback_index;            /* writeback starts here */
        const struct address_space_operations *a_ops; /* methods */
        unsigned long flags;                /* error bits */
        spinlock_t private_lock;            /* for use by the address_space */
        gfp_t gfp_mask;                     /* implicit gfp mask for allocations */
        struct list_head private_list;      /* for use by the address_space */
        void *private_data;                 /* ditto */
        errseq_t wb_err;
    } __attribute__((aligned(sizeof(long)))) __randomize_layout;

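The i_mmap field is a tree of all shared and private mappings in this address space. Older kernels used a priority search tree here, a clever mix of a heap and a radix tree; in the definition shown above it is a struct rb_root_cached, that is, a red-black tree used as an interval tree. The address space holds nrpages pages in total. An address_space is usually associated with an inode kernel object, in which case the host field points to that inode; if the associated object is not an inode, host is NULL, as is the case for the address_space associated with the swapper. The a_ops field points to the operations table, which is described by struct address_space_operations and is also defined in <linux/fs.h>. (The original post illustrates this interface with a figure here.)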

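The readpage() and writepage() methods are the most important. A page read involves the following steps. First, the kernel searches the page cache for the requested data using an address_space plus offset pair: page = find_get_page(mapping, index); Here, mapping is the given address space and index is the desired position in the file, in pages. If the page is not in the cache, a new page is allocated and added to the page cache (key lines excerpted from pagecache_get_page):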

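    page = find_get_entry(mapping, offset);
    if (!page) {
        /* cache miss: allocate a page and keep the returned pointer */
        page = __page_cache_alloc(gfp_mask);
        if (!page)
            return NULL;
        /* insert the new page into the page cache and onto the LRU list */
        add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_RECLAIM_MASK);
    }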

Finally, the requested data is read from disk into the page that was just added to the page cache, and is then returned to the user:

error = mapping->a_ops->readpage(file, page);
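The read steps above (look up, allocate and insert on a miss, then call readpage through the operations table) can be modeled in user space. The following is a toy sketch, not kernel code: every toy_* name is hypothetical, a flat array stands in for the radix tree, there is no locking or eviction, and the "disk read" simply fills the page with a letter derived from its index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_MAX_PAGES 16

struct toy_mapping;

struct toy_page {
    unsigned long index;            /* page offset within the mapping */
    int uptodate;                   /* set once data has been read in */
    char data[TOY_PAGE_SIZE];
};

/* Stand-in for struct address_space_operations (readpage only). */
struct toy_ops {
    int (*readpage)(struct toy_mapping *m, struct toy_page *page);
};

/* Stand-in for struct address_space; an array replaces the radix tree. */
struct toy_mapping {
    struct toy_page *pages[TOY_MAX_PAGES];
    unsigned long nrpages;
    const struct toy_ops *a_ops;
    unsigned long disk_reads;       /* how often readpage was called */
};

/* Lookup by (mapping, index), similar in spirit to find_get_page(). */
static struct toy_page *toy_find_page(struct toy_mapping *m, unsigned long index)
{
    for (unsigned long i = 0; i < m->nrpages; i++)
        if (m->pages[i]->index == index)
            return m->pages[i];
    return NULL;
}

/* Pretend disk read: fill the page with a byte derived from its index. */
static int toy_readpage(struct toy_mapping *m, struct toy_page *page)
{
    memset(page->data, 'A' + (int)(page->index % 26), TOY_PAGE_SIZE);
    page->uptodate = 1;
    m->disk_reads++;
    return 0;
}

static const struct toy_ops toy_default_ops = { .readpage = toy_readpage };

/* Read path: a hit returns the cached page; a miss allocates, inserts, reads. */
static struct toy_page *toy_read(struct toy_mapping *m, unsigned long index)
{
    struct toy_page *page = toy_find_page(m, index);
    if (page)
        return page;                /* cache hit: no disk access */
    if (m->nrpages == TOY_MAX_PAGES)
        return NULL;                /* the toy cache never evicts */
    page = calloc(1, sizeof(*page));
    if (!page)
        return NULL;
    page->index = index;
    m->pages[m->nrpages++] = page;  /* add to the cache first */
    if (m->a_ops->readpage(m, page) != 0) {
        m->nrpages--;               /* undo the insert on read failure */
        free(page);
        return NULL;
    }
    return page;
}
```

Reading the same index twice on a mapping initialized with .a_ops = &toy_default_ops calls toy_readpage only once; the second call is served from the array. That is the behavior the text describes for a page cache hit.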

Writes are a little different. For file mappings, whenever a page is modified, the VM simply calls SetPageDirty(page);

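The kernel later writes the page out through the writepage() method. Writes to a specific file are more involved. In older kernels (the prepare_write()/commit_write() interface shown here was later replaced by write_begin()/write_end()), the generic write path in mm/filemap.c performs roughly the following steps: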

    page = __grab_cache_page(mapping, index, &cached_page, &lru_pvec);
    status = a_ops->prepare_write(file, page, offset, offset+bytes);
    page_fault = filemap_copy_from_user(page, offset, buf, bytes);
    status = a_ops->commit_write(file, page, offset, offset+bytes);

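First, the page cache is searched for the desired page. If it is not in the cache, an entry is allocated and added. Next, the prepare_write() method is called to set up the write request. The data is then copied from user space into a kernel buffer. Finally, the data is written to disk via the commit_write() function. Because these steps are performed for every page I/O operation, all page I/O is guaranteed to go through the page cache. The kernel therefore tries to satisfy every read request from the page cache; if that fails, the page is read from disk and added to the page cache. For writes, the page cache acts as a staging area, so every written page is also added to the page cache.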
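The write path can be modeled the same way. This is a user-space sketch under loud assumptions: all wb_* names are hypothetical, the "disk" is an in-memory array, and the model is simplified so that the write only marks the page dirty while a separate writeback pass flushes it. It is meant to show the page cache acting as a staging area, not to mirror the kernel's prepare_write()/commit_write() implementations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WB_PAGE_SIZE 4096
#define WB_MAX_PAGES 8

struct wb_page {
    unsigned long index;
    int dirty;                      /* set on modification, cleared by writeback */
    char data[WB_PAGE_SIZE];
};

struct wb_mapping {
    struct wb_page *pages[WB_MAX_PAGES];
    unsigned long nrpages;
    char disk[WB_MAX_PAGES][WB_PAGE_SIZE];  /* pretend backing store */
};

/* Find the page in the cache, or allocate one and add an entry for it. */
static struct wb_page *wb_grab_page(struct wb_mapping *m, unsigned long index)
{
    for (unsigned long i = 0; i < m->nrpages; i++)
        if (m->pages[i]->index == index)
            return m->pages[i];
    if (index >= WB_MAX_PAGES || m->nrpages == WB_MAX_PAGES)
        return NULL;
    struct wb_page *page = calloc(1, sizeof(*page));
    if (!page)
        return NULL;
    page->index = index;
    /* "prepare" step: a partial overwrite needs the current on-disk contents */
    memcpy(page->data, m->disk[index], WB_PAGE_SIZE);
    m->pages[m->nrpages++] = page;
    return page;
}

/* Copy the caller's bytes into the cached page and mark it dirty. */
static int wb_write(struct wb_mapping *m, unsigned long index,
                    size_t offset, const char *buf, size_t bytes)
{
    if (offset > WB_PAGE_SIZE || bytes > WB_PAGE_SIZE - offset)
        return -1;                  /* write would cross the page boundary */
    struct wb_page *page = wb_grab_page(m, index);
    if (!page)
        return -1;
    memcpy(page->data + offset, buf, bytes);
    page->dirty = 1;                /* plays the role of SetPageDirty(page) */
    return 0;
}

/* Later writeback: flush every dirty page; returns how many were written. */
static unsigned long wb_writeback(struct wb_mapping *m)
{
    unsigned long written = 0;
    for (unsigned long i = 0; i < m->nrpages; i++) {
        struct wb_page *page = m->pages[i];
        if (!page->dirty)
            continue;
        memcpy(m->disk[page->index], page->data, WB_PAGE_SIZE);
        page->dirty = 0;
        written++;
    }
    return written;
}
```

After wb_write() on a zero-initialized mapping, the new bytes are visible only in the cached page; the backing array changes once wb_writeback() runs. Splitting the work this way is a design choice of the sketch, chosen to make the staging role of the cache visible.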