File System Reads and Writes: The Page Cache Mechanism


水木无痕


Article link: https://blog.csdn.net/yxfabcdefg/article/details/78211326


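The Linux kernel keeps a cache for files on a file system: the data read from and written to files is cached there. This cache is called the page cache.

10.1 The page cache mechanism

The page cache is a distinctive feature of Linux. Data stored in it is not reclaimed when the I/O completes; it stays in memory, and the kernel only starts reclaiming that memory when memory runs short.

10.1.1 Buffered I/O and direct I/O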

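I/O that goes through the page cache is called buffered I/O, and it is what the kernel uses by default. Some applications, however, do not want the kernel to cache their data in memory and supply the memory themselves. I/O performed on application-supplied memory is called direct I/O; its defining property is that it bypasses the system's page cache.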
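From user space, direct I/O is normally requested with the O_DIRECT flag of open(2). The following is a minimal sketch, not a definitive implementation: the helper name write_block_direct and the 4096-byte alignment DEMO_ALIGN are assumptions made for illustration (the real alignment requirement depends on the device and the file system), and the helper falls back to buffered I/O where O_DIRECT is not supported.

```c
#define _GNU_SOURCE              /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0               /* not exposed by this libc: the sketch degrades to buffered I/O */
#endif

#define DEMO_ALIGN 4096          /* assumed buffer alignment and transfer size */

/*
 * Write one DEMO_ALIGN-sized block filled with `fill` to `path`.
 * Returns 1 if the data was written with O_DIRECT, 0 if the helper
 * had to fall back to buffered I/O, and -1 on error.
 */
int write_block_direct(const char *path, unsigned char fill)
{
    void *buf = NULL;
    int used_direct = (O_DIRECT != 0);
    ssize_t n;
    int fd;

    assert(path != NULL);

    /* The application supplies the memory; direct I/O needs it aligned. */
    if (posix_memalign(&buf, DEMO_ALIGN, DEMO_ALIGN) != 0)
        return -1;
    memset(buf, fill, DEMO_ALIGN);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL) {
        /* The file system rejects O_DIRECT: reopen through the page cache. */
        used_direct = 0;
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    }
    if (fd < 0) {
        free(buf);
        return -1;
    }

    n = write(fd, buf, DEMO_ALIGN);
    if (n < 0 && errno == EINVAL && used_direct) {
        /* Flag accepted at open() but transfer rejected: drop it and retry. */
        int flags = fcntl(fd, F_GETFL);
        if (flags >= 0 && fcntl(fd, F_SETFL, flags & ~O_DIRECT) == 0) {
            used_direct = 0;
            n = write(fd, buf, DEMO_ALIGN);
        }
    }

    close(fd);
    free(buf);
    return n == DEMO_ALIGN ? used_direct : -1;
}
```

A caller would invoke write_block_direct("/tmp/demo.bin", 0xAB) and inspect the return value to see whether the page cache was actually bypassed. The buffer address, the file offset, and the transfer length are all kept at multiples of DEMO_ALIGN, since misaligned direct I/O is typically rejected with EINVAL.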

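The Linux programming interface exposes file reads and writes through read and write. Both are synchronous I/O interfaces: the process that calls them blocks until the read or write has finished, and only then does control return to the application. Their counterpart is asynchronous I/O, which does not block the process but returns immediately. An asynchronous interface therefore has to provide a way to find out whether the I/O has completed.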
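Because read and write only return once the kernel has finished its part of the call, a caller can simply loop until the whole buffer has been transferred. The sketch below illustrates that synchronous style; the helper names write_all and read_all are hypothetical and not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `len` bytes of `buf`; returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* blocks until the call completes */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                          /* a short write is legal: continue */
        len -= (size_t)n;
    }
    return 0;
}

/* Read until `len` bytes or end of file; returns bytes read, -1 on error. */
ssize_t read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t total = 0;

    assert(buf != NULL);
    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                       /* end of file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Neither helper needs a completion check: when the call returns, the result is already known. That is the main contrast with an asynchronous interface, where the submission returns first and completion is reported separately.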

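Because buffered I/O on Linux has to fill the page cache, it cannot return until the I/O has completed, so buffered I/O blocks inside the kernel by its very nature. This is why asynchronous I/O on Linux has to be direct I/O if it is to return immediately without blocking the process.

Differences between direct I/O and buffered I/O

In most cases the kernel buffers the data of an I/O operation to improve performance. In some situations, however, performing the I/O directly on a user-space buffer gives better performance and a higher transfer rate, especially for large transfers: it avoids copying the data between kernel space and user space and so saves transfer time.
Before using direct I/O it is worth understanding its costs; after all, there is no free lunch.
First, enabling direct I/O means giving up every benefit of buffered I/O. Second, direct I/O requires the write system call to execute synchronously, because otherwise the application would not know when it can reuse its I/O buffer. This obviously slows the application down. The usual remedy is to combine direct I/O with asynchronous I/O.
The core function for implementing direct I/O is get_user_pages. Its prototype, in the older kernels this article describes, is: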

int get_user_pages(struct task_struct *tsk,       // current
                   struct mm_struct *mm,          // current->mm
                   unsigned long start,           // page-aligned address of the user-space buffer
                   int len,                       // length of the buffer, in pages
                   int write,                     // nonzero: map the pages for write access
                   int force,                     // nonzero: override the protections on the pages to
                                                  // grant the requested access; drivers should pass 0
                   struct page **pages,           // out: the struct page pointers of the user buffer
                   struct vm_area_struct **vmas); // out: the VMA associated with each page

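down_read(&current->mm->mmap_sem);
result = get_user_pages(current, current->mm, ...);
up_read(&current->mm->mmap_sem);
…
// after the I/O, mark every page the transfer modified as dirty, so the
// kernel writes the change back instead of discarding it
if (!PageReserved(page))
        SetPageDirty(page);
…

// release the pages once the I/O is done
// void page_cache_release(struct page *page);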

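10.1.2 Buffer heads and the block cache

The page cache is organized in units of pages. The Linux kernel manages memory in pages, and it manages the file cache in pages as well. A 16 KB file, for example, fits exactly into four 4 KB pages. Memory may have to be swapped out to disk, and a file on disk can be accessed through mmap just like memory, so using the same unit for both spares the kernel a good deal of conversion work.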
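The page arithmetic behind this can be sketched in a few lines. The 4 KB page size is taken from the example above; the macro and function names are made up for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                               /* 4 KB pages, as in the text */
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)

/* Number of pages needed to cache a file of `size` bytes (rounded up). */
uint64_t pages_for_size(uint64_t size)
{
    assert(size <= UINT64_MAX - (DEMO_PAGE_SIZE - 1));   /* avoid overflow */
    return (size + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}

/* Index of the cached page that holds byte position `pos` of the file. */
uint64_t page_index_of(uint64_t pos)
{
    return pos >> DEMO_PAGE_SHIFT;
}

/* Offset of byte position `pos` inside its page. */
uint64_t offset_in_page(uint64_t pos)
{
    return pos & (DEMO_PAGE_SIZE - 1);
}
```

With these helpers, pages_for_size(16 * 1024) gives 4, matching the 16 KB example, and byte 5000 of a file falls in page 1 at offset 904.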

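A hard disk, as a physical medium, uses the sector as its smallest unit of access. A sector is usually 512 bytes, so the smallest read or write to the disk is 512 bytes. A file system, on the other hand, organizes files in blocks, and a file block is typically 2, 4, or 8 sectors (1 KB, 2 KB, or 4 KB). This organization calls for a block cache that temporarily holds file contents, so the kernel provides the buffer head structure to manage the block cache.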
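The relationship between sectors, blocks, and pages is plain integer arithmetic. A small sketch, assuming 512-byte sectors as stated above; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512u    /* sector size assumed in the text */

/* How many sectors make up one file-system block. */
uint32_t sectors_per_block(uint32_t block_size)
{
    assert(block_size >= DEMO_SECTOR_SIZE && block_size % DEMO_SECTOR_SIZE == 0);
    return block_size / DEMO_SECTOR_SIZE;
}

/* First sector covered by file-system block `blocknr`. */
uint64_t block_to_sector(uint64_t blocknr, uint32_t block_size)
{
    return blocknr * sectors_per_block(block_size);
}

/* How many blocks fit into one page, i.e. how many blocks a page caches. */
uint32_t blocks_per_page(uint32_t page_size, uint32_t block_size)
{
    assert(block_size > 0 && page_size % block_size == 0);
    return page_size / block_size;
}
```

For a 1 KB block size, a 4 KB page caches four blocks, which is why a single page may need several block descriptors.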

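A buffer head does not hold file contents itself; the contents still live in the page cache. The buffer head is a management structure that only records the number of the file block and the address where that block is cached. It also provides the mapping to the underlying hardware (the block device). The structure is defined as follows: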

/*
 * Historically, a buffer_head was used to map a single block
 * within a page, and of course as the unit of I/O through the
 * filesystem and block layers.  Nowadays the basic I/O unit
 * is the bio, and buffer_heads are used for extracting block
 * mappings (via a get_block_t call), for tracking state within
 * a page (via a page_mapping) and for wrapping bio submission
 * for backward compatibility reasons (e.g. submit_bh).
 */
struct buffer_head {
	unsigned long b_state;		/* buffer state bitmap (see above) */
	struct buffer_head *b_this_page;/* circular list of page's buffers */
	struct page *b_page;		/* the page this bh is mapped to */

	sector_t b_blocknr;		/* start block number */
	size_t b_size;			/* size of mapping */
	char *b_data;			/* pointer to data within the page */

	struct block_device *b_bdev;
	bh_end_io_t *b_end_io;		/* I/O completion */
 	void *b_private;		/* reserved for b_end_io */
	struct list_head b_assoc_buffers; /* associated with another mapping */
	atomic_t b_count;		/* users using this buffer_head */
};

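Important members of the buffer head structure:

b_this_page: links the buffer heads that belong to the same page into a singly linked, circular list; it points to the next buffer head.

b_page: points to the page that holds the data.

b_blocknr: the starting block number of the buffer head. This block number is addressed over the whole disk, so it can be converted into a physical sector address on the disk.

b_data: points to the address of the data.

b_bdev: the block device the file system is bound to.

b_end_io: a callback function that is called when the I/O has completed.

Because b_blocknr is addressed over the whole disk, only the file system can supply this information.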
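To make the roles of these members concrete, here is a user-space model, not kernel code. struct toy_bh mirrors only four of the fields above, the toy_ function names are hypothetical, and the blocks of the page are assumed to be contiguous on disk, which a real file system does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Reduced model of struct buffer_head: only the fields discussed above. */
struct toy_bh {
    struct toy_bh *b_this_page;  /* next buffer of the same page (circular) */
    uint64_t       b_blocknr;    /* block number on the device */
    size_t         b_size;       /* block size in bytes */
    char          *b_data;       /* where the block lives inside the page */
};

/*
 * Describe `page` as TOY_PAGE_SIZE / block_size blocks, using bhs[i] for
 * block i. Assumes the blocks are contiguous on disk, starting at
 * first_block. Returns the number of buffer heads set up.
 */
size_t toy_attach_buffers(char *page, size_t block_size,
                          uint64_t first_block, struct toy_bh *bhs)
{
    size_t n, i;

    assert(block_size > 0 && TOY_PAGE_SIZE % block_size == 0);
    n = TOY_PAGE_SIZE / block_size;
    for (i = 0; i < n; i++) {
        bhs[i].b_blocknr   = first_block + i;
        bhs[i].b_size      = block_size;
        bhs[i].b_data      = page + i * block_size;  /* no copy: points into the page */
        bhs[i].b_this_page = &bhs[(i + 1) % n];      /* last one links back to the first */
    }
    return n;
}

/* Walk the circular list once and count the buffers of the page. */
size_t toy_count_buffers(const struct toy_bh *head)
{
    const struct toy_bh *bh = head;
    size_t n = 0;

    do {
        n++;
        bh = bh->b_this_page;
    } while (bh != head);
    return n;
}

/* First 512-byte sector of the buffer: block number times sectors per block. */
uint64_t toy_first_sector(const struct toy_bh *bh)
{
    return bh->b_blocknr * (bh->b_size >> 9);
}
```

The model shows the two points made above: the buffer heads carry no data of their own (b_data points into the page), and a block number plus a block size is enough to derive a sector address.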

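Chapter 9 analyzed how a file system opens a block device. The file system's superblock object stores a pointer to the block device, and through that pointer it can obtain the disk's capacity and partition information. The data space of a file is also allocated by the file system, so the file system knows how data is laid out on the disk and can supply block numbers addressed over the whole disk. Disk file systems generally provide a get_block call that translates a position in a file into a disk block number.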
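The idea behind such a translation can be illustrated with a toy model. This is not the kernel's get_block_t interface: struct toy_inode, its table of direct block numbers, and toy_get_block are all hypothetical, loosely inspired by the direct block pointers many disk file systems keep per file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DIRECT_BLOCKS 12

/* Hypothetical per-file metadata: where each logical block lives on disk. */
struct toy_inode {
    uint32_t block_size;                 /* file-system block size in bytes */
    uint64_t size;                       /* file size in bytes */
    uint64_t blocks[TOY_DIRECT_BLOCKS];  /* disk block per logical block; 0 = unmapped */
};

/*
 * Translate byte position `pos` of the file into a disk block number.
 * Returns 0 and stores the result in *disk_block, or -1 if the position
 * is past the end of the file or has no block mapped.
 */
int toy_get_block(const struct toy_inode *inode, uint64_t pos,
                  uint64_t *disk_block)
{
    uint64_t logical;

    assert(inode != NULL && disk_block != NULL && inode->block_size > 0);
    if (pos >= inode->size)
        return -1;
    logical = pos / inode->block_size;   /* which block of the file */
    if (logical >= TOY_DIRECT_BLOCKS || inode->blocks[logical] == 0)
        return -1;
    *disk_block = inode->blocks[logical];
    return 0;
}
```

The point of the sketch is that the mapping table belongs to the file system: nothing outside it could tell that the third block of this file sits at some particular disk block.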

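10.1.3 Managing the page cache

The page cache is managed through the address_space data structure.

This structure contains a radix tree member, and the cached pages of a file's contents are stored in that radix tree.

The two most important operations on the page cache are inserting a page into it and looking a page up in it.

A page is inserted into the page cache by calling add_to_page_cache: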

/**
 * add_to_page_cache - add newly allocated pagecache pages
 * @page:	page to add
 * @mapping:	the page's address_space
 * @offset:	page index
 * @gfp_mask:	page allocation mode
 *
 * This function is used to add newly allocated pagecache pages;
 * the page is new, so we can just run SetPageLocked() against it.
 * The other page state flags were set by rmqueue().
 *
 * This function does not add the page to the LRU.  The caller must do that.
 */
int add_to_page_cache(struct page *page, struct address_space *mapping,
		pgoff_t offset, gfp_t gfp_mask)
{
	int error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);

	if (error == 0) {
		write_lock_irq(&mapping->tree_lock);
		error = radix_tree_insert(&mapping->page_tree, offset, page);
		if (!error) {
			page_cache_get(page);
			SetPageLocked(page);
			page->mapping = mapping;
			page->index = offset;
			mapping->nrpages++;
			__inc_zone_page_state(page, NR_FILE_PAGES);
		}
		write_unlock_irq(&mapping->tree_lock);
		radix_tree_preload_end();
	}
	return error;
}
EXPORT_SYMBOL(add_to_page_cache);

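The function first calls radix_tree_preload to preallocate radix tree nodes, so that the insertion does not have to allocate memory while the lock is held. It then inserts the page into the radix tree. If the insertion succeeds, it takes a reference on the page, locks it, records the mapping and the page index in the page, and increments the mapping's page count.

A page is looked up in the page cache with find_get_page: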

/**
 * find_get_page - find and get a page reference
 * @mapping: the address_space to search
 * @offset: the page index
 *
 * A rather lightweight function, finding and getting a reference to a
 * hashed page atomically.
 */
struct page * find_get_page(struct address_space *mapping, unsigned long offset)
{
	struct page *page;

	read_lock_irq(&mapping->tree_lock);
	page = radix_tree_lookup(&mapping->page_tree, offset);
	if (page)
		page_cache_get(page);
	read_unlock_irq(&mapping->tree_lock);
	return page;
}
EXPORT_SYMBOL(find_get_page);
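The insert and lookup pair can be modeled in user space with a small fixed-height radix tree. This is a simplified sketch under stated assumptions, not the kernel implementation: every toy_ name is hypothetical, the tree has exactly two levels of 64 slots (so page indexes must be below 4096), and there is no locking because the model is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SHIFT     6
#define TOY_SLOTS     (1ul << TOY_SHIFT)       /* 64 slots per node */
#define TOY_MAX_INDEX (TOY_SLOTS * TOY_SLOTS)  /* two levels: 4096 page indexes */

struct toy_page {
    unsigned long index;     /* page index within the file */
    int           refcount;  /* number of references held on the page */
};

struct toy_node {
    void *slots[TOY_SLOTS];
};

struct toy_mapping {
    struct toy_node root;    /* top level: each slot points to a leaf node */
    unsigned long   nrpages; /* number of cached pages */
};

/* Insert `page` at `index`; returns 0, -EEXIST, -EINVAL, or -ENOMEM. */
int toy_add_to_page_cache(struct toy_mapping *m, struct toy_page *page,
                          unsigned long index)
{
    struct toy_node *leaf;

    assert(m != NULL && page != NULL);
    if (index >= TOY_MAX_INDEX)
        return -EINVAL;

    leaf = m->root.slots[index >> TOY_SHIFT];        /* high bits pick the leaf */
    if (leaf == NULL) {
        leaf = calloc(1, sizeof(*leaf));
        if (leaf == NULL)
            return -ENOMEM;
        m->root.slots[index >> TOY_SHIFT] = leaf;
    }
    if (leaf->slots[index & (TOY_SLOTS - 1)] != NULL)
        return -EEXIST;                              /* slot already occupied */

    leaf->slots[index & (TOY_SLOTS - 1)] = page;     /* low bits pick the slot */
    page->index = index;
    page->refcount++;                                /* the cache holds a reference */
    m->nrpages++;
    return 0;
}

/* Look up the page at `index` and take a reference; NULL if not cached. */
struct toy_page *toy_find_get_page(struct toy_mapping *m, unsigned long index)
{
    struct toy_node *leaf;
    struct toy_page *page;

    assert(m != NULL);
    if (index >= TOY_MAX_INDEX)
        return NULL;
    leaf = m->root.slots[index >> TOY_SHIFT];
    if (leaf == NULL)
        return NULL;
    page = leaf->slots[index & (TOY_SLOTS - 1)];
    if (page != NULL)
        page->refcount++;                            /* the caller now holds a reference */
    return page;
}
```

As in the kernel functions, a successful insert and a successful lookup each leave an extra reference on the page, so a page that is in use cannot be freed out from under its user. The model never frees its leaf nodes; a fuller version would add a teardown function.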

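10.1.4 Page cache states

A page can be in many states. Memory management and the page cache use the same page structure, so the page states also include those the page cache needs. The most important states for the page cache are the following.

PG_uptodate: the contents of the page are valid, for example because a read from disk has completed.

PG_dirty: the page has been modified in memory and still has to be written back to disk.

PG_private: the page carries file-system private data, typically its buffer heads.

PG_mappedtodisk: the page has blocks allocated on disk.

BH_Mapped: the buffer has a disk mapping.

BH_Uptodate: the buffer contains valid data.

BH_Dirty: the buffer is dirty.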
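How two of these states evolve during a page's life can be shown with a small model. It assumes the usual meaning of the flags (PG_uptodate: the contents are valid; PG_dirty: memory is newer than disk), and every toy_ name is hypothetical. It is a sketch of the idea, not the kernel's flag-handling code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_page_flag {
    TOY_PG_uptodate = 1u << 0,   /* contents are valid */
    TOY_PG_dirty    = 1u << 1    /* modified in memory, not yet on disk */
};

struct toy_cached_page {
    unsigned int flags;
};

/* A read from disk has completed: the contents are now valid. */
void toy_read_done(struct toy_cached_page *p)
{
    assert(p != NULL);
    p->flags |= TOY_PG_uptodate;
}

/* The application wrote into the cached page: memory is newer than disk. */
void toy_write(struct toy_cached_page *p)
{
    assert(p != NULL);
    p->flags |= TOY_PG_dirty;
}

/* Writeback has completed: disk and memory agree again. */
void toy_writeback_done(struct toy_cached_page *p)
{
    assert(p != NULL);
    p->flags &= ~(unsigned int)TOY_PG_dirty;
}

/* A clean page can be dropped under memory pressure without any I/O. */
bool toy_can_drop(const struct toy_cached_page *p)
{
    assert(p != NULL);
    return (p->flags & TOY_PG_dirty) == 0;
}
```

The model connects the states to reclaim: a clean page can simply be discarded when memory runs short, whereas a dirty page has to be written back first.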
