A Closer Look at the Three Common I/O Modes in Linux

PaulForCoding

Last modified 2024-05-31 23:01:08

Column: A Closer Look at I/O (细说IO). Tags: linux, spring, operations

First published 2024-05-29 21:26:00

Article link: https://blog.csdn.net/weixin_42628255/article/details/139305428

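Overview of the three I/O modes

To read or write a file on Linux, the first step is to call the open() system call. Depending on the flags passed to it, I/O falls into three modes. For a long time, this was my understanding of them: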

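open() flag	I/O mode	Effect on write()
Neither of the two flags below	buffered mode	write() copies the data into the kernel cache and returns immediately; the data is not yet on disk
O_SYNC	sync mode	write() writes to both the kernel cache and the disk, and returns once both are done
O_DIRECT	direct mode	write() bypasses the kernel cache, writes straight to disk, and returns when that completes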

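Is this understanding correct?

The answer: it is roughly right, but neither complete nor precise. There are quite a few details to watch out for.

Let's go through the three modes one by one.

Buffered I/O mode

This is the most common I/O mode, the one we use to read and write files day to day. Let's look at some code.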


#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <format>
#include <iostream>
#include <string>
#include <unistd.h>

int main(int argc, char *argv[]) {
  const char *file_name{"/home/xxx/hello.txt"};

  int fd = open(file_name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    std::cerr << std::format("failed to open file: {}, error: {}\n", file_name,
                             std::strerror(errno));
    return 1;
  }

  std::string file_content{"hello world\n"};
  if (write(fd, file_content.data(), file_content.size()) < 0) {
    std::cerr << std::format("failed to write file: {}, error: {}\n", file_name,
                             std::strerror(errno));
  }

  if (fsync(fd) < 0) {
    std::cerr << std::format("failed to fsync file: {}, error: {}\n", file_name,
                             std::strerror(errno));
  }

  close(fd);

  return 0;
}

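The code above uses two system calls to write the data: write() and fsync(). The former copies the data into the kernel's page cache; the latter writes the data in the page cache to disk so that it is stored permanently, i.e. made durable.

In day-to-day work, however, many observations can mislead us into thinking that the data reaches the disk as soon as write() returns and that fsync() is optional. This is probably the biggest misconception of all.

So today let's prove once more that write() only writes to the kernel cache and does not write to disk.

How can we prove it?

Proving that `write()` does not write to disk

The man page says the following: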

A successful return from write() does not make any guarantee that data has been committed to disk. On some filesystems, including NFS, it does not even guarantee that space has successfully been reserved for the data.
In this case, some errors might be delayed until a future write(), fsync(2), or even close(2). The only way to be sure is to call fsync(2) after you are done writing all your data.

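That is quite clear:

write() does not guarantee that the data has been written to disk.
The only way to make sure the data reaches the disk is fsync().

Yet even with the man page as evidence, some people still won't believe it. They remain convinced that the data is on disk by the time write() returns.

Their reasons usually fall into the following categories:

My program never calls fsync(), but after it finished I checked with cat and the file content was all there, so the content must be on disk. This argument is easy to refute: its proponents have most likely forgotten that cat also reads from the cache, so the fact that cat can read the data does not mean the data has reached the disk.
My program never calls fsync(), yet it has been running stably for years and has been through countless power failures. If write() did not write to disk, I would have lost data long ago, so the facts prove my data did reach the disk. The flaw here is harder to spot at first glance, but not too hard. Linux periodically writes dirty pages back to disk on its own; by default a page becomes eligible for writeback once it has been dirty for about 30 seconds. So roughly 30 seconds after you call write(), the data has indeed reached the disk without your noticing. With some luck, a program really can run for years without losing data.
My program does not call fsync(), but it does call close(). Surely close() is enough to trigger the write to disk?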
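
The 30-second figure in the second argument can be checked on a given machine. The sketch below assumes the usual procfs locations /proc/sys/vm/dirty_expire_centisecs (how old dirty data must be before the flusher threads write it out) and /proc/sys/vm/dirty_writeback_centisecs (how often those threads wake up). The helper names read_centisecs_as_seconds() and print_writeback_tunables() are made up for this illustration.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <optional>
#include <string>

// Reads a value expressed in centiseconds (1/100 s) from a one-line text
// file and converts it to seconds. Returns std::nullopt if the file cannot
// be opened or does not start with a number.
std::optional<double> read_centisecs_as_seconds(const std::string &path) {
  assert(!path.empty());
  std::ifstream in(path);
  long centisecs = 0;
  if (!in || !(in >> centisecs))
    return std::nullopt;
  return centisecs / 100.0;
}

// Prints the two writeback tunables if they are readable on this machine.
// The /proc/sys/vm paths are an assumption about where procfs is mounted.
void print_writeback_tunables() {
  const char *names[] = {"dirty_expire_centisecs",
                         "dirty_writeback_centisecs"};
  for (const char *name : names) {
    auto secs = read_centisecs_as_seconds(std::string("/proc/sys/vm/") + name);
    if (secs)
      std::cout << name << " = " << *secs << " s\n";
    else
      std::cout << name << " is not readable here\n";
  }
}
```

Calling print_writeback_tunables() from main() shows the values in effect. They are tunable, so an administrator may have changed them, which is one more reason not to build a durability argument on this timer.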

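The third myth is the hardest to dispel, because several cases are involved.

The first and most common case is ext4/xfs, where close() does not flush to disk. We can verify this as follows: write a small program like the example above that calls write() and close() without fsync(), run it, then immediately run reboot -fn. After the machine comes back up, check whether hello.txt is empty; if it is, close() did not flush the data. Two caveats apply to this experiment:

Caveat 1: on some systems a plain reboot still flushes to disk before restarting, so no data loss is observed. Use reboot -fn instead.

Caveat 2: some people like to put hello.txt under /tmp. On some Linux distributions /tmp is implemented as tmpfs, i.e. memory, so after a reboot hello.txt is gone entirely, which is not a valid result either. In that case do not use /tmp for this experiment.

In the second case, your file system is not ext4/xfs, and things are different again. JuiceFS, for example, quietly performs an fsync() in close(), so in that case the data really has been persisted.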

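The system call that really writes to disk is `fsync()`

As described above, the system call that actually writes to disk is fsync(); it writes both the data and the metadata of the file to disk. The fsync() man page states this clearly: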

fsync() transfers (“flushes”) all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even if the system crashes or is rebooted.

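In other words, once fsync() has returned successfully, your data can be recovered even if the system crashes or reboots.

We can think of fsync() as a contract: when it returns 0, the file system takes responsibility for the integrity of the data; when it returns a non-zero value, the responsibility falls back on the programmer. That is why the return value of fsync() deserves attention: it tells us whether the contract between the programmer and Linux was successfully signed.

This is why fsync() is also called a data-integrity system call.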
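
The contract idea can be folded into a small helper that reports success only when every step succeeded. This is a minimal sketch rather than the code of the example program: write_all() and write_file_durably() are hypothetical names, and treating a zero-byte write as a failure is an assumption made here so the loop cannot spin forever.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string_view>
#include <unistd.h>

// Writes the whole buffer, retrying after short writes and after EINTR.
// Returns true only if the kernel accepted every byte.
bool write_all(int fd, std::string_view data) {
  while (!data.empty()) {
    ssize_t n = write(fd, data.data(), data.size());
    if (n < 0) {
      if (errno == EINTR)
        continue; // interrupted before anything was written: retry
      return false;
    }
    if (n == 0)
      return false; // assumption: no progress is treated as an error
    data.remove_prefix(static_cast<size_t>(n));
  }
  return true;
}

// Creates or truncates the file, writes the data and calls fsync().
// Returns true only if open(), write(), fsync() and close() all succeeded,
// i.e. only if the "contract" described in the text was signed.
bool write_file_durably(const char *path, std::string_view data) {
  assert(path != nullptr);
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return false;
  bool ok = write_all(fd, data) && fsync(fd) == 0;
  if (close(fd) < 0)
    ok = false;
  return ok;
}
```

A caller would use it as `if (!write_file_durably("/home/xxx/hello.txt", "hello world\n")) { /* the data is not known to be on disk */ }`. For a newly created file, the fsync(2) man page adds that the directory entry is only guaranteed to be on disk after an fsync() on a descriptor for the containing directory; the sketch leaves that step out.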

What `close()` does

We should not rely on close() to write to disk, because according to its man page it has no such responsibility.

close() closes a file descriptor, so that it no longer refers to any file and may be reused. Any record locks (see fcntl(2)) held on the file it was associated with, and owned by the process, are removed (regardless of the file descriptor that was used to obtain the lock).
If fd is the last file descriptor referring to the underlying open file description (see open(2)), the resources associated with the open file description are freed; if the file descriptor was the last reference to a file which has been removed using unlink(2), the file is deleted.

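Put simply, close() is responsible for releasing resources, not for flushing dirty data to disk.

Two ways of looking at a system call

The analysis above shows that a system call can be understood from two perspectives:

The user. As the caller of a function, what do I expect it to do? This is the "semantic" meaning of the function, a widely shared convention. When we reason about what a system call does, this is the level to go by, and the references are the POSIX standard and the man pages.
The implementer. As the implementer of a function, by what means do I meet the user's expectation? This is where file system implementers are free to make their own choices. Some file systems choose not to follow convention, but that does not make their behavior the general case. JuiceFS doing the work of fsync() inside close(), as mentioned above, is a positive deviation at the implementation level: it is friendlier to users, but it does not mean other file systems do the same.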

Sync I/O mode

Code first:

#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <format>
#include <iostream>
#include <string>
#include <unistd.h>

int main(int argc, char *argv[]) {
  const char *file_name{"/home/xxx/hello.txt"};

  int fd = open(file_name, O_SYNC | O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    std::cerr << std::format("failed to open file: {}, error: {}\n", file_name,
                             std::strerror(errno));
    return 1;
  }

  std::string file_content{"hello world\n"};
  if (write(fd, file_content.data(), file_content.size()) < 0) {
    std::cerr << std::format("failed to write file: {}, error: {}\n", file_name,
                             std::strerror(errno));
  }

  close(fd);

  return 0;
}

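Compared with buffered I/O, sync I/O differs in two places: open() is given the additional O_SYNC flag, and the fsync() call is gone.

Its defining feature is that the data reaches the disk during write() itself, which guarantees data integrity.

Sync I/O from the user's perspective

Let's see how POSIX3 and POSIX4 describe sync I/O.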

POSIX3:

O_SYNC
Write I/O operations on the file descriptor shall complete as defined by synchronized I/O file integrity completion.

POSIX4:

O_SYNC
Write I/O operations on the file descriptor shall complete as defined by synchronized I/O file integrity completion.
The O_SYNC flag shall be supported for regular files, even if the Synchronized Input and Output option is not supported.

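The two versions mean the same thing. Put simply, once a file has been opened with O_SYNC, each subsequent write() must synchronously complete the work needed for the file's data integrity. Two points are worth noting:

It is "synchronous": when the function returns, the work is done. As the caller, once write() returns successfully I may assume the data has been made durable, with no need to call fsync(). If your file system does not work this way, ask its implementers.
It says nothing about the cache. Does write() go through the cache or not? The standard does not say, so the caller should not assume that every file system writes to the cache. Whether it does depends on the specific file system implementation; as shown below, ext4 and xfs do write to the page cache.

Sync I/O from the implementer's perspective

The most common implementers on Linux are ext4 and xfs. Let's look at how ext4 implements sync I/O first.

There are two things to watch for:

Is the data written to the page cache?
Does the call wait synchronously for the dirty data to be flushed to disk?

The ext4 implementation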

static ssize_t
ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
	...

	ret = __generic_file_write_iter(iocb, from); // the data is written to the cache here
	...

	if (ret > 0)
		ret = generic_write_sync(iocb, ret); // the dirty data is flushed to disk here

	return ret;

out:
	inode_unlock(inode);
	return ret;
}

__generic_file_write_iter() in turn calls generic_perform_write(), which calls iov_iter_copy_from_user_atomic() to copy the user's data into the page cache. The code is as follows:

ssize_t generic_perform_write(struct file *file,
				struct iov_iter *i, loff_t pos)
{
...
  		status = a_ops->write_begin(file, mapping, pos, bytes, flags,
						&page, &fsdata);
		if (unlikely(status < 0))
			break;

		if (mapping_writably_mapped(mapping))
			flush_dcache_page(page);

		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes); // copy into the page cache
		flush_dcache_page(page);

		status = a_ops->write_end(file, mapping, pos, bytes, copied,
						page, fsdata);

...
}

generic_write_sync() is what waits for the flush to disk to complete:


/*
 * Sync the bytes written if this was a synchronous write.  Expect ki_pos
 * to already be updated for the write, and will return either the amount
 * of bytes passed in, or an error if syncing the file failed.
 */
static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
{
    // The O_SYNC flag from open() shows up here as IOCB_DSYNC|IOCB_SYNC,
    // so the if branch is taken and both metadata and data end up on disk
	if (iocb->ki_flags & IOCB_DSYNC) {
		int ret = vfs_fsync_range(iocb->ki_filp,
				iocb->ki_pos - count, iocb->ki_pos - 1,
				(iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
		if (ret)
			return ret;
	}

	return count;
}

As the code above shows, ext4 first writes to the cache, then writes to disk, and only then returns. This implements the semantics of O_SYNC.
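
The two kernel bits in generic_write_sync() have a user-space counterpart. According to the open(2) man page, since Linux 2.6.33 O_SYNC is defined as a two-bit value that contains the O_DSYNC bit, which matches the IOCB_DSYNC|IOCB_SYNC pair in the comment above. The sketch below classifies an open descriptor accordingly; SyncMode and sync_mode_of() are names invented for this illustration, and the static_assert encodes the Linux-specific assumption about the flag values.

```cpp
#include <cassert>
#include <fcntl.h>
#include <optional>

// Assumption (Linux with glibc): O_SYNC includes the O_DSYNC bit.
static_assert((O_SYNC & O_DSYNC) == O_DSYNC,
              "this sketch assumes the Linux definition of O_SYNC");

enum class SyncMode {
  None,     // neither flag: plain buffered (or direct) writes
  DataOnly, // O_DSYNC: the datasync argument of vfs_fsync_range() is 1
  Full      // O_SYNC: the datasync argument of vfs_fsync_range() is 0
};

// Classifies a descriptor by its synchronous-write flags.
// Returns std::nullopt if fcntl(F_GETFL) fails, e.g. for an invalid fd.
std::optional<SyncMode> sync_mode_of(int fd) {
  int flags = fcntl(fd, F_GETFL);
  if (flags < 0)
    return std::nullopt;
  if ((flags & O_SYNC) == O_SYNC)
    return SyncMode::Full;
  if (flags & O_DSYNC)
    return SyncMode::DataOnly;
  return SyncMode::None;
}
```

The order of the two checks matters: testing O_DSYNC first would classify every O_SYNC descriptor as DataOnly, because the O_DSYNC bit is set in both cases.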

The xfs implementation

The xfs call chain is:

xfs_file_write_iter()
	->xfs_file_buffered_write()
		->iomap_file_buffered_write() // the data is written to the cache here
		->generic_write_sync() // the dirty data is flushed to disk here

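So xfs behaves the same as ext4: it writes to the cache first, then to disk, and then returns.

Sync I/O summary

Semantically it is synchronous: the function returns only after the dirty data has been flushed to disk.
In the ext4 and xfs implementations, the data is written to the cache first and then to disk.

Direct I/O mode

Code first: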

#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <format>
#include <iostream>
#include <string>
#include <strings.h>
#include <unistd.h>

#ifndef _GNU_SOURCE
#error "NO GNU SOURCE defined\n"
#endif

int main(int argc, char *argv[]) {

  const char *file_name{"/mnt/ext4/hello.txt"};

  int fd = open(file_name, O_DIRECT | O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    std::cerr << std::format("failed to open file: {}, error: {}\n", file_name,
                             std::strerror(errno));
    return 1;
  }

  // Prepare the buffer for direct I/O
  #define SECTORSIZE 512
  alignas(512) char buf[SECTORSIZE];
  bzero(buf, SECTORSIZE);
  // Start the direct I/O
  std::string file_content{"hello world\n"};
  strcpy(buf, file_content.data());
  if (write(fd, buf, SECTORSIZE) < 0) {
    std::cerr << std::format("failed to write file: {}, error: {}\n", file_name,
                             std::strerror(errno));
    goto out;
  }

  // Shrink the file back to its real logical size
  if (ftruncate(fd, file_content.size()) < 0) {
    std::cerr << std::format("failed to ftruncate file: {}, error: {}\n",
                             file_name, std::strerror(errno));
  }

out:
  close(fd);

  return 0;
}

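As you can see, the direct I/O code is considerably more complex. Specifically, direct I/O comes with four requirements:

Define the _GNU_SOURCE macro
Pass the O_DIRECT flag to open()
The buffer (the buf array) must be aligned in memory to 512 bytes (the typical logical block size; some devices need a larger alignment such as 4096)
The size of the buffer (the buf array) must be a multiple of 512 (again, of the logical block size)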
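
The last two requirements can be met for payloads of any length with a heap buffer instead of a fixed stack array. The following is a minimal sketch under the assumption that the required alignment is a power of two (512 in the example program); round_up(), is_aligned(), AlignedBuffer and make_padded_buffer() are hypothetical helpers, not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdlib.h>
#include <string_view>

// Rounds n up to the next multiple of align (align must be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// True if the address p is a multiple of align.
inline bool is_aligned(const void *p, std::size_t align) {
  return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Owns memory obtained from posix_memalign(), which must be released by free().
struct FreeDeleter {
  void operator()(void *p) const { free(p); }
};
using AlignedBuffer = std::unique_ptr<char, FreeDeleter>;

// Copies data into a new zero-padded buffer whose address and length are
// both multiples of align. padded_size receives the length to pass to
// write(). Returns an empty pointer if the allocation fails.
AlignedBuffer make_padded_buffer(std::string_view data, std::size_t align,
                                 std::size_t &padded_size) {
  assert(align != 0 && (align & (align - 1)) == 0); // power of two
  padded_size = round_up(data.empty() ? 1 : data.size(), align);
  void *raw = nullptr;
  if (posix_memalign(&raw, align, padded_size) != 0)
    return nullptr;
  std::memset(raw, 0, padded_size);
  if (!data.empty())
    std::memcpy(raw, data.data(), data.size());
  return AlignedBuffer(static_cast<char *>(raw));
}
```

With this, the write in the example could become `write(fd, buf.get(), padded_size)`, followed by the same ftruncate() call to cut the file back to its logical size.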

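As before, we examine direct I/O from both the user's and the implementer's perspective.

The user's perspective

The awkward thing about direct I/O is that POSIX does not describe it at all. In other words, it has no standard semantics.

The Linux man page does describe it, though, so let's take that as the reference.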

Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.

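That is quite clear.

The semantics of the O_DIRECT flag come down to one thing: avoid the cache as far as possible. It does not even guarantee that the data is on disk when write() returns. To guarantee that, O_SYNC and O_DIRECT must be used together!

So for the direct I/O example program above, the correct call would be open() with O_SYNC|O_DIRECT.

Then why did the example above manage to write to disk anyway? Because the ext4 and xfs implementations return only after the write to disk has completed. That does not mean other file systems are implemented the same way.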
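
The combined call might look like the sketch below. DirectOpenResult and open_for_durable_direct_write() are hypothetical names, and the fallback is a design choice of this sketch, not something the man page prescribes: open(2) documents EINVAL for file systems that do not support O_DIRECT, and in that case the sketch retries with O_SYNC alone, since O_SYNC is the flag that carries the durability guarantee.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // O_DIRECT is a GNU/Linux extension
#endif
#include <cassert>
#include <cerrno>
#include <fcntl.h>

// The descriptor (negative on failure) and whether O_DIRECT is in effect.
struct DirectOpenResult {
  int fd;
  bool direct;
};

// Opens path for synchronous writes, bypassing the cache when possible.
DirectOpenResult open_for_durable_direct_write(const char *path) {
  assert(path != nullptr);
  const int base = O_WRONLY | O_CREAT | O_TRUNC | O_SYNC;
  int fd = open(path, base | O_DIRECT, 0644);
  if (fd >= 0)
    return {fd, true};
  if (errno == EINVAL) {
    // Assumption of this sketch: losing O_DIRECT is acceptable,
    // losing O_SYNC is not, so retry with O_SYNC only.
    fd = open(path, base, 0644);
  }
  return {fd, false};
}
```

A caller that insists on bypassing the cache can treat `direct == false` as an error instead of accepting the fallback.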

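At this point we can answer the question raised at the start of the article. We cannot crudely equate O_SYNC with "write to the cache, then to disk" and O_DIRECT with "skip the cache and write only to disk". If we did, there would be no way to make sense of O_SYNC|O_DIRECT.

Instead, we should understand O_SYNC and O_DIRECT semantically, from the user's perspective: the semantics of O_SYNC are simply "synchronous", and the semantics of O_DIRECT are simply "avoid the cache as far as possible". O_SYNC|O_DIRECT then makes perfect sense: be synchronous, and avoid the cache as far as possible.

The implementer's perspective

As when we dissected sync I/O, there are two things to look at for direct I/O:

How does it bypass the page cache?
Does it wait for the I/O to disk to complete?

The ext4 implementation

Because we used the O_DIRECT flag, the call chain is: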

__generic_file_write_iter()
	->generic_file_direct_write()
		->ext4_direct_IO()
			->ext4_direct_IO_write()
				->__blockdev_direct_IO()
					->do_blockdev_direct_IO()

The key function is do_blockdev_direct_IO(); see the comments below:

static inline ssize_t
do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
		      struct block_device *bdev, struct iov_iter *iter,
		      get_block_t get_block, dio_iodone_t end_io,
		      dio_submit_t submit_io, int flags)
{
...

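	blk_start_plug(&plug); // enter the bio layer, which sits below the file system

	retval = do_direct_IO(dio, &sdio, &map_bh);
    // do_direct_IO() eventually calls iov_iter_get_pages() to map the user-space data into the kernel and submits the bio with those pages directly, which is how the page cache is bypassed.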
	if (retval)
		dio_cleanup(dio, &sdio);

	if (retval == -ENOTBLK) {
		/*
		 * The remaining part of the request will be
		 * be handled by buffered I/O when we return
		 */
		retval = 0;
	}
	/*
	 * There may be some unwritten disk at the end of a part-written
	 * fs-block-sized block.  Go zero that now.
	 */
	dio_zero_block(dio, &sdio, 1, &map_bh);

	if (sdio.cur_page) {
		ssize_t ret2;

		ret2 = dio_send_cur_page(dio, &sdio, &map_bh);
		if (retval == 0)
			retval = ret2;
		put_page(sdio.cur_page);
		sdio.cur_page = NULL;
	}
	if (sdio.bio)
		dio_bio_submit(dio, &sdio);

	blk_finish_plug(&plug);  // submit the bios

...
	if (dio->is_async && retval == 0 && dio->result &&
	    (iov_iter_rw(iter) == READ || dio->result == count))
		retval = -EIOCBQUEUED;
	else
		dio_await_completion(dio); // wait for the bios to complete here

...
}

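To sum up: the ext4 implementation does bypass the page cache, and write() returns only after the I/O has completed.

The xfs implementation

The xfs implementation is more involved, so no code is excerpted here; the resulting behavior is the same as ext4's.

Direct I/O summary

O_DIRECT tells subsequent write() calls to avoid the cache as far as possible.
If you want the data to be on disk when write() returns, open() must be called with the flags O_SYNC|O_DIRECT.

Proving that direct I/O is in effect

Another awkward thing about direct I/O: how do you prove that you bypassed the kernel's page cache?
The page cache is transparent to user space; you can neither see nor touch it. On top of that, direct I/O sometimes falls back to ordinary buffered I/O, so how can I be sure my program really went down the direct I/O path?
The most direct way is to trace the kernel call stack of write().
Inside the kernel, the call path of direct I/O is clearly different from that of the other I/O modes. If the trace shows the direct I/O functions, then direct I/O it is.
For the details, see this article: https://zp001.blog.csdn.net/article/details/139305331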

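Mixing I/O modes

Since there are three I/O modes, can we mix them? Work long enough and you will run into someone with this idea.

Documentation first: the Linux man page advises against mixing direct I/O and buffered I/O: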

Applications should avoid mixing O_DIRECT and normal I/O to the same file, and especially to overlapping byte regions in the same file. Even when the filesystem correctly handles the coherency issues in this situation, overall I/O throughput is likely to be slower than using either mode alone.

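Unfortunately it is not very specific. For example, if I write with buffered I/O, close() the file, and then open() it again for direct I/O, does that count as mixed I/O?

But thinking about the semantics reveals a conflict that is hard to reconcile: buffered I/O writes to the cache, and until fsync() is called the cache holds the new data while the disk holds the old. If you issue a direct read() at that point, should the file system return the new data or the old?

We can write a small program to test this scenario: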


#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <format>
#include <iostream>
#include <string>
#include <unistd.h>

int main(int argc, char *argv[]) {
  const char *file_name{"/mnt/ext4/hello.txt"};

  int fd = open(file_name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    std::cerr << std::format("failed to open file: {}, error: {}\n", file_name,
                             std::strerror(errno));
    return 1;
  }

  std::string file_content{"hello world\n"};
  if (write(fd, file_content.data(), file_content.size()) < 0) {
    std::cerr << std::format("failed to write file: {}, error: {}\n", file_name,
                             std::strerror(errno));
  }

  // Note that fsync() is deliberately not called here

  close(fd);

  fd = open(file_name, O_DIRECT | O_RDONLY);
  if (fd < 0) {
    std::cerr << std::format("failed to open file directly: {}, error: {}\n",
                             file_name, std::strerror(errno));
    return 1;
  }

  alignas(512) char read_buf[1024];
  memset(read_buf, '\0', 1024);
  if (read(fd, read_buf, 1024) < 0) {
    std::cerr << std::format("failed to read file: {}, error: {}\n", file_name,
                             std::strerror(errno));
  } else {
    std::cout << "file content: " << read_buf << '\n';
  }

  close(fd);

  return 0;
}

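The result shows that on ext4, the direct read() returns the latest content.

Why is that?

Because before performing a direct read(), ext4 first writes the content of the page cache to disk and only then does the direct read(). The relevant kernel code is: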

ssize_t
generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
	size_t count = iov_iter_count(iter);
	ssize_t retval = 0;

	if (!count)
		goto out; /* skip atime */

	if (iocb->ki_flags & IOCB_DIRECT) { // direct I/O branch
		struct file *file = iocb->ki_filp;
		struct address_space *mapping = file->f_mapping;
		struct inode *inode = mapping->host;
		loff_t size;

		size = i_size_read(inode);
		if (iocb->ki_flags & IOCB_NOWAIT) {
			if (filemap_range_has_page(mapping, iocb->ki_pos,
						   iocb->ki_pos + count - 1))
				return -EAGAIN;
		} else {
			retval = filemap_write_and_wait_range(mapping,
						iocb->ki_pos,
					        iocb->ki_pos + count - 1); // first write back any page cache content in this range
			if (retval < 0)
				goto out;
		}

		file_accessed(file);
		// the direct I/O for the read starts here
		retval = mapping->a_ops->direct_IO(iocb, iter);
		if (retval >= 0) {
			iocb->ki_pos += retval;
			count -= retval;
		}
...
}

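As shown above, ext4's direct I/O path does its best to take care of the page cache, that is, of the mixed I/O problem, so that applications work correctly.

If ext4 behaves this way, what about other file systems? All we can say is that they very likely do too, since file system implementers imitate one another, but there is no guarantee.

About `fcntl()`

fcntl() is a system call that can modify the open flags dynamically, so it can add or remove O_DIRECT on the fly. This complicates matters: if dynamic changes are allowed, does that count as mixed I/O? And is using fcntl() itself a problem?

According to the man page, fcntl() can be used to change O_DIRECT: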

On Linux, this command can change only the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags. It is not possible to change the O_DSYNC and O_SYNC flags;

An `fcntl()` example

#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <format>
#include <iostream>
#include <string>
#include <strings.h>
#include <unistd.h>

bool add_direct_flag(int fd) {
  int flags;
  flags = fcntl(fd, F_GETFL);
  if (flags < 0) {
    std::cerr << "failed to get flags\n";
    return false;
  }

  flags |= O_DIRECT; // add the O_DIRECT flag

  int ret;
  ret = fcntl(fd, F_SETFL, flags);
  if (ret < 0) {
    std::cerr << "failed to set flags\n";
    return false;
  }
  return true;
}

int main(int argc, char *argv[]) {
  const char *file_name{"/mnt/ext4/hello.txt"};

  int fd = open(file_name, O_RDWR | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    std::cerr << std::format("failed to open file: {}, error: {}\n", file_name,
                             std::strerror(errno));
    return 1;
  }

  std::string file_content{"hello world\n"};
  if (write(fd, file_content.data(), file_content.size()) < 0) {
    std::cerr << std::format("failed to write file: {}, error: {}\n", file_name,
                             std::strerror(errno));
  }

  if (!add_direct_flag(fd))
    return 1;

  if (lseek(fd, 0, SEEK_SET) < 0) {
    std::cerr << "failed to lseek()\n";
    return 1;
  }

  alignas(512) char read_buf[1024];
  memset(read_buf, '\0', 1024);
  if (read(fd, read_buf, 1024) < 0) {
    std::cerr << std::format("failed to read file: {}, error: {}\n", file_name,
                             std::strerror(errno));
  } else {
    std::cout << "file content: " << read_buf << '\n';
  }

  close(fd);

  return 0;
}

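This also runs successfully, which shows that this way of mixing I/O works on ext4.
That does not mean other file systems behave the same. In short, test before you rely on it.

Why so much caution?

Because in the end, fcntl() merely modifies the flag variable that was set at open() time. Whether the change takes effect in later read() and write() calls still depends on the specific file system implementation. The relevant kernel code is: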

do_fcntl()
	->setfl()

All setfl() does is modify the f_flags variable:

static int setfl(int fd, struct file * filp, unsigned long arg)
{
...
	spin_lock(&filp->f_lock);
	filp->f_flags = (arg & SETFL_MASK) | (filp->f_flags & ~SETFL_MASK); // arg is the flag value passed in by the user
	spin_unlock(&filp->f_lock);

 out:
	return error;
}
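
Since F_SETFL only rewrites the flag word, about the only thing a program can confirm from user space is that the flag reads back as requested. The sketch below generalizes add_direct_flag() to set or clear a flag and then re-read it with F_GETFL; set_status_flag() is a hypothetical helper. A true result says the flag is recorded on the descriptor, not that the file system will honor it on the next read() or write().

```cpp
#include <cassert>
#include <fcntl.h>

// Sets (enable == true) or clears one file status flag with F_SETFL, then
// re-reads the flags with F_GETFL. Returns true only if the flag ended up
// in the requested state.
bool set_status_flag(int fd, int flag, bool enable) {
  assert(flag != 0);
  int flags = fcntl(fd, F_GETFL);
  if (flags < 0)
    return false;
  int wanted = enable ? (flags | flag) : (flags & ~flag);
  if (fcntl(fd, F_SETFL, wanted) < 0)
    return false;
  int now = fcntl(fd, F_GETFL);
  if (now < 0)
    return false;
  bool is_set = (now & flag) == flag;
  return is_set == enable;
}
```

In the example program, `set_status_flag(fd, O_DIRECT, true)` would take the place of add_direct_flag(fd), and `set_status_flag(fd, O_DIRECT, false)` would switch back to buffered I/O. For flags the man page lists as unchangeable, such as O_SYNC, the read-back check is expected to report false on Linux, because bits outside SETFL_MASK are dropped by setfl().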