How the buffer cache works

On a typical system, approximately 85% of disk I/O can be avoided by using the buffer cache, though this depends on the mix of jobs running. The buffer cache is created in an area of kernel memory and is never swapped out. Although the buffer cache can be regarded as a memory resource, it is primarily an I/O resource due to its use in mediating data transfer.

When a user process issues a read request, the operating system searches the buffer cache for the requested data. If the data is in the buffer cache, the request is satisfied without accessing the physical device. It is quite likely that data to be read is already in the buffer cache because the kernel copies an entire block containing the data from disk into memory. This allows any subsequent data falling within that block to be read more quickly from the cache in memory, rather than having to re-access the disk. The kernel also performs read-ahead of blocks on the assumption that most files are accessed from beginning to end.
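
The read path described above can be sketched as a toy model. This is only an illustration, not the kernel's implementation: the class name `ToyReadCache`, the `read_ahead` parameter, and the one-block read-ahead depth are assumptions made for the example, and the "disk" is just a bytes object.

```python
BLOCK_SIZE = 1024  # 1KB logical block, as described for non-DTFS filesystems


class ToyReadCache:
    """Toy model of block-granular read caching (hypothetical, not kernel code)."""

    def __init__(self, disk: bytes, read_ahead: int = 1):
        self.disk = disk              # stand-in for the physical device
        self.read_ahead = read_ahead  # assumed read-ahead depth, in blocks
        self.blocks = {}              # block number -> cached block contents
        self.disk_reads = 0           # number of blocks fetched from "disk"

    def _load(self, blkno: int) -> None:
        """Copy one whole block from disk into the cache if it is not there."""
        start = blkno * BLOCK_SIZE
        if blkno in self.blocks or start >= len(self.disk):
            return
        self.blocks[blkno] = self.disk[start:start + BLOCK_SIZE]
        self.disk_reads += 1

    def read(self, offset: int, length: int) -> bytes:
        """Return `length` bytes at `offset`, going to disk only on a miss."""
        out = bytearray()
        end = min(offset + length, len(self.disk))
        while offset < end:
            blkno, within = divmod(offset, BLOCK_SIZE)
            if blkno not in self.blocks:
                # Miss: fetch the whole block, then read ahead.
                self._load(blkno)
                for i in range(1, self.read_ahead + 1):
                    self._load(blkno + i)
            chunk = self.blocks[blkno][within:within + (end - offset)]
            out += chunk
            offset += len(chunk)
        return bytes(out)
```

In this model, a first small read at offset 0 fetches block 0 plus one read-ahead block. Later reads that fall anywhere inside those two blocks leave `disk_reads` unchanged.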

The data area of each buffer for filesystems other than DTFS is 1KB, which is the same size as a filesystem logical block and twice the typical physical disk block size of 512 bytes. DTFS filesystems use buffers with data areas in multiples of 512 bytes, from 512 bytes to 4KB.

When data is written to the disk, the kernel first checks the buffer cache to see whether the block containing the address to be written is already in memory. If it is, the block found in the buffer cache is updated; if not, the block must first be read into the buffer cache so that the existing data can be overwritten.

When the kernel writes data to a buffer, it marks the buffer as delayed-write. This means that the buffer must be written to disk before it can be re-used. Writing data to the buffer cache allows multiple updates to occur in memory rather than having to access the disk each time. Once a buffer has aged in memory for a set interval, it is flushed to disk by the buffer flushing daemon, bdflush.
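
The write path and the delayed-write marking can be sketched in the same toy style. The class `ToyWriteCache` and its `flush` method are hypothetical names for this example. The sketch assumes the disk size is a multiple of the block size and that writes stay within the disk. It leaves out aging, so `flush` writes every delayed-write block at once.

```python
BLOCK_SIZE = 1024  # 1KB buffers, as described for non-DTFS filesystems


class ToyWriteCache:
    """Toy model of delayed-write buffering (hypothetical, not kernel code)."""

    def __init__(self, disk: bytearray):
        self.disk = disk          # stand-in for the physical device
        self.blocks = {}          # block number -> cached bytearray
        self.delayed = set()      # blocks marked delayed-write
        self.disk_reads = 0
        self.disk_writes = 0

    def write(self, offset: int, data: bytes) -> None:
        """Update cached blocks in memory; the disk is not written here."""
        pos = 0
        while pos < len(data):
            blkno, within = divmod(offset + pos, BLOCK_SIZE)
            if blkno not in self.blocks:
                # Block is not cached: read it in first so the existing
                # contents can be partially overwritten.
                start = blkno * BLOCK_SIZE
                self.blocks[blkno] = bytearray(self.disk[start:start + BLOCK_SIZE])
                self.disk_reads += 1
            n = min(len(data) - pos, BLOCK_SIZE - within)
            self.blocks[blkno][within:within + n] = data[pos:pos + n]
            self.delayed.add(blkno)   # must reach disk before buffer re-use
            pos += n

    def flush(self) -> None:
        """Write every delayed-write block to disk (no aging in this sketch)."""
        for blkno in sorted(self.delayed):
            start = blkno * BLOCK_SIZE
            self.disk[start:start + BLOCK_SIZE] = self.blocks[blkno]
            self.disk_writes += 1
        self.delayed.clear()
```

Several writes to the same block between flushes cost one disk write in this model, which is the saving the paragraph above describes.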

The kernel parameter NAUTOUP specifies how long a delayed-write buffer can remain in the buffer cache before its contents are written to disk. The default value of NAUTOUP is 10 seconds, and it can be set to any value from 0 to 60. It does not cause a buffer to be written at precisely NAUTOUP seconds, but at the next buffer flush following this time interval.
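
The timing rule can be illustrated with a small calculation. This sketch assumes, purely for illustration, that flushes happen at exact multiples of a fixed flush interval counted from time 0, and that a flush occurring exactly when a buffer reaches NAUTOUP seconds of age picks it up. The function name `write_time` is hypothetical. The 30-second default for the flush interval is the default this document gives for the bdflush flushing interval.

```python
def write_time(dirtied_at: int, nautoup: int = 10, flush_interval: int = 30) -> int:
    """Return the time (in seconds) at which a buffer dirtied at
    `dirtied_at` would be written, under the assumptions stated above."""
    eligible_at = dirtied_at + nautoup           # buffer has aged NAUTOUP seconds
    passes = -(-eligible_at // flush_interval)   # ceiling division
    return passes * flush_interval               # first flush at or after that


# A buffer dirtied at t=5 is eligible at t=15 but waits for the flush at t=30;
# one dirtied at t=21 is eligible at t=31 and waits for the flush at t=60.
for t in (5, 20, 21):
    print(t, write_time(t))
```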

Although the system buffer cache significantly improves overall system throughput, data that remains in the buffer cache and has not yet been written to disk may be lost in the event of a system power failure or a kernel panic. This is because data scheduled to be written to a physical device is erased from physical memory (which is volatile) as a consequence of the crash.

The default flushing interval of the buffer flushing daemon, bdflush, is 30 seconds. The kernel parameter BDFLUSHR controls the flushing interval. You can configure BDFLUSHR to take a value in the range 1 to 300 seconds.

If your system crashes, you will lose NAUTOUP + (BDFLUSHR/2) seconds of data on average. With the default values of these parameters, this corresponds to 25 seconds of data. Decreasing BDFLUSHR will increase data integrity but will also increase system overhead. The converse is true if you increase the interval.
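
The arithmetic can be checked directly. The helper below simply evaluates the formula from the text. The function name is hypothetical, and the non-default BDFLUSHR values are just the ends of the documented 1 to 300 second range.

```python
def average_loss_seconds(nautoup: float = 10, bdflushr: float = 30) -> float:
    """Average amount of data (in seconds) lost in a crash,
    using the formula NAUTOUP + (BDFLUSHR / 2) from the text."""
    return nautoup + bdflushr / 2


# Defaults, then the smallest and largest flushing intervals with NAUTOUP = 10.
for bdflushr in (30, 1, 300):
    print(f"BDFLUSHR={bdflushr:>3}s -> {average_loss_seconds(10, bdflushr)}s at risk")
```

With the defaults the function returns 25.0, matching the figure above.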

Apart from adjusting the aging and flushing intervals, you can also control the size of the buffer cache. The kernel parameter NBUF determines the amount of memory in kilobytes that is available for buffers. If you are using the DTFS filesystem, the value of NBUF does not correspond to the actual number of buffers in use. The default value of NBUF is 0; this causes the kernel to allocate approximately 10% of available physical memory to buffers.


The size of the buffer cache in kilobytes is displayed when the system starts up and is also recorded in the file /usr/adm/messages. Look for a line of the form:

   kernel: Hz = 100, i/o bufs = numberk

If there are any buffers in memory above the first 16MB, the line may take the form:

   kernel: Hz = 100, i/o bufs = numberk  (high bufs = numberk)
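
If you want to pick these values out of the messages file with a script, a regular expression along the following lines may help. This is a sketch based only on the two line forms shown above. The function name is hypothetical, and any numbers you pass in while trying it out are your own sample values, not real system output.

```python
import re

# Matches "i/o bufs = <number>k", optionally followed by "(high bufs = <number>k)".
_BUFS_LINE = re.compile(r"i/o bufs = (\d+)k(?:\s*\(high bufs = (\d+)k\))?")


def parse_buffer_cache_line(line: str):
    """Return (total_kb, high_kb) from a startup message line.

    high_kb is None when the line has no "high bufs" part; the function
    returns None when the line does not match at all.
    """
    m = _BUFS_LINE.search(line)
    if m is None:
        return None
    total_kb = int(m.group(1))
    high_kb = int(m.group(2)) if m.group(2) is not None else None
    return total_kb, high_kb
```

To scan the whole file, you could call the function on each line of /usr/adm/messages and keep the last match, which is likely to correspond to the most recent startup.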

The amount of memory reserved automatically for buffers may not be optimal, depending on the mix of applications that a system will run. For example, you may need to increase the buffer cache size on a networked file server to make disk I/O more efficient and increase throughput. You might also find that you can reduce the buffer cache size on the clients of the file server, because the applications that they run tend to access a small number of files. It is usually beneficial to do this because it increases the amount of physical memory available for user processes.
