有关postgresql checkpoint_write_time与checkpoint_sync_time的理解

在stackexchange上找到一篇关于checkpoint_write_time与checkpoint_sync_time的讨论摘录如下:
原文链接

I know what PostgreSQL checkpoint is and when it is happening.
I need some additional information about the logs produced by the log_checkpoints = on parameter, so please explain some points of it to me:
2017-09-09 16:31:37 EEST [6428-6524] LOG: checkpoint complete: wrote 30057 buffers (22.9%); 0 transaction log file(s) added, 0 removed, 47 recycled; write=148.465 s, sync=34.339 s, total=182.814 s; sync files=159, longest=16.143 s, average=0.215 s
1.
I know that 22.9% of shared buffers are written (I have 1024 MB shared_buffers so that means 234 MB are written out).
2.
I know that 47 WAL files are recycled, i.e., they are not needed anymore for crash recovery, because the real data from them is already on disk.

Question A. But what about write=148.465 s and sync=34.339? What is the difference? What is write and why its time is far more than the fsync() operation?
Question B. What are sync files? Which files: WAL files? Why sync files are 159, but there are only 47 recycled files? What is the relation between these?

To your question A: “write” and “sync” mean exactly what they say; it’s a sequence of buffered (as in filesystem I/O buffers) writes of dirty pages from the Postgres buffer pool, followed by fsync() (usually) to ensure that filesystem buffers are in fact written to disk. Writes are throttled and happen in the background, so they can take longer.
To your question B: sync files tells you how many OS files have been affected by the checkpoint. As each table and index is stored in its own file, the number gives you an idea of how many database objects have been affected by the checkpoint. WALs are synchronized when the WAL buffer is flushed, that is, way before the checkpoint; that’s why it’s called “write ahead”.

The write and sync phase mean rather different things (unfortunately). In the middle phase of a checkpoint, it writes out all the dirty buffers from shared buffers. It does this in a throttled way, mainly regulated by checkpoint_completion_target. The time reported here is the time it spent writing those buffers, as well as the time it spend sleeping in order to regulate the rate at which the buffers get written. So not all of the “write” time was spent actively writing. The length of time spent in the write phase, 2.5 minutes, doesn’t really mean anything without knowing what your other settings are.
During the sync phase, it syncs all of those files it just wrote to (as well as any data files other processes may have written to). This stage is unthrottled, it syncs them as fast as it can. That this took 34 seconds, with one file in there taking 16.143 seconds, suggests that your IO hardware is not up to the workload you are throwing at it.
The 159 synced files refers to the number of data files that had been dirtied since the end of the last checkpoint, and so needed to be synced before the end of this one. The 47 refers to WAL files, not data files. WAL files contain references to changes made to data files. It is quite easy for 47 WAL files to contain references for 159 data files (or for that matter, 15,900 data files). But it is not quite that simple, because the WAL files being recycled are the ones from two checkpoints ago, so the relation between 47 and 159 is an analogy, not a concrete count.

个人理解:
1.这里write之所以比sync慢,是因为write是节流写入(throttled),受checkpint_completion_target参数调节,如果进程的产生脏页的速度超过限制,就让进程主动sleep,而sync是不受限制尽可能快的写入(unthrottle)
2.write是指从shared buffer写dirty page到filesystem buffer,一般会跟随fsync,确保data真正落盘
• checkpoint_completion_target=0.5
• checkpoint_timeout = 5min
• 100G数据(需要刷进磁盘的数据量)
• 1G/s(刷盘速度)
• 100/(0.5*5*60)*1024≈670M/s (数据写入速度
比如100G数据,1g/s写入速度,write的速度将是100/(checkpoint_timeout*60*checkpoint_completion_target)*1024=100*1024/5*60*0.5=682m/s
以上仅个人理解,欢迎指正

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值