我们继续学习了解MogDB的核心线程。对于关系数据库而言,检查点是一个尤为重要的机制,检查点机制的优劣可以说决定了数据库性能的高低和稳定性。
这里我们继续来学习了解在MogDB数据库中,checkpoint机制是怎么样的。
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
[omm@mogdb ~]$ gsql -d enmotech -p 26000
gsql ((MogDB 2.0.0 build b75b585a) compiled at 2021-05-28 17:20:47 commit 0 last mr )
NOTICE : The password has been expired, please change the password.
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
enmotech=# \copyright
MogDB Kernel Database Management System
Copyright (c) Yunhe Enmo (Beijing) Information Technology Co., Ltd. Copyright © 2009-2020 , All rights reserved.
enmotech=# select name,setting,category,context from pg_settings where name like '%checkpoint%';
name | setting | category | context
--------------------------------+---------+-------------------------------------+------------
checkpoint_completion_target | 0.5 | Write-Ahead Log / Checkpoints | sighup
checkpoint_flush_after | 32 | Write-Ahead Log / Checkpoints | sighup
checkpoint_segments | 64 | Write-Ahead Log / Checkpoints | sighup
checkpoint_timeout | 900 | Write-Ahead Log / Checkpoints | sighup
checkpoint_wait_timeout | 60 | Write-Ahead Log / Checkpoints | sighup
checkpoint_warning | 300 | Write-Ahead Log / Checkpoints | sighup
enable_incremental_checkpoint | on | Write-Ahead Log / Checkpoints | postmaster
incremental_checkpoint_timeout | 60 | Write-Ahead Log / Checkpoints | sighup
log_checkpoints | off | Reporting and Logging / What to Log | sighup
(9 rows)
enmotech=#
首先我们从参数来看,涉及到检查点相关的参数一共有9个。这里针对上述几个参数进行简单解释;
checkpoint_completion_target 检查点完成的目标,默认参数为0.5,表示每个checkpoint需要在checkpoints间隔时间的50%内完成
checkpoint_flush_after 设置checkpointer线程在连续写多少个磁盘页后会进行异步刷盘操作,默认值为32,表示checkpointer线程连续写32个磁盘页,即32*8=256KB磁盘空间后会进行异步刷盘。
checkpoint_segments 设置checkpoint_timeout周期内所保留的最少WAL日志段文件数量
checkpoint_timeout 自动WAL检查点之间的最长时间,默认为900s
checkpoint_wait_timeout 请求检查点等待checkpointer线程启动的最长时间,默认为60s
checkpoint_warning 填充检查点段文件导致检查点发生的时间间隔接近这个参数表示的秒数,就向服务器日志发送一个建议增加checkpoint_segments值的消息;有大量warnning说明checkpoint_segments设置偏小
enable_incremental_checkpoint 表示是否启用增量检查点功能
incremental_checkpoint_timeout 增量检查点开关打开之后,设置自动WAL检查点之间的最长时间
对于增量检查点技术而言,我们Oracle DBA们并不陌生,因为在Oracle 8i就已经有了。通过增量检查点技术,可以让数据库相对均匀的进行脏数据刷新;
同时可以减少数据库实例崩溃后的恢复时间。实际上就现在而言,很多数据库都还没有实现增量检查点功能,比如PostgreSQL或者国产的达梦数据库。
就拿postgreSQL来讲,每次检查点出发后,脏页需要在checkpoint_timeout*checkpoint_completion_target的时间范围内必须写完,如果脏数据较多,而IO又不是太强劲的话,那么可能有一些问题。
因为每次进行大量脏页写出,势必会影响整个数据库。可能这也是为什么之前在测试PostgreSQL发现性能抖动比较厉害,峰刺较多。
我们通过创建测试表并跟踪checkpoint 线程来观察一下:
++++创建测试表
enmotech=# insert into test1123 select * from test1123;
INSERT 0 153856
enmotech=# insert into test1123 select * from test1123;
INSERT 0 307712
enmotech=# checkpoint;
CHECKPOINT
enmotech=#
enmotech=# SELECT OID,relname FROM pg_class where OID=16453;
oid | relname
-------+----------
16453 | test1123
(1 row)
++++获取checkpoint线程信息
[omm@mogdb ~]$ ps -ef|grep mogdb |grep -v grep
avahi 9129 1 0 01:02 ? 00:00:03 avahi-daemon: running [mogdb.local]
omm 16305 1 99 07:01 pts/1 00:15:26 /data/mogdb/bin/mogdb -D /data/mogdb_b75b585a/data/db1
[omm@mogdb ~]$ ps -T -p 16305
PID SPID TTY TIME CMD
16305 16305 pts/1 00:00:01 mogdb
16305 16306 pts/1 00:00:00 jemalloc_bg_thd
16305 16309 pts/1 00:00:00 mogdb
16305 16310 pts/1 00:00:00 syslogger
16305 16311 pts/1 00:00:00 alarm
16305 16312 pts/1 00:00:00 jemalloc_bg_thd
16305 16313 pts/1 00:00:00 jemalloc_bg_thd
16305 16314 pts/1 00:00:00 reaper
16305 16315 pts/1 00:00:00 jemalloc_bg_thd
16305 16341 pts/1 00:00:00 checkpointer
16305 16343 pts/1 00:00:02 pagewriter
16305 16344 pts/1 00:00:00 pagewriter
16305 16345 pts/1 00:00:00 bgwriter
16305 16346 pts/1 00:00:00 bgwriter
16305 16347 pts/1 00:00:00 CBMwriter
16305 16348 pts/1 00:11:33 WALwriter
16305 16349 pts/1 00:00:00 WALwriteraux
16305 16350 pts/1 00:00:00 AVClauncher
16305 16351 pts/1 00:00:00 Jobscheduler
16305 16352 pts/1 00:00:00 statscollector
16305 16353 pts/1 00:00:01 snapshotworker
16305 16354 pts/1 00:03:45 percentworker
16305 16355 pts/1 00:00:06 ashworker
16305 16356 pts/1 00:00:02 TrackStmtWorker
16305 16357 pts/1 00:00:00 auditor
16305 16358 pts/1 00:00:00 2pccleaner
16305 16359 pts/1 00:00:00 faultmonitor
[omm@mogdb ~]$
++++跟踪checkpoint线程
[root@mogdb ~]# strace -fr -o /tmp/16341.log -p 16341
strace: Process 16341 attached with 28 threads
strace: Process 16600 attached
strace: Process 16601 attached
strace: Process 16602 attached
^Cstrace: Process 16341 detached
strace: Process 16305 detached
strace: Process 16306 detached
strace: Process 16309 detached
strace: Process 16310 detached
strace: Process 16311 detached
strace: Process 16312 detached
strace: Process 16313 detached
strace: Process 16314 detached
strace: Process 16315 detached
strace: Process 16343 detached
strace: Process 16344 detached
strace: Process 16345 detached
strace: Process 16346 detached
strace: Process 16347 detached
strace: Process 16348 detached
strace: Process 16349 detached
strace: Process 16350 detached
strace: Process 16351 detached
strace: Process 16352 detached
strace: Process 16353 detached
strace: Process 16354 detached
strace: Process 16355 detached
strace: Process 16356 detached
strace: Process 16357 detached
strace: Process 16358 detached
strace: Process 16359 detached
strace: Process 16586 detached
通过查看跟踪文件,我们发现checkpoint线程即16341仅仅写了26号文件:
[root@mogdb tmp]# cat /tmp/16341.log |grep "pwrite"|grep 16341
16341 0.000004 pwrite64(26, "\0\0\0\0\256e\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 0 <unfinished ...>
16341 0.000016 <... pwrite64 resumed> ) = 8192
16586 0.000047 pwrite64(112, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 51634176 <unfinished ...>
16346 0.000007 pwrite64(112, "\0\0\0\0p\225\32\35RU@\0\240\0\340\0\0 \6 \0\0\0\0\215\25\1\0\0\0\0\0"..., 8192, 51634176 <unfinished ...>
16346 0.000010 pwrite64(112, "\0\0\0\0p\225\32\35\255\31@\0\240\0\340\0\0 \6 \0\0\0\0\215\25\1\0\0\0\0\0"..., 8192, 51634176 <unfinished ...>
16341 0.000011 pwrite64(26, "\0\0\1\0\r&\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 0 <unfinished ...>
16341 0.000003 <... pwrite64 resumed> ) = 8192
[root@mogdb tmp]#
+++获取进程操作句柄信息
[omm@mogdb ~]$
[omm@mogdb ~]$ ls -ltr /proc/16305/fd
total 0
l-wx------. 1 omm dbgrp 64 Nov 23 07:13 2 -> pipe:[156914]
lrwx------. 1 omm dbgrp 64 Nov 23 07:15 99 -> /data/mogdb_b75b585a/data/db1/base/16423/14619
lrwx------. 1 omm dbgrp 64 Nov 23 07:15 98 -> /data/mogdb_b75b585a/data/db1/base/16423/14743
......
lr-x------. 1 omm dbgrp 64 Nov 23 07:15 31 -> pipe:[156210]
l-wx------. 1 omm dbgrp 64 Nov 23 07:15 30 -> pipe:[156920]
l-wx------. 1 omm dbgrp 64 Nov 23 07:15 3 -> /data/mogdb_b75b585a/data/db1/pg_ctl.lock
lr-x------. 1 omm dbgrp 64 Nov 23 07:15 29 -> pipe:[156920]
lrwx------. 1 omm dbgrp 64 Nov 23 07:15 28 -> /data/mogdb_b75b585a/data/db1/pg_cbm/pg_xlog_1_0000000009000258_0000000000000000.cbm
lrwx------. 1 omm dbgrp 64 Nov 23 07:15 27 -> /data/mogdb_b75b585a/data/db1/global/pg_dw_single
lrwx------. 1 omm dbgrp 64 Nov 23 07:15 26 -> /data/mogdb_b75b585a/data/db1/global/pg_dw
lr-x------. 1 omm dbgrp 64 Nov 23 07:15 25 -> pipe:[156209]
......
我们可以看到26号文件即double write文件。对于double write双写的情况,mogdb也提供了一个如下的性能试图,便于进行查看读写情况:
postgres=# select * from dbe_perf.global_double_write_status;
node_name | curr_dwn | curr_start_page | file_trunc_num | file_reset_num | total_writes | low_threshold_writes | high_threshold_writes | total_pages | low_threshold_pages | high_threshold_pages
-----------+----------+-----------------+----------------+----------------+--------------+----------------------+-----------------------+-------------+---------------------+----------------------
dn_6001 | 1 | 9759 | 30 | 1 | 26 | 1 | 18 | 16342 | 6 | 14411
(1 row)
postgres=# checkpoint;
CHECKPOINT
postgres=# select * from dbe_perf.global_double_write_status;
node_name | curr_dwn | curr_start_page | file_trunc_num | file_reset_num | total_writes | low_threshold_writes | high_threshold_writes | total_pages | low_threshold_pages | high_threshold_pages
-----------+----------+-----------------+----------------+----------------+--------------+----------------------+-----------------------+-------------+---------------------+----------------------
dn_6001 | 1 | 9759 | 31 | 1 | 26 | 1 | 18 | 16342 | 6 | 14411
(1 row)
postgres=#
最后简单总结一下检查点线程:
1、检查点线程是单线程,从功能上来讲跟其他关系型数据库类型,可以理解为一个触发事件;通知bgwriter和pagewriter等线程写脏页。
2、MogDB使用的是增量检查点,通过增量检查点技术,可以极大的降低实例崩溃后的恢复时间,同时由于让数据库写脏更加均匀,因此性能相对稳定(实际上目前很多国产数据库还没有增量检查点技术)。
3、MogDB提供了多种存储引擎,其中MOT 内存表和增量检查点技术是不兼容的。