背景
通常普通的监控会包括系统资源的监控:
cpu
io
内存
网络
等,但是仅凭资源的监控,当问题发生时,如何快速的定位到问题在哪里?需要更高级的监控:
更高级的监控方法通常是从数据库本身的特性触发,但是需要对数据库具备非常深刻的理解,才能做出好的监控和诊断系统。属于专家型或叫做经验型的监控和诊断系统。
[[《[未完待续] PostgreSQL 一键诊断项 - 珍藏级》](http://www.sipaiyibiao.com/article/liuliangyibiao/625.html)
《PostgreSQL 实时健康监控 大屏 - 低频指标 - 珍藏级》
《PostgreSQL 实时健康监控 大屏 - 高频指标(服务器) - 珍藏级》
《PostgreSQL 实时健康监控 大屏 - 高频指标 - 珍藏级》
《PostgreSQL pgmetrics - 多版本、健康监控指标采集、报告》
《PostgreSQL pg_top pgcenter - 实时top类工具》
《PostgreSQL、Greenplum 日常监控 和 维护任务 - 最佳实践》
《PostgreSQL 如何查找TOP SQL (例如IO消耗最高的SQL) (包含SQL优化内容) - 珍藏级》
《PostgreSQL 锁等待监控 珍藏级SQL - 谁堵塞了谁》
然而数据库在不断的演进,经验型的诊断系统好是好,但是不通用,有没有更加通用,有效的发现系统问题的方法?
AWS与Oracle perf insight的思路非常不错,实际上就是等待事件的统计追踪,作为性能诊断的方法。
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html
《AWS performance insight》
简单来说就是对系统不停的打点,例如每秒一个采样,仅记录这一秒数据库活跃的会话(包括等待中的会话),等待事件,QUERY,时间,用户,数据库。这几个指标。
活跃度会话,不管是在耗费CPU,还是在等待(锁,IO)或者其他,实际上都是占用了资源的。可以算出平均的活跃会话(例如10秒的平均值,5秒的平均值)(avg active sessions)。
这个avg active sessions是一个值,这个值和数据库实例的CPU个数进行比较,就可以衡量出系统是否存在瓶颈(当avg active sessions超过CPU个数时,说明存在瓶颈)。
当某个时间窗口存在瓶颈,瓶颈在哪里,则可以通过这个时间窗口内的打点明细,进行统计。等待事件,QUERY,用户,数据库。
PostgreSQL打点的方法也很多:
1、(推荐)通过pg_stat_activity 内存中的动态视图获取,每秒取一次ACTIVE的内容(例如:会话ID,等待事件,QUERY,时间,用户,数据库)。
https://www.postgresql.org/docs/11/monitoring-stats.html#MONITORING-STATS-VIEWS
2、(不推荐)开启审计日志,在审计日志中获取,这个在高并发系统中,不太好用。并且审计日志是在结束时打印,一个QUERY的中间执行过程并不完全是占用CPU或其他资源的,所以审计日志获取的信息对于perf insight并没有什么效果。
perf insight的入门门槛低,可以摆平很多问题,在出现问题时快速定位到问题SQL,问题的等待事件在哪里。结合经验型的监控,可以构建PG非常强大的监控、诊断、优化体系。
perf insight 实现讲解
pic
pic
pic
pic
pic
pic
pic
举例1
会话1
postgres=# begin;
BEGIN
postgres=# lock table abc in access exclusive mode ;
LOCK TABLE
会话2
postgres=# select * from abc;
从pg_stat_activity获取状态,可以看到会话2在等待,会话处于active状态,这种消耗需要被记录到avg active session中,用来评估资源消耗指标。
postgres=# select now(),state,datname,usename,wait_event_type,wait_event,query from pg_stat_activity where state in (‘active’, ‘fastpath function call’);
now | state | datname | usename | wait_event_type | wait_event | query
-------------------------------±-------±---------±---------±----------------±-----------±-------------------------------------------------------------------------------------------
2019-01-25 21:17:28.540264+08 | active | postgres | postgres | | | select datname,usename,query,state,wait_event_type,wait_event,now() from pg_stat_activity;
2019-01-25 21:17:28.540264+08 | active | postgres | postgres | Lock | relation | select * from abc;
(2 rows)
举例2
使用pgbench压测数据库,每秒打点,后期进行可视化展示
pgbench -i -s 100
1、压测只读
pgbench -M prepared -n -r -P 1 -c 64 -j 64 -T 300 -S
2、查看压测时的活跃会话状态
postgres=#
select now()::timestamptz(0),state,
datname,usename,wait_event_type,wait_event,query
from pg_stat_activity
where state in
(‘active’, ‘fastpath function call’)
and pid<>pg_backend_pid();
now | state | datname | usename | wait_event_type | wait_event | query
---------------------±-------±---------±---------±----------------±-----------±------------------------------------------------------
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | Client | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts WHERE aid = $1;
2019-01-25 21:28:52 | active | postgres | postgres | | | SELECT abalance FROM pgbench_accounts