由于墨天轮后台DB使用的PostgreSQL,对于数据库来说,监控必不可少,同时为了满足云平台上客户对PG监控的需求,开发了一套针对PG的无插件web实时监控程序。
pgDsh——性能数据历史记录
pgMonitor——无插件web实时监控
主要监控项
-
PG没有类似Oracle ASH或AWR的历史性能数据,如果需要监控,首先需要通过pgDsh记录PG部分指标的历史数据:
Session、Transactions、Tuplesin、Tuplesout、BlockI/O
。 -
另外就是实时监控数据:
Session、Lock、PreparedTransactions、Configuration、Rowsfetched/returned、Databasecapacity、Maxconnectionsinuse
。
后期会加入更多指标的监控以及告警功能、数据生命周期功能。
1、配置pgDsh数据
PG的性能统计数据主要通过pg_stat_database视图查询,针对保留历史性能数据的功能,首先在原库中创建 pg_db_stat_hist
表,然后通过crontab定时插入数据和清理历史数据。
1.1、创建pgDsh表和索引:
create table pg_db_stat_hist
(
snap_time timestamp(0) not null,
transactions bigint not null,
commits bigint not null,
rollbacks bigint not null,
inserts bigint not null,
updates bigint not null,
deletes bigint not null,
reads bigint not null,
hits bigint not null,
fetched bigint not null,
returned bigint not null,
total integer not null,
active integer not null,
idle integer not null,
snap_id serial not null
constraint pg_dsh_snap_id_pk primary key
);
create index idx_pg_dsh_snaptime on pg_db_stat_hist (snap_time);
1.2、通过crontab插入数据
目前设置每30秒采集一次数据,可通过设置修改sleep 30调整pgDsh的细粒度,crontab如下:
*/1 * * * * psql -d emcs -f "insertPgDsh.sql">>insertPgDsh.log
*/1 * * * * sleep 30; psql -d emcs -f "insertPgDsh.sql">>insertPgDsh.log
insertPgDsh.sql脚本如下:
INSERTO INTO pg_db_stat_hist
SELECT
(SELECT now()) AS nowtime,
(SELECT sum(xact_commit) + sum(xact_rollback) FROM pg_stat_database) AS transactions,
(SELECT sum(xact_commit) FROM pg_stat_database) AS commits,
(SELECT sum(xact_rollback) FROM pg_stat_database) AS rollbacks,
(SELECT sum(tup_inserted) FROM pg_stat_database) AS inserts,
(SELECT sum(tup_updated) FROM pg_stat_database) AS updates,
(SELECT sum(tup_deleted) FROM pg_stat_database) AS deletes,
(SELECT sum(blks_read) FROM pg_stat_database) AS reads,
(SELECT sum(blks_hit) FROM pg_stat_database) AS hits,
(SELECT sum(tup_fetched) FROM pg_stat_database) AS fetched,
(SELECT sum(tup_returned) FROM pg_stat_database) AS returned,
(SELECT count(*) FROM pg_stat_activity) AS total,
(SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS active,
(SELECT count(*) FROM pg_stat_activity WHERE state = 'idle') AS idle;
1.3、通过crontab清理数据
历史数据保留7天,每天晚上凌晨1点执行,crontab脚本如下:
1 1 * * * psql -d emcs -f "purgePgDsh.sql">>purgePgDsh.log
purgePgDsh.sql脚本如下:
delete from pg_db_stat_hist where snap_time < now() - interval '7 d';
2、查询pgDsh数据
默认查询最近一个小时的数据,也可以根据开始结束时间查询历史数据,由于pg_stat_database表中transactions
、 commits
等字段都是累加值,需要减去上一个值,可直接使用PG的lag函数,SQL如下:
select
snap_time,
transactions-lag(transactions,1) over(order by snap_id) transactions,
commits-lag(commits,1) over(order by snap_id) commits,
rollbacks-lag(rollbacks,1) over(order by snap_id) rollbacks,
inserts-lag(inserts,1) over(order by snap_id) inserts,
updates-lag(updates,1) over(order by snap_id) updates,
deletes-lag(deletes,1) over(order by snap_id) deletes,
reads-lag(reads,1) over(order by snap_id) reads,
hits-lag(hits,1) over(order by snap_id) hits,
fetched-lag(fetched,1) over(order by snap_id) fetched,
returned-lag(returned,1) over(order by snap_id) returned,
total, active, idle
from pg_dsh where
snap_time between #{begin} and #{end}
3、查询活动会话
select
pid, datname, usename, application_name, client_addr::character varying,
to_char(backend_start, 'yyyy-mm-dd hh24:mm') backend_start,
state, wait_event_type||': '||wait_event as wait_event,
array_to_string(pg_blocking_pids(pid),',') as blocking_pids
from
pg_stat_activity
4、查询锁
select
pid, locktype, datname, relation::integer, page, tuple, virtualxid, transactionid,
classid::integer, objid, objsubid, virtualtransaction, mode, granted
from
pg_locks l
left outer join pg_database d on (l.database = d.oid)
order by
pid, locktype
5、查询预备事物
PREPARE TRANSACTION是PG为了允许外部事物管理器提交回滚预备事物设计的,比如需要程序异步验证、需要对其他数据库进行异步操作等情况,该预备事物会一直持有锁,且对数据库性能有影响,如果不使用该特性,建议将max_prepared_transactions设置为零,防止勿操作设置预备事物。
select
gid, database, owner, transaction,to_char(prepared,'yyyy-mm-dd hh24:mm') as prepared
from
pg_prepared_xacts
order by
gid, database, owner
6、查询参数
select
name, category, setting, unit, short_desc
from
pg_settings
7、查询基础数据
主要查询PG版本、数据读取占返回比、可用连接比、数据库大小
select
(select substring(version(),0,16)) as version,
(select round(sum(tup_fetched)/sum(tup_returned)*100,2) from pg_stat_database where datname='emcs') as fetchper,
(select trunc((a.total/b.maxcon)*100) from
(select count(*) as total from pg_stat_activity) a,
(select setting::float as maxcon from pg_settings where name = 'max_connections') b) as conper,
(select pg_size_pretty(sum(pg_database_size(oid))) from pg_database) as dbsize
效果图如下:
云服务:https://cs.enmotech.com
demo查看地址(需购买标准云服务):https://cs.enmotech.com/monitor
目前云服务内测,注册免费领取2万优惠券,可直接兑换一套库一年的标准云服务