硬件层面可以使用snmp协议,也可以使用第三方监控软件,比如zabbix,ganlia,nagios等等
下面主要列一下GP内部的监控项
可以通过以下自带的命令检查GP的状态信息
gpstate
需要更加详细一些的信息,加上-s
gpstate -s
gpstate -m
gpstate -c
gpstate -f
检查磁盘空间使用,GP里面就可以查看到对应分区的使用情况
warehouse=# SELECT dfsegment,dfhostname,dfdevice,dfspace FROM gp_toolkit.gp_disk_free ORDER BY dfsegment;
dfsegment | dfhostname | dfdevice | dfspace
-----------+------------------+------------+-----------
0 | 192-168-173-234 | /dev/sdb1 | 288538868
1 | 192-168-173-234 | /dev/sdc1 | 288538772
2 | 192-168-173-95 | /dev/sdb1 | 970200396
3 | 192-168-173-95 | /dev/sda3 | 97576168
4 | 192-168-175-74 | /dev/sdb1 | 288538068
5 | 192-168-175-74 | /dev/sdc1 | 288538640
(6 rows)
warehouse=# SELECT sodddatname,pg_size_pretty(sodddatsize) FROM gp_toolkit.gp_size_of_database ORDER BY sodddatname;
sodddatname | sodddatsize
-------------+-------------
gpperfmon | 118882710
warehouse | 4244832662
(2 rows)
查看表使用空间的情况
SELECT relname AS name, pg_size_pretty(sotdsize) AS size, pg_size_pretty(sotdtoastsize)
AS toast, sotdadditionalsize AS other
FROM gp_toolkit.gp_size_of_table_disk as sotd, pg_class
WHERE sotd.sotdoid=pg_class.oid ORDER BY relname;
查看索引的使用情况
SELECT soisize, relname as indexname
FROM pg_class, gp_toolkit.gp_size_of_index
WHERE pg_class.oid=gp_size_of_index.soioid
AND pg_class.relkind='i';
检查数据分布偏差
查看表结构和分布键
=# \d+ sales
Table "retail.sales"
Column | Type | Modifiers | Description
-------------+--------------+-----------+-------------
sale_id | integer | |
amt | float | |
date | date | |
Has OIDs: no
Distributed by: (sale_id)
查看数据分布
SELECT gp_segment_id, count(*)
FROM table_name GROUP BY gp_segment_id;
查看表操作信息,比如创建表,分析表的时间
SELECT schemaname as schema, objname as table,
usename as role, actionname as action,
subtype as type, statime as time
FROM pg_stat_operations
WHERE objname='cust';
创建session_level_memory_consumption视图
$ psql -d testdb -f $GPHOME/share/postgresql/contrib/gp_session_state.sql
查询视图
select * from session_state.session_level_memory_consumption;
节点状态检查可以使用以下几条语句:
SELECT * FROM gp_segment_configuration
WHERE status <> 'u';
SELECT * FROM gp_segment_configuration
WHERE mode = 'c';
SELECT * FROM gp_segment_configuration
WHERE mode = 'r';
SELECT * FROM gp_segment_configuration
WHERE preferred_role <> role;
运行分布式查询,每个segent都需要返回
SELECT gp_segment_id, count(*)
FROM gp_dist_random('pg_class')
GROUP BY 1;
检查流复制状态,正常返回streaming
psql dbname -c 'SELECT procpid, state FROM pg_stat_replication;'
检查服务器配置,需要使用root用户
gpcheck -f /home/gpadmin/all_hosts
检查是否有足够的IO
gpcheckperf -f /home/gpadmin/seg_hosts -d /data01 -d /data02 -r ds
目录一致性检查,根据自己需要可以设置隔一段时间检查一次,官方建议每周一次
gpcheckcat -O
检查表上是否缺少统计信息
SELECT * FROM gp_toolkit.gp_stats_missing;
检查膨胀的表,此类表不能通过常规vacuum回收空间,需要full vacuum回收
SELECT * FROM gp_toolkit.gp_bloat_diag;
以上检查的相关的命令gpstate,gpcheck,gpcheckperf,gpcheckcat在使用的时候可以查看相关帮助文档,做出相应检查。
参考:https://gpdb.docs.pivotal.io/43300/admin_guide/monitoring/monitoring.html
https://gpdb.docs.pivotal.io/43300/admin_guide/managing/monitor.html