Goal
Ceph Luminous ships with built-in Zabbix monitoring support
Configure the corresponding Zabbix monitoring
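Notes
Environment used here: Ceph Luminous, package ceph-12.2.0-0.el7.x86_64
Zabbix support is provided by the ceph-mgr zabbix module, which has to be enabled
The monitoring data items are produced by Ceph itself and pushed to the Zabbix server in trapper mode
The Zabbix monitoring covers the overall health of the whole Ceph cluster, not individual nodes
Monitoring only needs to be enabled on one machine that can reach the ceph mgr service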
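In trapper mode the Zabbix server does not poll. Instead, the sender pushes values to it. The Python 3.7+ sketch below is a rough illustration of that push and is not the module's actual code. It formats metrics into the `<hostname> <key> <value>` lines that zabbix_sender reads with `-i`, where `-` means stdin, and passes the server and port with `-z` and `-p`. The helper names and sample values are hypothetical. The `ceph.` key prefix matches the template's item keys.

```python
import subprocess


def build_sender_input(hostname, metrics):
    """Format metrics as zabbix_sender input lines: '<hostname> <key> <value>'."""
    lines = []
    for key, value in sorted(metrics.items()):
        # Item keys in the Ceph template are prefixed with "ceph.".
        # Values containing spaces would need quoting; these are single tokens.
        lines.append("{0} ceph.{1} {2}".format(hostname, key, value))
    return "\n".join(lines) + "\n"


def push_to_zabbix(sender, server, port, hostname, metrics):
    """Hypothetical helper: feed the lines to zabbix_sender on stdin ('-i -')."""
    payload = build_sender_input(hostname, metrics)
    cmd = [sender, "-z", server, "-p", str(port), "-i", "-"]
    result = subprocess.run(cmd, input=payload, text=True, capture_output=True)
    return result.returncode, result.stdout
```

For example, `push_to_zabbix("/etc/apps/svr/zabbix/bin/zabbix_sender", "gx-yun-084044.vclound.com", 10051, "cephsvr-128040.vclound.com", {"num_mon": 3})` would submit one value for the `ceph.num_mon` trapper item.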
References
Manually deploying ceph mgr (Luminous)
Trapper mode: see the zabbix_sender configuration method
Zabbix official documentation
Ceph official documentation
ceph zabbix plugin
Important:
The steps below only need to be run on one machine in the Ceph cluster that has access to the mgr
Enable the module
[root@cephsvr-128040 ~]# ceph mgr module enable zabbix
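To confirm the module is actually enabled, you can inspect the JSON output of `ceph mgr module ls -f json`. The Python 3.7+ sketch below assumes that output contains an `enabled_modules` list. Check this against your version's output. The function names are hypothetical.

```python
import json
import subprocess


def zabbix_module_enabled(module_ls_json):
    """Return True if 'zabbix' appears in the enabled_modules list."""
    data = json.loads(module_ls_json)
    return "zabbix" in data.get("enabled_modules", [])


def check_cluster():
    # Assumes the ceph CLI is on PATH and this host can reach the mgr.
    out = subprocess.check_output(
        ["ceph", "mgr", "module", "ls", "-f", "json"], text=True)
    return zabbix_module_enabled(out)
```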
Configuration
Set the Zabbix server
[root@cephsvr-128040 ~]# ceph zabbix config-set zabbix_host gx-yun-084044.vclound.com
Configuration option zabbix_host updated
Set the identifier of the monitored host (the host name under which the data is reported in Zabbix)
[root@cephsvr-128040 ~]# ceph zabbix config-set identifier cephsvr-128040.vclound.com
Configuration option identifier updated
Set the path of zabbix_sender
[root@cephsvr-128040 ~]# ceph zabbix config-set zabbix_sender /etc/apps/svr/zabbix/bin/zabbix_sender
Configuration option zabbix_sender updated
Set the Zabbix server port
[root@cephsvr-128040 ~]# ceph zabbix config-set zabbix_port 10051
Configuration option zabbix_port updated
Set the item reporting interval (seconds)
[root@cephsvr-128040 ~]# ceph zabbix config-set interval 60
Configuration option interval updated
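The five config-set commands above follow one pattern, so they can be generated from a single table of options. The Python 3.7+ sketch below does this. It reuses the values from this example, and the names `ZABBIX_CONFIG`, `config_set_commands` and `apply_config` are hypothetical. With the default `dry_run=True` it only prints the commands.

```python
import subprocess

# Values copied from the commands above; adjust for your environment.
ZABBIX_CONFIG = {
    "zabbix_host": "gx-yun-084044.vclound.com",
    "identifier": "cephsvr-128040.vclound.com",
    "zabbix_sender": "/etc/apps/svr/zabbix/bin/zabbix_sender",
    "zabbix_port": 10051,
    "interval": 60,
}


def config_set_commands(config):
    """Build one 'ceph zabbix config-set <key> <value>' argv per option."""
    return [["ceph", "zabbix", "config-set", key, str(value)]
            for key, value in config.items()]


def apply_config(config, dry_run=True):
    """Print the commands, or run them (stopping on the first failure)."""
    for argv in config_set_commands(config):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```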
Show the configuration
[root@cephsvr-128040 ~]# ceph zabbix config-show
{"zabbix_host": "gx-yun-084044.vclound.com", "identifier": "cephsvr-128040.vclound.com", "zabbix_sender": "/etc/apps/svr/zabbix/bin/zabbix_sender", "interval": 60, "zabbix_port": 10051}
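The config-show output is plain JSON, so it can be sanity-checked before any data is sent. The Python sketch below is a minimal check. It assumes the five options used in this example are all required. The function name and the range checks are illustrative choices, not rules enforced by the module.

```python
import json

REQUIRED_KEYS = ("zabbix_host", "identifier", "zabbix_sender", "zabbix_port", "interval")


def check_zabbix_config(config_show_output):
    """Parse 'ceph zabbix config-show' output and return a list of problems."""
    config = json.loads(config_show_output)
    problems = ["missing option: " + k for k in REQUIRED_KEYS if k not in config]
    port = config.get("zabbix_port")
    if port is not None and not 0 < int(port) < 65536:
        problems.append("zabbix_port out of range: {0}".format(port))
    interval = config.get("interval")
    if interval is not None and int(interval) <= 0:
        problems.append("interval must be positive: {0}".format(interval))
    return problems
```

An empty list means the options look plausible. It does not prove that the Zabbix server is reachable.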
Zabbix server configuration
Template
Location of zabbix_template.xml
[root@cephsvr-128040 ~]# rpm -ql ceph-mgr | grep xml
/usr/lib64/ceph/mgr/zabbix/zabbix_template.xml
Import the template
Note
The template targets Zabbix 3.x by default. To import it into Zabbix 2.x, edit zabbix_template.xml as follows:
<?xml version="1.0" encoding="UTF-8"?>
<zabbix_export>
<version>2.0</version>  <!-- change this to 2.0 so the template can be imported -->
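The edit above can be scripted so that the original template stays untouched. The Python sketch below rewrites only the first `<version>...</version>` element, which is the export version near the top of the file. The function name is hypothetical. It changes only the version tag, so a template that uses 3.x-only elements may still fail to import into 2.x.

```python
import re


def downgrade_template_version(xml_text, target="2.0"):
    """Replace the first <version>...</version> element with the target version."""
    new_text, count = re.subn(r"<version>[^<]*</version>",
                              "<version>{0}</version>".format(target),
                              xml_text, count=1)
    if count != 1:
        raise ValueError("no <version> element found")
    return new_text
```

Typical use is to read /usr/lib64/ceph/mgr/zabbix/zabbix_template.xml, pass its text through this function, and write the result to a copy that you then import.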
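How to import the template
Browse to the local template file, then click Import.
Add a host
Link the template to the host. The host name must match the identifier configured above, because Zabbix matches trapper data to a host by its host name.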
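Set allowed hosts in the database
Every trapper item in the template should have Allowed hosts set, and the most direct way to do this is to update the database. If Allowed hosts is left empty, Zabbix accepts trapper values from any host, so this step restricts which servers may submit data.
Reference
Get the template ID (Zabbix stores templates in the hosts table)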
MariaDB [(none)]> use zabbix;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [zabbix]> select hostid from hosts where name='ceph-mgr Zabbix module';
+--------+
| hostid |
+--------+
| 10395 |
+--------+
1 row in set (0.00 sec)
Get the template's items (type 2 means Zabbix trapper)
MariaDB [zabbix]> select itemid, name, key_, type, trapper_hosts from items where hostid=10395;
+--------+-----------------------------------------------+-----------------------------+------+---------------+
| itemid | name | key_ | type | trapper_hosts |
+--------+-----------------------------------------------+-----------------------------+------+---------------+
| 35793 | Number of Monitors | ceph.num_mon | 2 | |
| 35794 | Number of OSDs | ceph.num_osd | 2 | |
| 35795 | Number of OSDs in state: IN | ceph.num_osd_in | 2 | |
| 35796 | Number of OSDs in state: UP | ceph.num_osd_up | 2 | |
| 35797 | Number of Placement Groups | ceph.num_pg | 2 | |
| 35798 | Number of Placement Groups in Temporary state | ceph.num_pg_temp | 2 | |
| 35799 | Number of Pools | ceph.num_pools | 2 | |
| 35800 | Ceph OSD avg fill | ceph.osd_avg_fill | 2 | |
| 35801 | Ceph backfill full ratio | ceph.osd_backfillfull_ratio | 2 | |
| 35802 | Ceph full ratio | ceph.osd_full_ratio | 2 | |
| 35803 | Ceph OSD Apply latency Avg | ceph.osd_latency_apply_avg | 2 | |
| 35804 | Ceph OSD Apply latency Max | ceph.osd_latency_apply_max | 2 | |
| 35805 | Ceph OSD Apply latency Min | ceph.osd_latency_apply_min | 2 | |
| 35806 | Ceph OSD Commit latency Avg | ceph.osd_latency_commit_avg | 2 | |
| 35807 | Ceph OSD Commit latency Max | ceph.osd_latency_commit_max | 2 | |
| 35808 | Ceph OSD Commit latency Min | ceph.osd_latency_commit_min | 2 | |
| 35809 | Ceph OSD max fill | ceph.osd_max_fill | 2 | |
| 35810 | Ceph OSD min fill | ceph.osd_min_fill | 2 | |
| 35811 | Ceph nearfull ratio | ceph.osd_nearfull_ratio | 2 | |
| 35812 | Overall Ceph status | ceph.overall_status | 2 | |
| 35813 | Overal Ceph status (numeric) | ceph.overall_status_int | 2 | |
| 35814 | Ceph Read bandwidth | ceph.rd_bytes | 2 | |
| 35815 | Ceph Read operations | ceph.rd_ops | 2 | |
| 35816 | Total bytes available | ceph.total_avail_bytes | 2 | |
| 35817 | Total bytes | ceph.total_bytes | 2 | |
| 35818 | Total number of objects | ceph.total_objects | 2 | |
| 35819 | Total bytes used | ceph.total_used_bytes | 2 | |
| 35820 | Ceph Write bandwidth | ceph.wr_bytes | 2 | |
| 35821 | Ceph Write operations | ceph.wr_ops | 2 | |
+--------+-----------------------------------------------+-----------------------------+------+---------------+
29 rows in set (0.00 sec)
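The item keys listed above are enough for simple alert conditions. The Python sketch below evaluates a few of them on a dict of received values, which is the kind of check a Zabbix trigger would express. The 0/1/2 encoding of HEALTH_OK/HEALTH_WARN/HEALTH_ERR for `ceph.overall_status_int` is an assumption. Confirm it against the module source in /usr/lib64/ceph/mgr/zabbix/ before relying on it. The name `cluster_alerts` is hypothetical.

```python
# Assumed encoding of ceph.overall_status_int; verify against the module source.
HEALTH_TO_INT = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}


def cluster_alerts(items):
    """Derive simple alert messages from a dict of ceph.* item values."""
    alerts = []
    status = items.get("ceph.overall_status")
    # Unknown status strings are treated as the worst case (2).
    if status is not None and HEALTH_TO_INT.get(status, 2) > 0:
        alerts.append("cluster health is " + status)
    num_osd = items.get("ceph.num_osd", 0)
    if items.get("ceph.num_osd_up", num_osd) < num_osd:
        alerts.append("{0} OSD(s) down".format(num_osd - items["ceph.num_osd_up"]))
    if items.get("ceph.num_osd_in", num_osd) < num_osd:
        alerts.append("{0} OSD(s) out".format(num_osd - items["ceph.num_osd_in"]))
    return alerts
```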
Set the allowed hosts
Write the IP addresses of the servers on which the ceph zabbix module was enabled into the table:
MariaDB [zabbix]> update items set trapper_hosts='xx.199.128.40,xx.199.128.214,xx.199.128.215' where hostid=10395;
Query OK, 29 rows affected (0.00 sec)
Rows matched: 29 Changed: 29 Warnings: 0
MariaDB [zabbix]> select itemid, name, key_, type, trapper_hosts from items where hostid=10395;
+--------+-----------------------------------------------+-----------------------------+------+---------------------------------------------+
| itemid | name | key_ | type | trapper_hosts |
+--------+-----------------------------------------------+-----------------------------+------+---------------------------------------------+
| 35793 | Number of Monitors | ceph.num_mon | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35794 | Number of OSDs | ceph.num_osd | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35795 | Number of OSDs in state: IN | ceph.num_osd_in | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35796 | Number of OSDs in state: UP | ceph.num_osd_up | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35797 | Number of Placement Groups | ceph.num_pg | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35798 | Number of Placement Groups in Temporary state | ceph.num_pg_temp | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35799 | Number of Pools | ceph.num_pools | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35800 | Ceph OSD avg fill | ceph.osd_avg_fill | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35801 | Ceph backfill full ratio | ceph.osd_backfillfull_ratio | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35802 | Ceph full ratio | ceph.osd_full_ratio | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35803 | Ceph OSD Apply latency Avg | ceph.osd_latency_apply_avg | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35804 | Ceph OSD Apply latency Max | ceph.osd_latency_apply_max | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35805 | Ceph OSD Apply latency Min | ceph.osd_latency_apply_min | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35806 | Ceph OSD Commit latency Avg | ceph.osd_latency_commit_avg | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35807 | Ceph OSD Commit latency Max | ceph.osd_latency_commit_max | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35808 | Ceph OSD Commit latency Min | ceph.osd_latency_commit_min | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35809 | Ceph OSD max fill | ceph.osd_max_fill | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35810 | Ceph OSD min fill | ceph.osd_min_fill | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35811 | Ceph nearfull ratio | ceph.osd_nearfull_ratio | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35812 | Overall Ceph status | ceph.overall_status | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35813 | Overal Ceph status (numeric) | ceph.overall_status_int | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35814 | Ceph Read bandwidth | ceph.rd_bytes | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35815 | Ceph Read operations | ceph.rd_ops | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35816 | Total bytes available | ceph.total_avail_bytes | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35817 | Total bytes | ceph.total_bytes | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35818 | Total number of objects | ceph.total_objects | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35819 | Total bytes used | ceph.total_used_bytes | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35820 | Ceph Write bandwidth | ceph.wr_bytes | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
| 35821 | Ceph Write operations | ceph.wr_ops | 2 | xx.199.128.40,xx.199.128.214,xx.199.128.215 |
+--------+-----------------------------------------------+-----------------------------+------+---------------------------------------------+
29 rows in set (0.00 sec)
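Note: the example above only shows that several trapper allowed hosts can be set. In practice, only the IP address of the one server that sends the data needs to be added.
Verify the trapper items
Refer to the screenshot below
Open the newly added host in Zabbix and open one of the ceph items. Confirm that Type = Zabbix trapper and that Allowed hosts = the IP addresses you wrote to the database.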
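Editing the items table directly works, but it bypasses the Zabbix frontend. As an alternative, the same change could be made through the Zabbix JSON-RPC API method `item.update`, whose item object has a `trapper_hosts` field. This is a different technique from the SQL update used here. The Python 3 sketch below only builds and posts the requests. `API_URL` is a hypothetical address, and `auth_token` must come from an earlier `user.login` call. The item IDs are the template items listed above.

```python
import json
import urllib.request

# Hypothetical frontend address; replace with your Zabbix web URL.
API_URL = "http://zabbix.example.com/zabbix/api_jsonrpc.php"


def item_update_requests(itemids, allowed_hosts, auth_token):
    """Build one item.update JSON-RPC request per trapper item."""
    trapper_hosts = ",".join(allowed_hosts)
    return [
        {
            "jsonrpc": "2.0",
            "method": "item.update",
            "params": {"itemid": str(itemid), "trapper_hosts": trapper_hosts},
            "auth": auth_token,
            "id": n,
        }
        for n, itemid in enumerate(itemids, start=1)
    ]


def call_api(url, request):
    """POST one JSON-RPC request and return the decoded response."""
    body = json.dumps(request).encode("utf-8")
    http_req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json-rpc"})
    with urllib.request.urlopen(http_req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, looping `call_api(API_URL, req)` over `item_update_requests(range(35793, 35822), ["xx.199.128.40"], token)` would set one allowed host on all 29 template items. Hosts linked to the template should then inherit the value.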
Ceph cron job
Use a cron job to report the Ceph monitoring data automatically once every minute
[root@cephsvr-128040 ~]# cat /etc/cron.d/ceph
*/1 * * * * root ceph zabbix send
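The bare cron line above gives no signal when a report fails or hangs. The Python 3.7+ sketch below is a hypothetical wrapper. It runs `ceph zabbix send` with a timeout shorter than the 60-second cron period, so runs do not pile up, and it writes output only on failure. Cron typically mails any output to the job owner, so a quiet success keeps that mail to failures only. The function names and the 50-second timeout are assumptions.

```python
import subprocess
import sys


def run_zabbix_send(cmd=("ceph", "zabbix", "send"), timeout=50):
    """Run one report; return (success, combined output or error text)."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "timed out after {0}s".format(timeout)
    except OSError as exc:  # e.g. the ceph binary is not on PATH
        return False, str(exc)
    output = (proc.stdout + proc.stderr).strip()
    return proc.returncode == 0, output


def main():
    ok, message = run_zabbix_send()
    if not ok:
        # Only failures produce output, so cron mail is limited to failures.
        sys.stderr.write("ceph zabbix send failed: " + message + "\n")
    return 0 if ok else 1
```

A script ending in `sys.exit(main())` could then replace the bare command in /etc/cron.d/ceph. Where you put the script is up to you.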
Monitoring screenshots
Monitoring Ceph pool available space
Monitoring Ceph I/O
Monitoring Ceph bandwidth
Monitoring Ceph OSD latency