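I. Introduction
Nagios is open-source distributed monitoring software. It can monitor the state of nodes and of network devices such as switches and routers. For details on Nagios's monitoring architecture, how it works, and how to configure it, see this article.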
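Nagios features:
- Monitoring of network services (SMTP, POP3, HTTP, FTP, PING, etc.);
- Monitoring of local and remote host resources (CPU load, disk usage, processes, etc.);
- Users can write their own plugins to monitor specific services, which makes it easy to extend how services are checked; plugins can be written in many languages (Shell, Perl, Python, PHP, etc.);
- A network hierarchy can be defined: "parent" host definitions express the relationships between network hosts, and Nagios uses these relationships to detect and tell apart hosts that are DOWN and hosts that are UNREACHABLE;
- Alerts are sent to contacts (by email, SMS, or user-defined methods) when a service or host problem occurs and when it is resolved;
- Redundant monitoring of hosts is supported;
- A web interface shows the current network status, notification and problem history, log files, etc.;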
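Custom plugins are what make Nagios easy to extend. A plugin is just an executable that prints a status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). It can optionally append performance data after a `|`.

The shell sketch below shows that contract for a simple threshold check. The function name `check_threshold` and its argument order are hypothetical and are not part of the stock nagios-plugins package.

```bash
#!/bin/bash
# Hypothetical helper for a custom Nagios plugin (illustrative name).
# Nagios derives the state from the exit status:
#   0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
# Text after "|" is performance data in the form label=value;warn;crit
# Usage: check_threshold LABEL VALUE WARN CRIT   (non-negative integers)
check_threshold() {
  local label=$1 value=$2 warn=$3 crit=$4 n
  # Any missing or non-integer argument yields UNKNOWN
  for n in "$value" "$warn" "$crit"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "UNKNOWN - $label: expected non-negative integer arguments"
        return 3
        ;;
    esac
  done
  local perf="$label=$value;$warn;$crit"
  if [ "$value" -ge "$crit" ]; then
    echo "CRITICAL - $label is $value (>= $crit)|$perf"
    return 2
  elif [ "$value" -ge "$warn" ]; then
    echo "WARNING - $label is $value (>= $warn)|$perf"
    return 1
  fi
  echo "OK - $label is $value|$perf"
  return 0
}
```

Wrap it in a script that ends with `check_threshold "$@"; exit $?`. Calling that script as `check_threshold load 12 10 20` prints a WARNING line and exits with 1. It can then be registered like any bundled plugin.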
Here, Nagios is used to monitor the core services running on every physical node of the OpenStack cluster.
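II. Deployment Script
Installation order:
- On controller01, install Nagios and its plugins. Add every physical node to be monitored to /etc/nagios/objects/hosts.cfg, then add every service on every node to Nagios's service list, /etc/nagios/objects/services.cfg. Next, check that the Nagios configuration on controller01 is valid; if it is, start the nagios service and set the login account and password.
- On all nodes, install the Nagios Remote Plugin Executor (NRPE) and configure each node's nrpe.cfg according to the services that node runs. Finally, start NRPE on every node to complete service monitoring.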
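Once NRPE is running everywhere, you can confirm that the controller can reach each daemon, and that `allowed_hosts` admits it, by calling `check_nrpe` without `-c`. In that form it returns the remote NRPE version. The sketch below loops over node addresses.

The following are assumptions:
- the function name `verify_nrpe`;
- the `CHECK_NRPE` override variable;
- the plugin path, which is the usual location on 64-bit CentOS/RHEL with EPEL packages.

```bash
#!/bin/bash
# Hypothetical helper: confirm that NRPE answers on each node.
# check_nrpe called with only -H returns the remote NRPE version.
# Assumed path for 64-bit CentOS/RHEL (EPEL); override via CHECK_NRPE.
CHECK_NRPE=${CHECK_NRPE:-/usr/lib64/nagios/plugins/check_nrpe}

verify_nrpe() {
  local failed=0 ip out
  for ip in "$@"; do
    # Capture stdout and stderr so connection errors are shown too
    if out=$("$CHECK_NRPE" -H "$ip" 2>&1); then
      echo "OK   $ip: $out"
    else
      echo "FAIL $ip: $out"
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every node answered
  [ "$failed" -eq 0 ]
}
```

For example, `verify_nrpe 192.168.2.11` (the controller address used in this article) prints one OK or FAIL line per address. It returns non-zero if any node failed, so it can gate later steps.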
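After deployment, open http://192.168.2.11/nagios/ in a browser and log in with the nagiosadmin account and its password to reach the Nagios web monitoring console.
Nagios REST API: request http://192.168.2.11/nagios/cgi-bin/statusjson.cgi?query=servicelist&hostname=XXX (replace XXX with a host name) to get the state of every service on that host, for example: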
```json
{
  "format_version": 0,
  "result": {
    "query_time": 1511874587000,
    "cgi": "statusjson.cgi",
    "user": "nagiosadmin",
    "query": "servicelist",
    "query_status": "beta",
    "program_start": 1504500771000,
    "last_data_update": 1511874580000,
    "type_code": 0,
    "type_text": "Success",
    "message": ""
  },
  "data": {
    "selectors": {
    },
    "servicelist": {
      "controller01": {
        "AODH_API": 2,
        "AODH_EVALUATOR": 2,
        "AODH_LISTENER": 2,
        "AODH_NOTIFIER": 2,
        "CEILOMETER_API": 2,
        "CEILOMETER_CENTRAL": 2,
        "CEILOMETER_COLLECTOR": 2,
        "CEILOMETER_NOTIFICATION": 2,
        ……
      }
    }
  }
}
```
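The numbers in `servicelist` are state codes. As far as we know, Nagios Core 4.x's JSON CGIs encode service state as bit flags:
- 1: pending
- 2: OK
- 4: warning
- 8: unknown
- 16: critical

On that reading, the 2s above mean every listed service is OK. The sketch below picks out the services whose code is not 2, using only `tr`, `sed`, `grep` and `awk`. The function name `non_ok_services` is hypothetical. Its text matching assumes a single-host response like the one above; it is not a general JSON parser.

```bash
#!/bin/bash
# Hypothetical helper: read a statusjson.cgi servicelist response (one host)
# on stdin and print "SERVICE CODE" for every service whose code is not 2.
# Assumption: 2 is the OK state code in Nagios Core 4.x JSON output.
non_ok_services() {
  tr -d '\n' |
    # Drop everything up to the last "servicelist" (the data key,
    # which comes after "query": "servicelist" in the result block)
    sed 's/.*"servicelist"//' |
    # Keep only "NAME": NUMBER pairs
    grep -o '"[^"]*": *[0-9][0-9]*' |
    # Split on the closing quote + colon, strip the opening quote, filter
    awk -F'": *' '$2 != 2 { sub(/^"/, "", $1); print $1 " " $2 }'
}
```

The CGI sits behind the same HTTP basic authentication as the web UI, so the function can be fed straight from it:

`curl -s -u "nagiosadmin:$password_nagiosadmin" "http://192.168.2.11/nagios/cgi-bin/statusjson.cgi?query=servicelist&hostname=controller01" | non_ok_services`

Where `jq` is available, a real JSON parser is the sturdier choice.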
The one-step deployment script, install-configure-nagios.sh, is as follows:
```bash
#!/bin/bash
# Bash is required: the script uses [[ ]], C-style for loops and associative arrays.
# Load the cluster settings (node maps, networker_split, local_network, passwords, ...)
. ../0-set-config.sh
./style/print-split.sh "Nagios Installation"

### [controller01] Install the packages (quote the glob so the local shell does not expand it)
yum install -y nagios nagios-devel 'nagios-plugins*' gd gd-devel php gcc glibc glibc-common openssl

### Append the check_nrpe command definition to /etc/nagios/objects/commands.cfg
cat ../conf/nagios/check_nrpe >> /etc/nagios/objects/commands.cfg

### [controller01] Generate /etc/nagios/objects/hosts.cfg from the host template
: > /tmp/hosts.cfg.new
for ((i=0; i<${#nodes_map[@]}; i++)); do
    name=${nodes_name[$i]}
    ip=${nodes_map[$name]}
    sed -e "s/FQDN/$name/g" \
        -e "s/HOSTNAME/$name/g" \
        -e "s/IP/$ip/g" \
        ../conf/nagios/host >> /tmp/hosts.cfg.new
done
# \cp bypasses a "cp -i" alias so the file is overwritten without prompting
\cp /tmp/hosts.cfg.new /etc/nagios/objects/hosts.cfg
# Register the file only once, so re-running the script does not duplicate it
grep -qxF "cfg_file=/etc/nagios/objects/hosts.cfg" /etc/nagios/nagios.cfg ||
    echo "cfg_file=/etc/nagios/objects/hosts.cfg" >> /etc/nagios/nagios.cfg

### [controller01] Generate /etc/nagios/objects/services.cfg from the per-role templates
: > /tmp/services.cfg.new
for h in "${!controller_map[@]}"; do
    sed -e "s/FQDN/$h/g" ../conf/nagios/controller_services >> /tmp/services.cfg.new
    # Without dedicated network nodes, the controllers also run the network services
    if [[ "$networker_split" = "no" ]]; then
        sed -e "s/FQDN/$h/g" ../conf/nagios/networker_services >> /tmp/services.cfg.new
    fi
done
if [[ "$networker_split" = "yes" ]]; then
    for h in "${!networker_map[@]}"; do
        sed -e "s/FQDN/$h/g" ../conf/nagios/networker_services >> /tmp/services.cfg.new
    done
fi
for h in "${!hypervisor_map[@]}"; do
    sed -e "s/FQDN/$h/g" ../conf/nagios/compute_services >> /tmp/services.cfg.new
done
\cp /tmp/services.cfg.new /etc/nagios/objects/services.cfg
grep -qxF "cfg_file=/etc/nagios/objects/services.cfg" /etc/nagios/nagios.cfg ||
    echo "cfg_file=/etc/nagios/objects/services.cfg" >> /etc/nagios/nagios.cfg

### [controller01] Verify the configuration; stop here if it is invalid
echo "=== TRACE MESSAGE ===>>> " "Press Enter to verify the Nagios configuration [-]"
read answer
if ! nagios -v /etc/nagios/nagios.cfg; then
    echo "Nagios configuration check failed; fix the errors above and re-run." >&2
    exit 1
fi
echo "=== TRACE MESSAGE ===>>> " "Press Enter to continue [-]"
read answer

### [controller01] Enable and start the Nagios service
echo "=== TRACE MESSAGE ===>>> " "Configuring services"
systemctl enable nagios && systemctl start nagios

### [controller01] Set the nagiosadmin login password
echo "=== TRACE MESSAGE ===>>> " "Setting the nagiosadmin login password"
htpasswd -b -c /etc/nagios/passwd nagiosadmin "$password_nagiosadmin"

### [all nodes] Install NRPE
./pssh-exe A "yum install -y nrpe 'nagios-plugins*' openssl"

### [all nodes] Back up /etc/nagios/nrpe.cfg and add the Nagios server's network to allowed_hosts
./pssh-exe A "\cp /etc/nagios/nrpe.cfg /etc/nagios/nrpe.cfg.bak"
./pssh-exe A "sed -i -e 's#^allowed_hosts.*#allowed_hosts=127.0.0.1,${local_network}#' /etc/nagios/nrpe.cfg"

### Define the check_nrpe monitoring commands for each role
cat ../conf/nagios/controller_nrpe_commands > /tmp/controller_nrpe_commands
if [[ "$networker_split" = "yes" ]]; then
    cat ../conf/nagios/networker_nrpe_commands > /tmp/networker_nrpe_commands
    ./scp-exe N /tmp/networker_nrpe_commands /etc/nagios/nrpe.cfg
else
    # Controllers also host the network services
    cat ../conf/nagios/networker_nrpe_commands >> /tmp/controller_nrpe_commands
fi
./scp-exe C /tmp/controller_nrpe_commands /etc/nagios/nrpe.cfg
cat ../conf/nagios/compute_nrpe_commands > /tmp/compute_nrpe_commands
./scp-exe H /tmp/compute_nrpe_commands /etc/nagios/nrpe.cfg

### [all nodes] Enable and start NRPE
./pssh-exe A "systemctl enable nrpe && systemctl start nrpe"
echo -n "Open the Nagios web UI to confirm the installation: http://${controller_map[$ref_host]}/nagios"
read answer
```
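The branches on `networker_split` decide which template families each node receives. They apply both to `services.cfg` on the controller and to each node's `nrpe.cfg`. The function below restates that logic on its own so it can be checked in isolation. `templates_for_role` and the role names are illustrative, not part of the original script, and only the values `yes` and `no` are modeled.

```bash
#!/bin/bash
# Illustrative restatement of the role -> template choice in
# install-configure-nagios.sh. Prints the template families a node gets:
# <family>_services feeds services.cfg, <family>_nrpe_commands feeds nrpe.cfg.
# $2 mirrors $networker_split ("yes" = dedicated network nodes, "no" = none).
templates_for_role() {
  local role=$1 split=$2
  case "$role" in
    controller)
      # Without dedicated network nodes, controllers also run network services
      if [ "$split" = "no" ]; then
        echo "controller networker"
      else
        echo "controller"
      fi
      ;;
    networker)
      # Network nodes only get their own templates when split = yes
      if [ "$split" = "yes" ]; then
        echo "networker"
      fi
      ;;
    hypervisor)
      echo "compute"
      ;;
    *)
      echo "unknown role: $role" >&2
      return 1
      ;;
  esac
}
```

For instance, with `networker_split=no` a controller prints `controller networker`. That is, it receives both `controller_services` and `networker_services`.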
Note: all the Nagios configuration template files used by the script come from here.
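The template files themselves are not reproduced in this article. The sketch below shows what the `sed` substitutions in the host loop produce by rendering a hypothetical stand-in for `../conf/nagios/host`.

The template body and the `render_host` function are assumptions. `linux-server`, by contrast, is the host template defined in the sample `templates.cfg` that ships with Nagios.

```bash
#!/bin/bash
# Sketch: render one Nagios host definition with the same sed substitutions
# as install-configure-nagios.sh. The heredoc is a HYPOTHETICAL stand-in for
# ../conf/nagios/host; the real template may differ.
render_host() {
  local name=$1 ip=$2
  sed -e "s/FQDN/$name/g" \
      -e "s/HOSTNAME/$name/g" \
      -e "s/IP/$ip/g" <<'EOF'
define host {
    use         linux-server
    host_name   FQDN
    alias       HOSTNAME
    address     IP
}
EOF
}
```

The substitutions run in sequence on each line. A node name that itself contained the uppercase string `IP` would therefore be mangled by the last expression. Lowercase host names such as `controller01` are safe.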
III. References
Table Of Contents · Nagios Core Documentation
IV. Source Code
V. Series Articles
The articles in the "Openstack Cloud Platform Script Deployment" series are:
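Openstack Cloud Platform Script Deployment: Galera High-Availability Cluster Configuration (Part 2)
Openstack Cloud Platform Script Deployment: RabbitMQ High-Availability Cluster Deployment (Part 3)
Openstack Cloud Platform Script Deployment: Memcached Configuration (Part 5)
Openstack Cloud Platform Script Deployment: Keystone Identity Service Configuration (Part 6)
Openstack Cloud Platform Script Deployment: Glance Image Service Configuration (Part 7)
Openstack Cloud Platform Script Deployment: Nova Compute Service Configuration (Part 8)
Openstack Cloud Platform Script Deployment: Neutron Networking Service Configuration (Part 9)
Openstack Cloud Platform Script Deployment: Dashboard Configuration (Part 10)
Openstack Cloud Platform Script Deployment: Cinder Block Storage Service Configuration (Part 11)
Openstack Cloud Platform Script Deployment: Ceilometer Data Collection Service Configuration (Part 12)
Openstack Cloud Platform Script Deployment: Aodh Alarming Service Configuration (Part 13)
Openstack Cloud Platform Script Deployment: Ceph Storage Cluster Configuration (Part 14)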