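Nagios expects each monitored host to send its data to the Nagios server at regular intervals in an agreed format. Two kinds of objects are monitored: hosts (nodes) and services.
The agreed data format for host checks is: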
[<timestamp>] PROCESS_HOST_CHECK_RESULT;<host_name>;<host_status>;<plugin_output>
The format is straightforward: the submission timestamp, the name of the monitored host, the host status (UP/DOWN/UNREACHABLE), and free-form output from the plugin. Each field is described below:
1. timestamp is the time in time_t format (seconds since the UNIX epoch) that the host check was performed (or submitted). Please note the single space after the right bracket.
2. host_name is the short name of the host (as defined in the host definition)
3. host_status is the status of the host (0=UP, 1=DOWN, 2=UNREACHABLE)
4. plugin_output is the text output of the host check
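As a minimal sketch of the host format above, the following Python helper builds one PROCESS_HOST_CHECK_RESULT line. The function name `format_host_check_result` and the host name `web01` are assumptions made for illustration; they are not part of Nagios or of our system. How the line is delivered to Nagios is not shown here.

```python
import time

# Numeric host states as listed in the field description above.
HOST_STATES = {"UP": 0, "DOWN": 1, "UNREACHABLE": 2}


def format_host_check_result(host_name, state, plugin_output, timestamp=None):
    """Build one passive host check line (hypothetical helper)."""
    if ";" in host_name:
        # A semicolon in the host name would shift every following field.
        raise ValueError("host_name must not contain ';'")
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the UNIX epoch
    # Note the single space after the right bracket.
    return (
        f"[{timestamp}] PROCESS_HOST_CHECK_RESULT;"
        f"{host_name};{HOST_STATES[state]};{plugin_output}"
    )


if __name__ == "__main__":
    print(format_host_check_result("web01", "DOWN", "ping timed out"))
```

Passing the state by name and mapping it to the number in one place avoids mixing up the textual and numeric forms.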
The agreed data format for service checks is:
[<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<host_name>;<svc_description>;<return_code>;<plugin_output>
The fields are the submission timestamp, the name of the monitored host, the name of the monitored service, the service status (OK/WARNING/CRITICAL/UNKNOWN), and free-form output from the plugin. Each field is described below:
1. timestamp is the time in time_t format (seconds since the UNIX epoch) that the service check was performed (or submitted). Please note the single space after the right bracket.
2. host_name is the short name of the host associated with the service in the service definition
3. svc_description is the description of the service as specified in the service definition
4. return_code is the return code of the check (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN)
5. plugin_output is the text output of the service check (i.e. the plugin output)
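The service format differs from the host format only in the extra svc_description field and the four-valued return code. A minimal sketch, assuming a hypothetical helper `format_service_check_result` and made-up host and service names:

```python
import time

# Return codes as listed in the field description above.
SERVICE_STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_service_check_result(host_name, svc_description, state,
                                plugin_output, timestamp=None):
    """Build one passive service check line (hypothetical helper)."""
    for field in (host_name, svc_description):
        # Only the last field (plugin_output) may safely contain ';'.
        if ";" in field:
            raise ValueError("host_name and svc_description must not contain ';'")
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the UNIX epoch
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host_name};{svc_description};{SERVICE_STATES[state]};{plugin_output}"
    )


if __name__ == "__main__":
    print(format_service_check_result("web01", "CPU", "WARNING", "CPU load high"))
```

The host_name and svc_description passed here have to match the names in the Nagios host and service definitions, as the field descriptions state.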
SERVICE STATUS: First line of output | First part of performance data
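The output part can be customized to show more detailed monitoring data; it appears in the Status Information column in Nagios. The performance data appears in the Performance Data column and has its own format requirements. Its structure is: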
'label'=value[UOM];[warn];[crit];[min];[max] 'label'=value[UOM];[warn];[crit];[min];[max]
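Each 'label'=value pair is separated from the next by a space. Our system does not define UOM, warn, crit, min, or max for its performance data; for their meaning see: https://nagios-plugins.org/doc/guidelines.html#PLUGOUTPUT
In our system, services use passive checks and hosts use active ping checks. The following describes how our system monitors CPU, memory, I/O, and network usage on the monitored hosts, using CPU data collection as the main example: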
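To illustrate the plugin output layout, here is a minimal sketch that joins a status text and a set of metrics into "output | performance data", using only the 'label'=value part of the format (no UOM or thresholds). The helper names and the CPU metric names are assumptions chosen for the example, not the actual names used in our system.

```python
def format_perfdata(metrics):
    """Render a dict of metrics as space-separated 'label'=value pairs."""
    parts = []
    for label, value in metrics.items():
        # Reject characters that would make the label ambiguous to parse.
        if "=" in label or "'" in label:
            raise ValueError(f"invalid perfdata label: {label!r}")
        parts.append(f"'{label}'={value}")
    return " ".join(parts)


def build_plugin_output(text, metrics):
    """Combine human-readable text and performance data with a '|'."""
    return f"{text} | {format_perfdata(metrics)}"


if __name__ == "__main__":
    # Hypothetical CPU metrics, in percent.
    print(build_plugin_output("CPU OK", {"user": 12.5, "system": 3.1}))
    # prints: CPU OK | 'user'=12.5 'system'=3.1
```

The resulting string goes into the plugin_output field of a check result line; the text before the pipe is what Nagios shows as status information, and the part after it is the performance data.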