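While maintaining our distributed Centreon + Nagios alerting setup today, one of the child nodes developed a problem, so I migrated its checks to another node. One critical alert and one warning alert had been generated by the previous collection server,
so they could never be cleared.
In a flash of inspiration, I went looking for the link in the database.
Centreon's code is highly encapsulated, and even its links are kept in database tables. I eventually traced the alert list to the ServiceXML.php page.
From it I extracted the following SQL, which lists the Critical alerts.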
select no_s.name1 as host_name, nagios_instances.instance_name as instance_name,
no_h.object_id as host_object_id, nhs.scheduled_downtime_depth as host_scheduled_downtime_depth,
nhs.current_state as host_current_state, nhs.problem_has_been_acknowledged as host_problem_has_been_acknowledged,
nhs.passive_checks_enabled as host_passive_checks_enabled, nhs.active_checks_enabled as host_active_checks_enabled,
no_s.name2 as service_description, no_s.object_id as service_object_id,
nss.process_performance_data as service_process_performance_data, nss.current_state as service_current_state,
nss.output as service_output, nss.state_type as service_state_type, nss.current_check_attempt as service_current_check_attempt,
nss.status_update_time as service_status_update_time, unix_timestamp(nss.last_state_change) as service_last_state_change,
unix_timestamp(nss.last_hard_state_change) as service_last_hard_state_change, unix_timestamp(nss.last_check) as service_last_check,
unix_timestamp(nss.next_check) as service_next_check, nss.problem_has_been_acknowledged as service_problem_has_been_acknowledged,
nss.passive_checks_enabled as service_passive_checks_enabled, nss.active_checks_enabled as service_active_checks_enabled, nss.event_handler_enabled as service_event_handler_enabled, nss.is_flapping as service_is_flapping,
nss.scheduled_downtime_depth as service_scheduled_downtime_depth, nss.flap_detection_enabled as service_flap_detection_enabled,cv.varvalue as criticality, cv.varvalue IS NULL as isnull
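from nagios_objects as no_h, nagios_hoststatus as nhs, nagios_servicestatus as nss, nagios_instances, nagios_objects as no_s LEFT JOIN nagios_customvariablestatus cv ON (no_s.object_id = cv.object_id AND cv.varname = 'CRITICALITY_LEVEL')
where no_s.object_id = nss.service_object_id
and no_h.object_id = nhs.host_object_id
and no_h.name1 = no_s.name1 and no_h.instance_id = no_s.instance_id
and no_h.objecttype_id = 1 and no_s.objecttype_id = 2
and no_s.instance_id = nagios_instances.instance_id
and nss.current_state = 2;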
Put simply, it all comes down to the nagios_servicestatus table: current_state = 1 is a warning alert and current_state = 2 is a critical alert.
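To make that concrete, here is a minimal sketch of a much shorter lookup that lists only the warning and critical rows, together with the poller instance each one belongs to. It runs against an in-memory SQLite stand-in rather than the real MySQL database. The tables contain only the columns the query touches, and the sample rows and the instance names poller-old and poller-new are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the NDO database; only the columns used below are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nagios_instances (instance_id INTEGER PRIMARY KEY, instance_name TEXT);
CREATE TABLE nagios_objects (object_id INTEGER PRIMARY KEY, instance_id INTEGER,
                             objecttype_id INTEGER, name1 TEXT, name2 TEXT);
CREATE TABLE nagios_servicestatus (servicestatus_id INTEGER PRIMARY KEY, instance_id INTEGER,
                                   service_object_id INTEGER, current_state INTEGER, output TEXT);
-- Made-up sample data: poller-old is the retired node, poller-new its replacement.
INSERT INTO nagios_instances VALUES (1, 'poller-old'), (2, 'poller-new');
INSERT INTO nagios_objects VALUES (10, 1, 2, 'web01', 'disk'), (11, 1, 2, 'web01', 'load'),
                                  (20, 2, 2, 'web01', 'disk'), (21, 2, 2, 'web01', 'load');
INSERT INTO nagios_servicestatus VALUES (100, 1, 10, 2, 'DISK CRITICAL'), (101, 1, 11, 1, 'LOAD WARNING'),
                                        (200, 2, 20, 0, 'DISK OK'), (201, 2, 21, 0, 'LOAD OK');
""")

# current_state 1 = warning, 2 = critical; the join shows which instance owns each row.
PROBLEMS_SQL = """
SELECT nss.servicestatus_id, ni.instance_name, no_s.name1, no_s.name2,
       CASE nss.current_state WHEN 1 THEN 'WARNING' WHEN 2 THEN 'CRITICAL' END AS state
FROM nagios_servicestatus AS nss
JOIN nagios_objects AS no_s ON no_s.object_id = nss.service_object_id
JOIN nagios_instances AS ni ON ni.instance_id = no_s.instance_id
WHERE nss.current_state IN (1, 2)
ORDER BY nss.servicestatus_id
"""

problems = conn.execute(PROBLEMS_SQL).fetchall()
for row in problems:
    print(row)
```

With this sample data, both problem rows belong to the retired instance, which is the pattern to look for in the real table.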
From here it is simple: find the duplicate alert rows and delete them directly.
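A cautious way to do the delete is to restrict it to the retired instance and to check the affected row count before committing. The sketch below shows that idea, again on an in-memory SQLite stand-in with made-up rows. The instance name poller-old and the expected count of 2 are assumptions for this example, not values taken from the real system.

```python
import sqlite3

# In-memory stand-in for the NDO database with made-up rows (not the real schema in full).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nagios_instances (instance_id INTEGER PRIMARY KEY, instance_name TEXT);
CREATE TABLE nagios_servicestatus (servicestatus_id INTEGER PRIMARY KEY, instance_id INTEGER,
                                   service_object_id INTEGER, current_state INTEGER, output TEXT);
INSERT INTO nagios_instances VALUES (1, 'poller-old'), (2, 'poller-new');
INSERT INTO nagios_servicestatus VALUES (100, 1, 10, 2, 'DISK CRITICAL'), (101, 1, 11, 1, 'LOAD WARNING'),
                                        (200, 2, 20, 0, 'DISK OK'), (201, 2, 21, 0, 'LOAD OK');
""")

OLD_INSTANCE = "poller-old"  # hypothetical name of the retired poller
EXPECTED_ROWS = 2            # one critical plus one warning, as in the story above

# Only touch warning/critical rows that belong to the retired instance.
DELETE_SQL = """
DELETE FROM nagios_servicestatus
WHERE current_state IN (1, 2)
  AND instance_id = (SELECT instance_id FROM nagios_instances WHERE instance_name = ?)
"""

# "with conn" commits on success and rolls back if an exception is raised.
with conn:
    deleted = conn.execute(DELETE_SQL, (OLD_INSTANCE,)).rowcount
    if deleted != EXPECTED_ROWS:
        raise RuntimeError(f"expected {EXPECTED_ROWS} stale rows, matched {deleted}")

remaining = conn.execute(
    "SELECT servicestatus_id, current_state FROM nagios_servicestatus ORDER BY servicestatus_id"
).fetchall()
print(deleted, remaining)
```

The row-count check turns an overly broad WHERE clause into a rollback instead of a silent loss of live status rows.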
After refreshing the interface, the two alerts that had refused to go away were gone.