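Monitoring plan
Storm keeps all of its state in ZooKeeper, and Nimbus and the Supervisors communicate entirely through ZooKeeper;
the cluster state can therefore be determined by reading ZooKeeper: whether Nimbus and the Supervisors are healthy, and whether the production topologies are running properly.
Requirements:
1. Configurable;
2. Raise an alarm when Nimbus or a Supervisor stops;
3. Raise an alarm when a monitored topology stops;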
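The three requirements boil down to comparing what is observed in ZooKeeper with what is expected. A minimal sketch of that comparison follows. It assumes the observed values (whether Nimbus is alive, how many Supervisors are registered, which topology names are running) have already been read from ZooKeeper by some other code; the class name ClusterCheck and the method check are hypothetical and are not part of Storm or of this monitor's actual source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: turns observed cluster state into alarm messages.
// How the observed values are fetched from ZooKeeper is deliberately left out.
class ClusterCheck {

    static List<String> check(boolean nimbusAlive,
                              int supervisorCount,
                              int expectedSupervisors,
                              Set<String> runningTopologies,
                              List<String> monitoredTopologies) {
        List<String> alarms = new ArrayList<>();

        // Requirement 2: Nimbus must be alive.
        if (!nimbusAlive) {
            alarms.add("[Storm Alarm]: Nimbus has stopped, pls start.");
        }

        // Requirement 2: the number of Supervisors must match the configured value.
        if (supervisorCount != expectedSupervisors) {
            alarms.add("[Storm Alarm]: Supervisor is " + supervisorCount
                    + ", the right is " + expectedSupervisors + ".");
        }

        // Requirement 3: every monitored topology must be running.
        for (String name : monitoredTopologies) {
            if (!runningTopologies.contains(name)) {
                alarms.add("[Storm Alarm]: Topology " + name
                        + " is down, pls check and start.");
            }
        }
        return alarms;
    }

    public static void main(String[] args) {
        // Example input: Nimbus is up, one Supervisor is missing, test2 is not running.
        Set<String> running = new HashSet<>(Arrays.asList("test1"));
        List<String> monitored = Arrays.asList("test1", "test2");
        for (String alarm : check(true, 4, 5, running, monitored)) {
            System.out.println(alarm);
        }
    }
}
```

Keeping the comparison free of any ZooKeeper calls makes it easy to test without a live cluster; the real monitor would feed it values read from ZooKeeper on each polling round.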
(1) Configuration file storm-monitor-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<monitor>
<!-- On-call staff; more than one may be listed -->
<person>
<tel>18819457205</tel>
<describe>林玲</describe>
</person>
<nimbus>
<alarm>[Storm Alarm]: Nimbus has stopped, pls start.</alarm>
<describe>Nimbus</describe>
</nimbus>
<Supervisor>
<!-- Expected number of Supervisors -->
<Supervisor-sum>5</Supervisor-sum>
<alarm>[Storm Alarm]: Supervisor is ${num}, the right is 5.</alarm>
</Supervisor>
<Topology>
<!-- Names of the topologies to monitor; more than one may be listed -->
<topology-name>test1</topology-name>
<topology-name>test2</topology-name>
<alarm>[Storm Alarm]: Topology ${name} is down, pls check and start.</alarm>
</Topology>
<zookeeper>
<!-- ZooKeeper settings -->
<host>hadoop-senior.ibeifeng.com:2181,hadoop-senior02.ibeifeng.com:2181</host>
<timeout>3000</timeout>
</zookeeper>
</monitor>
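The alarm texts in the configuration contain placeholders such as ${num} and ${name} that have to be filled in before a message is sent. One possible way to do this is sketched below. The class AlarmTemplate and its render method are hypothetical names introduced for illustration; the monitor's real substitution code may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: fills ${key} placeholders in an alarm template.
class AlarmTemplate {

    static String render(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // String.replace works on literal text, so "${" and "}" need no escaping.
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "test1");
        System.out.println(render(
                "[Storm Alarm]: Topology ${name} is down, pls check and start.",
                values));
    }
}
```

Placeholders that have no matching key are left untouched, which makes a misconfigured template easy to spot in the alarm that goes out.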
(2) Model: reads the configuration file;
package storm.monitor;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
// Corresponds to storm-m