The common approach is to configure HBase to send its metrics to Ganglia and use Ganglia's built-in charts to display the monitoring points of interest. However, this has several problems:
1. Stale metrics for regions that no longer exist never expire, leaving many useless monitoring points;
2. Chart rendering is too slow, and becomes unacceptable once the number of metrics grows;
3. Metric storage cannot scale horizontally, because the data is kept in local disk files.
After evaluating OpenTSDB, we found that it handles time-series data well, with good scalability and query performance. The idea is to extend FileSink so that it emits only the metrics we care about, formats them as JSON that OpenTSDB accepts, and writes them to a file; a separate agent then loads these data points into OpenTSDB in near real time.
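The DataPoint class used by the sink below is not shown in the original. A minimal sketch, assuming it simply serializes one data point per line in the JSON shape accepted by OpenTSDB's /api/put endpoint ({"metric":...,"timestamp":...,"value":...,"tags":{...}}), might look like this (the class and its setters are inferred from how the sink calls them):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the DataPoint used by JsonFileSink: one
// OpenTSDB /api/put record, serialized as a single JSON line.
public class DataPoint {
    private String metric;
    private long timestamp;
    private Object value;
    private Map<String, String> tags = new LinkedHashMap<String, String>();

    public void setMetric(String metric) { this.metric = metric; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
    public void setValue(Object value) { this.value = value; }
    public void setTags(Map<String, String> tags) { this.tags = tags; }

    @Override
    public String toString() {
        // Hand-rolled JSON for illustration; a real implementation would
        // escape strings with a JSON library such as Jackson or Gson.
        StringBuilder sb = new StringBuilder();
        sb.append("{\"metric\":\"").append(metric)
          .append("\",\"timestamp\":").append(timestamp)
          .append(",\"value\":").append(value)
          .append(",\"tags\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : tags.entrySet()) {
            if (!first) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
            first = false;
        }
        return sb.append("}}").toString();
    }
}
```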
Package the extended FileSink as a jar, put it into the $HBASE_HOME/lib directory, and modify the hadoop-metrics2-hbase.properties file in $HBASE_HOME/conf:
*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period
*.period=10
# Below are some examples of sinks that could be used
# to monitor different hbase daemons.
hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
hbase.sink.file-all.filename=all.metrics
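Note that the lines above still reference the stock FileSink. To activate the extended sink, the sink class must point at the concrete subclass instead; for example, assuming that subclass were named HBaseJsonFileSink (the actual class name is not given in the original):

```properties
# Hypothetical class name for the extended sink's concrete subclass
hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.HBaseJsonFileSink
hbase.sink.file-all.filename=all.metrics
```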
import java.io.File;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.commons.configuration.SubsetConfiguration;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.metrics2.AbstractMetric;
import org.apache.hadoop.metrics2.MetricsException;
import org.apache.hadoop.metrics2.MetricsRecord;
import org.apache.hadoop.metrics2.MetricsSink;
import org.apache.hadoop.metrics2.MetricsTag;

import com.google.common.base.Joiner;

public abstract class JsonFileSink implements MetricsSink {
  private static final String FILENAME_KEY = "filename";
  private PrintWriter writer;
  // Whitelist of metric names worth exporting; provided by subclasses.
  private Set<String> specialMetrics;

  @Override
  public void init(SubsetConfiguration conf) {
    specialMetrics = buildConsiderableMetrics();
    String filename = conf.getString(FILENAME_KEY);
    try {
      // Append to the configured file, or fall back to stdout.
      writer = filename == null
          ? new PrintWriter(System.out)
          : new PrintWriter(new FileWriter(new File(filename), true));
    } catch (Exception e) {
      throw new MetricsException("Error creating " + filename, e);
    }
  }

  @Override
  public void putMetrics(MetricsRecord record) {
    // Collect the record's tags once; they are shared by all its metrics.
    Map<String, String> tags = new HashMap<String, String>();
    for (MetricsTag tag : record.tags()) {
      if (!StringUtils.isEmpty(tag.name()) && !StringUtils.isEmpty(tag.value())) {
        tags.put(tag.name(), tag.value());
      }
    }
    for (AbstractMetric metric : record.metrics()) {
      // Emit only whitelisted metrics, one JSON data point per line.
      if (specialMetrics.contains(metric.name())) {
        DataPoint dp = new DataPoint();
        dp.setMetric(Joiner.on(".").join(record.context(), record.name(), metric.name()));
        dp.setTimestamp(record.timestamp());
        dp.setTags(tags);
        dp.setValue(metric.value());
        writer.println(dp);
      }
    }
  }

  /** Returns the set of metric names that should be exported. */
  public abstract Set<String> buildConsiderableMetrics();

  @Override
  public void flush() {
    writer.flush();
  }
}
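The second half of the pipeline, the agent that ships these JSON lines into OpenTSDB, is not shown in the original. One piece of it can be sketched without committing to a transport: OpenTSDB's /api/put endpoint accepts a JSON array of data points, so the agent can batch the lines written by the sink into a single request body (the class and method names here are hypothetical; the actual HTTP POST could be done with HttpURLConnection or any HTTP client):

```java
import java.util.List;

// Hypothetical helper for the ingest agent: batch the JSON data-point
// lines written by JsonFileSink into one request body for OpenTSDB's
// /api/put endpoint, which accepts a JSON array of data points.
public class OpenTsdbBatcher {
    public static String toRequestBody(List<String> jsonLines) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < jsonLines.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(jsonLines.get(i));
        }
        return sb.append(']').toString();
    }
}
```

Batching matters here: posting one point per request would not keep up with the volume of metrics a busy cluster emits every period.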