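This article introduces the metric reporters that Flink already supports, working through the submodules of flink-metrics in the Flink source tree. It closes with a look at building a metrics platform for Flink in practice.
1. Metric reporters
Flink ships with several built-in metric reporters, including JMX, SLF4J, Graphite, Prometheus, InfluxDB, StatsD, and Datadog.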
1.1 flink-metrics-dropwizard
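This module does little more than wrap and convert between two metric hierarchies: Flink's own org.apache.flink.metrics.Metric interface and its subtypes on one side, and Dropwizard's com.codahale.metrics.Metric interface and its subtypes on the other.
It also implements ScheduledDropwizardReporter: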
public static final String ARG_HOST = "host";
public static final String ARG_PORT = "port";
public static final String ARG_PREFIX = "prefix";
public static final String ARG_CONVERSION_RATE = "rateConversion";
public static final String ARG_CONVERSION_DURATION = "durationConversion";
// ------------------------------------------------------------------------
/**
 * The MetricRegistry from the dropwizard package.
 */
protected final MetricRegistry registry;
/**
 * The ScheduledReporter from the dropwizard package.
 */
protected ScheduledReporter reporter;
private final Map<Gauge<?>, String> gauges = new HashMap<>();
private final Map<Counter, String> counters = new HashMap<>();
private final Map<Histogram, String> histograms = new HashMap<>();
private final Map<Meter, String> meters = new HashMap<>();
/**
 * Adds a metric: the Flink-internal Metric is converted to a dropwizard Metric
 * and then registered with the dropwizard MetricRegistry.
 */
@Override
public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
    final String fullName = group.getMetricIdentifier(metricName, this);
    synchronized (this) {
        if (metric instanceof Counter) {
            counters.put((Counter) metric, fullName);
            registry.register(fullName, new FlinkCounterWrapper((Counter) metric));
        } else if (metric instanceof Gauge) {
            gauges.put((Gauge<?>) metric, fullName);
            registry.register(fullName, FlinkGaugeWrapper.fromGauge((Gauge<?>) metric));
        } else if (metric instanceof Histogram) {
            Histogram histogram = (Histogram) metric;
            histograms.put(histogram, fullName);
            if (histogram instanceof DropwizardHistogramWrapper) {
                // Already backed by a dropwizard histogram: register the inner object directly.
                registry.register(fullName, ((DropwizardHistogramWrapper) histogram).getDropwizardHistogram());
            } else {
                registry.register(fullName, new FlinkHistogramWrapper(histogram));
            }
        } else if (metric instanceof Meter) {
            Meter meter = (Meter) metric;
            meters.put(meter, fullName);
            if (meter instanceof DropwizardMeterWrapper) {
                // Already backed by a dropwizard meter: register the inner object directly.
                registry.register(fullName, ((DropwizardMeterWrapper) meter).getDropwizardMeter());
            } else {
                registry.register(fullName, new FlinkMeterWrapper(meter));
            }
        } else {
            log.warn("Cannot add metric of type {}. This indicates that the reporter " +
                "does not support this metric type.", metric.getClass().getName());
        }
    }
}
/**
 * On report, fetches all metrics straight from the dropwizard MetricRegistry
 * and calls the report method of the ScheduledReporter.
 */
@Override
public void report() {
    // we do not need to lock here, because the dropwizard registry is
    // internally a concurrent map
    @SuppressWarnings("rawtypes")
    final SortedMap<String, com.codahale.metrics.Gauge> gauges = registry.getGauges();
    final SortedMap<String, com.codahale.metrics.Counter> counters = registry.getCounters();
    final SortedMap<String, com.codahale.metrics.Histogram> histograms = registry.getHistograms();
    final SortedMap<String, com.codahale.metrics.Meter> meters = registry.getMeters();
    final SortedMap<String, com.codahale.metrics.Timer> timers = registry.getTimers();
    this.reporter.report(gauges, counters, histograms, meters, timers);
}
public abstract ScheduledReporter getReporter(MetricConfig config);
Only the flink-metrics-graphite module depends on this module; it directly reuses the GraphiteReporter that the Dropwizard library provides.
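The wrapping described in this section is an instance of the adapter pattern. The following is a minimal sketch of the idea only: FlinkStyleCounter, DropwizardStyleCounter, FlinkCounterAdapter, and SimpleFlinkCounter are hypothetical stand-ins defined here so that the example compiles without Flink or Dropwizard on the classpath. They are not the real classes from either library.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Adapter-pattern sketch. Every type in this file is a hypothetical stand-in. */
final class CounterAdapterSketch {

    /** Stand-in for a Flink-style counter interface. */
    interface FlinkStyleCounter {
        void inc(long n);
        long getCount();
    }

    /** Stand-in for a Dropwizard-style counter class that a registry would read. */
    static class DropwizardStyleCounter {
        private final AtomicLong count = new AtomicLong();

        public void inc(long n) {
            count.addAndGet(n);
        }

        public long getCount() {
            return count.get();
        }
    }

    /** Looks like a Dropwizard-style counter but delegates every call to the Flink-style one. */
    static final class FlinkCounterAdapter extends DropwizardStyleCounter {
        private final FlinkStyleCounter delegate;

        FlinkCounterAdapter(FlinkStyleCounter delegate) {
            this.delegate = delegate;
        }

        @Override
        public void inc(long n) {
            delegate.inc(n);
        }

        @Override
        public long getCount() {
            return delegate.getCount();
        }
    }

    /** Trivial Flink-style counter used for the demo (not thread-safe). */
    static final class SimpleFlinkCounter implements FlinkStyleCounter {
        private long count;

        @Override
        public void inc(long n) {
            count += n;
        }

        @Override
        public long getCount() {
            return count;
        }
    }

    public static void main(String[] args) {
        SimpleFlinkCounter flinkCounter = new SimpleFlinkCounter();
        DropwizardStyleCounter wrapped = new FlinkCounterAdapter(flinkCounter);
        flinkCounter.inc(3); // update made through the Flink-side object
        wrapped.inc(2);      // update made through the wrapper reaches the same state
        System.out.println(wrapped.getCount()); // prints 5
    }
}
```

Because the adapter holds no state of its own, a registry that reads the wrapper always sees the current value of the underlying metric.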
1.2 flink-metrics-graphite
- Reporter implementation
GraphiteReporter extends ScheduledDropwizardReporter from the flink-metrics-dropwizard module.
It only has to implement the abstract getReporter() method:
@Override
public ScheduledReporter getReporter(MetricConfig config) {
    String host = config.getString(ARG_HOST, null);
    int port = config.getInteger(ARG_PORT, -1);
    if (host == null || host.length() == 0 || port < 1) {
        throw new IllegalArgumentException("Invalid host/port configuration. Host: " + host + " Port: " + port);
    }
    String prefix = config.getString(ARG_PREFIX, null);
    String conversionRate = config.getString(ARG_CONVERSION_RATE, null);
    String conversionDuration = config.getString(ARG_CONVERSION_DURATION, null);
    String protocol = config.getString(ARG_PROTOCOL, "TCP");
    // Reuse the GraphiteReporter provided by the dropwizard package
    com.codahale.metrics.graphite.GraphiteReporter.Builder builder =
        com.codahale.metrics.graphite.GraphiteReporter.forRegistry(registry);
    if (prefix != null) {
        builder.prefixedWith(prefix);
    }
    if (conversionRate != null) {
        builder.convertRatesTo(TimeUnit.valueOf(conversionRate));
    }
    if (conversionDuration != null) {
        builder.convertDurationsTo(TimeUnit.valueOf(conversionDuration));
    }
    Protocol prot;
    try {
        prot = Protocol.valueOf(protocol);
    } catch (IllegalArgumentException iae) {
        log.warn("Invalid protocol configuration: " + protocol + " Expected: TCP or UDP, defaulting to TCP.");
        prot = Protocol.TCP;
    }
    log.info("Configured GraphiteReporter with {host:{}, port:{}, protocol:{}}", host, port, prot);
    switch (prot) {
        case UDP:
            return builder.build(new GraphiteUDP(host, port));
        case TCP:
        default:
            return builder.build(new Graphite(host, port));
    }
}
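One detail of getReporter() worth noting is that the protocol value falls back to TCP when it is not a valid enum name, while the two conversion settings are passed to TimeUnit.valueOf without a fallback. The sketch below isolates that behavior. The Protocol enum and the helper names parseProtocol and parseRate are hypothetical and defined here only for illustration; TimeUnit.valueOf is the standard JDK method.

```java
import java.util.concurrent.TimeUnit;

/** Sketch of enum-based config parsing; Protocol and both helpers are hypothetical. */
final class EnumConfigSketch {

    enum Protocol { TCP, UDP }

    /** Parses the protocol name and falls back to TCP for any unknown name. */
    static Protocol parseProtocol(String value) {
        try {
            return Protocol.valueOf(value);
        } catch (IllegalArgumentException e) {
            // Enum names are case-sensitive, so "udp" also ends up here.
            return Protocol.TCP;
        }
    }

    /** Parses a rate unit with no fallback: an unknown name throws IllegalArgumentException. */
    static TimeUnit parseRate(String value) {
        return TimeUnit.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(parseProtocol("UDP"));   // prints UDP
        System.out.println(parseProtocol("udp"));   // prints TCP
        System.out.println(parseRate("SECONDS"));   // prints SECONDS
    }
}
```

In practice this suggests writing values such as the rate and duration units in upper case (for example SECONDS), since a lower-case value would raise an exception rather than be silently corrected.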
- Configuration
- Copy flink-metrics-graphite-xxx.jar to $FLINK_HOME/lib
- Add the following configuration to flink-conf.yaml:
metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
metrics.reporter.grph.host: localhost # Graphite server host
metrics.reporter.grph.port: 2003 # Graphite server port
metrics.reporter.grph.protocol: TCP # protocol to use (TCP/UDP)
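The keys above relate to the ARG_HOST ("host") and ARG_PORT ("port") constants read in getReporter(): the reporter sees its settings without the metrics.reporter.grph. prefix. The following is a rough sketch of that prefix stripping, using plain java.util.Properties rather than Flink's own configuration classes. The helper reporterSettings is hypothetical, and Flink's real loading logic is more involved.

```java
import java.util.Properties;

/** Sketch of per-reporter key scoping; reporterSettings is a hypothetical helper. */
final class ReporterConfigSketch {

    /** Collects the settings of one named reporter and drops the "metrics.reporter.<name>." prefix. */
    static Properties reporterSettings(Properties flinkConf, String reporterName) {
        String prefix = "metrics.reporter." + reporterName + ".";
        Properties settings = new Properties();
        for (String key : flinkConf.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                settings.setProperty(key.substring(prefix.length()), flinkConf.getProperty(key));
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("metrics.reporter.grph.class", "org.apache.flink.metrics.graphite.GraphiteReporter");
        conf.setProperty("metrics.reporter.grph.host", "localhost");
        conf.setProperty("metrics.reporter.grph.port", "2003");
        conf.setProperty("metrics.reporter.grph.protocol", "TCP");

        Properties grph = reporterSettings(conf, "grph");
        System.out.println(grph.getProperty("host") + ":" + grph.getProperty("port")); // prints localhost:2003
    }
}
```

The reporter name (grph here) is an arbitrary label chosen in the configuration; it only serves to group the keys that belong to one reporter.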
1.3 flink-metrics-influxdb
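- InfluxDB basic concepts
For usage details, see "时序数据库 Influxdb 使用详解" (a detailed guide to using the InfluxDB time-series database).
To make the InfluxdbReporter implementation easier to follow, here is a brief overview of a few InfluxDB concepts: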
name: census
-----------------
time                 butterflies honeybees location scientist
2015-08-18T00:00:00Z 12          23        1        langstroth
2015-08-18T00:00:00Z 1           30        1        perpetua
2015-08-18T00:06:00Z 11          28        1        langstroth
2015-08-18T00:06:00Z 3           28        1        perpetua
2015-08-18T05:54:00Z 2           11        2        langstroth
2015-08-18T06:00:00Z 1           10        2        langstroth
2015-08-18T06:06:00Z 8           23        2        perpetua
2015-08-18T06:12:00Z 7           22        2        perpetua
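- timestamp
InfluxDB is a time-series database, so all of its data has a column named time.
- field key, field value, field set
butterflies and honeybees are the field keys. Field keys are strings and store metadata.
The values in the butterflies column (12 through 7, top to bottom) are the field values of butterflies, and the values in the honeybees column (23 through 22) are the field values of honeybees. A field value can be a string, float, integer, or boolean.
The collection of field key and field value pairs is called the field set, as follows: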
butterflies = 12 honeybees = 23
butterflies = 1 honeybees = 30
butterflies = 11 honeybees = 28
butterflies = 3 honeybees = 28
butterflies = 2 honeybees = 11
butterflies = 1 honeybees = 10
butterflies = 8 honeybees = 23
butterflies = 7 honeybees = 22
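In InfluxDB, fields are required, but they are not indexed. A query that filters on a field value has to scan every field value that matches the other conditions of the query, so a field is comparable to an unindexed column in SQL.
- tag key, tag value, tag set
location and scientist are the two tags. location has two tag values, 1 and 2; scientist has two tag values, langstroth and perpetua.
The collection of tag key and tag value pairs is called the tag set, as follows: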
location = 1, scientist = langstroth
location = 2, scientist = langstroth
location = 1, scientist = perpetua
location = 2, scientist = perpetua
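In InfluxDB, tags are optional, but a tag is comparable to an indexed column in SQL, so using them is strongly recommended.
- measurement
The measurement is the container for the fields, the tags, and the time column.
- retention policy
The data retention policy. The default is autogen, which keeps data forever (it never expires) with a replication factor of 1.
- series
A series is the collection of data that shares the same retention policy, measurement, and tag set, as follows: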
| Arbitrary series number | Retention policy | Measurement | Tag set |
| ----------------------- | ---------------- | ----------- | ------------------------------- |
| series 1 | autogen | census | location=1,scientist=langstroth |
| series 2                | autogen          | census      | location=2,scientist=langstroth |
| series 3                | autogen          | census      | location=1,scientist=perpetua   |
| series 4 | autogen | census | location=2,scientist=perpetua |
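To make the series definition concrete, the sketch below groups the eight sample census rows by retention policy, measurement, and tag set, and counts the points in each group. The Row class and the seriesKey, sampleRows, and pointsPerSeries helpers are hypothetical names used only for this illustration; they are not part of InfluxDB or Flink, and the key format is an arbitrary choice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: grouping the sample census rows into series. All names here are hypothetical. */
final class SeriesSketch {

    /** One row of the sample data: a timestamp, two fields, and two tags. */
    static final class Row {
        final String time;
        final int butterflies;   // field
        final int honeybees;     // field
        final String location;   // tag
        final String scientist;  // tag

        Row(String time, int butterflies, int honeybees, String location, String scientist) {
            this.time = time;
            this.butterflies = butterflies;
            this.honeybees = honeybees;
            this.location = location;
            this.scientist = scientist;
        }
    }

    /** A series is identified by retention policy, measurement, and tag set; fields play no part. */
    static String seriesKey(String retentionPolicy, String measurement, Row row) {
        return retentionPolicy + "," + measurement
            + ",location=" + row.location + ",scientist=" + row.scientist;
    }

    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("2015-08-18T00:00:00Z", 12, 23, "1", "langstroth"),
            new Row("2015-08-18T00:00:00Z", 1, 30, "1", "perpetua"),
            new Row("2015-08-18T00:06:00Z", 11, 28, "1", "langstroth"),
            new Row("2015-08-18T00:06:00Z", 3, 28, "1", "perpetua"),
            new Row("2015-08-18T05:54:00Z", 2, 11, "2", "langstroth"),
            new Row("2015-08-18T06:00:00Z", 1, 10, "2", "langstroth"),
            new Row("2015-08-18T06:06:00Z", 8, 23, "2", "perpetua"),
            new Row("2015-08-18T06:12:00Z", 7, 22, "2", "perpetua"));
    }

    /** Counts how many rows fall into each series, in first-seen order. */
    static Map<String, Integer> pointsPerSeries(List<Row> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Row row : rows) {
            counts.merge(seriesKey("autogen", "census", row), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : pointsPerSeries(sampleRows()).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue() + " rows");
        }
    }
}
```

With the sample data this yields four distinct series keys, one per tag set, each covering two rows.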
- point
A point is the field set that shares the same timestamp within a single series; a point is comparable to a row in SQL. For example:
name: census
-----------------
time butterflies honeybees location scientist
2015-08-18T00:00:00Z