Prometheus support in the Flink metrics documentation
See "Metric Reporters" in the Apache Flink documentation.
Prometheus
(org.apache.flink.metrics.prometheus.PrometheusReporter)
Parameters:
- port - (optional) the port the Prometheus exporter listens on, defaults to 9249. In order to be able to run several instances of the reporter on one host (e.g. when one TaskManager is colocated with the JobManager) it is advisable to use a port range like 9250-9260.
- filterLabelValueCharacters - (optional) specifies whether to filter label value characters. If enabled, all characters not matching [a-zA-Z0-9:_] will be removed; otherwise no characters will be removed. Before disabling this option, please ensure that your label values meet the Prometheus requirements.
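The label-value filtering described above can be illustrated in plain Java. This is only a sketch of the documented behavior (drop every character outside [a-zA-Z0-9:_]), not the reporter's actual implementation:

```java
public class LabelValueFilterDemo {
    public static void main(String[] args) {
        // Hypothetical raw label value containing characters that
        // Prometheus label values may reject in practice.
        String raw = "my-operator.name (v2)";
        // Keep only characters matching [a-zA-Z0-9:_], as the reporter
        // does when filterLabelValueCharacters is enabled.
        String filtered = raw.replaceAll("[^a-zA-Z0-9:_]", "");
        System.out.println(filtered); // prints: myoperatornamev2
    }
}
```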
Example configuration:
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
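If several reporter instances may run on the same host, the configuration can also pin a port range via `metrics.reporter.prom.port`. The range below is just an illustration:

```yaml
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260
```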
Flink metric types are mapped to Prometheus metric types as follows:
Flink | Prometheus | Note
---|---|---
Counter | Gauge | Prometheus counters cannot be decremented.
Gauge | Gauge | Only numbers and booleans are supported.
Histogram | Summary | Quantiles .5, .75, .95, .98, .99 and .999
Meter | Gauge | The gauge exports the meter's rate.
All Flink metrics variables (see List of all Variables) are exported to Prometheus as labels.
Using Prometheus with Flink: a code walkthrough
This approach does not go through the Prometheus pushgateway; instead, the service exposes its own metrics (Flink's built-in metrics as well as custom business metrics) for Prometheus to pull.
The goals of my modifications to the code are:
1. Run it locally on a Mac, with the Flink web UI available
This requires the runtime-web dependency:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
2. Configure prometheus.yml so Prometheus can scrape the Flink application's metrics
Only a single scrape job, named flink, is configured. Then just start Prometheus.
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
- job_name: 'flink'
static_configs:
- targets: ['localhost:9249']
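If the reporter is configured with a port range instead of the default 9249, each reporter instance binds the first free port in the range, so the scrape config needs one target per expected port. The ports below assume a hypothetical 9250-9260 range:

```yaml
scrape_configs:
  - job_name: 'flink'
    static_configs:
      - targets: ['localhost:9250', 'localhost:9251']
```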
The code on GitHub is started via Docker, but I wanted to debug the program, so I downloaded the code and ran it locally on my Mac.
Maven dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-metrics-prometheus</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-metrics</artifactId>
        <version>${flink.version}</version>
        <type>pom</type>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-runtime</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
</dependencies>
The code itself:
PrometheusExampleJob
package org.galaxy.foundation.metrics.app;
/**
* @author michael.wang
* @since 2022/4/10 10:10 AM
*/
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.galaxy.foundation.metrics.func.MetricsMapFunc;
import org.galaxy.foundation.metrics.func.RandomSourceFunc;
public class PrometheusExampleJob {

    private final ParameterTool parameters;

    public static void main(String[] args) throws Exception {
        new PrometheusExampleJob(ParameterTool.fromArgs(args)).run();
    }

    private PrometheusExampleJob(ParameterTool parameters) {
        this.parameters = parameters;
    }

    private void run() throws Exception {
        // final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Configuration config = new Configuration();
        // Port for the Flink web UI (http://localhost:9998)
        config.setInteger(RestOptions.PORT, 9998);
        // Register the Prometheus reporter; it listens on port 9249 by default
        config.setString("metrics.reporter.prom.class", "org.apache.flink.metrics.prometheus.PrometheusReporter");
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config);
        env.enableCheckpointing(30000);
        env.disableOperatorChaining();
        DataStreamSource<Integer> source = env.addSource(new RandomSourceFunc(parameters.getInt("elements", Integer.MAX_VALUE)));
        source.print();
        source.map(new MetricsMapFunc())
                .name(MetricsMapFunc.class.getSimpleName())
                .addSink(new DiscardingSink<>())
                .name(DiscardingSink.class.getSimpleName());
        env.execute(PrometheusExampleJob.class.getSimpleName());
    }
}
MetricsMapFunc
package org.galaxy.foundation.metrics.func;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Histogram;
import org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogram;
/**
* @author michaelwang
*/
public class MetricsMapFunc extends RichMapFunction<Integer, Integer> {

    private static final long serialVersionUID = 1L;
    private transient Counter eventCounter;
    private transient Histogram valueHistogram;

    @Override
    public void open(Configuration parameters) {
        eventCounter = getRuntimeContext().getMetricGroup().counter("events");
        valueHistogram =
                getRuntimeContext()
                        .getMetricGroup()
                        .histogram("value_histogram", new DescriptiveStatisticsHistogram(10));
    }

    @Override
    public Integer map(Integer value) {
        eventCounter.inc();
        valueHistogram.update(value);
        return value;
    }
}
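The mapping table quoted earlier also lists Gauge and Meter, which the example job does not use. For completeness, here is a sketch of how those two metric types could be registered in a similar RichMapFunction. This is not part of the original project; the class name and metric names are my own, and it assumes the same Flink dependencies as the rest of the code:

```java
package org.galaxy.foundation.metrics.func;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.Meter;
import org.apache.flink.metrics.MeterView;

public class GaugeMeterMapFunc extends RichMapFunction<Integer, Integer> {

    private static final long serialVersionUID = 1L;
    private transient int lastValue;
    private transient Meter eventMeter;

    @Override
    public void open(Configuration parameters) {
        // Gauge: exports the most recently seen value.
        getRuntimeContext().getMetricGroup().gauge("last_value", (Gauge<Integer>) () -> lastValue);
        // Meter: MeterView tracks a per-event rate over a 60-second window;
        // per the mapping table, Prometheus sees this rate as a gauge.
        eventMeter = getRuntimeContext().getMetricGroup().meter("events_per_second", new MeterView(60));
    }

    @Override
    public Integer map(Integer value) {
        lastValue = value;
        eventMeter.markEvent();
        return value;
    }
}
```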
RandomSourceFunc
package org.galaxy.foundation.metrics.func;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import java.util.concurrent.ThreadLocalRandom;
/**
* @author michael.wang
* @since 2022/4/10 10:08 AM
*/
public class RandomSourceFunc implements SourceFunction<Integer> {

    private int count = 0;
    private volatile boolean isRunning = true;
    private final int elements;

    public RandomSourceFunc(int elements) {
        this.elements = elements;
    }

    @Override
    public void run(SourceContext<Integer> ctx) throws InterruptedException {
        while (isRunning && count < elements) {
            Thread.sleep(1000);
            ctx.collect(ThreadLocalRandom.current().nextInt(10));
            count++;
        }
    }

    @Override
    public void cancel() {
        isRunning = false;
    }
}
Start the PrometheusExampleJob class, then view the metrics Flink exposes at http://localhost:9249/.
Start Prometheus, and it will scrape the metrics from http://localhost:9249/.
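Once Prometheus is scraping, the counter registered in MetricsMapFunc can be queried in the Prometheus UI. With the default scope formats the exported name usually looks like the query below, but the exact name depends on your metrics.scope.* configuration, so treat it as an assumption and check the /metrics output for the real name:

```
rate(flink_taskmanager_job_task_operator_events[1m])
```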