Pushgateway
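Pushgateway is an important tool in the Prometheus ecosystem. Prometheus works in pull mode, but for various reasons (for example, short-lived jobs or targets it cannot reach over the network) it may be unable to scrape each target directly, so the metrics need to be collected in one place first. The Pushgateway is that place.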
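As a minimal sketch of that flow, a job can hand a sample to the Pushgateway over HTTP, and Prometheus later scrapes it from there. The helper names `pushgateway_url` and `push_metric`, the metric name, and the job name are illustrative assumptions; the `/metrics/job/<job>` path is the Pushgateway push endpoint.

```shell
#!/bin/sh
# Build the push URL for a given host, port, and job name.
pushgateway_url() {
    printf 'http://%s:%s/metrics/job/%s\n' "$1" "$2" "$3"
}

# Push one sample in the Prometheus text exposition format.
# Usage: push_metric <host> <port> <job> <metric_name> <value>
push_metric() {
    printf '%s %s\n' "$4" "$5" |
        curl --silent --show-error --data-binary @- \
            "$(pushgateway_url "$1" "$2" "$3")"
}

# Example (needs a running Pushgateway, so it is left commented out):
#   push_metric localhost 9091 demo_job demo_metric 3.14
```

Once a sample has been pushed, it shows up on the Pushgateway's own `/metrics` page, which is what Prometheus scrapes.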
Download and install
cd /usr/local/prometheus
wget https://github.com/prometheus/pushgateway/releases/download/v1.0.0/pushgateway-1.0.0.linux-amd64.tar.gz
tar -zxvf pushgateway-1.0.0.linux-amd64.tar.gz
cd pushgateway-1.0.0.linux-amd64
# Start
nohup /usr/local/prometheus/pushgateway-1.0.0.linux-amd64/pushgateway > /usr/local/prometheus/pushgateway-1.0.0.linux-amd64/nohup.out 2>&1 &
Installing node_exporter
Download and install
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz
nohup /usr/local/prometheus/node_exporter-0.18.1.linux-amd64/node_exporter > /usr/local/prometheus/node_exporter-0.18.1.linux-amd64/nohup.out 2>&1 &
Installing Prometheus
Download and install
# Create the /usr/local/prometheus directory (-p: no error if it already exists)
mkdir -p /usr/local/prometheus
cd /usr/local/prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.14.0/prometheus-2.14.0.linux-amd64.tar.gz
tar -zxvf prometheus-2.14.0.linux-amd64.tar.gz
cd prometheus-2.14.0.linux-amd64
Default configuration
By default, Prometheus scrapes some of its own runtime metrics.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
Modified configuration
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'linux'
    static_configs:
    - targets: ['localhost:9100']
      labels:
        instance: 'localhost'
  - job_name: 'pushgateway'
    static_configs:
    - targets: ['localhost:9091']
      labels:
        instance: 'pushgateway'
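One option worth considering for the `pushgateway` job: the Prometheus documentation recommends `honor_labels: true` when scraping a Pushgateway, so that the `job` and `instance` labels carried by pushed metrics are kept instead of being renamed to `exported_job` and `exported_instance`. The sketch below only prints such a fragment. The function name is hypothetical, and leaving out the static `instance` label is a choice made for this example, not a requirement.

```shell
#!/bin/sh
# Print a scrape_configs entry for a Pushgateway at the given host:port.
# The output is meant to be pasted under scrape_configs in prometheus.yml.
pushgateway_scrape_job() {
    cat <<EOF
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
    - targets: ['$1']
EOF
}

# Example:
#   pushgateway_scrape_job localhost:9091
```

After editing the file, `./promtool check config prometheus.yml` (promtool ships in the same tarball as Prometheus) can validate it before startup.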
Start
nohup /usr/local/prometheus/prometheus-2.14.0.linux-amd64/prometheus --config.file=/usr/local/prometheus/prometheus-2.14.0.linux-amd64/prometheus.yml >/usr/local/prometheus/prometheus-2.14.0.linux-amd64/nohup.out 2>&1 &
Check the ports
netstat -apn | grep -E '9091|3000|9090|9100'
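The grep above matches those digits anywhere in a line, including PIDs and remote ports. A slightly stricter sketch, assuming the usual Linux `netstat -apn` column layout (protocol in column 1, local address in column 4, state in column 6), prints only the wanted ports that have no listening TCP socket. The function name `missing_ports` is made up for this example.

```shell
#!/bin/sh
# Read netstat-style output on stdin and print each wanted port that has
# no TCP socket in LISTEN state. Prints nothing when all are listening.
# Usage: netstat -apn | missing_ports "9090 9091 9100 3000"
missing_ports() {
    awk -v ports="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")   # port is the last ":"-separated piece
            listening[parts[n]] = 1
        }
        END {
            m = split(ports, want, " ")
            for (i = 1; i <= m; i++)
                if (!(want[i] in listening)) print want[i]
        }'
}
```

An empty result means Prometheus (9090), Pushgateway (9091), node_exporter (9100), and Grafana (3000) are all listening.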
Check the targets
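Besides the web UI, the targets can also be checked from the command line through the Prometheus HTTP API endpoint `/api/v1/targets`. The sketch below counts targets reported as healthy. It assumes the API returns compact JSON containing `"health":"up"` for each healthy target, and both function names are illustrative.

```shell
#!/bin/sh
# Build the targets API URL for a Prometheus server at host:port.
targets_api_url() {
    printf 'http://%s/api/v1/targets\n' "$1"
}

# Read the API response on stdin and print how many targets are "up".
# This is a plain text match, not a JSON parser.
count_up_targets() {
    grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Example (needs a running Prometheus, so it is left commented out):
#   curl --silent "$(targets_api_url localhost:9090)" | count_up_targets
```

With the modified configuration above, three healthy targets would be expected: `prometheus`, `linux`, and `pushgateway`.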
Flink
Modify the configuration file
Add the following to conf/flink-conf.yaml in the Flink installation directory (host is the host of the machine where the Pushgateway was installed above):
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: host
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: job
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
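With `randomJobNameSuffix: true` and `deleteOnShutdown: false`, each Flink process pushes under its own job name and the pushed metrics stay in the Pushgateway after the process exits. A possible cleanup step, sketched under the assumption that the group is identified by the job name alone, uses the Pushgateway's HTTP DELETE on the same `/metrics/job/<job>` path. The function name and the `DRY_RUN` switch are hypothetical helpers for this example.

```shell
#!/bin/sh
# Delete the metrics group pushed under one job name.
# Set DRY_RUN=1 to print the request instead of sending it.
# Usage: delete_pushed_job <host> <port> <job>
delete_pushed_job() {
    url="http://$1:$2/metrics/job/$3"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'DELETE %s\n' "$url"
    else
        curl --silent --show-error -X DELETE "$url"
    fi
}

# Example (the job name is whatever appears on the Pushgateway page):
#   DRY_RUN=1 delete_pushed_job localhost 9091 some_stale_job
```

Running with `DRY_RUN=1` first makes it easy to confirm the job name before anything is removed.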
Copy the jar file
cd /usr/local/flink/current
cp opt/flink-metrics-prometheus-1.9.1.jar lib/
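Once Flink has been restarted with the reporter configured, one way to confirm that metrics arrive is to look for them on the Pushgateway's `/metrics` page. The sketch below lists the distinct metric names, assuming the Flink Prometheus reporter prefixes its metric names with `flink_`. The function name is illustrative.

```shell
#!/bin/sh
# Read Prometheus text-format output on stdin and print the distinct
# metric names that start with "flink_" (labels and values are dropped).
flink_metric_names() {
    grep '^flink_' | sed 's/[{ ].*//' | sort -u
}

# Example (replace "host" with the Pushgateway machine):
#   curl --silent http://host:9091/metrics | flink_metric_names
```

An empty result suggests that the reporter is not pushing, in which case the Flink logs are the next place to look.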
Grafana
Download and install
wget https://dl.grafana.com/oss/release/grafana-6.4.4.linux-amd64.tar.gz
tar -zxvf grafana-6.4.4.linux-amd64.tar.gz
Start
nohup /usr/local/grafana/grafana-6.4.4/bin/grafana-server web >/usr/local/grafana/grafana-6.4.4/nohup.out 2>&1 &
Reporting with a custom Pushgateway job name
Problems
Problem 1
2019-11-12 16:07:48,899 ERROR org.apache.flink.runtime.metrics.ReporterSetup - Could not instantiate metrics reporter promgateway. Metrics might not be exposed/reported.
java.lang.ClassNotFoundException: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.flink.runtime.metrics.ReporterSetup.loadViaReflection(ReporterSetup.java:242)
at org.apache.flink.runtime.metrics.ReporterSetup.loadReporter(ReporterSetup.java:210)
at org.apache.flink.runtime.metrics.ReporterSetup.fromConfiguration(ReporterSetup.java:162)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createMetricRegistry(ClusterEntrypoint.java:305)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:261)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:202)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:163)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:501)
at org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint.main(YarnSessionClusterEntrypoint.java:93)
Solution: the reporter jar needs to be copied into lib/
cp opt/flink-metrics-prometheus-1.9.1.jar lib/
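Since the `ClassNotFoundException` comes from the reporter class missing on the classpath, a quick pre-flight check before starting Flink can catch it. This is a sketch only. The function name is hypothetical, and it checks for any version of the jar in `lib/` rather than a specific one.

```shell
#!/bin/sh
# Return 0 if a flink-metrics-prometheus jar exists in <flink_home>/lib.
# Usage: has_prometheus_reporter_jar /usr/local/flink/current
has_prometheus_reporter_jar() {
    for jar in "$1"/lib/flink-metrics-prometheus-*.jar; do
        # If the glob matched nothing, $jar is the literal pattern
        # and the -e test fails.
        [ -e "$jar" ] && return 0
    done
    return 1
}

# Example:
#   has_prometheus_reporter_jar /usr/local/flink/current || echo "reporter jar missing"
```

Jars in `lib/` are picked up when the Flink processes start, so a running cluster or YARN session has to be restarted after the copy.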
Problem 2
java.io.IOException: Response code from http://server3:9091/metrics/job/fibodata5ab95bcaadf9b4c7d3a61220f0945f77 was 200
at org.apache.flink.shaded.io.prometheus.client.exporter.PushGateway.doRequest(PushGateway.java:297)
at org.apache.flink.shaded.io.prometheus.client.exporter.PushGateway.push(PushGateway.java:105)
at org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter.report(PrometheusPushGatewayReporter.java:76)
at org.apache.flink.runtime.metrics.MetricRegistryImpl$ReporterTask.run(MetricRegistryImpl.java:436)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2019-11-12 16:40:06,645 WARN org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter - Failed to push metrics to PushGateway with jobName fibodata5ab95bcaadf9b4c7d3a61220f0945f77.
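The cause has not been found yet; it may be a problem in the framework itself. Note that the reported response code is 200, which normally means success, so the reporter's client appears to treat a successful push as an error. A version mismatch between this Pushgateway release and the Prometheus client bundled with Flink is one possible explanation.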