Monitoring Flink Runtime Status with Prometheus and Grafana

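This article describes how to collect data with the Prometheus Pushgateway and how to monitor system components with node_exporter and Prometheus. It walks through installing and configuring Pushgateway, node_exporter, and Prometheus, and discusses the problems encountered when integrating Flink with Prometheus, along with their solutions.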

Pushgateway

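Pushgateway is an important tool in the Prometheus ecosystem. Prometheus uses a pull model, and for various reasons it may not be able to scrape every target directly. In that case the metrics need to be collected in one place first, and Pushgateway is that place: jobs push their metrics to it, and Prometheus scrapes the Pushgateway.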
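To make the push model concrete, here is a minimal sketch of pushing one sample to a Pushgateway with curl once it is running. The host `localhost`, the port `9091` (Pushgateway's default), the job name `demo_job`, and the metric name `demo_metric` are assumptions for illustration, and both function names are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# Build the push URL for a job: http://<host>:<port>/metrics/job/<job>
pushgateway_url() {
    printf 'http://%s:%s/metrics/job/%s' "$1" "$2" "$3"
}

# Push a single untyped sample in the Prometheus text exposition format.
# Usage: push_sample <host> <port> <job> <metric_name> <value>
push_sample() {
    printf '%s %s\n' "$4" "$5" |
        curl --silent --show-error --data-binary @- \
             "$(pushgateway_url "$1" "$2" "$3")"
}

# Example (requires a running Pushgateway):
#   push_sample localhost 9091 demo_job demo_metric 3.14
```

After such a push, the sample should show up on the Pushgateway's own /metrics page, which is what Prometheus scrapes.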

Download and install

    # Create the install directory if it does not exist yet
    mkdir -p /usr/local/prometheus
    cd /usr/local/prometheus
    wget https://github.com/prometheus/pushgateway/releases/download/v1.0.0/pushgateway-1.0.0.linux-amd64.tar.gz
    tar -zxvf pushgateway-1.0.0.linux-amd64.tar.gz
    cd pushgateway-1.0.0.linux-amd64
    # Start in the background, logging to nohup.out
    nohup /usr/local/prometheus/pushgateway-1.0.0.linux-amd64/pushgateway > /usr/local/prometheus/pushgateway-1.0.0.linux-amd64/nohup.out 2>&1 &

Installing node_exporter

Download and install

    cd /usr/local/prometheus
    wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
    tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz
    # Start in the background, logging to nohup.out
    nohup /usr/local/prometheus/node_exporter-0.18.1.linux-amd64/node_exporter > /usr/local/prometheus/node_exporter-0.18.1.linux-amd64/nohup.out 2>&1 &

Installing Prometheus

Download and install

    # Create the /usr/local/prometheus directory
    mkdir -p /usr/local/prometheus
    cd /usr/local/prometheus
    wget https://github.com/prometheus/prometheus/releases/download/v2.14.0/prometheus-2.14.0.linux-amd64.tar.gz
    tar -zxvf prometheus-2.14.0.linux-amd64.tar.gz
    cd prometheus-2.14.0.linux-amd64

Default configuration

By default, Prometheus scrapes some of its own runtime metrics.

    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
        - targets: ['localhost:9090']

Modified configuration

Two scrape jobs are added: `linux` for node_exporter (port 9100) and `pushgateway` for the Pushgateway (port 9091).

    # my global config
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
        - targets: ['localhost:9090']

      - job_name: 'linux'
        static_configs:
        - targets: ['localhost:9100']
          labels:
            instance: 'localhost'

      - job_name: 'pushgateway'
        static_configs:
        - targets: ['localhost:9091']
          labels:
            instance: 'pushgateway'
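Since indentation mistakes in this file are easy to make, it can help to validate it before starting Prometheus. The Prometheus tarball ships a `promtool` binary next to `prometheus`, and `promtool check config` reports configuration errors. The following is a minimal sketch that assumes the install path used in this article; the function name `check_prometheus_config` is a hypothetical helper.

```shell
#!/bin/sh
# Install directory used in this article (adjust if yours differs).
PROM_HOME=/usr/local/prometheus/prometheus-2.14.0.linux-amd64

# Validate a Prometheus config file with promtool.
# Returns 2 if promtool is missing, otherwise promtool's own exit status.
check_prometheus_config() {
    config_file=${1:-$PROM_HOME/prometheus.yml}
    if [ ! -x "$PROM_HOME/promtool" ]; then
        echo "promtool not found in $PROM_HOME" >&2
        return 2
    fi
    "$PROM_HOME/promtool" check config "$config_file"
}

# Example:
#   check_prometheus_config && echo "config OK"
```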

Start

    nohup /usr/local/prometheus/prometheus-2.14.0.linux-amd64/prometheus --config.file=/usr/local/prometheus/prometheus-2.14.0.linux-amd64/prometheus.yml >/usr/local/prometheus/prometheus-2.14.0.linux-amd64/nohup.out 2>&1 &

Check the ports

    netstat -apn | grep -E '9091|3000|9090|9100'

View the targets

The scrape targets and their state are listed on the Targets page of the Prometheus web UI (port 9090).
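The components can also be probed directly from the command line, since Prometheus, Pushgateway, and node_exporter each serve their own metrics over HTTP at /metrics. This is a sketch under the assumption that everything runs on `localhost` with the ports used in this article; `metrics_url` and `probe` are hypothetical helper names.

```shell
#!/bin/sh
# Build the metrics URL for a component listening on <host>:<port>.
metrics_url() {
    printf 'http://%s:%s/metrics' "$1" "$2"
}

# Report whether a component answers on its /metrics endpoint.
# Usage: probe <name> <host> <port>
probe() {
    if curl --silent --fail --max-time 5 --output /dev/null \
            "$(metrics_url "$2" "$3")"; then
        echo "$1: up"
    else
        echo "$1: DOWN"
    fi
}

# Example (ports are the ones used in this article):
#   probe prometheus    localhost 9090
#   probe node_exporter localhost 9100
#   probe pushgateway   localhost 9091
```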

Flink

Modify the configuration file

Add the following settings to conf/flink-conf.yaml in the Flink installation directory (host is the host of the machine where Pushgateway was installed above):

    metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
    metrics.reporter.promgateway.host: host
    metrics.reporter.promgateway.port: 9091
    metrics.reporter.promgateway.jobName: job
    metrics.reporter.promgateway.randomJobNameSuffix: true
    metrics.reporter.promgateway.deleteOnShutdown: false
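When several machines need the same change, the settings can be appended from a script instead of by hand. The sketch below writes the six settings listed above and skips the file if a `promgateway` reporter class is already configured. The function name `add_promgateway_reporter` and its argument order are assumptions made for illustration.

```shell
#!/bin/sh
# Append the PushGateway reporter settings to a flink-conf.yaml,
# unless a promgateway reporter is already configured there.
# Usage: add_promgateway_reporter <flink-conf.yaml> <pushgateway-host> [port] [jobName]
add_promgateway_reporter() {
    conf_file=$1
    pg_host=$2
    pg_port=${3:-9091}
    job_name=${4:-job}
    if grep -q '^metrics\.reporter\.promgateway\.class:' "$conf_file" 2>/dev/null; then
        echo "promgateway reporter already configured in $conf_file" >&2
        return 0
    fi
    cat >> "$conf_file" <<EOF
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: $pg_host
metrics.reporter.promgateway.port: $pg_port
metrics.reporter.promgateway.jobName: $job_name
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
EOF
}

# Example:
#   add_promgateway_reporter /usr/local/flink/current/conf/flink-conf.yaml server3
```

Running the function a second time on the same file leaves it unchanged, so it is safe to call from a provisioning script.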

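Copy the jar file

The reporter jar ships in the opt/ directory of the Flink distribution and has to be copied into lib/ so that it is on the classpath:

    cd /usr/local/flink/current
    cp opt/flink-metrics-prometheus-1.9.1.jar lib/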

Grafana

Download and install

    wget https://dl.grafana.com/oss/release/grafana-6.4.4.linux-amd64.tar.gz
    tar -zxvf grafana-6.4.4.linux-amd64.tar.gz

Start

    nohup /usr/local/grafana/grafana-6.4.4/bin/grafana-server web >/usr/local/grafana/grafana-6.4.4/nohup.out 2>&1 &

Reporting with a custom Pushgateway jobName

Problems

Problem 1

    2019-11-12 16:07:48,899 ERROR org.apache.flink.runtime.metrics.ReporterSetup - Could not instantiate metrics reporter promgateway. Metrics might not be exposed/reported.
    java.lang.ClassNotFoundException: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.flink.runtime.metrics.ReporterSetup.loadViaReflection(ReporterSetup.java:242)
        at org.apache.flink.runtime.metrics.ReporterSetup.loadReporter(ReporterSetup.java:210)
        at org.apache.flink.runtime.metrics.ReporterSetup.fromConfiguration(ReporterSetup.java:162)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createMetricRegistry(ClusterEntrypoint.java:305)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:261)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:202)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:163)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:501)
        at org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint.main(YarnSessionClusterEntrypoint.java:93)

Solution: the reporter class is not on the classpath, so the jar has to be copied from opt/ into lib/ in the Flink installation directory:

    cp opt/flink-metrics-prometheus-1.9.1.jar lib/
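To catch this before starting a cluster, a small check can confirm that the reporter jar is actually in lib/. This is a sketch only; the function name `has_prometheus_reporter_jar` is hypothetical, and it matches any version of the jar rather than just 1.9.1.

```shell
#!/bin/sh
# Check that a flink-metrics-prometheus jar is present in <flink_home>/lib.
# Prints the jar path(s) found and returns 0, or returns 1 if none exists.
has_prometheus_reporter_jar() {
    flink_home=$1
    found=1
    for jar in "$flink_home"/lib/flink-metrics-prometheus-*.jar; do
        # If the glob matches nothing it stays literal and -f is false.
        if [ -f "$jar" ]; then
            echo "$jar"
            found=0
        fi
    done
    return "$found"
}

# Example:
#   has_prometheus_reporter_jar /usr/local/flink/current || echo "reporter jar missing from lib/"
```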

Problem 2

    java.io.IOException: Response code from http://server3:9091/metrics/job/fibodata5ab95bcaadf9b4c7d3a61220f0945f77 was 200
        at org.apache.flink.shaded.io.prometheus.client.exporter.PushGateway.doRequest(PushGateway.java:297)
        at org.apache.flink.shaded.io.prometheus.client.exporter.PushGateway.push(PushGateway.java:105)
        at org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter.report(PrometheusPushGatewayReporter.java:76)
        at org.apache.flink.runtime.metrics.MetricRegistryImpl$ReporterTask.run(MetricRegistryImpl.java:436)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    2019-11-12 16:40:06,645 WARN org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter - Failed to push metrics to PushGateway with jobName fibodata5ab95bcaadf9b4c7d3a61220f0945f77.

The cause has not been found yet; it may be a problem in the framework itself. Note that 200 is an HTTP success status, so the Pushgateway appears to have accepted the request. The exception is thrown by the Prometheus client shaded into Flink (PushGateway.doRequest), which apparently expects a different response code.
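One way to narrow this down is to push a sample by hand and look at the status code the Pushgateway actually returns, independently of Flink. The sketch below does that with curl; the metric name `push_probe`, the job name `probe_job`, and both function names are hypothetical, and `server3` is simply the host that appears in the log above.

```shell
#!/bin/sh
# Return success (0) if the given HTTP status code is in the 2xx range.
is_success_status() {
    case $1 in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Push one test sample and print the HTTP status code the Pushgateway returns.
# Usage: push_status <host> <port> <job>
push_status() {
    printf 'push_probe 1\n' |
        curl --silent --output /dev/null --write-out '%{http_code}' \
             --data-binary @- "http://$1:$2/metrics/job/$3"
}

# Example (requires a running Pushgateway):
#   code=$(push_status server3 9091 probe_job)
#   is_success_status "$code" && echo "push accepted with HTTP $code"
```

The probe leaves a `probe_job` group behind on the Pushgateway; it can be removed afterwards with an HTTP DELETE request to the same URL.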
