Prometheus + Grafana + Pushgateway: Setting Up and Using Push-Based Monitoring

SpiralStory

Article link: https://blog.csdn.net/weixin_43713949/article/details/118522641

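Prometheus vs. Zabbix

Like Zabbix, Prometheus is an open-source monitoring framework that has become popular in recent years. Unlike Zabbix, it is more flexible and its modules are loosely coupled: the alerting module, the gateway module, and so on are all optional. Both the server and the clients work out of the box with no installation step. Zabbix, by contrast, sets everything up in a single install, which makes it large and complex.

The Zabbix agent can conveniently use scripts to read databases, logs, and other files on the machine and report them. Prometheus clients come in two kinds: SDKs for different languages and exporters for different purposes. To monitor machine status, MySQL performance, and the like, there are many mature exporters that work out of the box and expose their data over HTTP (the server pulls it). To monitor the state of your own application, there are official or third-party SDKs for most languages, which is convenient because the data does not have to be written to a database or log first for zabbix-agent to collect.

The Zabbix client mostly just reports data, in push mode. With Prometheus, the client keeps its current metric values locally and the server periodically pulls the data it wants.

As for the UI, Zabbix looks dated, while the Prometheus UI is newer and very minimal, so minimal that it is really only a testing and configuration console. For a good monitoring experience, pairing either of them with Grafana is the way to go.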

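Installing Prometheus:

Prometheus can be installed in many ways, all listed on the official site. This article only covers downloading and unpacking the release package, because Prometheus works out of the box: once unpacked it can be run directly, with no installer to execute. Check the download page on the Prometheus website for the latest version. At the time of writing it was 2.7.2, so download the package for your system and unpack it: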

$ wget https://github.com/prometheus/prometheus/releases/download/v2.7.2/prometheus-2.7.2.linux-amd64.tar.gz
$ tar xvfz prometheus-2.7.2.linux-amd64.tar.gz

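After unpacking, a matching directory appears in the current directory. Enter it and you can run the Prometheus server right away:

$ cd prometheus-2.7.2.linux-amd64
# Check the version
$ ./prometheus --version
# Run the server
$ ./prometheus --config.file=prometheus.yml

The prometheus.yml file in the command is the configuration file. It sits in the same directory and is where the server is configured.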

Configuring Prometheus

As mentioned above, prometheus.yml is the configuration file. Open it and you will see a few dozen lines similar to these:

$ cat prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
 
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
 
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
 
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
 
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
 
    static_configs:
    - targets: ['localhost:9090']

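It has four main parts:

global: global settings. scrape_interval is how often targets are scraped, and evaluation_interval is how often alerting rules are evaluated;
alerting: the Alertmanager configuration. Alertmanager is not installed yet;
rule_files: which alerting rule files to load;
scrape_configs: the targets to scrape. Each job_name defines one scrape job, and its targets are the host:port addresses to collect from. By default Prometheus monitors itself here. This entry only tells Prometheus where to scrape; the port the server itself listens on is set with the --web.listen-address flag, and the target listed here has to match it. Every exporter becomes a target that reports its own kind of data, such as machine status or MySQL performance, and an application instrumented with a language SDK is also a target that reports your custom business metrics.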

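The Prometheus UI:

Once the server is running, open [machine IP:port] in a browser to see the Prometheus UI. The IP is that of the machine running Prometheus, and the port is the one the server listens on (9090 by default, the same port used by the self-monitoring target in the configuration above). The page looks like this:
[Screenshot: the Prometheus web UI]

If the page does not load, check whether the port is open and whether external access is allowed.

The UI is very simple (which is why we also need Grafana). In the navigation bar at the top, Alerts shows alerts, which are not set up yet. Graph shows charts of metrics and is the default page. Status lets you view the configuration, the scrape targets, the alerting rules, and so on.

On the Graph page, since Prometheus already monitors itself by default, some charts are available right away. For example, type promhttp_metric_handler_requests_total into the input box, click Execute, and switch to the Graph tab below to see a line chart of the number of requests to /metrics.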
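
The same expression can also be evaluated without a browser, through the Prometheus HTTP API. The following is a minimal sketch that assumes the server listens on localhost:9090. The names prom_query, PROM_URL and CURL are hypothetical and exist only in this example:

```shell
#!/bin/bash
# Hypothetical helper: run an instant PromQL query through the HTTP API.
# PROM_URL and CURL are assumptions of this sketch; override them as needed.
PROM_URL=${PROM_URL:-http://localhost:9090}

prom_query() {
  # -G turns the url-encoded data into a GET query string.
  ${CURL:-curl} -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Usage (needs a running server):
#   prom_query 'promhttp_metric_handler_requests_total'
```

The response is JSON, so it can be piped into any JSON tool for further processing.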

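Adding Machine Status Monitoring

Let's add our first exporter, one that monitors the state of the current machine itself: disk, CPU, network traffic, and so on. Prometheus already has many ready-made exporters for common needs, so we use one of them directly: node_exporter. Note that despite the name, node_exporter has nothing to do with Node.js. To Prometheus, a machine is a node, so this exporter reports the state of the current node.

node_exporter is itself an HTTP service that the Prometheus server calls (pulls) to fetch metrics. It is installed the same way: download the package, unpack it, and run it: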

# Download the latest version; right-click the latest release on the GitHub releases page to copy its link
$ wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
# Unpack
$ tar xvfz node_exporter-0.17.0.linux-amd64.tar.gz
# Enter the unpacked directory
$ cd node_exporter-0.17.0.linux-amd64
# Run the collection service
$ ./node_exporter

Once started, it listens on port 9100 and is ready to be scraped. First check that it is running properly:

$ curl http://localhost:9100/metrics

This also shows that every exporter is itself an HTTP service, which the server visits periodically to fetch metrics.
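
The /metrics response is plain text, one sample per line, so it is easy to inspect from the shell. The sketch below filters the samples of a single metric. The function name metric_samples is hypothetical and the sample values are invented for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print the sample lines of one metric from
# exposition-format text on stdin. Lines starting with '#' are
# HELP/TYPE comments and are skipped.
metric_samples() {
  awk -v name="$1" '
    /^#/ { next }
    {
      # The metric name ends at the first "{" (labels) or space (value).
      split($0, parts, /[{ ]/)
      if (parts[1] == name) print
    }'
}

# Text in the format served at /metrics (values invented for illustration).
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.8'

printf '%s\n' "$sample" | metric_samples node_cpu_seconds_total
```

Against a live exporter the same filter would be used as: curl -s http://localhost:9100/metrics | metric_samples node_load1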

If the request succeeds, add this target to the Prometheus configuration file (prometheus.yml):

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'server'
    static_configs:
      - targets: ['localhost:9100']

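As you can see, this just adds one more job to the scrape_configs section: give it a name and set the IP and port to scrape. Then restart Prometheus. Under Status --> Targets in the navigation bar, one more target now appears:
[Screenshot: the Targets page listing the new target]

If the status of the new target is "UP", it is being scraped successfully.

Now go to Graph and type node into the input box. Many metrics starting with node are available, all related to machine status. Try executing a few of them.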
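
When many machines have to be added, the job entries can be generated instead of typed by hand. This is a small sketch, not part of Prometheus itself: scrape_job is a hypothetical helper that prints an entry with the same indentation as the configuration above:

```shell
#!/bin/bash
# Hypothetical helper: print a static scrape job for prometheus.yml.
# $1 = job name, $2 = host:port of the exporter.
scrape_job() {
  printf "  - job_name: '%s'\n" "$1"
  printf "    static_configs:\n"
  printf "      - targets: ['%s']\n" "$2"
}

scrape_job server localhost:9100
```

The output can be appended to prometheus.yml as long as scrape_configs is the last section of the file, as it is in the default configuration. The promtool binary shipped in the same package can then validate the result with ./promtool check config prometheus.yml before the server is restarted.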

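Installing Grafana:

Because the Prometheus UI is so simple, we also need Grafana, a very powerful and very widely used monitoring dashboard framework.

We again install from the binary package. This approach does not require sudo rights for your Linux user, nor the root password. If you do have those privileges, use yum or another direct installation method instead; see the official Grafana installation page for instructions.

Download and unpack it: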

$ wget https://dl.grafana.com/oss/release/grafana-6.0.0.linux-amd64.tar.gz
$ tar -zxvf grafana-6.0.0.linux-amd64.tar.gz

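The Grafana download page shows the install commands for the latest version; the selector at the top right switches to the commands for other versions.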

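Unpacking creates a grafana-6.0.0 directory. Enter it and Grafana is ready to run:

$ cd grafana-6.0.0
# Start Grafana
$ ./bin/grafana-server web

The log output shows that Grafana runs on port 3000 by default. This can be changed through a configuration file: create a file named custom.ini in the conf directory, copy all the settings defined in conf/defaults.ini into it, and then change the ones you want.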
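
As an illustration of the port change, the sketch below writes a custom.ini that contains only the overridden setting. It assumes that settings missing from custom.ini fall back to conf/defaults.ini; the helper name write_custom_ini and port 3001 are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: write a conf/custom.ini that overrides only the HTTP port.
# $1 = Grafana install directory, $2 = port
write_custom_ini() {
  mkdir -p "$1/conf"
  printf '[server]\nhttp_port = %s\n' "$2" > "$1/conf/custom.ini"
}

# Demo against a scratch directory rather than a real install.
demo_dir=$(mktemp -d)
write_custom_ini "$demo_dir" 3001
cat "$demo_dir/conf/custom.ini"
```

For a real install, pass the grafana-6.0.0 directory instead of the scratch directory and restart Grafana afterwards.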

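Displaying Metrics in Grafana

With Grafana installed and running, open IP:3000 in a browser. The default administrator account is admin/admin. The first login asks you to change the administrator password, after which you are signed in.

A vertical row of icons runs down the left side of the page. Choose Data Source under the settings icon and add a Prometheus data source. For the URL, enter the IP and port of your Prometheus server; if you have not changed anything and it runs on the same machine, that is http://localhost:9090.

Now a dashboard, that is, a monitoring panel, can be added. The settings page of the Prometheus data source you just created has a Dashboards tab. Import the Prometheus 2.0 Stats dashboard there to see some basic metrics of your Prometheus server. This simply imports a dashboard definition written by someone else and connects it to your own Prometheus data.

Remember the node exporter we started earlier? Let's display its data as well. Click Import under the plus icon on the left to import a dashboard written by someone else. The official Grafana dashboards page lists many ready-made dashboards; find the one you want, for example one for node exporter:
[Screenshot: a node exporter dashboard on the Grafana dashboards page]
Copy the dashboard ID shown on the right, enter it on the Import page, click Load, and select your Prometheus as the data source. The status dashboard for your own machine then appears.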

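Installing Pushgateway

Prometheus is an open-source combination of system monitoring, alerting, and a time-series database. It was originally developed at SoundCloud and, as more and more companies adopted it, became an independent open-source project. Its basic principle is to scrape the state of monitored components periodically over HTTP; the HTTP endpoint that exposes a component's state is called an exporter. Pushgateway is an important member of the Prometheus ecosystem: any client can push custom metrics in the expected format to it, and Prometheus then collects them in one place.

When to use Pushgateway:
Prometheus works by periodic pulls. Because of subnets or firewalls, it may not be able to reach each target directly. In that case each target can push its data to Pushgateway, and Prometheus pulls from Pushgateway on its usual schedule.
When monitoring business data, metrics from many different sources often have to be gathered in one place. Pushgateway can collect them centrally for Prometheus to pull.

Installing Pushgateway is simple: either unpack the binary package or start it with Docker.

For the binary package, download the latest release from the official GitHub repository and unpack it: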

$ wget https://github.com/prometheus/pushgateway/releases/download/v1.0.0/pushgateway-1.0.0.linux-amd64.tar.gz
$ tar xzvf pushgateway-1.0.0.linux-amd64.tar.gz 
$ mv pushgateway-1.0.0.linux-amd64 /usr/local/pushgateway

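Run ./pushgateway to start the service, then browse to port 9091 of the host running Pushgateway to open its UI. The Metrics page shows no data yet, because nothing has been pushed to Pushgateway so far.
[Screenshot: the Pushgateway UI with no metrics]
Pushgateway does ship with some metrics of its own, available at the /metrics path on the same port. They include go, process, and other runtime metrics.
Pushgateway is now running but not yet connected to Prometheus. The goal is to upload custom metrics through Pushgateway and have Prometheus scrape them, so Pushgateway has to be added as a scrape target. Extend prometheus.yml as follows: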

...
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['172.30.12.167:9091']
        labels:
          instance: pushgateway

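A note on this: static_configs is used because there is only one Pushgateway for now. With several, consider one of the other service discovery mechanisms described in the Prometheus configuration documentation, which load targets dynamically. After saving the configuration, restart Prometheus and check under Targets in the Prometheus UI that the new target is configured correctly.
[Screenshot: the pushgateway target on the Targets page]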

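Pushing Data to Pushgateway Through the API

Next we push data to Pushgateway through its standard API. On port 9091 of the Pushgateway host, the URL path has the form /metrics/job/<JOB_NAME>{/<LABEL_NAME>/<LABEL_VALUE>}. <JOB_NAME> is required and becomes the value of the job label. Any number of label pairs may follow; usually an instance/<INSTANCE_NAME> pair is added so that metrics from different instances can be told apart.

Let's push one simple metric to Pushgateway as a test:

$ echo "test_metric 123456" | curl --data-binary @- http://172.30.12.167:9091/metrics/job/test_job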
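
The URL scheme can be captured in a few lines of shell. This is a sketch only: pushgateway_url is a hypothetical helper, and it assumes that label values contain no "/" characters:

```shell
#!/bin/bash
# Hypothetical helper: build a Pushgateway push URL.
# $1 = base URL, $2 = job name, remaining args = label name/value pairs.
pushgateway_url() {
  local url="$1/metrics/job/$2"
  shift 2
  while [ "$#" -ge 2 ]; do
    url="$url/$1/$2"
    shift 2
  done
  printf '%s\n' "$url"
}

pushgateway_url http://172.30.12.167:9091 test_job instance test_instance
# prints http://172.30.12.167:9091/metrics/job/test_job/instance/test_instance
```

The result can be passed to curl directly, for example: echo "test_metric 123456" | curl --data-binary @- "$(pushgateway_url http://172.30.12.167:9091 test_job)"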

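After the command finishes, refresh the Pushgateway UI and the new test_metric appears.
[Screenshot: test_metric in the Pushgateway UI]

Besides test_metric, two more metrics have been added: push_time_seconds and push_failure_time_seconds. Pushgateway generates these automatically. The metric can now also be queried on the Graph page of the Prometheus UI.
[Screenshot: querying test_metric on the Prometheus Graph page]

One thing deserves attention here. The query result in the screenshot above is test_metric{exported_job="test_job",instance="pushgateway",job="pushgateway"}. Sharp-eyed readers will notice that something looks off: the metric was pushed under the job test_job, yet it appears as exported_job="test_job" while job is "pushgateway". The cause is the honor_labels setting in the Prometheus scrape configuration, which defaults to false. Let's push some more data to show the difference before and after adding honor_labels: true.

This time we push something a little more complex: several metrics in one request, each with TYPE and HELP lines.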

$ cat <<EOF | curl --data-binary @- http://172.30.12.167:9091/metrics/job/test_job/instance/test_instance
# TYPE test_metrics counter
test_metrics{label="app1",name="demo"} 100.00
# TYPE another_test_metrics gauge
# HELP another_test_metrics Just an example.
another_test_metrics 123.45
EOF

Once pushed, refresh the Pushgateway UI again and the new data appears.
[Screenshot: both groups in the Pushgateway UI]
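
Payloads like the one pushed above can be produced by a script instead of a hand-written heredoc. A minimal sketch, in which format_metric is a hypothetical helper that prints one metric with its HELP and TYPE lines:

```shell
#!/bin/bash
# Hypothetical helper: print one metric in the text exposition format,
# with HELP and TYPE comment lines ahead of the sample.
# $1 = name, $2 = type (counter or gauge), $3 = help text, $4 = value
format_metric() {
  printf '# HELP %s %s\n' "$1" "$3"
  printf '# TYPE %s %s\n' "$1" "$2"
  printf '%s %s\n' "$1" "$4"
}

format_metric another_test_metrics gauge 'Just an example.' 123.45
```

Every line ends with a newline, including the last one, so the output can be piped straight into curl --data-binary @- with the push URL used earlier.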

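The screenshot above shows that /metrics/job/test_job and /metrics/job/test_job/instance/test_instance both belong to test_job but form two separate groups, because the instance label distinguishes them. Now query the metric on the Graph page of the Prometheus UI.
[Screenshot: the query result on the Prometheus Graph page]

The labels are still wrong, so edit prometheus.yml and add honor_labels: true as follows: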

...
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['172.30.12.167:9091']
        labels:
          instance: pushgateway

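Restart Prometheus, wait a moment until it has scraped new data, and query the metric again on the Graph page of the Prometheus UI.

[Screenshot: the query result with honor_labels: true]
The pushed values now carry the correct job and instance. A note on what honor_labels does: the Pushgateway scrape job in Prometheus also assigns job and instance labels, but those only describe the Pushgateway instance itself and say nothing about where the data really came from. The Pushgateway job therefore needs honor_labels: true, which keeps the job and instance labels of the pushed data from being overwritten. See the official documentation of this parameter for details.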
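
The effect on the job label can be mimicked in a few lines. This is only an illustration of the rule described above, not code from Prometheus; resolve_job is a hypothetical name:

```shell
#!/bin/bash
# Illustration only: mimic how a scrape resolves a conflict on the job label.
# $1 = honor_labels (true or false), $2 = job from the scrape config,
# $3 = job label carried by the scraped sample.
resolve_job() {
  if [ "$1" = true ]; then
    # The label from the scraped data wins.
    printf 'job="%s"\n' "$3"
  else
    # The scraped label is renamed with an exported_ prefix.
    printf 'exported_job="%s",job="%s"\n' "$3" "$2"
  fi
}

resolve_job false pushgateway test_job
resolve_job true pushgateway test_job
```

The first call reproduces the label pair seen before the change, the second the one seen after it.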

Finally, my own setup collects data in active-push mode. The script below is provided for reference (adapt it to your environment):

#!/bin/bash
host=$(cat /etc/hostname)
# If node_exporter is already running, just push its current metrics.
if pgrep -x node_exporter >/dev/null; then
  echo -e "\033[32m Target in sight. Report in \033[0m"
  curl -s http://127.0.0.1:9100/metrics | curl -s --data-binary @- "http://192.168.1.5:9091/metrics/job/$host"
else
  # Otherwise install node_exporter from scratch and register it with systemd.
  echo -e "\033[31m  No sign of the target. Commencing install \033[0m"
  rm -rf /usr/local/node_exporter
  cd /root || exit 1
  wget https://file.api.ym68.cc/temp/node_exporter-0.17.0.linux-amd64.tar.gz
  tar xvfz node_exporter-0.17.0.linux-amd64.tar.gz
  mv /root/node_exporter-0.17.0.linux-amd64 /usr/local/node_exporter
  # The quoted delimiter keeps the shell from expanding $MAINPID; systemd expands it.
  # No PIDFile entry: node_exporter writes no PID file and Type=simple does not need one.
  cat << 'EOF' >/usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/docs/introduction/overview
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target
EOF
  systemctl daemon-reload
  systemctl start node_exporter
  curl -s http://127.0.0.1:9100/metrics | curl -s --data-binary @- "http://192.168.1.5:9091/metrics/job/$host"
  echo -e "\033[32m Deployment Successful \033[0m"
fi

# Fetch the periodic push script and schedule it with cron.
mkdir -p /usr/local/devops/shell
rm -f /usr/local/devops/shell/node_cron.sh
wget https://file.api.ym68.cc/temp/node_cron.sh -P /usr/local/devops/shell
chmod +x /usr/local/devops/shell/*
# Runs once a minute, 30 seconds after the minute starts. For a true 30-second
# interval, add a second entry without the "sleep 30;" part.
entry="* * * * * sleep 30; sh /usr/local/devops/shell/node_cron.sh"
# Alternative: run every five minutes.
#entry="*/5 * * * * sh /usr/local/devops/shell/node_cron.sh"
# Append the entry only if no node_cron.sh job is installed yet; existing jobs are kept.
if ! crontab -l 2>/dev/null | grep -qw node_cron.sh; then
  (crontab -l 2>/dev/null; echo "$entry") | crontab -
fi
echo -e "\033[32m is OK \033[0m"

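The script handles the active reporting: it deploys node_exporter and a crontab job that pushes the metrics to Pushgateway automatically. Prometheus then scrapes Pushgateway to make the data queryable, and Grafana draws the charts. It is suitable for rolling out to many machines in bulk.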
