Prometheus, Grafana, node_exporter, redis_exporter, mysqld_exporter

Original article: https://blog.csdn.net/qq_37705525/article/details/124895657

1 Prometheus

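Prometheus
Container monitoring and alerting:
Monitoring containers differs considerably from monitoring virtual or physical machines. In a k8s environment, for example, containers can be scaled out and in at any time, so the monitoring service must automatically pick up newly created containers and promptly drop deleted ones. The traditional Zabbix approach requires installing and starting an agent in every container, and it has no good mechanism for automatic container discovery and registration.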

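Prometheus:
Early k8s versions monitored pods and nodes with the heapster component. Starting with k8s 1.8, monitoring moved to the metrics API, and heapster was officially replaced in 1.11. Since then, core metrics such as node CPU and memory usage come from Metrics Server, while all other monitoring is handled by a separate component, Prometheus.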

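Prometheus overview:
https://prometheus.io/docs/ # official documentation
https://github.com/prometheus # GitHub
Prometheus is an open-source combination of monitoring, alerting and a time-series database, written in Go and originally developed at SoundCloud. It was the second project to graduate from the CNCF (Cloud Native Computing Foundation), after Kubernetes, and is widely used for containers and microservices. Its main characteristics are:
Multi-dimensional data stored in key-value (label) form
Data is kept not in a traditional database such as MySQL but in a time-series database, currently its own TSDB
Third-party dashboards such as Grafana (version 2.5.0 and later) can provide a richer graphical interface
Componentized design
No dependency on external storage; data can be kept locally or remotely
Automatic service discovery
A powerful query language (PromQL, Prometheus Query Language)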

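prometheus server: the main service; accepts external HTTP requests and collects, stores and queries data
prometheus targets: statically configured targets to collect from
service discovery: dynamic discovery of targets
prometheus alerting: alert notifications
pushgateway: a proxy that receives pushed metrics (similar to a zabbix proxy)
data visualization and export: visualization and export of data (access clients)
Installing Prometheus from the binary release:
https://prometheus.io/download/ # official binary downloads; Prometheus listens on port 9090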
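Prometheus features
1. Prometheus is a container monitoring solution built on numeric time-series data: an open-source combination of monitoring, alerting and a time-series database, well suited to monitoring Docker containers.
2. Time series: data that records changes in the state of systems or devices in chronological order is called time-series data.
3. Advantages of storing data as time series:
(1) Performance: relational databases perform poorly on large-scale data; NoSQL handles it better but still falls short of a time-series database.
(2) Low storage cost: efficient compression saves disk space and reduces IO.
4. Prometheus characteristics:
(1) Multi-dimensional data model
(2) Flexible query language
(3) No reliance on distributed storage; a single server node is enough
(4) Time-series data is pulled over HTTP
(5) Push is also supported through an intermediary gateway
(6) Targets are found through service discovery or static configuration
(7) A wide variety of graphs and dashboards
Multi-dimensional data model: a time series is identified by a metric name plus a set of label key-value pairs.
Flexible query language: collected time-series data can be regrouped and aggregated.
Powerful visualization: besides the built-in expression browser, Grafana integration is supported.
Efficient storage: memory plus local disk; performance can be scaled out with functional sharding and federation.
Simple operation: depends only on local disk; the Go binaries have no other dependencies.
Streamlined alerting
Many client libraries
Many exporters for collecting common system metrics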
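Prometheus periodically pulls metrics from the targets defined in its configured jobs.
(1) Prometheus Server: performs scraping, service discovery and storage according to its configuration.
(2) Push Gateway: a component for push scenarios. Monitoring data is first pushed to the Push Gateway, and Prometheus Server then pulls it from there. It exists because short-lived jobs may finish before Prometheus gets a chance to pull from them. (If the data on the Push Gateway does not change between two scrapes, Prometheus Server collects the same value twice, differing only in timestamp.)
(3) Exporters: the general name for Prometheus's data-collection components. An exporter gathers data from a target and converts it into the format Prometheus understands. Unlike traditional collection agents, it does not send data to a central server; it waits for the server to come and scrape it. Each kind of data needs a matching exporter; collecting MySQL status, for example, requires mysqld_exporter.
(4) Alertmanager: Prometheus server evaluates alerting rules written in PromQL; when a rule's condition is met, it generates an alert and sends it to Alertmanager, which then processes and delivers it according to its own configuration. Common receivers include email and webhooks. Alertmanager handles alerts in three ways: grouping, inhibition and silencing.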


Alertmanager architecture diagram

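Prometheus command-line flags and configuration file explained
1. Command-line flags
--web.read-timeout=5m: maximum time to wait for a request to be read; keeps idle connections from tying up resources
--web.max-connections=512: maximum number of simultaneous connections
--storage.tsdb.retention.time=15d: how long collected data is kept; important. Older releases spell this flag --storage.tsdb.retention
--storage.tsdb.path="data/": where data is stored; important. Do not casually put it on the root filesystem, to avoid filling up /
--query.timeout=2m
--query.max-concurrency=20
These two flags tune user queries: they keep too many users from querying at once and stop a single oversized query from running indefinitely.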
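The flags above can be combined into a single start command. The sketch below is only an illustration: PROM_HOME (defaulting to /usr/local/prometheus, the install path used later in this article), the prometheus_cmd/start_prometheus function names and the DRY_RUN switch are assumptions of this example, and the values are simply the ones from the list. With DRY_RUN=1 it prints the command line instead of starting Prometheus.

```shell
#!/usr/bin/env bash
# Sketch: assemble the Prometheus start command from the flags listed above.
# PROM_HOME is an assumed install path; flag values are the example values, not tuning advice.
PROM_HOME="${PROM_HOME:-/usr/local/prometheus}"

prometheus_cmd() {
  # One argument per line, so the list is easy to read, diff and test.
  printf '%s\n' \
    "$PROM_HOME/prometheus" \
    "--config.file=$PROM_HOME/prometheus.yml" \
    "--storage.tsdb.path=$PROM_HOME/data" \
    "--storage.tsdb.retention.time=15d" \
    "--web.read-timeout=5m" \
    "--web.max-connections=512" \
    "--query.timeout=2m" \
    "--query.max-concurrency=20"
}

start_prometheus() {
  local -a args
  mapfile -t args < <(prometheus_cmd)   # needs bash 4+
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${args[*]}"                   # only show what would run
  else
    nohup "${args[@]}" > "$PROM_HOME/prometheus.log" 2>&1 &
  fi
}
```

Running `DRY_RUN=1 start_prometheus` shows the final command; dropping DRY_RUN starts Prometheus in the background with its output in prometheus.log. Keeping the arguments in an array avoids word-splitting problems if a path ever contains spaces.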
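2. The configuration file explained
prometheus.yml
Important parameters:
global:
  scrape_interval: 15s        # scrape every 15 seconds
scrape_configs:
  - job_name: 'prometheus'    # job name
    static_configs:
      # monitored servers, separated by commas; Prometheus must be able to resolve the host names
      - targets: ['localhost:9090', 'mysql:9100']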
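3. Data storage
The directories with long random-looking names hold historical data blocks. Recent data is actually kept in memory and also written to the wal directory at regular intervals, so that the in-memory data can be recovered after a sudden power loss or restart.
4. Some important metrics in node_exporter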
（1）node_cpu
（2）node_memory
（3）node_disk

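Comparison with other monitoring systems
3.1 Prometheus vs. Zabbix
Zabbix is written in C and PHP, Prometheus in Go; overall, Prometheus runs somewhat faster.
Zabbix is traditional host monitoring, mainly for physical hosts, switches and networks; Prometheus covers host monitoring as well as Cloud, SaaS, OpenStack and container monitoring.
Zabbix has a richer set of plugins for traditional host monitoring.
Zabbix lets you configure a lot in its web GUI, while Prometheus is configured by editing files by hand.
3.2 Prometheus vs. Nagios
Nagios data does not support custom labels or queries, its alerts cannot be de-noised or grouped, and it has no data storage; viewing historical state requires a plugin.
Nagios is a monitoring system from the 1990s, better suited to small clusters or static systems. It is dated and lacks many features; Prometheus is far more capable.
3.3 Prometheus vs Sensu
Broadly speaking, Sensu is an upgraded Nagios that fixes many of its problems; if you know Nagios well, Sensu is a good choice.
Sensu depends on RabbitMQ and Redis and scales better in data storage.
3.4 Prometheus vs InfluxDB
InfluxDB is an open-source time-series database focused on storing data; building a monitoring and alerting system on it requires other components.
InfluxDB does better at horizontal storage scaling and high availability, since it is a database at its core.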

1.1 Download

Download page
https://prometheus.io/download/

1.2 Install

wget https://github.com/prometheus/prometheus/releases/download/v2.35.0/prometheus-2.35.0.linux-amd64.tar.gz
tar xf prometheus-2.35.0.linux-amd64.tar.gz -C /usr/local/
rm -rf prometheus-2.35.0.linux-amd64.tar.gz 
mv /usr/local/prometheus-2.35.0.linux-amd64/ /usr/local/prometheus

1.3 Start

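Start it from the Prometheus directory (without --config.file it reads prometheus.yml from the current directory); any of the following works: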

/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml &
./prometheus &	
nohup ./prometheus --config.file=prometheus.yml > prometheus.file 2>&1 &

1.4 Docker

docker run --name prometheus -d -p 9090:9090 prom/prometheus
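# Or start Prometheus with host networking and the config directory mounted from the host (/data/prometheus must contain prometheus.yml)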
docker run -d  --name=prometheus --net=host  -v /data/prometheus:/etc/prometheus  prom/prometheus:latest

1.5 systemd service

cat >/usr/lib/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/usr/local/prometheus/
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
EOF

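Alternatively, place the same unit under /etc/systemd/system, the usual location for units created by the administrator: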
cat >  /etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/usr/local/prometheus/
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
EOF
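Because a stray line break inside ExecStart is easy to introduce when copying unit files, one option is to generate the file from a script. A minimal sketch, assuming bash; write_prometheus_unit and its two arguments (target directory, install directory) are hypothetical names for this example:

```shell
#!/usr/bin/env bash
# Sketch: write prometheus.service with ExecStart on a single line.
# $1 = unit directory, $2 = Prometheus install directory (both assumptions with defaults).
write_prometheus_unit() {
  local unit_dir="${1:-/etc/systemd/system}"
  local prom_home="${2:-/usr/local/prometheus}"
  mkdir -p "$unit_dir"
  # Unquoted EOF so that $prom_home is expanded when the file is written.
  cat > "$unit_dir/prometheus.service" <<EOF
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=$prom_home/
ExecStart=$prom_home/prometheus --config.file=$prom_home/prometheus.yml

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, `write_prometheus_unit /etc/systemd/system /usr/local/prometheus` followed by `systemctl daemon-reload`. `systemd-analyze verify /etc/systemd/system/prometheus.service` can additionally check the unit for problems before it is started.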

prometheus.yml

global:       
  scrape_interval:     60s       
  evaluation_interval: 60s
         
scrape_configs:       
  - job_name: prometheus       
    static_configs:       
      - targets: ['localhost:9090']       
        labels:       
          instance: 'prometheus'
 
  # node_exporter on the monitored host
  - job_name: node-exporter       
    static_configs:       
      - targets: ['192.168.1.14:9100']       
        labels:       
          instance: 'web'
 
  # RabbitMQ cluster (the rabbitmq_prometheus plugin serves metrics on port 15692; 15672 is the management UI)
  - job_name: rabbitmq-cluster
    static_configs:
      - targets: ['192.168.1.15:15692']
        labels:
          instance: 'rabbitmq01'
      - targets: ['192.168.1.12:15692']
        labels:
          instance: 'rabbitmq02'
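When the list of targets grows, the scrape_configs entries can be generated rather than typed by hand. The sketch below is a hypothetical helper (scrape_job and write_prometheus_config are names made up for this example) that writes a file with the same layout as the prometheus.yml above and, if promtool from the Prometheus tarball is on PATH, validates it with `promtool check config`:

```shell
#!/usr/bin/env bash
# Sketch: generate prometheus.yml from "job target instance" triples.
scrape_job() {
  # $1 = job name, $2 = host:port, $3 = value for the instance label
  cat <<EOF
  - job_name: $1
    static_configs:
      - targets: ['$2']
        labels:
          instance: '$3'
EOF
}

write_prometheus_config() {
  local out="$1"; shift
  {
    printf 'global:\n  scrape_interval: 60s\n  evaluation_interval: 60s\n\nscrape_configs:\n'
    # Remaining arguments are consumed three at a time.
    while [ "$#" -ge 3 ]; do
      scrape_job "$1" "$2" "$3"
      shift 3
    done
  } > "$out"
  # Validate only if promtool is installed; otherwise just leave the file.
  if command -v promtool >/dev/null 2>&1; then
    promtool check config "$out"
  fi
}
```

For example: `write_prometheus_config /usr/local/prometheus/prometheus.yml prometheus localhost:9090 prometheus node-exporter 192.168.1.14:9100 web`. Note that it overwrites the target file.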

systemctl daemon-reload
systemctl start prometheus
http://192.168.66.22:9090/metrics
http://192.168.66.22:9090

There are several records; the one whose instance is localhost:8080 has value 1, which means that application is up.
For example, to see the active memory (in GiB) of the machine running Node Exporter, enter: node_memory_Active_bytes/(1024*1024*1024)
To see the 1-minute load average of that machine, simply enter node_load1.
Query a given metric name: node_cpu_seconds_total
Query with a label:
node_cpu_seconds_total{instance="127.0.0.1:9100"}

Query with multiple labels:
node_cpu_seconds_total{instance="127.0.0.1:9100",mode="system"}
CPU usage
100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)
Memory usage
100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100
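To sanity-check the memory query against a host, the same arithmetic can be applied directly to /proc/meminfo, from which node_exporter derives node_memory_MemFree_bytes, node_memory_Cached_bytes and so on. A sketch, assuming a Linux host and awk; mem_usage_pct is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: memory usage % = 100 - (MemFree + Cached + Buffers) / MemTotal * 100,
# read from a /proc/meminfo-style file (values are in kB, which cancels out).
mem_usage_pct() {
  local file="${1:-/proc/meminfo}"
  awk '
    $1 == "MemTotal:" { total = $2 }
    $1 == "MemFree:"  { free = $2 }
    $1 == "Buffers:"  { buffers = $2 }
    $1 == "Cached:"   { cached = $2 }
    END {
      if (total == 0) exit 1
      printf "%.1f\n", 100 - (free + cached + buffers) / total * 100
    }' "$file"
}
```

Run `mem_usage_pct` with no argument on the monitored host and compare it with the PromQL value; small differences are expected because the two are sampled at different moments. On kernels that provide MemAvailable, `100 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100` is a commonly used alternative.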

Disk usage
(node_filesystem_size_bytes{fstype=~"xfs|ext4"} - node_filesystem_free_bytes{fstype=~"xfs|ext4"}) / node_filesystem_size_bytes{fstype=~"xfs|ext4"} * 100
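The disk formula reduces to used space divided by total space. A sketch of the same calculation in shell; disk_usage_pct and path_disk_usage_pct are hypothetical helper names, and the live variant assumes GNU coreutils `stat`:

```shell
#!/usr/bin/env bash
# Sketch: disk usage % = (size - free) / size * 100.
disk_usage_pct() {
  # $1 = total size, $2 = free space (any unit, as long as both use the same one)
  awk -v size="$1" -v free="$2" 'BEGIN {
    size += 0; free += 0
    if (size <= 0) exit 1
    printf "%.1f\n", (size - free) / size * 100
  }'
}

path_disk_usage_pct() {
  local blocks bfree
  # GNU stat -f: %b = total data blocks, %f = free blocks (the value behind
  # node_filesystem_free_bytes); the block size cancels out in the ratio.
  read -r blocks bfree < <(stat -f -c '%b %f' "${1:-/}")
  disk_usage_pct "$blocks" "$bfree"
}
```

`path_disk_usage_pct /` should roughly match the query for the root filesystem. `df` reports available space (blocks usable by non-root users), which corresponds to node_filesystem_avail_bytes rather than node_filesystem_free_bytes, so its percentage can differ slightly.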

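In addition, on every scrape Prometheus automatically records samples in the following series:

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy (the scrape succeeded), otherwise 0
scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}: how long the scrape took
scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}: number of samples remaining after metric relabeling
scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}: number of samples the target exposed in this scrape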

2 node_exporter

2.1 Install

tar xf node_exporter-1.3.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/node_exporter-1.3.1.linux-amd64 /usr/local/node_exporter

2.2 Start

nohup /usr/local/node_exporter/node_exporter &
nohup ./node_exporter --web.listen-address 172.17.0.2:8080 > nodeout.file 2>&1 &
nohup ./node_exporter --web.config.file=web-config.yml > node_exporter.log 2>&1 &   # TLS/basic-auth config; older releases such as 1.3.1 call this flag --web.config

lsof -i:9100
http://192.168.66.22:9100/metrics
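The /metrics endpoint returns plain text in the Prometheus exposition format, one sample per line, so individual values can be pulled out with standard tools when debugging an exporter. A sketch, assuming samples carry no trailing timestamp (node_exporter does not emit one), so the value is the last field; metric_value is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: print the value(s) of one metric from exposition-format text.
# $1 = metric name; further arguments are files, otherwise stdin is read.
metric_value() {
  local name="$1"; shift
  awk -v n="$name" '
    /^#/ { next }    # skip HELP and TYPE comment lines
    index($0, n " ") == 1 || index($0, n "{") == 1 { print $NF }
  ' "$@"
}
```

For example: `curl -s http://192.168.66.22:9100/metrics | metric_value node_load1`. For a metric with labels, one value per series is printed.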

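2.3 Update the Prometheus configuration

If node_exporter is not deployed on the same server as Prometheus, add the following to prometheus.yml:

  - job_name: 'node_exporter'   # a job name that represents the monitored machines
    static_configs:
      - targets: ['127.0.0.9:9100']   # change this to the monitored machine

If both are deployed on the same server, just add the port to the existing target list:

        - targets: ['localhost:9090','localhost:9100']

To show JVM data in Grafana, add the following to the configuration file:

  - job_name: 'application'
    scrape_interval: 5s
    metrics_path: '/actuator/prometheus'
    file_sd_configs:
      - files: ['/usr/local/prometheus/groups/applicationgroups/*.json']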

2.4 systemd service

vim /etc/systemd/system/node_exporter.service

[Unit]
Description=node_exporter
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target

Or create the same unit non-interactively under /usr/lib/systemd/system:

cat >/usr/lib/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter

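2.5 Docker

docker run -d -p 9100:9100 quay.io/prometheus/node-exporter
# Start node-exporter (required on every monitored host); the host mounts only take effect together with the --path.* flags
docker run -d --name=node-exporter --net=host -v /proc:/host/proc:ro -v /sys:/host/sys:ro -v /:/rootfs:ro prom/node-exporter:latest --path.procfs=/host/proc --path.sysfs=/host/sys --path.rootfs=/rootfs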

2.6 Configure Grafana

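Add a dashboard
Grafana publishes many ready-made dashboards that can be downloaded and used directly. Browse https://grafana.com/grafana/dashboards to find the one you need.
Filter by data source Prometheus, then pick the first dashboard.

Copy the dashboard ID.

Then open your Grafana instance and go to the dashboard management page.

Enter the dashboard ID; Grafana recognizes it. Click Load.

Then click the [Change uid] button to generate a random UID, choose the Prometheus data source created earlier in the field below, and click [Import] to finish.
Once imported, the dashboard opens automatically and shows the node metrics configured above.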

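2.7 Common node_exporter metric families:

node_cpu: system CPU usage (node_cpu_seconds_total in node_exporter 0.16 and later)
node_disk*: disk IO
node_filesystem*: filesystem usage
node_load1: system load
node_memory*: memory usage
node_network*: network bandwidth
node_time: current system time (node_time_seconds in 0.16 and later)
go_*: Go runtime metrics of node_exporter itself
process_*: metrics about the node_exporter process itself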

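Common Node Exporter queries
Once node_exporter data is being collected, PromQL can be used for queries and monitoring. Below are some common queries. The metric names follow node_exporter 0.16 and later (such as the 1.3.1 installed above); older releases lack the _seconds_total and _bytes suffixes.

Each query below targets a single node; to see all nodes, remove the instance="xxx" matcher.

CPU usage
100 - (avg by (instance) (irate(node_cpu_seconds_total{instance="172.16.8.153:9100", mode="idle"}[5m])) * 100)
Usage of each CPU mode
avg by (instance, mode) (irate(node_cpu_seconds_total{instance="172.16.8.153:9100"}[5m])) * 100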

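User: the share of CPU time spent in user space, i.e. running user processes. Typical user-space programs include shells, databases and web servers.

Nice: the share of CPU time spent running user processes whose scheduling priority was changed with nice (nice values range from -20 to 19).

System: similar to User, but measures the share of CPU time spent in kernel space. Allocating memory, performing IO and creating child processes are all kernel operations, so System is high when IO is heavy.

iowait: disk reads and writes are far slower than the CPU. Data usually has to be read from disk into memory before it can be processed, so after issuing a read or write the CPU waits for the disk to bring the data into memory and has nothing to do in the meantime. Time spent in this waiting state is measured by iowait.

Idle: the share of time the CPU is idle. The shares of all modes (user, nice, system, iowait, irq, softirq, steal, idle) add up to about 100%.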
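The CPU percentages above come from per-mode counters in /proc/stat, the same source node_cpu_seconds_total is built from (node_exporter converts the ticks to seconds). A sketch of the busy-percentage calculation from two samples of the aggregate cpu line, assuming the standard Linux field order user nice system idle iowait irq softirq steal guest guest_nice; cpu_busy_pct is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: CPU busy % between two samples of the "cpu" line of /proc/stat,
# i.e. 100 - (idle delta / total delta) * 100.
cpu_busy_pct() {
  local before="$1" after="$2"
  awk -v a="$before" -v b="$after" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    # x[1] is the label "cpu"; guest/guest_nice are already included in
    # user/nice, so only fields 2..9 (user..steal) are summed.
    for (i = 2; i <= 9 && i <= n; i++) { t1 += x[i]; t2 += y[i] }
    total = t2 - t1
    idle = y[5] - x[5]    # field 5 is idle
    if (total <= 0) exit 1
    printf "%.1f\n", 100 - idle / total * 100
  }'
}
```

Live usage: `s1=$(grep '^cpu ' /proc/stat); sleep 5; s2=$(grep '^cpu ' /proc/stat); cpu_busy_pct "$s1" "$s2"`. As in the PromQL query, only idle counts as not busy, so iowait time shows up as usage.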

Load average
node_load1{instance="172.16.8.153:9100"}    # 1-minute load
node_load5{instance="172.16.8.153:9100"}    # 5-minute load
node_load15{instance="172.16.8.153:9100"}   # 15-minute load
Memory usage
100 - (node_memory_MemFree_bytes{instance="172.16.8.172:9100"} + node_memory_Cached_bytes{instance="172.16.8.172:9100"} + node_memory_Buffers_bytes{instance="172.16.8.172:9100"}) / node_memory_MemTotal_bytes{instance="172.16.8.172:9100"} * 100
Disk usage
100 - node_filesystem_free_bytes{instance="172.16.8.153:9100",fstype!~"rootfs|selinuxfs|autofs|rpc_pipefs|tmpfs|udev|none|devpts|sysfs|debugfs|fuse.*"} / node_filesystem_size_bytes{instance="172.16.8.153:9100",fstype!~"rootfs|selinuxfs|autofs|rpc_pipefs|tmpfs|udev|none|devpts|sysfs|debugfs|fuse.*"} * 100

Network traffic in/out (bytes per second)
# Inbound
sum by (instance) (rate(node_network_receive_bytes_total{instance="172.16.8.153:9100",device!="lo"}[5m]))

# Outbound
sum by (instance) (rate(node_network_transmit_bytes_total{instance="172.16.8.153:9100",device!="lo"}[5m]))
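rate() turns a monotonically increasing counter such as node_network_receive_bytes_total into a per-second value. A simplified sketch of that arithmetic for two samples: it treats a decrease as a counter reset (as rate() does) but omits the extrapolation to the window edges that the real function performs; counter_rate is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: per-second increase of a counter between two samples.
counter_rate() {
  # $1 = earlier value, $2 = later value, $3 = seconds between the samples
  awk -v v1="$1" -v v2="$2" -v dt="$3" 'BEGIN {
    a = v1 + 0; b = v2 + 0; dt += 0
    if (dt <= 0) exit 1
    inc = (b >= a) ? b - a : b    # a drop means the counter restarted from 0
    printf "%.1f\n", inc / dt
  }'
}
```

Applied to a real interface counter (eth0 is an assumption; adjust the name): `read -r a < /sys/class/net/eth0/statistics/rx_bytes; sleep 10; read -r b < /sys/class/net/eth0/statistics/rx_bytes; counter_rate "$a" "$b" 10`.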
Node exporter dashboard template
Getting the node exporter dashboard:
1. Download the dashboard JSON file and upload it to Grafana:
https://grafana.com/dashboards/1860

2. Or enter the dashboard ID directly on Grafana's import page.

3 mysqld_exporter

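MySQL can be monitored with Prometheus just as easily by choosing the matching exporter, mysqld_exporter, to collect MySQL metrics. The MySQL account, password and address can be supplied either through the DATA_SOURCE_NAME environment variable (as in the Docker example) or through a .my.cnf file (as in the binary install).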

3.1 Install

tar xf mysqld_exporter-0.14.0.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/mysqld_exporter-0.14.0.linux-amd64/ /usr/local/mysqld_exporter

# Create a monitoring user on the MySQL server (CREATE USER + GRANT works on both MySQL 5.7 and 8.0)
mysql> CREATE USER 'mysql_monitor'@'localhost' IDENTIFIED BY '123';
mysql> GRANT SELECT, REPLICATION CLIENT, PROCESS ON *.* TO 'mysql_monitor'@'localhost';
mysql> FLUSH PRIVILEGES;
mysql> exit
Bye

# Write the credentials of the user created above into a new mysqld_exporter config file
[root@mysql01 ~]# vim /usr/local/mysqld_exporter/.my.cnf
[client]
user=mysql_monitor
password=123
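Because .my.cnf holds a plaintext password, it is worth making it readable only by its owner. A sketch; write_my_cnf is a hypothetical helper, and the path, user and password are just the example values above:

```shell
#!/usr/bin/env bash
# Sketch: create the exporter's credentials file with mode 600.
write_my_cnf() {
  local path="$1" user="$2" password="$3"
  mkdir -p "$(dirname "$path")"
  # Restrictive umask in a subshell so the file is never created readable by others.
  ( umask 077 && printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$path" )
  chmod 600 "$path"   # also tightens a file that already existed
}
```

For example, `write_my_cnf /usr/local/mysqld_exporter/.my.cnf mysql_monitor 123`. If mysqld_exporter runs as a dedicated user rather than root, chown the file to that user afterwards.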

# Start mysqld_exporter
nohup /usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/.my.cnf &

3.2 Start

nohup /usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/.my.cnf &

lsof -i:9104
If Prometheus runs on the same server, just add the port to the existing target list:

    - targets: ['localhost:9090','localhost:9100','localhost:9104']

3.3 Configure

Likewise, add a job to prometheus.yml so that Prometheus collects the mysqld_exporter metrics:

...
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  # Collect host metrics
  - job_name: 'MacBook Pro'
    static_configs:
    - targets: ['192.168.0.108:9100']

  # Collect MySQL metrics
  - job_name: 'MySQL'
    static_configs:
    - targets: ['192.168.0.108:9104']

3.4 Docker

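# Pull the image (pinned: DATA_SOURCE_NAME is only read by v0.14.x and earlier)
docker pull prom/mysqld-exporter:v0.14.0

# Start the container
docker run -d --name mysqldExporter \
-p 9104:9104 \
-e DATA_SOURCE_NAME="root:123456@(192.168.0.108:3306)/"  \
prom/mysqld-exporter:v0.14.0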

3.5 Configure Grafana

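Likewise for visualization: on the Grafana website, find a dashboard template for MySQL (filter: name/description = mysql and data source = Prometheus) and copy its ID, 12826.

That template is not updated very promptly. For the latest version, download the MySQL dashboards directly from the project repository (it contains JSON files, which can be thought of as monitoring templates written by the developers).
Download: https://github.com/percona/grafana-dashboards
Option 2: import the latest JSON files
Extract the archive to get the JSON files.

Import the MySQL JSON files through the Grafana UI.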

4 Redis

4.1 Install

cd /export/prometheus_exporter/
wget https://github.com/oliver006/redis_exporter/releases/download/v1.37.0/redis_exporter-v1.37.0.linux-amd64.tar.gz
tar -xvf redis_exporter-v1.37.0.linux-amd64.tar.gz

4.2 Start

Without a password
./redis_exporter -redis.addr redis://10.200.10.169:4100 &
With a password
./redis_exporter -redis.addr 10.200.10.169:4100 -redis.password 123456 &

4.3 systemd service:

vim /etc/systemd/system/redis_exporter.service

[Unit]
Description=redis_exporter
After=network.target

[Service]
Restart=on-failure
ExecStart=/export/prometheus_exporter/redis_exporter -redis.addr 10.200.10.169:4100 -redis.password 123456

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl start redis_exporter.service
systemctl enable redis_exporter.service

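4.4 Configure Prometheus

Add the following to prometheus.yml:

  - job_name: 'redis-10.200.10.169'
    static_configs:
      - targets: ['10.200.10.169:9121']

Dashboard template
Download Grafana's Redis dashboard for Prometheus (prometheus-redis_rev1.json):
wget -O prometheus-redis_rev1.json https://grafana.com/api/dashboards/763/revisions/1/download
Import the JSON template in Grafana.
Once configured, the target shows state UP.

Open Grafana to see the final result.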

4.5 Docker

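# Run the Redis exporter with Docker (REDIS_ADDR points it at the Redis instance; --restart must come before the image name)
docker run -d --name redis_exporter --restart=always -p 9121:9121 -e REDIS_ADDR=redis://10.200.10.169:4100 oliver006/redis_exporter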

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
23f8171fb0af oliver006/redis_exporter “/bin/redis_exporter” 3 seconds ago Up 3 seconds

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']
  - job_name: node_exporter
    static_configs:
    - targets: ['10.39.43.120:9100']
  - job_name: 'mysql global status'
    scrape_interval: 15s
    static_configs:
    - targets:
        - 10.39.43.120:9104
    params:
      collect[]:
        - global_status
  - job_name: 'mysql performance'
    scrape_interval: 1m
    static_configs:
      - targets:
        - '10.39.43.120:9104'
    params:
      collect[]:
        - perf_schema.tableiowaits
        - perf_schema.indexiowaits
        - perf_schema.tablelocks
  - job_name: redis_exporter
    static_configs:
    - targets: ['10.39.43.120:9121']

5 Process-exporter

Process monitoring: Process-exporter
The commonly used node_exporter does not cover detailed per-process metrics, so process-exporter is used here to monitor processes.

5.1 Download

wget https://github.com/ncabatoff/process-exporter/releases/download/v0.7.10/process-exporter-0.7.10.linux-amd64.tar.gz

5.2 Install

tar -zxvf process-exporter-0.7.10.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/process-exporter-0.7.10.linux-amd64 /usr/local/process-exporter
rm -rf process-exporter-0.7.10.linux-amd64.tar.gz

5.3 systemd service

vim /etc/systemd/system/process-exporter.service

[Unit]
Description=process-exporter
Documentation=https://github.com/ncabatoff/process-exporter
 
[Service]
ExecStart=/usr/local/process-exporter/process-exporter -config.path /usr/local/process-exporter/process-name.yaml
 
[Install]
WantedBy=multi-user.target

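5.4 Matching rules in the configuration file

Group names in the configuration file are built from these template variables:
{{.Comm}} contains the basename of the original executable, in other words the 2nd field of /proc/<pid>/stat
{{.ExeBase}} contains the basename of the executable
{{.ExeFull}} contains the fully qualified path of the executable
{{.Matches}} is a map containing all the matches produced by applying the cmdline regexps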

6 Nginx

Process-exporter can match processes by name and collect information about them. The grouping is determined by the template used for name; the following monitors processes named nginx (this assumes nginx is already installed and running).
vim /usr/local/process-exporter/process-name.yaml

process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'nginx'
[root@localhost ~]# systemctl daemon-reload
[root@localhost ~]# systemctl start process-exporter.service
[root@localhost ~]# systemctl enable process-exporter.service
[root@localhost ~]# vim /usr/local/prometheus/prometheus.yml
  - job_name: 'process'
    static_configs:
      - targets: ['192.168.10.121:9256']

systemctl restart prometheus

Count the number of processes: sum(namedprocess_namegroup_states)
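To monitor several processes, process-name.yaml can contain one group per process. The sketch below generates such a file with a literal group name per process (instead of the {{.Matches}} template used above) and the same cmdline layout; process_exporter_config is a hypothetical helper, and nginx/redis-server are example names:

```shell
#!/usr/bin/env bash
# Sketch: emit a process-exporter config with one group per process name.
# Each name is used both as the group name and as the cmdline regex.
process_exporter_config() {
  echo "process_names:"
  local p
  for p in "$@"; do
    printf "  - name: \"%s\"\n    cmdline:\n    - '%s'\n" "$p" "$p"
  done
}
```

For example: `process_exporter_config nginx redis-server > /usr/local/process-exporter/process-name.yaml && systemctl restart process-exporter`. Since the cmdline entries are regular expressions, names containing characters such as `.` or `+` would need escaping.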

7 alertmanager

7.1 Install

cd /opt && wget -c https://github.com/prometheus/alertmanager/releases/download/v0.18.0/alertmanager-0.18.0.linux-amd64.tar.gz
tar zxf alertmanager-0.18.0.linux-amd64.tar.gz
mv alertmanager-0.18.0.linux-amd64 alertmanager
chown -R root:root alertmanager

7.2 systemd service

# Create the service unit
cat >/usr/lib/systemd/system/alertmanager.service <<EOF
[Unit]
Description=Alertmanager
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
ExecStart=/opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Enable at boot and start
systemctl daemon-reload
systemctl enable alertmanager
systemctl start alertmanager

Or start it directly
nohup ./alertmanager --config.file=alertmanager.yml > alertmanager.log 2>&1 &

7.3 Docker

docker pull quay.io/prometheus/alertmanager
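Alertmanager only routes alerts; the alerting rules themselves are evaluated by Prometheus, which must also be told where Alertmanager listens (port 9093 by default). A minimal sketch, assuming rules live in a rules/ directory next to prometheus.yml; the InstanceDown rule, the file names and the helper names are examples, not part of the original setup:

```shell
#!/usr/bin/env bash
# Sketch: write a minimal alerting rule and print the prometheus.yml fragment that loads it.
write_alert_rules() {
  local dir="${1:-/usr/local/prometheus/rules}"
  mkdir -p "$dir"
  # Quoted 'EOF' so that {{ $labels.instance }} is written literally.
  cat > "$dir/node_rules.yml" <<'EOF'
groups:
  - name: node
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF
  # Validate with promtool when it is on PATH (it ships with Prometheus).
  if command -v promtool >/dev/null 2>&1; then
    promtool check rules "$dir/node_rules.yml"
  fi
}

# Fragment to merge into prometheus.yml by hand (printed, not written, to avoid clobbering it).
prometheus_alerting_snippet() {
  cat <<'EOF'
rule_files:
  - "rules/*.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
EOF
}
```

After merging the fragment, restart Prometheus (or, if it runs with --web.enable-lifecycle, reload it with `curl -X POST http://localhost:9090/-/reload`). Alertmanager still needs a receiver in alertmanager.yml before notifications go anywhere.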

8 pushgateway

pushgateway does not scrape anything itself; it passively waits for data to be pushed to it via HTTP POST, so you need your own script to send data to it.
An example collection script:

#!/bin/bash

# Short host name of this machine, used later as a label
instance_name=$(hostname -f | cut -d'.' -f1)

# The host name must not be localhost, otherwise the label cannot tell hosts apart
if [ "$instance_name" == "localhost" ]; then
  echo "Must FQDN hostname"
  exit 1
fi

# Define a new metric key
label="count_netstat_wait_connections"

# Define its value: the number of connections in a WAIT state according to netstat
count_netstat_wait_connections=$(netstat -an | grep -i wait | wc -l)
echo "$label : $count_netstat_wait_connections"
echo "$label $count_netstat_wait_connections" | curl --data-binary @- http://prometheus.server.com:9091/metrics/job/pushgateway/instance/$instance_name

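Then run it periodically with crontab.
(2) Pros and cons
Pros: fast, flexible and unconstrained. Small and medium-sized companies usually only need the node and database exporters and can cover everything else with pushgateway.
Cons: it is a single point of failure, and monitoring data is lost if it goes down; also, data from a failed job is still delivered to Prometheus, because the last pushed value stays on the gateway until it is deleted.
4. Monitoring a remote MySQL
(1) Install mysqld_exporter on the managed host
First download mysqld_exporter, then install MariaDB or MySQL.
(2) Create a MySQL account for collecting data
grant select,replication client,process on *.* to 'prom'@'localhost' identified by '123';
The account only collects data, so it is given no write privileges, to avoid granting more than necessary. (MySQL 8.0 rejects this one-step form; there, create the user with CREATE USER first and then GRANT without IDENTIFIED BY.)
(3) Configure the MySQL credentials for mysqld_exporter. For example, create /etc/prometheus/mysqld_export.cnf with a [client] section whose user and password are the account just granted.
Then start it: nohup prometheus-mysqld-exporter --config.my-cnf=/etc/prometheus/mysqld_export.cnf &

Pulling the MySQL server's metrics from the Prometheus server:
edit prometheus.yml and add a job_name together with the monitored host's IP and port.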

8.1 Install

cd /opt && wget -c https://github.com/prometheus/pushgateway/releases/download/v0.9.1/pushgateway-0.9.1.linux-amd64.tar.gz
tar zxf pushgateway-0.9.1.linux-amd64.tar.gz
mv pushgateway-0.9.1.linux-amd64 pushgateway
chown -R root:root pushgateway

8.2 systemd service

cat >/usr/lib/systemd/system/pushgateway.service <<EOF
[Unit]
Description=pushgateway
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
ExecStart=/opt/pushgateway/pushgateway
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Enable at boot and start
systemctl daemon-reload
systemctl enable pushgateway
systemctl start pushgateway

Or start it directly
nohup ./pushgateway > pushgateway.log 2>&1 &

Push a metric from the shell

echo "some_metric 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/some_job

Push more complex data (several metrics, with TYPE and HELP lines)

cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/some_job/instance/some_instance
# TYPE some_metric counter
some_metric{label="val1"} 42
# TYPE another_metric gauge
# HELP another_metric Just an example.
another_metric 2398.283
EOF
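The text format pushed above can also be produced by a small helper, which keeps the TYPE/HELP lines and the sample consistent. A sketch; PUSHGATEWAY_URL, metric_payload and push_metric are hypothetical names, and HELP text containing backslashes or newlines would need escaping, which this does not do:

```shell
#!/usr/bin/env bash
# Sketch: build a Pushgateway payload in the text exposition format and push it.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://localhost:9091}"

metric_payload() {
  # $1 = metric name, $2 = type (counter|gauge|...), $3 = help text, $4 = value
  printf '# TYPE %s %s\n# HELP %s %s\n%s %s\n' "$1" "$2" "$1" "$3" "$1" "$4"
}

push_metric() {
  # $1 = job, $2 = instance; the remaining arguments go to metric_payload
  local job="$1" instance="$2"; shift 2
  metric_payload "$@" |
    curl -s --fail --data-binary @- "$PUSHGATEWAY_URL/metrics/job/$job/instance/$instance"
}
```

For example: `push_metric some_job some_instance another_metric gauge "Just an example." 2398.283`. Because --data-binary sends a POST, only metrics with the same name in that group are replaced. A group can be removed again with `curl -X DELETE http://localhost:9091/metrics/job/some_job/instance/some_instance`.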

9 Grafana

9.1 Download

https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm/

9.2 Install

rpm

wget https://dl.grafana.com/oss/release/grafana-8.5.3-1.x86_64.rpm 
yum localinstall grafana-8.5.3-1.x86_64.rpm 
systemctl enable grafana-server.service
systemctl start grafana-server.service
# Web UI on port 3000; default login admin/admin
# Install a plugin
grafana-cli plugins install grafana-piechart-panel
systemctl restart grafana-server

tar.gz

wget https://dl.grafana.com/oss/release/grafana-7.1.1.linux-amd64.tar.gz
tar xf grafana-7.1.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/grafana-7.1.1 /usr/local/grafana
cd /usr/local/grafana/bin/
nohup ./grafana-server &

9.3 Docker

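docker pull grafana/grafana:latest
# The container runs as user 472, so the data directory must be writable by that user
mkdir -p /data/grafana-storage && chown 472:472 /data/grafana-storage
docker run -d --name=grafana --net=host -v /data/grafana-storage:/var/lib/grafana grafana/grafana:latest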

9.4 Configure

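Dashboard templates: https://grafana.com/grafana/dashboards
Open http://<server>:3000 to check that Grafana has started; in production, remember to open this port in the firewall.
The default login on first access is admin/admin.
Settings -> Data Sources -> Add data source -> choose Prometheus -> edit the data source:
Name: defaults to Prometheus; keep the default if there is only one Prometheus data source. (Note: the first letter is capitalized.)
URL: defaults to http://localhost:9090; fill in the actual IP:port.
Access: defaults to Server; keep the default.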
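Instead of clicking through the UI, the same data source can be provisioned from a file that Grafana reads at startup (for the RPM install the directory is /etc/grafana/provisioning/datasources). A sketch; write_grafana_datasource and the file name prometheus.yaml are assumptions of this example:

```shell
#!/usr/bin/env bash
# Sketch: provision the Prometheus data source for Grafana from a YAML file.
# $1 = provisioning directory, $2 = Prometheus URL (both with assumed defaults).
write_grafana_datasource() {
  local dir="${1:-/etc/grafana/provisioning/datasources}"
  local url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # "proxy" is what the UI shows as Access: Server
    access: proxy
    url: $url
    isDefault: true
EOF
}
```

Afterwards run `systemctl restart grafana-server`; the data source then appears under the name Prometheus without any manual setup.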

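Adding the Prometheus data source
(1) Click "Add data source" on the home page.
(2) Choose Prometheus.

(3) Fill in the data source settings.

For URL, enter the address of the Prometheus server. Here Prometheus and Grafana run on the same machine, so localhost is enough.

Click the [Save & Test] button at the bottom to save the settings.

(4) On the Dashboards tab, choose "Prometheus 2.0 Stats".

Click the Dashboards tab and choose Prometheus 2.0 Stats.

(5) View the monitoring data

Click the Grafana logo to return to the main page, click Home, and choose the Prometheus 2.0 Stats dashboard just added to see the data.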

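Large-screen dashboard
Click + -> Import -> either Upload JSON file or search for template 8919 -> Load.

Checking the graphs:
The pie chart plugin is not installed and must be installed first:
https://grafana.com/grafana/plugins/grafana-piechart-panel
Install the plugin
grafana-cli plugins install grafana-piechart-panel
Restart the service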
root@master2:~# service grafana-server restart

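10 Container monitoring with cAdvisor

Docker deployments are increasingly common. To monitor how Docker is running overall, we can use cAdvisor, an open-source visualization tool from Google for analyzing and displaying the runtime state of containers. Here we again deploy it with Docker.

# Pull the image (newer releases are published as gcr.io/cadvisor/cadvisor with version tags such as v0.47.0)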
docker pull google/cadvisor

# Start the container
docker run --name=mycAdvisor \
  -p 8080:8080 -d \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  -v /dev/disk/:/dev/disk:ro \
  --privileged \
  --device=/dev/kmsg \
  google/cadvisor

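The monitoring page is then available at http://localhost:8080 and shows metrics for Docker as a whole and for each container. Notably, cAdvisor supports Prometheus natively: the metrics it collects are exposed at http://localhost:8080/metrics.
So add another job named cAdvisor to prometheus.yml and restart Prometheus: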

...
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  # Collect host metrics
  - job_name: 'MacBook Pro'
    static_configs:
    - targets: ['192.168.0.108:9100']

  # Collect MySQL metrics
  - job_name: 'MySQL'
    static_configs:
    - targets: ['192.168.0.108:9104']

  # Collect Docker container metrics
  - job_name: 'cAdvisor'
    static_configs:
    - targets: ['192.168.0.108:8080']

Likewise, for visualization pick a cAdvisor dashboard on the Grafana website (filter: name/description = cAdvisor and data source = Prometheus) and copy its ID, 893. After importing it, the result looks as follows.

11 alertmanager

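1. Prometheus alerting requires the Alertmanager component and hand-written alerting rules, so Grafana's built-in email alerting is used here instead.
(1) Before setting up alerts, synchronize the servers' clocks, e.g. with ntpdate.
(2) Create a notification channel.
Open the Type drop-down to see the many third-party alerting platforms supported, including DingTalk.

Here email alerting is used, which requires an SMTP server to be configured:
vim /etc/grafana/grafana.ini
[smtp]
enabled = true
# SMTP server address:port (differs between mail providers)
host = smtp.163.com:25
user = your email address
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password = your password
;cert_file =
;key_file =
# Verify SSL for smtp server? defaults to false
skip_verify = true
from_address = your email address
from_name = Grafana
Save when finished.
(3) Go back to the dashboard created earlier (or a new one) and define the rule on its Alert tab.

Meaning: for query A, evaluate the values from 1 minute ago until now.
The query, the evaluation window, the displayed time range and so on can all be chosen.

Threshold options: IS ABOVE, IS BELOW, IS OUTSIDE RANGE, IS WITHIN RANGE, HAS NO VALUE.

(4) Alert state handling
If no data or all values are null, set state to:
If execution error or timeout, set state to:
When done, click Test Rule to test the alert rule.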

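12 Summary

Prometheus is an all-in-one monitoring and alerting platform with few dependencies and a complete feature set.
Prometheus supports cloud and container monitoring, whereas the other systems mainly monitor hosts.
Prometheus's query language is more expressive and has more powerful built-in statistical functions.
Prometheus falls behind InfluxDB, OpenTSDB and Sensu in storage scalability and durability.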

13 Script

#!/bin/bash

set -ex
CWD=$(pwd)

InstallDir=/usr/local
AppsDir=/tmp

echo "nameserver 114.114.114.114">>/etc/resolv.conf
mkdir -p $InstallDir
mkdir -p $AppsDir
useradd -s /sbin/nologin -M prometheus

#Download Software
# Official download page
#https://grafana.com/grafana/download?pg=get&plcmt=selfmanaged-box1-cta1

wget https://dl.grafana.com/oss/release/grafana-8.0.6-1.x86_64.rpm
yum -y install grafana-8.0.6-1.x86_64.rpm
systemctl enable grafana-server;systemctl start grafana-server

# Official download page: https://prometheus.io/download/
cd ${AppsDir} && \
 wget https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz
tar xf prometheus-2.28.1.linux-amd64.tar.gz -C ${InstallDir}
mv ${InstallDir}/prometheus* ${InstallDir}/prometheus

# Official download page: https://prometheus.io/download/
cd ${AppsDir} && \
 wget https://github.com/prometheus/node_exporter/releases/download/v1.2.0/node_exporter-1.2.0.linux-amd64.tar.gz
tar xf node_exporter-1.2.0.linux-amd64.tar.gz -C $InstallDir
mv $InstallDir/node_exporter* $InstallDir/node_exporter

chown -R prometheus:prometheus $InstallDir/prometheus
chown -R prometheus:prometheus $InstallDir/node_exporter

cat>/usr/lib/systemd/system/prometheus.service<<EOF
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=$InstallDir/prometheus/prometheus \
  --config.file=$InstallDir/prometheus/prometheus.yml \
  --web.enable-lifecycle \
  --storage.tsdb.path=$InstallDir/prometheus/data \
  --storage.tsdb.retention.time=60d
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF

cat>/usr/lib/systemd/system/node_exporter.service<<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=$InstallDir/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable node_exporter prometheus grafana-server
systemctl start  node_exporter prometheus grafana-server

# Install the Zabbix plugin for Grafana
grafana-cli plugins install alexanderzobnin-zabbix-app
mysql -uroot -proot -e "GRANT SELECT ON zabbix.* TO 'grafana'@'%' identified by 'grafana';flush privileges"

clear
echo
ss -tuanlpe | egrep  "3000|9090"
systemctl status prometheus node_exporter grafana-server | grep Active
