- Introduction
- Deploying Prometheus
- Deploying Grafana
- Monitoring server nodes
- Pushgateway data collection and Alertmanager alerting
1. Introduction
Prometheus is an open-source systems monitoring and alerting toolkit. It has joined the CNCF, becoming the second project hosted there after Kubernetes, and it is the usual monitoring companion in Kubernetes clusters. It supports many exporters for collecting data as well as a pushgateway for pushing metrics, and its performance is sufficient for clusters of over ten thousand machines.
Grafana is an open-source application written in Go for visualizing large volumes of metric data. It is the most popular time-series visualization tool in infrastructure and application analytics, and it supports most of the common time-series databases. Each data source has its own query editor, customized to expose the features of that particular source. Officially supported data sources include Graphite, Elasticsearch, InfluxDB, Prometheus, CloudWatch, MySQL, and OpenTSDB.
2. Deploying Prometheus
Download the latest release from the Prometheus download page (https://prometheus.io/download/); the tarball bundles the tools Prometheus needs.
[root@localhost ~]# mkdir -p /app/prometheus
[root@localhost ~]# cd /app/prometheus
[root@localhost prometheus]# wget https://github.com/prometheus/prometheus/releases/download/v2.33.3/prometheus-2.33.3.linux-amd64.tar.gz
[root@localhost prometheus]# tar zxvf prometheus-2.33.3.linux-amd64.tar.gz
[root@localhost prometheus]# cd prometheus-2.33.3
Have a look at the contents of the Prometheus package, then edit the configuration file to set up the various monitoring targets.
[root@localhost prometheus-2.33.3]# ls
console_libraries consoles data LICENSE NOTICE prometheus prometheus.yml promtool
[root@localhost prometheus-2.33.3]# vim prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  # prometheus server
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["192.168.137.100:9090"]
  # pushgateway collector
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['192.168.137.100:9091']
        labels:
          instance: pushgateway
  # node monitoring
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.137.100:9100','192.168.137.2:9100','192.168.137.3:9100','47.99.57.254:8100']
  # MySQL monitoring
  - job_name: 'mysqld_exporter'
    static_configs:
      - targets: ['47.99.57.254:9104']
  # nginx monitoring
  - job_name: 'nginx_node'
    static_configs:
      - targets: ['192.168.137.3:9913']
        labels:
          instance: web1
[root@localhost prometheus-2.33.3]# ./prometheus --config.file=/app/prometheus/prometheus-2.33.3/prometheus.yml --storage.tsdb.path=/app/prometheus/prometheus-2.33.3/data/ &
The service starts successfully. For reliability, configuring Prometheus to start on boot also makes later maintenance easier:
cat > /etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=prometheus
After=network.target
[Service]
Type=simple
# The prometheus user must already exist and own the data directory, or change this to root.
User=prometheus
ExecStart=/app/prometheus/prometheus-2.33.3/prometheus --config.file=/app/prometheus/prometheus-2.33.3/prometheus.yml --storage.tsdb.path=/app/prometheus/prometheus-2.33.3/data
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start prometheus.service
systemctl status prometheus.service
systemctl enable prometheus.service
Visit 192.168.137.100:9090 to open the Prometheus web UI.
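With the UI up, the server can also be checked from a shell. A quick sketch, assuming the address 192.168.137.100:9090 used in prometheus.yml above; `/-/healthy` and `/api/v1/targets` are standard Prometheus HTTP endpoints:

```shell
#!/bin/sh
# Sanity-check a running Prometheus server over its HTTP API.
# Address taken from prometheus.yml above; adjust to your host.
PROM="192.168.137.100:9090"

# Liveness probe: prints "Prometheus Server is Healthy." when up.
curl -s --connect-timeout 2 "http://$PROM/-/healthy" \
  || echo "Prometheus not reachable at $PROM"

# Health of every scrape target ("up" or "down").
curl -s --connect-timeout 2 "http://$PROM/api/v1/targets" \
  | grep -o '"health":"[a-z]*"' \
  || echo "no target data from $PROM"
```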
3. Deploying Grafana
[root@localhost ~]# cd /app/prometheus
[root@localhost prometheus]# wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.4.1.linux-amd64.tar.gz
[root@localhost prometheus]# tar zxvf grafana-enterprise-8.4.1.linux-amd64.tar.gz
[root@localhost prometheus]# cd grafana-8.4.1
[root@localhost grafana-8.4.1]# nohup ./bin/grafana-server web > ./grafana.log 2>&1 &
Check that the service process and port are up.
Visit Grafana at the server IP and port (3000 by default) and add Prometheus to Grafana as a data source.
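The data source can also be registered through Grafana's HTTP API instead of the UI. A sketch assuming Grafana's default admin:admin credentials on its default port 3000 and the Prometheus address used earlier:

```shell
#!/bin/sh
# Add Prometheus as a Grafana data source via the HTTP API.
# admin:admin and port 3000 are Grafana defaults; change them if customized.
GRAFANA="http://admin:admin@192.168.137.100:3000"

curl -s --connect-timeout 2 -X POST "$GRAFANA/api/datasources" \
  -H "Content-Type: application/json" \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://192.168.137.100:9090","access":"proxy","isDefault":true}' \
  || echo "Grafana not reachable"
```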
4. Node monitoring with node_exporter
[root@localhost prometheus]# wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
[root@localhost prometheus]# tar zxvf node_exporter-1.3.1.linux-amd64.tar.gz
[root@localhost prometheus]# mv node_exporter-1.3.1 node_exporter
[root@localhost prometheus]# cd node_exporter
[root@localhost node_exporter]# ./node_exporter --web.listen-address=:9100 >node_exporter.log 2>&1 &
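Before expecting the node to show up in Prometheus, it is worth confirming the exporter answers locally; node_cpu_seconds_total is one of node_exporter's standard metrics:

```shell
#!/bin/sh
# Verify node_exporter is serving metrics on its listen port.
NODE="localhost:9100"

# Count the node_cpu_seconds_total samples; a non-zero count means it works.
curl -s --connect-timeout 2 "http://$NODE/metrics" \
  | grep -c '^node_cpu_seconds_total' \
  || echo "no metrics from $NODE"
```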
Once the service is up, Prometheus starts monitoring the node (the node_exporter targets must be listed as ip:port in prometheus.yml, which was already done above, so you only need to start node_exporter on each corresponding node). For reliability, configuring node_exporter to start on boot also makes later maintenance easier.
vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter Monitoring System
[Service]
ExecStart=/path/to/node_exporter --web.listen-address=:9100
[Install]
WantedBy=multi-user.target
Reload systemd and enable start on boot:
systemctl daemon-reload
systemctl start node_exporter.service
systemctl status node_exporter.service
systemctl enable node_exporter.service
As shown above, the node has been added to Prometheus monitoring, so we can now use Grafana for visualization.
Import a community dashboard template and see the result (you can of course build your own).
5. Pushgateway data collection and Alertmanager alerting
5.1 Deploying pushgateway
[root@localhost ~]# cd /app/prometheus/
[root@localhost prometheus]# wget https://github.com/prometheus/pushgateway/releases/download/v1.4.2/pushgateway-1.4.2.linux-amd64.tar.gz
[root@localhost prometheus]# tar zxvf pushgateway-1.4.2.linux-amd64.tar.gz
[root@localhost prometheus]# mv pushgateway-1.4.2 pushgateway
[root@localhost prometheus]# cd pushgateway
[root@localhost pushgateway]# nohup /app/prometheus/pushgateway/pushgateway --web.listen-address :9091 > /app/prometheus/pushgateway/pushgateway.log 2>&1 &
Since the startup output was redirected to /app/prometheus/pushgateway/pushgateway.log, you can cat that file to inspect it, and check that the pushgateway process is running.
To verify that data is being collected, visit IP:9091/metrics; if it renders as expected, metric collection is working.
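With the gateway up, an ad-hoc metric can be pushed from any shell using the plain-text exposition format; the metric, job, and instance names below are made up for illustration:

```shell
#!/bin/sh
# Push a custom metric to the pushgateway; Prometheus then picks it up
# via the 'pushgateway' job configured in prometheus.yml above.
PGW="192.168.137.100:9091"

# Plain-text exposition format: metric_name{labels} value
payload='backup_duration_seconds{script="nightly_backup"} 42'

echo "$payload" | curl -s --connect-timeout 2 --data-binary @- \
  "http://$PGW/metrics/job/demo_job/instance/demo_instance" \
  || echo "pushgateway not reachable at $PGW"

# Clean up the pushed group afterwards:
# curl -X DELETE "http://$PGW/metrics/job/demo_job/instance/demo_instance"
```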
5.2 Deploying Alertmanager
[root@localhost prometheus]# wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
[root@localhost prometheus]# tar zxvf alertmanager-0.23.0.linux-amd64.tar.gz
[root@localhost prometheus]# mv alertmanager-0.23.0 alertmanager
[root@localhost prometheus]# cd alertmanager
Set up the Alertmanager systemd unit:
[root@localhost alertmanager]# cat /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
[Service]
Restart=on-failure
ExecStart=/app/prometheus/alertmanager/alertmanager --config.file=/app/prometheus/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target
Start the alertmanager service and enable it at boot:
[root@localhost alertmanager]# systemctl start alertmanager
[root@localhost alertmanager]# systemctl enable alertmanager
[root@localhost alertmanager]# ps -elf | grep alertmanager
4 S root 913 1 0 80 0 - 181955 futex_ 08:24 ? 00:00:15 /app/prometheus/alertmanager/alertmanager --config.file=/app/prometheus/alertmanager/alertmanager.yml
0 S root 3384 3008 0 80 0 - 28206 pipe_w 10:42 pts/0 00:00:00 grep --color=auto alertmanager
Alertmanager must be registered in prometheus.yml (the alerting/alertmanagers block left commented out earlier); after adding it, restart Prometheus to reload the configuration.
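The additions to prometheus.yml look roughly like this; a sketch that assumes Alertmanager runs on the Prometheus host with its default port 9093, and that the rule files live under rules/:

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "192.168.137.100:9093"

rule_files:
  - "rules/*.yml"
```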
To keep the monitoring rules in a uniform format, I created a rules directory and put the CPU, disk, and memory alerting rules in it.
Below is a simple test; adapt the thresholds and rules to your own environment.
vim rules/cpu_rule.yml
groups:
  - name: Host
    rules:
      - alert: HostCPU
        expr: 100 * (1 - avg(irate(node_cpu_seconds_total{mode="idle"}[2m])) by(instance)) > 10
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "{{ $labels.instance }}: High CPU Usage Detected"
          description: "{{ $labels.instance }}: CPU usage is {{ $value }}, above 10%"
vim rules/disk_rule.yml
groups:
  - name: Host
    rules:
      - alert: HostDisk
        expr: 100 * (node_filesystem_size_bytes{fstype=~"xfs|ext4"} - node_filesystem_avail_bytes) / node_filesystem_size_bytes > 30
        for: 5m
        labels:
          severity: low
        annotations:
          summary: "{{ $labels.instance }}: High Disk Usage Detected"
          description: "{{ $labels.instance }}, mountpoint {{ $labels.mountpoint }}: Disk Usage is {{ $value }}, above 30%"
vim rules/Memory_rule.yml
groups:
  - name: Host
    rules:
      - alert: HostMemory
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 20
        for: 5m
        labels:
          severity: middle
        annotations:
          summary: "{{ $labels.instance }}: High Memory Usage Detected"
          description: "{{ $labels.instance }}: Memory Usage is {{ $value }}, above 20%"
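Before restarting Prometheus, the rule files can be validated with promtool, which ships in the tarball unpacked earlier; the path below matches the layout used in this article:

```shell
#!/bin/sh
# Syntax-check the alerting rules before loading them into Prometheus.
PROM_DIR="/app/prometheus/prometheus-2.33.3"

"$PROM_DIR/promtool" check rules "$PROM_DIR"/rules/*.yml \
  || echo "promtool not found or rules invalid under $PROM_DIR"
```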
To make the effect easy to see, the thresholds are deliberately low: CPU usage above 10%, disk usage above 30%, or memory usage above 20% triggers an alert, as shown below: