Deploying Prometheus + Grafana from Release Tarballs for Monitoring

  1. Introduction

  2. Deploying Prometheus

  3. Deploying Grafana

  4. Monitoring server nodes

  5. Pushgateway data collection and Alertmanager alerting


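I. Introduction

Prometheus is an open-source system monitoring and alerting toolkit. It joined the Cloud Native Computing Foundation (CNCF) as the second hosted project after Kubernetes, and it is the usual choice for monitoring Kubernetes clusters. It collects data through a wide range of exporters, accepts pushed metrics through Pushgateway, and performs well enough to monitor clusters of more than ten thousand machines.

Grafana is an open-source application written in Go, used mainly to visualize large volumes of metric data. It is one of the most popular tools for displaying time-series data in infrastructure and application analytics, and it supports most common time-series databases. Grafana supports many different data sources; each has its own query editor, tailored to the features and capabilities that the data source exposes. Officially supported data sources include Graphite, Elasticsearch, InfluxDB, Prometheus, CloudWatch, MySQL, and OpenTSDB.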

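II. Deploying Prometheus

Download the latest release from the Prometheus download page (Download | Prometheus), which also hosts the other components Prometheus needs.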

[root@localhost ~]# mkdir -p /app/prometheus
[root@localhost ~]# cd /app/prometheus
[root@localhost prometheus]# wget https://github.com/prometheus/prometheus/releases/download/v2.33.3/prometheus-2.33.3.linux-amd64.tar.gz
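[root@localhost prometheus]# tar zxvf prometheus-2.33.3.linux-amd64.tar.gz
[root@localhost prometheus]# mv prometheus-2.33.3.linux-amd64 prometheus-2.33.3
[root@localhost prometheus]# cd prometheus-2.33.3

List the package contents, then edit the configuration file to set up each type of monitoring.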

[root@localhost prometheus-2.33.3]# ls
console_libraries  consoles  data  LICENSE  NOTICE  prometheus  prometheus.yml  promtool

[root@localhost prometheus-2.33.3]# vim prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration (alerting)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
# prometheus server
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.137.100:9090"]

# Pushgateway (collector for pushed metrics)
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['192.168.137.100:9091']
        labels:
           instance: pushgateway

# Node monitoring
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.137.100:9100','192.168.137.2:9100','192.168.137.3:9100','47.99.57.254:8100']

# MySQL database monitoring
  - job_name: 'mysqld_exporter'
    static_configs:
      - targets: ['47.99.57.254:9104']

# Nginx monitoring
  - job_name: 'nginx_node'
    static_configs:
      - targets: ['192.168.137.3:9913']
        labels:
           instance: web1

[root@localhost prometheus-2.33.3]# ./prometheus --config.file=/app/prometheus/prometheus-2.33.3/prometheus.yml --storage.tsdb.path=/app/prometheus/prometheus-2.33.3/data/ &

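Once the service starts successfully, set Prometheus up as a systemd service that starts at boot; this is more robust and makes later maintenance easier. Stop the instance started by hand above first, since both would try to bind port 9090.

# Create the account named in User= and give it the installation (including data/).
useradd --system --no-create-home --shell /sbin/nologin prometheus
chown -R prometheus:prometheus /app/prometheus/prometheus-2.33.3

cat > /etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=prometheus
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/app/prometheus/prometheus-2.33.3/prometheus --config.file=/app/prometheus/prometheus-2.33.3/prometheus.yml --storage.tsdb.path=/app/prometheus/prometheus-2.33.3/data
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF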

systemctl daemon-reload
systemctl start prometheus.service
systemctl status prometheus.service
systemctl enable prometheus.service

Open http://192.168.137.100:9090 in a browser to reach the Prometheus web UI.
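
Before opening the browser, it can help to confirm from the command line that the server is answering. The sketch below assumes curl is installed; wait_for_url is a hypothetical helper name introduced here, while /-/healthy is the health-check endpoint that Prometheus itself exposes.

```shell
#!/bin/sh
# wait_for_url is a hypothetical helper, not part of Prometheus.
# It polls URL until curl receives a successful HTTP response,
# giving up after ATTEMPTS tries spaced DELAY seconds apart.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY]
wait_for_url() {
  url=$1
  attempts=${2:-10}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f turns HTTP errors into a non-zero exit; --max-time bounds each try.
    if curl -fs --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A possible invocation on this setup would be: wait_for_url http://192.168.137.100:9090/-/healthy && echo "Prometheus is healthy"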

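III. Deploying Grafana

[root@localhost ~]# cd /app/prometheus
[root@localhost prometheus]# wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.4.1.linux-amd64.tar.gz
[root@localhost prometheus]# tar zxvf grafana-enterprise-8.4.1.linux-amd64.tar.gz
[root@localhost prometheus]# cd grafana-8.4.1
[root@localhost grafana-8.4.1]# nohup ./bin/grafana-server web > ./grafana.log 2>&1 &

Check that the service process and port are healthy (the check should show Ok).

Once Grafana responds, open it in a browser at the server's IP and port (3000 by default) and add Prometheus as a data source.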
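
As an alternative to adding the data source by hand in the UI, Grafana can load it from a provisioning file at startup. The sketch below is only one way to do this. It assumes that it is run from the extracted Grafana directory (or with GRAFANA_HOME set, a variable name introduced here), that Grafana uses its default conf/provisioning path, and that Prometheus is reachable at the address used earlier in this article.

```shell
#!/bin/sh
# GRAFANA_HOME is an assumed variable name; it defaults to the current
# directory, which should be the extracted grafana-8.4.1 directory.
GRAFANA_HOME=${GRAFANA_HOME:-$PWD}
ds_dir="$GRAFANA_HOME/conf/provisioning/datasources"
mkdir -p "$ds_dir"

# Describe Prometheus as the default data source, proxied through the
# Grafana backend. The URL is the Prometheus address used in this article.
cat > "$ds_dir/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://192.168.137.100:9090
    isDefault: true
EOF

echo "wrote $ds_dir/prometheus.yaml"
```

Grafana reads provisioning files when it starts, so grafana-server would need a restart before the data source appears.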

IV. Node monitoring with node_exporter

[root@localhost prometheus]# wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
[root@localhost prometheus]# tar zxvf node_exporter-1.3.1.linux-amd64.tar.gz
[root@localhost prometheus]# mv node_exporter-1.3.1.linux-amd64 node_exporter
[root@localhost prometheus]# cd node_exporter
[root@localhost node_exporter]# ./node_exporter --web.listen-address=:9100 >node_exporter.log 2>&1 &

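Once the service is running, Prometheus starts scraping the node. The node's ip:port must be listed under the node_exporter job in prometheus.yml; that was already done above, so it is enough to start node_exporter on each listed node. As with Prometheus, set node_exporter up as a systemd service that starts at boot, which makes later maintenance easier.

vim /etc/systemd/system/node_exporter.service

[Unit]
Description=node_exporter Monitoring System
Documentation=https://github.com/prometheus/node_exporter

[Service]
# Adjust the path if node_exporter is installed elsewhere on this node.
ExecStart=/app/prometheus/node_exporter/node_exporter --web.listen-address=:9100

[Install]
WantedBy=multi-user.target

# Start the service and enable it at boot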
systemctl daemon-reload
systemctl start node_exporter.service
systemctl status node_exporter.service
systemctl enable node_exporter.service
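
To check that a node is really exporting data, its /metrics page can be fetched and filtered from the command line. The helper below is a small sketch: metric_samples is a hypothetical name, and it only assumes the standard Prometheus text exposition format that node_exporter serves.

```shell
#!/bin/sh
# metric_samples is a hypothetical helper: it reads Prometheus text
# exposition format on stdin and prints only the sample lines of the named
# metric, skipping "# HELP"/"# TYPE" comments and metrics that merely share
# the same prefix (for example node_load15 when asking for node_load1).
metric_samples() {
  grep -E "^$1(\{| )"
}
```

For example, on one of the nodes configured above: curl -s http://192.168.137.2:9100/metrics | metric_samples node_load1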

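As the screenshot above shows, the nodes have been added to Prometheus monitoring, so we can now visualize them in Grafana.

Import a community dashboard template to see the result (you can of course build your own dashboard instead).
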
V. Pushgateway data collection and Alertmanager alerting

1. Deploying Pushgateway

[root@localhost ~]# cd /app/prometheus/
[root@localhost prometheus]# wget https://github.com/prometheus/pushgateway/releases/download/v1.4.2/pushgateway-1.4.2.linux-amd64.tar.gz
[root@localhost prometheus]# tar zxvf pushgateway-1.4.2.linux-amd64.tar.gz
[root@localhost prometheus]# mv pushgateway-1.4.2.linux-amd64 pushgateway
[root@localhost prometheus]# cd pushgateway
[root@localhost pushgateway]# nohup /app/prometheus/pushgateway/pushgateway --web.listen-address :9091 > /app/prometheus/pushgateway/pushgateway.log 2>&1 &

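Because the startup output was redirected to /app/prometheus/pushgateway/pushgateway.log, you can cat that file to read the startup messages. Also check that the pushgateway process is running.

To verify that data is being collected, open IP:9091/metrics; if metrics are displayed, collection is working.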
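
The page only becomes interesting once something has been pushed. The following sketch shows one way to push a single sample with curl, using the /metrics/job/<job> push path of Pushgateway. The helper names, the PUSHGATEWAY variable, and the example metric are assumptions introduced here for illustration.

```shell
#!/bin/sh
# PUSHGATEWAY is an assumed variable name; the default is the Pushgateway
# address used in prometheus.yml earlier in this article.
PUSHGATEWAY=${PUSHGATEWAY:-http://192.168.137.100:9091}

# pushgateway_url (hypothetical helper) builds the push URL for a job and an
# optional instance label, which together form the grouping key.
pushgateway_url() {
  if [ -n "${2:-}" ]; then
    printf '%s/metrics/job/%s/instance/%s\n' "$PUSHGATEWAY" "$1" "$2"
  else
    printf '%s/metrics/job/%s\n' "$PUSHGATEWAY" "$1"
  fi
}

# push_metric (hypothetical helper) sends one sample in the text exposition
# format "<name> <value>" as the request body of an HTTP POST.
# Usage: push_metric NAME VALUE JOB [INSTANCE]
push_metric() {
  printf '%s %s\n' "$1" "$2" |
    curl -fsS --data-binary @- "$(pushgateway_url "$3" "${4:-}")"
}
```

A call such as push_metric backup_last_success_seconds "$(date +%s)" backup db1 (the metric, job, and instance names are made up) should then show up on the /metrics page and be scraped by the pushgateway job.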

2. Deploying Alertmanager

[root@localhost prometheus]# wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
[root@localhost prometheus]# tar zxvf alertmanager-0.23.0.linux-amd64.tar.gz
[root@localhost prometheus]# mv alertmanager-0.23.0.linux-amd64 alertmanager
[root@localhost prometheus]# cd alertmanager

Create the systemd unit for Alertmanager:

[root@localhost alertmanager]# cat /usr/lib/systemd/system/alertmanager.service 
[Unit]
Description=alertmanager

[Service]
Restart=on-failure
ExecStart=/app/prometheus/alertmanager/alertmanager --config.file=/app/prometheus/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target

Start the alertmanager service and enable it at boot:

[root@localhost alertmanager]# systemctl start alertmanager
[root@localhost alertmanager]# systemctl enable alertmanager
[root@localhost alertmanager]# ps -elf | grep alertmanager
4 S root        913      1  0  80   0 - 181955 futex_ 08:24 ?       00:00:15 /app/prometheus/alertmanager/alertmanager --config.file=/app/prometheus/alertmanager/alertmanager.yml
0 S root       3384   3008  0  80   0 - 28206 pipe_w 10:42 pts/0    00:00:00 grep --color=auto alertmanager

For Prometheus to send alerts to Alertmanager, add the basic alerting configuration to prometheus.yml, then restart Prometheus to load it.
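
The configuration is not reproduced in the text, so the following is a minimal sketch of what such a fragment could look like, not the author's exact file. It assumes Alertmanager listens on its default port 9093 on the Prometheus host, and that the rule files live in a rule directory inside the Prometheus installation. The script only writes the fragment to a scratch file for review; merging it into prometheus.yml is left as a manual step.

```shell
#!/bin/sh
# Write the fragment to a scratch file; it replaces the commented-out
# alerting and rule_files sections of prometheus.yml when merged by hand.
fragment=${TMPDIR:-/tmp}/prometheus-alerting-fragment.yml

cat > "$fragment" <<'EOF'
# Assumption: Alertmanager runs on the Prometheus host, default port 9093.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.137.100:9093

# Assumption: rule files are kept in a rule/ directory in the install path.
rule_files:
  - "/app/prometheus/prometheus-2.33.3/rule/*.yml"
EOF

echo "wrote $fragment"
```

After merging, running ./promtool check config prometheus.yml from the Prometheus directory validates the result before the restart.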

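I keep the alert rules in a uniform format inside a rule directory, with separate rule files for CPU, disk, and memory alerts.

The following is a simple test; adapt the rules to your own service environment.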

vim rule/cpu_rule.yml

groups:
- name: Host
  rules:
  - alert: HostCPU
    expr: 100 * (1 - avg(irate(node_cpu_seconds_total{mode="idle"}[2m])) by(instance)) > 10
    for: 5m
    labels:
      severity: high
    annotations:
      summary: "{{$labels.instance}}: High CPU Usage Detected"
      description: "{{$labels.instance}}: CPU usage is {{$value}}, above 10%"

vim rule/disk_rule.yml

groups:
- name: Host
  rules:
  - alert: HostDisk
    expr: 100 * (node_filesystem_size_bytes{fstype=~"xfs|ext4"} - node_filesystem_avail_bytes) / node_filesystem_size_bytes > 30
    for: 5m
    labels:
      severity: low
    annotations:
      summary: "{{$labels.instance}}: High Disk Usage Detected"
      description: "{{$labels.instance}}, mountpoint {{$labels.mountpoint}}: Disk Usage is {{ $value }}, above 30%"

vim rule/Memory_rule.yml

groups:
- name: Host
  rules:
  - alert: HostMemory
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 20
    for: 5m
    labels:
      severity: middle
    annotations:
      summary: "{{$labels.instance}}: High Memory Usage Detected"
      description: "{{$labels.instance}}: Memory Usage is {{ $value }}, above 20%"

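To make the effect easy to see, the thresholds are deliberately low: an alert fires when CPU usage exceeds 10%, disk usage exceeds 30%, or memory usage exceeds 20%.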
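
Rule files are easy to get subtly wrong, so it is worth validating them before restarting Prometheus. The sketch below wraps the promtool binary shipped in the Prometheus tarball; check_rules and the PROMTOOL variable are hypothetical names, and the default path assumes the installation directory used in this article.

```shell
#!/bin/sh
# PROMTOOL is an assumed variable name pointing at the promtool binary
# that ships in the Prometheus tarball.
PROMTOOL=${PROMTOOL:-/app/prometheus/prometheus-2.33.3/promtool}

# check_rules (hypothetical wrapper) runs "promtool check rules" on each
# file and stops at the first one that fails validation.
check_rules() {
  for rule_file in "$@"; do
    "$PROMTOOL" check rules "$rule_file" || return 1
  done
}
```

For the three files above, that would be: check_rules rule/cpu_rule.yml rule/disk_rule.yml rule/Memory_rule.yml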
