Building an Open-Source Monitoring System with Prometheus + Pushgateway + VictoriaMetrics + Grafana + Consul

Overall monitoring architecture

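This article draws on the open-source Prometheus ecosystem and hands-on experience at internet companies to show beginners how to build a complete monitoring system from scratch. The walkthrough covers several dimensions: basic monitoring, business monitoring, process monitoring, custom-metric monitoring, and more.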

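The alert center in the diagram is in-house middleware. It was built mainly to cover what Alertmanager did not provide for us: alert noise reduction, alert escalation, and routing alerts to the responsible person for each business line. (Alerts can also be pushed directly through Alertmanager.)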
(Figure: overall monitoring architecture)

1. Installing and configuring Prometheus

1.1 Installing Prometheus

Official download page: https://prometheus.io/download/

Create the data directory and the service account

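# Create the local data directory for Prometheus
mkdir -p /home/data/prometheus_data
# Create the group and user that run the Prometheus process
groupadd prometheus
useradd -g prometheus -d /home/prometheus prometheus
# The service runs as the prometheus user, so that user must own the data directory
chown -R prometheus:prometheus /home/data/prometheus_data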

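Download and extract the package

# Go to the software installation directory
cd /usr/local
# Pick the latest stable release (v2.14.0 is used as the example here) and download it
wget https://github.com/prometheus/prometheus/releases/download/v2.14.0/prometheus-2.14.0.linux-amd64.tar.gz
# Extract the package
tar -xvf prometheus-2.14.0.linux-amd64.tar.gz
# Rename the extracted directory
mv prometheus-2.14.0.linux-amd64 prometheus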
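Verify the tarball before running `tar`. The GitHub release page publishes a `sha256sums.txt` file next to the tarballs. The sketch below compares a file's digest with the value you copy from that file. `verify_tarball` is a hypothetical helper name, not part of Prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a downloaded file's SHA-256 digest with the
# expected value copied from the release's sha256sums.txt.
# Usage: verify_tarball <file> <expected_sha256>
# Returns 0 on match, 1 on mismatch, 2 if the file is missing.
verify_tarball() {
  local file="$1" expected="$2" actual
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 2
  fi
  # sha256sum prints "<digest>  <file>"; keep only the digest field
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

Call it between `wget` and `tar`, for example `verify_tarball prometheus-2.14.0.linux-amd64.tar.gz <digest> && tar -xvf prometheus-2.14.0.linux-amd64.tar.gz`. If you download `sha256sums.txt` itself, GNU coreutils' `sha256sum -c --ignore-missing sha256sums.txt` performs the same check.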

Standardize the directory layout

# Go to the Prometheus directory
cd /usr/local/prometheus
# Create the config and binary directories
mkdir -p cfg bin
# Move the binaries into bin/
mv prometheus promtool bin/
# Move the main config file into cfg/
mv prometheus.yml cfg/
# Give the prometheus user ownership of the directories and files
chown -R prometheus:prometheus /usr/local/prometheus
# Add the bin directory to PATH
cat >> /etc/profile <<'EOF'
export PATH=/usr/local/prometheus/bin:$PATH:$HOME/bin
EOF
source /etc/profile
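`promtool` ships in the same tarball and is now on PATH. It can validate `prometheus.yml` and the rule files it references before you start or restart the service. The wrapper name `check_prom_config` is hypothetical. The default path matches the layout above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: validate the Prometheus config with promtool.
# Returns 1 if the file is missing, 2 if promtool is not on PATH,
# otherwise promtool's own exit status (non-zero on any config/rule error).
check_prom_config() {
  local cfg="${1:-/usr/local/prometheus/cfg/prometheus.yml}"
  if [ ! -f "$cfg" ]; then
    echo "config not found: $cfg" >&2
    return 1
  fi
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not on PATH; run 'source /etc/profile' first" >&2
    return 2
  fi
  # Checks the main config and every file listed under rule_files
  promtool check config "$cfg"
}
```

A typical use is `check_prom_config && systemctl restart prometheus`. With this check, a typo in the YAML never takes the running server down.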

Create the systemd service unit

# Write the unit file
cat > /usr/lib/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Restart=always
ExecReload=/bin/kill -HUP $MAINPID
# --storage.tsdb.path sets the local TSDB directory; --storage.tsdb.retention.time=60d keeps 60 days of data
# To reload the config through the HTTP API (POST /-/reload), also add --web.enable-lifecycle
ExecStart=/usr/local/prometheus/bin/prometheus \
  --config.file=/usr/local/prometheus/cfg/prometheus.yml \
  --storage.tsdb.path=/home/data/prometheus_data \
  --storage.tsdb.retention.time=60d

[Install]
WantedBy=multi-user.target
EOF
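The unit keeps 60 days of data locally, so `/home/data/prometheus_data` needs to be sized for that. The Prometheus storage documentation gives the estimate `needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample`, with roughly 1 to 2 bytes per sample after compression. The sketch below applies that formula. It assumes each active series produces one sample per scrape interval. The function name and the series count in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Rough TSDB size estimate, in bytes, for a given retention period.
# Usage: estimate_tsdb_bytes <retention_days> <active_series> <scrape_interval_s> [bytes_per_sample]
# bytes_per_sample defaults to 2, the conservative end of the documented 1-2 bytes.
estimate_tsdb_bytes() {
  local retention_days="$1" series="$2" interval_s="$3" bytes_per_sample="${4:-2}"
  [ "$interval_s" -gt 0 ] || return 1
  # One sample per series per scrape interval
  local samples_per_second=$(( series / interval_s ))
  echo $(( retention_days * 86400 * samples_per_second * bytes_per_sample ))
}
```

For example, `estimate_tsdb_bytes 60 3000000 30` prints `1036800000000`, which is about 1 TB. That figure assumes 3 million active series scraped every 30 seconds.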

Start the service with systemctl

# Reload the systemd unit files
systemctl daemon-reload
# Enable start at boot
systemctl enable prometheus
# Start Prometheus
systemctl start prometheus
# Check the service status
systemctl status prometheus

# Follow the detailed service logs
journalctl -u prometheus -f

Once Prometheus is running, open http://prometheusIP:9090/targets to see the status of each monitored agent (scrape target).
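The same target status is available as JSON from the `/api/v1/targets` HTTP API, which suits scripts and cron checks. The sketch below counts targets by health (`up`, `down` or `unknown`). It assumes the compact JSON Prometheus emits, with no spaces around `:`. `count_targets` is a hypothetical name. If `jq` is installed, `jq '[.data.activeTargets[] | select(.health=="down")] | length'` is the more robust choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count targets with a given health in /api/v1/targets JSON.
# Usage: count_targets "<json>" <up|down|unknown>
count_targets() {
  local json="$1" health="$2"
  # grep exits 1 when nothing matches; "|| true" keeps the count at 0 under pipefail
  printf '%s' "$json" | { grep -o "\"health\":\"$health\"" || true; } | wc -l | tr -d ' '
}

# Live usage (assumes Prometheus listens on 127.0.0.1:9090):
#   json=$(curl -s http://127.0.0.1:9090/api/v1/targets)
#   echo "down targets: $(count_targets "$json" down)"
```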

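1.2 Prometheus configuration in detail

1.2.1 prometheus.yml explained

This section covers the common ways to configure Prometheus scrape targets:
- static `static_configs`;
- dynamic file-based `file_sd_configs`;
- Consul service discovery with `consul_sd_configs`.

It also shows how to configure multiple `remote_write` destinations (VictoriaMetrics), Alertmanager (`alerting`), and alerting rule files (`rule_files`).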

# My global config
global:
  # Scrape targets every 15 seconds (the default is every 1 minute)
  scrape_interval:     15s
  # Evaluate rules every 15 seconds (the default is every 1 minute)
  evaluation_interval: 15s
  # scrape_timeout is left at the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    # Address of the Alertmanager that handles alerts fired by the alerting rules
    - targets: ["127.0.0.1:9093"]

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # File(s) holding the alerting rule groups; see section 2.3 for details
  - "alert_rules.yml"

# Remote write to VictoriaMetrics
remote_write:
  # Remote storage write endpoint; several Prometheus servers can write to it, and Grafana reads from the remote storage
  - url: http://127.0.0.1:8428/api/v1/write
    remote_timeout: 30s
    queue_config:
      capacity: 500000
      max_shards: 50
      max_samples_per_send: 20000
      batch_send_deadline: 5s
  # To write to several remote stores at once, add one url entry per store (each pointing at a different address)
  - url: http://victoriametrics2IP:8428/api/v1/write
    remote_timeout: 30s
    queue_config:
      capacity: 500000
      max_shards: 50
      max_samples_per_send: 20000
      batch_send_deadline: 5s

# Scrape configurations
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  # job_name: a user-defined name for a group of similar targets
  # static_configs: targets listed statically, handy for quick tests (node_exporter listens on 9100 by default)
  - job_name: 'node_exporter'
    scrape_interval: 30s
    scrape_timeout: 30s
    static_configs:
      - targets: ['127.0.0.1:9100']
        labels:
          instance: 127.0.0.1
          
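  # file_sd_configs: targets loaded dynamically from files, suited to managing nodes from a web UI
  - job_name: 'node_monitor'
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: /node
    file_sd_configs:
    - files:
      - node_job.yml
    # Optional: relabel rules based on regular expressions; this one copies the host part of __address__ into a "host" label
    relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):.*'
      replacement: '$1'
      target_label: host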
      
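  # consul_sd_configs: dynamic discovery through Consul, suited to metrics exposed by Java/PHP/Go business services
  - job_name: 'java_metric'
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: /actuator/prometheus
    consul_sd_configs:
    - server: 'consul.cn:80'
      # An empty list means all services; the tags filter below keeps only services tagged java-cls
      services: []
      # Consul ACL token; it only needs read access to Consul nodes and services
      token: 'ea298607-8e39-686e-7d05-d9068fe7f984'
      tags: ['java-cls']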

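  # Scrape Pushgateway, which holds pushed monitoring data and custom business metrics
  - job_name: 'push_metric'
    scrape_interval: 30s
    scrape_timeout: 30s
    # Keep the job/instance labels set by the pusher instead of renaming them to exported_job/exported_instance
    honor_labels: true
    static_configs:
      - targets: ['pushgatewayIP:9091']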
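The `node_monitor` job reads its targets from `node_job.yml`. Prometheus resolves that path relative to the directory of `prometheus.yml` and picks up changes without a restart. A file_sd file is a YAML list of `targets` plus optional `labels`.

The sketch below does two things:
- it writes such a file the way a web management tool might;
- it previews the `host` label that the job's relabel rule (`regex: '(.*):.*'`, `replacement: '$1'`) would produce.

The function names and the `env` label are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the file_sd target file used by the node_monitor job.
# Usage: gen_node_job <output_file> <env_label> <host:port>...
gen_node_job() {
  local out="$1" env_label="$2"
  shift 2
  local tmp="${out}.tmp"
  {
    echo "- targets:"
    for t in "$@"; do
      echo "    - '$t'"
    done
    echo "  labels:"
    echo "    env: '$env_label'"
  } > "$tmp"
  # Write to a temp file, then rename, so Prometheus never reads a half-written file
  mv "$tmp" "$out"
}

# Preview the host label from the relabel rule. Prometheus anchors relabel
# regexes, and the greedy (.*) keeps everything before the last ':'.
# Prints nothing when there is no ':' (Prometheus would not set the label then).
host_from_address() {
  printf '%s\n' "$1" | sed -nE 's/^(.*):.*$/\1/p'
}
```

For example, `gen_node_job /usr/local/prometheus/cfg/node_job.yml prod 10.0.0.1:9100 10.0.0.2:9100` writes both targets under one `env: 'prod'` label set. `host_from_address 10.0.0.1:9100` prints `10.0.0.1`.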
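The `push_metric` job scrapes Pushgateway, so custom business metrics must first be pushed to it. Pushgateway accepts `POST /metrics/job/<job>/instance/<instance>` with a body in the Prometheus text exposition format, and the body must end with a newline. Pushgateway's documentation recommends `honor_labels: true` on the scrape job. This keeps the `job`/`instance` from the push URL instead of renaming them to `exported_job`/`exported_instance`. The sketch assumes the `pushgatewayIP:9091` placeholder from the config above. The helper and metric names are hypothetical.

```shell
#!/usr/bin/env bash
# Pushgateway base URL; override with the real address
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://pushgatewayIP:9091}"

# Build a gauge in text exposition format; every line, including the last, ends with "\n"
push_body() {
  local name="$1" value="$2" help="$3"
  printf '# HELP %s %s\n# TYPE %s gauge\n%s %s\n' "$name" "$help" "$name" "$name" "$value"
}

# Hypothetical helper: push one gauge under the given job/instance grouping key
# Usage: push_metric <job> <instance> <metric_name> <value> [help_text]
push_metric() {
  local job="$1" instance="$2" name="$3" value="$4" help="${5:-custom metric}"
  push_body "$name" "$value" "$help" |
    curl -s -f --data-binary @- "$PUSHGATEWAY_URL/metrics/job/$job/instance/$instance"
}
```

For example, `push_metric order_service host01 order_queue_length 42 "Pending orders"` from a cron job makes the value available on the next `push_metric` scrape.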
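The `java_metric` job only finds services registered in Consul with the `java-cls` tag. It then scrapes `<service address>:<service port>/actuator/prometheus`. The sketch below registers a service through the Consul agent API, `PUT /v1/agent/service/register`, authenticated with the `X-Consul-Token` header.

Several details are assumptions:
- a local agent on Consul's default port 8500;
- a Spring Boot `/actuator/health` endpoint for the health check;
- the service names used in the usage line.

The registering token needs `service:write`. The read-only token in the Prometheus config is not enough.

```shell
#!/usr/bin/env bash
# Consul agent address; assumed local agent on the default HTTP port
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"

# Build the registration payload; the java-cls tag is what the java_metric job filters on
# Usage: consul_service_json <id> <name> <address> <port>
consul_service_json() {
  local id="$1" name="$2" addr="$3" port="$4"
  cat <<EOF
{
  "ID": "$id",
  "Name": "$name",
  "Tags": ["java-cls"],
  "Address": "$addr",
  "Port": $port,
  "Check": {
    "HTTP": "http://$addr:$port/actuator/health",
    "Interval": "30s"
  }
}
EOF
}

# Hypothetical helper: register the service with the Consul agent
register_service() {
  if [ -z "${CONSUL_TOKEN:-}" ]; then
    echo "CONSUL_TOKEN is not set" >&2
    return 1
  fi
  consul_service_json "$@" |
    curl -s -f -X PUT -H "X-Consul-Token: $CONSUL_TOKEN" \
      --data-binary @- "$CONSUL_ADDR/v1/agent/service/register"
}
```

For example, `CONSUL_TOKEN=<write token> register_service order-svc-1 order-svc 10.0.0.5 8080`. Once the check passes, the instance appears on the /targets page under `java_metric`.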
      