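Quick Setup of a Prometheus + Grafana Monitoring System
Overview
Prometheus collects the data and Grafana displays it. This setup uses the following Prometheus components:
1) Node Exporter, which collects host hardware and operating system metrics. It runs as a container on every host.
2) cAdvisor, which collects container metrics. It runs as a container on every host.
3) Alertmanager, which handles alerting. Unlike the two exporters, it only needs to run as a container on the monitoring host.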
Download link for all files in the repo:
https://young1.lanzouf.com/b036xjdkf
Password: 9ure
Default Grafana login URL:
http://ip:33000/login
Default credentials: admin/admin
Install Docker and docker-compose
2.1 Install Docker
Start with a 64-bit Linux host running kernel 3.10 or later and with at least 1 GB of memory, then install Docker on it.
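The kernel and architecture requirements can be checked before installing anything. The following is a minimal sketch, not part of the original setup; the function names `kernel_at_least` and `is_64bit` are hypothetical helpers introduced here for illustration.

```shell
# Succeeds if a kernel release string (e.g. the output of `uname -r`)
# is at least MAJOR.MINOR.
kernel_at_least() {
  release=$1 want_major=$2 want_minor=$3
  major=${release%%.*}        # text before the first dot
  rest=${release#*.}          # text after the first dot
  minor=${rest%%[!0-9]*}      # leading digits of the remainder
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Succeeds if a machine name (e.g. the output of `uname -m`) is a
# common 64-bit architecture. The list is an assumption, not exhaustive.
is_64bit() {
  case "$1" in
    x86_64|amd64|aarch64|arm64|ppc64le|s390x) return 0 ;;
    *) return 1 ;;
  esac
}
```

On the target host this could be used as `kernel_at_least "$(uname -r)" 3 10 && is_64bit "$(uname -m)" && echo "host looks suitable"`. Memory can be inspected separately with `free -m`.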
# Install dependencies
yum install -y yum-utils device-mapper-persistent-data lvm2
# Add the Docker package repository
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Install Docker CE
yum install docker-ce -y
# Start Docker
systemctl start docker
# Enable Docker at boot
systemctl enable docker
# Show Docker information
docker info
Install docker-compose
curl -L "https://github.com/docker/compose/releases/download/1.23.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
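The download URL above is assembled from the release version plus the output of `uname -s` and `uname -m`. As a small sketch of how those pieces combine (the helper name `compose_url` is hypothetical), the URL can be built and inspected before downloading:

```shell
# Build the docker-compose release URL from a version, an OS name
# (as printed by `uname -s`) and a CPU architecture (as printed by `uname -m`).
compose_url() {
  version=$1 os=$2 arch=$3
  echo "https://github.com/docker/compose/releases/download/${version}/docker-compose-${os}-${arch}"
}

# Example with fixed values, so the result does not depend on the current machine.
compose_url 1.23.2 Linux x86_64
```

On a real host the arguments would be `"$(uname -s)"` and `"$(uname -m)"`. Passing `-f` to curl in addition to `-L` makes it exit with an error on an HTTP failure instead of saving the error page as the binary.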
Add the configuration files
mkdir -p /root/docker/monitor
cd /root/docker/monitor
Create the prometheus.yml configuration file:
vim prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Targets below use Compose service names and container ports, because inside
# the Prometheus container 0.0.0.0:<published-port> does not reach the other
# containers. These names resolve only on the shared Compose network; to scrape
# another host, use <host-ip>:<published-port> instead, e.g. <host-ip>:39100.
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "node_down.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"
# Scrape configurations: Prometheus itself, cAdvisor, and Node Exporter.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['node-exporter:9100']
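When targets show up as DOWN, it helps to list exactly which addresses Prometheus has been told to scrape. The sketch below uses a hypothetical helper, `list_targets`, and assumes single-quoted entries on `targets:` lines, as in the file above. It is a text filter, not a YAML parser.

```shell
# Print every single-quoted 'host:port' entry found on "- targets:" lines
# of a Prometheus configuration file, one per line.
list_targets() {
  grep -E '^[[:space:]]*-[[:space:]]*targets:' "$1" \
    | grep -oE "'[^']+'" \
    | tr -d "'"
}
```

Running `list_targets /root/docker/monitor/prometheus.yml` prints one address per line. Each entry can then be probed with curl from a shell that can reach those addresses, for example from another container on the same Docker network.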
2.2 Add the e-mail alerting configuration
Create alertmanager.yml and configure the sending and receiving mailboxes:
vim alertmanager.yml
global:
  smtp_smarthost: 'smtp.163.com:25' # 163 mail SMTP server
  smtp_from: 'tsiyuetian@163.com' # sender address
  smtp_auth_username: 'tsiyuetian@163.com' # SMTP username, i.e. the sender's mailbox
  smtp_auth_password: 'TPP***' # password of the sending mailbox
  smtp_require_tls: false # do not require TLS
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring
receivers:
  - name: 'live-monitoring'
    email_configs:
      - to: '1933306137@qq.com' # recipient address
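Mail delivery can be checked without waiting for a real outage by posting a hand-made alert to Alertmanager's v2 HTTP API (`/api/v2/alerts`). The following is a sketch only: the helper names `test_alert_payload` and `send_test_alert`, the `severity` label, and the annotation text are all made up for illustration.

```shell
# Print a JSON body containing one alert with the given alertname label.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"test"},"annotations":{"summary":"manual test alert"}}]\n' "$1"
}

# POST that body to an Alertmanager base URL, e.g. http://ip:39093.
# Defined here but not called; invoke it once the container is running.
send_test_alert() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(test_alert_payload "$2")" "$1/api/v2/alerts"
}
```

For example, `send_test_alert http://ip:39093 MailTest` should make the alert appear in the Alertmanager web UI, and after `group_wait` an e-mail should be sent to the configured receiver.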
2.3 Add the alerting rule
Create node_down.yml, a rule file that watches the Prometheus targets:
vim node_down.yml
groups:
  - name: node_down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: test
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
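The rule's `for: 1m` and Alertmanager's `group_wait: 10s` together determine how soon the first e-mail can arrive. The arithmetic below is a rough estimate under simplifying assumptions (it ignores scrape timeouts and mail delivery time), and the function names are hypothetical.

```shell
# Earliest delay in seconds from the first failing rule evaluation to the
# first notification: the rule's `for` duration plus `group_wait`.
min_notify_delay() {
  echo $(( $1 + $2 ))
}

# Rough upper estimate: additionally allow one scrape interval (until the
# failure is observed) and one evaluation interval (until the rule runs).
max_notify_delay() {
  echo $(( $1 + $2 + $3 + $4 ))
}

min_notify_delay 60 10        # for=60s, group_wait=10s            -> 70
max_notify_delay 60 10 8 15   # plus scrape=8s (node job), eval=15s -> 93
```

With the values in this setup, a node going down should lead to an e-mail after roughly 70 to 93 seconds.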
Write the docker-compose file
vim docker-compose-monitor.yml
version: '2'
networks:
  monitor:
    driver: bridge
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /root/docker/monitor/prometheus.yml:/etc/prometheus/prometheus.yml
      - /root/docker/monitor/node_down.yml:/etc/prometheus/node_down.yml
    ports:
      - "39090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /root/docker/monitor/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "39093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    ports:
      - "33000:3000"
    networks:
      - monitor
  node-exporter:
    image: quay.io/prometheus/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "39100:9100"
    networks:
      - monitor
  cadvisor:
    image: google/cadvisor:latest
    container_name: cadvisor
    hostname: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "38980:8080"
    networks:
      - monitor
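Each service publishes one host port through a quoted "HOST:CONTAINER" mapping. To see which host ports the stack will occupy, the mappings can be pulled out of the file. This is a sketch that relies on the quoting style used above; `published_ports` is a hypothetical helper, not a docker-compose feature.

```shell
# Print the host-side port of every quoted "HOST:CONTAINER" mapping
# in a Compose file, one per line. Unquoted volume mappings are ignored.
published_ports() {
  grep -oE '"[0-9]+:[0-9]+"' "$1" | tr -d '"' | cut -d: -f1
}
```

Running `published_ports /root/docker/monitor/docker-compose-monitor.yml` lists the ports to compare against anything already listening on the host, or against firewall and security-group rules.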
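Start the stack with docker-compose
# Start the containers:
docker-compose -f /root/docker/monitor/docker-compose-monitor.yml up -d
# Stop and remove the containers:
docker-compose -f /root/docker/monitor/docker-compose-monitor.yml down
# Restart a single container (replace <container-id> with an ID or name from `docker ps`):
docker restart <container-id>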
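Once the containers are up, each service can be probed on its published port. The sketch below assumes the port mappings from the Compose file and uses the services' standard health or metrics endpoints; `health_urls` and `check_all` are hypothetical helper names.

```shell
# Print one probe URL per service for the given host name or IP.
health_urls() {
  host=$1
  echo "http://${host}:39090/-/healthy"    # Prometheus
  echo "http://${host}:39093/-/healthy"    # Alertmanager
  echo "http://${host}:33000/api/health"   # Grafana
  echo "http://${host}:39100/metrics"      # Node Exporter
  echo "http://${host}:38980/healthz"      # cAdvisor
}

# Request each URL and print the HTTP status code next to it.
# Defined here but not called; curl reports 000 when a connection fails.
check_all() {
  for url in $(health_urls "$1"); do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    echo "$code $url"
  done
}
```

Calling `check_all ip` with the host's address should print `200` for every service that started correctly.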
Appendix: starting each container with a standalone command
# Start Prometheus
docker run -d -p 39090:9090 --name=prometheus \
-v /root/docker/monitor/prometheus.yml:/etc/prometheus/prometheus.yml \
-v /root/docker/monitor/node_down.yml:/etc/prometheus/node_down.yml \
prom/prometheus
# Start Grafana
docker run -d -p 33000:3000 --name=grafana grafana/grafana
# Start the Alertmanager container
docker run -d -p 39093:9093 -v /root/docker/monitor/alertmanager.yml:/etc/alertmanager/alertmanager.yml --name alertmanager prom/alertmanager
# Start Node Exporter
docker run -d \
-p 39100:9100 \
-v "/:/host:ro,rslave" \
--name=node_exporter \
quay.io/prometheus/node-exporter \
--path.rootfs /host
# Start cAdvisor
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=38980:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest