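1: Understanding the basic components
node_exporter: collects basic node metrics such as CPU, memory, and disk usage.
prometheus: scrapes and stores the metrics exposed by node_exporter, redis_exporter, and other exporters.
Grafana: queries data sources such as prometheus, MySQL, and Elasticsearch and renders the collected data as graphs.
alertmanager: the alert manager. It can be integrated with DingTalk to deliver alerts to a DingTalk robot.
Workflow:
prometheus scrapes the metrics and evaluates the alerting rules defined in its configuration. When a rule fires, prometheus sends the alert to alertmanager, which processes it further (for example grouping, deduplication, and routing) and forwards the alert message to the DingTalk robot.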
2: Alerting platform architecture diagram
3: Deploying prometheus
3.1: Create the user
groupadd prometheus
useradd -M -s /sbin/nologin -g prometheus prometheus
3.2: Download the release package
wget https://github.com/prometheus/prometheus/releases/download/v2.22.2/prometheus-2.22.2.linux-amd64.tar.gz
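Before extracting the tarball it can be worth checking its integrity. The sketch below is a hypothetical helper (the name verify_sha256 is not part of any tool) that compares a file's SHA-256 digest with an expected value. It assumes sha256sum from coreutils is installed, and the commented usage assumes a sha256sums.txt file has been downloaded from the same release page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Returns 0 on match, 1 on mismatch or if the digest cannot be computed.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum "$file" | awk '{print $1}') || return 1
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Example usage (assumes sha256sums.txt sits next to the tarball):
# expected=$(grep 'prometheus-2.22.2.linux-amd64.tar.gz' sha256sums.txt | awk '{print $1}')
# verify_sha256 prometheus-2.22.2.linux-amd64.tar.gz "$expected"
```

If the digests differ, download the archive again rather than extracting it.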
3.3: Extract the archive
tar zxf prometheus-2.22.2.linux-amd64.tar.gz
mv prometheus-2.22.2.linux-amd64 /usr/local/prometheus
cd /usr/local/prometheus
mkdir rules data target
3.4: Configure prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['172.24.65.155:9090']
  # File-based service discovery for the monitored hosts
  - job_name: 'hosts-status'
    file_sd_configs:
      - files:
          - "/usr/local/prometheus/target/host_status.json"
        refresh_interval: 6s
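YAML indentation mistakes are easy to make in this file, so it may help to validate it before starting the server. The sketch below wraps the promtool binary that ships in the release tarball. The function name check_prometheus_config and the PROM_HOME and PROMTOOL variables are assumptions made for illustration; the default paths follow the install location used in this guide.

```shell
#!/usr/bin/env bash
# Install prefix used in this guide; override PROM_HOME if yours differs.
PROM_HOME="${PROM_HOME:-/usr/local/prometheus}"

# Hypothetical wrapper: validate a config file with `promtool check config`.
# Returns 2 if promtool is not found, otherwise promtool's own exit status.
check_prometheus_config() {
    local config="${1:-$PROM_HOME/prometheus.yml}"
    local promtool="${PROMTOOL:-$PROM_HOME/promtool}"
    if [ ! -x "$promtool" ]; then
        echo "promtool not found at $promtool" >&2
        return 2
    fi
    "$promtool" check config "$config"
}

# Example usage:
# check_prometheus_config /usr/local/prometheus/prometheus.yml
```

A non-zero exit status means the file should be fixed before prometheus is started or reloaded.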
[root@localhost prometheus]# cat target/host_status.json
[
  {
    "targets": ["172.24.65.107:9100"],
    "labels": {
      "job": "hosts-status",
      "service": "master107"
    }
  },
  {
    "targets": ["172.24.65.108:9100"],
    "labels": {
      "job": "hosts-status",
      "service": "master108"
    }
  }
]
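When the host list grows, editing this JSON by hand becomes error-prone. The following is a minimal sketch, not part of prometheus itself, that renders target groups in the same shape as host_status.json from "address service" pairs read on standard input. The function names target_group and render_targets are hypothetical, and the job label is hard-coded to match the file above.

```shell
#!/usr/bin/env bash
# Print one target group for an address ($1) and a service label ($2).
target_group() {
    printf '  {\n    "targets": ["%s"],\n    "labels": {\n      "job": "hosts-status",\n      "service": "%s"\n    }\n  }' "$1" "$2"
}

# Read "address service" pairs from stdin, one per line,
# and print a JSON array of target groups.
render_targets() {
    local first=1 addr service
    printf '[\n'
    while read -r addr service; do
        [ -z "$addr" ] && continue          # skip blank lines
        [ "$first" -eq 1 ] || printf ',\n'  # comma between groups
        target_group "$addr" "$service"
        first=0
    done
    printf '\n]\n'
}

# Example usage: write to a temporary file, then rename it so that
# prometheus never reads a half-written file.
# printf '%s\n' '172.24.65.107:9100 master107' '172.24.65.108:9100 master108' \
#   | render_targets > /usr/local/prometheus/target/host_status.json.tmp \
#   && mv /usr/local/prometheus/target/host_status.json.tmp /usr/local/prometheus/target/host_status.json
```

Because the job uses file_sd_configs, prometheus picks up changes to the file without a restart.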
3.5: Run prometheus as a system service
[root@localhost prometheus]# cat /usr/lib/systemd/system/prometheus.service
[Unit]
Description=prometheus-server
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.enable-lifecycle --storage.tsdb.path=/usr/local/prometheus/data --storage.tsdb.retention.time=7d --web.max-connections=512 --web.read-timeout=3m --query.max-concurrency=