Install Prometheus
- https://prometheus.io/download/
Download and extract
# Create the prometheus user
# -M: do not create a home directory
# -s: use the given shell; nologin disables interactive login
useradd -M -s /usr/sbin/nologin prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.2/prometheus-2.45.2.linux-amd64.tar.gz
mkdir -p /opt/prometheus/prometheus
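# The tarball wraps everything in a versioned top-level directory;
# --strip-components=1 drops it so the binary lands at /opt/prometheus/prometheus/prometheus
tar xvf prometheus-2.45.2.linux-amd64.tar.gz -C /opt/prometheus/prometheus --strip-components=1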
chown -R prometheus:prometheus /opt/prometheus/prometheus
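The download step can optionally be followed by a checksum check before the tarball is unpacked. The sketch below is only an illustration: `verify_sha256` is a hypothetical helper name, and the expected hash is a placeholder that has to be copied from the `sha256sums.txt` file published with the release.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 when the SHA-256 of FILE equals EXPECTED_HEX, non-zero otherwise.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<hash>  <file>"; keep only the hash field
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage (the hash is a placeholder, not a real value):
# verify_sha256 prometheus-2.45.2.linux-amd64.tar.gz "<sha256-from-sha256sums.txt>" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

The same helper works unchanged for the alertmanager and node_exporter tarballs.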
Manage Prometheus with systemd
cat > /etc/systemd/system/prometheus.service << "EOF"
[Unit]
Description=PrometheusServer
After=network-online.target
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=/opt/prometheus/prometheus/prometheus \
--config.file=/opt/prometheus/prometheus/prometheus.yml \
--storage.tsdb.path=/opt/prometheus/prometheus/data \
--storage.tsdb.retention.time=60d \
--web.enable-lifecycle
[Install]
WantedBy=multi-user.target
EOF
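# Reload systemd so it picks up the new unit, then start the service
systemctl daemon-reload
systemctl start prometheus
# Check the status
systemctl status prometheus
# Start on boot
systemctl enable prometheus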
http://localhost:9090
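Because the unit passes `--web.enable-lifecycle`, Prometheus can be health-checked and reloaded over HTTP without restarting the service. A minimal sketch, assuming the default address above; `PROM_URL`, `prom_healthy` and `prom_reload` are hypothetical names chosen for this example.

```shell
# Base URL is an assumption taken from the address above; override if needed.
PROM_URL=${PROM_URL:-http://localhost:9090}

# Succeeds when Prometheus answers its health endpoint with a 2xx status.
prom_healthy() {
    curl -fsS "$PROM_URL/-/healthy" > /dev/null
}

# Asks Prometheus to re-read prometheus.yml and its rule files.
# Only accepted because the unit passes --web.enable-lifecycle.
prom_reload() {
    curl -fsS -X POST "$PROM_URL/-/reload"
}

# Usage:
# prom_healthy && echo "prometheus is up"
# prom_reload
```

A reload is usually enough after editing prometheus.yml; a full restart is only needed when the command-line flags in the unit file change.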
Install Alertmanager
- https://prometheus.io/download/
Download and extract
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
mkdir -p /opt/prometheus/alertmanager
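# --strip-components=1 drops the versioned top-level directory from the tarball
tar xvf alertmanager-0.26.0.linux-amd64.tar.gz -C /opt/prometheus/alertmanager --strip-components=1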
chown -R prometheus:prometheus /opt/prometheus/alertmanager
Manage Alertmanager with systemd
cat > /etc/systemd/system/alertmanager.service << "EOF"
[Unit]
Description=AlertManager
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/alertmanager/alertmanager \
--config.file=/opt/prometheus/alertmanager/alertmanager.yml \
--storage.path=/opt/prometheus/alertmanager/data
Restart=always
[Install]
WantedBy=multi-user.target
EOF
In /opt/prometheus/prometheus/prometheus.yml,
add the alerting configuration:
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
# One rule file is added here
rule_files:
  - "alert.yml"
Alert rule configuration
# The heredoc delimiter is quoted so the shell does not expand $labels
cat > /opt/prometheus/prometheus/alert.yml << "EOF"
groups:
  - name: Prometheus Alert
    rules:
      - alert: AAAAAAAAAAAAAAlert-Test!!!
        expr: up == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          instance: "{{ $labels.instance }}"
          description: "{{ $labels.job }} Service down"
EOF
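The rule file and the edited prometheus.yml can be validated with `promtool`, which ships in the Prometheus tarball, before the service is restarted. This is a sketch under the directory layout used in these notes; `PROMTOOL`, `PROM_CONF_DIR` and `check_prometheus_files` are hypothetical names.

```shell
# Paths follow the layout used above; adjust if Prometheus lives elsewhere.
PROMTOOL=${PROMTOOL:-/opt/prometheus/prometheus/promtool}
PROM_CONF_DIR=${PROM_CONF_DIR:-/opt/prometheus/prometheus}

# check_prometheus_files
# Validates the rule file first, then the main config (which also loads the
# rule files it references). Stops at the first failure.
check_prometheus_files() {
    "$PROMTOOL" check rules "$PROM_CONF_DIR/alert.yml" &&
        "$PROMTOOL" check config "$PROM_CONF_DIR/prometheus.yml"
}

# Usage:
# check_prometheus_files && echo "config looks valid"
```

YAML indentation mistakes in either file tend to show up here as parse errors rather than as a service that fails to start.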
Restart prometheus and alertmanager, then check the result at http://localhost:9093
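The restart and a first look at the result can also be done from the shell. A rough sketch, to be run as root: `restart_monitoring`, `list_alertmanager_alerts` and `AM_URL` are hypothetical names, and the URL is assumed from the address above.

```shell
# Pick up new or changed unit files, then restart both services.
# `systemctl restart` also starts a unit that is not running yet.
restart_monitoring() {
    systemctl daemon-reload &&
        systemctl restart prometheus alertmanager
}

# Print the alerts Alertmanager currently holds, as JSON (v2 API).
list_alertmanager_alerts() {
    curl -fsS "${AM_URL:-http://localhost:9093}/api/v2/alerts"
}

# Usage:
# restart_monitoring
# list_alertmanager_alerts
```

Since the rule is `up == 0` with `for: 30s`, the list stays empty until some scrape target has been down for at least 30 seconds.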
Install Grafana
- https://grafana.com/grafana/download
- https://grafana.com/grafana/dashboards/1860-node-exporter-full/
Download and extract
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.3.1.linux-amd64.tar.gz
mkdir -p /opt/prometheus/grafana
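# --strip-components=1 drops the versioned top-level directory so bin/ and conf/ sit directly under /opt/prometheus/grafana
tar xvf grafana-enterprise-10.3.1.linux-amd64.tar.gz -C /opt/prometheus/grafana --strip-components=1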
chown -R prometheus:prometheus /opt/prometheus/grafana
Manage grafana-server with systemd
cat > /etc/systemd/system/grafana-server.service << "EOF"
[Unit]
Description=GrafanaServer
[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=/opt/prometheus/grafana/bin/grafana-server \
--config=/opt/prometheus/grafana/conf/defaults.ini \
--homepath=/opt/prometheus/grafana
[Install]
WantedBy=multi-user.target
EOF
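To reset a forgotten Grafana admin password:
/opt/prometheus/grafana/bin/grafana-cli --homepath /opt/prometheus/grafana admin reset-admin-password <new-password>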
http://localhost:3000
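Grafana can take a few seconds to come up after the service starts, so a small polling loop is handy before opening the UI. A minimal sketch: `wait_for_http` is a hypothetical helper, and the URL in the usage lines assumes the default port shown above together with Grafana's `/api/health` endpoint.

```shell
# wait_for_http URL [TRIES]
# Polls URL once per second until curl succeeds or TRIES attempts are used up.
wait_for_http() {
    url=$1
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -fsS -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage:
# systemctl daemon-reload
# systemctl enable --now grafana-server
# wait_for_http http://localhost:3000/api/health 30 && echo "grafana is up"
```

The helper is not Grafana-specific; pointing it at http://localhost:9090/-/healthy works the same way for Prometheus.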
Install node_exporter
- https://prometheus.io/download/
Download and extract
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
mkdir -p /opt/prometheus/node_exporter
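# --strip-components=1 drops the versioned top-level directory from the tarball
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz -C /opt/prometheus/node_exporter --strip-components=1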
chown -R prometheus:prometheus /opt/prometheus/node_exporter
Create the systemd service file
cat > /etc/systemd/system/node_exporter.service << "EOF"
[Unit]
Description=NodeExporter
After=network.target
[Service]
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
Update the Prometheus configuration
# Scrape configuration with two jobs:
# node_exporter on this host, and Prometheus itself.
scrape_configs:
  - job_name: "node_exporter"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9100"]
        labels:
          instance: ahoj-dev-ubuntu-virtualbox
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
http://localhost:9100/
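Once node_exporter is running and Prometheus has reloaded its config, both ends of the scrape can be checked from the shell. This is an illustrative sketch only: `count_node_metrics`, `query_up` and `PROM_URL` are hypothetical names, and the URLs assume the default ports used in these notes.

```shell
# count_node_metrics [URL]
# Prints how many sample lines starting with "node_" the exporter exposes.
count_node_metrics() {
    curl -fsS "${1:-http://localhost:9100/metrics}" | grep -c '^node_'
}

# query_up
# Runs the PromQL query `up` through the Prometheus HTTP API and prints the JSON reply.
query_up() {
    curl -fsS --get --data-urlencode 'query=up' \
        "${PROM_URL:-http://localhost:9090}/api/v1/query"
}

# Usage:
# systemctl daemon-reload
# systemctl enable --now node_exporter
# count_node_metrics
# query_up
```

In the `query_up` reply, a value of "1" for the node_exporter job means the last scrape succeeded; "0" is the condition the `up == 0` alert rule fires on.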