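Prerequisite: the monitoring and alerting platform (Prometheus, Alertmanager, blackbox_exporter) is already set up.
Configure Prometheus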
prometheus.yml
# my global config
global:
  scrape_interval: 30s # Scrape targets every 30 seconds. The default is every 1 minute.
  evaluation_interval: 30s # Evaluate rules every 30 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yml"
  # - "second_rules.yml"

# Scrape configurations. The first job scrapes Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: 'application'
    file_sd_configs:
      - files:
          - '/data/prometheus/monitor/base/*.yml'
          - '/data/prometheus/monitor/biz/*.yml'
  - job_name: 'jvm'
    metrics_path: '/actuator/prometheus'
    file_sd_configs:
      - files:
          - '/data/prometheus/monitor/jvm/*.yml'
        refresh_interval: 15s
  # scrape SQL metrics
  - job_name: 'sql'
    scrape_interval: 1m
    scrape_timeout: 50s
    file_sd_configs:
      - files:
          - '/data/prometheus/monitor/sql/*.yml'
        refresh_interval: 15s
  # blackbox probing
  - job_name: 'blackbox_api'
    scrape_interval: 1m
    metrics_path: '/probe'
    params:
      module: [http_2xx]
    file_sd_configs:
      - files:
          - '/data/prometheus/monitor/api-check/*.yml'
        refresh_interval: 15s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
  - job_name: 'blackbox_api_post'
    scrape_interval: 1m
    metrics_path: '/probe'
    params:
      module: [http_post_2xx]
    file_sd_configs:
      - files:
          - '/data/prometheus/monitor/api-check-post/*.yml'
        refresh_interval: 15s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
  - job_name: 'eureka-prometheus-xinlimei'
    metrics_path: '/actuator/prometheus'
    eureka_sd_configs:
      - server: "http://172.19.37.215:8761/eureka"
      - server: 'http://172.19.37.216:8761/eureka'
      - server: 'http://172.19.37.217:8761/eureka'
    relabel_configs:
      - source_labels: [__meta_eureka_app_name]
        target_label: application
      - action: drop
        regex: "eureka.*"
        source_labels: [instance]
      - action: replace
        source_labels:
          - instance
        target_label: __metrics_path__
        replacement: /prometheus/metrics
        regex: 172.19.38.118:9333|172.19.38.119:9333
      - action: replace
        source_labels:
          - instance
        target_label: application
        replacement: sunmei-mall-product
        regex: 172.19.38.134:8090|172.19.38.135:8090
  - job_name: 'nacos-services'
    scrape_timeout: 15s
    metrics_path: '/actuator/prometheus'
    file_sd_configs:
      - files:
          - '/data/prometheus/monitor/nacos/*.json'
        refresh_interval: 15s
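The three relabel rules of the blackbox_api job are what turn a domain listed in the target file into a probe request against blackbox_exporter. The Python sketch below models only the "replace" (default) and "drop" relabel actions, with Prometheus's defaults (regex "(.*)" anchored at both ends, separator ";", replacement "$1"); `relabel` and `probe_url` are hypothetical helpers for illustration, not Prometheus code. It shows that `instance` ends up holding the checked domain while `__address__` points at the exporter.

```python
import re
from urllib.parse import urlencode


def _expand(match, template):
    # Substitute "$1"-style capture-group references in a replacement string.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))) or "", template)


def relabel(labels, rules):
    """Apply relabel rules in order; return the new labels, or None if dropped.

    Simplified model: only the "replace" and "drop" actions are handled.
    """
    labels = dict(labels)
    for rule in rules:
        action = rule.get("action", "replace")
        separator = rule.get("separator", ";")
        value = separator.join(labels.get(n, "") for n in rule.get("source_labels", []))
        # Prometheus anchors the regex at both ends; the default regex is "(.*)".
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if action == "drop":
            if match:
                return None
        elif action == "replace" and match:
            result = _expand(match, rule.get("replacement", "$1"))
            if result:
                labels[rule["target_label"]] = result
            else:
                labels.pop(rule["target_label"], None)
    return labels


def probe_url(labels):
    """Build the scrape URL: scheme://address + path + ?(__param_* labels)."""
    params = sorted((k[len("__param_"):], v) for k, v in labels.items()
                    if k.startswith("__param_"))
    return (f"{labels['__scheme__']}://{labels['__address__']}"
            f"{labels['__metrics_path__']}?{urlencode(params)}")


# relabel_configs of the blackbox_api job, transcribed from prometheus.yml.
BLACKBOX_RULES = [
    {"source_labels": ["__address__"], "target_label": "__param_target"},
    {"source_labels": ["__param_target"], "target_label": "instance"},
    {"target_label": "__address__", "replacement": "127.0.0.1:9115"},
]

# Labels before relabeling: __address__ comes from the target file,
# __metrics_path__ and __param_module from the job config, scheme defaults to http.
target = {
    "__address__": "https://domain1.com.cn",
    "__scheme__": "http",
    "__metrics_path__": "/probe",
    "__param_module": "http_2xx",
    "job": "ssl_check",
    "type": "ssl",
}

result = relabel(target, BLACKBOX_RULES)
print(result["instance"])  # https://domain1.com.cn
print(probe_url(result))
# http://127.0.0.1:9115/probe?module=http_2xx&target=https%3A%2F%2Fdomain1.com.cn
```

The same `relabel` function also covers the eureka job's `drop` rule, for example any instance matching `eureka.*` returns None.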
Configure the domains for blackbox to probe
/data/prometheus/monitor/api-check/ssl_check.yml
The file must be in the directory configured in file_sd_configs for job_name: 'blackbox_api'.
- labels:
    job: 'ssl_check'
    type: ssl
  targets:
    - https://domain1.com.cn
    - https://domain2.com.cn
    - ...
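If the domain list is maintained somewhere else, the target file can be generated rather than edited by hand. Prometheus watches file_sd files and also re-reads them every refresh_interval, so target changes take effect without a reload. The sketch below is a hypothetical Python helper, `render_target_group`, that emits one target group in the same layout as ssl_check.yml. It rejects non-https targets, on the assumption that only TLS probes are wanted here, since blackbox_exporter reports probe_ssl_earliest_cert_expiry only for probes that negotiated TLS.

```python
def render_target_group(targets, labels):
    """Render one file_sd target group in the layout of ssl_check.yml.

    Hypothetical helper: only https:// targets are accepted, because the
    certificate-expiry metric exists only for TLS probes.
    """
    bad = [t for t in targets if not t.startswith("https://")]
    if bad:
        raise ValueError(f"not an https target: {bad}")

    def quote(s):
        # YAML single-quoted scalar: a literal ' is written as ''.
        return "'" + s.replace("'", "''") + "'"

    lines = ["- labels:"]
    lines += [f"    {key}: {quote(value)}" for key, value in labels.items()]
    lines.append("  targets:")
    lines += [f"    - {quote(t)}" for t in targets]
    return "\n".join(lines) + "\n"


text = render_target_group(
    ["https://domain1.com.cn", "https://domain2.com.cn"],
    {"job": "ssl_check", "type": "ssl"},
)
print(text, end="")
```

Writing `text` to /data/prometheus/monitor/api-check/ssl_check.yml yields the same target group as the hand-written file (with every value quoted).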
Configure alerting rules
rules/ssl_check_rules.yml
groups:
  - name: ssl_expiry
    rules:
      - alert: SSLCertExpiresWithin30Days
        expr: round((probe_ssl_earliest_cert_expiry - time()) / 86400, 0.01) < 30
        for: 5m
        labels:
          status: critical
        annotations:
          summary: "SSL certificate is about to expire, please renew or replace it promptly! (instance {{ $labels.instance }})"
          description: "SSL certificate expires within 30 days\n VALUE = {{ $value }} days\n LABELS: {{ $labels }}"
      - alert: SSLCertExpiresWithin7Days
        expr: round((probe_ssl_earliest_cert_expiry - time()) / 86400, 0.01) < 7
        for: 5m
        labels:
          status: critical
        annotations:
          summary: "SSL certificate is about to expire, please renew or replace it promptly! (instance {{ $labels.instance }})"
          description: "SSL certificate expires within 7 days\n VALUE = {{ $value }} days\n LABELS: {{ $labels }}"
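The rule expressions convert the seconds until the certificate's notAfter into days and compare against a threshold. One subtlety: PromQL's round(v, to_nearest) rounds to the nearest multiple of to_nearest (ties round up), not to a number of decimal places, so round(x, 0.01) keeps two decimals while round(x, 2) snaps to the nearest even number. The Python sketch below mirrors that rounding and the threshold comparison; `promql_round`, `days_left` and `fired_thresholds` are hypothetical helpers, the `for: 5m` pending period is not modeled, and Python's ssl.cert_time_to_seconds stands in for the exporter's reading of the certificate.

```python
import math
import ssl


def promql_round(value, to_nearest=1.0):
    """Mirror PromQL round(v, to_nearest): nearest multiple of to_nearest, ties up."""
    inverse = 1.0 / to_nearest
    return math.floor(value * inverse + 0.5) / inverse


def days_left(not_after, now):
    """Days from `now` (Unix seconds) until a certificate's notAfter time."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fired_thresholds(days, thresholds=(30, 7)):
    """Return the day thresholds whose rule condition holds for this value."""
    rounded = promql_round(days, 0.01)
    return [limit for limit in thresholds if rounded < limit]


# The second argument is a multiple, not a number of decimals.
print(promql_round(29.5, 2))     # 30.0, so "< 30" would not fire
print(promql_round(29.5, 0.01))  # 29.5

not_after = "Jan  5 09:34:43 2018 GMT"  # example notAfter string
expiry = ssl.cert_time_to_seconds(not_after)
print(fired_thresholds(days_left(not_after, expiry - 10 * 86400)))  # [30]
print(fired_thresholds(days_left(not_after, expiry - 3 * 86400)))   # [30, 7]
```

The last line also shows that once fewer than 7 days remain, both rules are firing for the same instance.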
Restart or hot-reload the services for the changes to take effect:
prometheus,
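Prometheus re-reads prometheus.yml and its rule files on SIGHUP or on an HTTP POST to /-/reload, the latter only when Prometheus was started with --web.enable-lifecycle; blackbox_exporter accepts the same two reload methods. Running `promtool check config prometheus.yml` beforehand catches syntax errors in the config and in the rule files it references. The Python sketch below only builds the reload requests; `reload_request` is a hypothetical helper and the addresses are taken from prometheus.yml, so adjust them to your deployment.

```python
import urllib.request


def reload_request(base_url):
    """Build (but do not send) a POST to a server's /-/reload endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")


reload_reqs = [
    reload_request("http://localhost:9090"),  # Prometheus (needs --web.enable-lifecycle)
    reload_request("http://127.0.0.1:9115"),  # blackbox_exporter
]
for req in reload_reqs:
    print(req.get_method(), req.full_url)
    # To actually trigger the reload:
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     print(resp.status)
```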