Prometheus Recording Rules and Alerting Rules

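Prometheus supports two types of rules: recording rules and alerting rules. To include rules in Prometheus, create a file containing the necessary rule statements and have Prometheus load it through the rule_files field in the Prometheus configuration.
Rule files can be reloaded at runtime by sending SIGHUP to the Prometheus process. The changes are applied only if all rule files are well-formed.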
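
As a minimal sketch of the rule_files field mentioned above, the fragment below shows how a rule file might be referenced from prometheus.yml. The file name prometheus.rules.yml matches the one used later in this post; the 15s interval and the rules/ directory are assumptions for illustration only.

```yaml
# prometheus.yml (fragment)
global:
  # Default interval at which rules are evaluated (assumed value).
  evaluation_interval: 15s

rule_files:
  # Relative paths are resolved against the directory of prometheus.yml.
  - "prometheus.rules.yml"
  # Globs are accepted; this directory is hypothetical.
  - "rules/*.yml"
```

After editing the rule files, sending SIGHUP to the running process picks up the change without a restart.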

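Syntax-checking rules
To quickly check whether a rule file is syntactically correct without starting a Prometheus server, install and run Prometheus's promtool command-line tool (it is also bundled in the release tarball):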

go get github.com/prometheus/prometheus/cmd/promtool

Usage example

[root@fabric-cli prometheus-2.2.1.linux-amd64]# ls -l
total 108104
drwxrwxr-x 2 1000 1000       38 Mar 14 22:14 console_libraries
drwxrwxr-x 2 1000 1000      173 Mar 14 22:14 consoles
drwxr-xr-x 5 root root       85 May 12 00:05 data
-rw-rw-r-- 1 1000 1000    11357 Mar 14 22:14 LICENSE
-rw-rw-r-- 1 1000 1000     2769 Mar 14 22:14 NOTICE
-rwxr-xr-x 1 1000 1000 66176282 Mar 14 22:17 prometheus
-rw-r--r-- 1 root root      167 May  4 10:47 prometheus.rules.yml
-rw-rw-r-- 1 1000 1000      879 May  4 10:49 prometheus.yml
-rwxr-xr-x 1 1000 1000 44492910 Mar 14 22:18 promtool

[root@fabric-cli prometheus-2.2.1.linux-amd64]# ./promtool check rules prometheus.rules.yml 
Checking prometheus.rules.yml
  SUCCESS: 1 rules found

Rule syntax:

groups:
  [ - <rule_group> ]

Syntax of <rule_group>:
# The name of the group. Must be unique within a file.
name: <string>

# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]

rules:
  [ - <rule> ... ]

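Syntax of <rule>:
# The name of the time series to output to. Must be a valid metric name.
record: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is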
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>

# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]

Example

groups:
  - name: example
    rules:
    - record: job:http_inprogress_requests:sum
      expr: sum(http_inprogress_requests) by (job)
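
The optional fields from the syntax above (interval and labels) can be combined with the same recording rule. The following is only a sketch: the group name, the 30s interval, and the team label are hypothetical values chosen for illustration.

```yaml
groups:
  - name: example_with_options    # hypothetical group name
    # Overrides global.evaluation_interval for this group only (assumed value).
    interval: 30s
    rules:
      - record: job:http_inprogress_requests:sum
        expr: sum by (job) (http_inprogress_requests)
        labels:
          # Hypothetical label attached to every series the rule produces.
          team: platform
```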

The alerting rule syntax is as follows:

# The name of the alert. Must be a valid metric name.
alert: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and all resultant time series become
# pending/firing alerts.
expr: <string>

# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
[ for: <duration> | default = 0s ]

# Labels to add or overwrite for each alert.
labels:
  [ <labelname>: <tmpl_string> ]

# Annotations to add to each alert.
annotations:
  [ <labelname>: <tmpl_string> ]

Alerting rule example

groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency

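for: 10m means the expression must stay true for 10 consecutive minutes before the alert fires; until then the alert is pending. In this example, the mean request latency of myjob must remain above 0.5 seconds for 10 minutes.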
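
Because labels and annotations are template strings (<tmpl_string>), they can reference the labels and the value of the series that triggered the alert. The sketch below reuses the example rule; the alert name HighRequestLatency is chosen here to match the expression, and the annotation wording is an assumption.

```yaml
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        # The alert stays pending until the expression has held for 10 minutes.
        for: 10m
        labels:
          severity: page
        annotations:
          # $labels holds the labels of the alerting series, $value its sample value.
          summary: "High request latency for job {{ $labels.job }}"
          description: "Mean latency over 5m is {{ $value }}s (threshold 0.5s)."
```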

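### How to set up alerting rules for monitoring a Redis cluster in Prometheus

#### Create the alerting rule file

To configure effective alerting rules for a Redis cluster, create a `rules.yml` file that defines them. Place this file in the Prometheus configuration directory so that it can be loaded.

```yaml
groups:
  - name: redis.rules
    rules:
      - alert: RedisDown
        expr: up{job="redis"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Redis instance is down (instance {{ $labels.instance }})"
          description: "Redis has been down for more than 5 minutes."

      - alert: RedisMemoryUsageHigh
        expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on Redis (instance {{ $labels.instance }})"
          description: "More than 80% of allocated memory used by Redis."

      - alert: RedisSlowCommands
        expr: rate(redis_commands_duration_seconds_bucket{le="+Inf", job="redis"}[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Too many slow commands executed"
          description: "The number of very slow commands per second exceeds the threshold over last 5 mins."
```

The YAML snippet above shows several typical Redis monitoring alert rules[^1]:

- **RedisDown**: fires when a Redis instance is unreachable;
- **RedisMemoryUsageHigh**: issues a warning when memory utilization exceeds the configured threshold;
- **RedisSlowCommands**: fires when too many slow commands are being executed.

#### Update the main Prometheus configuration file

Next, add the path of the newly created alerting rules to the main Prometheus configuration file (`prometheus.yml`) so that they are read and applied correctly.

```yaml
rule_files:
  - "path/to/your/rules/*.yml"
```

This step lets Prometheus recognize the custom alerting logic and evaluate the incoming metrics against the defined conditions, so that potential problems are reported promptly[^2].

#### Test and verify

After completing the steps above, restart the Prometheus service (or reload it with SIGHUP) so that the changes take effect. You can then use the PromQL query interface to check whether the rules behave as expected, for example by manually simulating a failure and observing whether the corresponding alert fires.

In this way, Prometheus can be used to monitor the state of a Redis cluster comprehensively and in detail, helping to keep the system stable and performant[^3].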
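
Besides simulating failures by hand, the RedisDown rule could also be checked offline with promtool's rule unit tests (`promtool test rules`, available in Prometheus 2.5 and later, so newer than the 2.2.1 release used earlier in this post). The following is a sketch under stated assumptions: the test file name, the instance address `redis-1:9121`, and the sample values are all hypothetical.

```yaml
# redis_rules_test.yml (hypothetical file name)
# Run with: promtool test rules redis_rules_test.yml
rule_files:
  - rules.yml            # the rule file defined above, relative to this test file

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # up stays at 0 for samples at 0m through 10m (hypothetical instance).
      - series: 'up{job="redis", instance="redis-1:9121"}'
        values: '0+0x10'
    alert_rule_test:
      # At 6m the 5m "for" window has elapsed, so the alert should be firing.
      - eval_time: 6m
        alertname: RedisDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: redis
              instance: redis-1:9121
            exp_annotations:
              summary: "Redis instance is down (instance redis-1:9121)"
              description: "Redis has been down for more than 5 minutes."
```

A test like this exercises the `for` duration and the annotation templates without touching a live Redis instance.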