[root@k8s-master1 promethes]# cat prometheus-rules.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: kube-system
data:
  general.rules: |
    groups:
    - name: general.rules
      rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: error
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
  node.rules: |
    groups:
    - name: node.rules
      rules:
      - alert: NodeFilesystemUsage
        expr: |
          node_filesystem_avail{fstype=~"ext.|xfs",job="kubernetes-service-endpoints"}
          / node_filesystem_size{fstype=~"ext.|xfs",job="kubernetes-service-endpoints"}
          * 100 <= 10
        for: 2m
        labels:
          severity: critical
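
The ConfigMap above only stores the rule files; Prometheus still has to be told where to find them. The following is a minimal sketch of how that wiring might look, not the configuration used in this article. The mount path `/etc/config/rules`, the volume name `prometheus-rules`, and the container name `prometheus-server` are assumptions chosen for illustration; adjust them to match your own deployment.

```yaml
# Hypothetical fragment of prometheus.yml.
# rule_files accepts globs; this pattern would match the two keys of the
# ConfigMap (general.rules and node.rules) once they are mounted as files.
rule_files:
  - /etc/config/rules/*.rules
---
# Hypothetical fragment of the Prometheus pod spec (Deployment or StatefulSet).
# Each key in the ConfigMap's data section becomes one file under mountPath.
spec:
  containers:
    - name: prometheus-server        # assumed container name
      volumeMounts:
        - name: prometheus-rules     # assumed volume name
          mountPath: /etc/config/rules
  volumes:
    - name: prometheus-rules
      configMap:
        name: prometheus-rules       # the ConfigMap defined above
```

With this layout, each key under `data:` is exposed as a separate file, so adding a new rule group only requires adding another `*.rules` key to the ConfigMap and reloading Prometheus.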
Prometheus alerting rules
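This article explains how to set up alerting rules in Prometheus, including defining alert conditions, configuring notification channels, and managing alert states. Through worked examples, it shows how to use PromQL to filter and combine monitoring metrics so that a warning is raised promptly when the system misbehaves.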