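For Prometheus deployment and alerting configuration, see the previous article: "Deploying a Prometheus monitoring platform on k8s with monitoring alerts"
1. Overview
This article uses helm to install exporters for databases and middleware, then configures alertmanager and alert rules to monitor the state of each component and send email alerts. The helm repository and chart packages used are listed below:
- helm repository: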
prometheus-community: https://prometheus-community.github.io/helm-charts
- chart packages:
If a download hangs, retry it a few times
prometheus-community/prometheus-mysql-exporter
prometheus-community/prometheus-redis-exporter
prometheus-community/prometheus-kafka-exporter
prometheus-community/prometheus-rabbitmq-exporter
2. Monitoring MySQL
2.1. Deploying MySQL (single-instance example)
kubectl create ns test
vim mysql-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  namespace: test
spec:
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - image: mysql
        name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: root@mysql
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: test
spec:
  type: ClusterIP
  ports:
  - port: 3306
    targetPort: 3306
  selector:
    app: mysql
kubectl apply -f mysql-deploy.yaml
- The resulting connection details are:
host: mysql.test
port: 3306
user: root
pass: root@mysql
2.2. Deploying mysql-exporter
2.2.1. Downloading and unpacking the mysql-exporter chart
cd ~/workspace/prometheus/
helm pull prometheus-community/prometheus-mysql-exporter
tar zxvf [xxx.tgz]
2.2.2. Editing values.yaml
cd ~/workspace/prometheus/prometheus-mysql-exporter
vim values.yaml
2.2.3. Setting the MySQL connection
Use the MySQL connection details from section 2.1
mysql:
  db: ""
  host: "mysql.test"
  param: ""
  pass: "root@mysql"
  port: 3306
  protocol: ""
  user: "root"
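2.2.4. Deploying mysql-exporter
helm install prometheus-mysql-exporter -n prometheus .
To monitor multiple instances, deploy one exporter per instance (give each helm release a distinct NAME)
- Check the Target in the prometheus-server UI
- View the metrics collected by mysql-exporter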
2.3. Configuring the Grafana dashboard
- Import the MySQL Overview dashboard: 7362
2.4. Alert rules
Alert rules can be modeled on the queries used by this dashboard. Examples:
2.4.1. MySQL status
mysql_up == 0
2.4.2. Too many open files
mysql_global_status_innodb_num_open_files / mysql_global_variables_open_files_limit > 0.75
2.4.3. Current connections exceed 75% of the maximum
max_over_time(mysql_global_status_threads_connected[5m]) / mysql_global_variables_max_connections > 0.75
2.4.4. Historical peak connections exceed 75% of the maximum
mysql_global_status_max_used_connections / mysql_global_variables_max_connections > 0.75
2.4.5. Too many slow queries
rate(mysql_global_status_slow_queries[5m]) > 3
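The MySQL expressions in section 2.4 still need to be wrapped in a Prometheus rule group before they can fire alerts. The fragment below is a minimal sketch of that wrapping. The group name, alert names, `for` durations, severity labels and summary texts are assumptions chosen for illustration; only the `expr` values come from this article. Where the file is loaded from depends on how Prometheus was deployed in the previous article.

```yaml
# Sketch of a Prometheus rule file for MySQL.
# Assumed for illustration: group name, alert names, `for`, labels, annotations.
groups:
  - name: mysql
    rules:
      - alert: MysqlDown
        expr: mysql_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MySQL instance {{ $labels.instance }} is down"
      - alert: MysqlTooManyConnections
        expr: max_over_time(mysql_global_status_threads_connected[5m]) / mysql_global_variables_max_connections > 0.75
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL connections above 75% of max_connections on {{ $labels.instance }}"
      - alert: MysqlSlowQueries
        expr: rate(mysql_global_status_slow_queries[5m]) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL slow query rate is high on {{ $labels.instance }}"
```

The `for` clause delays firing until the condition has held for that long, which helps avoid emails for brief spikes.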
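3. Monitoring Redis
3.1. Deploying Redis (single-instance example)
See the earlier article: "Installing single-node Redis on k8s with NFS persistence and data migration"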
vim redis-deploy.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis
  namespace: test
data:
  redis.conf: |+
    requirepass redis@passwd
    maxmemory 268435456
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: test
  labels:
    app: redis
spec:
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
      annotations:
        version/date: "20210814"
        version/author: "lc"
    spec:
      containers:
      - name: redis
        image: redis
        imagePullPolicy: Always
        command: ["redis-server","/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/redis.conf
          subPath: redis.conf
      volumes:
      - name: redis-config
        configMap:
          name: redis
          items:
          - key: redis.conf
            path: redis.conf
---
kind: Service
apiVersion: v1
metadata:
  name: redis
  namespace: test
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
kubectl apply -f redis-deploy.yaml
- The resulting connection details are:
redisAddress: redis.test:6379
redisPassword: redis@passwd
3.2. Deploying redis-exporter
3.2.1. Downloading and unpacking the redis-exporter chart
cd ~/workspace/prometheus/
helm pull prometheus-community/prometheus-redis-exporter
tar zxvf [xxx.tgz]
3.2.2. Editing values.yaml
cd ~/workspace/prometheus/prometheus-redis-exporter
vim values.yaml
3.2.3. Setting the Redis connection
- Use the Redis connection details from section 3.1, and enable password authentication (at the bottom of the file)
redisAddress: redis.test:6379
# ... (other settings omitted)
auth:
  # Use password authentication
  enabled: true
  # Use existing secret (ignores redisPassword)
  secret:
    name: ""
    key: ""
  # Redis password (when not stored in a secret)
  redisPassword: "redis@passwd"
3.2.4. Setting the exporter's Target
This part differs considerably from the original template, so check the differences carefully
annotations:
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9121"
  prometheus.io/scrape: "true"
labels: {}
3.2.5. Deploying redis-exporter
helm install prometheus-redis-exporter -n prometheus .
- Check the Target in the prometheus-server UI
- View the metrics collected by redis-exporter
3.3. Configuring the Grafana dashboard
- Import the Redis Dashboard dashboard: 11835
3.4. Alert rules
Alert rules can be modeled on the queries used by this dashboard. Examples:
3.4.1. Redis status
redis_up == 0
3.4.2. Low memory
redis_memory_used_bytes / redis_memory_max_bytes * 100 > 80
3.4.3. Too many connections
redis_connected_clients > 100
3.4.4. Too few connections
redis_connected_clients < 5
3.4.5. Rejected connections
increase(redis_rejected_connections_total[1m]) > 0
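The Redis expressions in section 3.4 can be grouped into a rule file in the same way. This is a sketch only: the group name, alert names, `for` durations, severity labels and summary texts are assumptions, and the `expr` values are taken from the section above. The memory rule relies on `maxmemory` being set, as it is in the ConfigMap from section 3.1.

```yaml
# Sketch of a Prometheus rule file for Redis.
# Assumed for illustration: group name, alert names, `for`, labels, annotations.
groups:
  - name: redis
    rules:
      - alert: RedisDown
        expr: redis_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis instance {{ $labels.instance }} is down"
      - alert: RedisMemoryHigh
        expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis memory usage above 80% of maxmemory on {{ $labels.instance }}"
      - alert: RedisRejectedConnections
        expr: increase(redis_rejected_connections_total[1m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Redis rejected connections on {{ $labels.instance }}"
```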
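4. Monitoring Kafka
4.1. Deploying the Kafka cluster
For deployment, see the earlier article: "Deploying a Kafka cluster on k8s with a high-availability configuration"
- It is again deployed to the test namespace; the resulting connection details are: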
kafkaServer: kafka.test:9092
4.2. Deploying kafka-exporter
4.2.1. Downloading and unpacking the kafka-exporter chart
cd ~/workspace/prometheus/
helm pull prometheus-community/prometheus-kafka-exporter
tar zxvf [xxx.tgz]
4.2.2. Editing values.yaml
cd ~/workspace/prometheus/prometheus-kafka-exporter
vim values.yaml
4.2.3. Setting the Kafka connection
kafkaServer:
  - kafka.test:9092
4.2.4. Setting the exporter's Target
- Add the following under annotations:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9308"
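4.2.5. Deploying kafka-exporter
helm install prometheus-kafka-exporter -n prometheus .
- Check the Target in the prometheus-server UI
- View the metrics collected by kafka-exporter
4.2.6. Collecting more data
- At this point the exporter collects only a few metrics, so the dashboard configured later would show no data
- Specifying --consumer-property activates the consumer configuration, so that the exporter collects more data
# Enter the pod
kubectl exec -it -n test kafka-0 -- bash
# Create a topic
kafka-topics.sh --zookeeper zookeeper:2181 --topic test001 --create --partitions 3 --replication-factor 2
# Produce to the topic (once the prompt appears, type a few lines of arbitrary data)
kafka-console-producer.sh --broker-list kafka:9092 --topic test001
# Consume from the topic (with --consumer-property specified)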
kafka-console-consumer.sh --bootstrap-server kafka:9092 --from-beginning --topic test001 --consumer-property group.id=test
4.3. Configuring the Grafana dashboard
- Import the Kafka Exporter Overview dashboard: 7589
4.4. Alert rules
Alert rules can be modeled on the queries used by this dashboard. Examples:
4.4.1. Kafka broker status
kafka_brokers < 3
4.4.2. Kafka messages produced
sum(round(delta(kafka_topic_partition_current_offset[5m])/5)) by (topic) > 100
4.4.3. Kafka messages consumed
sum(round(delta(kafka_consumergroup_current_offset[5m])/5)) by (topic) > 100
4.4.4. Consumer lag
sum(kafka_consumergroup_lag) by (consumergroup, topic)
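The Kafka expressions in section 4.4 could be turned into alerts roughly as follows. Note that the consumer-lag expression above has no comparison, so used directly as an alert it would fire for every consumer group that reports lag at all. The threshold of 1000 below is a hypothetical value to be tuned per workload. The group name, alert names, `for` durations, labels and summary texts are likewise assumptions, and `kafka_brokers < 3` presumes a three-broker cluster.

```yaml
# Sketch of a Prometheus rule file for Kafka.
# Assumed for illustration: group name, alert names, `for`, labels, annotations,
# and the lag threshold of 1000 (not from the article).
groups:
  - name: kafka
    rules:
      - alert: KafkaBrokerMissing
        expr: kafka_brokers < 3
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Fewer than 3 Kafka brokers are available"
      - alert: KafkaConsumerLagHigh
        expr: sum(kafka_consumergroup_lag) by (consumergroup, topic) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consumer group {{ $labels.consumergroup }} is lagging on topic {{ $labels.topic }}"
```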
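5. Monitoring RabbitMQ
5.1. Deploying the RabbitMQ cluster
For deployment, see the earlier article: "Deploying a RabbitMQ cluster on K8S with mirrored queues for high availability"
- It is again deployed to the test namespace; the resulting connection details (rabbitmq-management) are:
rabbitmq:
  url: http://rabbitmq.test:15672
  user: admin
  password: admin@mq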
5.2. Deploying rabbitmq-exporter
5.2.1. Downloading and unpacking the rabbitmq-exporter chart
cd ~/workspace/prometheus/
helm pull prometheus-community/prometheus-rabbitmq-exporter
tar zxvf [xxx.tgz]
5.2.2. Editing values.yaml
cd ~/workspace/prometheus/prometheus-rabbitmq-exporter
vim values.yaml
5.2.3. Setting the RabbitMQ connection
rabbitmq:
  url: http://rabbitmq.test:15672
  user: admin
  password: admin@mq
5.2.4. Setting the exporter's Target
- Uncomment the entries under annotations (the port must be quoted):
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9419"
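5.2.5. Deploying rabbitmq-exporter
helm install prometheus-rabbitmq-exporter -n prometheus .
- Check the Target in the prometheus-server UI
- View the metrics collected by rabbitmq-exporter
5.3. Configuring the Grafana dashboard
- Import the RabbitMQ Monitoring dashboard: 4279
- Import the RabbitMQ Metrics dashboard: 4371
5.4. Alert rules
Alert rules can be modeled on the queries used by these dashboards. Examples:
5.4.1. Node status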
sum by (node) (rabbitmq_running) == 0
5.4.2. Memory
Alert once usage exceeds 500 MB; choose the threshold based on historical usage
sum by (node) (round(rabbitmq_node_mem_used /1024 /1024 )) > 500
5.4.3. File descriptors
Choose the threshold based on historical usage
sum by (node) (rabbitmq_fd_used) > 100
5.4.4. Network
rabbitmq_sockets_used == 0
5.4.5. No consumers on any queue
rabbitmq_consumersTotal == 0
5.4.6. Messages ready for consumption
increase (rabbitmq_queue_messages_ready_total[1m])
5.4.7. Unacknowledged messages
increase (rabbitmq_queue_messages_unacknowledged_total[1m])
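As with the other components, the RabbitMQ expressions in section 5.4 can be collected into a rule group. The sketch below uses only the node status, memory and file descriptor expressions from above. The group name, alert names, `for` durations, severity labels and summary texts are assumptions for illustration, and the 500 MB and 100 thresholds should be adjusted to historical usage.

```yaml
# Sketch of a Prometheus rule file for RabbitMQ.
# Assumed for illustration: group name, alert names, `for`, labels, annotations.
groups:
  - name: rabbitmq
    rules:
      - alert: RabbitmqNodeDown
        expr: sum by (node) (rabbitmq_running) == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "RabbitMQ node {{ $labels.node }} is not running"
      - alert: RabbitmqMemoryHigh
        expr: sum by (node) (round(rabbitmq_node_mem_used / 1024 / 1024)) > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ node {{ $labels.node }} uses more than 500 MB of memory"
      - alert: RabbitmqFileDescriptorsHigh
        expr: sum by (node) (rabbitmq_fd_used) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RabbitMQ node {{ $labels.node }} has more than 100 file descriptors in use"
```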