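This post uses redis-exporter to monitor Redis services, Prometheus to collect the metrics, and Grafana to display them.
Alerting is handled by the alertmanager plugin, and alerts are sent as DingTalk messages. The installation packages used are available for download (extraction code: wdy3).
The monitoring setup is described briefly below.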
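Start the redis-exporter service
Because the package is a prebuilt binary, it can be started directly. The following two parameters are specified at startup: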
/usr/local/redis_exporter/redis_exporter -redis.addr 10.9.68.46:6381 -web.listen-address :3389
-redis.addr: the IP address and port of the Redis service
-web.listen-address: the address and port that this redis-exporter instance listens on
-redis.password: the Redis password, if the Redis service requires one
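To confirm that an exporter started this way can actually reach Redis, its metrics endpoint can be checked. The following is a minimal Python sketch, assuming the exporter is reachable at `http://10.9.68.202:3389/metrics` (the address used as a scrape target later in this post) and that it exposes a `redis_up` gauge. The function names `parse_metric` and `exporter_sees_redis` are illustrative and are not part of redis-exporter.

```python
import urllib.request


def parse_metric(text, name):
    """Return the value of the first sample named `name` in Prometheus text format, or None."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Sample with labels: metric{label="value"} 1.0
            metric = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Sample without labels: metric 1.0
            metric, _, rest = line.partition(" ")
        if metric == name:
            return float(rest.split()[0])
    return None


def exporter_sees_redis(url="http://10.9.68.202:3389/metrics", timeout=5):
    """Fetch the exporter's metrics page and report whether redis_up is 1 (assumed URL)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return parse_metric(body, "redis_up") == 1.0
```

Calling `exporter_sees_redis()` from a host that can reach the exporter should return `True` when the exporter is connected to Redis; the parser is deliberately simple and does not handle every corner of the exposition format.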
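Start the Prometheus service
Prometheus in the archive above is also a prebuilt binary; extract it and it can be run directly.
Configure Prometheus service discovery. Here Prometheus is set up to automatically discover the JSON files in the targets directory, as follows: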
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration: address and port of the alerting plugin
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 10.9.68.202:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus/rules/*.rule"
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'file_ds'
    file_sd_configs:
      - files:
          - targets/*.json
        refresh_interval: 1m
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # static_configs:
    #   - targets: ['localhost:9090']
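The configuration above defines a job named file_ds that automatically discovers the JSON files in the targets directory, re-reading them every 1 minute. The alert rules are defined in the rules directory and are described later. The JSON files in the targets directory have the following format: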
cat ceshi.json
[
  {
    "labels": {
      "instance": "股票k线",
      "addr": "10.9.68.46:6381"
    },
    "targets": [
      "10.9.68.202:3389"
    ]
  },
  {
    "labels": {
      "instance": "股票k线",
      "addr": "10.9.68.41:6381"
    },
    "targets": [
      "10.9.68.202:6382"
    ]
  }
]
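Notes on the file: the instance and addr label values are shown in the Grafana interface to distinguish the different Redis services, and targets holds the IP address and port of the redis-exporter that monitors the Redis service at addr. Then start the Prometheus service: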
cd /usr/local/prometheus
./prometheus --config.file=prometheus.yml --web.external-url='http://10.9.68.202:9090' &
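Target files in the format described above can also be generated by a script instead of being edited by hand. The following is a minimal Python sketch under stated assumptions: `build_targets` and `write_targets` are hypothetical helper names, and writing to a temporary file followed by a rename is a precaution chosen here so that Prometheus never reads a half-written file, not something the original setup requires.

```python
import json
import os
import tempfile


def build_targets(entries):
    """Turn (instance, redis_addr, exporter_addr) tuples into file_sd target groups."""
    return [
        {
            "labels": {"instance": instance, "addr": redis_addr},
            "targets": [exporter_addr],
        }
        for instance, redis_addr, exporter_addr in entries
    ]


def write_targets(path, entries):
    """Write the target groups to `path` through a temporary file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The ".tmp" suffix keeps the temporary file from matching targets/*.json.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(build_targets(entries), tmp, ensure_ascii=False, indent=2)
    # mkstemp creates the file as 0600; relax it so another user can read it.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)
```

For example, `write_targets("/usr/local/prometheus/targets/ceshi.json", [("股票k线", "10.9.68.46:6381", "10.9.68.202:3389")])` would produce a file with one target group in the same shape as the first entry of ceshi.json.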
Start the Grafana service
Install the rpm package from the archive directly, then start the Grafana service:
systemctl start grafana-server
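In Grafana, import the redis-exporter dashboard template from the archive and set its data source to the Prometheus instance configured above. The Redis monitoring information is then displayed.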
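#### Add monitoring and DingTalk alerts
Two plugins are needed, alertmanager and prometheus-webhook-dingtalk; both are included in the archive above.
1. Start the dingtalk plugin: extract the corresponding archive and start it. The token below is the DingTalk group robot token.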
[root@estest1 alertmanager]# cd /usr/local/prometheus-webhook-dingtalk/
(python27) [root@estest1 prometheus-webhook-dingtalk]# ./prometheus-webhook-dingtalk --ding.profile='webhook1=https://oapi.dingtalk.com/robot/send?access_token=2f9aa0c7cc2bc28cd63c@@@@@@@@@@@@@@@@@@@@@@2e55bca72' --web.listen-address='10.9.68.202:8060' &
[1] 30100
(python27) [root@estest1 prometheus-webhook-dingtalk]# level=info ts=2021-05-07T02:16:39.894Z caller=main.go:62 msg="Starting prometheus-webhook-dingtalk" version="(version=1.4.0, branch=HEAD, revision=02fe8265a98ab4caaa78ebbed209d3f06b87b4a6)"
level=info ts=2021-05-07T02:16:39.894Z caller=main.go:63 msg="Build context" (gogo1.13.5,userroot@eb9f8d8f0437,date20191211-03:00:38)=(MISSING)
level=warn ts=2021-05-07T02:16:39.894Z caller=main.go:105 msg="DEPRECATION: Detected one of the following flags: --ding.profile, --ding.timeout, --template.file"
level=warn ts=2021-05-07T02:16:39.894Z caller=main.go:106 msg="DEPRECATION: Now working in compatibility mode, please consider upgrading your configurations"
level=info ts=2021-05-07T02:16:39.894Z caller=main.go:117 component=configuration msg="Loading templates" templates=
ts=2021-05-07T02:16:39.895Z caller=main.go:133 component=configuration msg="Webhook urls for prometheus alertmanager" urls=http://10.9.68.202:8060/dingtalk/webhook1/send
level=info ts=2021-05-07T02:16:39.896Z caller=web.go:210 component=web msg="Start listening for connections" address=10.9.68.202:8060
2. Configure the alertmanager service. The configuration file is as follows:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'DingDing'
receivers:
  - name: 'DingDing'
    webhook_configs:
      - url: 'http://10.9.68.202:8060/dingtalk/webhook1/send'
inhibit_rules:
  - source_match:
      alertname: 'redis'
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
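The Prometheus configuration file specifies where the rule files are located. Define the alert thresholds and alert messages in the rules directory by creating a redis.rule file with the following content: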
cat redis.rule
groups:
  - name: redis
    rules:
      - alert: "Memory alert"
        expr: (redis_memory_used_bytes / redis_memory_max_bytes) > 0.80
        for: 15s
        labels:
          severity: 1
        annotations:
          summary: "{{ $labels.addr }} memory usage exceeds 80%"
          description: "Memory usage ratio: {{ $value }}"
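The threshold expression in the rule above divides used memory by the configured maximum. The following Python sketch mirrors that comparison to show how it behaves at the edges. It assumes that `redis_memory_max_bytes` is 0 when Redis has no `maxmemory` limit, which is worth verifying against your exporter version, and it imitates PromQL float division, where a positive number divided by zero is +Inf. The function names are illustrative, and the `for: 15s` hold time is not modelled.

```python
import math


def memory_ratio(used_bytes, max_bytes):
    """Imitate PromQL float division: x/0 is +Inf for x > 0 and NaN for 0/0."""
    if max_bytes == 0:
        return math.inf if used_bytes > 0 else math.nan
    return used_bytes / max_bytes


def memory_alert_fires(used_bytes, max_bytes, threshold=0.80):
    """Return True when the ratio is strictly above the threshold, as in the rule."""
    # Comparisons with NaN are always False, so 0/0 never fires.
    return memory_ratio(used_bytes, max_bytes) > threshold
```

Under that assumption, an instance without a memory limit would satisfy the condition as soon as it uses any memory, so filtering out series where `redis_memory_max_bytes` is 0 may be worth considering for such instances.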
Once the configuration is complete, start the alertmanager service as follows:
(python27) [root@estest1 alertmanager]# cd /usr/local/alertmanager/
(python27) [root@estest1 alertmanager]# ./alertmanager --config.file alertmanager.yml --web.external-url=http://10.9.68.202:9093 &
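Test the alert
If Redis memory usage exceeds the threshold set above, an alert message is sent automatically. For this test, the threshold was changed in order to trigger the alert.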
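An alternative to changing the threshold is to push a hand-made alert straight to Alertmanager and check that it reaches DingTalk. The following Python sketch assumes the Alertmanager version in use serves the v2 HTTP API at `/api/v2/alerts` on `http://10.9.68.202:9093`; the alert name `ManualTest` and the helper names are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(addr, minutes=5):
    """Build a one-element alert list that expires after `minutes` minutes."""
    now = datetime.now(timezone.utc)
    return [
        {
            "labels": {"alertname": "ManualTest", "severity": "warning", "addr": addr},
            "annotations": {"summary": f"manual test alert for {addr}"},
            "startsAt": now.isoformat(),
            "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
        }
    ]


def post_alerts(alerts, base_url="http://10.9.68.202:9093", timeout=5):
    """POST the alerts to Alertmanager (assumed v2 API) and return the HTTP status."""
    req = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Running `post_alerts(build_test_alert("10.9.68.46:6381"))` from a host that can reach Alertmanager should route the alert through the `DingDing` receiver, since that receiver is the default route in the configuration shown earlier.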
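#### Follow-up
With Prometheus service discovery, a service is picked up automatically as soon as its exporter information is written to the targets directory. Because each redis-exporter corresponds to exactly one Redis service, a web interface could be written to manage this mapping.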
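Before building such an interface, the one-to-one mapping can be checked with a small script. The following Python sketch assumes the target files follow the format used in this post, with the Redis address in the `addr` label and the exporter address in `targets`; `load_pairs` and `find_conflicts` are hypothetical names.

```python
import glob
import json
from collections import defaultdict


def load_pairs(pattern):
    """Yield (redis_addr, exporter_addr) pairs from the target files matching `pattern`."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            for group in json.load(fh):
                redis_addr = group.get("labels", {}).get("addr", "")
                for exporter_addr in group.get("targets", []):
                    yield redis_addr, exporter_addr


def find_conflicts(pairs):
    """Return exporters mapped to several Redis addresses, and Redis addresses mapped to several exporters."""
    by_exporter = defaultdict(set)
    by_redis = defaultdict(set)
    for redis_addr, exporter_addr in pairs:
        by_exporter[exporter_addr].add(redis_addr)
        by_redis[redis_addr].add(exporter_addr)
    shared_exporters = {e: sorted(r) for e, r in by_exporter.items() if len(r) > 1}
    shared_redis = {r: sorted(e) for r, e in by_redis.items() if len(e) > 1}
    return shared_exporters, shared_redis
```

Calling `find_conflicts(load_pairs("/usr/local/prometheus/targets/*.json"))` returns two empty dictionaries when every exporter and every Redis address appears exactly once.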