3.4 告警功能
3.4.1 告警功能简介
Skywalking每隔一段时间根据收集到的链路追踪的数据和配置的告警规则(如服务响应时间、服务响应 时间百分比)等,判断如果达到阈值则发送相应的告警信息。发送告警信息是通过调用webhook接口完 成,具体的webhook接口可以使用者自行定义,从而开发者可以在指定的webhook接口中编写各种告 警方式,比如邮件、短信等。告警的信息也可以在RocketBot中查看到。
以下是默认的告警规则配置,位于skywalking安装目录下的config文件夹下 alarm-settings.yml文件 中:
rules:
# Rule unique name, must be ended with `_rule`.
service_resp_time_rule:
metrics-name: service_resp_time
op: ">"
threshold: 1000
period: 10
count: 3
silence-period: 5
message: Response time of service {
name} is more than 1000ms in 3 minutes of last 10 minutes.
service_sla_rule:
# Metrics value need to be long, double or int
metrics-name: service_sla
op: "<"
threshold: 8000
# The length of time to evaluate the metrics
period: 10
# How many times after the metrics match the condition, will trigger alarm
count: 2
# How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
silence-period: 3
message: Successful rate of service {
name} is lower than 80% in 2 minutes of last 10 minutes
service_p90_sla_rule:
# Metrics value need to be long, double or int
metrics-name: service_p90
op: ">"
threshold: 1000
period: 10
count: 3
silence-period: 5
message: 90% response time of service {
name} is more than 1000ms in 3 minutes of last 10 minutes
service_instance_resp_time_rule:
metrics-name: service_instance_resp_time
op: ">"
threshold: 1000
period: 10
count: 2
silence-period: 5
message: Response time of service instance {
name