ELK alerting: installing ElastAlert and sending DingTalk messages (Part 2)

Latest recommended article published on 2024-01-25 22:00:00

andy.cao

Article link: https://blog.csdn.net/hljczm/article/details/109512552

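ElastAlert alert configuration

Continuing from the previous post, the following are already installed:
1. elastalert
2. elastalert_modules (the DingTalk alert module)

config.yaml configuration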

# This is the folder that contains the rule yaml files
# Any .yaml file will be loaded as a rule
rules_folder: rules

# How often ElastAlert will query Elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  seconds: 30

# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 15

# The Elasticsearch hostname for metadata writeback
# Note that every rule can have its own Elasticsearch host
es_host: xxx.xxx.xxx.xxx

# The Elasticsearch port
es_port: 9200

# The AWS region to use. Set this when using AWS-managed elasticsearch
#aws_region: us-east-1

# The AWS profile to use. Use this if you are using an aws-cli profile.
# See http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
# for details
#profile: test

# Optional URL prefix for Elasticsearch
#es_url_prefix: elasticsearch

# Connect with TLS to Elasticsearch
#use_ssl: True

# Verify TLS certificates
#verify_certs: True

# GET request with body is the default option for Elasticsearch.
# If it fails for some reason, you can pass 'GET', 'POST' or 'source'.
# See http://elasticsearch-py.readthedocs.io/en/master/connection.html?highlight=send_get_body_as#transport
# for details
#es_send_get_body_as: GET

# Option basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword

# Use SSL authentication with client certificates client_cert must be
# a pem file containing both cert and key for client
#verify_certs: True
#ca_certs: /path/to/cacert.pem
#client_cert: /path/to/client_cert.pem
#client_key: /path/to/client_key.key

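# The index on es_host which is used for metadata storage
# This can be an unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
# Note: this name also matches the rule's index pattern (nginx-ingress-logs*),
# so a dedicated name such as elastalert_status keeps metadata out of rule queries
writeback_index: nginx-ingress-logs
writeback_alias: elastalert_alerts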

# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 2

# Custom logging configuration
# If you want to setup your own logging configuration to log into
# files as well or to Logstash and/or modify log levels, use
# the configuration below and adjust to your needs.
# Note: if you run ElastAlert with --verbose/--debug, the log level of
# the "elastalert" logger is changed to INFO, if not already INFO/DEBUG.
#logging:
#  version: 1
#  incremental: false
#  disable_existing_loggers: false
#  formatters:
#    logline:
#      format: '%(asctime)s %(levelname)+8s %(name)+20s %(message)s'
#
#  handlers:
#    console:
#      class: logging.StreamHandler
#      formatter: logline
#      level: DEBUG
#      stream: ext://sys.stderr
#
#    file:
#      class: logging.FileHandler
#      formatter: logline
#      level: DEBUG
#      filename: elastalert.log
#
#  loggers:
#    elastalert:
#      level: WARN
#      handlers: []
#      propagate: true
#
#    elasticsearch:
#      level: WARN
#      handlers: []
#      propagate: true
#
#    elasticsearch.trace:
#      level: WARN
#      handlers: []
#      propagate: true
#
#    '':  # root logger
#      level: WARN
#      handlers:
#        - console
#        - file
#      propagate: false

api_error.yaml configuration

name: API error responses (status >= 400)
type: frequency
index: nginx-ingress-logs*
num_events: 5
timeframe:
    minutes: 1
filter:
- range:
    status:
      from: 400
      to: 599
include: ["_index","uri","remote_addr","http_x_forwarded_for","status"]
alert:
- "elastalert_modules.dingtalk_alert.DingTalkAlerter"
dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=xxxxxx"
dingtalk_msgtype: text

DingTalk configuration

Go to the directory /home/elastalert/elastalert_modules
and edit dingtalk_alert.py

import json

import requests
from requests.exceptions import RequestException

from elastalert.alerts import Alerter, DateTimeEncoder
from elastalert.util import EAException


class DingTalkAlerter(Alerter):
    """Send the first match of an alert to a DingTalk robot as a text message."""

    required_options = frozenset(['dingtalk_webhook', 'dingtalk_msgtype'])

    def __init__(self, rule):
        super(DingTalkAlerter, self).__init__(rule)
        self.dingtalk_webhook_url = self.rule['dingtalk_webhook']
        self.dingtalk_msgtype = self.rule.get('dingtalk_msgtype', 'text')
        self.dingtalk_isAtAll = self.rule.get('dingtalk_isAtAll', False)
        self.dingtalk_title = self.rule.get('dingtalk_title', '')

    def format_body(self, body):
        return body.encode('utf8')

    def alert(self, matches):
        headers = {
            "Content-Type": "application/json",
            "Accept": "application/json;charset=utf-8"
        }
        # Only the first match is reported. format() and .get() are used so that
        # a numeric status or a missing field does not raise an exception.
        match = matches[0]
        content = "[Alert] Please check as soon as possible. time: {0} host: {1} uri: {2} status: {3}".format(
            match.get("@timestamp"), match.get("remote_addr"),
            match.get("uri"), match.get("status"))
        payload = {
            "msgtype": self.dingtalk_msgtype,
            "text": {
                "content": content
            },
            "at": {
                "isAtAll": self.dingtalk_isAtAll
            }
        }
        try:
            response = requests.post(self.dingtalk_webhook_url,
                                     data=json.dumps(payload, cls=DateTimeEncoder),
                                     headers=headers)
            response.raise_for_status()
        except RequestException as e:
            raise EAException("Error request to Dingtalk: {0}".format(str(e)))

    def get_info(self):
        return {
            "type": "dingtalk",
            "dingtalk_webhook": self.dingtalk_webhook_url
        }
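
The message body can be checked without ElastAlert or a real webhook. The following is a minimal sketch, not part of the module above: build_dingtalk_payload and the sample match are hypothetical, and the field names are assumed to follow the include list in api_error.yaml.

```python
import json


def build_dingtalk_payload(match, at_all=False):
    """Build the body of a DingTalk text message from one match dict.

    Missing fields become "N/A" instead of raising KeyError, and every
    value is converted with str() so a numeric status is safe to format.
    """
    fields = ["@timestamp", "remote_addr", "uri", "status"]
    values = [str(match.get(name, "N/A")) for name in fields]
    content = "[Alert] time: {0} host: {1} uri: {2} status: {3}".format(*values)
    return {
        "msgtype": "text",
        "text": {"content": content},
        "at": {"isAtAll": at_all},
    }


# Hypothetical match, shaped like a document from nginx-ingress-logs*.
sample = {"@timestamp": "2020-11-05T10:00:00Z", "remote_addr": "10.0.0.1",
          "uri": "/api/v1/users", "status": 502}
payload = build_dingtalk_payload(sample)
print(json.dumps(payload, ensure_ascii=False))
```

Keeping the formatting in a plain function like this makes it easy to try different message layouts before touching the alerter class.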

Summary

At this point ElastAlert is ready to run.

----------------------------------------

Basic ElastAlert configuration reference

ElastAlert configuration syntax:

Basic rule options:

es_host, es_port: the Elasticsearch cluster to query
name: the unique name of the rule. If two rules share a name, ElastAlert will not start.
type: how the data is evaluated (the rule type)
index: the name of the index to query, for example logstash-*
filter: equivalent to Elasticsearch query syntax; selects the documents the rule should examine
alert: the list of alerts to run on each match.

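Query options

run_every: how often ElastAlert sends a query to Elasticsearch
buffer_time: the time range each query covers on the timestamp field (15 minutes in the config.yaml above)
rules_folder: the folder from which rule files are loaded (example_rules in the sample configuration)
timestamp_field: the field that buffer_time applies to; defaults to @timestamp

type (rule types):

any

any: matches everything; every hit returned by the query generates an alert.

blacklist

blacklist: checks a field against a blacklist and matches if the value is in the list
- compare_key: the name of the field to compare against the blacklist. Events in which this field is null are ignored
- blacklist: the list of blacklisted values, and/or paths to files that contain them
  Suitable when every value that should raise an alert is known in advance and can be listed.

compare_key: "request"
blacklist:
    - /index.html        # alert when the request field is /index.html
    - "!file /tmp/blacklist1.txt"
    - "!file /tmp/blacklist2.txt"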

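whitelist

whitelist: similar to blacklist; compares a field against a whitelist and matches if the value is not in the list (acts as a filter)
- compare_key: the name of the field to compare against the whitelist
- ignore_null: if true, events without the compare_key field do not match
- whitelist: the list of whitelisted values

compare_key: "request"
ignore_null: true
whitelist:
    - /index.html        # requests for /index.html are whitelisted and do not alert
    - "!file /tmp/blacklist1.txt"
    - "!file /tmp/blacklist2.txt"

I could not get this rule type to work in testing and have set it aside for now.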

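change (value changes)

change: monitors a field and matches when its value changes.
- compare_key: the names of the fields to monitor. Because this is a list of strings, a change in any one of the fields triggers an alert
- ignore_null: events without the compare_key field are not counted as changes
- query_key: the rule is applied separately for each value of this field
- timeframe: the maximum time between changes. After this time the old value is forgotten, and a later change is no longer counted as a change

frequency

frequency: matches when at least a given number of events occur within a given time range. Counting can be done per query_key
- num_events: the number of events that triggers an alert
- timeframe: the time range within which num_events must occur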

type: frequency
index: n-nanjing-console
num_events: 5
timeframe:
    minutes: 1
filter:
- term:
   status: "404"

An alert fires only when at least five 404 requests occur within one minute.
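
To make the num_events and timeframe semantics concrete, here is a simplified sketch of a sliding-window count. It is an illustration only, not ElastAlert's own code; frequency_matches is a hypothetical helper, and clearing the window after a match is an assumption of this sketch.

```python
from collections import deque
from datetime import datetime, timedelta


def frequency_matches(timestamps, num_events, timeframe):
    """Return the timestamps at which num_events events fall within timeframe."""
    window = deque()
    alerts = []
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop events that are older than timeframe relative to the newest one.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            alerts.append(ts)
            window.clear()  # start counting afresh after a match
    return alerts


start = datetime(2020, 11, 5, 10, 0, 0)
# Six 404 responses, ten seconds apart: the fifth one completes the window.
burst = [start + timedelta(seconds=10 * i) for i in range(6)]
hits = frequency_matches(burst, num_events=5, timeframe=timedelta(minutes=1))

# Six 404 responses, thirty seconds apart: never five within one minute.
sparse = [start + timedelta(seconds=30 * i) for i in range(6)]
no_hits = frequency_matches(sparse, num_events=5, timeframe=timedelta(minutes=1))
```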

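spike

Useful for catching sudden surges in traffic; it works like a thermometer.

spike: compares the event counts of the two most recent timeframe windows
- spike_height: the ratio between the event counts of the two windows that triggers an alert
- spike_type: up (alert when the later window is higher than the earlier one), down, or both
- timeframe: the length of each window
  Optional parameters:
- threshold_ref: minimum number of events in the earlier window; below it no alert fires
- threshold_cur: minimum number of events in the current window; below it no alert fires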
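
The interaction of these options can be sketched as a comparison of two counts. This is a simplified illustration under the assumption that spike_height is a ratio between the current and the reference window; is_spike is a hypothetical helper, not an ElastAlert function.

```python
def is_spike(ref_count, cur_count, spike_height, spike_type="up",
             threshold_ref=0, threshold_cur=0):
    """Compare the reference (earlier) window with the current window."""
    # Too few events in either window: never alert.
    if ref_count < threshold_ref or cur_count < threshold_cur:
        return False
    spike_up = cur_count >= ref_count * spike_height
    spike_down = cur_count * spike_height <= ref_count
    if spike_type == "up":
        return spike_up
    if spike_type == "down":
        return spike_down
    return spike_up or spike_down  # spike_type == "both"


surge = is_spike(ref_count=10, cur_count=35, spike_height=3)            # 35 >= 30
mild = is_spike(ref_count=10, cur_count=25, spike_height=3)             # 25 < 30
drop = is_spike(ref_count=30, cur_count=10, spike_height=3, spike_type="down")
too_quiet = is_spike(ref_count=1, cur_count=5, spike_height=3, threshold_ref=5)
```

Setting threshold_ref avoids alerts such as "from 1 event to 5 events", which is a fivefold increase but rarely interesting.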

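flatline

Alerts when activity drops below a level line.

flatline: matches when the total number of events within timeframe is below threshold
- threshold: the minimum number of events for which no alert fires
- timeframe: the period over which events are counted
  Optional parameters:
- use_count_query: if true, ElastAlert polls Elasticsearch with the count API instead of downloading every matching document. Useful when only the number of events matters, not their contents.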
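
A flatline check only needs the number of events in the most recent window. A minimal sketch, assuming event timestamps are already available in memory (is_flatline and the sample data are hypothetical):

```python
from datetime import datetime, timedelta


def is_flatline(timestamps, now, timeframe, threshold):
    """Return True when fewer than threshold events fall in the last timeframe."""
    window_start = now - timeframe
    recent = [ts for ts in timestamps if window_start < ts <= now]
    return len(recent) < threshold


now = datetime(2020, 11, 5, 10, 0, 0)
# Three events in the last minute.
heartbeats = [now - timedelta(seconds=20 * i) for i in range(3)]
quiet = is_flatline(heartbeats, now, timedelta(minutes=1), threshold=5)
busy = is_flatline(heartbeats, now, timedelta(minutes=1), threshold=3)
```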

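new_term (new values)

Checks whether a field's value is new compared with the data of the previous 30 days; if it is, an alert fires.

new_term: matches when a value appears in a field that has never been seen before. On startup, ElastAlert runs an aggregation query to collect all known terms for the listed fields.
- fields: the fields to monitor
  For other options, see the official documentation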

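cardinality (baseline)

Alerts when a value goes above or below a baseline.

cardinality: matches when the number of unique values of a field within a time range is above or below a threshold.
- timeframe: the time range over which unique values are counted
- cardinality_field: the field whose cardinality is counted
- max_cardinality: if the cardinality of the data exceeds this number, an alert fires. Every new event that raises the cardinality triggers an alert
- min_cardinality: if the cardinality of the data falls below this number, an alert fires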
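
Cardinality here means the number of distinct values of one field. A short sketch of the check, with a hypothetical helper cardinality_alert and made-up events already restricted to one timeframe:

```python
def cardinality_alert(events, field, max_cardinality=None, min_cardinality=None):
    """Return "above", "below", or None for the unique-value count of field."""
    unique_values = {event[field] for event in events if field in event}
    count = len(unique_values)
    if max_cardinality is not None and count > max_cardinality:
        return "above"
    if min_cardinality is not None and count < min_cardinality:
        return "below"
    return None


events = [
    {"remote_addr": "10.0.0.1"},
    {"remote_addr": "10.0.0.2"},
    {"remote_addr": "10.0.0.1"},  # repeated value, does not raise the cardinality
    {"uri": "/health"},           # no remote_addr field, ignored
]
too_many = cardinality_alert(events, "remote_addr", max_cardinality=1)
too_few = cardinality_alert(events, "remote_addr", min_cardinality=3)
normal = cardinality_alert(events, "remote_addr", max_cardinality=5, min_cardinality=1)
```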

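metric_aggregation (metric aggregation)

metric_aggregation: matches when the metric computed over the calculation window is above or below a threshold.
- metric_agg_key: the field on which the metric is computed.
- metric_agg_type: the aggregation applied to the metric_agg_key field. Aggregation types: min, max, avg, sum, cardinality, value_count
- max_threshold: if the computed metric is greater than this number, an alert fires
- min_threshold: if the computed metric is less than this number, an alert fires
  Optional:
- use_run_every_query_size: by default the metric is computed over a window of buffer_time size. If this parameter is true, the rule uses run_every as the calculation window.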
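
The aggregation types listed above map onto simple functions over the field values in one window. The sketch below is illustrative only: in a real deployment Elasticsearch computes the aggregation, and metric_alert and the sample sizes are hypothetical.

```python
AGGREGATIONS = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "value_count": len,
    "cardinality": lambda values: len(set(values)),
}


def metric_alert(values, agg_type, max_threshold=None, min_threshold=None):
    """Aggregate values with agg_type and compare the result with the thresholds."""
    if not values:
        return False
    metric = AGGREGATIONS[agg_type](values)
    too_high = max_threshold is not None and metric > max_threshold
    too_low = min_threshold is not None and metric < min_threshold
    return too_high or too_low


LIMIT = 200 * 1024 * 1024                      # 200 MB expressed in bytes
sizes = [120 * 1024 * 1024, 90 * 1024 * 1024]  # body_bytes_sent of two requests
over_sum = metric_alert(sizes, "sum", max_threshold=LIMIT)   # 210 MB in total
over_max = metric_alert(sizes, "max", max_threshold=LIMIT)   # largest is 120 MB
```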

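percentage_match (percentage)

percentage_match: matches when the percentage of documents in the match bucket within the calculation window is above or below a threshold. By default the calculation window is buffer_time
- match_bucket_filter: the filter that defines the bucket; it should match a subset of the documents returned by the main query filter
- min_percentage: if the percentage of matching documents is less than this number, an alert fires
- max_percentage: if the percentage of matching documents is greater than this number, an alert fires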
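
The percentage is the size of the match bucket divided by the number of documents the main filter returned. A minimal sketch with a hypothetical helper percentage_alert; treating an empty window as "no alert" is an assumption of this sketch:

```python
def percentage_alert(total, matching, min_percentage=None, max_percentage=None):
    """Compare the share of matching documents with the configured bounds."""
    if total == 0:
        return False  # nothing to compare in an empty window
    percentage = 100.0 * matching / total
    if max_percentage is not None and percentage > max_percentage:
        return True
    if min_percentage is not None and percentage < min_percentage:
        return True
    return False


# 30 error responses out of 200 requests is 15%, above a 10% limit.
high_error_rate = percentage_alert(total=200, matching=30, max_percentage=10)
# 10 out of 200 is 5%, within the limit.
normal_error_rate = percentage_alert(total=200, matching=10, max_percentage=10)
```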

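Note: a rule file has exactly one type. If the type key appears more than once, the YAML parser keeps only the last value, so combine conditions through filter or split them into separate rule files.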

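Filters

Filters follow Lucene query syntax.

Guard against alert storms when using ElastAlert (in practice we have seen hundreds or thousands of errors in a single minute; sending an alert for each would be a problem). The following options keep alert storms under control:
1 aggregation: set a duration; all alerts raised within it (by the same rule file) are merged and sent once
2 realert: set a duration; within it, only one alert is sent for the same query_key
3 exponential_realert: set a duration that must be longer than realert; between realert and exponential_realert, the realert interval doubles after each alert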
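
The doubling described for exponential_realert can be sketched as follows. This is a simplified model that only shows the growth and the cap; it leaves out how the interval shrinks again after a quiet period, and silence_periods is a hypothetical helper.

```python
from datetime import timedelta


def silence_periods(realert, exponential_realert, alerts):
    """Return the silence period applied after each of `alerts` consecutive alerts."""
    periods = []
    wait = realert
    for _ in range(alerts):
        # The period never grows beyond exponential_realert.
        periods.append(min(wait, exponential_realert))
        wait *= 2
    return periods


periods = silence_periods(realert=timedelta(minutes=10),
                          exponential_realert=timedelta(hours=1),
                          alerts=5)
print([int(p.total_seconds() // 60) for p in periods])
```

With realert set to 10 minutes and exponential_realert to 1 hour, a rule that keeps matching is silenced for longer and longer, up to one hour at a time.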

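alert (alerts)

Any number of alerts can be attached to each rule

email:

alert_subject: the email subject
alert_subject_args: variables that can be used in the subject; their values are defined here
alert_text: the body
alert_text_args: variables for the body, taken from the match
alert_text_type:
- alert_text_only: output only the custom body
- exclude_fields: output a brief summary of how many documents matched in the query window, without the field values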

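alert_subject: "Alert {0} occurred at {1} {2}"
alert_subject_args:
- _index
- "@timestamp"
- request
alert_text: "More than three 404 requests in the last three minutes"

Note:

The formatter arguments are taken from the match object associated with the alert. If the rule matches several objects in the index, only the first match is used to fill in the arguments. If a field named in the argument list is missing, the email uses alert_missing_value instead.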
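
The substitution works like Python's str.format with positional arguments. A small sketch of the idea, where format_subject is a hypothetical helper and the placeholder string stands in for whatever alert_missing_value is configured to be:

```python
def format_subject(template, arg_names, match, missing_value="<MISSING VALUE>"):
    """Fill the {0}, {1}, ... placeholders of template from one match dict."""
    args = [match.get(name, missing_value) for name in arg_names]
    return template.format(*args)


# Hypothetical first match; it has no "request" field.
first_match = {"_index": "nginx-ingress-logs-2020.11.05",
               "@timestamp": "2020-11-05T10:00:00Z"}
subject = format_subject("Alert {0} occurred at {1} {2}",
                         ["_index", "@timestamp", "request"],
                         first_match)
print(subject)
```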

SMTP configuration:

smtp_host: smtp.qq.com
smtp_port: 25
smtp_auth_file: /opt/elastalert/rule_templates/smtp_auth_file.yaml   # the account and password are stored in this file
from_addr: "xxxx@qq.com"
alert:
- "email"
email:
- "xxxx@qq.com"

$ cat /opt/elastalert/rule_templates/smtp_auth_file.yaml
user: xxxx@qq.com
password: xxxxxxxxxx

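command

Runs a command. Any command can be executed, with arguments or stdin taken from the match

alert:
  - command
new_style_string_format: true   # needed for the {match[...]} placeholder syntax
command: ["/bin/send_alert", "--username", "{match[username]}"]

For more configuration options, see the official documentation

Example 1:

Send an email when the total traffic within five minutes exceeds 200 MB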

es_host: 192.168.20.6
es_port: 9200
run_every:
  minutes: 5

name: nanjing_flow
type: metric_aggregation
index: n-xxx-*
buffer_time:
  minutes: 5

metric_agg_key: body_bytes_sent
metric_agg_type: sum
max_threshold: 209715200
use_run_every_query_size: true

alert_text_type: alert_text_only
alert_subject: "Alert nanjing: traffic over the last five minutes exceeded 200 MB, please check!"
alert_text: |
  Total traffic over the last five minutes: {0} B
  kibana url: http://xxxxx

alert_text_args:
  - metric_body_bytes_sent_sum

smtp_host: smtp.qq.com
smtp_port: 25
smtp_auth_file: /opt/elastalert/rule_templates/smtp_auth_file.yaml
from_addr: "xxxx@qq.com"
alert:
- "email"
email:
- "xxxx@qq.com"

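Example 2

An nginx example: send an email when the backend takes longer than 3 seconds to answer a request. Certain endpoints, such as the authentication endpoint, need to be filtered out (not counted)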

es_host: 192.168.20.6
es_port: 9200
run_every:
  seconds: 30
name: xxx_reponse_time
index: n-xxx-*
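# A rule file can have only one type (a repeated "type" key overrides the earlier one),
# so the endpoints to ignore are excluded in the filter instead of through a whitelist rule.
# This assumes the request field can be searched with these exact values.
type: frequency
num_events: 1
timeframe:
    seconds: 30
filter:
- query_string:
    query: 'upstream_response_time:>3 AND NOT request:("/index.html" OR "/siteapp/ecsAuthentication/hasAuthentication")'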

alert_text_type: alert_text_only
alert_subject: "Alert {0}: backend took more than 3 seconds to respond!"
alert_subject_args:
- _index

html_table_title: "<h2>This is a heading</h2>"
alert_text: |
  timestamp: {0}
  request_method: {1}
  request: {2}
  request_body: {3}
  request_time: {4} s
  upstream_response_time: {5} s
  body_bytes_sent: {6} B
  status: {7}
  remote_addr: {8}
  http_x_forwarded_for: {9}
  upstream_addr: {10}
  agent: {11}

alert_text_args:
  - timestamp
  - request_method
  - request
  - request_body
  - request_time
  - upstream_response_time
  - body_bytes_sent
  - status
  - remote_addr
  - http_x_forwarded_for
  - upstream_addr
  - agent

smtp_host: smtp.qq.com
smtp_port: 25
smtp_auth_file: /opt/elastalert/rule_templates/smtp_auth_file.yaml
from_addr: "xxx@qq.com"
alert:
- "email"
email:
- "xxxxx@qq.com"