Installing, Configuring, and Verifying ElastAlert with Docker
created by fangchangtan | 2020/2/24
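1. ElastAlert use cases
ElastAlert is the component of an ELK stack that raises alerts on log keywords. The basic flow: from the logs collected by ELK, it picks up the continuous heartbeats a program emits, matches error keywords such as ERROR, and so on, and turns these into alerts that monitor the program's health and stability.
2. Installing ElastAlert
2.1 Download the git repository
## Clone the repository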
git clone https://github.com/bitsensor/elastalert.git
## Change into the directory
cd elastalert
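2.2 Test the ElastAlert Docker installation locally:
Run the following from inside the elastalert directory (this is the officially recommended installation method).
# Start the elastalert container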
sudo docker run --rm -p 3030:3030 \
-v `pwd`/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v `pwd`/config/elastalert-test.yaml:/opt/elastalert/config-test.yaml \
-v `pwd`/config/config.json:/opt/elastalert-server/config/config.json \
-v `pwd`/rules:/opt/elastalert/rules \
-v `pwd`/rule_templates:/opt/elastalert/rule_templates \
--net="host" \
--name elastalert-fct2 bitsensor/elastalert:2.0.0
Alternatively, for a production installation (the recommended approach):
# Production environment: start elastalert
docker run --rm \
--name fct-elastalert \
--net "host" \
-p 3030:3030 \
-v /data/poc/trial-production/myelastalert/elastalert/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/config/config.json:/opt/elastalert-server/config/config.json \
-v /data/poc/trial-production/myelastalert/elastalert/rules:/opt/elastalert/rules \
-v /data/poc/trial-production/myelastalert/elastalert/rule_templates:/opt/elastalert/rule_templates \
-v /data/poc/trial-production/myelastalert/elastalert/config/smtp_auth.yaml:/opt/elastalert/config/smtp_auth.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/server_data:/opt/elastalert/server_data \
-v /data/poc/trial-production/myelastalert/elastalert/logs:/opt/logs \
bitsensor/elastalert:2.0.0
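When a bind-mount source given with -v does not exist on the host, Docker creates it as an empty directory, so a mistyped file path silently mounts a directory where ElastAlert expects a file. The following is a minimal pre-flight sketch, assuming the production layout used above. The function name check_mounts and the BASE variable are illustrative and not part of ElastAlert.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify bind-mount sources exist before `docker run`.
# BASE defaults to the production path used in this document.
BASE="${BASE:-/data/poc/trial-production/myelastalert/elastalert}"

check_mounts() {
  local base="$1" missing=0 f d
  # These sources must be regular files
  for f in config/elastalert.yaml config/config.json config/smtp_auth.yaml; do
    if [ ! -f "$base/$f" ]; then
      echo "missing file: $base/$f"
      missing=1
    fi
  done
  # These sources must be directories
  for d in rules rule_templates; do
    if [ ! -d "$base/$d" ]; then
      echo "missing directory: $base/$d"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_mounts "$BASE" && echo "mount sources look fine"
```

The function prints one line per missing path and returns non-zero if anything is missing, so it can gate the docker run command.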
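2.3 Configure ElastAlert's configuration files
The config.json file mainly sets the address of the Elasticsearch instance to connect to, the paths of rules and rule_templates, and the name of the Elasticsearch index to write back to: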
{
  "appName": "elastalert-server",
  "port": 3030,
  "wsport": 3333,
  "elastalertPath": "/opt/elastalert",
  "verbose": false,
  "es_debug": false,
  "debug": false,
  "rulesPath": {
    "relative": true,
    "path": "/rules"
  },
  "templatesPath": {
    "relative": true,
    "path": "/rule_templates"
  },
  "es_host": "172.19.32.106",
  "es_port": 9202,
  "writeback_index": "elastalert_status"
}
The configuration in elastalert.yaml is as follows:
# The elasticsearch hostname for metadata writeback
# Note that every rule can have its own elasticsearch host
es_host: 172.19.32.106
# The elasticsearch port
es_port: 9202
# This is the folder that contains the rule yaml files
# Any .yaml file will be loaded as a rule
rules_folder: rules
# How often ElastAlert will query elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  seconds: 5
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 1
# Optional URL prefix for elasticsearch
#es_url_prefix: elasticsearch
# Connect with TLS to elasticsearch
#use_ssl: True
use_ssl: False
# Verify TLS certificates
#verify_certs: True
verify_certs: False
# GET request with body is the default option for Elasticsearch.
# If it fails for some reason, you can pass 'GET', 'POST' or 'source'.
# See http://elasticsearch-py.readthedocs.io/en/master/connection.html?highlight=send_get_body_as#transport
# for details
#es_send_get_body_as: GET
# Option basic-auth username and password for elasticsearch
#es_username: someusername
#es_password: somepassword
# The index on es_host which is used for metadata storage
# This can be a unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status
# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 2
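There is also an elastalert-test.yaml file. It is used only when you test rules through the API, and it lets you write to a different writeback index when testing different rules for different instances.
The smtp_auth.yaml file, referenced through smtp_auth_file in the rule file, is configured as follows: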
user: swtx_wuhan@163.com
password: sdwtyx234
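ElastAlert expects the SMTP auth file to be YAML with two fields, user and password. A small sketch that checks both are present before the container is started; the function name check_smtp_auth is hypothetical, and the check is a plain text match, not a full YAML parse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm smtp_auth.yaml defines both keys ElastAlert reads.
check_smtp_auth() {
  local file="$1" key bad=0
  for key in user password; do
    # Match "key:" at the start of a line, followed by a non-empty value
    if ! grep -Eq "^${key}:[[:space:]]*[^[:space:]]" "$file"; then
      echo "smtp auth file $file: missing or empty '$key'"
      bad=1
    fi
  done
  return "$bad"
}

# Usage: check_smtp_auth config/smtp_auth.yaml
```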
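Next, configure the alert rule in ElastAlert: scan the specified ES index over the most recent 1 minute and, when more than 5 log messages match the query filter, send an alert email to fangchangtan@swtx.com. Note that the sample rule below sets num_events: 1 and lists swtx_wuhan@163.com as the recipient.
The following is the ElastAlert rule file /rules/tank-rules.yaml.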
es_host: 172.19.32.106
es_port: 9202
# The rule name must be unique, otherwise an error is raised. Once defined, it becomes the subject of the alert email
## (Required)
## Rule name, must be unique
name: fct-test-rule-name
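# Choose a rule type: any, blacklist, whitelist, change, frequency, spike, flatline, new_term, cardinality
# any: alert on every match;
# blacklist: the value of the compare_key field matches any entry in the blacklist array;
# whitelist: the value of the compare_key field matches none of the entries in the whitelist array;
# change: for the same query_key, the value of the compare_key field changes within timeframe;
# frequency: for the same query_key, num_events matching events occur within timeframe;
# spike: for the same query_key, the ratio between the event counts of two consecutive timeframe windows exceeds spike_height. spike_type sets the direction: up, down, or both. threshold_ref sets a lower bound on the event count of the previous window and threshold_cur a lower bound on the current window; if the count is below the bound, no alert fires;
# flatline: the event count within timeframe is below threshold;
# new_term: a value appears in fields that is not among the terms_size (default 50) most common terms seen in the preceding terms_window_size (default 30 days);
# cardinality: for the same query_key, the number of distinct values of cardinality_field within timeframe exceeds max_cardinality or falls below min_cardinality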
## (Required)
## Type of alert.
## the frequency rule type alerts when num_events events occur within timeframe
## I use frequency here, which requires two conditions: for the same query_key, num_events matching events occur within timeframe
type: frequency
# This is the index as it appears in Kibana. Wildcard matching and multiple indices are supported, and a bare * also works if you want to keep it simple.
## (Required)
## Index to search, wildcard supported
index: fct-logstash*
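# The rule is satisfied and an alert fires as soon as one event in the last 1 minute matches
num_events: 1
timeframe:
  minutes: 1
# This is the key part: which keywords in the program's message should trigger an alert. It is an Elasticsearch query_string query and supports AND, OR, and so on.
filter:
- query:
    query_string:
      query: "UNKNOWN"
# The alert_text defined here appears in the email body (the sample text reads "Hello, please reply to this email, Fang Changtan")
alert_text: "你好,请回复邮件,方昌坦"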
# Setup report smtp config
smtp_host: smtp.163.com
smtp_port: 25
smtp_ssl: False
#SMTP auth
from_addr: swtx_wuhan@163.com
email_reply_to: swtx_wuhan@163.com
smtp_auth_file: /opt/elastalert/config/smtp_auth.yaml
# (Required)
# # The alert is use when a match is found
alert:
- "email"
# (required, email specific)
# # a list of email addresses to send alerts to
email:
- "swtx_wuhan@163.com"
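Note: you need to register a 163 mailbox and enable the SMTP protocol:
Mailbox account: swtx_wuhan@163.com
Mailbox password: 221123.com
SMTP password: swtx234
Enabling SMTP allows third-party clients to log in to the mailbox. It has to be turned on in the 163 mailbox settings.
2.4 Restart ElastAlert to apply the configuration
Finally, restart elastalert so that the new configuration takes effect.
For local testing on host 106, the command to run elastalert is: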
docker run --rm \
--name fct-elastalert \
--net "host" \
-p 3030:3030 \
-v /data/poc/trial-production/myelastalert/elastalert/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/config/config.json:/opt/elastalert-server/config/config.json \
-v /data/poc/trial-production/myelastalert/elastalert/rules:/opt/elastalert/rules \
-v /data/poc/trial-production/myelastalert/elastalert/rule_templates:/opt/elastalert/rule_templates \
-v /data/poc/trial-production/myelastalert/elastalert/config/smtp_auth.yaml:/opt/elastalert/config/smtp_auth.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/server_data:/opt/elastalert/server_data \
-v /data/poc/trial-production/myelastalert/elastalert/logs:/opt/logs \
bitsensor/elastalert:2.0.0
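After the restart, scanning the container log for errors is a quick way to confirm the configuration loaded. The sketch below assumes the log layout shown in section 3.3, where an ERROR header line is followed by the message on the next line. The function name filter_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep each ERROR header line plus the message line after it.
filter_errors() {
  awk '/ ERROR elastalert-server/ { print; keep = 1; next }
       keep { print; keep = 0 }'
}

# Usage (container name taken from the command above):
#   docker logs fct-elastalert 2>&1 | filter_errors
```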
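3. Verify email delivery (local test)
3.1 Start Logstash to send test data
To verify the alerting, start Logstash so that it sends test data to ES.
On 172.19.32.67, start Logstash locally for verification.
It receives log data from Kafka, filters it in Logstash, and sends it to the fct-logstash_* indices in Elasticsearch: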
docker run \
--rm \
--name fct-alert-logstash \
-p 5047:5044 \
-v /root/fct/logstash-test/logstash_kafka.conf:/logstash/logstash_kafka.conf \
-v /root/fct/logstash-test/logstash.yml:/usr/share/logstash/config/logstash.yml \
registry.marathon.l4lb.thisdcos.directory:5000/logstash:6.6.1 \
logstash -f /logstash/logstash_kafka.conf
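If the Kafka and Logstash pipeline is not available, a single matching document can be indexed by hand to trigger the rule. This is a sketch under several assumptions: Elasticsearch 6.x at the host and port used in this document, ElastAlert's default @timestamp time field, no index template that forces a different mapping type, and a hypothetical index name fct-logstash-manual-test chosen only because it matches the rule's fct-logstash* pattern.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://172.19.32.106:9202}"
TEST_INDEX="${TEST_INDEX:-fct-logstash-manual-test}"   # hypothetical index name

# Build a JSON document whose message contains the keyword the rule searches for.
build_test_doc() {
  local keyword="$1" ts
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '{"@timestamp":"%s","message":"manual test event: %s"}\n' "$ts" "$keyword"
}

# Index the document through the Elasticsearch REST API.
send_test_doc() {
  build_test_doc "$1" | curl -sS -X POST "$ES_URL/$TEST_INDEX/_doc" \
    -H 'Content-Type: application/json' --data-binary @-
}

# Usage: send_test_doc UNKNOWN
```

With run_every set to 5 seconds and num_events set to 1, one such document should be enough for the sample rule to match on the next query cycle.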
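3.2 What success looks like
Receiving the alert email in the recipient mailbox indicates that the email was sent successfully!
3.3 Common errors
The following errors may appear in the log of the elastalert service after it starts.
3.3.1 Error 1: cannot connect to the 163 mail service.
The log reports the following at runtime (indicating that the mail configuration is incorrect). The mail connection has to be configured correctly: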
15:43:43.085Z INFO elastalert-server: Router: Listening for GET request on /mapping/:index.
15:43:43.085Z INFO elastalert-server: Router: Listening for POST request on /search/:index.
15:43:43.090Z INFO elastalert-server: ProcessController: Starting ElastAlert
15:43:43.090Z INFO elastalert-server: ProcessController: Creating index
15:43:43.980Z INFO elastalert-server:
ProcessController: Elastic Version:6
Mapping used for string:{'type': 'keyword'}
Index elastalert_status already exists. Skipping index creation.
15:43:43.980Z INFO elastalert-server: ProcessController: Index create exited with code 0
15:43:43.981Z INFO elastalert-server: ProcessController: Starting elastalert with arguments [none]
15:43:43.991Z INFO elastalert-server: ProcessController: Started Elastalert (PID: 50)
15:43:43.992Z INFO elastalert-server: Server: Server listening on port 3030
15:43:43.993Z INFO elastalert-server: Server: Websocket listening on port 3333
15:43:43.994Z INFO elastalert-server: Server: Server started
15:44:04.860Z ERROR elastalert-server:
ProcessController: ERROR:root:Error while running alert email: Error connecting to SMTP host: Connection unexpectedly closed
15:48:06.886Z ERROR elastalert-server:
ProcessController: WARNING:elasticsearch:GET http://172.19.32.106:9202/elastalert_status/elastalert/_search?size=10000 [status:400 request:0.012s]
15:48:06.886Z ERROR elastalert-server:
ProcessController: ERROR:root:Error fetching aggregated matches: RequestError(400, u'search_phase_execution_exception', u'parse_exception: Encountered " "-" "- "" at line 1, column 13.\nWas expecting one of:\n <BAREOPER> ...\n "(" ...\n "*" ...\n <QUOTED> ...\n <TERM> ...\n <PREFIXTERM> ...\n <WILDTERM> ...\n <REGEXPTERM> ...\n "[" ...\n "{" ...\n <NUMBER> ...\n ')
15:48:26.972Z ERROR elastalert-server:
ProcessController: ERROR:root:Error while running alert email: Error connecting to SMTP host: Connection unexpectedly closed
This error means the connection to the mail server failed. Check that the configuration files are correct.
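Before revisiting the rule's SMTP settings, it can help to rule out a plain network problem, since a blocked or filtered port can also end in a connection error. A rough sketch that only tests whether a TCP connection can be opened; it says nothing about authentication. The function name smtp_reachable is hypothetical, and the check relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port opens within 5 seconds.
smtp_reachable() {
  local host="$1" port="$2"
  timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage: run on the Docker host, which shares the network when --net host is used:
#   smtp_reachable smtp.163.com 25 && echo "port open" || echo "port unreachable"
```

If the port is unreachable, the problem lies with the network path or a firewall; if it is open, the smtp_host, smtp_port, smtp_ssl, and smtp_auth_file values in the rule are the next place to look.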
3.3.2 Error/warning 2: the 163 mail service judged the content illegitimate and blocked it, so sending the email failed.
SMTPDataError: (554, 'DT:SPM 163 smtp11,D8CowADn5mq2dFNewkQ5Aw--.52552S3 1582527670,please see http://mail.163.com/help/help_spam_16.htm?ip=58.49.28.162&hostid=smtp11&time=1582527670')
07:01:11.026Z ERROR elastalert-server:
ProcessController: ERROR:root:Uncaught exception running rule fct-Example-rule-name: (554, 'DT:SPM 163 smtp11,D8CowADn5mq2dFNewkQ5Aw--.52552S3 1582527670,please see http://mail.163.com/help/help_spam_16.htm?ip=58.49.28.162&hostid=smtp11&time=1582527670')
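Here, 554 DT:SPM means the email content contains information that is not permitted, or the system identified it as spam. Check whether any user is sending viruses or spam.
In this setup, the alerting program uses the NetEase 163 mailbox to send alert emails to a recipient group made up of two mailboxes, swtx_wuhan@163.com and fangchangtan@swtx.com.
Solution:
1. In the 163 mailbox web interface, go from the home page to "Settings" -> "General settings" -> "Anti-spam / blacklist and whitelist", find "Whitelist" in the right-hand pane (the add-to-whitelist tab), and add the address "swtx_wuhan@163.com" to the whitelist.
Note: so far this only walks through the complete ELK alerting flow in its simplest form. The various ElastAlert rule types have not been explored in depth, in particular a catalogue of alerting scenarios. That is the next step.
Appendix:
The ElastAlert filter rules are as follows: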