ELK in Practice
Overall Pipeline
We chose an ELK architecture with Kafka inserted as a buffer: application log files -> Filebeat -> Kafka -> Logstash -> Elasticsearch -> Kibana.
Server-side Logs
The server is a Spring Boot application. Logging goes through slf4j + logback, with MDC used to emit some session information. The pattern is based on Spring Boot's default FILE_LOG_PATTERN, extended with the file name, line number, and our custom MDC keys. The exact configuration:
%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}} ${LOG_LEVEL_PATTERN:-%5p} ${PID:- } --- [%t] %logger{40}\\(%F:%L\\) %X{CURRENT_USER:-system} %X{MOCK_USER:-self} : %m%n${LOG_EXCEPTION_CONVERSION_WORD:-%wEx}
Sample log line:
2020-03-12 00:02:24.256 INFO 196145 --- [main] o.s.s.concurrent.ThreadPoolTaskExecutor(ExecutorConfigurationSupport.java:171) system self : Initializing ExecutorService 'applicationTaskExecutor'
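The %X{CURRENT_USER} and %X{MOCK_USER} conversion words read values from the MDC. As a minimal sketch of where those keys could be set (not the actual service code; the filter class and the user lookup here are hypothetical), a Spring Boot servlet filter might do:

import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class UserMdcFilter extends OncePerRequestFilter {
    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain)
            throws ServletException, IOException {
        try {
            // Hypothetical values; in practice resolve them from the session/auth context
            MDC.put("CURRENT_USER", "alice");
            MDC.put("MOCK_USER", "self");
            chain.doFilter(request, response);
        } finally {
            MDC.clear(); // don't leak MDC values across pooled request threads
        }
    }
}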
filebeat
Goal: read log files as input, output to Kafka.
Configuration file:
filebeat.inputs:
- type: log
  enabled: true
  tail_files: true
  paths:
    - /my/service/path/log/*-debug.log.*
    - /my/service/path/log/*-info.log.*
  # Handle multi-line entries such as Slf4j exception stack traces: any line
  # not starting with a date is appended to the preceding line
  multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
  multiline.negate: true
  multiline.match: after

output.kafka:
  hosts: ["my-kafka.host1:9092", "my-kafka.host2:9092"]
  topic: 'my-kafka-topic'
  client_id: 'my-kafka-topic-XXXX'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

#output.console:
#  pretty: true
While debugging, we enabled output.console first and only pointed the output at Kafka after verifying everything looked right.
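Filebeat's CLI can also sanity-check the setup before a real run (assuming the filebeat binary is on the PATH and the file above is saved as filebeat.yml):

filebeat test config -c filebeat.yml   # validate the config file
filebeat test output -c filebeat.yml   # check connectivity to the Kafka brokers
filebeat -e -c filebeat.yml            # run in the foreground, logging to stderr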
Sample output event:
{
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.6.1"
  },
  "@timestamp": "2020-03-10T16:43:05.504Z",
  "input": {
    "type": "log"
  },
  "agent": {
    "ephemeral_id": "15f3cfd8-4c3f-4be2-b014-0da2480ace49",
    "hostname": "my-server.host1",
    "id": "ca1321c2-dde2-4465-91b7-474a067742e3",
    "type": "filebeat",
    "version": "7.6.1"
  },
  "ecs": {
    "version": "1.4.0"
  },
  "log": {
    "file": {
      "path": "/my/service/path/log/service-debug.log.2020-03-12"
    },
    "offset": 21897293
  },
  "host": {
    "name": "my-server.host1"
  },
  "message": "2020-03-12 00:02:24.256 INFO 196145 --- [main] o.s.s.concurrent.ThreadPoolTaskExecutor(ExecutorConfigurationSupport.java:171) system self : Initializing ExecutorService 'applicationTaskExecutor'"
}
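To double-check that events actually land in Kafka, the console consumer that ships with Kafka can read a few messages back (broker and topic as configured above):

kafka-console-consumer.sh --bootstrap-server my-kafka.host1:9092 \
  --topic my-kafka-topic --from-beginning --max-messages 5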
logstash
Configuration file:
input {
  kafka {
    bootstrap_servers => "my-kafka.host1:9092,my-kafka.host2:9092"
    client_id => "my-kafka-topic-XXXX"
    group_id => "my-kafka-topic-elk"
    auto_offset_reset => "latest"
    consumer_threads => 1
    topics => ["my-kafka-topic"]
  }
}

filter {
  # First deserialize the message read from Kafka, which is the JSON document
  # produced by Filebeat
  json {
    source => "message"
  }
  # Regex-matching filter; strongly recommended to debug this with the Grok Debugger
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:log_time}\s+%{LOGLEVEL:log_level} %{NUMBER:log_pid} --- \[%{DATA:log_thread_name}\] %{DATA:log_class_name} %{DATA:log_current_user} %{DATA:log_mock_user} : %{GREEDYDATA:log_message}"
    }
  }
  # Rename some fields, drop others
  mutate {
    rename => ["[host][name]", "hostname"]
    rename => ["[log][file][path]", "file_path"]
    rename => ["message", "raw_message"]
    remove_field => ["agent", "log", "input", "ecs", "version", "host", "@timestamp"]
  }
  # Date conversion: replace the ingestion-time @timestamp with the actual log time
  date {
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss.SSS" ]
    target => "@timestamp"
  }
}

#output {
#  stdout {
#    codec => rubydebug
##    codec => json_lines
#  }
#}

output {
  elasticsearch {
    hosts => ["http://my-es.host:9299"]
    index => "my-index-elk_%{+YYYY-MM-dd}" # Logstash can create or update the ES mapping automatically from an index template
    user => "test_user"
    password => "test_user"
    document_type => "_doc" # needed because we're on ES 6; the default differs across versions, and in the latest ES 8 this setting is already deprecated
  }
}
While debugging, we enabled output.stdout first and only pointed the output at ES after verifying the events.
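Logstash can likewise validate the pipeline file before a real run (standard Logstash flags; the file name is just an example):

bin/logstash -f my-pipeline.conf --config.test_and_exit      # parse and validate the config, then exit
bin/logstash -f my-pipeline.conf --config.reload.automatic   # reload on config change while iterating on filters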
Sample output with no filters at all (i.e. the raw format Filebeat wrote to Kafka, as consumed by Logstash):
{
    "@timestamp" => 2020-03-11T11:02:24.128Z,
      "@version" => "1",
       "message" => "{\"@timestamp\":\"2020-03-12T00:02:24.256Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"7.6.1\"},\"input\":{\"type\":\"log\"},\"ecs\":{\"version\":\"1.4.0\"},\"host\":{\"name\":\"my-service.host1\"},\"agent\":{\"type\":\"filebeat\",\"ephemeral_id\":\"85042605-dfef-41d0-96e0-9810b1de1716\",\"hostname\":\"my-service.host1\",\"id\":\"ca1321c2-dde2-4465-91b7-474a067742e3\",\"version\":\"7.6.1\"},\"log\":{\"offset\":179604,\"file\":{\"path\":\"/my/service/path/log/service-debug.log.2020-03-12\"}},\"message\":\"2020-03-12 00:02:24.256 INFO 196145 --- [main] o.s.s.concurrent.ThreadPoolTaskExecutor(ExecutorConfigurationSupport.java:171) system self : Initializing ExecutorService 'applicationTaskExecutor'\"}"
}
Sample output with all of the filters above applied (unless you're very familiar with the configuration, don't expect to get it working on the first try):
{
           "log_pid" => "196288",
        "@timestamp" => 2020-03-11T11:02:24.256Z,
          "@version" => "1",
    "log_class_name" => "c.b.s.s.c.u.helper.AsyncDetectHelper(FileLog.java:47)",
       "raw_message" => "2020-03-12 00:02:24.256 INFO 196145 --- [main] o.s.s.concurrent.ThreadPoolTaskExecutor(ExecutorConfigurationSupport.java:171) system self : Initializing ExecutorService 'applicationTaskExecutor'",
     "log_mock_user" => "self",
   "log_thread_name" => "ServerDetect-thread",
          "log_time" => "2020-03-12 00:02:24.256",
         "log_level" => "INFO",
          "hostname" => "my-service.host1",
         "file_path" => "/my/service/path/service-info.log.2020-03-12",
       "log_message" => "Initializing ExecutorService 'applicationTaskExecutor'",
  "log_current_user" => "system"
}
es
Mapping setup:
POST http://my-es.host/my-elk/_doc/_mapping
{
  "_doc": {
    "properties": {
      "log_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSS"
      },
      "log_level": {
        "type": "keyword"
      },
      "log_pid": {
        "type": "integer"
      },
      "log_class_name": {
        "type": "text"
      },
      "log_current_user": {
        "type": "keyword"
      },
      "log_mock_user": {
        "type": "keyword"
      },
      "log_message": {
        "type": "text"
      },
      "hostname": {
        "type": "text"
      },
      "file_path": {
        "type": "text"
      },
      "raw_message": {
        "type": "text"
      },
      "@timestamp": {
        "type": "date",
        "format": "date_optional_time"
      }
    }
  }
}
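To confirm the mapping is in place (credentials as in the Logstash output section; this is the standard get-mapping API):

curl -u test_user:test_user 'http://my-es.host/my-elk/_mapping?pretty'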
kibana
In Management -> Elasticsearch -> Index Management, check whether the index was created automatically.
In Management -> Kibana -> Index Patterns, create a pattern, setting the index-name prefix rule and the time field.
In Management -> Kibana -> Advanced Settings, set "Timezone for date formatting" back to the default "Browser". Mine was UTC when I first opened it; if you leave it unchanged, timestamps display as UTC when you search logs.
After that, all that's left is searching the logs.
Pitfalls Encountered
Version Mismatches
Straight to the conclusions:
- Keep the three ELK components on the same version wherever possible (they are built as one suite, so of course they should share a version). With mismatched Logstash and ES versions, we once hit HTTP 406 errors when outputting to ES.
- Logstash reads and writes Kafka through plugins, so the plugin version must be compatible with your Kafka version; otherwise you may hit authentication errors. The installed plugin versions can be checked as shown below.
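To see which Kafka plugin versions a Logstash installation ships with (logstash-plugin is bundled with Logstash; the grep is just a filter):

bin/logstash-plugin list --verbose | grep -i kafka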
logstash
Logstash has quite a few pitfalls. Grok rules (the Grok Debugger is recommended) and date rules both require reading the official documentation yourself. When you hit a problem, it's fine to skim Stack Overflow, CSDN, and the like first, but if the question isn't exactly yours or the answer isn't immediately obvious, Google the keywords and go straight to the relevant official docs instead of wasting time on other people's problems.
There's another pitfall when Logstash creates ES indices: the date in the index name is computed from UTC, so there's an 8-hour offset (for UTC+8). The maintainers have said this is by design and won't be changed; see this discussion, which switches between Chinese and English partway through and descends into chaos. One common workaround is sketched below.
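A commonly used workaround (a sketch assuming a UTC+8 deployment, not part of the original setup): compute a local-date field in a ruby filter and use it in the index name instead of the built-in %{+YYYY-MM-dd}. The field name index_day is hypothetical:

filter {
  ruby {
    # shift @timestamp to UTC+8 purely for index naming
    code => "event.set('index_day', event.get('@timestamp').time.localtime('+08:00').strftime('%Y-%m-%d'))"
  }
}
output {
  elasticsearch {
    index => "my-index-elk_%{index_day}"
  }
}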
es
With ES, the main pitfall was the mapping-creation API: for example, if the hierarchy in the URL doesn't line up with the hierarchy of the JSON body, you get all sorts of baffling errors; see the illustration below.
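For illustration, these are the two forms I believe ES 6 accepts for the index and type above; the type named in the URL and the top-level key of the body must refer to the same type, otherwise the request is rejected:

PUT /my-elk/_mapping/_doc
{ "properties": { "log_level": { "type": "keyword" } } }

POST /my-elk/_doc/_mapping
{ "_doc": { "properties": { "log_level": { "type": "keyword" } } } }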
kibana
Kibana caused no real trouble; just make sure to change the timezone setting, or the timestamps are unusable.