The installation and deployment of Elasticsearch 7.5.1, Kibana 7.5.1, and Logstash 7.15.1 are skipped here.
1. Configuration file 1
input {
    file {
        start_position => "end"
        path => "E:/home/wxp/box/task/box-task-info.log"
        type => "type1"    ### used by the output stage to decide which ES index to write to
        codec => multiline {
            ### with negate => true, lines that do NOT match the pattern are merged
            negate => true
            ### the match regex
            pattern => "(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}) *"
            ### merge non-matching lines into the previous or next event; "previous" appends them to the preceding matching line
            what => "previous"
            max_lines => 1000    ### maximum number of lines per event
            max_bytes => "10MiB"    ### maximum size per event
            auto_flush_interval => 30    ### flush the pending event if no new lines arrive within this interval
        }
    }
}
output {
    stdout {}
    elasticsearch {
        # ES address; multiple hosts are allowed
        hosts => ["localhost:9200"]
        action => "index"
        # target index; created automatically if it does not exist (requires ES to allow automatic index creation)
        index => "test_log_index"
    }
}
- start_position => "end" means reading starts from the end of the file;
- type => "type1" adds a type field that later stages can test against;
- pattern => "(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}) *" is the match regex; only lines that begin with a full date-time are treated as the start of a new event;
- auto_flush_interval => 30 is the flush interval: if no new lines arrive within 30 seconds, the pending multiline event is emitted instead of waiting for more input.
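With negate => true and what => "previous", any line that does not begin with a date-time (a stack trace, for example) is appended to the event started by the last matching line. A minimal illustration (the class names are made up):

2020-02-19 17:09:31.829 [main] ERROR c.f.d.SomeService - call failed
java.lang.NullPointerException: null
    at com.foo.SomeService.run(SomeService.java:42)
2020-02-19 17:09:32.001 [main] INFO c.f.d.SomeService - retrying

These four physical lines produce two Logstash events; the first event's message field contains the exception and stack-trace lines.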
2. Adding a filter
input {
    file {
        start_position => "end"
        path => "E:/home/wxp/box/task/box-task-info.log"
        type => "type1"    ### used by the output stage to decide which ES index to write to
        codec => multiline {
            ### with negate => true, lines that do NOT match the pattern are merged
            negate => true
            ### the match regex
            pattern => "(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}) *"
            ### merge non-matching lines into the previous or next event; "previous" appends them to the preceding matching line
            what => "previous"
            max_lines => 1000    ### maximum number of lines per event
            max_bytes => "10MiB"    ### maximum size per event
            auto_flush_interval => 30    ### flush the pending event if no new lines arrive within this interval
        }
    }
}
filter {
    grok {
        match => { "message" => "(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}) *" }
    }    ### use grok to extract the datetime field from the message
    date {
        match => ["datetime", "yyyy-MM-dd HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss.SSSZ"]
        target => "@timestamp"
    }    ### parse datetime and write it into @timestamp
}
output {
    stdout {}
    elasticsearch {
        # ES address; multiple hosts are allowed
        hosts => ["localhost:9200"]
        action => "index"
        # target index; created automatically if it does not exist (requires ES to allow automatic index creation)
        index => "test_log_index"
    }
}
Here the timestamp is extracted in the simplest possible way.
3. Start Logstash and test
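Assuming the configuration above is saved as test.conf (the file name is arbitrary), Logstash can be started from its installation directory with:

bin/logstash -f test.conf

On Windows, use bin\logstash.bat instead.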
Append the following lines to the end of the log file:
2020-02-19 17:09:31.829 [main] INFO o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]
2020-02-19 17:09:31.829 [main] INFO o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]
Console output:
{
    "datetime" => "2020-02-19 17:09:31.829",
    "@timestamp" => 2020-02-19T09:09:31.829Z,
    "type" => "type1",
    "path" => "E:/home/wxp/box/task/box-task-info.log",
    "message" => "2020-02-19 17:09:31.829 [main] INFO o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]\r",
    "@version" => "1",
    "host" => "DESKTOP-O93E7VQ"
}
{
    "datetime" => "2020-02-19 17:09:31.829",
    "@timestamp" => 2020-02-19T09:09:31.829Z,
    "type" => "type1",
    "path" => "E:/home/wxp/box/task/box-task-info.log",
    "message" => "2020-02-19 17:09:31.829 [main] INFO o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]\r",
    "@version" => "1",
    "host" => "DESKTOP-O93E7VQ"
}
As you can see, @timestamp is now extracted from the log line itself rather than taken from the system clock: the local time 17:09:31 (+08:00) is stored as 09:09:31 UTC.
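By default the date filter converts to UTC using the machine's local time zone. If the logs were written in a different zone, the filter's timezone option pins it down explicitly; a sketch assuming the logs use Asia/Shanghai time:

date {
    match => ["datetime", "yyyy-MM-dd HH:mm:ss.SSS"]
    target => "@timestamp"
    timezone => "Asia/Shanghai"
}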
4. Grok matching rules in the filter
Adding filters lets you parse log lines into structured fields.
Typical grok match formats:
- %{NUMBER:duration}: matches a number, such as a float
- %{IP:client}: matches an IP address
- (?<field_name>regex): a custom regex with a named capture group
- (?<class_info>([\S+]*)): a custom regex matching a run of characters
- \s* or \s+: matches runs of whitespace
- \S+ or \S*: matches runs of non-whitespace characters
- the :xxx inside the braces names the captured field
- %{UUID}: matches values like 091ece39-5444-44a1-9f1e-019a17286b48
- %{WORD}: matches a single word (e.g., an HTTP request method)
- %{GREEDYDATA}: matches all remaining data
- %{LOGLEVEL:loglevel}: matches a log level
- custom pattern definitions (see the sketch below)
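Custom patterns can be kept in a pattern file and referenced by name. A minimal sketch, assuming a file ./patterns/custom (the directory and the pattern name MYDATETIME are both made up here):

# ./patterns/custom: one definition per line, in the form NAME regex
MYDATETIME \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}

grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{MYDATETIME:datetime} %{GREEDYDATA:msg}" }
}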
Kibana's Dev Tools includes a debugging tool called "Grok Debugger": enter sample data and a match pattern, and it displays the match result automatically.
Example 1
2020-03-18 14:04:23.944 [DubboServerHandler-10.50.245.25:63046-thread-168] INFO c.f.l.d.LogTraceDubboProviderFilter - c79b0905-03c7-4e54-a5a6-ff1b34058cdf CALLEE_IN dubbo:EstatePriceService.listCellPrice
\s*%{TIMESTAMP_ISO8601:timestamp} \s*\[%{DATA:current_thread}\]\s*%{LOGLEVEL:loglevel}\s*(?<class_info>([\S+]*))
{
    "current_thread": "DubboServerHandler-10.50.245.25:63046-thread-168",
    "loglevel": "INFO",
    "class_info": "c.f.l.d.LogTraceDubboProviderFilter",
    "timestamp": "2020-03-18 14:04:23.944"
}
Example 2
2020-12-04 14:16:30.003 INFO 19095 --- [pool-4-thread-1] com.yck.laochangzhang.task.EndorseTask : 平台发起背书结束.....
(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3})\s+(?<level>\w+)\s+\d+\s+-+\s\[[^\[\]]+\]\s+(?<handler>\S+)\s+:(?<msg>.*)
{
    "msg": " 平台发起背书结束.....",
    "handler": "com.yck.laochangzhang.task.EndorseTask",
    "datetime": "2020-12-04 14:16:30.003",
    "level": "INFO"
}
Example 3
2020-02-19 17:09:31.829 [main] INFO o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]
(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3}) \[%{DATA:thread}\] %{LOGLEVEL:loglevel}\s*%{GREEDYDATA:msg} *
{
    "msg": "o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]",
    "datetime": "2020-02-19 17:09:31.829",
    "loglevel": "INFO",
    "thread": "main"
}
5. Extracting the thread and log level, and generating indexes by year and month
input {
    file {
        start_position => "end"
        path => "E:/home/wxp/box/task/box-task-info.log"
        type => "type1"    ### used by the output stage to decide which ES index to write to
        codec => multiline {
            ### with negate => true, lines that do NOT match the pattern are merged
            negate => true
            ### the match regex
            pattern => "(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3})\s*\[%{DATA:thread}\]\s*%{LOGLEVEL:loglevel}\s*%{GREEDYDATA:msg}"
            ### merge non-matching lines into the previous or next event; "previous" appends them to the preceding matching line
            what => "previous"
            max_lines => 1000    ### maximum number of lines per event
            max_bytes => "10MiB"    ### maximum size per event
            auto_flush_interval => 30    ### flush the pending event if no new lines arrive within this interval
        }
    }
}
filter {
    grok {
        match => { "message" => "(?<datetime>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3})\s*\[%{DATA:thread}\]\s*%{LOGLEVEL:loglevel}\s*%{GREEDYDATA:msg}" }
    }    ### use grok to extract datetime, thread, loglevel, and msg from the message
    date {
        match => ["datetime", "yyyy-MM-dd HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss.SSSZ"]
        target => "@timestamp"
    }    ### parse datetime and write it into @timestamp
}
output {
    stdout {}
    elasticsearch {
        # ES address; multiple hosts are allowed
        hosts => ["localhost:9200"]
        action => "index"
        # index name with a year-month suffix; created automatically if it does not exist (requires ES to allow automatic index creation)
        index => "test_log_index-%{+YYYY-MM}"
    }
}
Logstash stdout:
{
    "type" => "type1",
    "datetime" => "2020-02-19 17:09:31.829",
    "@timestamp" => 2020-02-19T09:09:31.829Z,
    "@version" => "1",
    "thread" => "main",
    "host" => "DESKTOP-O93E7VQ",
    "msg" => "o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]\r",
    "path" => "E:/home/wxp/box/task/box-task-info.log",
    "loglevel" => "INFO",
    "message" => "2020-02-19 17:09:31.829 [main] INFO o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]\r"
}
{
    "type" => "type1",
    "datetime" => "2020-02-19 17:09:31.829",
    "@timestamp" => 2020-02-19T09:09:31.829Z,
    "@version" => "1",
    "thread" => "main",
    "host" => "DESKTOP-O93E7VQ",
    "msg" => "o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]\r",
    "path" => "E:/home/wxp/box/task/box-task-info.log",
    "loglevel" => "INFO",
    "message" => "2020-02-19 17:09:31.829 [main] INFO o.a.c.c.StandardService - [log,173] - Stopping service [Tomcat]\r"
}
As you can see, the events now carry loglevel and thread fields, and the index is generated per year and month.
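The generated indexes can be verified in Kibana's Dev Tools; for the sample events above (February 2020), something like the following should appear:

GET _cat/indices/test_log_index-*?v

which would list an index named test_log_index-2020-02.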
6. Matching Spring Boot logs with the grok plugin
input {
    stdin { }
    file {
        # log files under the container's log directory
        path => ["/usr/share/logstash/logs/*.log"]
        # multiline matching, approach 1: a codec on the input
        codec => multiline {
            pattern => "^(%{TIMESTAMP_ISO8601})"
            negate => true
            what => "previous"
        }
        # disable sincedb persistence ("/dev/null" on Linux; use "NUL" on Windows)
        sincedb_path => "/dev/null"
        type => "spring"
        start_position => "beginning"
    }
}
filter {
    if [type] == "spring" {
        # multiline matching, approach 2: a multiline filter
        # multiline {
        #     pattern => "^(%{TIMESTAMP_ISO8601})"
        #     negate => true
        #     what => "previous"
        # }
        grok {
            # Do multiline matching with (?m), as the multiline handling above may add newlines to the log messages.
            match => [ "message", "(?m)^%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{LOGLEVEL:logLevel}%{SPACE}%{NUMBER:pid}%{SPACE}---%{SPACE}%{SYSLOG5424SD:threadName}%{SPACE}%{NOTSPACE:loggerName}%{SPACE}:%{SPACE}%{GREEDYDATA:message}" ]
            # overwrite the original message field with the captured one
            overwrite => [ "message" ]
        }
    }
}
output {
    if [type] == "spring" {
        elasticsearch {
            hosts => ["localhost:9200"]
            index => "springboot-%{+YYYY-MM-dd}"
        }
    }
    stdout { codec => rubydebug }
}
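Since this configuration reads logs from a path inside the container, the official Logstash image can be run with the config and logs mounted in. A sketch, assuming the config is saved as spring.conf and the application logs live in ./logs (both names are arbitrary):

docker run --rm -it \
    -v $(pwd)/spring.conf:/usr/share/logstash/pipeline/spring.conf \
    -v $(pwd)/logs:/usr/share/logstash/logs \
    docker.elastic.co/logstash/logstash:7.15.1

The official image picks up pipeline configs from /usr/share/logstash/pipeline by default.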
The log format being matched:
2020-05-15 17:54:50.805 DEBUG 8296 --- [scheduling-1] org.jooq.tools.LoggerListener : Executing query : select id from user
2020-05-15 17:54:52.945 DEBUG 1012 --- [scheduling-1] org.jooq.tools.LoggerListener : Executing query : select id from user
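Feeding the first sample line and the pattern above into the Grok Debugger should produce roughly the following fields (the result here is reconstructed by hand, so treat it as illustrative):

{
    "timestamp": "2020-05-15 17:54:50.805",
    "logLevel": "DEBUG",
    "pid": "8296",
    "threadName": "[scheduling-1]",
    "loggerName": "org.jooq.tools.LoggerListener",
    "message": "Executing query : select id from user"
}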
The format above is only an example; the log formats in real projects may differ, so adjust the pattern as needed.