1. Extracting fields
For grok basics, see the earlier post ELK入门(五)——messages和log日志生成时间替换时间戳(grok+data).
2020-11-21 00:19:47,459 INFO org.Executor: Deleting absolute path: [nice, -n, 0, bash, /sdada/appcache/application_315146546_156/contanier_fd4156_fdsafasdfasd/fsfs_dsa.sh]
2020-11-21 00:19:47,459 INFO org.Executor: launchContanier: [nice, -n, 0, bash, /sdada/appcache/application_315146546_156/contanier_fd4156_fdsafasdfasd/fsfs_dsa.sh]
The two lines above are the statements I want to match (much of the content is random keyboard filler; only the information to be matched is real). The fields I plan to extract are:
logdate: 2020-11-21 00:19:47,459
info: INFO
opration: Deleting/launchContanier
application: application_315146546_156
Online grok debugger: https://www.5axxw.com/tools/v2/grok.html
After debugging, the pattern I settled on is:
%{TIMESTAMP_ISO8601:logdate} %{LOGLEVEL:info} (.*?Executor): %{WORD:opration}(.*?appcache/)%{WORD:application}
Here (.*?appcache/) matches anything lazily up to the first occurrence of appcache/, which is very handy when the line contains a lot of irrelevant text.
If you also want to keep that intermediate text, add ?<fieldname> right after the opening parenthesis, which assigns the skipped text to a new field. In the statement below, the text between the log level and Executor is assigned to field x; field y works the same way:
%{TIMESTAMP_ISO8601:logdate} %{LOGLEVEL:info} (?<x>.*?Executor): %{WORD:opration}(?<y>.*?appcache/)%{WORD:application}
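As a quick offline sanity check, the same lazy-match-with-named-groups idea can be reproduced with Python's re module. The simplified character classes below merely stand in for grok's TIMESTAMP_ISO8601 and LOGLEVEL definitions (they are assumptions, not the exact grok regexes), but the x/y lazy captures behave the same way:

```python
import re

# Python re equivalent of the grok pattern above, for offline testing.
# Field names mirror the grok captures (logdate, info, opration, application).
pattern = re.compile(
    r"(?P<logdate>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "  # ~TIMESTAMP_ISO8601
    r"(?P<info>[A-Z]+) "                                        # ~LOGLEVEL
    r"(?P<x>.*?Executor): "                                     # lazy skip up to "Executor"
    r"(?P<opration>\w+)"
    r"(?P<y>.*?appcache/)"                                      # lazy skip up to "appcache/"
    r"(?P<application>\w+)"
)

line = ("2020-11-21 00:19:47,459 INFO org.Executor: Deleting absolute path: "
        "[nice, -n, 0, bash, /sdada/appcache/application_315146546_156/"
        "contanier_fd4156_fdsafasdfasd/fsfs_dsa.sh]")

m = pattern.search(line)
print(m.group("logdate"))      # 2020-11-21 00:19:47,459
print(m.group("x"))            # org.Executor
print(m.group("opration"))     # Deleting
print(m.group("application"))  # application_315146546_156
```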
2. Generating the pipeline and configuring filebeat.yml
2.1 Generating the pipeline
For changing the timestamp via a pipeline, see the earlier post ELK入门(十三)——filebeat实现时间戳更改(利用pipeline).
# Extract the Hadoop timestamp, application id, and log level (INFO/WARN)
PUT _ingest/pipeline/hadoop
{
  "description": "Extract the Hadoop time, application, and information type",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{TIMESTAMP_ISO8601:logdate} %{LOGLEVEL:info} (.*?Executor): %{WORD:opration}(.*?appcache/)%{WORD:application}"
        ],
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": "@timestamp"
      }
    },
    {
      "date": {
        "field": "logdate",
        "timezone": "Asia/Shanghai",
        "formats": [
          "yyyy-MM-dd HH:mm:ss,SSS"
        ],
        "ignore_failure": true
      }
    }
  ]
}
GET _ingest/pipeline/hadoop   # inspect the pipeline

Note two fixes over my first attempt: each entry in the processors array must contain exactly one processor, so grok and remove are split into separate objects; and the date format uses lowercase yyyy, since uppercase YYYY is the week-based year in Java time patterns and can produce wrong dates around year boundaries.
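Before pointing Filebeat at it, the pipeline can be sanity-checked with the _simulate API. A sketch (the sample document reuses the log line from section 1; the @timestamp value stands in for the one Filebeat normally adds, so the remove processor has something to delete):

```
POST _ingest/pipeline/hadoop/_simulate
{
  "docs": [
    {
      "_source": {
        "@timestamp": "2020-11-21T00:00:00.000Z",
        "message": "2020-11-21 00:19:47,459 INFO org.Executor: Deleting absolute path: [nice, -n, 0, bash, /sdada/appcache/application_315146546_156/contanier_fd4156_fdsafasdfasd/fsfs_dsa.sh]"
      }
    }
  ]
}
```

The response should show logdate, info, opration, and application in each document's _source, and @timestamp rewritten from logdate.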
2.2 Configuring filebeat.yml
vim /data/elk-ayers/filebeat-7.10.1/filebeat.yml
# Inputs for the Hadoop logs
filebeat.inputs:
- type: log
  paths:
    - /root/log/*datanode*.log.*
  pipeline: hadoop
  fields:
    index: 'datanode'
- type: log
  paths:
    - /root/log/*nodemanager*.log.*
  pipeline: hadoop
  fields:
    index: 'nodemanager'

# Output
output.elasticsearch:
  hosts: ["localhost:9200"]
  indices:
    - index: "myx-test-datanode"
      when.equals:
        fields.index: "datanode"
    - index: "myx-test-nodemanager"
      when.equals:
        fields.index: "nodemanager"
Then start Filebeat (e.g. ./filebeat -e -c filebeat.yml from the install directory).
3. Viewing in Kibana
Looking at the ingested data, the new fields have been generated successfully.
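The same check can be made from the Dev Tools console with a quick query against one of the new indices (the index name comes from the filebeat.yml above):

```
GET myx-test-datanode/_search
{
  "size": 1,
  "_source": ["logdate", "info", "opration", "application"]
}
```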
Caveats
One caveat about filtering with a pipeline: if the input mixes in lines of a different format, documents that fail the pipeline's grok/date processing end up without a @timestamp (the original one was removed by the pipeline), so although they are still indexed in ES, Kibana's time-based views will not display them. An alternative is to derive new fields from already-indexed data using Kibana's built-in scripted fields.
For scripted fields, see the post ELK入门(十六)——Kibana-Painless-Scripts-Fields, which extracts and processes existing index fields to generate new ones.
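As a sketch of that approach, a Kibana scripted field (type: string) could pull the application id straight out of the raw message with a Painless regex. This assumes the index has a message.keyword sub-field and that script.painless.regex.enabled: true is set in elasticsearch.yml; both are assumptions, not something shown in this post:

```
// Hypothetical scripted field: extract the application id from the raw message.
if (!doc.containsKey('message.keyword') || doc['message.keyword'].empty) {
  return "";
}
def m = /application_\d+_\d+/.matcher(doc['message.keyword'].value);
return m.find() ? m.group(0) : "";
```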
Reference posts: