Environment
filebeat 7.10
kafka 2.1
elasticsearch 7.4.2
windows 10
Requirements
A Java program produces JSON-formatted logs and sends them to Kafka; Filebeat then consumes the logs from Kafka and stores them in Elasticsearch.
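For reference, here is a minimal sketch of the producer side using the plain Kafka Java client. The topic name test and broker address host:9092 match the configuration below; the start_time field is the one the Filebeat timestamp processor will read later:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonLogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address; must match the listeners/advertised.listeners set below
        props.put("bootstrap.servers", "host:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One JSON log line; start_time will later be moved into @timestamp by Filebeat
            String json = "{\"level\":\"INFO\",\"msg\":\"user login\",\"start_time\":\"2019-06-22 16:33:51.111\"}";
            producer.send(new ProducerRecord<>("test", json));
        } // close() flushes any buffered records
    }
}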
Setup
kafka
Kafka must have listeners set in its server.properties config file before it can be reached from outside:
listeners=PLAINTEXT://host:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://host:9092
advertised.host.name=host
Here host is the IP of the server Kafka runs on; if everything is on the same machine, localhost works.
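To quickly verify that the broker is reachable, you can push a test message with the console producer that ships with Kafka (on Windows the scripts live under bin\windows; the D:\software\kafka path here is just an example install location):

PS D:\software\kafka> .\bin\windows\kafka-console-producer.bat --broker-list host:9092 --topic test
{"level":"INFO","msg":"hello","start_time":"2019-06-22 16:33:51.111"}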
Note that consuming messages from Kafka with Filebeat puts requirements on the Kafka version: the Filebeat 7.10 documentation states support for Kafka 0.11 through 2.1, yet with a 0.11 broker the following error appears:
[2020-12-16 10:32:25,863] ERROR Closing socket for 127.0.0.1:9092-127.0.0.1:2014 because of error (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Error getting request for apiKey: 3 and apiVersion: 5
Caused by: java.lang.IllegalArgumentException: Invalid version for API key METADATA: 5
at org.apache.kafka.common.protocol.ApiKeys.schemaFor(ApiKeys.java:173)
at org.apache.kafka.common.protocol.ApiKeys.requestSchema(ApiKeys.java:141)
at org.apache.kafka.common.protocol.ApiKeys.parseRequest(ApiKeys.java:149)
at org.apache.kafka.common.requests.AbstractRequest.getRequest(AbstractRequest.java:112)
at kafka.network.RequestChannel$Request.liftedTree2$1(RequestChannel.scala:99)
at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:93)
at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:517)
at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:510)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.network.Processor.processCompletedReceives(SocketServer.scala:510)
at kafka.network.Processor.run(SocketServer.scala:436)
at java.lang.Thread.run(Thread.java:748)
Only after switching to Kafka 2.1 did everything work; a lot of time wasted…
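To confirm which broker version you actually have, most Kafka CLI tools accept a --version flag (added in Kafka 2.0, so it is only available after the upgrade):

PS D:\software\kafka> .\bin\windows\kafka-topics.bat --version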
filebeat
Configure the kafka input in filebeat.yml:
filebeat.inputs:
- type: kafka
  enabled: true
  hosts:
    - "host:9092"
  topics: ["test"]        # Kafka topic(s) to subscribe to
  group_id: "filebeat"
  # Index name to use when writing to ES
  index: "%{[agent.name]}-%{[agent.version]}-normal-%{+yyyy.MM.dd}"

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["host:9200"]

processors:
  # The Kafka message payload lands in the message field; this processor parses the JSON out of it
  - decode_json_fields:
      fields: ["message"]
      process_array: true
      max_depth: 1
      target: ""
      overwrite_keys: true
      add_error_key: true
  # The two processors below are optional; adjust them to your own needs.
  # Copy the time in the JSON start_time field into @timestamp
  - timestamp:
      field: start_time
      # Interpret log times as China Standard Time (UTC+8)
      timezone: Asia/Shanghai
      # Reference layouts describing the format of start_time
      layouts:
        # - '2006-01-02 15:04:05'
        - '2006-01-02 15:04:05.999'
      test:
        - '2019-06-22 16:33:51.111'
  # Drop redundant fields
  - drop_fields:
      fields: ["log","host","input","agent","ecs","start_time","kafka"]
      ignore_missing: false
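To illustrate what the processors do, suppose Kafka delivers the JSON line from the producer sketch above. Filebeat wraps the raw payload in a message field; decode_json_fields parses it into top-level fields, the timestamp processor moves start_time into @timestamp (16:33:51.111 in UTC+8 becomes 08:33:51.111 UTC), and drop_fields strips the metadata, so the document stored in ES ends up roughly like:

{
  "@timestamp": "2019-06-22T08:33:51.111Z",
  "level": "INFO",
  "msg": "user login"
}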
With the config file in place, run Filebeat from its install directory:
PS D:\software\ES\Filebeat> .\filebeat -e -c filebeat_kafka_to_es.yml
ElasticSearch
Once the producer writes logs to Kafka, you can see in Kibana that a new index has been created in ES.
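Besides Kibana, you can also list indices from the command line via Elasticsearch's _cat API (any HTTP client works; curl shown here):

curl "http://host:9200/_cat/indices?v"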
Note:
- If you use Kafka's own console producer and consumer, Chinese characters may show up garbled; data produced from the Java program, however, did not have this problem.