Filtering log files with a custom Flume interceptor

Article link: https://blog.csdn.net/dong7236983723698/article/details/125121597

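This post describes how to use Flume 1.9.0 to build a pipeline that ships logs to Kafka. The focus is a custom interceptor that forwards only WARN-level log entries to Kafka. The steps cover installing Flume, writing the configuration file, using Kafka commands to simulate a consumer, and the interceptor code with its POM file.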


Background

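I needed to set up Flume in a test environment to filter logs and sync them to Kafka. These notes simplify the log-sync part and focus on wiring several Flume components together. The version used here is Flume 1.9.0.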
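Before touching Flume, the filtering rule itself can be pictured in plain Java. The following is only a minimal sketch and not the interceptor used in this post: the class name `WarnLineFilter` and both method names are made up for illustration, and it assumes the log level appears as a whitespace-delimited token `WARN` in each line. Inside Flume, the same decision would be made in an interceptor's `intercept(Event)` method, where returning null drops the event.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: models only the "keep WARN, drop the rest" decision.
final class WarnLineFilter {

    // Assumption: the level is a standalone token, e.g. "... WARN message".
    // A format such as "[WARN]" would need a different check.
    static boolean isWarn(String line) {
        for (String token : line.trim().split("\\s+")) {
            if (token.equals("WARN")) {
                return true;
            }
        }
        return false;
    }

    // Returns the lines that pass the filter, in their original order.
    static List<String> keepWarn(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (isWarn(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2022-06-04 10:00:00 INFO service started",
                "2022-06-04 10:00:01 WARN disk usage above 80%",
                "2022-06-04 10:00:02 ERROR request failed");
        // Only the WARN line is printed.
        for (String line : keepWarn(sample)) {
            System.out.println(line);
        }
    }
}
```

Matching on a whole token rather than a substring keeps words such as WARNING or WARNED in the message text from being mistaken for the level.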

Install Flume

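mkdir -p /root/install
cd /root/install
# closer.lua returns an HTML mirror page, so download the tarball from the Apache archive
wget https://archive.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
tar -zxvf apache-flume-1.9.0-bin.tar.gz
mv apache-flume-1.9.0-bin flume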

Command to start Flume. The name a1 passed to -n is the name of this Flume agent, and it must match the prefix used in the flume.properties configuration.

cd /root/install
./flume/bin/flume-ng agent -c flume/conf/ -f flume.properties  -n a1 -Dflume.root.logger=INFO,console

The configuration file is located at /root/install/flume.properties
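To make the link between the -n value and the configuration file concrete, here is a small sketch in plain Java. It is not Flume's own loading code: the class name `AgentNameDemo` and the method `keysForAgent` are hypothetical, and the sketch only shows that an agent name selects the property keys carrying that name as a prefix.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo: which property keys belong to a given agent name.
final class AgentNameDemo {

    // Collects the keys that start with "<agentName>." in sorted order.
    static Set<String> keysForAgent(Properties props, String agentName) {
        Set<String> keys = new TreeSet<>();
        String prefix = agentName + ".";
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                keys.add(key);
            }
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        // The first three lines of the configuration used in this post.
        Properties props = new Properties();
        props.load(new StringReader(
                "a1.sources = r1\n"
                + "a1.sinks = sk1\n"
                + "a1.channels = c1\n"));

        System.out.println(keysForAgent(props, "a1")); // [a1.channels, a1.sinks, a1.sources]
        System.out.println(keysForAgent(props, "a2")); // []
    }
}
```

Starting the agent with a name that does not appear as a prefix in the file (a2 in the sketch) would leave it with no sources, channels, or sinks to run, so the two names have to agree.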

Configuration details

# Declare the Flume components
a1.sources = r1
a1.sinks = sk1
a1.channels = c1

# Source that tails files and picks up changes
a1.sources.r1.type = TAILDIR
# Position file that stores the read offsets
a1.sources.r1.positionFile = /root/install/flume/taildir_position.json
# File groups; more than one can be defined
a1.sources.r1.filegroups = f1
# Group f1 watches the files under /app/logs whose names match the regex .*log
a1.sources.r1.filegroups.f1 = /app/logs/.*log

# Configure the interceptor
a1.sources.r1.interceptors= i1

# This points to the Builder of the interceptor class from the code later in this post
a1.sources.r1.interceptors.i1.type=com.test.flume.intercept.TestIntercept$Builder
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=topic
a1.source