Flume Interceptors

红枣枸杞配茶

Tags: flume, big data

Article link: https://blog.csdn.net/weixin_45151645/article/details/121970787

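This article looks at the role of Flume interceptors in processing big-data streams, especially filtering out records that are not valid JSON. The first-layer Flume agent uses a TAILDIR source with a custom interceptor to drop non-JSON data, and a Kafka channel to improve efficiency. The article also describes the different problems the Kafka channel has in Flume 1.6 and 1.7. The second-layer agent uses a Kafka source and a timestamp interceptor to solve the midnight-drift problem (records generated just before midnight landing in the next day's directory), and writes the data to HDFS through a file channel, setting useLocalTimeStamp and the codec so that files are written according to the event timestamp with a suitable compression format.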

(Summary generated automatically by CSDN.)
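The timestamp interceptor mentioned in the summary can be illustrated with a standalone Java sketch. It assumes each log line records its generation time as epoch milliseconds in a field named ts and that the date directories use the Asia/Shanghai zone; both are hypothetical choices, as are the class and method names. The sketch copies ts into the event's timestamp header, which the HDFS sink uses to resolve %Y-%m-%d in its path when hdfs.useLocalTimeStamp is false, so a record generated at 23:59:59.900 stays in that day's directory even if the second-layer agent handles it after midnight.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical standalone sketch of a timestamp interceptor's core logic (no Flume dependency). */
public class EventTimeStamper {
    // Assumption: each log line carries its generation time as epoch millis in a "ts" field.
    // A regex keeps the sketch dependency-free; unlike a JSON parser it would also match a nested "ts".
    private static final Pattern TS = Pattern.compile("\"ts\"\\s*:\\s*(\\d+)");
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    /** Returns a copy of the headers with "timestamp" set from the body's ts; unchanged if ts is absent. */
    public static Map<String, String> stamp(Map<String, String> headers, byte[] body) {
        Map<String, String> out = new HashMap<>(headers);
        Matcher m = TS.matcher(new String(body, StandardCharsets.UTF_8));
        if (m.find()) {
            out.put("timestamp", m.group(1));
        }
        return out;
    }

    /** The yyyy-MM-dd directory name for this instant in the given zone. */
    public static String partitionDay(long epochMillis, ZoneId zone) {
        return DAY.withZone(zone).format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Asia/Shanghai"); // assumed cluster time zone
        // A record generated 0.1 s before midnight...
        long generated = ZonedDateTime.of(2021, 12, 14, 23, 59, 59, 900_000_000, zone)
                .toInstant().toEpochMilli();
        // ...but handled by the second-layer agent 1 s later.
        long processed = generated + 1_000;
        byte[] body = ("{\"ts\":" + generated + ",\"page\":\"home\"}").getBytes(StandardCharsets.UTF_8);
        Map<String, String> headers = stamp(new HashMap<>(), body);
        System.out.println(partitionDay(processed, zone));                                 // 2021-12-15 (drifted)
        System.out.println(partitionDay(Long.parseLong(headers.get("timestamp")), zone)); // 2021-12-14 (event time)
    }
}
```

A real interceptor would do the same work in Interceptor.intercept(Event), writing into event.getHeaders() rather than returning a copy.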

On Flume interceptors:

Official Flume documentation: https://flume.apache.org/documentation.html

When the pipeline is built as Flume → Kafka → Flume:

First-layer Flume agent:

# Name the components
a1.sources = r1
a1.channels = c1

# Describe the source
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /opt/module/applog/log/app.*
a1.sources.r1.positionFile = /opt/module/flume/taildir_position.json
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.atguigu.ETLInterceptor$MyBuilder

# Describe the channel
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = hadoop102:9092,hadoop103:9092,hadoop104:9092
a1.channels.c1.kafka.topic = topic_log
# false: write only the event body to Kafka, not an Avro-wrapped Flume event with headers
a1.channels.c1.parseAsFlumeEvent = false

# Bind the source to the channel (this agent has no sink; the Kafka channel writes straight to Kafka)
a1.sources.r1.channels = c1
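The config refers to com.atguigu.ETLInterceptor$MyBuilder, whose source is not included here. The $MyBuilder suffix points at a nested class implementing Flume's Interceptor.Builder, which Flume instantiates by class name and then asks to build() the interceptor. Assuming the interceptor's job is the one the summary describes (drop every event whose body is not valid JSON), its core check can be sketched without Flume on the classpath. JsonEtlFilter is a hypothetical stand-in, and the small hand-written validator replaces whatever JSON library the real interceptor uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical standalone version of the ETL check: keep only bodies that are well-formed JSON. */
public class JsonEtlFilter {

    /** True if the UTF-8 body is exactly one well-formed JSON value (surrounding whitespace allowed). */
    public static boolean isValidJson(byte[] body) {
        if (body == null) return false;
        Parser p = new Parser(new String(body, StandardCharsets.UTF_8));
        try {
            p.skipWs();
            p.value();
            p.skipWs();
            return p.pos == p.s.length(); // reject trailing garbage, e.g. two records glued together
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Mirrors intercept(List<Event>): invalid bodies are dropped, valid ones keep their order. */
    public static List<byte[]> filter(List<byte[]> bodies) {
        return bodies.stream().filter(JsonEtlFilter::isValidJson).collect(Collectors.toList());
    }

    /** Minimal recursive-descent validator: checks syntax only, builds no values. */
    private static final class Parser {
        final String s;
        int pos = 0;
        Parser(String s) { this.s = s; }

        static IllegalArgumentException fail(String why) { return new IllegalArgumentException(why); }
        static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
        boolean at(char c) { return pos < s.length() && s.charAt(pos) == c; }
        void expect(char c) { if (peek() != c) throw fail("expected '" + c + "'"); pos++; }
        void skipWs() { while (pos < s.length() && " \t\r\n".indexOf(s.charAt(pos)) >= 0) pos++; }

        char peek() {
            if (pos >= s.length()) throw fail("unexpected end of input");
            return s.charAt(pos);
        }

        void value() {
            char c = peek();
            if (c == '{') members('}', true);        // object: "key": value pairs
            else if (c == '[') members(']', false);  // array: bare values
            else if (c == '"') string();
            else if (c == '-' || isDigit(c)) number();
            else if (s.startsWith("true", pos)) pos += 4;
            else if (s.startsWith("false", pos)) pos += 5;
            else if (s.startsWith("null", pos)) pos += 4;
            else throw fail("unexpected character '" + c + "'");
        }

        void members(char close, boolean keyed) {
            pos++; // consume the opening '{' or '['
            skipWs();
            if (at(close)) { pos++; return; }
            while (true) {
                skipWs();
                if (keyed) { string(); skipWs(); expect(':'); skipWs(); }
                value();
                skipWs();
                if (at(',')) { pos++; continue; }
                expect(close);
                return;
            }
        }

        void string() {
            expect('"');
            while (true) {
                char c = peek();
                pos++;
                if (c == '"') return;
                if (c < 0x20) throw fail("raw control character in string");
                if (c != '\\') continue;
                char e = peek();
                pos++;
                if (e == 'u') {
                    for (int i = 0; i < 4; i++, pos++)
                        if ("0123456789abcdefABCDEF".indexOf(peek()) < 0) throw fail("bad unicode escape");
                } else if ("\"\\/bfnrt".indexOf(e) < 0) {
                    throw fail("bad escape");
                }
            }
        }

        void number() {
            if (at('-')) pos++;
            if (at('0')) pos++; else digits(); // a lone leading 0, so "01" fails at the next token
            if (at('.')) { pos++; digits(); }
            if (at('e') || at('E')) { pos++; if (at('+') || at('-')) pos++; digits(); }
        }

        void digits() {
            int start = pos;
            while (pos < s.length() && isDigit(s.charAt(pos))) pos++;
            if (pos == start) throw fail("expected a digit");
        }
    }
}
```

Inside Flume, intercept(Event) would return null for an invalid body and intercept(List<Event>) would return only the surviving events; the Interceptor contract allows events to be modified or removed, never added.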

taildir source:

Format of the data stored in taildir_position.json:
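The Flume user guide describes positionFile as a JSON file recording, for each tailed file, its inode, its absolute path and the last position read; this is what lets the TAILDIR source resume after a restart instead of re-reading or skipping data. A minimal reader for that layout might look like the sketch below. It assumes a flat JSON array of objects with inode, pos and file keys and paths that contain no quotes or braces; PositionFileReader is a hypothetical name, and regular expressions stand in for a real JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader for a TAILDIR position file (sketch, regex-based). */
public class PositionFileReader {

    /** One tracked file: its inode, the byte offset already consumed, and its absolute path. */
    public static final class Entry {
        public final long inode;
        public final long pos;
        public final String file;

        Entry(long inode, long pos, String file) {
            this.inode = inode;
            this.pos = pos;
            this.file = file;
        }
    }

    // Assumes each record is a flat {...} object with no nested braces.
    private static final Pattern OBJECT = Pattern.compile("\\{[^{}]*\\}");
    private static final Pattern INODE = Pattern.compile("\"inode\"\\s*:\\s*(\\d+)");
    private static final Pattern POS = Pattern.compile("\"pos\"\\s*:\\s*(\\d+)");
    // Assumes the path contains no escaped quotes.
    private static final Pattern FILE = Pattern.compile("\"file\"\\s*:\\s*\"([^\"]*)\"");

    /** Parses every complete record, in any key order; records missing a key are skipped. */
    public static List<Entry> parse(String json) {
        List<Entry> entries = new ArrayList<>();
        Matcher obj = OBJECT.matcher(json);
        while (obj.find()) {
            String record = obj.group();
            Matcher i = INODE.matcher(record);
            Matcher p = POS.matcher(record);
            Matcher f = FILE.matcher(record);
            if (i.find() && p.find() && f.find()) {
                entries.add(new Entry(Long.parseLong(i.group(1)), Long.parseLong(p.group(1)), f.group(1)));
            }
        }
        return entries;
    }
}
```

In this reading, an entry with pos 12 means the first 12 bytes of that file have already been consumed and will not be sent again.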
