How to get flume headers passed through kafka in spark-streaming

a95473004

Article link: https://blog.csdn.net/a95473004/article/details/53896791

What a long title... and what a convoluted one...

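flume + kafka + spark-streaming has arguably become the standard architecture for stream processing.

I won't go into how to integrate the three here.

Instead, let me just post a few configuration files: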

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /work/onedesk/bidlog/apache-flume-1.7.0-bin/taildir_position.json
a1.sources.r1.filegroups = f1 f2 f3
a1.sources.r1.filegroups.f1 = /work/onedesk/bidlog/bid.tmp
a1.sources.r1.headers.f1.topic = bid
a1.sources.r1.filegroups.f2 = /work/onedesk/bidlog/sspbid.tmp
a1.sources.r1.headers.f2.topic = sspbid
a1.sources.r1.filegroups.f3 = /work/onedesk/bidlog/sspclick.tmp
a1.sources.r1.headers.f3.topic = sspclick
a1.sources.r1.fileHeader = true


a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 1000

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
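
The header settings in this agent config can be read roughly as follows. Each TAILDIR file group stamps its events with a static `topic` header. `fileHeader = true` adds the absolute path of the tailed file under the header key `file`, which is the default `fileHeaderKey`. The Kafka sink then publishes an event to the topic named in its `topic` header instead of its own configured topic. The Python below is a toy model of that behavior, not Flume code. `FILE_GROUPS`, `taildir_headers` and `kafka_sink_topic` are hypothetical names. The fallback topic is a plain parameter because the sink's `kafka.topic` setting does not appear in the listing.

```python
# Toy model (NOT Flume code) of how the agent config above tags and routes events.
# FILE_GROUPS, taildir_headers and kafka_sink_topic are hypothetical names.

# filegroup -> (tailed file, static headers from a1.sources.r1.headers.<group>.*)
FILE_GROUPS = {
    "f1": ("/work/onedesk/bidlog/bid.tmp", {"topic": "bid"}),
    "f2": ("/work/onedesk/bidlog/sspbid.tmp", {"topic": "sspbid"}),
    "f3": ("/work/onedesk/bidlog/sspclick.tmp", {"topic": "sspclick"}),
}


def taildir_headers(group, file_header=True, file_header_key="file"):
    """Headers that a line tailed from `group` would carry under this config."""
    path, static_headers = FILE_GROUPS[group]
    headers = dict(static_headers)
    if file_header:  # models a1.sources.r1.fileHeader = true
        headers[file_header_key] = path
    return headers


def kafka_sink_topic(headers, configured_topic):
    """Model of the sink's routing: a 'topic' header overrides the configured topic."""
    return headers.get("topic", configured_topic)
```

For example, `kafka_sink_topic(taildir_headers("f3"), "fallback-topic")` yields `"sspclick"`, so each log file ends up in its own Kafka topic. Only the body reaches Kafka by default, though. The headers themselves are not forwarded.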

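By default, Spark-streaming only receives the text that flume tails, but in some cases we also want Spark to process some of the content in the header. For example, in the configuration above, besides tailing the files we also add timestamp and host information to the header. So how do we pass the header on to Spark?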