spark-streaming: getting the headers passed by Flume

lianchaozhao

Article link: https://blog.csdn.net/weixin_40809627/article/details/86574474

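Environment:
CM (Cloudera Manager) 5.13.0
Flume and Kafka installed automatically through CM
Spark Streaming 2.2.0, installed remotely

Flume + Kafka + Spark Streaming has arguably become the standard architecture for stream processing.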

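I collect data with Flume and forward it to two downstream Flume agents. These two nodes provide load balancing and failover, and the configuration below is the one used on them. For the exact meaning of the remaining settings, refer to the Flume documentation.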

tier1.sources = source1
tier1.channels = mobile schedule nginx bindcarderr kafka-weuser-channel

tier1.sources.source1.type = avro
tier1.sources.source1.bind = 0.0.0.0
tier1.sources.source1.port = 44444
tier1.sources.source1.channels = mobile schedule nginx bindcarderr kafka-weuser-channel
tier1.sources.source1.selector.type = multiplexing
tier1.sources.source1.selector.header = topic
tier1.sources.source1.selector.mapping.mobile = mobile
tier1.sources.source1.selector.mapping.schedule = schedule
tier1.sources.source1.selector.mapping.nginx = nginx
tier1.sources.source1.selector.mapping.bindcarderr = bindcarderr
tier1.sources.source1.selector.mapping.we-user = kafka-weuser-channel

tier1.channels.nginx.type = org.apache.flume.channel.kafka.KafkaChannel
tier1.channels.nginx.parseAsFlumeEvent = true
tier1.channels.nginx.kafka.topic = nginx
tier1.channels.nginx.kafka.consumer.group.id = flume-nginx
tier1.channels.nginx.kafka.consumer.auto.offset.reset = earliest
tier1.channels.nginx.kafka.bootstrap.servers = node1:9092,node2:9092,node3:9092

tier1.channels.bindcarderr.type = org.apache.flume.channel.kafka.KafkaChannel
tier1.channels.bindcarderr.parseAsFlumeEvent = true
tier1.channels.bindcarderr.kafka.topic = bindCardError
tier1.channels.bindcarderr.kafka.consumer.group.id = flume-bindcard-err
tier1.channels.bindcarderr.kafka.consumer.auto.offset.reset = earliest
tier1.channels.bindcarderr.kafka.bootstrap.servers = node1:9092,node2:9092,node3:9092

tier1.channels.kafka-weuser-channel.type = org.apache.flume.channel.kafka.KafkaChannel
tier1.channels.kafka-weuser-channel.parseAsFlumeEvent = false
tier1.channels.kafka-weuser-channel.kafka.topic = we-user
tier1.channels.kafka-weuser-channel.kafka.consumer.group.id = flume-we-user
tier1.channels.kafka-weuser-channel.kafka.consumer.auto.offset.reset = earliest
tier1.channels.kafka-weuser-channel.kafka.bootstrap.servers = node1:9092,node2:9092,node3:9092
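The multiplexing selector above routes each event according to the value of its `topic` header. The Python sketch below mimics that lookup for illustration only and is not Flume code. `route` and `SELECTOR_MAPPING` are hypothetical names, and the dictionary contents are copied from the `selector.mapping.*` lines. If an event's `topic` header is missing or unmapped, this sketch returns an empty list. That empty list stands in for `selector.default`, which this configuration does not set.

```python
# Hypothetical simulation of the multiplexing selector configured above
# (selector.header = topic, one mapping line per header value). Not Flume code.
from typing import Dict, List

SELECTOR_HEADER = "topic"

# Copied from the tier1.sources.source1.selector.mapping.* lines.
# A Flume mapping value may name several channels, hence a list.
SELECTOR_MAPPING: Dict[str, List[str]] = {
    "mobile": ["mobile"],
    "schedule": ["schedule"],
    "nginx": ["nginx"],
    "bindcarderr": ["bindcarderr"],
    "we-user": ["kafka-weuser-channel"],
}


def route(headers: Dict[str, str]) -> List[str]:
    """Return the channels an event with these headers would be sent to.

    Simplification: a missing or unmapped header value returns an empty
    list, standing in for selector.default, which is not configured here.
    """
    value = headers.get(SELECTOR_HEADER)
    return list(SELECTOR_MAPPING.get(value, []))


if __name__ == "__main__":
    print(route({"topic": "we-user"}))  # ['kafka-weuser-channel']
    print(route({"topic": "nginx"}))    # ['nginx']
```

The header value `we-user` is the only one whose channel name differs from the value itself, which is why it needs its own explicit mapping line.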

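Note that the nginx and bindcarderr channels set parseAsFlumeEvent = true. With this setting the KafkaChannel writes each event to Kafka as a serialized Flume event, with headers and body together, so the headers can be recovered downstream. The kafka-weuser-channel channel sets it to false, so only the event body is written to its topic.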

At this point the data is already in Kafka. Next, write code to parse it.

Parsing the records this way yields the Flume data together with its headers. From there, extract the headers and the body separately.
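A minimal, dependency-free Python sketch of that parsing step follows. It assumes that the record value is a single Avro binary datum of Flume's `AvroFlumeEvent` schema (`headers` map of strings, then `body` bytes), which is what a channel with `parseAsFlumeEvent = true` writes. `decode_flume_event` and `_Reader` are hypothetical names, and the sketch decodes only this one schema.

```python
# Minimal sketch: split a Kafka record value written with parseAsFlumeEvent = true
# into (headers, body). Assumes one Avro binary datum of
# AvroFlumeEvent {headers: map<string>, body: bytes}. Names are hypothetical.
from typing import Dict, Tuple


class _Reader:
    """Cursor over a byte string that reads Avro binary primitives."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def read_long(self) -> int:
        shift = 0
        z = 0
        while True:
            if self.pos >= len(self.data):
                raise ValueError("truncated Avro long")
            b = self.data[self.pos]
            self.pos += 1
            z |= (b & 0x7F) << shift
            if not b & 0x80:  # high bit clear: last 7-bit group
                break
            shift += 7
        return (z >> 1) ^ -(z & 1)  # undo zig-zag encoding

    def read_bytes(self) -> bytes:
        n = self.read_long()
        if n < 0 or self.pos + n > len(self.data):
            raise ValueError("invalid Avro bytes length")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def read_string(self) -> str:
        return self.read_bytes().decode("utf-8")


def decode_flume_event(value: bytes) -> Tuple[Dict[str, str], bytes]:
    r = _Reader(value)
    headers: Dict[str, str] = {}
    # An Avro map is a series of blocks, each starting with an entry count.
    while True:
        count = r.read_long()
        if count == 0:
            break  # zero count terminates the map
        if count < 0:
            count = -count
            r.read_long()  # negative count: a block byte size follows; skip it
        for _ in range(count):
            key = r.read_string()
            headers[key] = r.read_string()
    body = r.read_bytes()
    return headers, body


if __name__ == "__main__":
    headers, body = decode_flume_event(b"\x02\x0atopic\x0anginx\x00\x04hi")
    print(headers, body)  # {'topic': 'nginx'} b'hi'
```

In a JVM Spark Streaming job, the usual route is Avro's own reader instead: a `SpecificDatumReader<AvroFlumeEvent>` (with `AvroFlumeEvent` from `flume-ng-sdk`) reading from `DecoderFactory.get().binaryDecoder(value, null)`. Writing the reader by hand here only serves to make the byte layout explicit.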

Project reference:
https://pan.baidu.com/disk/home?#/all?vmode=list&path=
