Building a Highly Available Flume Cluster

lianchaozhao

Categories: flume, Big Data, Work Practice
Tags: flume, cluster high availability

Article link: https://blog.csdn.net/weixin_40809627/article/details/85006635

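I use a two-node setup:
1. When both nodes are alive, they share the traffic (load balancing).
2. When one node goes down, the remaining node takes over the traffic previously handled by both (high availability).
3. A Kafka channel provides performance and durability for data on its way into Kafka.
4. Resumable collection: after a restart, reading continues from the last recorded offset.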

The channel writes directly to Kafka, which saves resources.

The configuration (deployed in two copies, one per node) is:
# names of the sources
tier1.sources = source1
# names of the channels
tier1.channels = kafka-mobile-channel

tier1.sources.source1.type = avro
tier1.sources.source1.bind = 0.0.0.0
tier1.sources.source1.port = 44444
tier1.sources.source1.channels = kafka-mobile-channel
tier1.sources.source1.selector.type = multiplexing
tier1.sources.source1.selector.header = topic
tier1.sources.source1.selector.mapping.mobile = kafka-mobile-channel

tier1.channels.kafka-mobile-channel.type = org.apache.flume.channel.kafka.KafkaChannel
# controls whether Flume event headers are kept with the data for later parsing; false writes only the event body to Kafka
tier1.channels.kafka-mobile-channel.parseAsFlumeEvent = false
tier1.channels.kafka-mobile-channel.kafka.topic = tomcat-mobile
tier1.channels.kafka-mobile-channel.kafka.consumer.group.id = flume-tomcat-mobile
tier1.channels.kafka-mobile-channel.kafka.consumer.auto.offset.reset = earliest
tier1.channels.kafka-mobile-channel.kafka.bootstrap.servers = ZW0804-hadoop-89:9092,ZW0804-hadoop-90:9092,ZW0804-hadoop-91:9092
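
To make the multiplexing selector above easier to reason about, here is a minimal Python sketch of the routing rule as I understand it: the value of the configured header picks the channel list, and an event with no matching mapping goes to the default channels (none, if `selector.default` is not set). The function `route` and its parameters are hypothetical names for illustration, not part of Flume.

```python
from typing import Dict, List, Optional


def route(headers: Dict[str, str],
          header_name: str,
          mapping: Dict[str, List[str]],
          default: Optional[List[str]] = None) -> List[str]:
    """Return the channels an event would be written to in this simplified model."""
    value = headers.get(header_name)
    if value in mapping:
        return mapping[value]
    # No mapping matched: fall back to the default channels (possibly none).
    return default or []


# Mirrors selector.header = topic and selector.mapping.mobile = kafka-mobile-channel
mapping = {"mobile": ["kafka-mobile-channel"]}
print(route({"topic": "mobile"}, "topic", mapping))
print(route({"topic": "we-user"}, "topic", mapping))
```

Note that the upstream configuration in this post sets the topic header to we-user while the mapping here is for mobile. If that matches your deployment, it is worth either aligning the two values or setting selector.default, since in this model unmatched events reach no channel.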

Its upstream (collector) configuration is:

agent

collector.sources = taildir-source
collector.channels = file-channel
collector.sinks = avro-forward-sink-node2 avro-forward-sink-node3

source

collector.sources.taildir-source.type = TAILDIR
collector.sources.taildir-source.channels = file-channel
collector.sources.taildir-source.positionFile = /var/log/flume-ng/taildir_position.json
collector.sources.taildir-source.filegroups = f1
collector.sources.taildir-source.filegroups.f1 = /tmp/nginx/.+.log
collector.sources.taildir-source.fileHeader = true
collector.sources.taildir-source.interceptors = topic UUID
collector.sources.taildir-source.interceptors.topic.type = static
collector.sources.taildir-source.interceptors.topic.key = topic
collector.sources.taildir-source.interceptors.topic.value = we-user
collector.sources.taildir-source.interceptors.topic.preserveExisting = false
collector.sources.taildir-source.interceptors.UUID.type=org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
collector.sources.taildir-source.interceptors.UUID.headerName=key
collector.sources.taildir-source.interceptors.UUID.prefix=NODE_
collector.sources.taildir-source.interceptors.UUID.preserveExisting=false
collector.sources.taildir-source.skipToEnd = true

channel

collector.channels.file-channel.type=file
# checkpoint (backup) directory of the channel
collector.channels.file-channel.checkpointDir = /var/log/flume-ng/file-channel/checkpoint
# data storage path
collector.channels.file-channel.dataDirs = /var/log/flume-ng/file-channel/data

sink (events are distributed across the downstream nodes)

collector.sinks.avro-forward-sink-node2.type = avro
collector.sinks.avro-forward-sink-node2.channel = file-channel
# host or IP of a load-balanced downstream node
collector.sinks.avro-forward-sink-node2.hostname = node2
collector.sinks.avro-forward-sink-node2.port = 44444

collector.sinks.avro-forward-sink-node3.type = avro
collector.sinks.avro-forward-sink-node3.channel = file-channel
# host or IP of a load-balanced downstream node
collector.sinks.avro-forward-sink-node3.hostname = node3
collector.sinks.avro-forward-sink-node3.port = 44444

load balance

collector.sinkgroups = g1
collector.sinkgroups.g1.sinks = avro-forward-sink-node2 avro-forward-sink-node3
collector.sinkgroups.g1.processor.type = load_balance
collector.sinkgroups.g1.processor.backoff = true
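
The behavior this sink group is meant to give (alternate between node2 and node3 while both are up, send everything to the survivor when one is down) can be sketched in a few lines of Python. This is a simplified model under my own assumptions, not Flume's implementation: the class `RoundRobinBalancer` is hypothetical, and a failed sink here stays excluded permanently, whereas with backoff enabled Flume retries a failed sink after a timeout.

```python
from typing import Callable, List, Set


class RoundRobinBalancer:
    """Toy model of a round-robin sink group that skips failed sinks."""

    def __init__(self, sinks: List[str]) -> None:
        self.sinks = sinks
        self._next = 0
        self.failed: Set[str] = set()  # simplification: never retried

    def send(self, event: str, deliver: Callable[[str, str], bool]) -> str:
        """Try each sink once, starting at the round-robin position; return the sink used."""
        n = len(self.sinks)
        start = self._next
        self._next = (self._next + 1) % n
        for i in range(n):
            sink = self.sinks[(start + i) % n]
            if sink in self.failed:
                continue
            if deliver(sink, event):
                return sink
            self.failed.add(sink)
        # In Flume the event would remain in the file channel until a sink recovers.
        raise RuntimeError("all sinks failed")


alive = {"node2": True, "node3": True}
lb = RoundRobinBalancer(["node2", "node3"])
both_up = [lb.send(f"e{i}", lambda s, e: alive[s]) for i in range(4)]
alive["node2"] = False
one_down = [lb.send(f"e{i}", lambda s, e: alive[s]) for i in range(4)]
print(both_up)   # alternates between node2 and node3
print(one_down)  # only node3
```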

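Resumable collection
Flume uses the TAILDIR source.
Offsets are stored in: /var/log/flume-ng/taildir_position.json
(Example contents: [{"inode":52299335,"pos":13,"file":"/tmp/nginx/aa.log"},{"inode":52299428,"pos":81,"file":"/tmp/nginx/test.log"}])
Here inode identifies the file and does not change when the file is renamed; pos records the read offset (counted in bytes); file is the absolute path.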
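
When checking whether the collector has caught up, it can help to compare each recorded pos with the current file size. The following is a small sketch that assumes the position file has the JSON layout shown above and that pos is a byte offset; the helper names `read_positions`, `backlog` and `same_inode` are my own, not part of Flume.

```python
import json
import os
from typing import Dict, List


def read_positions(position_file: str) -> List[Dict]:
    """Load the TAILDIR position file (a JSON array of inode/pos/file entries)."""
    with open(position_file, encoding="utf-8") as fh:
        return json.load(fh)


def backlog(entry: Dict) -> int:
    """Bytes of the file not yet read; assumes the file still exists at entry['file']."""
    return os.path.getsize(entry["file"]) - entry["pos"]


def same_inode(entry: Dict) -> bool:
    """True if the path still points at the file the offset was recorded for."""
    return os.stat(entry["file"]).st_ino == entry["inode"]
```

For example, `[(e["file"], backlog(e)) for e in read_positions("/var/log/flume-ng/taildir_position.json")]` lists the unread bytes per file; a backlog of 0 means the source has read to the end of that file.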

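Test method: stop Kafka, then write data under the monitored path (/tmp/nginx/.+.log).
The recorded pos offset was updated (while Kafka was stopped).
The files under the collector's file-channel data and checkpoint directories stayed roughly the same size (which indicates that the collector Flume read the data and forwarded it successfully to the downstream Flume cluster).
After 15 minutes I started Kafka again and the data produced during the outage arrived (so collection resumes correctly and a Kafka outage is tolerated).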

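Second test: make the collector Flume fail to write to the cluster Flume. Watching the data path (tail -f /var/log/flume-ng/file-channel/data) shows data being written to the file channel.
After correcting the collector Flume configuration, the data that had not been delivered was sent on to the Kafka message queue.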
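
To confirm that nothing is lost in tests like these, one option is to write numbered lines into the monitored log and then check the consumed data for gaps. This is a sketch under my own assumptions: the `seq=<n>` line format and the helper names `make_lines` and `missing_sequences` are hypothetical, and how you collect the received lines (for example from a Kafka console consumer) is up to you.

```python
from typing import Iterable, List


def make_lines(start: int, count: int) -> List[str]:
    """Numbered test lines to append to a monitored log file."""
    return [f"seq={n} test line" for n in range(start, start + count)]


def missing_sequences(received: Iterable[str], start: int, count: int) -> List[int]:
    """Sequence numbers in [start, start + count) that never showed up downstream."""
    seen = set()
    for line in received:
        if line.startswith("seq="):
            # "seq=42 test line" -> 42
            seen.add(int(line.split()[0][4:]))
    return [n for n in range(start, start + count) if n not in seen]
```

An empty result from `missing_sequences` means every generated line arrived at least once; duplicates are not detected by this check.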

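Sometimes we need to parse information from the Flume event headers (TODO)
1. A common case comes up when parsing business logs: no existing field uniquely identifies a log record, so we usually add one to support deduplication later on.
Example:
On the collector, add a UUID to every log record:
collector.sources.taildir-source.interceptors.UUID.type=org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
collector.sources.taildir-source.interceptors.UUID.headerName=key

Then parse the UUID out in the consuming code.
[Image: code that parses the UUID]
Finally, use a MySQL unique primary key to deduplicate.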
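
The deduplication step can be sketched as follows. To keep the example self-contained it uses Python's built-in sqlite3 as a stand-in for MySQL, with `INSERT OR IGNORE` playing the role that `INSERT IGNORE` would play there. The table name `logs`, its columns and the function `store_unique` are assumptions for illustration; the records are assumed to arrive as (uuid, body) pairs, however your consumer obtains the UUID.

```python
import sqlite3
from typing import Iterable, Tuple


def store_unique(conn: sqlite3.Connection,
                 records: Iterable[Tuple[str, str]]) -> int:
    """Insert (uuid, body) rows, skipping UUIDs already present; return rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs (uuid TEXT PRIMARY KEY, body TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    # The primary key rejects duplicates; OR IGNORE turns the conflict into a no-op.
    conn.executemany(
        "INSERT OR IGNORE INTO logs (uuid, body) VALUES (?, ?)", records
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    return after - before
```

Because the uniqueness check lives in the database, replayed data (for example after the collector resends events following a failure) does not create duplicate rows.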
