Flume Case 6: Replicating Channel Selector

This post follows the previous one: Flume introduction, installation, usage examples, custom Source/Sink, and monitoring
Flume version: 1.9.0
This case uses the hdfs sink, which requires Hadoop; for Hadoop-related content, see the Hadoop column

1. Replicating Channel Selector (single source, multiple outputs: 1 in, 2 out)

Component selection:
 Flume-1: taildir source + memory channel + avro sink + Replicating Channel Selector
 Flume-2: avro source + memory channel + hdfs sink
 Flume-3: avro source + memory channel + file roll sink (writes to a local file)

Documentation references:
taildir source: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#taildir-source
memory channel: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#memory-channel
avro sink: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#avro-sink
hdfs sink: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#hdfs-sink
file roll sink: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#file-roll-sink
Replicating Channel Selector: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#replicating-channel-selector-default

Tip:
  Exec source is suitable for monitoring a file that is appended to in real time, but it cannot guarantee zero data loss. Spooling Directory Source can guarantee zero data loss and supports resuming from where it left off, but its latency is higher and it cannot monitor in real time. Taildir Source combines the strengths of both: it resumes from a checkpoint (the read position is recorded via the positionFile property), guarantees zero data loss, and monitors in real time, so it is the recommended choice.
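For reference, the positionFile that enables this checkpointing is a small JSON file in which Taildir Source records the inode, read offset, and path of every tracked file. A sketch of its content (the inode and pos values here are illustrative):

[{"inode":28312345,"pos":128,"file":"/opt/module/testdir/test.log"}]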

2. Requirement Analysis

[Figure: requirement diagram: Flume-1 replicates taildir events to Flume-2 (hdfs sink) and Flume-3 (file roll sink)]

3. Flume Configuration

Ⅰ.Flume-1

flume-taildir-avro-replicating.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Replicate the data stream to all channels (replicating is also the default selector type)
a1.sources.r1.selector.type = replicating

# Describe/configure the source
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/module/flume/position/taildir_position_2.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /opt/module/testdir/test.log
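# Note (illustrative addition, not part of the original setup): the filegroup
# value is interpreted as a regex on the filename, so a pattern such as
# /opt/module/testdir/.*\.log would track every .log file in that directory.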

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.204.202
a1.sinks.k1.port = 41414

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.204.203
a1.sinks.k2.port = 41414

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Ⅱ.Flume-2

flume-avro-hdfs.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.204.202
a1.sources.r1.port = 41414

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/replicating/%Y-%m-%d/%H
a1.sinks.k1.hdfs.filePrefix = replicating
# Whether to use the local timestamp (needed here so the %Y-%m-%d/%H escapes in hdfs.path can be resolved)
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Whether to round down the event time used for the folder escapes
a1.sinks.k1.hdfs.round = true
# How many time units to round down to (i.e., one new folder per unit)
a1.sinks.k1.hdfs.roundValue = 1
# The unit used for rounding
a1.sinks.k1.hdfs.roundUnit = hour
# Number of events to accumulate before flushing to HDFS
a1.sinks.k1.hdfs.batchSize = 1000
# How often to roll a new file (seconds)
a1.sinks.k1.hdfs.rollInterval = 30
# Roll the file once it reaches this size (bytes, just under a 128 MB block)
a1.sinks.k1.hdfs.rollSize = 134217700
# Rolling is independent of the number of events
a1.sinks.k1.hdfs.rollCount = 0
# Set the file type (compression is also supported); without this setting, files written to HDFS carry a SequenceFile header: SEQ !org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
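With these settings, the escapes in hdfs.path are filled from the local timestamp (rounded down to the hour) and files are rolled every 30 seconds or at roughly 128 MB. As an illustration (the timestamp in the filename is made up), a file still being written might appear as:

/flume/replicating/2021-05-20/14/replicating.1621490000000.tmp

The .tmp suffix is the HDFS sink's default in-use marker and is removed once the file is rolled.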

Ⅲ.Flume-3

flume-avro-file-roll.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.204.203
a1.sources.r1.port = 41414

# Describe the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /opt/module/flume/file_roll

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
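One caveat with file_roll: by default it rolls to a new output file every 30 seconds regardless of traffic, which can leave many small (even empty) files in the directory. The interval can be changed with the sink.rollInterval property, for example (600 is an arbitrary example value; 0 disables rolling entirely):

a1.sinks.k1.sink.rollInterval = 600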

4. Startup Commands

Note:
  1. For the file_roll sink, the target directory must first be created locally; Flume does not create it automatically.
   mkdir -p /opt/module/flume/file_roll
  2. Flume-2 and Flume-3 must be started before Flume-1. If Flume-1 is started first, it reports the errors: Connection refused: /192.168.204.202:41414 and Connection refused: /192.168.204.203:41414

# Flume-2 startup command
bin/flume-ng agent -c conf -n a1 -f job/flume-avro-hdfs.conf
# Flume-3 startup command
bin/flume-ng agent -c conf -n a1 -f job/flume-avro-file-roll.conf
# Flume-1 startup command
bin/flume-ng agent -c conf -n a1 -f job/flume-taildir-avro-replicating.conf
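When debugging, it helps to print the agent's log to the console. For example, the Flume-1 command with the standard flume-ng logging property added:

bin/flume-ng agent -c conf -n a1 -f job/flume-taildir-avro-replicating.conf -Dflume.root.logger=INFO,console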

5. Troubleshooting

Writing to HDFS fails with the following error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V

  This is caused by a guava version mismatch between the Hadoop installation and Flume. Go to the flume/lib directory and remove the guava-11.0.2.jar package, so that the newer guava shipped with Hadoop is picked up instead.
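A minimal sketch of the fix, assuming Flume is installed under /opt/module/flume as in the configs above (renaming instead of deleting keeps an easy way back):

cd /opt/module/flume/lib
mv guava-11.0.2.jar guava-11.0.2.jar.bak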

6. Test Illustration

Taildir Source monitors testdir/test.log in real time

  1. Append data to the test.log file with echo to simulate a real-time log (see the sketch after this list);
  2. The data is written to the /flume/replicating directory on HDFS;
  3. At the same time, the data is written to the local /opt/module/flume/file_roll directory.
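A minimal end-to-end check, using the paths from the configs above (the hdfs command assumes a working Hadoop client on the node):

# append a line to the monitored file to simulate a live log
echo "hello replicating" >> /opt/module/testdir/test.log
# verify the HDFS copy
hdfs dfs -ls -R /flume/replicating
# verify the local copy written by the file_roll sink
cat /opt/module/flume/file_roll/*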

The test results are shown in the figure below:
[Figure: test results]


Writing these posts takes effort, so consider following!

Follow and like so you don't get lost ヾ(◍°∇°◍)ノ゙

I can't guarantee that everything I write is correct, but I can guarantee that nothing is copied or pasted. Every sentence and every line of code was typed by hand; if you find mistakes, please point them out, and go easy on me. Thanks♪(・ω・)ノ
