Flume Load Balancing and Failover
1. Failover
1.1 Requirements Analysis
Use Flume1 to monitor a port; the two sinks in its sink group connect to Flume2 and Flume3 respectively. Use the Failover Sink Processor to implement failover.
# Create a group3 directory under /opt/module/flume/job: mkdir group3
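A minimal shell sketch of this setup step, assuming the Flume installation lives at /opt/module/flume as in this tutorial (using vim is just one way to create the three config files described next):
cd /opt/module/flume/job
mkdir group3
cd group3
# create the three agent configuration files listed below
vim flume1.conf
vim flume2.conf
vim flume3.conf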
- flume1.conf (on hadoop)
source -- netcat
channel -- memory
sinks -- avro (k1, k2)
a1.sinkgroups.g1.processor.type = failover
- flume2.conf (on hadoop)
source -- avro
channel -- memory
sink -- logger
- flume3.conf (on hadoop)
source -- avro
channel -- memory
sink -- logger
1.2 Configuration Files
flume1.conf
# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop
a1.sources.r1.port = 44444
# Configure failover
a1.sinkgroups.g1.processor.type = failover
# Priority values: the larger the absolute value, the higher the priority; the higher-priority sink is activated first.
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Maximum backoff period for a failed sink (in milliseconds)
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop
a1.sinks.k2.port = 4142
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
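For reference, a failover sink group is not limited to two sinks; each additional sink only needs its own priority entry. A hypothetical sketch of adding a third backup sink k3 on port 4143 (both the name and the port are made up here and would need a matching downstream agent):
a1.sinks = k1 k2 k3
a1.sinks.k3.type = avro
a1.sinks.k3.hostname = hadoop
a1.sinks.k3.port = 4143
a1.sinks.k3.channel = c1
a1.sinkgroups.g1.sinks = k1 k2 k3
a1.sinkgroups.g1.processor.priority.k3 = 1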
flume2.conf
#Name
a2.sources = r1
a2.sinks = k1
a2.channels = c1
#Sources
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop
a2.sources.r1.port = 4141
#Sink
a2.sinks.k1.type = logger
#Channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
#Bind
a2.sinks.k1.channel = c1
a2.sources.r1.channels = c1
flume3.conf
#Name
a3.sources = r1
a3.sinks = k1
a3.channels = c1
#Sources
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop
a3.sources.r1.port = 4142
#Sink
a3.sinks.k1.type = logger
#Channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100
#Bind
a3.sinks.k1.channel = c1
a3.sources.r1.channels = c1
1.3 Testing
Start the downstream agents (a3 and a2) first, then a1, so that a1's avro sinks can connect without errors:
bin/flume-ng agent -c conf/ -n a3 -f job/group3/flume3.conf -Dflume.root.logger=INFO,console
bin/flume-ng agent -c conf/ -n a2 -f job/group3/flume2.conf -Dflume.root.logger=INFO,console
bin/flume-ng agent -c conf/ -n a1 -f job/group3/flume1.conf -Dflume.root.logger=INFO,console
# On hadoop, send some data to port 44444 with netcat
nc hadoop 44444
hello
OK
world
OK
lala
OK
# Notice that all of the data goes through Flume3 (k2 has the higher priority)
# Kill Flume3 with Ctrl+C, then keep typing in the nc session
shazi
OK
haha
OK
hahha
OK
hahah
OK
# Now the data is printed to the console by Flume2 instead
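To confirm the switch-back behavior, you can restart Flume3 with the same command as before; because k2 has the higher priority (10 vs. 5), new events should flow through Flume3 again once its agent is back up. A sketch reusing the command from above:
bin/flume-ng agent -c conf/ -n a3 -f job/group3/flume3.conf -Dflume.root.logger=INFO,console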
2. Load Balancing
Use Flume1 to monitor a port; the two sinks in its sink group connect to Flume2 and Flume3 respectively. Use the Load Balancing Sink Processor to implement load balancing. Modify the configuration above as follows.
flume1.conf
# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop
a1.sources.r1.port = 44444
# Configure load balancing
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = random
# Maximum backoff period for a failed sink (in milliseconds)
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop
a1.sinks.k2.port = 4142
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
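The load_balance processor also supports a round-robin selector if a deterministic rotation between Flume2 and Flume3 is preferred over random selection; a minimal sketch of just the processor settings:
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin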
Testing is the same as above…