Reference: http://www.jb51.net/article/53542.htm
Flume supports fanning out the flow from one source to multiple channels. There are two fan-out modes: replicating and multiplexing.
In the replicating case an event is sent to every configured channel, while in the multiplexing case it is sent only to a subset of the channels. A fan-out flow therefore requires specifying the source and the rules that govern the fan-out.
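The mode is chosen per source via selector.type; the replicating selector additionally allows channels to be marked optional, so that a failed write to an optional channel does not fail the event. A minimal hypothetical fragment (agent a1, source r1 and channels c1 c2 c3 are placeholder names):
a1.sources.r1.selector.type=replicating
a1.sources.r1.channels=c1 c2 c3
#writes to c3 are best-effort; failures there do not fail the event
a1.sources.r1.selector.optional=c3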
Replicating Channel Selector
Two machines: 192.168.11.129 (master) and 192.168.11.130 (hbase)
- On hbase, create the main configuration file replication_selector.conf
a1.sources=r1
a1.channels=c1 c2
a1.sinks=k1 k2
a1.sources.r1.type=avro
a1.sources.r1.bind=hbase
a1.sources.r1.port=5140
a1.sources.r1.selector.type=replicating
a1.sources.r1.channels=c1 c2
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100
a1.sinks.k1.type=logger
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.11.129
a1.sinks.k2.port=5140
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
- On master, create the configuration file replication_selector.conf for the agent that the avro sink (k2) connects to
a1.sources=r1
a1.channels=c1
a1.sinks=k1
a1.sources.r1.type=avro
a1.sources.r1.bind=master
a1.sources.r1.port=5140
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type=logger
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
- Start the agents on both machines
>./flume-ng agent -f ../conf/replication_selector.conf -n a1 -Dflume.root.logger=INFO,console
>./flume-ng agent -f ../conf/replication_selector.conf -n a1 -Dflume.root.logger=INFO,console
- Send data to the agent on hbase
>./flume-ng avro-client -H hbase -p 5140 -F /usr/develop-fm/flume1.7.0/temp/hql.txt.COMPLETED
- Check the console output on both machines (omitted)
Multiplexing Channel Selector
In multiplexing mode, events are matched to channels based on the value of a configured header field and the mapping rules.
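A single header value can also be mapped to more than one channel at once; a hypothetical fragment (the value news and the channel names are placeholders):
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=type
#events whose type header equals news are replicated to both c1 and c2
a1.sources.r1.selector.mapping.news=c1 c2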
- On hbase (hostname), create the main configuration file multiplexing_selector.conf
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2
#
a1.sources.r1.type=org.apache.flume.source.http.HTTPSource
a1.sources.r1.port=5140
a1.sources.r1.channels=c1 c2
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=type
#
a1.sources.r1.selector.mapping.baidu=c1
a1.sources.r1.selector.mapping.ali=c2
a1.sources.r1.selector.default=c1
#
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100
#
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.11.129
a1.sinks.k1.port=5555
a1.sinks.k2.type=logger
#a1.sinks.k2.hostname=hbase
#a1.sinks.k2.port=5555
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
- On master (hostname), create the avro source configuration file multiplexing_selector.conf
a1.sources=r1
a1.channels=c1
a1.sinks=k1
a1.sources.r1.type=avro
a1.sources.r1.bind=master
a1.sources.r1.port=5555
a1.sources.r1.channels = c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
- Start the agents on both hosts
>./flume-ng agent -f ../conf/multiplexing_selector.conf -n a1 -Dflume.root.logger=INFO,console
>./flume-ng agent -f ../conf/multiplexing_selector.conf -n a1 -Dflume.root.logger=INFO,console
- Send two events, one with the header type=baidu and one with type=ali
>curl -X POST -d '[{ "headers" :{"type" : "baidu"},"body" : "idoall_TEST1"}]' http://hbase:5140
>curl -X POST -d '[{ "headers" :{"type" : "ali"},"body" : "idoall_TEST1"}]' http://hbase:5140
- Check the consoles on both machines
hbase:
16/11/07 16:52:09 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31 idoall_TEST1 }
master:
16/11/07 16:51:43 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31 idoall_TEST1 }
This matches the expected result.
Flume Sink Processor-Failover
A sink processor provides a mechanism to fan out from a channel to a group of sinks, and makes failover and load balancing across multiple sinks possible.
Flume has three sink processors: default, failover, and load_balance. default is the implicit setting for a single sink, failover provides failover between sinks, and load_balance spreads events across multiple sinks.
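Whichever processor is used, the wiring is the same: declare a sink group, list its member sinks, and set processor.type. A skeleton of that, which the two walkthroughs below fill in with processor-specific properties:
a1.sinkgroups=g1
a1.sinkgroups.g1.sinks=k1 k2
#one of: default, failover, load_balance
a1.sinkgroups.g1.processor.type=failover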
- On hbase (hostname), create the main configuration file processor_failover.conf
a1.sources=r1
a1.channels=c1 c2
a1.sinks=k1 k2
#sink group
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks=k1 k2
#type
a1.sinkgroups.g1.processor.type=failover
#priority
a1.sinkgroups.g1.processor.priority.k1=5
a1.sinkgroups.g1.processor.priority.k2=10
a1.sinkgroups.g1.processor.maxpenalty=100000
#configure source
a1.sources.r1.type=avro
a1.sources.r1.bind=hbase
a1.sources.r1.port=5140
a1.sources.r1.channels=c1 c2
a1.sources.r1.selector.type=replicating
#configure channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100
#configure sink
a1.sinks.k1.type=logger
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.11.129
a1.sinks.k2.port=5555
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
- On master, create the avro source configuration file processor_failover.conf
a1.sources=r1
a1.channels=c1
a1.sinks=k1
a1.sources.r1.type=avro
a1.sources.r1.bind=master
a1.sources.r1.port=5555
a1.sources.r1.channels=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
- Start the agents on both machines
>./flume-ng agent -f ../conf/processor_failover.conf -n a1 -Dflume.root.logger=INFO,console
>./flume-ng agent -f ../conf/processor_failover.conf -n a1 -Dflume.root.logger=INFO,console
- Send avro data to hbase several times
>./flume-ng avro-client -H hbase -p 5140 -F ../temp/log.00.COMPLETED
Because sink k2 has a higher priority than k1, all events are delivered by k2 (and show up on the master console).
- Stop sink k2 (e.g. by stopping the downstream agent on master) and resend the data
>./flume-ng avro-client -H hbase -p 5140 -F ../temp/log.00.COMPLETED
Now that k2 is down, all events are delivered by k1 instead, i.e. failover has taken place.
Flume Sink Processor-Loadbalance
As with failover, a sink group must be specified.
load_balance supports two selection strategies: random and round_robin.
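Only the selector property changes between the two strategies; a hypothetical fragment switching the sink group below to random selection:
a1.sinkgroups.g1.processor.type=load_balance
a1.sinkgroups.g1.processor.selector=random
a1.sinkgroups.g1.processor.backoff=true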
- On hbase (hostname), create the main configuration file processor_loadbalance.conf
#defined name of source/channel/sink
a1.sources=r1
a1.channels=c1 c2
a1.sinks=k1 k2
#configure sink group for loadbalance
a1.sinkgroups=g1
a1.sinkgroups.g1.sinks=k1 k2
a1.sinkgroups.g1.processor.type=load_balance
a1.sinkgroups.g1.processor.backoff=true
a1.sinkgroups.g1.processor.selector=round_robin
#configure source
a1.sources.r1.type=avro
a1.sources.r1.bind=hbase
a1.sources.r1.port=5140
a1.sources.r1.channels=c1 c2
#configure channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100
#configure sink
a1.sinks.k1.type=logger
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.11.129
a1.sinks.k2.port=5555
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
- On master (hostname), create the avro source configuration file processor_loadbalance.conf
a1.sources=r1
a1.channels=c1
a1.sinks=k1
a1.sources.r1.type=avro
a1.sources.r1.bind=master
a1.sources.r1.port=5555
a1.sources.r1.channels=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
- Start the agents on both hosts
>./flume-ng agent -f ../conf/processor_loadbalance.conf -n a1 -Dflume.root.logger=INFO,console
- Send data to the avro port on hbase
>./flume-ng avro-client -H hbase -p 5140 -F ../temp/log.00.COMPLETED
- Check the consoles on both machines
hbase:
16/11/07 18:04:45 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
16/11/07 18:04:49 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
master:
16/11/07 18:04:33 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
16/11/07 18:04:45 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
16/11/07 18:04:45 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
16/11/07 18:04:45 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
The two machines receive the data alternately.
HBase Sink
Store the data received by the agent in HBase.
- Start Hadoop and HBase
>cd /usr/develop-fm/hadoop/sbin
>./start-all.sh
>cd ../../hbase/bin
>./start-hbase.sh
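The HBase sink does not create the target table, so the table and column family referenced in the configuration below (flume_test / info) must already exist; assuming they do not, they can be created from the hbase shell:
>hbase shell
hbase(main):001:0> create 'flume_test', 'info'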
- Create the HBase sink configuration file hbase.conf
a1.sources=r1
a1.channels=c1
a1.sinks=k1
#configure source
a1.sources.r1.type=syslogtcp
a1.sources.r1.port=5140
a1.sources.r1.host=hbase
a1.sources.r1.channels=c1
#describe channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
#describe hbase sink
#a1.sinks.k1.type=logger
a1.sinks.k1.type=hbase
a1.sinks.k1.table=flume_test
a1.sinks.k1.columnFamily=info
a1.sinks.k1.column=logs
a1.sinks.k1.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.channel=c1
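With no extra serializer options, RegexHbaseEventSerializer matches the whole event body and writes it into a single payload column, which is what the scan output below shows. If the body should instead be split across several columns, the serializer accepts a regex and a matching list of column names; a hedged sketch (the pattern and the column names level/msg are illustrative assumptions, not part of this walkthrough):
#split a body of the form "level message" into two columns
a1.sinks.k1.serializer.regex=([^ ]+) (.*)
a1.sinks.k1.serializer.colNames=level,msg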
- Start the agent
>./flume-ng agent -f ../conf/hbase.conf -n a1 -Dflume.root.logger=INFO,console
- Send data to the agent
>echo "hello flume hbase sink test" | nc hbase 5140
- Check the data in the HBase table
>hbase shell
hbase(main):004:0> scan 'flume_test'
ROW COLUMN+CELL
1478574878697-i6ZKIWXYeP-0 column=info:payload, timestamp=1478574882003, value=hello flume hbase sink test
1478574904071-i6ZKIWXYeP-1 column=info:payload, timestamp=1478574907074, value=hello flume hbase sink test
The data was successfully inserted into HBase.