Flume Application Cases (4)

Reference: http://www.jb51.net/article/53542.htm

Flume supports fanning a flow out from one source to multiple channels. There are two fan-out modes: replicating and multiplexing.

In replicating mode, an event is sent to every configured channel; in multiplexing mode, it is sent to only a subset of them. A fan-out flow therefore requires specifying the source's channel list and the rules for fanning out.
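
For reference, the fan-out rules live on the source. Below is a minimal sketch of the two modes with hypothetical names (agent a1, source r1, channels c1/c2); only one selector.type can be active per source:

#replicating (the default): every event is written to all listed channels
a1.sources.r1.selector.type=replicating
#channels marked optional may fail without failing the source
a1.sources.r1.selector.optional=c2

#multiplexing: events are routed by the value of a header
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=type
a1.sources.r1.selector.mapping.baidu=c1
a1.sources.r1.selector.default=c2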

Replicating Channel Selector

Two machines: 192.168.11.129 (master) and 192.168.11.130 (hbase).

  • On hbase, edit the main configuration file replication_selector.conf
a1.sources=r1
a1.channels=c1 c2
a1.sinks=k1 k2

#configure source: avro source replicating events to both channels
a1.sources.r1.type=avro
a1.sources.r1.bind=hbase
a1.sources.r1.port=5140
a1.sources.r1.selector.type=replicating
a1.sources.r1.channels=c1 c2

#configure channels
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100

#configure sinks: k1 logs locally, k2 forwards to the agent on master
a1.sinks.k1.type=logger
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.11.129
a1.sinks.k2.port=5140
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
  • On master, edit the configuration file replication_selector.conf for the agent that receives from the avro sink
a1.sources=r1
a1.channels=c1
a1.sinks=k1

a1.sources.r1.type=avro
a1.sources.r1.bind=master
a1.sources.r1.port=5140

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.k1.type=logger

a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
  • Start the agents on both machines (run the same command on hbase and on master)
>./flume-ng agent -f ../conf/replication_selector.conf -n a1 -Dflume.root.logger=INFO,console
>./flume-ng agent -f ../conf/replication_selector.conf -n a1 -Dflume.root.logger=INFO,console
  • Send data to the agent on hbase
>./flume-ng avro-client -H hbase -p 5140 -F /usr/develop-fm/flume1.7.0/temp/hql.txt.COMPLETED
  • Check the console output on both machines (omitted here)

Multiplexing Channel Selector

Multiplexing mode routes events to channels based on the value of a specific header field and the configured mapping rules.
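
The configuration below receives events over HTTP; the HTTP source's default JSONHandler accepts a JSON array of events, each with a headers map and a body, and the selector inspects each event's type header. A single event looks like this (hypothetical payload):

[{"headers": {"type": "baidu"}, "body": "some payload"}]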

  • On hbase (hostname), create the main configuration file multiplexing_selector.conf
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

#configure source: HTTP source, multiplexing on the 'type' header
a1.sources.r1.type=org.apache.flume.source.http.HTTPSource
a1.sources.r1.port=5140
a1.sources.r1.channels=c1 c2
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=type

#mapping rules: route by header value, falling back to c1
a1.sources.r1.selector.mapping.baidu=c1
a1.sources.r1.selector.mapping.ali=c2
a1.sources.r1.selector.default=c1

#configure channels
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100

#configure sinks: k1 forwards to master, k2 logs locally
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=192.168.11.129
a1.sinks.k1.port=5555
a1.sinks.k2.type=logger
#a1.sinks.k2.hostname=hbase
#a1.sinks.k2.port=5555

a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
  • On master (hostname), write the avro source configuration file multiplexing_selector.conf
a1.sources=r1
a1.channels=c1
a1.sinks=k1

a1.sources.r1.type=avro
a1.sources.r1.bind=master
a1.sources.r1.port=5555
a1.sources.r1.channels = c1

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.k1.type=logger

a1.sinks.k1.channel=c1
  • Start the agents on both hosts (same command on each)
>./flume-ng agent -f ../conf/multiplexing_selector.conf -n a1 -Dflume.root.logger=INFO,console

>./flume-ng agent -f ../conf/multiplexing_selector.conf -n a1 -Dflume.root.logger=INFO,console
  • Send two events, one with type baidu and one with type ali
>curl -X POST -d '[{ "headers" :{"type" : "baidu"},"body" : "idoall_TEST1"}]' http://hbase:5140

>curl -X POST -d '[{ "headers" :{"type" : "ali"},"body" : "idoall_TEST1"}]' http://hbase:5140
  • Check the consoles on both machines
    hbase:
16/11/07 16:52:09 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31             idoall_TEST1 }

master:

16/11/07 16:51:43 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31             idoall_TEST1 }

This matches the expected result: the baidu event was routed through c1 and forwarded to master, while the ali event went through c2 to the local logger on hbase.

Flume Sink Processor-Failover

A sink processor groups multiple sinks behind a channel and implements failover or load balancing across the sinks in the group.

Flume ships three sink processors: default, failover, and load_balance. default is what a single sink uses implicitly (no sink group is needed, as in all the earlier examples); failover provides fail-over between sinks; load_balance spreads the load across them.

  • On hbase (hostname), write the main configuration file (processor_failover.conf)
a1.sources=r1
a1.channels=c1 c2
a1.sinks=k1 k2

#sink group
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks=k1 k2
#processor type
a1.sinkgroups.g1.processor.type=failover
#priorities: the sink with the higher number is preferred
a1.sinkgroups.g1.processor.priority.k1=5
a1.sinkgroups.g1.processor.priority.k2=10
#maximum backoff (ms) for a failed sink
a1.sinkgroups.g1.processor.maxpenalty=100000

#configure source
a1.sources.r1.type=avro
a1.sources.r1.bind=hbase
a1.sources.r1.port=5140
a1.sources.r1.channels=c1 c2
a1.sources.r1.selector.type=replicating

#configure channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100


#configure sink
a1.sinks.k1.type=logger
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.11.129
a1.sinks.k2.port=5555
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
  • On master, write the avro source configuration file
a1.sources=r1
a1.channels=c1
a1.sinks=k1

a1.sources.r1.type=avro
a1.sources.r1.bind=master
a1.sources.r1.port=5555
a1.sources.r1.channels=c1

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
  • Start the agents on both machines (same command on each)
>./flume-ng agent -f ../conf/processor_failover.conf -n a1 -Dflume.root.logger=INFO,console

>./flume-ng agent -f ../conf/processor_failover.conf -n a1 -Dflume.root.logger=INFO,console
  • Send avro data to hbase several times
>./flume-ng avro-client -H hbase -p 5140 -F ../temp/log.00.COMPLETED

At this point, because sink k2 has a higher priority than k1, all messages are received by k2.
  • Stop k2's downstream agent (the one on master), then send the data again

>./flume-ng avro-client -H hbase -p 5140 -F ../temp/log.00.COMPLETED

Now that k2 is down, all data is received by k1 instead; this is the failover behavior.
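
Once the agent on master is restarted, the failover processor should route events back to the higher-priority k2 after its backoff penalty (capped by maxpenalty, in milliseconds) expires:
>./flume-ng agent -f ../conf/processor_failover.conf -n a1 -Dflume.root.logger=INFO,console
>./flume-ng avro-client -H hbase -p 5140 -F ../temp/log.00.COMPLETED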

Flume Sink Processor-Loadbalance

As with failover, a sink group must be specified.

load_balance offers two selection strategies: random (random selection) and round_robin (round robin).
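
The configuration below uses round_robin; switching to random selection is a one-line change on the same sink group (sketch):

a1.sinkgroups.g1.processor.selector=random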

  • On hbase (hostname), write the main configuration file (processor_loadbalance.conf)
#defined name of source/channel/sink
a1.sources=r1
a1.channels=c1 c2
a1.sinks=k1 k2

#configure sink group for loadbalance
a1.sinkgroups=g1
a1.sinkgroups.g1.sinks=k1 k2
a1.sinkgroups.g1.processor.type=load_balance
#temporarily blacklist failed sinks, backing off exponentially
a1.sinkgroups.g1.processor.backoff=true
#selection strategy: round_robin or random
a1.sinkgroups.g1.processor.selector=round_robin

#configure source
a1.sources.r1.type=avro
a1.sources.r1.bind=hbase
a1.sources.r1.port=5140
a1.sources.r1.channels=c1 c2

#configure channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100

#configure sink
a1.sinks.k1.type=logger
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=192.168.11.129
a1.sinks.k2.port=5555
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
  • On master (hostname), write the avro source configuration file
a1.sources=r1
a1.channels=c1
a1.sinks=k1

a1.sources.r1.type=avro
a1.sources.r1.bind=master
a1.sources.r1.port=5555
a1.sources.r1.channels=c1

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1 
  • Start the agents on both hosts (same command on each)
>./flume-ng agent -f ../conf/processor_loadbalance.conf -n a1 -Dflume.root.logger=INFO,console
  • Send data to the avro port on hbase
>./flume-ng avro-client -H hbase -p 5140 -F ../temp/log.00.COMPLETED
  • Check the consoles on both machines

hbase:

16/11/07 18:04:45 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
16/11/07 18:04:49 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }

master:

16/11/07 18:04:33 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
16/11/07 18:04:45 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
16/11/07 18:04:45 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }
16/11/07 18:04:45 INFO sink.LoggerSink: Event: { headers:{} body: E2 80 9C 48 65 6C 6C 6F 20 77 6F 72 6C 64 E2 80 ...Hello world.. }

The two machines receive the events in turn, as expected with round-robin load balancing.

HBase Sink

Store the data received by the agent in HBase.

  • Start Hadoop and HBase, and create the target table (see the note after these commands)
>cd /usr/develop-fm/hadoop/sbin
>./start-all.sh
>cd ../../hbase/bin
>./start-hbase.sh
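
The target table and column family must exist before the sink can write to them; with the names used in the configuration below, they can be created from the HBase shell:
>hbase shell
hbase(main):001:0> create 'flume_test', 'info'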
  • Write the hbase sink configuration file (hbase.conf)
a1.sources=r1
a1.channels=c1
a1.sinks=k1

#configure source
a1.sources.r1.type=syslogtcp
a1.sources.r1.port=5140
a1.sources.r1.host=hbase
a1.sources.r1.channels=c1

#describe channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

#describe hbase sink
a1.sinks.k1.type=hbase
a1.sinks.k1.table=flume_test
a1.sinks.k1.columnFamily=info
#note: the hbase sink has no 'column' property; RegexHbaseEventSerializer
#writes to the column 'payload' by default (configurable via serializer.colNames)
a1.sinks.k1.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
a1.sinks.k1.channel=c1
  • Start the agent
>./flume-ng agent -f ../conf/hbase.conf -n a1 -Dflume.root.logger=INFO,console
  • Send data to the agent
>echo "hello flume hbase sink test" | nc hbase 5140
  • Check the data in the hbase table
>hbase shell
hbase(main):004:0> scan 'flume_test'
ROW                                              COLUMN+CELL
 1478574878697-i6ZKIWXYeP-0                      column=info:payload, timestamp=1478574882003, value=hello flume hbase sink test
 1478574904071-i6ZKIWXYeP-1                      column=info:payload, timestamp=1478574907074, value=hello flume hbase sink test

The data was inserted into HBase successfully.
