1. Avro
A client can send a file to Flume over Avro; by configuring an Avro source, Flume listens on the IP and port that the Avro client sends to and receives the data.
Flume's main RPC source is the Avro Source. It communicates over the Avro inter-process communication (IPC) protocol on top of Netty, so data can be sent to it from Java or any other JVM language. Its configuration has three main parameters:
type: the alias for the Avro source is avro; the fully qualified class name org.apache.flume.source.AvroSource also works
bind: the IP address or hostname to bind to; use 0.0.0.0 to listen on all interfaces
port: the port to listen on
Example: server A (log file test.log) -avro-> server B receives the Avro data and sinks it to the console
Note: when servers transfer log files to each other, the fileToavro.conf configuration must be the same on every node
(on server A)
linux>vi fileToavro.conf
# Name the components on this agent
test.sources = s1        # name the source s1
test.sinks = k1          # name the sink k1
test.channels = c1       # name the channel c1
# Describe/configure the source
test.sources.s1.type = exec    # exec source: run a command
test.sources.s1.command = tail -F /root/logs/test.log
# Describe the sink
# The avro sink is a client: it does not bind to this machine (server A),
# it sends to the service address of the other machine (server B)
test.sinks.k1.type = avro                  # sink type = avro
test.sinks.k1.hostname = 192.168.58.201    # address of server B
test.sinks.k1.port = 1234                  # destination port
test.sinks.k1.batch-size = 2               # events sent per batch
# Use a channel which buffers events in memory
test.channels.c1.type = memory               # channel type = memory
test.channels.c1.capacity = 1000             # max events buffered in the channel
test.channels.c1.transactionCapacity = 100   # max events per transaction
# Bind the source and sink to the channel
test.sources.s1.channels = c1
test.sinks.k1.channel = c1
(on server B)
linux>vi avroToConsole.conf
# Name the components on this agent
test.sources = s1
test.sinks = k1
test.channels = c1
# Describe/configure the source
# the avro source is the receiver service; it binds to this machine
test.sources.s1.type = avro
test.sources.s1.bind = 0.0.0.0
test.sources.s1.port = 1234
# Describe the sink
test.sinks.k1.type = logger
# Use a channel which buffers events in memory
test.channels.c1.type = memory
test.channels.c1.capacity = 1000
test.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
test.sources.s1.channels = c1
test.sinks.k1.channel = c1
Start server B first (the receiver must be up before the sender):
linux>bin/flume-ng agent --conf conf --conf-file avroToConsole.conf --name test -Dflume.root.logger=INFO,console
Then start server A:
linux>bin/flume-ng agent --conf conf --conf-file fileToavro.conf --name test -Dflume.root.logger=INFO,console
Result: server B receives data on the Avro port; log entries collected on server A flow through the channel and are printed on server B's console.
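The memory channel in the configs above buffers events between the source and the sink: capacity bounds how many events it can hold, and transactionCapacity bounds how many a sink takes per transaction. A toy model of that behavior (illustrative sketch only, not Flume's actual implementation):

```python
from collections import deque

class MemoryChannel:
    """Toy model of Flume's memory channel: a bounded in-memory queue."""
    def __init__(self, capacity, transaction_capacity):
        self.capacity = capacity                          # max events buffered
        self.transaction_capacity = transaction_capacity  # max events per take
        self.queue = deque()

    def put(self, event):
        if len(self.queue) >= self.capacity:
            raise RuntimeError("channel full")  # Flume throws a ChannelException here
        self.queue.append(event)

    def take_batch(self):
        """Take up to transaction_capacity events in one transaction."""
        batch = []
        while self.queue and len(batch) < self.transaction_capacity:
            batch.append(self.queue.popleft())
        return batch

# same numbers as the config above
channel = MemoryChannel(capacity=1000, transaction_capacity=100)
for i in range(250):
    channel.put(f"log line {i}")
print(len(channel.take_batch()))   # 100: one transaction drains at most 100 events
print(len(channel.take_batch()))   # 100
print(len(channel.take_batch()))   # 50: the remainder
```

If the source produces faster than the sink drains and the backlog exceeds capacity, puts fail, which is why capacity must be sized for the expected burst rate.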
2. load_balance (load balancing: events are distributed across all sinks, and every sink is used)
Configuration on server A
linux>vi load_balance_avro.conf
#test name
test.channels = c1
test.sources = s1
test.sinks = k1 k2 k3
#set group
test.sinkgroups = g1    # define a sink group
#set channel
test.channels.c1.type = memory
test.channels.c1.capacity = 1000
test.channels.c1.transactionCapacity = 100
test.sources.s1.channels = c1
test.sources.s1.type = exec
test.sources.s1.command = tail -F /root/logs/test.log
# set sink1
test.sinks.k1.channel = c1
test.sinks.k1.type = avro
test.sinks.k1.hostname = 192.168.58.201
test.sinks.k1.port = 12345
# set sink2
test.sinks.k2.channel = c1
test.sinks.k2.type = avro
test.sinks.k2.hostname= 192.168.58.202
test.sinks.k2.port = 12345
# set sink3
test.sinks.k3.channel = c1
test.sinks.k3.type = avro
test.sinks.k3.hostname = 192.168.58.203
test.sinks.k3.port = 12345
#set sink group
test.sinkgroups.g1.sinks = k1 k2 k3    # assign the sinks to group g1
#set load balance
test.sinkgroups.g1.processor.type = load_balance    # processor type = load balancing
# if enabled, a failed sink is temporarily blacklisted (backoff)
test.sinkgroups.g1.processor.backoff = true
# selection policy: round-robin across the sinks
test.sinkgroups.g1.processor.selector = round_robin
# cap (ms) on the blacklist timeout; while a sink keeps failing,
# its backoff grows exponentially up to this limit
test.sinkgroups.g1.processor.selector.maxTimeOut = 10000
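With round_robin selection the processor rotates through the sinks of the group in order; with backoff = true, a sink that fails is skipped until its backoff expires. A simplified sketch of that rotation (illustrative only, not Flume's actual code):

```python
import itertools

def round_robin(sinks, blacklist=frozenset()):
    """Rotate through sinks, skipping blacklisted (failed, backing-off) ones."""
    for sink in itertools.cycle(sinks):
        if sink not in blacklist:
            yield sink

# normal operation: events rotate over k1, k2, k3
selector = round_robin(["k1", "k2", "k3"])
print([next(selector) for _ in range(6)])   # ['k1', 'k2', 'k3', 'k1', 'k2', 'k3']

# after k2 fails, backoff removes it from rotation until maxTimeOut expires
selector = round_robin(["k1", "k2", "k3"], blacklist={"k2"})
print([next(selector) for _ in range(4)])   # ['k1', 'k3', 'k1', 'k3']
```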
The three receiver nodes (B, C, D) use the same configuration:
linux>vi avro.conf
# Name the components on this agent
test.sources = s1
test.sinks = k1
test.channels = c1
# Describe/configure the source
test.sources.s1.type = avro
test.sources.s1.bind = 0.0.0.0
test.sources.s1.port = 12345
# Describe the sink
test.sinks.k1.type = logger
# Use a channel which buffers events in memory
test.channels.c1.type = memory
test.channels.c1.capacity = 1000
test.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
test.sources.s1.channels = c1
test.sinks.k1.channel = c1
# distribute the config file to the other nodes
linux>scp avro.conf 192.168.58.202:`pwd`
linux>scp avro.conf 192.168.58.203:`pwd`
Start the three receiver nodes (B, C, D) first:
linux>bin/flume-ng agent --conf conf --conf-file avro.conf --name test -Dflume.root.logger=INFO,console
Then start the log node (A):
linux>bin/flume-ng agent --conf conf --conf-file load_balance_avro.conf --name test -Dflume.root.logger=INFO,console
If startup fails, check the .conf files for stray spaces.
Result: the log file's entries are distributed to the three receiver nodes in round-robin fashion.
3. Failover (standby sinks are used only when the active sink fails)
linux>vi failover.conf
#test name
test.channels = c1
test.sources = s1
test.sinks = k1 k2 k3
#set group
test.sinkgroups = g1
#set channel
test.channels.c1.type = memory
test.channels.c1.capacity = 1000
test.channels.c1.transactionCapacity = 100
test.sources.s1.channels = c1
test.sources.s1.type = exec
test.sources.s1.command = tail -F /root/logs/test.log
# set sink1
test.sinks.k1.channel = c1
test.sinks.k1.type = avro
test.sinks.k1.hostname = 192.168.58.201
test.sinks.k1.port = 12345
# set sink2
test.sinks.k2.channel = c1
test.sinks.k2.type = avro
test.sinks.k2.hostname= 192.168.58.202
test.sinks.k2.port = 12345
# set sink3
test.sinks.k3.channel = c1
test.sinks.k3.type = avro
test.sinks.k3.hostname = 192.168.58.203
test.sinks.k3.port = 12345
#set sink group
test.sinkgroups.g1.sinks = k1 k2 k3
test.sinkgroups.g1.processor.type = failover    # processor type = failover
# priority values: the larger the absolute value, the higher the priority
test.sinkgroups.g1.processor.priority.k1 = 1
test.sinkgroups.g1.processor.priority.k2 = 5
test.sinkgroups.g1.processor.priority.k3 = 9
# maximum backoff period (ms) for a failed sink
test.sinkgroups.g1.processor.maxpenalty = 20000
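The failover processor always delivers to the live sink with the highest priority; the lower-priority sinks are standbys. A toy version of that selection, using the priorities from the config above (not Flume's actual code):

```python
# priorities from the config above: k3 is preferred, then k2, then k1
priorities = {"k1": 1, "k2": 5, "k3": 9}

def select_sink(priorities, failed=frozenset()):
    """Return the highest-priority sink that has not failed."""
    live = {sink: p for sink, p in priorities.items() if sink not in failed}
    if not live:
        raise RuntimeError("all sinks failed")
    return max(live, key=live.get)

print(select_sink(priorities))                       # k3 (priority 9)
print(select_sink(priorities, failed={"k3"}))        # k2: failover to next priority
print(select_sink(priorities, failed={"k3", "k2"}))  # k1: last resort
```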
Start the three receiver nodes (B, C, D) first:
linux>bin/flume-ng agent --conf conf --conf-file avro.conf --name test -Dflume.root.logger=INFO,console
Then start the log node (A):
linux>bin/flume-ng agent --conf conf --conf-file failover.conf --name test -Dflume.root.logger=INFO,console
Generate test data:
linux>while true; do echo 'access log ...' >> /root/logs/test.log; sleep 0.5; done
Result: all events go to the sink with the highest priority (k3); if that server goes down, events are sent to the sink with the next-highest priority.