1. Multiple sources to one agent
agent1 -> agent3
agent2 -> agent3
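With an Avro Source, an avro-client can send a specified file to a Flume agent. The Avro Source uses the Avro RPC mechanism and is Flume's main RPC source. It communicates over the Netty-based Avro inter-process communication (IPC) protocol, so data can be sent to an Avro Source from Java or any other JVM language. Its configuration mainly involves three parameters:
- type: the alias of the Avro source is avro; the fully qualified class name, org.apache.flume.source.AvroSource, can also be used.
- bind: the IP address or hostname to bind to. Use 0.0.0.0 to bind to all network interfaces on the machine.
- port: the port to listen on.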
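These three parameters can be checked mechanically before an agent is started. The following is a simplified Python sketch, not a Flume tool. `parse_properties` handles only plain `key = value` lines and skips the rest of the Java properties syntax, such as `:` separators and line continuations. `check_avro_source` is a hypothetical helper named here only for illustration.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple Flume-style 'key = value' lines, skipping blanks and # comments.

    Simplified on purpose: no ':' separators, escapes or line continuations.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Accepted values for the source type: the alias or the full class name.
AVRO_TYPES = {"avro", "org.apache.flume.source.AvroSource"}


def check_avro_source(props: dict[str, str], agent: str, source: str) -> list[str]:
    """Return a list of problems with an Avro source's type, bind and port settings."""
    prefix = f"{agent}.sources.{source}."
    problems = []
    if props.get(prefix + "type") not in AVRO_TYPES:
        problems.append("type must be avro or org.apache.flume.source.AvroSource")
    if not props.get(prefix + "bind"):
        problems.append("bind is missing")
    port = props.get(prefix + "port", "")
    # Short-circuits on non-digits, so int() is only called on digit strings.
    if not port.isdigit() or not 0 < int(port) < 65536:
        problems.append("port must be an integer in 1-65535")
    return problems
```

Consider a receiving agent with the hypothetical name a3, configured with `a3.sources.r1.type = avro`, `a3.sources.r1.bind = 0.0.0.0` and `a3.sources.r1.port = 44445`. For that agent, `check_avro_source(parse_properties(text), "a3", "r1")` returns an empty list.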
Configuration file for agent1: flume-netcat-avro.conf
# Define the agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Define the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444
# Define the channel
a1.channels.c1.type = memory
# Define the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop000
a1.sinks.k1.port = 44445
# Bind the source and sink to the channel
a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1
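agent1 is fed by writing newline-terminated text to its netcat source on hadoop000:44444, and each line becomes one event. Below is a minimal Python client sketch. `send_lines` is a hypothetical name and not part of Flume. The sketch assumes the netcat source keeps its default acknowledgement behaviour, in which it answers "OK" for every event it accepts.

```python
import socket


def send_lines(host: str, port: int, lines: list[str], timeout: float = 5.0) -> list[str]:
    """Send each line as one event to a netcat-style source and return the replies.

    Assumes the server answers every newline-terminated line with one reply line,
    as Flume's netcat source does when ack-every-event is left at its default.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("r", encoding="utf-8", newline="\n") as reader:
        for line in lines:
            # One newline-terminated line = one Flume event.
            sock.sendall((line + "\n").encode("utf-8"))
            # Wait for the source's acknowledgement of this event (e.g. "OK").
            replies.append(reader.readline().rstrip("\n"))
    return replies
```

With agent1 running, `send_lines("hadoop000", 44444, ["hello flume"])` should return `["OK"]`. Typing the same line into `nc hadoop000 44444` does the equivalent by hand.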
Configuration file for agent2: flume-taildir-avro.conf
# Define the agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Define the source
a2.sources.r1.type = TAILDIR
a2.sources.r1.positionFile = /home/hadoop/position/taildir_position.json
a2.sources.r1.filegroups = f1 f2
a2.sources.r1.filegroups.f1 = /home/hadoop/data/test1/example.log
a2.sources.r1.filegroups.f2 = /home/hadoop/data/test2/.*log.*
# Define the channel
a2.channels.c1.type = memory
# Define the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop000
a2.sinks.k1.port = 44445
# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
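In TAILDIR filegroups, only the file-name part of a path is a regular expression, and the directory part is taken literally. The rough Python approximation below shows which files f1 and f2 would pick up under that rule. `taildir_matches` is a hypothetical helper, and it skips details such as excluding directories.

```python
import re
from pathlib import PurePosixPath


def taildir_matches(filegroup: str, path: str) -> bool:
    """Approximate TAILDIR filegroup matching for a single file path.

    The directory part of the filegroup is compared literally; only the last
    component (the file name) is a regular expression, which must match the
    whole file name. Directory entries and other edge cases are ignored.
    """
    group = PurePosixPath(filegroup)
    candidate = PurePosixPath(path)
    if candidate.parent != group.parent:
        return False  # files in other or nested directories are never matched
    return re.fullmatch(group.name, candidate.name) is not None
```

Under this approximation, f2 matches files such as app.log and access.log.1 directly in test2, but not in subdirectories of test2. The dot in f1's example.log is also a regex wildcard, so f1 would accept a file named exampleXlog in test1 as well.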