Apache Flume
通道选择器
当一个Source组件对接多个Channel组件的时候, 通道选择器 决定了Source的数据如何传输到Channel中,如果用户不指定通道选择器,默认系统会将Source数据广播给所有的Channel(默认使用replicating模式)。
replicating 复制广播
启动flume
[root@Centos ~]# cd /usr/apache-flume-1.9.0-bin/
[root@Centos apache-flume-1.9.0-bin]# ./bin/flume-ng version
Flume 1.9.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: d4fcab4f501d41597bc616921329a4339f73585e
Compiled by fszabo on Mon Dec 17 20:45:25 CET 2018
From source with checksum 35db629a3bda49d23e9b3690c80737f9
配置example13.properties文件
[root@Centos conf]# vim example13.properties
# 声明基本组件 Source Channel Sink example13.properties
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# 配置Source组件,从Socket中接收⽂文本数据
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# 配置Sink组件,将接收数据打印在⽇日志控制台
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# 配置Channel通道,主要负责数据缓冲
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# 进⾏行行组件间的绑定
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2
启动flume
[root@Centos apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example13.properties -Dflume.root.logger=INFO,console
如果出现以下错误
原因:jar冲突
1、如果用户配置HIVE_HOME环境,需要用户移除hive的lib下的derby或者flume的lib下的derby(仅仅删除一方即可)
2、默认情况下,flume使用的是复制|广播模式的通道选择器。
解决方法:
移除flume lib 下面的derby-10.14.1.0.jar
[root@Centos apache-flume-1.9.0-bin]# mv lib/derby-10.14.1.0.jar ~/
再次启动flume即可
输入数据段telnet
[root@Centos ~]# telnet Centos 44444
Trying 192.168.17.150...
Connected to Centos.
Escape character is '^]'.
20200608
OK
hello world
OK
两个通道文件tail -f监视结果
example14.properties文件第二种等价写法
[root@Centos conf]# vim example14.properties
# 声明基本组件 Source Channel Sink example14.properties
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# 通道选择器器 复制模式
a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2
# 配置Source组件,从Socket中接收⽂文本数据
a1.sources.s1.type = netcat
a1.sources.s1.bind = Centos
a1.sources.s1.port = 44444
# 配置Sink组件,将接收数据打印在⽇日志控制台
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# 配置Channel通道,主要负责数据缓冲
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# 进⾏行行组件间的绑定
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2
Multiplexing 分流
- 信息头是INFO的进入file_roll_1
- 信息头是ERROR的进入file_roll_2
- 没有消息头的默认进入file_roll_1
声明组件 example15.properties
[root@Centos conf]# vim example15.properties
# 声明基本组件 Source Channel Sink example15.properties
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# 通道选择器器 复制模式
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.channels = c1 c2
#定义一个消息头level 在匹配对应的消息 是INFO就传到C1 是ERROR就传到C2 两者都不是就传递到C1
a1.sources.s1.selector.header = level
a1.sources.s1.selector.mapping.INFO = c1
a1.sources.s1.selector.mapping.ERROR = c2
a1.sources.s1.selector.default = c1
# 配置Source组件,从Socket中接收⽂文本数据
a1.sources.s1.type = netcat
a1.sources.s1.bind = Centos
a1.sources.s1.port = 44444
#定义正则抽取拦截器
a1.sources.s1.interceptors = i1
a1.sources.s1.interceptors.i1.type = regex_extractor
a1.sources.s1.interceptors.i1.regex = ^(INFO|ERROR)
a1.sources.s1.interceptors.i1.serializers = s1
a1.sources.s1.interceptors.i1.serializers.s1.name = level
# 配置Sink组件,将接收数据打印在⽇日志控制台
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/baizhi/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/baizhi/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# 配置Channel通道,主要负责数据缓冲
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# 进⾏行行组件间的绑定
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2
启动flume
[root@Centos apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example15.properties -Dflume.root.logger=INFO,console
telnet输入数据
[root@Centos ~]# telnet Centos 44444
Trying 192.168.17.150...
Connected to Centos.
Escape character is '^]'.
INFO nihao
OK
ERROR nihao
OK
nihao
OK
hello world INFO
OK
- 注意 :去掉^ 结果如下 INFO|ERROR不在头也会匹配对应的文件中去
a1.sources.s1.interceptors.i1.regex = (INFO|ERROR)