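Flume Installation and Deployment
Purpose: a system for collecting, aggregating, and transporting large volumes of log data
Installation resources
(1) Flume website: http://flume.apache.org/
(2) Documentation: http://flume.apache.org/FlumeUserGuide.html
(3) Downloads: http://archive.apache.org/dist/flume/
1. Extract and install…
2. Delete guava-11.0.2.jar from the lib folder for compatibility with Hadoop 3.1.3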
rm /opt/module/flume/lib/guava-11.0.2.jar
Flume Architecture
Examples
I. Monitoring a port
This example requires the netcat tool.
# Install netcat
sudo yum install -y nc
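# Check whether the port is already in use (replace PORT_NUMBER with the port to check)
sudo netstat -nlp | grep PORT_NUMBER
II. Replication
The channel selector replicates events by default.
# This line is optional, because replicating is the default selector type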
a1.sources.r1.selector.type = replicating
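- When replicating, pair each channel with exactly one sink. If several sinks read from the same channel, each event is taken by only one of them, effectively at random, so the data is split across the sinks instead of being copied to each.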
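The difference between the two wirings can be illustrated with a small plain-Java simulation. This is only a sketch with no Flume dependency: the class `ReplicationSketch` and its methods are hypothetical names, and the round-robin draining is an assumed interleaving (real sinks poll the channel independently). `replicate` copies every event into each channel, as the replicating selector does, while `drainShared` lets several sinks take events from one channel, so each event reaches only one of them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Plain-Java illustration only; none of these names come from Flume.
final class ReplicationSketch {

    // Replicating selector: every channel receives its own copy of each event.
    static List<List<String>> replicate(List<String> events, int channelCount) {
        List<List<String>> channels = new ArrayList<>();
        for (int i = 0; i < channelCount; i++) {
            channels.add(new ArrayList<>(events));
        }
        return channels;
    }

    // Several sinks on one channel: each event is taken by exactly one sink.
    // Round-robin is an assumed interleaving; real sinks poll independently.
    static List<List<String>> drainShared(List<String> events, int sinkCount) {
        Deque<String> channel = new ArrayDeque<>(events);
        List<List<String>> sinks = new ArrayList<>();
        for (int i = 0; i < sinkCount; i++) {
            sinks.add(new ArrayList<>());
        }
        int next = 0;
        while (!channel.isEmpty()) {
            sinks.get(next).add(channel.poll());
            next = (next + 1) % sinkCount;
        }
        return sinks;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("e1", "e2", "e3", "e4");
        System.out.println("one sink per channel: " + replicate(events, 2));
        System.out.println("two sinks, one channel: " + drainShared(events, 2));
    }
}
```

In the first case both downstream sinks see all four events. In the second case each sink sees only part of the stream, which is why a replicated flow needs one channel per sink.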
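Failover configuration
1. Define the sink group
2. Set the processor type
3. Set the priorities
# Sender-side agent configuration (the agent that reads the input)
# Uses netcat to send test data to the monitored port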
# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop103
a1.sinks.k1.port = 8887
# Describe the sink
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop104
a1.sinks.k2.port = 8886
a1.sinkgroups = g1
# Put k1 and k2 into one group
a1.sinkgroups.g1.sinks = k1 k2
# Configure the group for failover
a1.sinkgroups.g1.processor.type = failover
# Priorities for k1 and k2 (the larger value has the higher priority)
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
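# Receiver configuration on hadoop103 (start the receivers before the sender)
# On hadoop104, use the same configuration with bind and port changed (hadoop104 and 8886, matching sink k2)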
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop103
a1.sources.r1.port = 8887
# Describe the sink
a1.sinks.k1.type = logger
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
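The priority rule used by the sender configuration can be sketched in plain Java. This is an illustration under stated assumptions, not Flume's implementation: `FailoverSketch`, `pickActive`, and `samplePriorities` are hypothetical names, and the sketch ignores the back-off period governed by maxpenalty. It only shows that the healthy sink with the largest priority value is the one that receives events.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; none of these names come from Flume.
final class FailoverSketch {

    // Returns the name of the healthy sink with the largest priority value,
    // or null when every sink is marked as failed.
    static String pickActive(Map<String, Integer> priorities, Set<String> failed) {
        String best = null;
        int bestPriority = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> e : priorities.entrySet()) {
            if (failed.contains(e.getKey())) {
                continue; // skip sinks currently marked as failed
            }
            if (best == null || e.getValue() > bestPriority) {
                best = e.getKey();
                bestPriority = e.getValue();
            }
        }
        return best;
    }

    // Priorities copied from the sender configuration: k1 = 5, k2 = 10.
    static Map<String, Integer> samplePriorities() {
        Map<String, Integer> p = new LinkedHashMap<>();
        p.put("k1", 5);
        p.put("k2", 10);
        return p;
    }

    public static void main(String[] args) {
        Map<String, Integer> p = samplePriorities();
        Set<String> noneFailed = new HashSet<>();
        System.out.println("all healthy: " + pickActive(p, noneFailed));
        System.out.println("k2 down: " + pickActive(p, Collections.singleton("k2")));
    }
}
```

With the priorities from the configuration, k2 (hadoop104) receives the events while it is healthy, and k1 (hadoop103) takes over when k2 fails.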
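Aggregation
- To aggregate, point the sinks of several agents at the same IP address and port, where a single receiving agent collects the events.
Custom interceptor
Defining the interceptor
1. Implement the Interceptor interface and its four methods
2. Override the two intercept methods to filter events or to add key-value pairs to the header map
3. Add a static nested class that implements Builder and returns an instance of the interceptor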
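import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.List;
import java.util.Map;

/**
 * @ClassName : Myinterceptor
 * @Author : kele
 * @Date: 2021/1/7 13:21
 * @Description : Custom interceptor
 * 1. Write the method that handles a single event
 * 2. Write the method that handles a list of events
 * 3. Add a nested builder class that implements Builder
 */
public class Myinterceptor implements Interceptor {
    @Override
    public void initialize() {
    }

    /**
     * Handles a single event:
     * 1. Get the event headers as a map
     * 2. Set the header key-value pair
     * 3. Write the map back into the event headers
     * 4. Return the event (the body is normally left unchanged)
     * @param event the event to tag
     * @return the same event with the "alphabet" header set
     */
    @Override
    public Event intercept(Event event) {
        Map<String, String> map = event.getHeaders();
        byte[] body = event.getBody();
        // An empty body has no first byte, so it falls through to "default"
        boolean hasData = body != null && body.length > 0;
        if (hasData && body[0] == 'a') {
            map.put("alphabet", "a");
        } else if (hasData && body[0] == 'b') {
            map.put("alphabet", "b");
        } else {
            map.put("alphabet", "default");
        }
        event.setHeaders(map);
        return event;
    }

    /**
     * Handles a batch of events by calling the single-event method on each one
     * @param list the events to tag
     * @return the same list with every event tagged
     */
    @Override
    public List<Event> intercept(List<Event> list) {
        for (Event event : list) {
            intercept(event);
        }
        return list;
    }

    @Override
    public void close() {
    }

    /**
     * Nested class that implements Builder and returns the interceptor
     */
    public static class MyBuilder implements Builder {
        @Override
        public Interceptor build() {
            return new Myinterceptor();
        }

        @Override
        public void configure(Context context) {
        }
    }
}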
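Configuration file using the custom interceptor and selector
When the interceptor's header is used to route events, set the channel selector to multiplexing.
a1.sources = r1
a1.sinks = k1
# c2 and c3 are referenced by the selector mappings, so all three channels are declared
a1.channels = c1 c2 c3
a1.sources.r1.interceptors = i1
# Placeholder: the interceptor's fully qualified class name, then $, then its nested builder class (MyBuilder in the class above)
a1.sources.r1.interceptors.i1.type = fully.qualified.ClassName$BuilderClassName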
# Multiplexing selector configuration
a1.sources.r1.selector.type = multiplexing
# Header key set by the interceptor
a1.sources.r1.selector.header = alphabet
a1.sources.r1.selector.mapping.a = c1
a1.sources.r1.selector.mapping.b = c2
# Events that match no mapping are sent to channel c3
a1.sources.r1.selector.default = c3
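The routing that these selector lines describe can be sketched in plain Java. This is a minimal illustration with no Flume dependency, and `MultiplexingSketch` with its `map` and `route` methods are hypothetical names: the value of the alphabet header is looked up in the mapping, and anything unmapped or missing falls back to the default channel.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; none of these names come from Flume.
final class MultiplexingSketch {

    private final Map<String, String> mapping = new HashMap<>();
    private final String defaultChannel;

    MultiplexingSketch(String defaultChannel) {
        this.defaultChannel = defaultChannel;
    }

    // Mirrors a configuration line such as selector.mapping.a = c1.
    MultiplexingSketch map(String headerValue, String channel) {
        mapping.put(headerValue, channel);
        return this;
    }

    // Looks up the value of the selector header; unmapped or missing
    // values fall back to the default channel.
    String route(Map<String, String> headers, String headerName) {
        String value = headers.get(headerName);
        return mapping.getOrDefault(value, defaultChannel);
    }

    public static void main(String[] args) {
        MultiplexingSketch selector =
                new MultiplexingSketch("c3").map("a", "c1").map("b", "c2");
        Map<String, String> headers = new HashMap<>();
        headers.put("alphabet", "a");
        System.out.println("routed to " + selector.route(headers, "alphabet"));
    }
}
```

An event whose body starts with a is tagged alphabet=a by the interceptor and lands in c1, one starting with b lands in c2, and everything else lands in c3.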