Flume Introduction and Configuration
Official site: http://flume.apache.org/
What is Flume
Flume is a distributed data-collection framework.
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
Collecting: the source
Aggregating: the channel (buffering/storage)
Moving: the sink
Learning Flume is essentially learning how to combine sources, channels, and sinks.
Flume itself is only a framework; it has no built-in source/channel/sink combinations. The combination you want must be described to the framework in a configuration file.
So learning Flume really means learning how to configure these source, channel, and sink combinations.
Once data stored in a channel has been taken by a sink, it is removed from the channel.
The channel is passive: it only stores data.
Event and Agent
A Flume event is defined as a unit of data flow with a byte payload and an optional set of string attributes (headers).
Flume event = payload (data) + headers
A Flume agent is a (JVM) process that hosts the components through which events flow from an external source to the next destination (hop).
Flume Sources
NetCat Source
Collect data over the network and log it to the console
- Create the configuration file under /home/hadoop/apps/flume/conf
vim flume-net-log.conf
# A Flume agent is configured with sources, channels, and sinks.
# One agent can contain several of each, so every component needs a name.
# Define the components: a1 is the agent, r1 the source, c1 the channel, k1 the sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source type
# The netcat source acts as the server side, so netcat is needed for the client test
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
# Configure the sink
a1.sinks.k1.type = logger
# Which channel the source writes to
# Which channel the sink takes from
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
- Start Flume with the configuration file above
flume-ng agent --conf ./ --conf-file ./flume-net-log.conf --name a1 -Dflume.root.logger=INFO,console
## Shortened form
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f flume-net-log.conf -Dflume.root.logger=INFO,console
Workaround for broken yum repositories
wget -O /etc/yum.repos.d/CentOS-Base.repo http://file.kangle.odata.cc/repo/Centos6.repo
wget -O /etc/yum.repos.d/epel.repo http://file.kangle.odata.cc/repo/epel6.repo
yum makecache
Installing NetCat
- Extract the netcat source package netcat-0.7.1.tar.gz (into the current directory; it still needs to be compiled and installed)
[hadoop@hadoop101 installPkg]$ tar -zxvf netcat-0.7.1.tar.gz
- Configure the installation path
[hadoop@hadoop101 netcat-0.7.1]$ ./configure --prefix=/home/hadoop/apps/netcat/
- Compile and install (the src directory contains C source files, so gcc must be installed first)
[hadoop@hadoop101 src]$ make && make install
- Set the environment variables
[hadoop@hadoop101 bin]$ sudo vim /etc/profile
## netcat environment variables
export NETCAT_HOME=/home/hadoop/apps/netcat
export PATH=$PATH:$NETCAT_HOME/bin
[hadoop@hadoop101 bin]$ . /etc/profile
- Start netcat as a socket client (hadoop101 and port 6666 are the values set in the flume-net-log.conf file created above)
[hadoop@hadoop101 ~]$ nc hadoop101 6666
hello world
OK
- Console output
2020-12-17 14:26:35,639 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:169)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/192.168.152.81:6666]
2020-12-17 14:27:13,644 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64    hello world }
Exec Source
Monitor a file with the exec source and log it to the console
vim flume-exec-log.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data/access.log
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# Configure the sink
a1.sinks.k1.type = logger
# Which channel the source writes to
# Which channel the sink takes from
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
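The agent is started the same way as in the netcat example; a minimal sketch, assuming the tailed file already exists under /home/hadoop/data:
[hadoop@hadoop101 data]$ touch access.log
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f flume-exec-log.conf -Dflume.root.logger=INFO,console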
Append data to access.log and the agent console prints the corresponding events.
[hadoop@hadoop101 data]$ echo java >> access.log
2020-12-17 15:10:54,796 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: r1 started
2020-12-17 15:11:24,807 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)]
Event: { headers:{} body: 6A 61 76 61 java }
Spooling Directory Source
- Unlike the Exec source, this source is reliable and will not lose data even if Flume is restarted or killed.
- Files placed in the spooling directory must be immutable and uniquely named.
- Once a file has been fully ingested it is renamed (by default a .COMPLETED suffix is appended).
flume-spool-log.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/data/flumeSpool
a1.sources.r1.fileHeader = true
a1.sources.r1.basenameHeader = true
## Ignore files ending in .tmp
a1.sources.r1.ignorePattern = ^.*\\.tmp$
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# Configure the sink
a1.sinks.k1.type = logger
# Which channel the source writes to
# Which channel the sink takes from
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
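A quick way to try it out, assuming the spool directory from the config exists and word.txt is any small text file (the file name is only for illustration):
[hadoop@hadoop101 ~]$ mkdir -p /home/hadoop/data/flumeSpool
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f flume-spool-log.conf -Dflume.root.logger=INFO,console
# in another terminal, drop a finished file into the spool directory
[hadoop@hadoop101 ~]$ cp word.txt /home/hadoop/data/flumeSpool/
# once the file has been ingested it is renamed with the .COMPLETED suffix
[hadoop@hadoop101 ~]$ ls /home/hadoop/data/flumeSpool/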
Taildir Source (important)
A source added in Flume 1.7.
Watches the specified files and tails them in near real time, picking up new lines as they are appended. If new lines are still being written, the source keeps retrying until the write completes.
This source is reliable: no data is lost even when the tailed files rotate or the Flume agent is stopped and restarted while the files keep being written.
It periodically records the last read position of each file, in JSON format, in a given position file. If Flume stops or goes down for any reason, it resumes tailing from the positions stored in that file.
The source does not rename, delete, or otherwise modify the files it tails. It currently cannot tail binary files; it reads text files line by line.
flume-taildir-log.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /home/hadoop/apps/flume/conf/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /home/hadoop/data/word.txt
a1.sources.r1.headers.f1.headerKey1 = value1
a1.sources.r1.filegroups.f2 = /home/hadoop/data/wc.txt
a1.sources.r1.headers.f2.headerKey1 = value2
a1.sources.r1.fileHeader = true
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# Configure the sink
a1.sinks.k1.type = logger
# Which channel the source writes to
# Which channel the sink takes from
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
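A minimal test, using the file names from the config above; the position file records the last read offset of each tailed file and survives agent restarts:
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f flume-taildir-log.conf -Dflume.root.logger=INFO,console
# in another terminal, append to one of the tailed files
[hadoop@hadoop101 data]$ echo "hello taildir" >> word.txt
# inspect the recorded read positions (JSON)
[hadoop@hadoop101 data]$ cat /home/hadoop/apps/flume/conf/taildir_position.json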
Flume Channels
Memory Channel
Events are stored in an in-memory queue. This gives high throughput, but any data still in the channel is lost if the agent fails.
File Channel
https://blogs.apache.org/flume/entry/apache_flume_filechannel
MemoryChannel provides high throughput but loses data on crash or power failure, so a durable channel is needed.
FileChannel aims to be a reliable, high-throughput channel: once a transaction has been committed, it guarantees that no data is lost in a subsequent crash or power failure.
taildir-file-log.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /home/hadoop/apps/flume/conf/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /home/hadoop/data/word.txt
a1.sources.r1.headers.f1.headerKey1 = value1
a1.sources.r1.filegroups.f2 = /home/hadoop/data/wc.txt
a1.sources.r1.headers.f2.headerKey1 = value2
a1.sources.r1.fileHeader = true
# Configure the channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/hadoop/apps/flume/checkpoint
a1.channels.c1.dataDirs = /home/hadoop/apps/flume/data
# Configure the sink
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
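Starting this agent works like the memory-channel version, except that the channel state is persisted on disk; a quick check, assuming the two directories are writable by the Flume user:
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f taildir-file-log.conf -Dflume.root.logger=INFO,console
# the file channel keeps its checkpoint and data (log) files here
[hadoop@hadoop101 ~]$ ls /home/hadoop/apps/flume/checkpoint /home/hadoop/apps/flume/data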
Flume Sinks
HDFS Sink
This sink writes events to the Hadoop Distributed File System (HDFS).
It currently supports creating text and sequence files, and both file types can be compressed.
Files can be rolled (the current file closed and a new one created) periodically, based on elapsed time, data size, or number of events.
net-meme-hdfs.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the channel
a1.channels.c1.type = memory
# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M
a1.sinks.k1.hdfs.filePrefix = events-
# Round down the time used in the path (one directory every 10 minutes)
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# File rolling settings
# Roll by elapsed time in seconds; 0 disables
a1.sinks.k1.hdfs.rollInterval = 30
# Roll by file size in bytes; 0 disables
a1.sinks.k1.hdfs.rollSize = 1024
# Roll by number of events; 0 disables
a1.sinks.k1.hdfs.rollCount = 10
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# File type settings
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
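A rough end-to-end check, assuming HDFS is up and the agent user can write to /flume:
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f net-meme-hdfs.conf -Dflume.root.logger=INFO,console
# in another terminal, send a few lines
[hadoop@hadoop101 ~]$ nc hadoop101 6666
hello hdfs
OK
# the output path is bucketed by date and 10-minute window
[hadoop@hadoop101 ~]$ hdfs dfs -ls -R /flume/events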
File Roll Sink
Stores events on the local file system.
net-mem-file.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the channel
a1.channels.c1.type = memory
# Configure the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/data/flume
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
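By default file_roll rolls to a new output file every 30 seconds (sink.rollInterval); setting it to 0 disables time-based rolling. A quick test sketch; it is safest to create the output directory before starting the agent:
[hadoop@hadoop101 ~]$ mkdir -p /home/hadoop/data/flume
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f net-mem-file.conf -Dflume.root.logger=INFO,console
# send a few lines with nc, then inspect the rolled files
[hadoop@hadoop101 ~]$ ls /home/hadoop/data/flume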
AsyncHBaseSink
This sink writes data to HBase using an asynchronous model.
net-mem-hbase.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the channel
a1.channels.c1.type = memory
# Configure the sink
a1.sinks.k1.type = asynchbase
a1.sinks.k1.table = foo_table
a1.sinks.k1.columnFamily = bar_cf
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
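The target table and column family must exist before the sink starts; a minimal preparation and test sketch, assuming HBase is already running:
[hadoop@hadoop101 ~]$ echo "create 'foo_table','bar_cf'" | hbase shell
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f net-mem-hbase.conf -Dflume.root.logger=INFO,console
# send a line with nc, then verify
[hadoop@hadoop101 ~]$ echo "scan 'foo_table'" | hbase shell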
Chaining Flume agents
One source fanning into multiple channels, each with its own sink, is called fan-out.
Multiple sources feeding a single channel and sink is called fan-in.
Note, however, that a sink can only take events from a single channel, so sources, channels, and sinks cannot be cross-wired arbitrarily in one flow.
multi-agent flow
The first agent runs on hadoop101, the second on hadoop102.
Note: start the agent on hadoop102 first, so its avro source is listening before the avro sink on hadoop101 connects.
net-mem-avro.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the channel
a1.channels.c1.type = memory
# Configure the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4545
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
avro-mem-log.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop102
a1.sources.r1.port = 4545
# Configure the channel
a1.channels.c1.type = memory
# Configure the sink
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
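Start order sketch (the avro source on hadoop102 must be listening before the avro sink on hadoop101 tries to connect):
# on hadoop102, start first
[hadoop@hadoop102 conf]$ flume-ng agent -n a1 -c ./ -f avro-mem-log.conf -Dflume.root.logger=INFO,console
# on hadoop101
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f net-mem-avro.conf -Dflume.root.logger=INFO,console
# test: events typed here appear on the hadoop102 console
[hadoop@hadoop101 ~]$ nc hadoop101 6666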
Multiplexing the flow (fan-out to multiple channels)
net-channels-sinks.conf
a1.sources = r1
a1.channels = c1 c2 c3
a1.sinks = k1 k2 k3
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the channels
a1.channels.c1.type = memory
a1.channels.c2.type = memory
a1.channels.c3.type = memory
# Configure the sinks
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M
a1.sinks.k1.hdfs.filePrefix = events-
# Round down the time used in the path (one directory every 10 minutes)
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# File rolling settings
# Roll by elapsed time in seconds; 0 disables
a1.sinks.k1.hdfs.rollInterval = 30
# Roll by file size in bytes; 0 disables
a1.sinks.k1.hdfs.rollSize = 1024
# Roll by number of events; 0 disables
a1.sinks.k1.hdfs.rollCount = 10
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# File type settings
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k2.type = logger
a1.sinks.k3.type = avro
a1.sinks.k3.hostname = hadoop102
a1.sinks.k3.port = 4545
# Which channels the source writes to (the default selector is replicating, so every event is copied to all three)
# Which channel each sink takes from
a1.sources.r1.channels = c1 c2 c3
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
Interceptors
Timestamp Interceptor
This interceptor inserts into the event headers the time, in milliseconds, at which it processed the event.
net-timestamp.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# Configure the sink
a1.sinks.k1.type = logger
# Which channel the source writes to
# Which channel the sink takes from
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Host Interceptor
net-timestamp-host.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the interceptors
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i2.type = host
a1.sources.r1.interceptors.i2.hostHeader = hostname
a1.sources.r1.interceptors.i2.useIP = false
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# Configure the sink
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Static Interceptor
Adds a static (user-defined) key/value pair to the event headers.
net-static.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = author
a1.sources.r1.interceptors.i1.value = lee
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# Configure the sink
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Custom interceptor
- Write a class that implements Interceptor and converts the incoming data into JSON (modeled on the source code of HostInterceptor).
First add the dependencies:
<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.7.0</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.72</version>
</dependency>
package com.bigdata.demo;

import com.alibaba.fastjson.JSONObject;
import com.google.common.collect.Lists;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.io.UnsupportedEncodingException;
import java.util.HashMap;
import java.util.List;

public class LogInterceptor implements Interceptor {

    private String colName;
    private String separator;
    private HashMap<String, Object> map;

    private LogInterceptor(String colName, String separator) {
        this.colName = colName;
        this.separator = separator;
    }

    @Override
    public void initialize() {
        map = new HashMap<>();
    }

    @Override
    public Event intercept(Event event) {
        map.clear();
        byte[] body = event.getBody();
        try {
            String data = new String(body, "UTF-8");
            String[] datas = data.split(separator);
            String[] fields = colName.split(",");
            // drop the event if the number of values does not match the column names
            if (fields.length != datas.length) {
                return null;
            }
            for (int i = 0; i < datas.length; i++) {
                map.put(fields[i], datas[i]);
            }
            // map --> json
            String json = JSONObject.toJSONString(map);
            // replace the event body with the json string
            event.setBody(json.getBytes("UTF-8"));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> out = Lists.newArrayList();
        for (Event event : events) {
            Event outEvent = intercept(event);
            if (outEvent != null) {
                out.add(outEvent);
            }
        }
        return out;
    }

    @Override
    public void close() {
        // no-op
    }

    public static class Builder implements Interceptor.Builder {

        private String colName;
        private String separator;

        @Override
        public Interceptor build() {
            return new LogInterceptor(colName, separator);
        }

        @Override
        public void configure(Context context) {
            colName = context.getString("colName", "");
            separator = context.getString("separator", " ");
        }
    }
}
- Package the code as a jar and put it in Flume's lib directory
- Write the configuration
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the interceptor
# The type is the fully qualified name of the compiled Builder class (not the .java file); inner classes are referenced with the $ sign
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.bigdata.demo.LogInterceptor$Builder
a1.sources.r1.interceptors.i1.colName = id,name,age
a1.sources.r1.interceptors.i1.separator = ,
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# Configure the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/data
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
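A quick end-to-end test sketch; the configuration file name and the sample record are made up for illustration:
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f net-interceptor-file.conf -Dflume.root.logger=INFO,console
# send a record matching the configured columns id,name,age
[hadoop@hadoop101 ~]$ nc hadoop101 6666
1,tom,20
OK
# the newest file_roll output file should now contain the JSON form of the record
[hadoop@hadoop101 ~]$ ls -lt /home/hadoop/data | head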
Custom HBase serializer
- Implement AsyncHbaseEventSerializer to write the data into an HBase table (modeled on the source code of SimpleAsyncHbaseEventSerializer).
First add the dependency:
<dependency>
    <groupId>org.apache.flume.flume-ng-sinks</groupId>
    <artifactId>flume-ng-hbase-sink</artifactId>
    <version>1.7.0</version>
</dependency>
package com.bigdata.demo;

import com.google.common.base.Charsets;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.FlumeException;
import org.apache.flume.conf.ComponentConfiguration;
import org.apache.flume.sink.hbase.AsyncHbaseEventSerializer;
import org.hbase.async.AtomicIncrementRequest;
import org.hbase.async.PutRequest;

import java.util.ArrayList;
import java.util.List;

public class LogHbaseEventSerializer implements AsyncHbaseEventSerializer {

    private byte[] table;
    private byte[] cf;
    private byte[] payload;
    private byte[] incrementColumn;
    private byte[] incrementRow;
    private String separator;
    private String pCol;

    @Override
    public void initialize(byte[] table, byte[] cf) {
        this.table = table;
        this.cf = cf;
    }

    @Override
    public List<PutRequest> getActions() {
        List<PutRequest> actions = new ArrayList<PutRequest>();
        if (pCol != null) {
            byte[] rowKey;
            try {
                // use the user id from the collected record as the row key
                String data = new String(payload);
                String[] strings = data.split(separator);
                String[] fields = pCol.split(",");
                if (strings.length != fields.length) {
                    return actions;
                }
                String id = strings[0];
                rowKey = id.getBytes("UTF-8");
                // one PutRequest per column
                for (int i = 0; i < strings.length; i++) {
                    PutRequest putRequest = new PutRequest(table, rowKey, cf,
                            fields[i].getBytes("UTF-8"), strings[i].getBytes("UTF-8"));
                    actions.add(putRequest);
                }
            } catch (Exception e) {
                throw new FlumeException("Could not get row key!", e);
            }
        }
        return actions;
    }

    @Override
    public List<AtomicIncrementRequest> getIncrements() {
        List<AtomicIncrementRequest> actions = new ArrayList<AtomicIncrementRequest>();
        if (incrementColumn != null) {
            AtomicIncrementRequest inc =
                    new AtomicIncrementRequest(table, incrementRow, cf, incrementColumn);
            actions.add(inc);
        }
        return actions;
    }

    @Override
    public void cleanUp() {
        // nothing to clean up
    }

    @Override
    public void configure(Context context) {
        // column names to use in HBase
        pCol = context.getString("colName", "pCol");
        // separator of the collected data
        separator = context.getString("separator", ",");
        String iCol = context.getString("incrementColumn", "iCol");
        if (iCol != null && !iCol.isEmpty()) {
            incrementColumn = iCol.getBytes(Charsets.UTF_8);
        }
        incrementRow = context.getString("incrementRow", "incRow").getBytes(Charsets.UTF_8);
    }

    @Override
    public void setEvent(Event event) {
        this.payload = event.getBody();
    }

    @Override
    public void configure(ComponentConfiguration conf) {
        // not used
    }
}
- Package the code as a jar and put it in Flume's lib directory
- Write the configuration
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the channel
a1.channels.c1.type = memory
# Configure the sink
a1.sinks.k1.type = asynchbase
a1.sinks.k1.table = myhbase
a1.sinks.k1.columnFamily = c
a1.sinks.k1.serializer = com.bigdata.demo.LogHbaseEventSerializer
a1.sinks.k1.serializer.colName = id,name,age
a1.sinks.k1.serializer.separator = ,
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
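Preparation and test sketch; the conf file name here is only illustrative, and the table and column family must match the sink settings above:
[hadoop@hadoop101 ~]$ echo "create 'myhbase','c'" | hbase shell
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f net-mem-myhbase.conf -Dflume.root.logger=INFO,console
# send a record; the serializer uses the id field as the row key
[hadoop@hadoop101 ~]$ nc hadoop101 6666
1,tom,20
OK
[hadoop@hadoop101 ~]$ echo "scan 'myhbase'" | hbase shell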
How the agent works internally
Flume failover and load balancing
Using sink groups
Failover
A sink group is bound to one channel, and only one sink in the group takes data at a time. If that sink fails, another sink in the group takes over.
Each sink has an associated priority; the larger the number, the higher the priority.
failover.conf:
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the channel
a1.channels.c1.type = memory
# Configure the sinks
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4545
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /home/hadoop/data/flume
# Failover configuration
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 50
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
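To watch the failover happen, start the avro receiver on hadoop102 first (the avro-mem-log.conf agent from the multi-agent example works), then this agent; a rough sketch:
[hadoop@hadoop102 conf]$ flume-ng agent -n a1 -c ./ -f avro-mem-log.conf -Dflume.root.logger=INFO,console
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f failover.conf -Dflume.root.logger=INFO,console
# send events: they go to k1 (avro, priority 50); then stop the hadoop102 agent with Ctrl+C
# and keep sending: new events are written by k2 (file_roll) into /home/hadoop/data/flume
[hadoop@hadoop101 ~]$ nc hadoop101 6666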
Load balancing
The load-balancing sink processor provides the ability to spread the flow over multiple sinks.
It maintains an indexed list of active sinks across which the load is distributed.
It supports distributing the load with either a round_robin or a random selection mechanism.
The selection mechanism defaults to round_robin but can be overridden via configuration.
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 6666
# Configure the channel
a1.channels.c1.type = memory
# Load-balancing configuration
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
# Configure the sinks
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/data/flume01
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /home/hadoop/data/flume
# Bind the channels
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
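A rough way to see the round-robin distribution; the conf file name is illustrative and both output directories should exist first:
[hadoop@hadoop101 ~]$ mkdir -p /home/hadoop/data/flume01 /home/hadoop/data/flume
[hadoop@hadoop101 conf]$ flume-ng agent -n a1 -c ./ -f load-balance.conf -Dflume.root.logger=INFO,console
# send several lines with nc, then compare the two output directories
[hadoop@hadoop101 ~]$ nc hadoop101 6666
[hadoop@hadoop101 ~]$ ls /home/hadoop/data/flume01 /home/hadoop/data/flume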