Overview
Flume Definition
Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and moving massive amounts of log data, originally provided by Cloudera. Flume is built on a streaming (log-stream) architecture and is flexible and simple.
It offers tunable reliability mechanisms plus many failover and recovery mechanisms, giving it strong fault tolerance. This architecture can be used for real-time, online analysis of streaming log data.
Flume lets you plug custom data senders into a logging system to collect data; it can also apply simple processing to the data and write it to a variety of (customizable) data receivers. Flume currently exists in two lines: the 0.9.x releases, collectively called Flume OG, and the 1.x releases, collectively called Flume NG. Flume NG went through a major refactoring and differs substantially from Flume OG, so take care to distinguish them.
Its main job is to read data from a server's local disk in real time and write that data into HDFS.
Advantages of Flume
1. It can collect data at high speed, and the collected data can be stored on HDFS in the file format and compression codec you want.
2. Its transaction mechanism guarantees that no data is lost during collection.
3. Some Sources guarantee that if Flume crashes, a restart resumes collection from the last recorded position, achieving true zero data loss.
Flume Architecture
Agent: the smallest unit of log collection; Flume performs log collection by assembling a number of Agents.
An Agent contains a Source, a Channel, and a Sink.
1. Source (collects data from the source side): Flume ships with a wide variety of Sources and also supports custom Sources.
2. Channel (temporarily stores the aggregated data): the most common are the memory channel and the file channel (the most widely used in production). In production, channel usage must be monitored so that a failed sink does not overflow the channel.
3. Sink (moves data to the destination): e.g. HDFS, Kafka, a database, or a custom sink.
Single Agent:
Agents in series:
Agents in parallel (the most common setup in production):
Multi-sink Agent (also very common):
You can picture Flume as blood vessels: they carry blood to the heart, and the heart then pumps it out to every organ.
Installing Flume
- Install JDK 1.8+ and configure the environment variables
- Install Flume
[root@CentOSA ~]# tar -zxf apache-flume-1.9.0-bin.tar.gz -C /usr/
[root@CentOSA ~]# cd /usr/apache-flume-1.9.0-bin/
[root@CentOSA apache-flume-1.9.0-bin]# ./bin/flume-ng version
Flume 1.9.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: d4fcab4f501d41597bc616921329a4339f73585e
Compiled by fszabo on Mon Dec 17 20:45:25 CET 2018
From source with checksum 35db629a3bda49d23e9b3690c80737f9
Agent Configuration
# Declare the components
<Agent>.sources = <Source1> <Source2>
<Agent>.sinks = <Sink1> <Sink2>
<Agent>.channels = <Channel1> <Channel2>
# Configure the components
<Agent>.sources.<Source>.<someProperty> = <someValue>
<Agent>.channels.<Channel>.<someProperty> = <someValue>
<Agent>.sinks.<Sink>.<someProperty> = <someValue>
# Wire the components together
<Agent>.sources.<Source>.channels = <Channel1> <Channel2> ...
<Agent>.sinks.<Sink>.channel = <Channel1>
<Agent>, <Source>, and <Sink> are component names; consult the documentation for the components that are available:
http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html
Test
[root@CentOSA apache-flume-1.9.0-bin]# vi conf/demo01.properties
[root@CentOSA ~]# yum install -y telnet # install the telnet client; it is used below to send test data to the netcat source (r1)
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = CentOSA
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent
[root@CentOSA apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --conf-file conf/demo01.properties --name a1 -Dflume.root.logger=INFO,console
Test
[root@CentOSA ~]# telnet CentOSA 44444
Trying 192.168.40.129...
Connected to CentOSA.
Escape character is '^]'.
hello world
OK
ni hao
OK
Component Overview
Sources
- Avro Source: starts an Avro server internally to accept requests from Avro clients and stores the received data in the Channel.
Avro is a data serialization system designed for applications that exchange large volumes of data. Its main features: it supports binary serialization, so large amounts of data can be processed conveniently and quickly, and it is friendly to dynamic languages, providing mechanisms that let them work with Avro data easily.
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = avro
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@train apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/avro.properties -Dflume.root.logger=INFO,console
[root@train apache-flume-1.9.0-bin]# ./bin/flume-ng avro-client --host train --port 44444 --filename /root/data/t_employee
- Exec Source: collects the output that a command writes to the console.
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/data/t_employee
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@train apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/exec.properties -Dflume.root.logger=INFO,console
echo 'hello world' >> data/t_employee
- Spooling Directory Source: collects new text files added to a static directory. After a file has been collected its suffix is changed, but the source file is not deleted; if you only ever want to collect once, you can change this source's default behavior.
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/spooldir
a1.sources.r1.fileHeader = true
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@train apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/spooldir.properties -Dflume.root.logger=INFO,console
Note: only files newly added to the watched directory are collected; a renamed file counts as new and is collected again, but changing a file's content without renaming it does not trigger collection.
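A quick way to observe this (paths are the ones used above): drop a file into the spool directory and list it again after collection; by default the source renames collected files with a .COMPLETED suffix.
cp /root/data/t_employee /root/spooldir/
ls /root/spooldir
t_employee.COMPLETED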
- Taildir Source: monitors text files for appended lines in real time and records the read offset of every collected file, so the next run resumes where the last one stopped, enabling incremental collection.
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = g1 g2
a1.sources.r1.filegroups.g1 = /root/taildir/.*\.xml$
a1.sources.r1.filegroups.g2 = /root/taildir/.*\.properties$
a1.sources.r1.headers.g1.type = xml
a1.sources.r1.headers.g2.type = properties
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@train apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/taildir.properties -Dflume.root.logger=INFO,console
[root@train ~]# ls .flume/
jdbc-channel taildir_position.json
[root@train ~]# cat .flume/taildir_position.json
[{"inode":34827659,"pos":12,"file":"/root/taildir/tail01.xml"}]
taildir_position.json records the read positions; this is what makes incremental collection possible.
- Kafka Source: an Apache Kafka consumer that reads messages from Kafka topics. It currently supports Kafka broker version 0.10.1.0 or higher.
Property | Default | Description |
---|---|---|
channels | – | The channel(s) the source writes to |
type | – | Must be set to org.apache.flume.source.kafka.KafkaSource |
kafka.bootstrap.servers | – | Comma-separated list of brokers in the Kafka cluster used by the source |
batchSize | 1000 | Maximum number of messages written to the Channel in one batch |
batchDurationMillis | 1000 | Maximum time (in ms) before a batch is written to the channel; the batch is written as soon as either the size or the time limit is reached |
kafka.topics | – | Comma-separated list of topics the Kafka consumer reads messages from |
kafka.consumer.group.id | flume | Unique ID of the consumer group; setting the same ID in multiple sources or agents means they belong to the same consumer group |
An example of subscribing to topics via a comma-separated topic list.
1. Configuration file
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 100
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = train:9092
a1.sources.r1.kafka.topics = test1
a1.sources.r1.kafka.consumer.group.id = group1
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Note: a1.sources.r1.batchSize must be smaller than a1.channels.c1.transactionCapacity; otherwise an error occurs when the batch is committed to the channel.
2. Run Flume
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/kafkaSource.properties -Dflume.root.logger=INFO,console
3. Start a Kafka producer
./bin/kafka-console-producer.sh --broker-list train:9092 --topic test1
Sinks
- Logger Sink: usually used for testing/debugging.
- File Roll Sink: writes the collected data to local files.
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /root/file_roll
a1.sinks.k1.sink.rollInterval = 0
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/fileSink.properties
- HDFS Sink: writes data to the HDFS file system.
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume-hdfs/%y-%m-%d
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/hdfsSink.properties
telnet train 44444
- Kafka Sink: writes data to a Kafka topic.
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = train:9092
a1.sinks.k1.topic = topic01
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/kafkaSink.properties
./bin/kafka-console-consumer.sh --bootstrap-server train:9092 --topic topic01 --group custom.g.id
telnet train 44444
- Avro Sink: sends data to an Avro Source.
1. Configuration file (both agents below can live in one file, e.g. conf/avroSink.properties, and are selected with --name)
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = avro
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# Declare the components
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Configure the components
a2.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a2.sources.r1.batchSize = 100
a2.sources.r1.batchDurationMillis = 2000
a2.sources.r1.kafka.bootstrap.servers = train:9092
a2.sources.r1.kafka.topics = test1
a2.sources.r1.kafka.consumer.group.id = custom.g.id
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = train
a2.sinks.k1.port = 44444
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Wire the components together
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
Agent a2: a Kafka Source reads the data and an Avro Sink forwards it to a1's Avro Source.
Agent a1: an Avro Source receives the data and writes it out as log output.
2. Because the Avro Sink connects to a specific IP and port when it sends data, a1 must be started first.
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/avroSink.properties -Dflume.root.logger=INFO,console
3. Start a2
./bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/avroSink.properties
4. Start a Kafka producer to test
./bin/kafka-console-producer.sh --broker-list train:9092 --topic test1
If everything works, the data shows up in a1's log output.
Starting a2 first produces an error.
Channels
- Memory Channel: fast; Source data is written straight to memory. It is not safe, and data can be lost.
Note: transactionCapacity must be smaller than capacity.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
- JDBC Channel: events are stored in a persistent store backed by a database. The JDBC channel currently supports embedded Derby. It is a durable channel, ideal for flows where recoverability matters, and worth using when the data really must not be lost.
a1.channels.c1.type = jdbc
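A minimal runnable sketch around the jdbc channel (the netcat source, logger sink, and the file name conf/jdbcChannel.properties are assumptions):
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
# durable channel backed by embedded Derby
a1.channels.c1.type = jdbc
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1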
- Kafka Channel: writes the data collected by the Source to an external Kafka cluster.
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = train:9092
a1.channels.c1.kafka.topic = test1
a1.channels.c1.kafka.consumer.group.id = flume-consumer
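The lines above configure only the channel; the launch command and telnet test below imply a complete agent around it. A sketch of conf/kafkaChannel.properties consistent with those commands (the netcat source and logger sink are assumptions):
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
# Kafka-backed channel: events are staged in the test1 topic
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = train:9092
a1.channels.c1.kafka.topic = test1
a1.channels.c1.kafka.consumer.group.id = flume-consumer
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1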
Start Flume
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/kafkaChannel.properties -Dflume.root.logger=INFO,console
Subscribe to the Kafka topic
./bin/kafka-console-consumer.sh --bootstrap-server train:9092 --topic test1 --group custom.g.id
Test
telnet train 44444
- File Channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /root/flume/checkpoint
a1.channels.c1.dataDirs = /root/flume/data
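A minimal runnable sketch around the file channel (the netcat source, logger sink, and the file name conf/fileChannel.properties are assumptions):
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
# durable channel: checkpoints and event data are kept on local disk
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /root/flume/checkpoint
a1.channels.c1.dataDirs = /root/flume/data
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1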
Advanced Components
Interceptors
Interceptors act on the Source component, intercepting or decorating the Events the Source produces. Flume ships with many built-in interceptors:
- Timestamp Interceptor: decorator; adds timestamp information to the Event header.
- Host Interceptor: decorator; adds host information to the Event header.
- Static Interceptor: decorator; adds a custom key and value to the Event header.
- Remove Header Interceptor: decorator; removes the specified key from the Event header.
- UUID Interceptor: decorator; adds a random, unique UUID string to the Event header.
- Search and Replace Interceptor: decorator; searches the Event body and replaces the matching content.
- Regex Filtering Interceptor: filter; keeps or drops events whose content matches a regular expression.
- Regex Extractor Interceptor: decorator; searches the Event body and adds the matching content to the Event header.
Test 1: Timestamp, Host, Static, Remove, UUID, Search and Replace
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Source configuration: collect the data
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
# Add the interceptors
a1.sources.r1.interceptors = i1 i2 i3 i4 i5 i6
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i2.type = host
# custom key and value
a1.sources.r1.interceptors.i3.type = static
a1.sources.r1.interceptors.i3.key = hello
a1.sources.r1.interceptors.i3.value = world
a1.sources.r1.interceptors.i4.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
a1.sources.r1.interceptors.i4.headerName = uuid
a1.sources.r1.interceptors.i5.type = remove_header
a1.sources.r1.interceptors.i5.withName = hello
a1.sources.r1.interceptors.i6.type = search_replace
a1.sources.r1.interceptors.i6.searchPattern = ^tangc
a1.sources.r1.interceptors.i6.replaceString = yes
# Sink configuration: deliver the data
a1.sinks.k1.type = logger
# Channel: buffer the events
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
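To try it, start the agent and send a line over telnet (the file name conf/interceptors01.properties is an assumption). Each logged event should carry timestamp, host, and uuid headers (the static hello=world header is added by i3 and then removed by i5), and a body starting with "tangc" has that prefix replaced by "yes":
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/interceptors01.properties -Dflume.root.logger=INFO,console
telnet train 44444
tangc hello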
Test 2: Regex Filtering, Regex Extractor
# Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Source configuration: collect the data
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
# Add the interceptors
a1.sources.r1.interceptors = i1 i2
# Extract a leading INFO or ERROR from the Event body and add it to the Event header
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = ^(INFO|ERROR)
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.name = loglevel
# Filter
a1.sources.r1.interceptors.i2.type = regex_filter
# keep events containing 'tang'
a1.sources.r1.interceptors.i2.regex = .*tang.*
# false keeps matching events; true excludes them
a1.sources.r1.interceptors.i2.excludeEvents = false
# Sink configuration: deliver the data
a1.sinks.k1.type = logger
# Channel: buffer the events
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
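Start the agent the same way (conf/interceptors02.properties is an assumed name) and type test lines; the first line below is kept (it contains "tang") and gains a loglevel=INFO header, while the second is dropped by the regex_filter:
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/interceptors02.properties -Dflume.root.logger=INFO,console
telnet train 44444
INFO tang logged in
hello world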
Channel Selectors
When one Source feeds multiple Channels, the channel selector decides how the Source's data is routed to them. If no selector is specified, the Source data is broadcast to all Channels by default (the replicating selector).
Replicating
# Declare the components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
# sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /root/file_roll_1
a1.sinks.k1.sink.rollInterval = 0
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /root/file_roll_2
a1.sinks.k2.sink.rollInterval = 0
#channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
Error
Cause: with HIVE_HOME set in the environment, the derby jar in Hive's lib conflicts with the derby jar in Flume's lib.
Fix: 1. If HIVE_HOME is configured, remove the derby jar from either Hive's lib or Flume's lib (deleting it from one side is enough).
2. By default, Flume uses the replicating (broadcast) channel selector.
Test: start Flume, send test data via telnet, and check the data in the files written by the sinks, as sketched below.
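A sketch of the test commands (the file name conf/replicating.properties is an assumption); every line typed into telnet should appear both under /root/file_roll_1 (via the memory channel) and /root/file_roll_2 (via the jdbc channel):
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/replicating.properties -Dflume.root.logger=INFO,console
telnet train 44444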
Multiplexing: classifies the data and writes different classes to different channels.
Example: write data containing INFO to c1 and data containing ERROR to c2.
# Declare the components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Channel selector: multiplexing (route events by header value)
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = level
a1.sources.r1.selector.mapping.INFO = c1
a1.sources.r1.selector.mapping.ERROR = c2
a1.sources.r1.selector.default = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = ^(INFO|ERROR)
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.name = level
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /root/file_roll_1
a1.sinks.k1.sink.rollInterval = 0
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /root/file_roll_2
a1.sinks.k2.sink.rollInterval = 0
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
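A launch-and-test sketch (the file name conf/multiplexing.properties is an assumption). Lines starting with INFO should end up in /root/file_roll_1, lines starting with ERROR in /root/file_roll_2, and anything else falls back to c1:
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/multiplexing.properties -Dflume.root.logger=INFO,console
telnet train 44444
INFO user login ok
ERROR disk full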
Sink Processors
Flume uses a Sink Group to wrap several Sink instances into one logical Sink; internally, Sink Processors provide failover and load balancing for the group.
Load balancing Sink Processor: provides the ability to balance the load across multiple sinks.
# Declare the components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /root/file_roll_1
a1.sinks.k1.sink.rollInterval = 0
a1.sinks.k1.sink.batchSize = 1
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /root/file_roll_2
a1.sinks.k2.sink.rollInterval = 0
a1.sinks.k2.sink.batchSize = 1
# Configure the Sink Processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
To see the load-balancing effect, sink.batchSize and transactionCapacity must both be set to 1, as in the config above.
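A launch sketch (conf/loadbalance.properties is an assumed file name); with round_robin selection, consecutive telnet lines should alternate between /root/file_roll_1 and /root/file_roll_2:
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/loadbalance.properties -Dflume.root.logger=INFO,console
telnet train 44444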
Failover Sink Processor: maintains a prioritized list of sinks, guaranteeing that as long as one sink is available, events are processed (delivered).
# Declare the components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
# Configure the components
a1.sources.r1.type = netcat
a1.sources.r1.bind = train
a1.sources.r1.port = 44444
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /root/file_roll_1
a1.sinks.k1.sink.rollInterval = 0
a1.sinks.k1.sink.batchSize = 1
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /root/file_roll_2
a1.sinks.k2.sink.rollInterval = 0
a1.sinks.k2.sink.batchSize = 1
#Sink Processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 20
a1.sinkgroups.g1.processor.priority.k2 = 10
# maximum backoff period (ms) for a failed sink
a1.sinkgroups.g1.processor.maxpenalty = 10000
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1
# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
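A launch sketch (conf/failover.properties is an assumed file name); all events should go to k1 (priority 20) while it is healthy, failing over to k2 only when k1 fails:
./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/failover.properties -Dflume.root.logger=INFO,console
telnet train 44444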
API Integration
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-sdk</artifactId>
<version>1.9.0</version>
</dependency>
Standalone
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class FlumeRpcClientTest {
    private RpcClient client;

    @Before
    public void before() {
        // connect to the Avro Source listening on 10.15.0.34:44444
        client = RpcClientFactory.getDefaultInstance("10.15.0.34", 44444);
    }

    @Test
    public void testAvro() throws EventDeliveryException {
        // build an event from a plain-text body and send it to the agent
        Event event = EventBuilder.withBody("1 zhangsan true 28".getBytes());
        client.append(event);
    }

    @After
    public void after() {
        client.close();
    }
}
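Beyond a single host, the Flume SDK also offers a load-balancing RPC client that spreads appends across several Avro Sources, which pairs naturally with the sink-processor setups above. A sketch following the Flume developer guide; the second host address is an assumption:
import java.util.Properties;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class LoadBalancingClientDemo {
    public static void main(String[] args) throws EventDeliveryException {
        // logical host names mapped to Avro Source addresses
        Properties props = new Properties();
        props.put("client.type", "default_loadbalance");
        props.put("hosts", "h1 h2");
        props.put("hosts.h1", "10.15.0.34:44444");
        props.put("hosts.h2", "10.15.0.35:44444"); // assumed second agent
        props.put("host-selector", "round_robin"); // or "random"
        RpcClient client = RpcClientFactory.getInstance(props);
        try {
            // successive appends are spread across h1 and h2
            for (int i = 0; i < 10; i++) {
                Event event = EventBuilder.withBody(("event " + i).getBytes());
                client.append(event);
            }
        } finally {
            client.close();
        }
    }
}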