Apache Flume

What is Flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple, flexible architecture built on streaming data flows, and it is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. This architecture makes real-time, online analysis of log streams possible. Flume lets you plug custom data senders into a logging system to collect data; it can also perform simple processing on the data and write it to a variety of (customizable) data receivers. Flume currently has two release lines: the 0.9.x releases, collectively called Flume OG, and the 1.x releases, collectively called Flume NG. Flume NG went through a major refactoring and differs substantially from Flume OG, so take care to distinguish the two. This course uses apache-flume-1.9.0-bin.tar.gz.

Flume Architecture

[Flume architecture diagram]

Flume Installation

  • Install JDK 1.8+ and configure the JAVA_HOME environment variable (omitted here).
  • Install Flume. Download: http://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
[root@CentOS ~]# tar -zxf apache-flume-1.9.0-bin.tar.gz -C /usr/
[root@CentOS ~]# cd /usr/apache-flume-1.9.0-bin/
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng version
Flume 1.9.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: d4fcab4f501d41597bc616921329a4339f73585e
Compiled by fszabo on Mon Dec 17 20:45:25 CET 2018
From source with checksum 35db629a3bda49d23e9b3690c80737f9

Agent Configuration Template

# Declare the components
<Agent>.sources = <Source1> <Source2>
<Agent>.sinks = <Sink1> <Sink2>
<Agent>.channels = <Channel1> <Channel2>
# Configure each component
<Agent>.sources.<Source>.<someProperty> = <someValue>
<Agent>.channels.<Channel>.<someProperty> = <someValue>
<Agent>.sinks.<Sink>.<someProperty> = <someValue>
# Wire the components together
<Agent>.sources.<Source>.channels = <Channel1> <Channel2> ...
<Agent>.sinks.<Sink>.channel = <Channel1>

You should know this template structure by heart; it makes later lookups and configuration much easier.
<Agent>, <Channel>, <Sink>, and <Source> stand for component names; consult the documentation to see which components the system provides.

Reference: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html

Quick Start

① Agent configuration

example1.properties configures a single agent; place this file in the conf directory under the Flume installation directory.

# Declare the basic components: Source, Channel, Sink
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
1. Install nmap-ncat (yum -y install nmap-ncat) to make the tests below easier.
2. Install telnet (yum -y install telnet), also for testing.

② Start the a1 agent

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example1.properties -Dflume.root.logger=INFO,console

Appendix: launch command options

Usage: ./bin/flume-ng <command> [options]...

commands:
  help                      display this help text
  agent                     run a Flume agent
  avro-client               run an avro Flume client
  version                   show Flume version info

global options:
  --conf,-c <conf>          use configs in <conf> directory
  --classpath,-C <cp>       append to the classpath
  --dryrun,-d               do not actually start Flume, just print the command
  --plugins-path <dirs>     colon-separated list of plugins.d directories. See the
                            plugins.d section in the user guide for more details.
                            Default: $FLUME_HOME/plugins.d
  -Dproperty=value          sets a Java system property value
  -Xproperty=value          sets a Java -X option

agent options:
  --name,-n <name>          the name of this agent (required)
  --conf-file,-f <file>     specify a config file (required if -z missing)
  --zkConnString,-z <str>   specify the ZooKeeper connection to use (required if -f missing)
  --zkBasePath,-p <path>    specify the base path in ZooKeeper for agent configs
  --no-reload-conf          do not reload config file if changed
  --help,-h                 display help text

avro-client options:
  --rpcProps,-P <file>      RPC client properties file with server connection params
  --host,-H <host>          hostname to which events will be sent
  --port,-p <port>          port of the avro source
  --dirname <dir>           directory to stream to avro source
  --filename,-F <file>      text file to stream to avro source (default: std input)
  --headerFile,-R <file>    File containing event headers as key/value pairs on each new line
  --help,-h                 display help text

Either --rpcProps or both --host and --port must be specified.

Note that if <conf> directory is specified, then it is always included first
in the classpath.

③ Test a1

[root@CentOS apache-flume-1.9.0-bin]# telnet CentOS 44444
Trying 192.168.52.134...
Connected to CentOS.
Escape character is '^]'.
hello world
2020-02-05 11:44:43,546 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO -
org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{}
body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 0D hello world. }
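nc, installed earlier, works as a non-interactive alternative to telnet for this test (a sketch; with nmap-ncat, --send-only makes nc exit once stdin is drained):

# send one test line to the netcat source and exit
echo "hello world" | nc --send-only CentOS 44444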

Component Overview

Source - Inputs

Avro Source:

Starts an internal Avro server that accepts requests from Avro clients and stores the received data in the Channel.

Property    Default    Meaning
channels    -          the Channel(s) to attach to
type        -          component type; must be avro
bind        -          the IP address/hostname to bind to
port        -          the port to listen on
# Declare the component
a1.sources = s1
# Configure the component
a1.sources.s1.type = avro
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Attach the channel
a1.sources.s1.channels = c1
# Declare the basic components: Source, Channel, Sink (example2.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive data from Avro clients
a1.sources.s1.type = avro
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1



[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example2.properties -Dflume.root.logger=INFO,console

Send a file with avro-client

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng avro-client --host CentOS --port 44444 --filename /root/t_emp
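Per the help text above, avro-client falls back to standard input when --filename is omitted, so the same test can also be driven by a pipe (a sketch, using the agent from example2.properties):

# stream stdin to the Avro Source listening on CentOS:44444
echo "hello avro" | ./bin/flume-ng avro-client --host CentOS --port 44444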
Exec Source:

Collects the console output of a command as events.

Property    Default    Meaning
channels    -          the Channel(s) to attach to
type        -          must be exec
command     -          the command whose output is collected
# Declare the basic components: Source, Channel, Sink (example3.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to tail a file and collect appended lines
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /root/t_user
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example3.properties -Dflume.root.logger=INFO,console

[root@CentOS ~]# tail -f t_user
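The tail -f above only watches the file; to actually generate events, append lines to it and the source's tail -F picks them up immediately (an illustrative sketch):

# each appended line becomes one Flume event
echo "hello exec source" >> /root/t_user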
Spooling Directory Source:

Collects newly added text files from a static directory. Once a file has been fully ingested its name gets a suffix, but the source file is not deleted; if you only want each file collected once, you can change this default behavior of the source.

Property          Default       Meaning
channels          -             the Channel(s) to attach to
type              -             component type; must be spooldir
spoolDir          -             the directory to collect from
fileSuffix        .COMPLETED    suffix appended to the name of a fully ingested file
deletePolicy      never         never or immediate
includePattern    ^.*$          regex of files to include (default matches all files)
ignorePattern     ^$            regex of files to ignore
# Declare the basic components: Source, Channel, Sink (example4.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to watch a spool directory
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /root/spooldir
a1.sources.s1.fileHeader = true
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example4.properties -Dflume.root.logger=INFO,console
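A quick test: drop a file into the spool directory and watch it get renamed once it has been fully ingested (a sketch; the file name is illustrative):

mkdir -p /root/spooldir
cp /root/t_user /root/spooldir/
ls /root/spooldir        # shows t_user.COMPLETED after ingestion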
Taildir Source:

Watches a set of files for newly appended lines in near real time and records the read offset of each file, so the next run can resume where it left off, enabling incremental collection.

Property                      Default                           Meaning
channels                      -                                 the Channel(s) to attach to
type                          -                                 must be TAILDIR
filegroups                    -                                 space-separated list of file groups
filegroups.<filegroupName>    -                                 absolute path of the file group; regular expressions (not file-system patterns) may be used in the file name only
positionFile                  ~/.flume/taildir_position.json    records the read position of each file, enabling incremental collection
# Declare the basic components: Source, Channel, Sink (example5.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to tail the files in the two file groups
a1.sources.s1.type = TAILDIR
a1.sources.s1.filegroups = g1 g2
a1.sources.s1.filegroups.g1 = /root/taildir/.*\.log$
a1.sources.s1.filegroups.g2 = /root/taildir/.*\.java$
a1.sources.s1.headers.g1.type = log
a1.sources.s1.headers.g2.type = java
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example5.properties -Dflume.root.logger=INFO,console
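To verify incremental collection, append to a file matching group g1 and inspect the position file (paths follow the config above; the .log file name is illustrative):

mkdir -p /root/taildir
echo "hello taildir" >> /root/taildir/app.log
cat ~/.flume/taildir_position.json   # per-file offsets used for incremental pickup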
Kafka Source:
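Consumes messages from Kafka topics and turns them into events. A complete configuration appears in example9.properties below; its key properties, sketched from that example:

a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.kafka.bootstrap.servers = CentOS:9092
a1.sources.s1.kafka.topics = topic01
a1.sources.s1.kafka.consumer.group.id = g1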

Sink - Outputs

Logger Sink :

Typically used for testing/debugging purposes.

File Roll Sink:

Writes the collected data to local files.

# Declare the basic components: Source, Channel, Sink (example6.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll
a1.sinks.sk1.sink.rollInterval = 0
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example6.properties

HDFS Sink:

Writes data to the HDFS file system.

# Declare the basic components: Source, Channel, Sink (example7.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to write received data to HDFS
a1.sinks.sk1.type = hdfs
a1.sinks.sk1.hdfs.path = /flume-hdfs/%y-%m-%d
a1.sinks.sk1.hdfs.rollInterval = 0
a1.sinks.sk1.hdfs.rollSize = 0
a1.sinks.sk1.hdfs.rollCount = 0
a1.sinks.sk1.hdfs.useLocalTimeStamp = true
a1.sinks.sk1.hdfs.fileType = DataStream
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
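example7.properties is launched the same way as the earlier examples; note that the hdfs sink needs the Hadoop jars on the agent's classpath (e.g. via an installed Hadoop):

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example7.properties -Dflume.root.logger=INFO,console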
Kafka Sink:

Writes data to a Kafka topic.

# Declare the basic components: Source, Channel, Sink (example8.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to write received data to a Kafka topic
a1.sinks.sk1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.sk1.kafka.bootstrap.servers = CentOS:9092
a1.sinks.sk1.kafka.topic = topic01
a1.sinks.sk1.kafka.flumeBatchSize = 20
a1.sinks.sk1.kafka.producer.acks = 1
a1.sinks.sk1.kafka.producer.linger.ms = 1
a1.sinks.sk1.kafka.producer.compression.type = snappy
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
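To verify, start the agent and read the topic back with Kafka's console consumer (broker address and topic from the config above):

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example8.properties
[root@CentOS kafka_2.11-2.2.0]# ./bin/kafka-console-consumer.sh --bootstrap-server CentOS:9092 --topic topic01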
Avro Sink:

Sends data to an Avro Source.

# Declare the basic components: Source, Channel, Sink (example9.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to consume messages from Kafka
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.batchSize = 100
a1.sources.s1.batchDurationMillis = 2000
a1.sources.s1.kafka.bootstrap.servers = CentOS:9092
a1.sources.s1.kafka.topics = topic01
a1.sources.s1.kafka.consumer.group.id = g1
# Configure the Sink component to forward received data to an Avro Source
a1.sinks.sk1.type = avro
a1.sinks.sk1.hostname = CentOS
a1.sinks.sk1.port = 44444
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
# Declare the basic components: Source, Channel, Sink (also in example9.properties)
a2.sources = s1
a2.sinks = sk1
a2.channels = c1
# Configure the Source component to receive data from Avro clients
a2.sources.s1.type = avro
a2.sources.s1.bind = CentOS
a2.sources.s1.port = 44444
# Configure the Sink component to roll received data into local files
a2.sinks.sk1.type = file_roll
a2.sinks.sk1.sink.directory = /root/file_roll
a2.sinks.sk1.sink.rollInterval = 0
# Configure the Channel, which buffers the data
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Wire the components together
a2.sources.s1.channels = c1
a2.sinks.sk1.channel = c1
Start a2 (the Avro server side) before a1, then produce test messages into topic01:

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --conf-file conf/example9.properties --name a2
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --conf-file conf/example9.properties --name a1
[root@CentOS kafka_2.11-2.2.0]# ./bin/kafka-console-producer.sh --broker-list CentOS:9092 --topic topic01

Channel - Buffers

Memory Channel:
  • Fast: Source data is written straight to memory. Not safe; data may be lost (for example, if the agent crashes).

    Property               Default    Meaning
    type                   -          must be memory
    capacity               100        maximum number of events stored in the channel
    transactionCapacity    100        batch size for each Source write to, or Sink read from, the channel

    transactionCapacity <= capacity

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
JDBC Channel:

    Property    Default    Meaning
    type        -          must be jdbc
    db.type     DERBY      database vendor; must be DERBY

Events are stored in a persistent store backed by a database. The JDBC channel currently supports embedded Derby. It is a durable channel, well suited to flows where recoverability matters; use the JDBC channel when the buffered data is too important to lose.

a1.channels.c1.type = jdbc
Kafka Channel:

    Property                   Default          Meaning
    type                       -                must be org.apache.flume.channel.kafka.KafkaChannel
    kafka.bootstrap.servers    -                list of brokers in the Kafka cluster used by the channel
    kafka.topic                flume-channel    Kafka topic the channel will use
    kafka.consumer.group.id    flume            consumer group ID used to register with Kafka

Buffers the data collected by the Source in an external Kafka cluster.

a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = CentOS:9092
a1.channels.c1.kafka.topic = topic_channel
a1.channels.c1.kafka.consumer.group.id = g1
# Declare the basic components: Source, Channel, Sink (example10.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Kafka Channel, which buffers the data in Kafka
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = CentOS:9092
a1.channels.c1.kafka.topic = topic_channel
a1.channels.c1.kafka.consumer.group.id = g1
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
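example10.properties is launched the same way as the earlier examples:

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example10.properties -Dflume.root.logger=INFO,console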
File Channel:

    Property         Default                             Meaning
    type             -                                   must be file
    checkpointDir    ~/.flume/file-channel/checkpoint    directory where checkpoint files are stored
    dataDirs         ~/.flume/file-channel/data          comma-separated list of directories for storing log files

Uses the file system as the channel implementation, persisting the buffered data.

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /root/flume/checkpoint
a1.channels.c1.dataDirs = /root/flume/data

Advanced Components

Interceptors

Interceptors attach to a Source component and either intercept (filter) or decorate the Events the Source produces. Flume ships with many built-in interceptors:

  • Timestamp Interceptor: decorator; adds timestamp information to the Event header.

  • Host Interceptor: decorator; adds host information to the Event header.

  • Static Interceptor: decorator; adds a custom key and value to the Event header.

  • Remove Header Interceptor: decorator; removes the specified key from the Event header.

  • UUID Interceptor: decorator; adds a random, unique UUID string to the Event header.

  • Search and Replace Interceptor: decorator; searches the Event body and replaces the matching content.

  • Regex Filtering Interceptor: filter; passes or drops events based on a regular expression.

  • Regex Extractor Interceptor: decorator; searches the Event body and adds the matching content to the Event header.

  • Testing the decorator interceptors (see example11.properties below)

    i1: timestamp; i2: hostname; i3: static key/value; i4: UUID; i5: remove_header deletes a header key; i6: search_replace replaces matching body content

# Declare the basic components: Source, Channel, Sink (example11.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Add the interceptors
a1.sources.s1.interceptors = i1 i2 i3 i4 i5 i6
a1.sources.s1.interceptors.i1.type = timestamp
a1.sources.s1.interceptors.i2.type = host
a1.sources.s1.interceptors.i3.type = static
a1.sources.s1.interceptors.i3.key = from
a1.sources.s1.interceptors.i3.value = baizhi
a1.sources.s1.interceptors.i4.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
a1.sources.s1.interceptors.i4.headerName = uuid
a1.sources.s1.interceptors.i5.type = remove_header
a1.sources.s1.interceptors.i5.withName = from
a1.sources.s1.interceptors.i6.type = search_replace
a1.sources.s1.interceptors.i6.searchPattern = ^jiangzz
a1.sources.s1.interceptors.i6.replaceString = baizhi
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
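A quick check with nc: per the configuration above, a body starting with jiangzz is rewritten to baizhi by i6, and the logger output shows the headers added by i1 through i4 (the from header added by i3 is removed again by i5):

echo "jiangzz hello" | nc --send-only CentOS 44444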
  • Testing the filter and extractor interceptors

# Declare the basic components: Source, Channel, Sink (example12.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Add the interceptors
a1.sources.s1.interceptors = i1 i2
a1.sources.s1.interceptors.i1.type = regex_extractor
a1.sources.s1.interceptors.i1.regex = ^(INFO|ERROR)
a1.sources.s1.interceptors.i1.serializers = s1
a1.sources.s1.interceptors.i1.serializers.s1.name = loglevel
a1.sources.s1.interceptors.i2.type = regex_filter
a1.sources.s1.interceptors.i2.regex = .*baizhi.*
a1.sources.s1.interceptors.i2.excludeEvents = false
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1

i1: regex_extractor copies the leading log level (INFO or ERROR) into the loglevel header; i2: regex_filter passes only events whose body matches .*baizhi.*
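For example, with the example12 agent running (excludeEvents = false means only matching events pass the filter):

echo "INFO baizhi is online" | nc --send-only CentOS 44444   # kept; header loglevel=INFO
echo "INFO hello flume" | nc --send-only CentOS 44444        # dropped; body does not match .*baizhi.*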

Channel Selectors

When one Source feeds multiple Channels, the channel selector decides how the Source's data is routed to them. If no selector is specified, the system broadcasts the Source data to every Channel (replicating mode is the default).


# Declare the basic components: Source, Channel, Sink (example13.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# Configure the Channels, which buffer the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2

1. If the HIVE_HOME environment variable is configured, remove the derby jar from either Hive's lib directory or Flume's lib directory (remove it from one side only), since the two copies conflict.
2. By default, Flume uses the replicating (broadcast) channel selector.

Equivalent configuration:

# Declare the basic components: Source, Channel, Sink (example14.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# Channel selector: replicating mode
a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# Configure the Channels, which buffer the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2


# Declare the basic components: Source, Channel, Sink (example15.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# Channel selector: multiplexing mode
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.channels = c1 c2
a1.sources.s1.selector.header = level
a1.sources.s1.selector.mapping.INFO = c1
a1.sources.s1.selector.mapping.ERROR = c2
a1.sources.s1.selector.default = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
a1.sources.s1.interceptors = i1
a1.sources.s1.interceptors.i1.type = regex_extractor
a1.sources.s1.interceptors.i1.regex = ^(INFO|ERROR)
a1.sources.s1.interceptors.i1.serializers = s1
a1.sources.s1.interceptors.i1.serializers.s1.name = level
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# Configure the Channels, which buffer the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2
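With the example15 agent running, the level extracted by i1 drives the routing: INFO events land in /root/file_roll_1 via c1, ERROR events in /root/file_roll_2 via c2, and anything else falls back to c1:

echo "INFO all good" | nc --send-only CentOS 44444
echo "ERROR something broke" | nc --send-only CentOS 44444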

Sink Processors

Flume uses a Sink Group to wrap multiple Sink instances into a single logical Sink; internally, Sink Processors provide failover and load balancing for the group.

Load balancing Sink Processor


# Declare the basic components: Source, Channel, Sink (example16.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk1.sink.batchSize = 1
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
a1.sinks.sk2.sink.batchSize = 1
# Configure the Sink Processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = sk1 sk2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c1

To observe the load-balancing effect, sink.batchSize and transactionCapacity must both be set to 1.
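Send a few events and the round_robin selector alternates them between the two directories (a sketch):

for i in $(seq 1 10); do echo "message $i" | nc --send-only CentOS 44444; done
ls /root/file_roll_1 /root/file_roll_2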

Failover Sink Processor

# Declare the basic components: Source, Channel, Sink (example17.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk1.sink.batchSize = 1
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
a1.sinks.sk2.sink.batchSize = 1
# Configure the Sink Processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = sk1 sk2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.sk1 = 20
a1.sinkgroups.g1.processor.priority.sk2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c1
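Start the agent and send events: they all flow to sk1 while it is healthy, since its priority (20) is higher than sk2's (10); sk2 takes over only if sk1 fails, and maxpenalty caps a failed sink's backoff at 10 seconds:

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example17.properties
echo "failover test" | nc --send-only CentOS 44444
ls /root/file_roll_1    # events land here while sk1 is healthy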