Apache Flume

What is Flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple, flexible architecture built on streaming data flows, and it is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. This architecture makes real-time, online analysis of log streams possible. Flume lets you plug custom data senders into a logging system to collect data; it can also perform simple processing on the data and write it to a variety of (customizable) data receivers. Flume currently has two release lines: the 0.9.x releases, collectively called Flume OG, and the 1.x releases, collectively called Flume NG. Flume NG went through a major refactoring and differs substantially from Flume OG, so take care to distinguish the two. This course uses apache-flume-1.9.0-bin.tar.gz.

Flume Architecture

[Flume architecture diagram]

Flume Installation

  • Install JDK 1.8+ and configure the JAVA_HOME environment variable (omitted here).
  • Install Flume. Download: http://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
[root@CentOS ~]# tar -zxf apache-flume-1.9.0-bin.tar.gz -C /usr/
[root@CentOS ~]# cd /usr/apache-flume-1.9.0-bin/
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng version
Flume 1.9.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: d4fcab4f501d41597bc616921329a4339f73585e
Compiled by fszabo on Mon Dec 17 20:45:25 CET 2018
From source with checksum 35db629a3bda49d23e9b3690c80737f9

Agent Configuration Template

# Declare the components
<Agent>.sources = <Source1> <Source2>
<Agent>.sinks = <Sink1> <Sink2>
<Agent>.channels = <Channel1> <Channel2>
# Configure each component
<Agent>.sources.<Source>.<someProperty> = <someValue>
<Agent>.channels.<Channel>.<someProperty> = <someValue>
<Agent>.sinks.<Sink>.<someProperty> = <someValue>
# Wire the components together
<Agent>.sources.<Source>.channels = <Channel1> <Channel2> ...
<Agent>.sinks.<Sink>.channel = <Channel1>

You should know this template structure by heart; it makes later lookups and configuration much easier.
<Agent>, <Channel>, <Sink>, and <Source> stand for component names; consult the documentation to see which components the system provides.

Reference: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html

Quick Start

① Agent configuration

example1.properties configures a single agent; place this file in the conf directory under the Flume installation directory.

# Declare the basic components: Source, Channel, Sink
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
1. Install nmap-ncat (yum -y install nmap-ncat) to make the tests below easier.
2. Install telnet (yum -y install telnet), also for testing.

② Start the a1 agent

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example1.properties -Dflume.root.logger=INFO,console

Appendix: launch command options

Usage: ./bin/flume-ng <command> [options]...

commands:
  help                      display this help text
  agent                     run a Flume agent
  avro-client               run an avro Flume client
  version                   show Flume version info

global options:
  --conf,-c <conf>          use configs in <conf> directory
  --classpath,-C <cp>       append to the classpath
  --dryrun,-d               do not actually start Flume, just print the command
  --plugins-path <dirs>     colon-separated list of plugins.d directories. See the
                            plugins.d section in the user guide for more details.
                            Default: $FLUME_HOME/plugins.d
  -Dproperty=value          sets a Java system property value
  -Xproperty=value          sets a Java -X option

agent options:
  --name,-n <name>          the name of this agent (required)
  --conf-file,-f <file>     specify a config file (required if -z missing)
  --zkConnString,-z <str>   specify the ZooKeeper connection to use (required if -f missing)
  --zkBasePath,-p <path>    specify the base path in ZooKeeper for agent configs
  --no-reload-conf          do not reload config file if changed
  --help,-h                 display help text

avro-client options:
  --rpcProps,-P <file>      RPC client properties file with server connection params
  --host,-H <host>          hostname to which events will be sent
  --port,-p <port>          port of the avro source
  --dirname <dir>           directory to stream to avro source
  --filename,-F <file>      text file to stream to avro source (default: std input)
  --headerFile,-R <file>    File containing event headers as key/value pairs on each new line
  --help,-h                 display help text

Either --rpcProps or both --host and --port must be specified.

Note that if <conf> directory is specified, then it is always included first
in the classpath.

③ Test a1

[root@CentOS apache-flume-1.9.0-bin]# telnet CentOS 44444
Trying 192.168.52.134...
Connected to CentOS.
Escape character is '^]'.
hello world
2020-02-05 11:44:43,546 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO -
org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{}
body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 0D hello world. }
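nc, installed earlier, works as a non-interactive alternative to telnet for this test (a sketch; with nmap-ncat, --send-only makes nc exit once stdin is drained):

# send one test line to the netcat source and exit
echo "hello world" | nc --send-only CentOS 44444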

Component Overview

Source - Inputs

Avro Source:

Starts an internal Avro server that accepts requests from Avro clients and stores the received data in the Channel.

Property    Default    Meaning
channels    -          the Channel(s) to attach to
type        -          component type; must be avro
bind        -          the IP address/hostname to bind to
port        -          the port to listen on
# Declare the component
a1.sources = s1
# Configure the component
a1.sources.s1.type = avro
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Attach the channel
a1.sources.s1.channels = c1
# Declare the basic components: Source, Channel, Sink (example2.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive data from Avro clients
a1.sources.s1.type = avro
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1



[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example2.properties -Dflume.root.logger=INFO,console

Send a file with avro-client

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng avro-client --host CentOS --port 44444 --filename /root/t_emp
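Per the help text above, avro-client falls back to standard input when --filename is omitted, so the same test can also be driven by a pipe (a sketch, using the agent from example2.properties):

# stream stdin to the Avro Source listening on CentOS:44444
echo "hello avro" | ./bin/flume-ng avro-client --host CentOS --port 44444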
Exec Source:

Collects the console output of a command as events.

Property    Default    Meaning
channels    -          the Channel(s) to attach to
type        -          must be exec
command     -          the command whose output is collected
# Declare the basic components: Source, Channel, Sink (example3.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to tail a file and collect appended lines
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /root/t_user
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example3.properties -Dflume.root.logger=INFO,console

[root@CentOS ~]# tail -f t_user
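The tail -f above only watches the file; to actually generate events, append lines to it and the source's tail -F picks them up immediately (an illustrative sketch):

# each appended line becomes one Flume event
echo "hello exec source" >> /root/t_user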
Spooling Directory Source:

Collects newly added text files from a static directory. Once a file has been fully ingested its name gets a suffix, but the source file is not deleted; if you only want each file collected once, you can change this default behavior of the source.

Property          Default       Meaning
channels          -             the Channel(s) to attach to
type              -             component type; must be spooldir
spoolDir          -             the directory to collect from
fileSuffix        .COMPLETED    suffix appended to the name of a fully ingested file
deletePolicy      never         never or immediate
includePattern    ^.*$          regex of files to include (default matches all files)
ignorePattern     ^$            regex of files to ignore
# Declare the basic components: Source, Channel, Sink (example4.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to watch a spool directory
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /root/spooldir
a1.sources.s1.fileHeader = true
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example4.properties -Dflume.root.logger=INFO,console
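A quick test: drop a file into the spool directory and watch it get renamed once it has been fully ingested (a sketch; the file name is illustrative):

mkdir -p /root/spooldir
cp /root/t_user /root/spooldir/
ls /root/spooldir        # shows t_user.COMPLETED after ingestion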
Taildir Source:

Watches a set of files for newly appended lines in near real time and records the read offset of each file, so the next run can resume where it left off, enabling incremental collection.

Property                      Default                           Meaning
channels                      -                                 the Channel(s) to attach to
type                          -                                 must be TAILDIR
filegroups                    -                                 space-separated list of file groups
filegroups.<filegroupName>    -                                 absolute path of the file group; regular expressions (not file-system patterns) may be used in the file name only
positionFile                  ~/.flume/taildir_position.json    records the read position of each file, enabling incremental collection
# Declare the basic components: Source, Channel, Sink (example5.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to tail the files in the two file groups
a1.sources.s1.type = TAILDIR
a1.sources.s1.filegroups = g1 g2
a1.sources.s1.filegroups.g1 = /root/taildir/.*\.log$
a1.sources.s1.filegroups.g2 = /root/taildir/.*\.java$
a1.sources.s1.headers.g1.type = log
a1.sources.s1.headers.g2.type = java
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example5.properties -Dflume.root.logger=INFO,console
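To verify incremental collection, append to a file matching group g1 and inspect the position file (paths follow the config above; the .log file name is illustrative):

mkdir -p /root/taildir
echo "hello taildir" >> /root/taildir/app.log
cat ~/.flume/taildir_position.json   # per-file offsets used for incremental pickup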
Kafka Source:
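Consumes messages from Kafka topics and turns them into events. A complete configuration appears in example9.properties below; its key properties, sketched from that example:

a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.kafka.bootstrap.servers = CentOS:9092
a1.sources.s1.kafka.topics = topic01
a1.sources.s1.kafka.consumer.group.id = g1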

Sink - Outputs

Logger Sink :

Typically used for testing/debugging purposes.

File Roll Sink:

Writes the collected data to local files.

# Declare the basic components: Source, Channel, Sink (example6.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll
a1.sinks.sk1.sink.rollInterval = 0
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example6.properties

HDFS Sink:

Writes data to the HDFS file system.

# Declare the basic components: Source, Channel, Sink (example7.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to write received data to HDFS
a1.sinks.sk1.type = hdfs
a1.sinks.sk1.hdfs.path = /flume-hdfs/%y-%m-%d
a1.sinks.sk1.hdfs.rollInterval = 0
a1.sinks.sk1.hdfs.rollSize = 0
a1.sinks.sk1.hdfs.rollCount = 0
a1.sinks.sk1.hdfs.useLocalTimeStamp = true
a1.sinks.sk1.hdfs.fileType = DataStream
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
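example7.properties is launched the same way as the earlier examples; note that the hdfs sink needs the Hadoop jars on the agent's classpath (e.g. via an installed Hadoop):

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example7.properties -Dflume.root.logger=INFO,console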
Kafka Sink:

Writes data to a Kafka topic.

# Declare the basic components: Source, Channel, Sink (example8.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to write received data to a Kafka topic
a1.sinks.sk1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.sk1.kafka.bootstrap.servers = CentOS:9092
a1.sinks.sk1.kafka.topic = topic01
a1.sinks.sk1.kafka.flumeBatchSize = 20
a1.sinks.sk1.kafka.producer.acks = 1
a1.sinks.sk1.kafka.producer.linger.ms = 1
a1.sinks.sk1.kafka.producer.compression.type = snappy
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
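To verify, start the agent and read the topic back with Kafka's console consumer (broker address and topic from the config above):

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example8.properties
[root@CentOS kafka_2.11-2.2.0]# ./bin/kafka-console-consumer.sh --bootstrap-server CentOS:9092 --topic topic01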
Avro Sink:

Sends data to an Avro Source.

# Declare the basic components: Source, Channel, Sink (example9.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to consume messages from Kafka
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.batchSize = 100
a1.sources.s1.batchDurationMillis = 2000
a1.sources.s1.kafka.bootstrap.servers = CentOS:9092
a1.sources.s1.kafka.topics = topic01
a1.sources.s1.kafka.consumer.group.id = g1
# Configure the Sink component to forward received data to an Avro Source
a1.sinks.sk1.type = avro
a1.sinks.sk1.hostname = CentOS
a1.sinks.sk1.port = 44444
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
# Declare the basic components: Source, Channel, Sink (also in example9.properties)
a2.sources = s1
a2.sinks = sk1
a2.channels = c1
# Configure the Source component to receive data from Avro clients
a2.sources.s1.type = avro
a2.sources.s1.bind = CentOS
a2.sources.s1.port = 44444
# Configure the Sink component to roll received data into local files
a2.sinks.sk1.type = file_roll
a2.sinks.sk1.sink.directory = /root/file_roll
a2.sinks.sk1.sink.rollInterval = 0
# Configure the Channel, which buffers the data
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Wire the components together
a2.sources.s1.channels = c1
a2.sinks.sk1.channel = c1
Start a2 (the Avro server side) before a1, then produce test messages into topic01:

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --conf-file conf/example9.properties --name a2
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --conf-file conf/example9.properties --name a1
[root@CentOS kafka_2.11-2.2.0]# ./bin/kafka-console-producer.sh --broker-list CentOS:9092 --topic topic01

Channel - Buffers

Memory Channel:
  • Fast: Source data is written straight to memory. Not safe; data may be lost (for example, if the agent crashes).

    Property               Default    Meaning
    type                   -          must be memory
    capacity               100        maximum number of events stored in the channel
    transactionCapacity    100        batch size for each Source write to, or Sink read from, the channel

    transactionCapacity <= capacity

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
JDBC Channel:

    Property    Default    Meaning
    type        -          must be jdbc
    db.type     DERBY      database vendor; must be DERBY

Events are stored in a persistent store backed by a database. The JDBC channel currently supports embedded Derby. It is a durable channel, well suited to flows where recoverability matters; use the JDBC channel when the buffered data is too important to lose.

a1.channels.c1.type = jdbc
Kafka Channel:

    Property                   Default          Meaning
    type                       -                must be org.apache.flume.channel.kafka.KafkaChannel
    kafka.bootstrap.servers    -                list of brokers in the Kafka cluster used by the channel
    kafka.topic                flume-channel    Kafka topic the channel will use
    kafka.consumer.group.id    flume            consumer group ID used to register with Kafka

Buffers the data collected by the Source in an external Kafka cluster.

a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = CentOS:9092
a1.channels.c1.kafka.topic = topic_channel
a1.channels.c1.kafka.consumer.group.id = g1
# Declare the basic components: Source, Channel, Sink (example10.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Kafka Channel, which buffers the data in Kafka
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = CentOS:9092
a1.channels.c1.kafka.topic = topic_channel
a1.channels.c1.kafka.consumer.group.id = g1
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
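example10.properties is launched the same way as the earlier examples:

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example10.properties -Dflume.root.logger=INFO,console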
File Channel:

    Property         Default                             Meaning
    type             -                                   must be file
    checkpointDir    ~/.flume/file-channel/checkpoint    directory where checkpoint files are stored
    dataDirs         ~/.flume/file-channel/data          comma-separated list of directories for storing log files

Uses the file system as the channel implementation, persisting the buffered data.

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /root/flume/checkpoint
a1.channels.c1.dataDirs = /root/flume/data

Advanced Components

Interceptors

Interceptors attach to a Source component and either intercept (filter) or decorate the Events the Source produces. Flume ships with many built-in interceptors:

  • Timestamp Interceptor: decorator; adds timestamp information to the Event header.

  • Host Interceptor: decorator; adds host information to the Event header.

  • Static Interceptor: decorator; adds a custom key and value to the Event header.

  • Remove Header Interceptor: decorator; removes the specified key from the Event header.

  • UUID Interceptor: decorator; adds a random, unique UUID string to the Event header.

  • Search and Replace Interceptor: decorator; searches the Event body and replaces the matching content.

  • Regex Filtering Interceptor: filter; passes or drops events based on a regular expression.

  • Regex Extractor Interceptor: decorator; searches the Event body and adds the matching content to the Event header.

  • Testing the decorator interceptors (see example11.properties below)

    i1: timestamp; i2: hostname; i3: static key/value; i4: UUID; i5: remove_header deletes a header key; i6: search_replace replaces matching body content

# Declare the basic components: Source, Channel, Sink (example11.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Add the interceptors
a1.sources.s1.interceptors = i1 i2 i3 i4 i5 i6
a1.sources.s1.interceptors.i1.type = timestamp
a1.sources.s1.interceptors.i2.type = host
a1.sources.s1.interceptors.i3.type = static
a1.sources.s1.interceptors.i3.key = from
a1.sources.s1.interceptors.i3.value = baizhi
a1.sources.s1.interceptors.i4.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
a1.sources.s1.interceptors.i4.headerName = uuid
a1.sources.s1.interceptors.i5.type = remove_header
a1.sources.s1.interceptors.i5.withName = from
a1.sources.s1.interceptors.i6.type = search_replace
a1.sources.s1.interceptors.i6.searchPattern = ^jiangzz
a1.sources.s1.interceptors.i6.replaceString = baizhi
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
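A quick check with nc: per the configuration above, a body starting with jiangzz is rewritten to baizhi by i6, and the logger output shows the headers added by i1 through i4 (the from header added by i3 is removed again by i5):

echo "jiangzz hello" | nc --send-only CentOS 44444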
  • Testing the filter and extractor interceptors

# Declare the basic components: Source, Channel, Sink (example12.properties)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Add the interceptors
a1.sources.s1.interceptors = i1 i2
a1.sources.s1.interceptors.i1.type = regex_extractor
a1.sources.s1.interceptors.i1.regex = ^(INFO|ERROR)
a1.sources.s1.interceptors.i1.serializers = s1
a1.sources.s1.interceptors.i1.serializers.s1.name = loglevel
a1.sources.s1.interceptors.i2.type = regex_filter
a1.sources.s1.interceptors.i2.regex = .*baizhi.*
a1.sources.s1.interceptors.i2.excludeEvents = false
# Configure the Sink component to print received data to the log console
a1.sinks.sk1.type = logger
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1

i1: regex_extractor copies the leading log level (INFO or ERROR) into the loglevel header; i2: regex_filter passes only events whose body matches .*baizhi.*
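For example, with the example12 agent running (excludeEvents = false means only matching events pass the filter):

echo "INFO baizhi is online" | nc --send-only CentOS 44444   # kept; header loglevel=INFO
echo "INFO hello flume" | nc --send-only CentOS 44444        # dropped; body does not match .*baizhi.*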

Channel Selectors

When one Source feeds multiple Channels, the channel selector decides how the Source's data is routed to them. If no selector is specified, the system broadcasts the Source data to every Channel (replicating mode is the default).


# Declare the basic components: Source, Channel, Sink (example13.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# Configure the Channels, which buffer the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2

1. If the HIVE_HOME environment variable is configured, remove the derby jar from either Hive's lib directory or Flume's lib directory (remove it from one side only), since the two copies conflict.
2. By default, Flume uses the replicating (broadcast) channel selector.

Equivalent configuration:

# Declare the basic components: Source, Channel, Sink (example14.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# Channel selector: replicating mode
a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# Configure the Channels, which buffer the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2


# Declare the basic components: Source, Channel, Sink (example15.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# Channel selector: multiplexing mode
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.channels = c1 c2
a1.sources.s1.selector.header = level
a1.sources.s1.selector.mapping.INFO = c1
a1.sources.s1.selector.mapping.ERROR = c2
a1.sources.s1.selector.default = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
a1.sources.s1.interceptors = i1
a1.sources.s1.interceptors.i1.type = regex_extractor
a1.sources.s1.interceptors.i1.regex = ^(INFO|ERROR)
a1.sources.s1.interceptors.i1.serializers = s1
a1.sources.s1.interceptors.i1.serializers.s1.name = level
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# Configure the Channels, which buffer the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2
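With the example15 agent running, the level extracted by i1 drives the routing: INFO events land in /root/file_roll_1 via c1, ERROR events in /root/file_roll_2 via c2, and anything else falls back to c1:

echo "INFO all good" | nc --send-only CentOS 44444
echo "ERROR something broke" | nc --send-only CentOS 44444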

Sink Processors

Flume uses a Sink Group to wrap multiple Sink instances into a single logical Sink; internally, Sink Processors provide failover and load balancing for the group.

Load balancing Sink Processor


# Declare the basic components: Source, Channel, Sink (example16.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk1.sink.batchSize = 1
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
a1.sinks.sk2.sink.batchSize = 1
# Configure the Sink Processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = sk1 sk2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c1

To observe the load-balancing effect, sink.batchSize and transactionCapacity must both be set to 1.
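Send a few events and the round_robin selector alternates them between the two directories (a sketch):

for i in $(seq 1 10); do echo "message $i" | nc --send-only CentOS 44444; done
ls /root/file_roll_1 /root/file_roll_2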

Failover Sink Processor

# Declare the basic components: Source, Channel, Sink (example17.properties)
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1
# Configure the Source component to receive text data over a socket
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the Sink components to roll received data into local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk1.sink.batchSize = 1
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
a1.sinks.sk2.sink.batchSize = 1
# Configure the Sink Processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = sk1 sk2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.sk1 = 20
a1.sinkgroups.g1.processor.priority.sk2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Configure the Channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c1
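Start the agent and send events: they all flow to sk1 while it is healthy, since its priority (20) is higher than sk2's (10); sk2 takes over only if sk1 fails, and maxpenalty caps a failed sink's backoff at 10 seconds:

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/example17.properties
echo "failover test" | nc --send-only CentOS 44444
ls /root/file_roll_1    # events land here while sk1 is healthy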