Flume Log Collection

flume.apache.org

Flume Overview

Flume is a distributed tool for efficiently collecting, aggregating, and moving large amounts of log data. It provides reliable failover and recovery mechanisms and is highly fault-tolerant.
Flume comes in two generations, Flume OG and Flume NG; this guide uses
apache-flume-1.9.0-bin.tar.gz.

Flume Architecture


(Figure: Flume architecture diagram)

An Agent is the smallest unit of log collection; a Flume collection pipeline is built by chaining several Agents together.
Inside an agent, the source pulls the raw log stream from the web server; interceptors intercept and decorate each Event; the channel selector then replicates or routes events; the events enter the channels, flow into different Sink Groups (load balancing, failover), and finally reach a Kafka cluster or the HDFS file system.

Installing Flume

1. Make sure JDK 1.8 is installed and the JAVA_HOME environment variable is configured.
2. Install Flume                --- download from flume.apache.org, Download link on the left
tar -zxvf  apache-flume-1.9.0-bin.tar.gz -C /usr/soft/
cd /usr/soft/apache-flume-1.9.0-bin/
# Verify the installation by running ./bin/flume-ng version
Flume 1.9.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: d4fcab4f501d41597bc616921329a4339f73585e
Compiled by fszabo on Mon Dec 17 20:45:25 CET 2018
From source with checksum 35db629a3bda49d23e9b3690c80737f9

Agent Configuration Template

# Declare the components
<Agent>.sources = <Source1> <Source2>
<Agent>.sinks = <Sink1> <Sink2>
<Agent>.channels = <Channel1> <Channel2>

# Configure each component
<Agent>.sources.<Source>.<someProperty> = <someValue>
<Agent>.channels.<Channel>.<someProperty> = <someValue>
<Agent>.sinks.<Sink>.<someProperty> = <someValue>

# Wire the components together
<Agent>.sources.<Source>.channels = <Channel1> <Channel2>
<Agent>.sinks.<Sink>.channel = <Channel1>

This template structure is worth memorizing; knowing it makes later lookup and configuration much easier.

A Simple Example

1. Create the Flume configuration file

e1.properties configures a single Agent; place this file in the conf directory of the Flume installation.

# Declare the basic components
a1.sources = sr1
a1.sinks = sk1
a1.channels = c1
# Configure the source: receive data from netcat     --- look up the matching properties on the website
a1.sources.sr1.type = netcat
a1.sources.sr1.bind = centos
a1.sources.sr1.port = 44444
# Configure the sink: print the data to the log console    --- usually used for testing and debugging
a1.sinks.sk1.type = logger
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
# holds up to 1000 events
a1.channels.c1.capacity = 1000
# transfers up to 100 events per transaction
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.sr1.channels = c1
a1.sinks.sk1.channel = c1

For testing, install the netcat and telnet tools by running the following commands on Linux:

yum -y install nmap-ncat

yum -y install telnet

The component configuration reference is at flume.apache.org; choose Documentation on the left,


then open the first entry, the Flume User Guide.

2. Start the a1 agent
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/e1.properties -Dflume.root.logger=INFO,console
#agent      required sub-command
#conf       location of the Flume configuration directory
#name       name of the agent to run
#conf-file  the specific agent configuration file to use
#-Dflume.root.logger=INFO,console  print the logger sink's output to the console

Appendix: the full list of launcher options can be listed with:

[root@centos apache-flume-1.9.0-bin]# ./bin/flume-ng help
3. Test a1
[root@CentOS apache-flume-1.9.0-bin]# telnet CentOS 44444
Trying 192.168.52.134...
Connected to CentOS.
Escape character is '^]'.
hello world
2020-02-05 11:44:43,546 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO -
org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{}
body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 0D hello world. }

Component Overview

Source - inputs

1.Avro Source

Starts an Avro server internally, accepts requests from Avro clients, and stores the received data in the channel.

Avro is a transport protocol similar to HTTP; only data conforming to Avro can be exchanged.

| Property | Default | Description |
| --- | --- | --- |
| channels | – | channel(s) to connect to |
| type | – | the source type, must be avro |
| bind | – | IP address to bind (the server host IP) |
| port | – | port to listen on |

Create the file e2.properties in Flume's conf directory:

# Declare the basic components
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the source: receive data over Avro
a1.sources.s1.type = avro
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the sink: print the received data to the log console
a1.sinks.sk1.type = logger
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1

Start a1 with this configuration file (e2.properties):

./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/e2.properties -Dflume.root.logger=INFO,console

Because the source is an Avro source, the input side must be an Avro client. The parameters required to run an avro-client can be listed with:

[root@centos apache-flume-1.9.0-bin]# ./bin/flume-ng help

[root@centos apache-flume-1.9.0-bin]# ./bin/flume-ng avro-client --host centos --port 44444 --filename /root/t_user
2.Exec Source

Captures the console output of a command.

| Property | Default | Description |
| --- | --- | --- |
| channels | – | channel(s) to connect to |
| type | – | must be exec |
| command | – | the command to execute |

Create the configuration file e3.properties in Flume's conf directory:

# Declare the basic components
a1.sources = sr1
a1.sinks = sk1
a1.channels = c1
# Configure the source: capture the output of a command
a1.sources.sr1.type = exec
a1.sources.sr1.command = tail -f /root/t_user
# Configure the sink: print the received data to the log console
a1.sinks.sk1.type = logger
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.sr1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/e3.properties -Dflume.root.logger=INFO,console

Note: this source can collect command output dynamically, but every time the agent is started the command is run again from the start, so incremental collection is not possible.

3.Spooling Directory Source

Dynamically collects text files newly added to a static directory. After a file has been collected its name is changed (a suffix is appended); the source file is renamed rather than deleted. If you want a file collected only once and then removed, change the source's default behaviour (deletePolicy).

| Property | Default | Description |
| --- | --- | --- |
| channels | – | channel(s) to connect to |
| type | – | must be spooldir |
| spoolDir | – | directory to collect files from |
| fileSuffix | .COMPLETED | suffix appended to a file name once it has been fully collected |
| deletePolicy | never | allowed values: never / immediate |
| includePattern | ^.*$ | regex of files to include (default matches all files) |
| ignorePattern | ^$ | regex of files to ignore |

# Declare the basic components: Source, Channel, Sink  example4.properties
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the source: collect new files from a spooling directory
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /root/spooldir
a1.sources.s1.fileHeader = true
# Configure the sink: print the received data to the log console
a1.sinks.sk1.type = logger
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1

Start a1:

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/e4.properties -Dflume.root.logger=INFO,console
4.Taildir Source

Monitors dynamic text files in real time for appended lines and records the read offset for each collected file, so the next run can resume incrementally (from the position reached last time).

positionFile records the read positions of the collected files; if you do not want incremental collection, simply delete the file under ~/.flume/.
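If you want the offset file kept somewhere other than the default location, the path can be set explicitly; a one-line sketch (the path below is only an illustration):

a1.sources.s1.positionFile = /root/flume/taildir_position.json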

| Property | Default | Description |
| --- | --- | --- |
| channels | – | channel(s) to connect to |
| type | – | must be TAILDIR |
| filegroups | – | space-separated list of file groups |
| filegroups.<filegroupName> | – | absolute path of the file group; regular expressions (not file-system globs) may be used only in the file name |
| positionFile | ~/.flume/taildir_position.json | records the read position of each collected file, enabling incremental collection |

Create e5.properties in Flume's conf directory:

# Declare the basic components: Source, Channel, Sink  example5.properties
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the source: tail the files matched by the file groups
a1.sources.s1.type = TAILDIR
a1.sources.s1.filegroups = g1 g2
a1.sources.s1.filegroups.g1 = /root/taildir/.*\.log$
a1.sources.s1.filegroups.g2 = /root/taildir/.*\.java$
# header added to events collected from group g1
a1.sources.s1.headers.g1.type = log
# header added to events collected from group g2
a1.sources.s1.headers.g2.type = java
# Configure the sink: print the received data to the log console
a1.sinks.sk1.type = logger
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
# maximum number of events stored in the channel
a1.channels.c1.capacity = 1000
# maximum number of events the sink collects per transaction
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1

Start a1:

[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/e5.properties -Dflume.root.logger=INFO,console
5.Kafka Source
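
Reads data from one or more Kafka topics; the source acts as a Kafka consumer. The Avro Sink example below (example9.properties) uses this source for agent a1; a minimal source-only sketch reusing the same values as that example:

a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.kafka.bootstrap.servers = CentOS:9092
a1.sources.s1.kafka.topics = topic01
a1.sources.s1.kafka.consumer.group.id = g1
a1.sources.s1.batchSize = 100
a1.sources.s1.batchDurationMillis = 2000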

Sink - outputs

1.Logger Sink

Typically used for testing and debugging.

2.File Roll Sink

Writes the collected data to local files.

| Property | Default | Description |
| --- | --- | --- |
| channel | – | the channel to read from |
| type | – | must be file_roll |
| sink.directory | – | directory where the collected data is stored |
| sink.rollInterval | – | seconds between rolling to a new file (0 = never roll, 30 = roll after 30 seconds) |

Create e6.properties in Flume's conf directory:

# Declare the basic components: Source, Channel, Sink  example6.properties
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the source: receive text data from a socket (netcat)
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the sink: write the received data to local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll
a1.sinks.sk1.sink.rollInterval = 0
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/e6.properties
3.HDFS Sink

Writes data to the HDFS file system.

| Property | Default | Description |
| --- | --- | --- |
| channel | – | the channel to read from |
| type | – | must be hdfs |
| hdfs.path | – | HDFS path to write to (e.g. hdfs://namenode/flume/webdata/) |

Create e7.properties in Flume's conf directory:

# Declare the basic components: Source, Channel, Sink  example7.properties
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the source: receive text data from a socket (netcat)
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the sink: write the received data to HDFS.
# By default this writes to the local machine's HDFS file system; if HDFS is not on this
# machine, copy the hadoop directory here and configure the hadoop environment variables.
a1.sinks.sk1.type = hdfs
a1.sinks.sk1.hdfs.path = /flume-hdfs/%y-%m-%d
a1.sinks.sk1.hdfs.rollInterval = 0
a1.sinks.sk1.hdfs.rollSize = 0
a1.sinks.sk1.hdfs.rollCount = 0
a1.sinks.sk1.hdfs.useLocalTimeStamp = true
a1.sinks.sk1.hdfs.fileType = DataStream
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
4.Kafka Sink

Writes data into a Kafka topic.

Create the configuration file e8.properties in Flume's conf directory:

# Declare the basic components: Source, Channel, Sink  example8.properties
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the source: receive text data from a socket (netcat)
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the sink: write the received data to a Kafka topic
a1.sinks.sk1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.sk1.kafka.bootstrap.servers = CentOS:9092
a1.sinks.sk1.kafka.topic = topic01
a1.sinks.sk1.kafka.flumeBatchSize = 20
a1.sinks.sk1.kafka.producer.acks = 1
a1.sinks.sk1.kafka.producer.linger.ms = 1
a1.sinks.sk1.kafka.producer.compression.type = snappy
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
5.Avro Sink

Sends data to an Avro Source.

(Figure: two agents chained together, the first agent's Avro sink feeding the second agent's Avro source)

An Avro sink acts as an Avro client on the output side, sending the collected log events into another agent's Avro source.

# Declare the basic components: Source, Channel, Sink  example9.properties (agent a1)
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the source: consume data from a Kafka topic
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.batchSize = 100
a1.sources.s1.batchDurationMillis = 2000
a1.sources.s1.kafka.bootstrap.servers = CentOS:9092
a1.sources.s1.kafka.topics = topic01
a1.sources.s1.kafka.consumer.group.id = g1
# Configure the sink: forward the events to another agent's Avro source
a1.sinks.sk1.type = avro
a1.sinks.sk1.hostname = CentOS
a1.sinks.sk1.port = 44444
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
# Declare the basic components: Source, Channel, Sink  example9.properties (agent a2)
a2.sources = s1
a2.sinks = sk1
a2.channels = c1
# Configure the source: receive data over Avro
a2.sources.s1.type = avro
a2.sources.s1.bind = CentOS
a2.sources.s1.port = 44444
# Configure the sink: write the received data to local files
a2.sinks.sk1.type = file_roll
a2.sinks.sk1.sink.directory = /root/file_roll
a2.sinks.sk1.sink.rollInterval = 0
# Configure the channel, which buffers the data
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Wire the components together
a2.sources.s1.channels = c1
a2.sinks.sk1.channel = c1
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --conf-file conf/example9.properties --name a2
[root@CentOS apache-flume-1.9.0-bin]# ./bin/flume-ng agent --conf conf/ --conf-file conf/example9.properties --name a1
[root@CentOS kafka_2.11-2.2.0]# ./bin/kafka-console-producer.sh --broker-list CentOS:9092 --topic topic01

Channel - buffers

1.Memory Channel

Writes the events collected by the source directly into memory; it is not durable, so data may be lost.

| Parameter | Default | Description |
| --- | --- | --- |
| type | – | must be memory |
| capacity | – | maximum number of events stored in the channel |
| transactionCapacity | – | maximum number of events the source writes or the sink reads from the channel per transaction |

transactionCapacity <= capacity

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
2.JDBC Channel

Events are stored in a database, providing persistent, transactional storage. The channel embeds the Derby database; it is a durable channel, well suited to flows where recoverability is important. Use the JDBC channel when the buffered data matters.

a1.channels.c1.type = jdbc
3.Kafka Channel

Stores the events collected by the source in an external Kafka cluster; the channel effectively behaves like a Kafka consumer group.

a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = centos:9092
a1.channels.c1.kafka.topic = topic_channel
a1.channels.c1.kafka.consumer.group.id = g1
4.File Channel

Persists events on the local file system using a checkpoint directory and data directories, so buffered events survive an agent restart.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /root/flume/checkpoint
a1.channels.c1.dataDirs = /root/flume/data

Advanced Components

1.Interceptors

Interceptors act on the source, intercepting and decorating the Events it assembles (an Event consists of an Event header and an Event body). Flume ships with many built-in interceptors.

Timestamp Interceptor: decorator; adds timestamp information to the Event header.

Host Interceptor: decorator; adds host information to the Event header.

Static Interceptor: decorator; adds a user-defined key/value pair to the Event header.

Remove Header Interceptor: decorator; removes the specified key from the Event header.

UUID Interceptor: decorator; adds a random, unique UUID string to the Event header.

Search and Replace Interceptor: decorator; searches the Event body and replaces the matching content.

Regex Filtering Interceptor: filter; passes or drops events whose body matches a regular expression.

| Property | Default | Description |
| --- | --- | --- |
| type | – | must be regex_filter |
| regex | – | the regular expression to match |
| excludeEvents | false | if false, events matching the regex are kept; if true, they are dropped |

Regex Extractor Interceptor: decorator; searches the Event body and adds the matching content to the Event header.

| Property | Default | Description |
| --- | --- | --- |
| type | – | must be regex_extractor |
| regex | – | regular expression to search for, e.g. ^(INFO\|ERROR) |
| serializers | – | key(s) under which the extracted content is added to the header |

Example
# Declare the basic components: Source, Channel, Sink  example11.properties
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the source: receive text data from a socket (netcat)
a1.sources.s1.type = netcat
a1.sources.s1.bind = centos
a1.sources.s1.port = 44444
# Add interceptors
a1.sources.s1.interceptors = i1 i2 i3 i4 i5 i6
a1.sources.s1.interceptors.i1.type = timestamp
a1.sources.s1.interceptors.i2.type = host
a1.sources.s1.interceptors.i3.type = static
# user-defined key/value pair added to the event header
a1.sources.s1.interceptors.i3.key = from
a1.sources.s1.interceptors.i3.value = baizhi
a1.sources.s1.interceptors.i4.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
a1.sources.s1.interceptors.i4.headerName = uuid
a1.sources.s1.interceptors.i5.type = remove_header
a1.sources.s1.interceptors.i5.withName = from
# search the Event body and replace the matching content
a1.sources.s1.interceptors.i6.type = search_replace
a1.sources.s1.interceptors.i6.searchPattern = ^jiangzz
a1.sources.s1.interceptors.i6.replaceString = baizhi
# Configure the sink: print the received data to the log console
a1.sinks.sk1.type = logger
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1

Walk-through of the Regex Filtering Interceptor and the Regex Extractor Interceptor (filtering on the Event body and extracting information into the header):

# Declare the basic components: Source, Channel, Sink  example12.properties
a1.sources = s1
a1.sinks = sk1
a1.channels = c1
# Configure the source: receive text data from a socket (netcat)
a1.sources.s1.type = netcat
a1.sources.s1.bind = centos
a1.sources.s1.port = 44444
# Add interceptors
a1.sources.s1.interceptors = i1 i2
# extract matching content from the event body into the event header, using loglevel as the key
a1.sources.s1.interceptors.i1.type = regex_extractor
a1.sources.s1.interceptors.i1.regex = ^(INFO|ERROR)
a1.sources.s1.interceptors.i1.serializers = s1
a1.sources.s1.interceptors.i1.serializers.s1.name = loglevel
# only pass events whose body contains "baizhi"
a1.sources.s1.interceptors.i2.type = regex_filter
a1.sources.s1.interceptors.i2.regex = .*baizhi.*
a1.sources.s1.interceptors.i2.excludeEvents = false
# Configure the sink: print the received data to the log console
a1.sinks.sk1.type = logger
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1

2.Channel Selectors

When a source is connected to multiple channels, the channel selector decides which channel(s) the source's data goes into. If no selector is specified, the data is broadcast to every channel (replicating is the default).

Replicating


Configuration:

# Declare the basic components: Source, Channel, Sink  example13.properties
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# Configure the source: receive text data from a socket (netcat)
a1.sources.s1.type = netcat
a1.sources.s1.bind = centos
a1.sources.s1.port = 44444
# Configure the sinks: write the received data to local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# Configure the channels, which buffer the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2


Note: if Hive is already installed on the system, derby-10.14.1.0.jar in Hive's lib directory conflicts with derby-10.14.1.0.jar in Flume's lib directory; moving the jar out of Flume's lib directory resolves the conflict.

If the channel selector type is not specified explicitly, the replicating (broadcast) selector is used by default.

Equivalent configuration:

# channel selector: replicating mode
a1.sources.s1.selector.type = replicating
a1.sources.s1.channels = c1 c2
Multiplexing

With this selector, the source's data is split across different channels and ultimately reaches different sink groups.

Configuration file:

# Declare the basic components: Source, Channel, Sink  example15.properties
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1 c2
# channel selector: multiplexing mode
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.channels = c1 c2
# route on the event-header key "level", whose values are INFO or ERROR
a1.sources.s1.selector.header = level
a1.sources.s1.selector.mapping.INFO = c1
a1.sources.s1.selector.mapping.ERROR = c2
a1.sources.s1.selector.default = c1
# Configure the source: receive text data from a socket (netcat)
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# interceptor that decorates the event header
a1.sources.s1.interceptors = i1
a1.sources.s1.interceptors.i1.type = regex_extractor
a1.sources.s1.interceptors.i1.regex = ^(INFO|ERROR)
a1.sources.s1.interceptors.i1.serializers = s1
a1.sources.s1.interceptors.i1.serializers.s1.name = level
# Configure the sinks: write the received data to local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
# Configure the channels, which buffer the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = jdbc
# Wire the components together
a1.sources.s1.channels = c1 c2
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c2

3.Sink Processors

Flume uses sink groups to wrap several sink instances into one logical sink; internally, a Sink Processor provides load balancing and failover for the sink group.

Load balancing Sink Processor


# Declare the basic components: Source, Channel, Sink  example16.properties
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1
# Configure the source: receive text data from a socket (netcat)
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the sinks: write the received data to local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk1.sink.batchSize = 1
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
a1.sinks.sk2.sink.batchSize = 1
# Configure the sink processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = sk1 sk2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c1

To actually observe the load-balancing effect, sink.batchSize and transactionCapacity must both be set to 1.

Failover Sink Processor
# Declare the basic components: Source, Channel, Sink  example17.properties
a1.sources = s1
a1.sinks = sk1 sk2
a1.channels = c1
# Configure the source: receive text data from a socket (netcat)
a1.sources.s1.type = netcat
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
# Configure the sinks: write the received data to local files
a1.sinks.sk1.type = file_roll
a1.sinks.sk1.sink.directory = /root/file_roll_1
a1.sinks.sk1.sink.rollInterval = 0
a1.sinks.sk1.sink.batchSize = 1
a1.sinks.sk2.type = file_roll
a1.sinks.sk2.sink.directory = /root/file_roll_2
a1.sinks.sk2.sink.rollInterval = 0
a1.sinks.sk2.sink.batchSize = 1
# Configure the sink processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = sk1 sk2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.sk1 = 20
a1.sinkgroups.g1.processor.priority.sk2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Configure the channel, which buffers the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1
# Wire the components together
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1
a1.sinks.sk2.channel = c1

Flume Application Integration API

Native API integration: Flume SDK

When integrating an application, the receiving agent must expose an Avro source.
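
Before running the client below, an agent with an Avro source must be listening on the target host and port. A minimal sketch of such an agent, essentially the same as e2.properties above (host CentOS and port 44444 assumed to match the client):

a1.sources = s1
a1.sinks = sk1
a1.channels = c1
a1.sources.s1.type = avro
a1.sources.s1.bind = CentOS
a1.sources.s1.port = 44444
a1.sinks.sk1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.s1.channels = c1
a1.sinks.sk1.channel = c1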

<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-sdk</artifactId>
    <version>1.9.0</version>
</dependency>

Create a test class under the test directory:

package com.baizhi;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.util.HashMap;
import java.util.Map;

public class RpcClientTest {
    private RpcClient client;
    @Before
    public void before(){
        // connect to the agent's Avro source listening on centos:44444
        client= RpcClientFactory.getDefaultInstance("centos",44444);
    }

    @Test
    public void testClient() throws EventDeliveryException {
        // build an event with a body and a custom header, then send it to the agent
        Event event= EventBuilder.withBody("this is a demo".getBytes());
        Map<String,String> map=new HashMap<String, String>();
        map.put("from","world");
        event.setHeaders(map);
        client.append(event);
    }

    @After
    public void after() {
        client.close();
    }
}

Failover integration configuration: replace the body of before() above with the following (Properties here is java.util.Properties):

// Setup properties for the failover
Properties props = new Properties();
props.put("client.type", "default_failover");

// List of hosts (space-separated list of user-chosen host aliases)
props.put("hosts", "h1 h2 h3");

// host/port pair for each host alias
String host1 = "host1.example.org:41414";
String host2 = "host2.example.org:41414";
String host3 = "host3.example.org:41414";
props.put("hosts.h1", host1);
props.put("hosts.h2", host2);
props.put("hosts.h3", host3);

props.put("host-selector", "random"); // For random host selection
// props.put("host-selector", "round_robin"); // For round-robin host
//                                            // selection
props.put("backoff", "true"); // Disabled by default.

props.put("maxBackoff", "10000"); // Defaults 0, which effectively
                                  // becomes 30000 ms

// create the client with failover properties
RpcClient client = RpcClientFactory.getInstance(props);

log4j integration

<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-sdk</artifactId>
    <version>1.9.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flume.flume-ng-clients</groupId>
    <artifactId>flume-ng-log4jappender</artifactId>
    <version>1.9.0</version>
</dependency>
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>1.7.5</version>
</dependency>

Single host

log4j.rootLogger=debug, flume

log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = 192.168.40.129
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = true
log4j.appender.flume.layout=org.apache.log4j.PatternLayout
log4j.appender.flume.layout.ConversionPattern=%p %d{yyyy-MM-dd HH:mm:ss} %c %m%n

Load-balancing configuration

log4j.rootLogger=debug, flume

log4j.appender.flume= org.apache.flume.clients.log4jappender.LoadBalancingLog4jAppender
log4j.appender.flume.Hosts = 192.168.40.129:44444,...
log4j.appender.flume.Selector = ROUND_ROBIN
log4j.appender.flume.UnsafeMode = true
log4j.appender.flume.layout=org.apache.log4j.PatternLayout
log4j.appender.flume.layout.ConversionPattern=%p %d{yyyy-MM-dd HH:mm:ss} %c %m%n

Test class:
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class TestLog4j {
    private static Log log= LogFactory.getLog(TestLog4j.class);
    public static void main(String[] args) {
        log.debug("你好!_debug");
        log.info("你好!_info");
        log.warn("你好!_warn");
        log.error("你好!_error");
    }
}

Spring Boot integration

Copy the com folder from the provided springboot-flume.zip into your project, then copy the provided logback.xml into the project's resources directory.

https://github.com/gilt/logback-flume-appender

Add the corresponding appender to logback.xml; the appender configuration can be looked up at the URL above.
