Flume: real-time monitoring of a single appended file, and of multiple appended files in a directory


塞上江南o

Category: Flume

Article link: https://blog.csdn.net/qq_43192537/article/details/101712103


✨Note:

If any of the configuration parameters below are unclear, see this article:
A simple official Flume example: monitoring port data

✨Real-time monitoring of a single appended file: exec -> flume -> hdfs

Requirement: use Flume to monitor the Hive log in real time and upload it to HDFS.


0. Preparation

For Flume to write data to HDFS, it needs the relevant Hadoop jars on its classpath, just as a Java program needs a JDBC driver to write to MySQL:

commons-configuration-1.6.jar
hadoop-auth-2.7.2.jar
hadoop-common-2.7.2.jar
hadoop-hdfs-2.7.2.jar
commons-io-2.4.jar
htrace-core-3.1.0-incubating.jar
# Note: if your Hadoop version is different, use the matching jars from your own Hadoop installation (under share/hadoop) instead

Copy the jars above into the /opt/modules/flume-1.7.0/lib directory.
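Because the six jars sit in different subdirectories of a Hadoop distribution, the copy can be scripted instead of done by hand. This is a minimal sketch, not part of the original setup: it assumes a Hadoop 2.7.2 installation at the hypothetical path /opt/modules/hadoop-2.7.2 with the usual share/hadoop layout, and `copy_hadoop_jars` is a helper name made up for illustration.

```shell
#!/bin/sh
# Sketch: copy the jars Flume's HDFS sink needs from a Hadoop 2.7.2 install into Flume's lib/.
# Assumption: /opt/modules/hadoop-2.7.2 is a hypothetical install path; pass your own.
# copy_hadoop_jars is a hypothetical helper, not part of Flume or Hadoop.
copy_hadoop_jars() {
  hadoop_home=${1:-/opt/modules/hadoop-2.7.2}
  flume_lib=${2:-/opt/modules/flume-1.7.0/lib}
  missing=0
  for jar in commons-configuration-1.6.jar hadoop-auth-2.7.2.jar \
             hadoop-common-2.7.2.jar hadoop-hdfs-2.7.2.jar \
             commons-io-2.4.jar htrace-core-3.1.0-incubating.jar; do
    # The jars live in different subdirectories of share/hadoop, so search the whole tree
    src=$(find "$hadoop_home/share/hadoop" -name "$jar" 2>/dev/null | head -n 1)
    if [ -n "$src" ] && cp "$src" "$flume_lib/"; then
      echo "copied $src"
    else
      echo "missing or not copied: $jar" >&2
      missing=$((missing + 1))
    fi
  done
  # Return non-zero if any jar could not be copied
  [ "$missing" -eq 0 ]
}

# Usage:
# copy_hadoop_jars /opt/modules/hadoop-2.7.2 /opt/modules/flume-1.7.0/lib
```

If a different Hadoop version is installed, only the jar names in the loop need to change.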

Personal download link:
Link: https://caiyun.139.com/m/i?185CkuBAdN6dp
Extraction code: xwAr

1. Go to the job directory and edit hive-flume-hdfs.conf

Where the job directory comes from (link)

Configuring the Hive log directory (link)

[hadoop@hadoop201 job]$ vim hive-flume-hdfs.conf

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
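# Replace /opt/modules/hive-1.2.1/logs/hive.log with the path of your own Hive log
a2.sources.r2.command = tail -F /opt/modules/hive-1.2.1/logs/hive.log

# Describe the sink
a2.sinks.k2.type = hdfs
# Replace hadoop201 with the host running your HDFS NameNode
a2.sinks.k2.hdfs.path = hdfs://hadoop201:9000/flume/%Y%m%d/%H
# Prefix for the uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-

######### Directory rolling #########
# Round the timestamp down, so directories roll over by time
a2.sinks.k2.hdfs.round = true
# Number of time units per new directory
a2.sinks.k2.hdfs.roundValue = 1
# Time unit used for rounding
a2.sinks.k2.hdfs.roundUnit = hour
# Use the agent's local time instead of a timestamp header (the exec source sets none,
# and the %Y%m%d/%H escapes in hdfs.path need a timestamp)
a2.sinks.k2.hdfs.useLocalTimeStamp = true

# Number of events written per flush to HDFS; keep it no larger than the channel's
# transactionCapacity (100 below), or a burst of events can fail the sink's transaction
a2.sinks.k2.hdfs.batchSize = 100
# File type: DataStream writes the data uncompressed (CompressedStream enables compression)
a2.sinks.k2.hdfs.fileType = DataStream

######### File rolling #########
# 0 = never roll files based on the number of events
a2.sinks.k2.hdfs.rollCount = 0
# Start a new file every N seconds
a2.sinks.k2.hdfs.rollInterval = 10
# Roll size in bytes: slightly below 128 MB (134217728 bytes), the default HDFS block size
a2.sinks.k2.hdfs.rollSize = 134217700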


# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100


# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
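To see how the round settings decide which directory an event lands in, the sketch below computes that directory for a given time. It assumes GNU date (for `date -d @SECONDS`), the agent's local time zone (as with useLocalTimeStamp = true), and roundUnit = hour; `hdfs_bucket_dir` is a hypothetical helper that imitates, rather than calls, the HDFS sink's rounding.

```shell
#!/bin/sh
# Sketch: print the directory an event would be written to for
#   hdfs.path = hdfs://hadoop201:9000/flume/%Y%m%d/%H
# with round = true, roundUnit = hour, useLocalTimeStamp = true.
# Arguments: epoch seconds (default: now), roundValue in hours (default: 1).
# Requires GNU date. hdfs_bucket_dir is a hypothetical helper, not part of Flume.
hdfs_bucket_dir() {
  ts=${1:-$(date +%s)}
  round_value=${2:-1}
  day=$(date -d "@$ts" +%Y%m%d)
  # %H is zero-padded; strip the leading zero so "08" is not read as octal
  hour=$(date -d "@$ts" +%H | sed 's/^0//')
  # Round the hour of day down to a multiple of roundValue
  hour=$(( hour / round_value * round_value ))
  printf '/flume/%s/%02d\n' "$day" "$hour"
}

# Usage (listing needs a running HDFS):
# hdfs dfs -ls "$(hdfs_bucket_dir)"
```

With roundValue = 1, as configured above, rounding leaves the hour unchanged and every hour gets its own directory; with roundValue = 2, events from 22:00 to 23:59 would both go to .../22.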

2. Start the agent with this configuration

[hadoop@hadoop201 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/hive-flume-hdfs.conf

3. Run Hive

Hive can only run if HDFS is already running.

[hadoop@hadoop201 hive-1.2.1]$ bin/hive
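If Hive is not at hand, the exec source can also be exercised by appending lines to the tailed file directly, since `tail -F` reports any appended line regardless of which process wrote it. A minimal sketch; `append_test_lines` is a hypothetical helper and the one-second default delay is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: append COUNT numbered, timestamped test lines to FILE so that
# `tail -F` in the exec source has something to pick up.
# append_test_lines is a hypothetical helper, not part of Flume or Hive.
append_test_lines() {
  file=$1
  count=${2:-10}
  delay=${3:-1}   # seconds to wait between lines
  i=1
  while [ "$i" -le "$count" ]; do
    echo "flume test line $i $(date '+%Y-%m-%d %H:%M:%S')" >> "$file"
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (path taken from a2.sources.r2.command):
# append_test_lines /opt/modules/hive-1.2.1/logs/hive.log 20
```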

4. Check the result in the HDFS web UI (the NameNode web UI, port 50070 by default in Hadoop 2.x)


✨Real-time monitoring of multiple appended files in a directory: taildir -> flume -> logger

Requirement: use Flume to watch a whole directory for files being appended to in real time, and print the data to the console.

0. Repeat step 0 above

1. Go to the job directory and edit taildir-flume-logger.conf

Where the job directory comes from (link)


[hadoop@hadoop201 job]$ vim taildir-flume-logger.conf

# Name the components on this agent
a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = TAILDIR
a3.sources.r3.positionFile = /opt/modules/flume-1.7.0/position/position.json
a3.sources.r3.filegroups = f1 f2
a3.sources.r3.filegroups.f1 = /opt/modules/flume-1.7.0/files/f1.txt
a3.sources.r3.filegroups.f2 = /opt/modules/flume-1.7.0/files/f2.txt

# Describe the sink
a3.sinks.k3.type = logger

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3

The Taildir source maintains a position file in JSON format and periodically writes the latest read offset of each file into it, so after a restart it resumes reading where it left off.
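To check how far Taildir has read, the `pos` recorded in the position file can be compared with the file's current size. This sketch assumes the position file is a JSON array of objects with `inode`, `pos` and `file` fields, for example `[{"inode":2496272,"pos":4,"file":"/opt/modules/flume-1.7.0/files/f1.txt"}]` (the inode value is illustrative). It picks the JSON apart with tr/grep/sed instead of a real JSON parser, so treat it as a quick diagnostic only; `taildir_lag` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print how many bytes of FILE Taildir has not yet recorded as read.
# Assumes position-file entries shaped like {"inode":...,"pos":...,"file":"..."}.
# taildir_lag is a hypothetical helper, not part of Flume.
taildir_lag() {
  posfile=$1   # e.g. /opt/modules/flume-1.7.0/position/position.json
  target=$2    # absolute path of a tailed file, as written in filegroups
  # Put each JSON object on its own line, then keep the entry for the target file
  entry=$(tr '}' '\n' < "$posfile" | grep -F "\"file\":\"$target\"")
  pos=$(printf '%s\n' "$entry" | sed -n 's/.*"pos":\([0-9]*\).*/\1/p')
  size=$(wc -c < "$target" | tr -d ' ')
  # Bytes appended since the last recorded position (0 = fully read)
  echo $(( size - ${pos:-0} ))
}

# Usage:
# taildir_lag /opt/modules/flume-1.7.0/position/position.json /opt/modules/flume-1.7.0/files/f1.txt
```

Because the position file is only rewritten periodically, a small non-zero value right after an `echo >>` is expected and should drop to 0 shortly afterwards.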

2. Create the files

[hadoop@hadoop201 flume-1.7.0]$ mkdir files
[hadoop@hadoop201 flume-1.7.0]$ cd files/
[hadoop@hadoop201 files]$ touch f1.txt
[hadoop@hadoop201 files]$ touch f2.txt

3. Start the agent

[hadoop@hadoop201 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/taildir-flume-logger.conf -Dflume.root.logger=INFO,console

# In a new terminal window
[hadoop@hadoop201 flume-1.7.0]$ echo ccc >> files/f1.txt
[hadoop@hadoop201 flume-1.7.0]$ echo ddd >> files/f2.txt
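The configuration above names f1.txt and f2.txt explicitly. To match the requirement of watching the whole directory, a file-group path can instead end in a regular expression, which Taildir accepts for the file-name part only (not for the directory part). The sketch below writes such a variant; the function `write_taildir_dir_conf`, the file name `taildir-dir-flume-logger.conf` and the separate position file `position-dir.json` are hypothetical choices. The regex is written as `.*txt` rather than `.*\.txt` because Flume reads the file as Java properties, where a backslash is an escape character.

```shell
#!/bin/sh
# Sketch: write a Taildir config whose single file group matches every file
# in files/ whose name ends in "txt".
# write_taildir_dir_conf, taildir-dir-flume-logger.conf and position-dir.json are hypothetical.
write_taildir_dir_conf() {
  job_dir=${1:-/opt/modules/flume-1.7.0/job}
  mkdir -p "$job_dir"
  cat > "$job_dir/taildir-dir-flume-logger.conf" <<'EOF'
a3.sources = r3
a3.sinks = k3
a3.channels = c3

a3.sources.r3.type = TAILDIR
# Separate position file, so this variant does not share offsets with taildir-flume-logger.conf
a3.sources.r3.positionFile = /opt/modules/flume-1.7.0/position/position-dir.json
a3.sources.r3.filegroups = f1
# Regex on the file name only: any file in files/ whose name ends in "txt"
a3.sources.r3.filegroups.f1 = /opt/modules/flume-1.7.0/files/.*txt

a3.sinks.k3.type = logger

a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
EOF
}

# Usage:
# write_taildir_dir_conf
# bin/flume-ng agent --conf conf/ --name a3 --conf-file job/taildir-dir-flume-logger.conf -Dflume.root.logger=INFO,console
```

With this variant running, a newly created file such as files/f3.txt should be picked up as well, without editing the configuration.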