Resumable collection with the flume tail-dir source

爱吃甜食_

Category: flume

Article link: https://blog.csdn.net/a3125504x/article/details/108261737

Flume resumable collection

Resumable collection with the tail-dir source
- Advantages of tail-dir
- Flume configuration file
- agent
- source
- channel
- sink
- Putting it together
- Example

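Resumable collection with the tail-dir source

The tail-dir source records how far it has read in each file in a JSON position file maintained by Flume. This is what makes resumable collection possible: when Flume restarts after a crash, it continues from the recorded offsets instead of re-reading the files from the beginning, which would otherwise produce duplicate ("dirty") data.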
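To make the bookkeeping concrete, here is a minimal Python sketch of a position file. It assumes a layout of one JSON record per tailed file holding its inode, absolute path and byte offset (the Flume user guide says the file records exactly these three things; the key names `inode`, `pos` and `file` used here are an assumption). The helper functions are hypothetical and are not Flume's code; writing via a temp file plus rename is a choice of this sketch.

```python
import json
import os
import tempfile

# Hypothetical helpers mimicking a TAILDIR-style position file, e.g.
# [{"inode": 2496272, "pos": 12, "file": "/test/dirfile/a.log"}]


def load_positions(position_file):
    """Return {inode: record}; a missing file means 'no progress yet'."""
    if not os.path.exists(position_file):
        return {}
    with open(position_file, encoding="utf-8") as fh:
        return {rec["inode"]: rec for rec in json.load(fh)}


def record_progress(positions, path, offset):
    """Remember that `offset` bytes of `path` were delivered, keyed by inode."""
    inode = os.stat(path).st_ino
    positions[inode] = {"inode": inode, "pos": offset, "file": os.path.abspath(path)}


def save_positions(position_file, positions):
    """Write to a temp file, then rename, so a crash never leaves half a JSON file."""
    directory = os.path.dirname(os.path.abspath(position_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(list(positions.values()), fh)
    os.replace(tmp_path, position_file)


# Demo: record progress, "restart", and read the saved offset back.
with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "a.log")
    with open(log_path, "w", encoding="utf-8") as fh:
        fh.write("first line\nsecond line\n")
    pos_file = os.path.join(workdir, "taildir_position.json")

    positions = load_positions(pos_file)  # {} on the very first start
    record_progress(positions, log_path, len("first line\n"))
    save_positions(pos_file, positions)

    restored = load_positions(pos_file)  # what a restarted agent would see
    resume_offset = restored[os.stat(log_path).st_ino]["pos"]
    print(resume_offset)
```

Keying by inode rather than by path is what lets a tailer keep its place when a file is renamed.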

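Advantages of tail-dir

- It can monitor several directories.
- It can use regular expressions to follow file names that keep changing.
- It picks up content that is continuously appended to the target files.

Requirement

Use the tail-dir source to monitor several files under a directory and support resumable collection of those files.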
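Picking up appended content boils down to "seek to the saved offset, read what is new, advance the offset". The Python sketch below shows one step of that loop. The function name is hypothetical, and the rule that an unterminated trailing fragment is left for a later call (so a half-written line is never emitted) is an assumption of this sketch.

```python
import os
import tempfile


def read_appended_lines(path, offset):
    """Return (complete_lines, new_offset) for bytes appended after `offset`.

    Bytes after the last newline are left unread, so a line that is still
    being written is returned whole on a later call.
    """
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    end = data.rfind(b"\n")
    if end == -1:  # no complete line yet
        return [], offset
    complete = data[: end + 1]
    # Drop the empty string produced after the final newline.
    lines = complete.decode("utf-8").split("\n")[:-1]
    return lines, offset + len(complete)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "test.log")
    with open(path, "wb") as fh:
        fh.write(b"a\nb\npart")  # "part" is an unfinished line
    first, pos = read_appended_lines(path, 0)
    with open(path, "ab") as fh:  # the writer appends more data
        fh.write(b"ial\nc\n")
    second, pos = read_appended_lines(path, pos)
    print(first, second, pos)
```

Cutting at a newline byte is safe for UTF-8, because `\n` can never occur inside a multi-byte character.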

Flume configuration file

vim tail-dir.conf

agent

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
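Every key in the agent block is prefixed with the agent name `a1`, and each component list is space-separated. A tiny hypothetical parser illustrates this naming scheme. It is a simplification that only understands `key = value` lines and `#`/`!` comments, not the full `java.util.Properties` syntax (no escapes, no line continuations, no `:` separator).

```python
def parse_flume_properties(text):
    """Parse 'key = value' lines; blank lines and '#'/'!' comments are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def declared_components(props, agent):
    """Names listed under <agent>.sources / .channels / .sinks (space separated)."""
    return {kind: props.get(f"{agent}.{kind}", "").split()
            for kind in ("sources", "channels", "sinks")}


AGENT_CONF = """
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
"""

components = declared_components(parse_flume_properties(AGENT_CONF), "a1")
print(components)  # {'sources': ['r1'], 'channels': ['c1'], 'sinks': ['k1']}
```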

source

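# Describe/configure the source

a1.sources.r1.type = TAILDIR
## Position file (JSON): records the inode, absolute path and last read position of each tailed file
a1.sources.r1.positionFile = /test/apache-flume-1.6.0-cdh5.14.2-bin/taildir_position.json

## File groups, separated by spaces; there can be several (the TAILDIR source can tail files in several directories at once)
a1.sources.r1.filegroups = f1 f2
## Absolute path of the monitored directory; the file-name part is a regular expression, not a shell glob
a1.sources.r1.filegroups.f1 = /test/dirfile/.*log
## Absolute path of a single monitored file
a1.sources.r1.filegroups.f2 = /test/dirfile2/test.log

## Maximum number of lines read from a file and sent to the channel at a time; the default of 100 is usually fine
a1.sources.r1.batchSize = 100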
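The filegroup comments say the file name may be a regular expression. The Flume user guide states that regular expressions (not file-system patterns) can be used for the file name only. The Python sketch below assumes, in addition, that the whole file name must match. It shows why a shell-style `*.log` is not a valid pattern, and why even a literal name like `test.log` is treated as a regex (its `.` matches any character). The helper names are hypothetical, and Python's `re` stands in for Java's regex engine, which behaves the same for these simple patterns.

```python
import posixpath
import re


def split_filegroup(pattern):
    """Split '/dir/regex' into (parent directory, compiled file-name regex)."""
    parent, name_regex = posixpath.split(pattern)
    return parent, re.compile(name_regex)


def matching_files(pattern, file_names):
    """File names (in the pattern's parent directory) whose whole name matches."""
    _, regex = split_filegroup(pattern)
    return sorted(n for n in file_names if regex.fullmatch(n))


def is_valid_regex(expr):
    """True if `expr` compiles as a regular expression."""
    try:
        re.compile(expr)
        return True
    except re.error:
        return False


print(split_filegroup("/test/dirfile/.*log")[0])  # /test/dirfile
print(matching_files("/test/dirfile/.*log", ["a.log", "b.log", "notes.txt"]))
# ['a.log', 'b.log']
print(matching_files("/test/dirfile2/test.log", ["test.log", "testXlog"]))
# ['test.log', 'testXlog']  ('.' is a metacharacter)
print(is_valid_regex("*.log"))  # False: nothing precedes the '*'
```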

Additional notes on filegroups

channel

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.ch
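The numbers in the channel block must agree with the source's batch size. A memory channel rejects a `transactionCapacity` larger than its `capacity`. The source also hands each batch to the channel in a single transaction, so a `batchSize` above `transactionCapacity` makes the put fail. The hypothetical checker below encodes only those two rules. It is a sketch, not Flume's own validation code, and it falls back to 100, the TAILDIR default, when `batchSize` is not set.

```python
def check_batch_settings(props, source, channel):
    """Return a list of problems; an empty list means the numbers are consistent.

    Rules encoded here (a sketch, not Flume's own validation):
      * channel transactionCapacity <= channel capacity
      * source batchSize <= channel transactionCapacity, because one
        source batch is put into the channel in a single transaction
    """
    capacity = int(props[f"{channel}.capacity"])
    txn = int(props[f"{channel}.transactionCapacity"])
    batch = int(props.get(f"{source}.batchSize", 100))  # TAILDIR default
    problems = []
    if txn > capacity:
        problems.append(f"transactionCapacity {txn} > capacity {capacity}")
    if batch > txn:
        problems.append(f"batchSize {batch} > transactionCapacity {txn}")
    return problems


conf = {
    "a1.channels.c1.capacity": "1000",
    "a1.channels.c1.transactionCapacity": "100",
}
ok = check_batch_settings(conf, "a1.sources.r1", "a1.channels.c1")
print(ok)  # []

conf["a1.sources.r1.batchSize"] = "1000"
too_big = check_batch_settings(conf, "a1.sources.r1", "a1.channels.c1")
print(too_big)  # ['batchSize 1000 > transactionCapacity 100']
```

With the channel settings above (capacity 1000, transactionCapacity 100), a source batch size of 100 is the largest value that fits in one transaction.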
