Flume > Collection Examples (2)

Latest recommended article published on 2023-11-08 10:59:15

BigMoM1573

Category: Flume    Tags: Flume

Article link: https://blog.csdn.net/qq_44509920/article/details/103408064

Table of contents

1. Collecting a directory into HDFS
- Start Flume
- Full example
2. Collecting a file into HDFS
3. Chaining two agents

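1. Collecting a directory into HDFS

Requirements analysis
[Figure: architecture diagram]
Requirement: new files keep appearing in a particular directory on a server, and each new file must be collected into HDFS as soon as it appears.
Based on this requirement, first define the following three components: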

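Data source component (source), which monitors a file directory: spooldir
Characteristics of spooldir:
1. It watches a directory; whenever a new file appears there, the contents of that file are collected.
2. Once a file has been fully collected, the agent automatically renames it by appending the suffix .COMPLETED.
3. A file name must never be reused in the watched directory.
Sink component (sink), which writes to the HDFS file system: hdfs sink
Channel component (channel): either a file channel or a memory channel can be used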
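
Characteristics 1 and 3 mean that a file should land in the watched directory complete and under a name that has never been used there before (the spooling directory source also expects files not to be modified once they are in place). One possible way to arrange this is sketched below. The helper name `spool_put` and the staging directory are assumptions made for illustration and do not appear in the original article; the staging directory is assumed to be on the same filesystem as the spool directory so that `mv` is a plain rename.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): deliver a file to a spooldir safely.
# usage: spool_put SRC SPOOL_DIR STAGING_DIR
# Prints the unique file name that was placed in SPOOL_DIR.
spool_put() {
    src=$1
    spool_dir=$2
    staging_dir=$3
    base=$(basename "$src")
    n=0
    # Pick a name not yet used, neither as a pending file nor as a
    # file Flume has already renamed with the .COMPLETED suffix.
    while :; do
        target="${base}.$(date +%Y%m%d%H%M%S).$$.${n}"
        if [ ! -e "$spool_dir/$target" ] && [ ! -e "$spool_dir/$target.COMPLETED" ]; then
            break
        fi
        n=$((n + 1))
    done
    # Copy into staging first so the spool directory never sees a partial file.
    cp "$src" "$staging_dir/$target" || return 1
    # Rename into place; the file appears in the spool directory all at once.
    mv "$staging_dir/$target" "$spool_dir/$target" || return 1
    printf '%s\n' "$target"
}
```

For example, `spool_put /var/log/app.log /export/dir /export/staging` would place a uniquely named copy of `app.log` in `/export/dir`; both `/var/log/app.log` and `/export/staging` are placeholder paths.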
Developing the Flume configuration file
Write the configuration file:

cd  /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf

mkdir -p /export/dir

vim spooldir.conf
# Name the components on this agent
a1.sources=r1
a1.channels=c1
a1.sinks=k1
# Describe/configure the source
## Note: never drop a file into the monitored directory under a name that has already been used
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/export/dir
a1.sources.r1.fileHeader = true
# Describe the sink
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://node01:8020/spooldir/
# Describe the channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
# Bind the source and sink to the channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
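
The spooling directory source needs the directory named in `spoolDir` to exist when the agent starts, so it can be worth checking the configuration against the filesystem first. The following is only a rough sketch; the function names `spooldir_of` and `check_spooldir` are made up for this example, and the parsing assumes the simple `key=value` layout used in the configuration above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Flume) for a pre-start sanity check.

# Print the value of the first non-commented *.spoolDir property in a config file.
spooldir_of() {
    sed -n 's/^[[:space:]]*[^#[:space:]][^=]*\.spoolDir[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Succeed only if the config names a spoolDir and that directory exists.
check_spooldir() {
    dir=$(spooldir_of "$1")
    [ -n "$dir" ] && [ -d "$dir" ]
}
```

A possible use before launching the agent: `check_spooldir conf/spooldir.conf || echo "spoolDir is missing"`.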

Start Flume

bin/flume-ng agent -c ./conf -f ./conf/spooldir.conf -n a1 -Dflume.root.logger=INFO,console

Upload files to the monitored directory
Copy different files into the directory below; note that file names must not repeat

cd /export/dir
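
After uploading, one way to see whether the agent has picked the files up is to count how many files in the directory already carry the .COMPLETED suffix. The sketch below assumes the default suffix; the function name `spool_status` is an illustrative choice, not something provided by Flume.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume): summarize a spool directory.
# usage: spool_status DIR
spool_status() {
    dir=$1
    done_count=0
    pending_count=0
    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        case $f in
            *.COMPLETED) done_count=$((done_count + 1)) ;;
            *) pending_count=$((pending_count + 1)) ;;
        esac
    done
    printf 'completed=%d pending=%d\n' "$done_count" "$pending_count"
}
```

Running `spool_status /export/dir` prints one summary line. On the HDFS side, `hdfs dfs -ls /spooldir/` lists what the sink has written under the path configured in `hdfs.path`.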

Full example

[root@node01 apache-flume-1.8.0-bin]# mkdir -p /export/install/dirfile
[root@node01 apache-flume-1.8.0-bin]# vi tmpconf/b1.conf
# Name the components on this agent
a1.sources=r1
a1.channels=c1
a1.sinks=k1
# Describe/configure the source
## Note: never drop a file into the monitored directory under a name that has already been used
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/export/install/dirfile
a1.sources.r1.fileHeader = true
# Describe the sink
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://node01:8020/spooldir/
# Describe the channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
# Bind the source and sink to the channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
