Case study: reading a local file into HDFS in real time
Requirement: monitor the Hive log in real time and upload new entries to HDFS.
Implementation steps:
1. To write data to HDFS, Flume must have the relevant Hadoop jars on its classpath.
Copy commons-configuration-1.6.jar, hadoop-auth-2.7.2.jar, hadoop-common-2.7.2.jar,
hadoop-hdfs-2.7.2.jar, commons-io-2.4.jar, and htrace-core-3.1.0-incubating.jar
into the /root/app/flume/lib directory; a sketch of the copy follows.
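A minimal sketch of the copy, assuming a stock Hadoop 2.7.2 binary install under /opt/module/hadoop-2.7.2 (the jar locations below match that layout; adjust the paths to your environment):

HADOOP_HOME=/opt/module/hadoop-2.7.2   # assumed install root
cp $HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar \
   $HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.7.2.jar \
   $HADOOP_HOME/share/hadoop/common/lib/commons-io-2.4.jar \
   $HADOOP_HOME/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar \
   $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.2.jar \
   $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.7.2.jar \
   /root/app/flume/lib/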
2. Create the file flume-file-hdfs.conf and add the following configuration:
touch flume-file-hdfs.conf
vim flume-file-hdfs.conf
# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2
# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/module/hive/logs/hive.log
a2.sources.r2.shell = /bin/bash -c
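# Note: tail -F (capital F) follows the log by name and retries after Hive's
# log4j rotates hive.log; a lowercase tail -f would stop at the first rotation.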
# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://hadoop102:9000/flume/%Y%m%d/%H
# Prefix for files uploaded to HDFS
a2.sinks.k2.hdfs.filePrefix = logs-
# Round down event timestamps when resolving the time escapes in hdfs.path
a2.sinks.k2.hdfs.round = true
# Number of time units per bucket directory
a2.sinks.k2.hdfs.roundValue = 1
# Time unit used for the rounding
a2.sinks.k2.hdfs.roundUnit = hour
# Use the agent's local time for the escapes instead of a timestamp event header
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# Number of events to flush to HDFS at a time (must not exceed the channel's transactionCapacity)
a2.sinks.k2.hdfs.batchSize = 100
# Output file type; DataStream writes plain text (compressed types are also supported)
a2.sinks.k2.hdfs.fileType = DataStream
# Roll to a new file after this many seconds
a2.sinks.k2.hdfs.rollInterval = 600
# Roll to a new file once it reaches this many bytes (just under a 128 MB block)
a2.sinks.k2.hdfs.rollSize = 134217700
# 0 disables rolling based on event count
a2.sinks.k2.hdfs.rollCount = 0
# Minimum block replicas; 1 keeps replication events from triggering premature rolls
a2.sinks.k2.hdfs.minBlockReplicas = 1
# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100
# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
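With these settings, events are bucketed by the agent's local clock into hourly directories under /flume. For example, a file opened at 13:05 on 2020-06-01 would appear as something like /flume/20200601/13/logs-.1591009500000.tmp (the numeric counter here is hypothetical); Flume drops the .tmp suffix when the file rolls, i.e. after 600 seconds or at roughly 128 MB, whichever comes first.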
3. Start the agent with this configuration:
bin/flume-ng agent --conf conf/ --name a2 \
--conf-file job/flume-file-hdfs.conf
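While testing, it can help to run the agent with console logging so you can watch events being written (the -Dflume.root.logger property is standard flume-ng usage):

bin/flume-ng agent --conf conf/ --name a2 \
--conf-file job/flume-file-hdfs.conf \
-Dflume.root.logger=INFO,console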
4. Start Hadoop (from the Hadoop directory) and Hive (from the Hive directory), then run Hive statements so that hive.log receives new entries:
sbin/start-dfs.sh
sbin/start-yarn.sh
bin/hive
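Any Hive activity will append to hive.log; for example (flume_test is a hypothetical table name):

hive> show databases;
hive> create table if not exists flume_test(id int);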
5. View the result on HDFS through the NameNode web UI (the host configured in hdfs.path above):
hadoop102:50070