Test environment
Aliyun student instance: 2 cores, 4 GB RAM, 1 Mbps bandwidth
Local VM: 2 cores, 6 GB RAM, 100 Mbps bandwidth
Data volume: 3.8M+ records
Source under test: spooldir
Channel under test: memory channel
Sink under test: hdfs sink
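Before the agent can be benchmarked, the spool directory needs files for the spooldir source to pick up. A minimal sketch of generating such test data (the directory path and the taobao-like record format are assumptions for illustration, not the original dataset); note the write-then-rename step, since the spooldir source requires files to be complete and immutable once they appear in the directory:

```python
import os
import tempfile

def make_spool_files(spool_dir, num_files=3, lines_per_file=1000):
    """Write line-oriented test files into the spool directory.

    Each file is written under a temporary name first and then renamed
    into place, so the spooldir source never sees a half-written file.
    """
    os.makedirs(spool_dir, exist_ok=True)
    for i in range(num_files):
        tmp_path = os.path.join(spool_dir, f"log-{i}.tmp")
        with open(tmp_path, "w") as f:
            for j in range(lines_per_file):
                # hypothetical record layout, one event per line
                f.write(f"user_{j},item_{j},view,{i}\n")
        os.rename(tmp_path, os.path.join(spool_dir, f"log-{i}.txt"))

# use a fresh directory for the sketch; the real test pointed
# spoolDir at /home/hadoop/Downloads/taobao
spool = tempfile.mkdtemp(prefix="taobao-spool-")
make_spool_files(spool)
print(len([n for n in os.listdir(spool) if n.endswith(".txt")]))
```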
Initial configuration file (Flume defaults)
test1.conf
a1.sources = source1
a1.channels = channel1
a1.sinks = sink1
#Define a memory channel called channel1 on a1
a1.channels.channel1.type = memory
a1.channels.channel1.capacity = 100
a1.channels.channel1.transactionCapacity = 100
# Define a spooldir source called source1 on a1 and tell it
a1.sources.source1.channels = channel1
a1.sources.source1.type = spooldir
a1.sources.source1.spoolDir = /home/hadoop/Downloads/taobao
a1.sources.source1.batchSize = 100
#Define an HDFS sink called sink1 on a1
a1.sinks.sink1.channel = channel1
a1.sinks.sink1.type = hdfs
#sink type is hdfs
a1.sinks.sink1.hdfs.path = hdfs://172.17.51.183:9000/from-WebServer/
#HDFS directory the sink writes received events into
a1.sinks.sink1.hdfs.filePrefix = log.
#prefix for the files written into HDFS
a1.sinks.sink1.hdfs.rollInterval = 30
#seconds before rolling to a new file; here a new file every 30 seconds
a1.sinks.sink1.hdfs.rollSize = 134200000
#roll once this many bytes have been written (~128 MB here); 0 disables size-based rolling
a1.sinks.sink1.hdfs.rollCount = 0
#roll based on the number of events written to the file; 0 disables count-based rolling
a1.sinks.sink1.hdfs.minBlockReplicas = 1
#minimum number of replicas per HDFS block; if unset, it comes from the default Hadoop configuration on the classpath
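A quick back-of-envelope check shows which roll trigger actually fires first on the 1 Mbps student instance: at that link speed a 30-second rollInterval window can only carry a few megabytes, far below the ~128 MB rollSize, so time-based rolling dominates there. A sketch of the arithmetic (link speed from the test environment, roll settings from test1.conf):

```python
# Which HDFS roll trigger fires first on the 1 Mbps Aliyun instance?
LINK_MBPS = 1                              # uplink from the test environment
BYTES_PER_SEC = LINK_MBPS * 1_000_000 / 8  # 125000 bytes/s
ROLL_INTERVAL_S = 30                       # hdfs.rollInterval in test1.conf
ROLL_SIZE_BYTES = 134_200_000              # hdfs.rollSize in test1.conf (~128 MB)

# bytes the link can deliver inside one rollInterval window
bytes_in_interval = BYTES_PER_SEC * ROLL_INTERVAL_S
print(bytes_in_interval)                   # ~3.75 MB per 30 s file

# time the size trigger would need at this link speed
print(ROLL_SIZE_BYTES / BYTES_PER_SEC)     # far longer than 30 s
```

On the 100 Mbps VM the same arithmetic shifts by two orders of magnitude, which is why the two machines stress different roll settings.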