Test environment
Aliyun student instance: 2 cores, 4 GB RAM, 1 Mbps bandwidth
Local VM: 2 cores, 6 GB RAM, 100 Mbps bandwidth
Data volume: 3.8M+ records
Source under test: spooldir
Channel under test: memory channel
Sink under test: hdfs sink
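Before the agent can be benchmarked, the spool directory needs files for the spooldir source to pick up. A minimal sketch of generating such test data (the directory path and the taobao-like record format are assumptions for illustration, not the original dataset); note the write-then-rename step, since the spooldir source requires files to be complete and immutable once they appear in the directory:

```python
import os
import tempfile

def make_spool_files(spool_dir, num_files=3, lines_per_file=1000):
    """Write line-oriented test files into the spool directory.

    Each file is written under a temporary name first and then renamed
    into place, so the spooldir source never sees a half-written file.
    """
    os.makedirs(spool_dir, exist_ok=True)
    for i in range(num_files):
        tmp_path = os.path.join(spool_dir, f"log-{i}.tmp")
        with open(tmp_path, "w") as f:
            for j in range(lines_per_file):
                # hypothetical record layout, one event per line
                f.write(f"user_{j},item_{j},view,{i}\n")
        os.rename(tmp_path, os.path.join(spool_dir, f"log-{i}.txt"))

# use a fresh directory for the sketch; the real test pointed
# spoolDir at /home/hadoop/Downloads/taobao
spool = tempfile.mkdtemp(prefix="taobao-spool-")
make_spool_files(spool)
print(len([n for n in os.listdir(spool) if n.endswith(".txt")]))
```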
Initial configuration file (Flume defaults)
test1.conf
a1.sources = source1
a1.channels = channel1
a1.sinks = sink1
#Define a memory channel called channel1 on a1
a1.channels.channel1.type = memory
a1.channels.channel1.capacity = 100
a1.channels.channel1.transactionCapacity = 100
# Define a spooldir source called source1 on a1 and tell it
a1.sources.source1.channels = channel1
a1.sources.source1.type = spooldir
a1.sources.source1.spoolDir = /home/hadoop/Downloads/taobao
a1.sources.source1.batchSize = 100
#Define an HDFS sink called sink1 on a1
a1.sinks.sink1.channel = channel1
a1.sinks.sink1.type = hdfs
#sink type is hdfs
a1.sinks.sink1.hdfs.path = hdfs://172.17.51.183:9000/from-WebServer/
#HDFS directory the sink writes received events into
a1.sinks.sink1.hdfs.filePrefix = log.
#prefix for the files written into HDFS
a1.sinks.sink1.hdfs.rollInterval = 30
#seconds before rolling to a new file; here a new file every 30 seconds
a1.sinks.sink1.hdfs.rollSize = 134200000
#roll once this many bytes have been written (~128 MB here); 0 disables size-based rolling
a1.sinks.sink1.hdfs.rollCount = 0
#roll based on the number of events written to the file; 0 disables count-based rolling
a1.sinks.sink1.hdfs.minBlockReplicas = 1
#minimum number of replicas per HDFS block; if unset, it comes from the default Hadoop configuration on the classpath
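A quick back-of-envelope check shows which roll trigger actually fires first on the 1 Mbps student instance: at that link speed a 30-second rollInterval window can only carry a few megabytes, far below the ~128 MB rollSize, so time-based rolling dominates there. A sketch of the arithmetic (link speed from the test environment, roll settings from test1.conf):

```python
# Which HDFS roll trigger fires first on the 1 Mbps Aliyun instance?
LINK_MBPS = 1                              # uplink from the test environment
BYTES_PER_SEC = LINK_MBPS * 1_000_000 / 8  # 125000 bytes/s
ROLL_INTERVAL_S = 30                       # hdfs.rollInterval in test1.conf
ROLL_SIZE_BYTES = 134_200_000              # hdfs.rollSize in test1.conf (~128 MB)

# bytes the link can deliver inside one rollInterval window
bytes_in_interval = BYTES_PER_SEC * ROLL_INTERVAL_S
print(bytes_in_interval)                   # ~3.75 MB per 30 s file

# time the size trigger would need at this link speed
print(ROLL_SIZE_BYTES / BYTES_PER_SEC)     # far longer than 30 s
```

On the 100 Mbps VM the same arithmetic shifts by two orders of magnitude, which is why the two machines stress different roll settings.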