采集数据到HDFS
- 安装flume
在虚拟机hdp-1中, 打开SFTP-hdp-1窗口,将fllume压缩包导入到虚拟机hdp-1的/root/目录中.
解压flume压缩包到/root/apps/下,命令:
tar -xvzf apache-flume-1.6.0-bin.tar.gz -C apps/
并将apache-flume-1.6.0-bin文件夹重命名为flume-1.6.0,
命令为 mv apache-flume-1.6.0-bin flume-1.6.0
2.配置flume
进入/root/ apps/flume-1.6.0/下新建文件dir-hdfs.conf和tail-hdfs.conf
dir-hdfs.conf文件内容
ag1.sources = source1
ag1.sinks = sink1
ag1.channels = channel1
ag1.sources.source1.type = spooldir
ag1.sources.source1.spoolDir = /root/log
ag1.sources.source1.fileSuffix=.FINISHED
ag1.sources.source1.deserializer.maxLineLength=5129
ag1.sinks.sink1.type = hdfs
ag1.sinks.sink1.hdfs.path =hdfs://hdp-1:9000/access_log/%y-%m-%d/%H-%M
ag1.sinks.sink1.hdfs.filePrefix = app_log
ag1.sinks.sink1.hdfs.fileSuffix = .log
ag1.sinks.sink1.hdfs.batchSize= 100
ag1.sinks.sink1.hdfs.fileType = DataStream
ag1.sinks.sink1.hdfs.writeFormat = Text
ag1.sinks.sink1.hdfs.rollSize = 512000
ag1.sinks.sink1.hdfs.rollCount = 1000000
ag1.sinks.sink1.hdfs.rollInterval = 60
ag1.sinks.sink1.hdfs.round = true
ag1.sinks.sink1.hdfs.roundValue = 10
ag1.sinks.sink1.hdfs.roundUnit = minute
ag1.sinks.sink1.hdfs.useLocalTimeStamp = true
ag1.channels.channel1.type = memory
ag1.channels.channel1.capacity = 500000
ag1.channels.channel1.transactionCapacity = 600
ag1.sources.source1.channels = channel1
ag1.sinks.sink1.channel = channel1
tail-hdfs.conf文件内容
ag1.sources = source1
ag1.sinks = sink1
ag1.channels = channel1
ag1.sources.source1.type = exec
ag1.sources.source1.command = tail -F /root/log/access.log
ag1.sinks.sink1.type = hdfs
ag1.sinks.sink1.hdfs.path =hdfs://hdp-1:9000/access_log/%y-%m-%d/%H-%M
ag1.sinks.sink1.hdfs.filePrefix = app_log
ag1.sinks.sink1.hdfs.fileSuffix = .log
ag1.sinks.sink1.hdfs.batchSize= 100
ag1.sinks.sink1.hdfs.fileType = DataStream
ag1.sinks.sink1.hdfs.writeFormat = Text
ag1.sinks.sink1.hdfs.rollSize = 512000
ag1.sinks.sink1.hdfs.rollCount = 1000000
ag1.sinks.sink1.hdfs.rollInterval = 60
ag1.sinks.sink1.hdfs.round = true
ag1.sinks.sink1.hdfs.roundValue = 10
ag1.sinks.sink1.hdfs.roundUnit = minute
ag1.sinks.sink1.hdfs.useLocalTimeStamp = true
ag1.channels.channel1.type = memory
ag1.channels.channel1.capac