Collecting Data into HDFS

1. Install Flume

On the virtual machine hdp-1, open the SFTP-hdp-1 window and upload the Flume archive to the /root/ directory of hdp-1.

Extract the Flume archive into /root/apps/ by running the following from /root/:

tar -xvzf apache-flume-1.6.0-bin.tar.gz -C apps/

Then rename the apache-flume-1.6.0-bin folder to flume-1.6.0. Run this command from inside /root/apps/:

mv apache-flume-1.6.0-bin flume-1.6.0
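
The extract and rename steps above can be wrapped in a small shell function so they are repeatable. This is only a sketch: the function name install_flume and its two arguments are made up for illustration, and it assumes the archive unpacks into a top-level apache-flume-1.6.0-bin folder, as the commands above imply.

```shell
# install_flume TARBALL APPS_DIR   (hypothetical helper, not part of Flume)
# Unpacks the Flume archive into APPS_DIR and renames the extracted
# folder to flume-1.6.0, mirroring the tar and mv commands above.
install_flume() {
  local tarball="$1"
  local apps_dir="$2"
  local extracted="$apps_dir/apache-flume-1.6.0-bin"
  local target="$apps_dir/flume-1.6.0"

  # Refuse to overwrite an existing installation.
  if [ -e "$target" ]; then
    echo "already installed: $target" >&2
    return 1
  fi

  mkdir -p "$apps_dir" || return 1                 # tar -C needs the directory to exist
  tar -xzf "$tarball" -C "$apps_dir" || return 1   # same as the tar step, minus -v
  mv "$extracted" "$target"                        # same as the mv step
}
```

With the paths used in this walkthrough, the call would be: install_flume /root/apache-flume-1.6.0-bin.tar.gz /root/apps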

2. Configure Flume

Go to /root/apps/flume-1.6.0/ and create two new files, dir-hdfs.conf and tail-hdfs.conf.

Contents of dir-hdfs.conf:

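# Name the components of agent ag1
ag1.sources = source1
ag1.sinks = sink1
ag1.channels = channel1

# Source: pick up files dropped into a spooling directory
ag1.sources.source1.type = spooldir
ag1.sources.source1.spoolDir = /root/log
# Suffix appended to a file once it has been fully ingested
ag1.sources.source1.fileSuffix = .FINISHED
# Maximum number of characters per event
ag1.sources.source1.deserializer.maxLineLength = 5129

# Sink: write events to HDFS, one directory per 10-minute window
ag1.sinks.sink1.type = hdfs
ag1.sinks.sink1.hdfs.path = hdfs://hdp-1:9000/access_log/%y-%m-%d/%H-%M
ag1.sinks.sink1.hdfs.filePrefix = app_log
ag1.sinks.sink1.hdfs.fileSuffix = .log
# Number of events written before flushing to HDFS
ag1.sinks.sink1.hdfs.batchSize = 100
# Write plain text instead of the default SequenceFile
ag1.sinks.sink1.hdfs.fileType = DataStream
ag1.sinks.sink1.hdfs.writeFormat = Text
# Roll to a new file at 512000 bytes, 1000000 events, or 60 seconds, whichever comes first
ag1.sinks.sink1.hdfs.rollSize = 512000
ag1.sinks.sink1.hdfs.rollCount = 1000000
ag1.sinks.sink1.hdfs.rollInterval = 60
# Round the timestamp used in hdfs.path down to a multiple of 10 minutes
ag1.sinks.sink1.hdfs.round = true
ag1.sinks.sink1.hdfs.roundValue = 10
ag1.sinks.sink1.hdfs.roundUnit = minute
# Use the agent's local time rather than a timestamp header on the event
ag1.sinks.sink1.hdfs.useLocalTimeStamp = true

# Channel: buffer events in memory between source and sink
ag1.channels.channel1.type = memory
ag1.channels.channel1.capacity = 500000
ag1.channels.channel1.transactionCapacity = 600

# Wire the source and sink to the channel
ag1.sources.source1.channels = channel1
ag1.sinks.sink1.channel = channel1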
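
To see what the round, roundValue, and roundUnit settings do to the %y-%m-%d/%H-%M escapes in hdfs.path, the following shell sketch reproduces the arithmetic outside Flume. The function name bucket_path is made up for illustration, the code assumes GNU date (for the -d @EPOCH form), and it formats in whatever time zone the shell is using. It approximates the rounding behaviour and is not Flume's own code.

```shell
# bucket_path EPOCH_SECONDS   (hypothetical helper for illustration)
# Rounds the timestamp down to a 10-minute boundary, as
# round = true, roundValue = 10, roundUnit = minute would,
# then formats it like the directory part of hdfs.path.
bucket_path() {
  local epoch="$1"
  local round_secs=$((10 * 60))                     # roundValue * 60 seconds
  local rounded=$(( epoch - epoch % round_secs ))   # round down, never up
  date -d "@${rounded}" '+/access_log/%y-%m-%d/%H-%M'   # GNU date assumed
}
```

For example, an event stamped at 22:13:20 lands in the .../22-10 directory, so every event from 22:10:00 up to 22:19:59 shares one directory, and a new directory starts at 22:20:00.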

Contents of tail-hdfs.conf:

# Name the components of agent ag1
ag1.sources = source1
ag1.sinks = sink1
ag1.channels = channel1

# Source: run tail -F and turn each new line of the log into an event
ag1.sources.source1.type = exec
ag1.sources.source1.command = tail -F /root/log/access.log

# Sink: same HDFS settings as in dir-hdfs.conf
ag1.sinks.sink1.type = hdfs
ag1.sinks.sink1.hdfs.path = hdfs://hdp-1:9000/access_log/%y-%m-%d/%H-%M
ag1.sinks.sink1.hdfs.filePrefix = app_log
ag1.sinks.sink1.hdfs.fileSuffix = .log
ag1.sinks.sink1.hdfs.batchSize = 100
ag1.sinks.sink1.hdfs.fileType = DataStream
ag1.sinks.sink1.hdfs.writeFormat = Text
ag1.sinks.sink1.hdfs.rollSize = 512000
ag1.sinks.sink1.hdfs.rollCount = 1000000
ag1.sinks.sink1.hdfs.rollInterval = 60
ag1.sinks.sink1.hdfs.round = true
ag1.sinks.sink1.hdfs.roundValue = 10
ag1.sinks.sink1.hdfs.roundUnit = minute
ag1.sinks.sink1.hdfs.useLocalTimeStamp = true

# Channel: buffer events in memory between source and sink
ag1.channels.channel1.type = memory
ag1.channels.channel1.capac