Flume Installation and Deployment

-》Versions
-》flume-ng: the line in current use, versions 1.5-1.7
-》flume-og: the legacy line, now deprecated
-》Installation and testing
-》Download and extract
tar -zxvf flume-ng-1.6.0-cdh5.7.6.tar.gz -C /opt/cdh-5.7.6/
-》Edit the configuration
In flume-env.sh: export JAVA_HOME=/opt/modules/jdk1.8.0_91
-》Point the agent at HDFS, in any of three ways (a sketch of the env-file approach follows this list)
-》Declare the HADOOP_HOME environment variable, either globally or in the env file
-》Copy core-site.xml and hdfs-site.xml into Flume's conf directory (the most common approach)
cp ../hadoop-2.6.0-cdh5.7.6/etc/hadoop/core-site.xml ../hadoop-2.6.0-cdh5.7.6/etc/hadoop/hdfs-site.xml conf/
-》Or state the absolute HDFS path directly at the point of use
hdfs://hostname:8020/input/
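
A minimal sketch of the env-file approach, assuming Hadoop is installed at /opt/cdh5.7.6/hadoop-2.6.0-cdh5.7.6 (adjust both paths to your actual layout):

cp conf/flume-env.sh.template conf/flume-env.sh
# then in conf/flume-env.sh:
export JAVA_HOME=/opt/modules/jdk1.8.0_91
export HADOOP_HOME=/opt/cdh5.7.6/hadoop-2.6.0-cdh5.7.6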
-》Run a test
bin/flume-ng agent --conf flume_conf_dir --name agent_name --conf-file run_file
The hive-mem-log.properties file:

# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'a1'


#define the agent
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# define source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh5.7.6/hive-1.1.0/logs/hive.log 
a1.sources.s1.shell = /bin/sh -c

# define channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# define sink
a1.sinks.k1.type = logger

# bind source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
Run the agent:

bin/flume-ng agent --conf /opt/cdh5.7.6/flume-1.6.0-cdh5.7.6-bin/conf --name a1 --conf-file /opt/cdh5.7.6/flume-1.6.0-cdh5.7.6-bin/case/hive-mem-log.properties -Dflume.root.logger=INFO,console

-》This agent reads Hive's log file and delivers the collected data to the logger sink.

   -Dflume.root.logger=INFO,console routes Flume's own log output to the console, so each collected event is printed as it arrives.

Case 1: read the Hive log file and collect it into HDFS
-》source: exec source
-》channel:
mem: fast but less safe; suited to small data volumes where throughput matters most
file: slower but safer; suited to large data volumes where throughput matters less
file channel: stores the data received from the source as files on local disk
Create the checkpoint and data directories (see the commands after this list):
/opt/datas/flume/filechannel/check
/opt/datas/flume/filechannel/data
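
The file channel's local checkpoint and data directories can be created up front:

mkdir -p /opt/datas/flume/filechannel/check
mkdir -p /opt/datas/flume/filechannel/data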
By contrast, the HDFS output directory need not exist in advance: a1.sinks.k1.hdfs.path = /flume/hdfs/ is created automatically by the sink if missing.

# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'a1'


#define the agent
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# define source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh5.7.6/hive-1.1.0/logs/hive.log 
a1.sources.s1.shell = /bin/sh -c

# define channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/datas/flume/filechannel/check
a1.channels.c1.dataDirs = /opt/datas/flume/filechannel/data

# define sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/hdfs/ 
a1.sinks.k1.hdfs.fileType = DataStream 
a1.sinks.k1.hdfs.writeFormat = Text

# bind source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
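
Start the agent the same way as before (hive-file-hdfs.properties is a placeholder name for wherever this configuration was saved), then list the output in HDFS:

bin/flume-ng agent --conf /opt/cdh5.7.6/flume-1.6.0-cdh5.7.6-bin/conf --name a1 --conf-file /opt/cdh5.7.6/flume-1.6.0-cdh5.7.6-bin/case/hive-file-hdfs.properties

hdfs dfs -ls /flume/hdfs/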


-》sink: hdfs
-》The files produced are all around 1 KB, which is inefficient for MapReduce to process
We make each file roughly 10 times larger by rolling on size alone: rollSize = 10240 (bytes, i.e. 10 KB), with rollInterval = 0 and rollCount = 0 so that neither elapsed time nor event count triggers a roll

The hive-mem-size.properties file:

# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'a1'


#define the agent
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# define source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh5.7.6/hive-1.1.0/logs/hive.log 
a1.sources.s1.shell = /bin/sh -c

# define channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/datas/flume/filechannel/check
a1.channels.c1.dataDirs = /opt/datas/flume/filechannel/data

# define sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/size
a1.sinks.k1.hdfs.fileType = DataStream 
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 10240
a1.sinks.k1.hdfs.rollCount = 0

# bind source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
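
Run the agent with this file, then check the output; each closed file should come in near the 10 KB roll size:

bin/flume-ng agent --conf /opt/cdh5.7.6/flume-1.6.0-cdh5.7.6-bin/conf --name a1 --conf-file /opt/cdh5.7.6/flume-1.6.0-cdh5.7.6-bin/case/hive-mem-size.properties

hdfs dfs -ls /flume/size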

Case 2: how to monitor a directory dynamically
logs/2018-01-01.log
2018-01-02.log.tmp (still being written) → renamed to 2018-01-02.log when complete
……
spooling dir: a source that watches a directory and reads every file placed into it (see the sketch below)
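
The configuration that follows still uses the exec source. A spooling-directory source would swap in a block like this sketch, where /opt/datas/flume/spool is an assumed example path; spooldir expects files to be complete when they arrive, so in-progress .tmp files are excluded:

# hypothetical spooldir source, in place of the exec source below
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /opt/datas/flume/spool
a1.sources.s1.ignorePattern = ^.*\.tmp$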

# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'a1'


#define the agent
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# define source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /opt/cdh5.7.6/hive-1.1.0/logs/hive.log
a1.sources.s1.shell = /bin/sh -c

# define channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# define sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/part/yearstr=%Y/monthstr=%m/daystr=%d
a1.sinks.k1.hdfs.fileType = DataStream 
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 10240
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# bind source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1
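
The %Y/%m/%d escape sequences in hdfs.path are filled in from each event's timestamp header; useLocalTimeStamp = true supplies that header from the agent's local clock. The resulting Hive-style partition directories can be listed with:

hdfs dfs -ls -R /flume/part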
