Installing the Flume Log Collection Component
a) Installation
1). Upload the Flume tarball to the /tools directory
2). Extract it:
tar -zvxf apache-flume-1.7.0-bin.tar.gz -C /training/
3). Set the environment variables:
export FLUME_HOME=/training/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin
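These exports typically go in /etc/profile or ~/.bash_profile (the original does not say which); after editing, reload the file and verify:
source /etc/profile
echo $FLUME_HOME      # should print /training/apache-flume-1.7.0-bin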
4). Copy the following dependency jars from the hadoop-2.7.3 installation directory into /training/apache-flume-1.7.0-bin/lib (a copy sketch follows the list):
share/hadoop/common/hadoop-common-2.7.3.jar
share/hadoop/common/lib/commons-configuration-1.6.jar
share/hadoop/common/lib/hadoop-auth-2.7.3.jar
share/hadoop/hdfs/hadoop-hdfs-2.7.3.jar
share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar
share/hadoop/common/lib/commons-io-2.4.jar
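A minimal copy sketch for the list above, assuming Hadoop is installed at /training/hadoop-2.7.3 (an assumption; adjust the path to your layout):
cd /training/hadoop-2.7.3      # assumed Hadoop install path
cp share/hadoop/common/hadoop-common-2.7.3.jar \
   share/hadoop/common/lib/commons-configuration-1.6.jar \
   share/hadoop/common/lib/hadoop-auth-2.7.3.jar \
   share/hadoop/hdfs/hadoop-hdfs-2.7.3.jar \
   share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar \
   share/hadoop/common/lib/commons-io-2.4.jar \
   /training/apache-flume-1.7.0-bin/lib/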
5). Verify the installation:
bin/flume-ng version
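If the installation and PATH are correct, the first line of output should be the version banner, e.g.:
Flume 1.7.0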
b) Configure the Flume HDFS Sink:
Create a new file named flume-hdfs.conf under /training/apache-flume-1.7.0-bin/conf/
and add the following content:
# define the agent
a1.sources=r1
a1.channels=c1
a1.sinks=k1
# define the source
# spooling-directory source: watches a directory for new files
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/training/nginx/logs/flumeLogs
# suffix appended to a file once Flume has finished ingesting it
a1.sources.r1.fileSuffix=.FINISHED
# maximum length of one event line; 4096 bytes = 4 KB
a1.sources.r1.deserializer.maxLineLength=4096
# define the sink
a1.sinks.k1.type = hdfs
# collected files are stored under /flumeLogs in HDFS
a1.sinks.k1.hdfs.path = hdfs://niit110:9000/flumeLogs/%y-%m-%d/%H/%M/%S
a1.sinks.k1.hdfs.filePrefix=access_log
a1.sinks.k1.hdfs.fileSuffix=.log
a1.sinks.k1.hdfs.batchSize=1000
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat= Text
# roll rules: in production, roll by file size near the 128 MB HDFS block size
# and set the other roll triggers to 0 so only size-based rolling applies;
# for this demo, roll at about 500 KB
a1.sinks.k1.hdfs.rollSize=512000
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollInterval=0
# directory rounding: usually one directory per day, week, or month; 10 seconds here for the demo
a1.sinks.k1.hdfs.round=true
a1.sinks.k1.hdfs.roundValue=10
a1.sinks.k1.hdfs.roundUnit= second
# use the local time for the escape sequences in hdfs.path
a1.sinks.k1.hdfs.useLocalTimeStamp=true
# define the channel
a1.channels.c1.type = memory
# maximum number of events the channel can hold
a1.channels.c1.capacity = 500000
# events per transaction; must be at least the sink's batchSize (1000 here)
a1.channels.c1.transactionCapacity = 1000
# bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Note: (*) the flumeLogs directory must first be created under /training/nginx/logs/
      (*) the flumeLogs directory must also be created under the HDFS root /; both commands are shown below
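A minimal sketch of those two steps, assuming HDFS is running and the hdfs client is on the PATH:
mkdir -p /training/nginx/logs/flumeLogs
hdfs dfs -mkdir -p /flumeLogs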
c) Modify conf/flume-env.sh (this file does not exist by default; copy it from the shipped template)
Copy:
cp flume-env.sh.template flume-env.sh
Set JAVA_HOME (no spaces around = in shell assignments):
export JAVA_HOME=/training/jdk-1.8.0.171
Adjust the default memory settings (note: the young-generation size -Xmn must stay below the total heap -Xmx, so -Xmn2g cannot be combined with a 1024m heap):
export JAVA_OPTS="-Xms1024m -Xmx1024m -Xss256k -Xmn512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit"
d) Start Flume
(1) Test data: copy /training/nginx/logs/access.log to
/training/nginx/logs/flumeLogs/access_201904251200.log
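As a single command (paths as defined above):
cp /training/nginx/logs/access.log /training/nginx/logs/flumeLogs/access_201904251200.log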
(2) Start the agent
From the /training/apache-flume-1.7.0-bin directory, run:
bin/flume-ng agent --conf ./conf/ -f ./conf/flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,console
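The command above runs the agent in the foreground, logging to the console. If you need it to keep running after the terminal closes, one common pattern (an assumption, not part of the original steps) is to start it under nohup and send output to a file:
nohup bin/flume-ng agent --conf ./conf/ -f ./conf/flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,LOGFILE > flume-agent.out 2>&1 &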
(3) Open the Hadoop web console at http://niit110:50070, browse to
/flumeLogs, and check whether data has arrived
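The same check from the command line (assuming the hdfs client points at this cluster):
hdfs dfs -ls -R /flumeLogs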