1. Download and extract the installation package
If you don't have the installation package yet, see [Big Data Technology and Application Provincial Competition Study Notes 1] — Software Preparation.
Extract the package (I usually keep it in the Downloads directory):
[hadoop@master Downloads]$ sudo tar -zxf ./apache-flume-1.7.0-bin.tar.gz -C /software
[hadoop@master Downloads]$ sudo mv /software/apache-flume-1.7.0-bin /software/flume
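Because the archive was extracted with sudo, the files end up owned by root. Assuming you will run Flume as the hadoop user (as the prompts here suggest) and a matching hadoop group exists, it helps to hand ownership over now so Flume can later write its position file and logs without sudo:
[hadoop@master Downloads]$ sudo chown -R hadoop:hadoop /software/flume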
2. Configuration files
1. Global environment variables (/etc/profile)
[hadoop@master Downloads]$ sudo vim /etc/profile
Open the file and append the following lines:
export FLUME_HOME=/software/flume
export PATH=${FLUME_HOME}/bin:$PATH
When done, press Esc and type :wq to save and exit, then reload the file so the variables take effect:
[hadoop@master Downloads]$ source /etc/profile
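To confirm the PATH change took effect, you can query the Flume version from any directory; it should report version 1.7.0:
[hadoop@master Downloads]$ flume-ng version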
2. flume-env.sh (in /software/flume/conf)
[hadoop@master conf]$ cp ./flume-env.sh.template ./flume-env.sh
[hadoop@master conf]$ vim flume-env.sh
# Just set your JAVA and HADOOP environments here (JAVA_HOME, HADOOP_HOME)
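For reference, a minimal flume-env.sh might contain only the two exports below; the paths are placeholders and must be adjusted to wherever the JDK and Hadoop are actually installed on your machines:
export JAVA_HOME=/software/jdk        # hypothetical JDK path, adjust to yours
export HADOOP_HOME=/software/hadoop   # hypothetical Hadoop path, adjust to yours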
3. master.conf (create this new file in the same conf directory)
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=TAILDIR
a1.sources.r1.positionFile=/software/flume/flume.json
a1.sources.r1.filegroups=f1 f2
a1.sources.r1.filegroups.f1=/software/flume/.*file.*
a1.sources.r1.filegroups.f2=/software/flume/logs/.*log.*
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.bootstrap.servers = master:9092,hadoop1:9092,hadoop2:9092,hadoop3:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
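Before starting the agent, make sure the topic mytopic exists on the Kafka cluster (or that automatic topic creation is enabled). One possible way to create it, assuming Kafka's bin directory is on the PATH and ZooKeeper runs on master:2181, is:
[hadoop@master ~]$ kafka-topics.sh --create --zookeeper master:2181 --replication-factor 1 --partitions 1 --topic mytopic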
3. Start Flume
[hadoop@master ~]$ cd /software/flume/bin
[hadoop@master bin]$ ./flume-ng agent -n a1 -c ../conf -f ../conf/master.conf -Dflume.root.logger=DEBUG,console &
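To verify that events actually reach Kafka, you can attach a console consumer to the topic in another terminal (assuming Kafka's scripts are on the PATH; older Kafka versions may need --zookeeper master:2181 instead of --bootstrap-server):
[hadoop@master ~]$ kafka-console-consumer.sh --bootstrap-server master:9092 --topic mytopic --from-beginning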
Once the agent starts successfully, the Flume installation needs to be distributed to the other collection nodes, and a logs directory must be created on each of them, otherwise Flume will report an error.
After distributing and configuring, start Flume on the other nodes as well.
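A possible way to distribute the installation and pre-create the logs directory, using one of the hostnames from the bootstrap.servers list above (assuming the hadoop user can write to /software on the target node; repeat for each node, and remember to replicate the /etc/profile changes there too):
[hadoop@master ~]$ scp -r /software/flume hadoop@hadoop1:/software/
[hadoop@master ~]$ ssh hadoop@hadoop1 "mkdir -p /software/flume/logs"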
Choosing a log collection method
In general, Flume offers two common source types for collecting logs:
1. Exec source
a1.sources.r1.type=exec
a1.sources.r1.command=ping <IP address>
2. Spooling Directory source
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=<directory containing the log files>
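To try either source type quickly without involving Kafka, a minimal self-contained test agent could look like the sketch below (hypothetical file test.conf; it tails an assumed /software/flume/logs/test.log and prints events with the logger sink):
a2.sources=r2
a2.channels=c2
a2.sinks=k2
a2.sources.r2.type=exec
a2.sources.r2.command=tail -F /software/flume/logs/test.log
a2.channels.c2.type=memory
a2.channels.c2.capacity=1000
a2.channels.c2.transactionCapacity=100
a2.sinks.k2.type=logger
a2.sources.r2.channels=c2
a2.sinks.k2.channel=c2
Start it the same way as before, only with -n a2 -f ../conf/test.conf, and each new line appended to the file should show up on the console.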