Overall workflow
Flume configuration
Flume download: https://pan.baidu.com/s/1slNuhad (extraction code: q81v)
1. Set up the Flume path (environment variables)
$ cd /xx/flume (go into the directory containing Flume)
$ vi .bash_profile (the file is created automatically if it does not exist)
export FLUME_HOME=/xx(path to the directory containing Flume)/apache-flume-1.6.0-cdh5.8.4-bin
export PATH=$PATH:$FLUME_HOME/bin
$ source .bash_profile
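Before touching the real profile, the two exports can be tried in a throwaway shell to confirm they do what you expect. A minimal sketch — the /xx prefix is the same placeholder used above, not a real location:

```shell
# Sketch: apply the exports from .bash_profile in the current shell and confirm
# (the /xx prefix is a placeholder for your actual Flume parent directory)
export FLUME_HOME=/xx/apache-flume-1.6.0-cdh5.8.4-bin
export PATH=$PATH:$FLUME_HOME/bin
echo "$FLUME_HOME"   # with a real path, flume-ng would now be on PATH
```

After `source .bash_profile`, running `flume-ng version` is a quick way to check that the PATH entry took effect.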
2. Configure flume-env.sh
Set the Java path:
$ cd /xx/apache-flume-1.6.0-cdh5.8.4-bin/conf
$ vi flume-env.sh
export JAVA_HOME=/xx/jdk1.8.0_121.jdk/
export FLUME_HOME=/xx/apache-flume-1.6.0-cdh5.8.4-bin
Two ways to configure the JDK:
1. Point to a JDK already installed in the environment.
2. Upload a fresh JDK, unpack it, and point to that path directly.
(JDK setup guides are easy to find online; I will also write one up in a separate blog post.)
3. Configure the xx.conf file
tier1.sources = source_ETE_SERV_SSPS
tier1.channels = channel_ETE_SERV_SSPS_kafka
tier1.sinks = sink_ETE_SERV_SSPS_kafka
#ETE_SERV_SSPS
tier1.sources.source_ETE_SERV_SSPS.type = TAILDIR
tier1.sources.source_ETE_SERV_SSPS.positionFile = position/taildir_position_ETE_SERV_SSPS.json
tier1.sources.source_ETE_SERV_SSPS.filegroups = f1
# path of the target log files to monitor (the filename part is a regex)
tier1.sources.source_ETE_SERV_SSPS.filegroups.f1 = /oss/ztracer/.*info*.*log
tier1.sources.source_ETE_SERV_SSPS.idleTimeout = 8000
# the channel's transactionCapacity should be at least as large as batchSize
tier1.sources.source_ETE_SERV_SSPS.batchSize = 2000
tier1.sources.source_ETE_SERV_SSPS.channels = channel_ETE_SERV_SSPS_kafka
tier1.channels.channel_ETE_SERV_SSPS_kafka.type = memory
tier1.channels.channel_ETE_SERV_SSPS_kafka.capacity = 100000
tier1.channels.channel_ETE_SERV_SSPS_kafka.transactionCapacity = 2000
tier1.sinks.sink_ETE_SERV_SSPS_kafka.type = org.apache.flume.sink.kafka.KafkaSink
tier1.sinks.sink_ETE_SERV_SSPS_kafka.channel = channel_ETE_SERV_SSPS_kafka
# kafka topic (the monitored log lines are published under this topic)
tier1.sinks.sink_ETE_SERV_SSPS_kafka.topic = ETE_SERV_SSPS
# kafka connection info; multiple brokers can be configured
tier1.sinks.sink_ETE_SERV_SSPS_kafka.brokerList = kafka01:9092,kafka01:9093,kafka01:9094,kafka02:9092,kafka02:9093,kafka02:9094
tier1.sinks.sink_ETE_SERV_SSPS_kafka.batchSize = 1000
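Flume requires the channel's transactionCapacity to be at least as large as the source batchSize, or puts will fail at runtime. A quick mechanical check of that rule using the values from the config above (the /tmp/xx.conf path here is just a scratch file for the sketch):

```shell
# Sketch: verify transactionCapacity >= batchSize using the values configured above
# (/tmp/xx.conf is a throwaway file, not the real agent config)
cat > /tmp/xx.conf <<'EOF'
tier1.sources.source_ETE_SERV_SSPS.batchSize = 2000
tier1.channels.channel_ETE_SERV_SSPS_kafka.transactionCapacity = 2000
EOF
BATCH=$(awk -F' *= *' '/batchSize/ {print $2}' /tmp/xx.conf)
TXN=$(awk -F' *= *' '/transactionCapacity/ {print $2}' /tmp/xx.conf)
if [ "$BATCH" -le "$TXN" ]; then
  echo "ok: batchSize ($BATCH) <= transactionCapacity ($TXN)"
else
  echo "bad: batchSize exceeds transactionCapacity"
fi
```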
4. Start Flume
./flume-ng agent -n tier1 -c /IBM/flume/apache-flume-1.6.0-cdh5.8.4-bin/conf -f /IBM/flume/apache-flume-1.6.0-cdh5.8.4-bin/conf/xx.conf -Dflume.root.logger=INFO,console
(My install happens to live under this directory; adjust the paths to yours and start it.)
With that, Kafka can receive the logs we ship.
Append to a monitored log file, and Kafka picks up the incremental lines in real time.
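The incremental pickup is easy to exercise: the TAILDIR source resumes from the offset recorded in its position file, so every line appended to a file matching the monitored pattern becomes a new Kafka message. A local sketch of such appends — a temp directory stands in for the real /oss/ztracer path, and the file name is a made-up one matching the pattern:

```shell
# Sketch: simulate incremental writes to a file whose name matches the
# monitored pattern (temp directory stands in for /oss/ztracer)
LOGDIR=$(mktemp -d)
echo "first event"  >> "$LOGDIR/app.info.2017.log"
echo "second event" >> "$LOGDIR/app.info.2017.log"   # appended later -> shipped as a new message
cat "$LOGDIR/app.info.2017.log"
```

On the receiving side, watching the topic with a console consumer (e.g. `kafka-console-consumer` against one of the brokers configured above) shows the lines arriving as they are appended.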
All done!