1. Flume NG installation
Download from http://archive.apache.org/dist/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
Unpack the Flume archive with the following command:
- tar -zxvf apache-flume-1.7.0-bin.tar.gz
Configure the environment variables:
- export FLUME_HOME=/home/hadoop/cloud/programs/flume
- export PATH=$PATH:$FLUME_HOME/bin
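To make these variables persist across sessions, they would typically be appended to `~/.bashrc` and re-sourced. A quick sanity check after setting them (the paths are the ones assumed in the install step above; adjust to your layout):

```shell
# Paths assumed from the install step above (adjust to your layout).
export FLUME_HOME=/home/hadoop/cloud/programs/flume
export PATH=$PATH:$FLUME_HOME/bin

# With the tarball unpacked at $FLUME_HOME, `flume-ng version`
# should now print the Flume 1.7.0 banner.
echo "$FLUME_HOME/bin"
```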
2. Flume NG configuration
Before we start, a convention: the side that produces logs is called the client, and the side that stores them is called the server.
Both client and server agents consist of the same three parts: source, channel, and sink.
The only difference is that the client's sink connects to the server's source. That is the overall picture.
The purpose of the configuration file is to define one pipeline. A pipeline is called an agent, and each agent needs three components configured: a source, a channel, and a sink. The detailed options for every component type are documented on the Flume NG website. The example below ships files from a directory to HDFS, using a file channel as the intermediate buffer:
- #agent1 name
- agent1.sources=source1
- agent1.sinks=sink1
- agent1.channels=channel1
- #Spooling Directory
- #set source1
- agent1.sources.source1.type=spooldir
- agent1.sources.source1.spoolDir=/home/hadoop/flumetest/dir/logdfs
- agent1.sources.source1.channels=channel1
- agent1.sources.source1.fileHeader = false
- agent1.sources.source1.interceptors = i1
- agent1.sources.source1.interceptors.i1.type = timestamp
- #set sink1
- agent1.sinks.sink1.type=hdfs
- agent1.sinks.sink1.hdfs.path=hdfs://hadoopmaster:8020/flume/logdfs
- agent1.sinks.sink1.hdfs.fileType=DataStream
- agent1.sinks.sink1.hdfs.writeFormat=TEXT
- agent1.sinks.sink1.hdfs.rollInterval=1
- agent1.sinks.sink1.channel=channel1
- agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
- agent1.sinks.sink1.hdfs.fileSuffix=.txt
- #set channel1
- agent1.channels.channel1.type=file
- agent1.channels.channel1.checkpointDir=/home/hadoop/flumetest/dir/logdfstmp/point
- agent1.channels.channel1.dataDirs=/home/hadoop/flumetest/dir/logdfstmp
Start the agent:
- flume-ng agent --conf conf --conf-file /home/hadoop/cloud/programs/flume/conf/flume-hdfs.conf --name agent1 -Dflume.root.logger=INFO,console > /home/hadoop/cloud/programs/flume/logs/flume-hdfs.log 2>&1 &
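With the agent running, the spooling-directory source can be exercised by dropping a completed file into spoolDir. A sketch of that smoke test (`/tmp` stands in for the real spoolDir path, and the rename described in the comments is performed by Flume itself, not by this script):

```shell
# Simulate a log producer: write a finished file into the spooling
# directory (/tmp used here as a stand-in for the configured
# /home/hadoop/flumetest/dir/logdfs). Files must be complete and
# immutable once placed here -- the spooldir source rejects files
# that are still being written to.
SPOOL_DIR=/tmp/flumetest/dir/logdfs
mkdir -p "$SPOOL_DIR"
echo "2017-01-01 10:00:00 INFO test event" > "$SPOOL_DIR/app.log"

# Once a running agent has fully ingested the file, the spooldir
# source renames it to app.log.COMPLETED, and the events appear under
# hdfs://hadoopmaster:8020/flume/logdfs as %Y-%m-%d-prefixed .txt files.
ls "$SPOOL_DIR"
```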
Cluster configuration
Once the single-node setup is understood, the cluster setup follows naturally.
Continuing from the example above, the client-side pipeline is:
exec source -> memory channel -> avro sinks k1/k2 (failing over between two downstream collectors)
Sources and channels are configured just as in the single-node case (this example uses an exec source and a memory channel); what is new on the sink side is the sink group:
- #agent1 name
- agent1.channels = c1
- agent1.sources = r1
- agent1.sinks = k1 k2
- #set group
- agent1.sinkgroups = g1
- #set channel
- agent1.channels.c1.type = memory
- agent1.channels.c1.capacity = 1000
- agent1.channels.c1.transactionCapacity = 100
- agent1.sources.r1.channels = c1
- agent1.sources.r1.type = exec
- agent1.sources.r1.command = tail -F /home/hadoop/flumetest/dir/logdfs/flumetest.log
- agent1.sources.r1.interceptors = i1 i2
- agent1.sources.r1.interceptors.i1.type = static
- agent1.sources.r1.interceptors.i1.key = Type
- agent1.sources.r1.interceptors.i1.value = LOGIN
- agent1.sources.r1.interceptors.i2.type = timestamp
- # set sink1
- agent1.sinks.k1.channel = c1
- agent1.sinks.k1.type = avro
- agent1.sinks.k1.hostname = hadoopmaster
- agent1.sinks.k1.port = 52020
- # set sink2
- agent1.sinks.k2.channel = c1
- agent1.sinks.k2.type = avro
- agent1.sinks.k2.hostname = hadoopslave1
- agent1.sinks.k2.port = 52020
- #set sink group
- agent1.sinkgroups.g1.sinks = k1 k2
- #set failover
- agent1.sinkgroups.g1.processor.type = failover
- agent1.sinkgroups.g1.processor.priority.k1 = 10
- agent1.sinkgroups.g1.processor.priority.k2 = 1
- agent1.sinkgroups.g1.processor.maxpenalty = 10000
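The two Avro sinks above assume a collector agent listening on port 52020 on each of hadoopmaster and hadoopslave1; the original text does not show that server side. A minimal sketch of what such a collector configuration could look like (the agent name `collector1` and the HDFS output path are assumptions, not from the original setup):

```properties
#collector agent on hadoopmaster / hadoopslave1 (sketch, names assumed)
collector1.sources = r1
collector1.channels = c1
collector1.sinks = k1
#avro source: receives events from the client's avro sinks
collector1.sources.r1.type = avro
collector1.sources.r1.bind = 0.0.0.0
collector1.sources.r1.port = 52020
collector1.sources.r1.channels = c1
#memory channel, sized like the client side
collector1.channels.c1.type = memory
collector1.channels.c1.capacity = 1000
collector1.channels.c1.transactionCapacity = 100
#hdfs sink, same style as the single-node example
collector1.sinks.k1.channel = c1
collector1.sinks.k1.type = hdfs
collector1.sinks.k1.hdfs.path = hdfs://hadoopmaster:8020/flume/cluster
collector1.sinks.k1.hdfs.fileType = DataStream
collector1.sinks.k1.hdfs.writeFormat = TEXT
```

If spreading load across both collectors is preferable to the priority-based failover shown above, Flume also provides a `load_balance` sink processor (with `round_robin` or `random` selection) that can be set as `processor.type` instead of `failover`.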