Flume is a highly available, highly reliable, distributed system from Cloudera for collecting, aggregating, and transporting large volumes of log data. Flume lets you plug custom data senders into a logging system to collect data, and it can also perform simple processing on the data and write it to a variety of (customizable) data receivers.
A Flume agent is made up of three components: source, channel, and sink.
source: the data source being listened on. Supported sources include console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog system, supporting both TCP and UDP modes), exec (command execution), and more.
channel: a transient buffer that holds the event-formatted data received from the source until it is consumed by the sink; it is the bridge between source and sink.
sink: consumes data (events) from the channel and delivers it to a destination such as HDFS or HBase; sinks can also be custom-built.
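Conceptually, the channel behaves like a bounded producer/consumer buffer sitting between the source and the sink. A minimal sketch in plain Java (this is an illustration only, not the Flume API):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Conceptual sketch: the channel is a bounded buffer between a source
// that produces events and a sink that drains them.
public class PipelineSketch {
    // The "channel": holds up to 1000 events, like capacity = 1000 in flume.conf
    static BlockingQueue<String> channel = new ArrayBlockingQueue<>(1000);

    static void source(String event) throws InterruptedException {
        channel.put(event);     // source side: buffer the incoming event
    }

    static String sink() throws InterruptedException {
        return channel.take();  // sink side: consume and deliver the event
    }

    public static void main(String[] args) throws InterruptedException {
        source("event-1");
        System.out.println(sink());
    }
}
```

If the buffer is full, `put` blocks the source; if it is empty, `take` blocks the sink — the same back-pressure idea Flume's channel capacity settings control.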
To use Flume, configure these three components in a configuration file.
Installing Flume: download the tarball, extract it, and configure the environment variables. Without the environment variables you must cd into Flume's bin directory every time you start it.
Breakdown of the Flume start command:
flume-ng agent -n a1 -c ../conf -f ../conf/example.file -Dflume.root.logger=DEBUG,console
Parameter description:
-n specifies the agent name (must match the agent name in the configuration file)
-c specifies the directory containing Flume's configuration files
-f specifies the configuration file to use
-Dflume.root.logger=DEBUG,console sets the log level and log target
In our project we start it with:
flume-ng.cmd agent -conf ../conf -conf-file ../conf/flume.conf -name a1 -property flume.root.logger=INFO,console
Flume configuration file flume.conf (master):
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# source
# Many source types exist: avro, thrift, exec, jms, spooldir, taildir, etc.
a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
# Listen on local port 60000 for events from another Flume agent; several can be configured
a1.sources.r1.port = 60000
# sink
#a1.sinks.k1.type = org.apache.spark.streaming.flume.sink.SparkSink
#a1.sinks.k1.hostname = localhost
#a1.sinks.k1.port = 9999
#a1.sinks.k1.type = logger
# MySinks
# Custom sink that writes the captured logs to a given location;
# the value must be the fully qualified class name of the custom sink
a1.sinks.k1.type = sink.SinksLog
a1.sinks.k1.fileName = F://123//3//
# channel
# In-memory channel; capacity is the buffer size in events
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Flume configuration file (slave):
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
# Listen on local port 4444 and capture whatever arrives on it
a1.sources.r1.port = 4444
# Interceptor that pre-filters the captured log lines
a1.sources.r1.interceptors = f1
a1.sources.r1.interceptors.f1.type = regex_filter
a1.sources.r1.interceptors.f1.regex = (:healthIndex@)?\\{sc.+?httpCode.+?loadspeed.+?depth.+?\\}
# excludeEvents = false keeps matching events (true would drop them)
a1.sources.r1.interceptors.f1.excludeEvents = false
# sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = localhost
# Forward the collected logs to port 60000 on the master
a1.sinks.k1.port = 60000
# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
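The regex_filter interceptor keeps an event when the pattern matches its body (since excludeEvents is false). A quick way to sanity-check the pattern in plain Java — the sample log line below is made up to illustrate the field layout the regex expects:

```java
import java.util.regex.Pattern;

// The pattern as Flume's regex_filter sees it after properties-file parsing
// turns "\\{" into "\{".
public class RegexFilterDemo {
    static final Pattern FILTER =
            Pattern.compile("(:healthIndex@)?\\{sc.+?httpCode.+?loadspeed.+?depth.+?\\}");

    // excludeEvents = false means a matching event passes through
    static boolean passes(String body) {
        return FILTER.matcher(body).find();
    }

    public static void main(String[] args) {
        // Hypothetical log line shaped like the fields the regex expects
        System.out.println(passes("{sc:1,httpCode:200,loadspeed:120,depth:3}"));
        System.out.println(passes("unrelated log line"));
    }
}
```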
The custom sink:
package sink;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import util.DateUtils;

import java.io.File;
import java.io.FileOutputStream;
import java.util.Date;

/**
 * Created by ding on 2018/5/4.
 */
public class SinksLog extends AbstractSink implements Configurable {
    private static final Logger logger = LoggerFactory.getLogger(SinksLog.class);
    private static final String PROP_KEY_ROOTPATH = "fileName";
    private String filePath;
    private String fileName;

    @Override
    public void configure(Context context) {
        // "fileName" in flume.conf is the root directory for the output files
        filePath = context.getString(PROP_KEY_ROOTPATH);
        fileName = currentFileName();
    }

    // The output file rolls every hour: <root>/sink_<yyyyMMdd><hour>.log
    private String currentFileName() {
        return filePath + "sink_" + DateUtils.formatShortDate(new Date())
                + DateUtils.getNowHour() + ".log";
    }

    @Override
    public Status process() throws EventDeliveryException {
        String newFileName = currentFileName();
        if (!fileName.equals(newFileName)) {
            fileName = newFileName;
        }
        Channel ch = getChannel();
        Transaction txn = ch.getTransaction();
        txn.begin();
        try {
            Event event = ch.take();
            if (event == null) {
                // Channel is empty: commit the empty transaction and ask Flume to back off
                txn.commit();
                return Status.BACKOFF;
            }
            logger.debug("Got event.");
            String res = new String(event.getBody()) + "\r\n";
            // Append the event body to the current log file
            try (FileOutputStream fos = new FileOutputStream(new File(fileName), true)) {
                fos.write(res.getBytes());
            }
            txn.commit();
            return Status.READY;
        } catch (Throwable th) {
            txn.rollback();
            if (th instanceof Error) {
                throw (Error) th;
            }
            throw new EventDeliveryException(th);
        } finally {
            txn.close();
        }
    }
}
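The sink imports util.DateUtils, which is not shown in the post. A hypothetical reconstruction, consistent with how it is used above (a yyyyMMdd date plus the current hour, so the file rolls hourly), might look like this — the format string is an assumption:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

// Hypothetical sketch of util.DateUtils; the real helper is not shown in the post.
public class DateUtils {
    // Assumed short-date format yyyyMMdd, e.g. 20180504
    public static String formatShortDate(Date d) {
        return new SimpleDateFormat("yyyyMMdd").format(d);
    }

    // Current hour of day (0-23), appended to the file name for hourly rolling
    public static String getNowHour() {
        return String.valueOf(Calendar.getInstance().get(Calendar.HOUR_OF_DAY));
    }
}
```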
Package the custom sink as a jar and put it in the master's Flume lib directory. For reference, the jar packaging command, e.g. jar cvf spring-objenesis-repack-2.6.jar *,
packages the files in the current directory into a jar.