Setting up a Flume log collection environment (single-machine version)

Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transporting massive amounts of log data. Flume lets you plug custom data senders into your logging system to collect data; it can also do simple processing on the data and write it to various (customizable) data receivers.

Flume is made up of three components: source, channel, and sink.

source: the data source Flume listens to. Built-in source types include console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog logging system, supporting both TCP and UDP), exec (command execution), and more.

channel: a transient storage container that buffers the event-format data received from the source until it is consumed by the sink; it is the bridge between source and sink.

sink: consumes the data (events) from the channel and delivers it to its destination, such as HDFS or HBase. Custom sinks are also supported.

When using Flume, these three components need to be wired together in a configuration file.
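As a minimal sketch (the agent name a1 and component names r1/c1/k1 are just placeholders), a Flume configuration declares the three components, configures each one, and then binds source and sink to the channel:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 4444

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1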

Installing Flume: download the tarball, extract it, and configure the environment variables. If you skip the environment variables, you have to switch into Flume's bin directory every time before starting Flume.
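A minimal sketch of the environment-variable setup on Linux, assuming Flume was extracted to /opt/apache-flume-1.8.0-bin (the path and version here are assumptions; adjust them to your own install):

export FLUME_HOME=/opt/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin

After this, flume-ng can be run from any directory. On Windows, add Flume's bin directory to the PATH environment variable instead.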

Breakdown of the Flume startup command:

flume-ng agent -n a1 -c ../conf -f ../conf/example.file -Dflume.root.logger=DEBUG,console

Parameter descriptions:

-n  specifies the agent name (must match the agent name used in the configuration file)

-c  specifies the directory that holds Flume's configuration files

-f  specifies the configuration file to use

-Dflume.root.logger=DEBUG,console  sets the log level and sends the log to the console

The command we use in our project:

flume-ng.cmd agent -conf ../conf -conf-file ../conf/flume.conf -name a1 -property flume.root.logger=INFO,console

Flume configuration file flume.conf (master):

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
 
# source
# Many source types are available, e.g. Avro, Thrift, Exec, JMS, Spooling Directory, Taildir
a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
# Listen on local port 60000 for the events sent by the other Flume agent; more than one source can be configured
a1.sources.r1.port = 60000

 
# sink
#a1.sinks.k1.type = org.apache.spark.streaming.flume.sink.SparkSink
#a1.sinks.k1.hostname = localhost
#a1.sinks.k1.port = 9999
a1.sinks.k1.channel = c1
#a1.sinks.k1.type = logger

# MySinks
# Custom sink that writes the collected log to the given location;
# the type must be the fully-qualified class name of the custom sink
a1.sinks.k1.type = sink.SinksLog
a1.sinks.k1.fileName = F://123//3//
 
# channel
a1.channels.c1.type = memory
# Size of the in-memory buffer
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
 
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Flume configuration file (slave):

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
 
# source
a1.sources.r1.type = netcat
# Listen on local port 4444 and take whatever is written to that port as input
a1.sources.r1.bind = localhost
a1.sources.r1.port = 4444


# Do a first-pass filter on the collected log lines
a1.sources.r1.interceptors = f1
a1.sources.r1.interceptors.f1.type = regex_filter
a1.sources.r1.interceptors.f1.regex = (:healthIndex@)?\\{sc.+?httpCode.+?loadspeed.+?depth.+?\\}
a1.sources.r1.interceptors.f1.excludeEvents = false


# sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = localhost
# Forward the collected log to port 60000, i.e. to the master agent
a1.sinks.k1.port = 60000

# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
 
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
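To verify the two agents end to end, a rough smoke test (assuming both configs are under ../conf and the custom sink jar is already in the master's lib directory) is to start the master, then the slave, each in its own terminal, and finally type a line into the netcat port. The slave file name flume-slave.conf is an assumption; the post does not name the slave's config file:

flume-ng agent -n a1 -c ../conf -f ../conf/flume.conf -Dflume.root.logger=INFO,console
flume-ng agent -n a1 -c ../conf -f ../conf/flume-slave.conf -Dflume.root.logger=INFO,console
telnet localhost 4444

Lines typed into the telnet session that pass the interceptor's regex should flow through the slave's avro sink to the master's avro source and end up in the file written by the custom sink.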

The custom sink:

package sink;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import util.DateUtils;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Date;

/**
 * Created by ding on 2018/5/4.
 */
public class SinksLog extends AbstractSink implements Configurable {
    private static final Logger logger = LoggerFactory.getLogger(SinksLog.class);
    private static final String PROP_KEY_ROOTPATH = "fileName";
    private String filePath;
    private String fileName;

    @Override
    public void configure(Context context) {
        filePath = context.getString(PROP_KEY_ROOTPATH);
        fileName = filePath+"sink_" + DateUtils.formatShortDate(new Date())+DateUtils.getNowHour()+".log";
    }

    @Override
    public Status process() throws EventDeliveryException {
        String newFileName = filePath+"sink_" + DateUtils.formatShortDate(new Date())+DateUtils.getNowHour()+".log";
        if(!fileName.equals(newFileName)){
            fileName = newFileName;
        }
        Channel ch = getChannel();
        // Get and begin the channel transaction
        Transaction txn = ch.getTransaction();
        txn.begin();
        try {
            Event event = ch.take();
            if (event == null) {
                // The channel is empty right now: commit the empty transaction and
                // back off instead of busy-waiting inside the transaction.
                txn.commit();
                return Status.BACKOFF;
            }

            logger.debug("Get event.");

            String body = new String(event.getBody());
            String res = body + "\r\n";
            File file = new File(fileName);

            // Append the event body to the current hourly log file; try-with-resources
            // closes the stream and lets any IOException roll back the transaction.
            try (FileOutputStream fos = new FileOutputStream(file, true)) {
                fos.write(res.getBytes());
            }

            txn.commit();
            return Status.READY;
        } catch (Throwable th) {
            txn.rollback();

            if (th instanceof Error) {
                throw (Error) th;
            } else {
                throw new EventDeliveryException(th);
            }
        } finally {
            txn.close();
        }
    }
}
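The util.DateUtils helper used above is not included in the post; a minimal sketch that would satisfy the two calls (the exact date format, here yyyyMMdd plus the hour of day, is an assumption) could look like this:

package util;

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class DateUtils {
    // Format a date as a short yyyyMMdd string (format is an assumption).
    public static String formatShortDate(Date date) {
        return new SimpleDateFormat("yyyyMMdd").format(date);
    }

    // Return the current hour of day (0-23) as a string, so the sink rolls to a new file every hour.
    public static String getNowHour() {
        return String.valueOf(Calendar.getInstance().get(Calendar.HOUR_OF_DAY));
    }
}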

Package the custom sink into a jar and put it under the master Flume's lib directory. For reference, the jar packaging command is: jar cvf spring-objenesis-repack-2.6.jar *

This packages the files in the current directory into a jar.
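A rough build sketch, assuming the sources live under sink/ and util/ and the Flume jars are available under $FLUME_HOME/lib (the paths and the output jar name are assumptions):

javac -cp "$FLUME_HOME/lib/*" sink/SinksLog.java util/DateUtils.java
jar cvf flume-sinkslog.jar sink util

Copy the resulting jar into the master Flume's lib directory and restart the agent so the custom sink class can be loaded.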

 

 
