Common pitfalls in custom Flume interceptors, and building a jar with Maven in two steps

Latest recommended article published 2022-05-09 09:36:57

晴々明雅


Category: flume · Tags: flume pitfalls; naming folders by the timestamp in the data to keep data and files in sync

Article link: https://blog.csdn.net/qq_43701760/article/details/89319975


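This one Flume interceptor took me through quite a few ups and downs before I finally solved what turned out to be a very simple problem. Typing out and practicing more code really does pay off.
The whole job can be divided into three parts:

Write the Java code for the requirement and check that its logic matches the requirement
Build a jar and upload it to the lib folder of the Flume installation
Write the configuration for the agent's components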

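Requirement: name the output folders after the timestamp contained in each record, so that the data stays in sync with the folders (each record lands in the folder for its own date).
The data looks like this: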
RoleCreate|10001|abcd10001988f|5|3|1555291800|1|XMLDKGLKSSJDGL|玩家1|147.10.2.56|GMT+8
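Analysis: first get the event's body, split it, extract the timestamp (field 6, in seconds), and convert it into a date format.
The most important step is to put the extracted value into the event's header map so that later components can use it. This is pitfall No. 1.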
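The parsing step can be tried out with only the JDK before any Flume classes are involved. This is a minimal sketch. It assumes a plain `HashMap` standing in for the event headers. Unlike the interceptor below, which uses the JVM default time zone, it takes the zone from the record's last field (e.g. `GMT+8`); that is an assumption on my part. Note that `TimeZone.getTimeZone` silently falls back to GMT for an unrecognized ID. The class name `TimePathDemo` is hypothetical, and the player-name field in the sample record is replaced with ASCII `Player1`.

```java
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

public class TimePathDemo {

    // Index of the Unix timestamp (seconds) in a pipe-delimited record
    static final int TS_INDEX = 5;

    /** Computes the yyyyMMdd folder name from one record's body. */
    static String timePath(byte[] body) {
        String line = new String(body, StandardCharsets.UTF_8);
        String[] fields = line.split("\\|");
        long seconds = Long.parseLong(fields[TS_INDEX].trim());
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
        // Assumption: the last field holds the zone ID, e.g. "GMT+8"
        sdf.setTimeZone(TimeZone.getTimeZone(fields[fields.length - 1].trim()));
        return sdf.format(new Date(seconds * 1000L)); // seconds -> milliseconds
    }

    public static void main(String[] args) {
        String record = "RoleCreate|10001|abcd10001988f|5|3|1555291800|1|XMLDKGLKSSJDGL|Player1|147.10.2.56|GMT+8";
        Map<String, String> headers = new HashMap<>();
        // Pitfall 1: the value must go into the header map, or later components never see it
        headers.put("timePath", timePath(record.getBytes(StandardCharsets.UTF_8)));
        System.out.println(headers); // {timePath=20190415}
    }
}
```

The zone matters near midnight. Timestamp 0 is `19700101` in GMT but `19691231` in GMT-5. If the folders must follow the record's own zone, the JVM default is not enough.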

1. The code

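package flume_event;

import com.google.common.collect.Lists;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Map;

public class ChangeEvent implements Interceptor {

    @Override
    public void initialize() {
        // nothing to initialize
    }

    // After the source reads a record, it stores it as an event: headers + body.
    // Get the body, convert the timestamp's format, and put the result into the headers.
    // Tested: this code works.
    @Override
    public Event intercept(Event event) {
        // Get the data (decode explicitly as UTF-8 rather than the platform default)
        String s = new String(event.getBody(), StandardCharsets.UTF_8);
        // Split the record to get the timestamp field
        String[] arr = s.split("\\|");
        if (arr.length <= 5) {
            // Malformed line: pass it through unchanged instead of throwing
            return event;
        }
        String time = arr[5].trim();
        // Convert the timestamp (seconds) to a date; uses the JVM's default time zone
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
        String timePath = sdf.format(new Date(Long.parseLong(time) * 1000));
        // Get the header map and put the date path into it
        Map<String, String> map = event.getHeaders();
        map.put("timePath", timePath);
        event.setHeaders(map);

        // Pitfall 2: make sure to change this return value. The generated stub returns null,
        // so no event enters the channel, as if nothing had been monitored at all.
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> intercepted = Lists.newArrayListWithCapacity(events.size());
        for (Event event : events) {
            Event interceptedEvent = intercept(event);
            if (interceptedEvent != null) {
                intercepted.add(interceptedEvent);
            }
        }
        // Pitfall 3: same as above. Here, though, the error shows up clearly in the
        // Xshell console, so it won't lead you astray.
        return intercepted;
    }

    @Override
    public void close() {
        // nothing to release
    }

    public static class Builder implements Interceptor.Builder {

        @Override
        public Interceptor build() {
            // Pitfall 4: this must create an instance of this class, so the name must match.
            // For my second requirement I lazily copied this code without renaming the class,
            // so my other header never showed up. It took a long time to find this mistake.
            return new ChangeEvent();
        }

        @Override
        public void configure(Context context) {
            // no configuration parameters
        }
    }
}

Finally: make sure you test that the code is correct.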
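Pitfalls 2 and 3 are silent because the batch method simply skips null results. The sketch below mirrors the loop in `intercept(List<Event>)` using only the JDK. `SimpleEvent` and `interceptAll` are hypothetical stand-ins, not Flume's `Event` type. It shows that a stub returning null empties every batch without any error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class NullReturnDemo {

    // Hypothetical stand-in for org.apache.flume.Event (headers + body only)
    static final class SimpleEvent {
        final Map<String, String> headers = new HashMap<>();
        final byte[] body;

        SimpleEvent(byte[] body) {
            this.body = body;
        }
    }

    // Mirrors ChangeEvent.intercept(List<Event>): null results are dropped
    static List<SimpleEvent> interceptAll(List<SimpleEvent> events,
                                          UnaryOperator<SimpleEvent> single) {
        List<SimpleEvent> out = new ArrayList<>(events.size());
        for (SimpleEvent e : events) {
            SimpleEvent r = single.apply(e);
            if (r != null) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleEvent> batch = List.of(new SimpleEvent(new byte[0]), new SimpleEvent(new byte[0]));
        // Pitfall 2: a generated stub that still returns null
        UnaryOperator<SimpleEvent> stub = e -> null;
        // Fixed version: tag the header and return the same event
        UnaryOperator<SimpleEvent> fixed = e -> {
            e.headers.put("timePath", "20190415");
            return e;
        };
        System.out.println(interceptAll(batch, stub).size());  // 0
        System.out.println(interceptAll(batch, fixed).size()); // 2
    }
}
```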

2. Building a jar in a Maven project takes two steps:

[Figure: the two Maven packaging steps]

3. Configuring the agent components

Test version
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/spool/interceptor.txt
# Use the interceptor from the custom jar
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = flume_event.ChangeEvent$Builder
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
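The `$Builder` suffix is the JVM's binary name for a nested class. A framework that loads a class from a configured string has to use that form, not the dotted name from the source code. Below is a JDK-only sketch of that lookup, roughly what happens when Flume resolves the `type` value to a class. `MiniInterceptor` and `MiniBuilder` are hypothetical stand-ins for Flume's `Interceptor` and `Interceptor.Builder`.

```java
public class BuilderNameDemo {

    // Hypothetical stand-ins for Flume's Interceptor and Interceptor.Builder
    public interface MiniInterceptor { String name(); }
    public interface MiniBuilder { MiniInterceptor build(); }

    public static class ChangeEvent implements MiniInterceptor {
        @Override
        public String name() { return "ChangeEvent"; }

        // Nested class: its binary name is "<outer>$Builder", hence the $ in the config
        public static class Builder implements MiniBuilder {
            // Pitfall 4: return an instance of *this* class, not a copy-pasted one
            @Override
            public MiniInterceptor build() { return new ChangeEvent(); }
        }
    }

    public static void main(String[] args) throws Exception {
        String configured = ChangeEvent.Builder.class.getName();
        System.out.println(configured); // BuilderNameDemo$ChangeEvent$Builder
        // Load the builder from the configured string, as a config-driven framework would
        MiniBuilder builder = (MiniBuilder) Class.forName(configured)
                .getDeclaredConstructor().newInstance();
        System.out.println(builder.build().name()); // ChangeEvent
        try {
            Class.forName("BuilderNameDemo.ChangeEvent.Builder"); // dotted source-code name
        } catch (ClassNotFoundException e) {
            System.out.println("dotted name not found");
        }
    }
}
```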
**Production version**
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/spool/interceptor.txt
# Use the interceptor from the custom jar
a1.sources.r1.interceptors = i1
# $Builder must match the nested Builder class in the Java code; worth remembering.
# Keep comments on their own lines: in a properties file a trailing # becomes part of the value.
a1.sources.r1.interceptors.i1.type = flume_event.ChangeEvent$Builder
# Describe the sink
a1.sinks.k1.type = hdfs
# %{} references a header variable (here the timePath header set by the interceptor); another pitfall
a1.sinks.k1.hdfs.path = hdfs://hdp-01:9000/flume/%{timePath}
# Prefix of the generated files (default: FlumeData)
a1.sinks.k1.hdfs.filePrefix = filePath
# Suffix of the generated files
a1.sinks.k1.hdfs.fileSuffix = .txt
# File type: SequenceFile (default), DataStream or CompressedStream.
# DataStream writes plain text without compression; CompressedStream requires hdfs.codeC to be set to an available codec.
a1.sinks.k1.hdfs.fileType = DataStream
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
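The `%{timePath}` escape tells the HDFS sink to take that part of the path from the event header of the same name. This is why pitfall No. 1 (actually putting the value into the headers) matters. The helper below is a hypothetical sketch of that substitution in plain Java, not Flume's own implementation. Replacing a missing header with an empty string is a choice made for this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderPathDemo {

    // Matches %{headerName} escapes in a path template
    private static final Pattern HEADER_ESCAPE = Pattern.compile("%\\{([^}]+)\\}");

    /** Replaces every %{name} with headers.get(name); missing headers become "". */
    static String resolve(String template, Map<String, String> headers) {
        Matcher m = HEADER_ESCAPE.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = headers.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' or '\' in header values literal
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String path = "hdfs://hdp-01:9000/flume/%{timePath}";
        System.out.println(resolve(path, Map.of("timePath", "20190415")));
        // hdfs://hdp-01:9000/flume/20190415
        System.out.println(resolve(path, Map.of()));
        // hdfs://hdp-01:9000/flume/  (header never set)
    }
}
```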

Adjust this configuration to your own needs.

If you get a result like this, you're done.
There are six or seven pitfalls in total, so watch your step!