Big Data [50] - Flume [4]: Flume Interceptors - Custom Interceptors

I. Steps to Write a Custom Interceptor

1. Implement the Interceptor interface

import org.apache.flume.interceptor.Interceptor;

public class MyInterceptor implements Interceptor {
}

2. Override the methods of the Interceptor interface:

Execution order: initialize() --> intercept(Event) / intercept(List<Event>) --> close()

    @Override
    public void initialize() {
    }

    @Override
    public Event intercept(Event event) {
        return null;
    }

    @Override
    public List<Event> intercept(List<Event> list) {
        return null;
    }

    @Override
    public void close() {
    }

3. Add a static nested class that implements the Interceptor.Builder interface

    public static class Builder implements Interceptor.Builder {
    }

4. Override the methods of the Interceptor.Builder interface

        @Override
        public Interceptor build() {
            return null;
        }

        @Override
        public void configure(Context context) {
        }
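
For context, here is a minimal sketch of what the agent does with this Builder at startup (simplified, assumed behavior; the WiringSketch class is my own illustration, not part of the steps above): Flume resolves the configured type value, a fully qualified class name plus $Builder, instantiates the Builder reflectively, and then drives the lifecycle.

import org.apache.flume.Context;
import org.apache.flume.interceptor.Interceptor;

// Hypothetical illustration only: roughly how the agent uses the Builder.
public class WiringSketch {
    public static void main(String[] args) {
        Interceptor.Builder builder = new MyInterceptor.Builder();
        builder.configure(new Context());  // receives the interceptor's sub-properties
        Interceptor interceptor = builder.build();
        interceptor.initialize();
        // intercept(Event) / intercept(List<Event>) run while the agent is up
        interceptor.close();
    }
}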

 

II. Case 7: A Custom Flume Interceptor

(1) Purpose

     Convert the lowercase letters in a file to uppercase.

     

 

(2) Dependencies

 

    <dependencies>
        <!-- Flume core dependency -->
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.8.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <!-- Jar packaging plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>lib/</classpathPrefix>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <!-- Compiler plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>utf-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>

(3) Code

package ToUpCase;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.ArrayList;
import java.util.List;

public class MyInterceptor implements Interceptor {

    @Override
    public void initialize() {
        System.err.println("============ initialize() ============");
    }

    /**
     * Intercepts a single event on its way from the source to the channel.
     * @param event  the event to process
     * @return       the event after business processing
     */
    @Override
    public Event intercept(Event event) {
        // Get the byte payload of the event
        byte[] arr = event.getBody();
        // Convert the bytes to a String, uppercase it, and set it back on the event
        event.setBody(new String(arr).toUpperCase().getBytes());
        // Return the modified event
        return event;
    }

    /**
     * Processes a batch of events by applying the single-event intercept to each.
     */
    @Override
    public List<Event> intercept(List<Event> events) {
        ArrayList<Event> list = new ArrayList<>();
        // Run every event in the batch through the single-event logic
        for (Event event : events) {
            list.add(intercept(event));
        }
        return list;
    }

    @Override
    public void close() {
        System.err.println("============ close() ============");
    }

    // The Builder must be a static nested class of MyInterceptor so that the
    // configuration value ToUpCase.MyInterceptor$Builder can resolve it.
    public static class Builder implements Interceptor.Builder {

        @Override
        public Interceptor build() {
            return new MyInterceptor();
        }

        // Read interceptor properties from the configuration file (none needed here)
        @Override
        public void configure(Context context) {
        }
    }
}
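
Before packaging, the transform can be smoke-tested locally. A minimal sketch (the MyInterceptorTest class below is my own illustration, not part of the case):

package ToUpCase;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

public class MyInterceptorTest {
    public static void main(String[] args) {
        MyInterceptor interceptor = new MyInterceptor();
        interceptor.initialize();
        // Build a one-off event and run it through the single-event intercept
        Event event = EventBuilder.withBody("i love you".getBytes());
        System.out.println(new String(interceptor.intercept(event).getBody()));  // prints: I LOVE YOU
        interceptor.close();
    }
}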

(4) Packaging

       1. Build the jar with Maven  -->  see Maven [6] - Packaging a Maven Project / the clean command / the package command

       2. Create a jar directory under the Flume home directory:

             [root@bigdata111 flume-1.8.0]# mkdir jar

       3. Upload the built jar into this jar directory.

         

(5) Configuration File

ToUpCase.conf (in /opt/module/flume-1.8.0/jobconf)

#1.agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

 

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /opt/Andy

a1.sources.r1.interceptors = i1

# Fully qualified class name + $Builder

a1.sources.r1.interceptors.i1.type = ToUpCase.MyInterceptor$Builder

 

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://bigdata111:9000/ToUpCase1

a1.sinks.k1.hdfs.filePrefix = events-
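
# Note: the round* settings below only affect timestamp escape sequences
# (e.g. %M) in hdfs.path; the path above contains none, so they do not
# influence when files roll.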

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.roundUnit = minute
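
# Roll (close and rename) the current file every 3 seconds, at 20 bytes,
# or after 5 events, whichever threshold is reached first.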

a1.sinks.k1.hdfs.rollInterval = 3

a1.sinks.k1.hdfs.rollSize = 20

a1.sinks.k1.hdfs.rollCount = 5

a1.sinks.k1.hdfs.batchSize = 1

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Output file type: the default is SequenceFile; DataStream writes plain text

a1.sinks.k1.hdfs.fileType = DataStream

 

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

 

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

Note:

    a1.sources.r1.interceptors.i1.type = ToUpCase.MyInterceptor$Builder

Here ToUpCase.MyInterceptor is the fully qualified name of the class we wrote, and $Builder refers to its static nested Builder class; copy both exactly from the code.

  

  

(6) Run Command

bin/flume-ng agent -c conf/ -n a1 -f jobconf/ToUpCase.conf -C jar/Flume-1.0-SNAPSHOT.jar -Dflume.root.logger=DEBUG,console
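
In this command, -c points at Flume's conf directory, -n gives the agent name (a1, matching the configuration), -f selects the job configuration file, -C appends our interceptor jar to the classpath, and -Dflume.root.logger sends DEBUG output to the console.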

(7) Test Results

============ initialize() ============
2019-11-04 06:58:17,966 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.ExecSource.start(ExecSource.java:168)] Exec source starting with command: tail -F /opt/Andy
2019-11-04 06:58:17,970 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
2019-11-04 06:58:17,971 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SOURCE, name: r1 started
2019-11-04 06:58:18,012 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
2019-11-04 06:58:18,013 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: k1 started
2019-11-04 06:58:18,027 (lifecycleSupervisor-1-4) [DEBUG - org.apache.flume.source.ExecSource.start(ExecSource.java:183)] Exec source started
2019-11-04 06:58:18,065 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:141)] Polling sink runner starting
2019-11-04 06:58:22,105 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:57)] Serializer = TEXT, UseRawLocalFileSystem = false
2019-11-04 06:58:24,714 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://bigdata111:9000/ToUpCase1/events-.1572879502106.tmp
2019-11-04 06:58:27,041 (hdfs-k1-call-runner-0) [DEBUG - org.apache.htrace.core.Tracer$Builder.loadSamplers(Tracer.java:106)] sampler.classes = ; loaded no samplers
2019-11-04 06:58:27,086 (hdfs-k1-call-runner-0) [DEBUG - org.apache.htrace.core.Tracer$Builder.loadSpanReceivers(Tracer.java:128)] span.receiver.classes = ; loaded no span receivers
2019-11-04 06:58:32,201 (hdfs-k1-call-runner-0) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas(AbstractHDFSWriter.java:200)] Using getNumCurrentReplicas--HDFS-826
2019-11-04 06:58:32,217 (hdfs-k1-call-runner-0) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetDefaultReplication(AbstractHDFSWriter.java:228)] Using FileSystem.getDefaultReplication(Path) from HADOOP-8014
2019-11-04 06:58:35,220 (hdfs-k1-roll-timer-0) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:291)] Rolling file (hdfs://bigdata111:9000/ToUpCase1/events-.1572879502106.tmp): Roll scheduled after 3 sec elapsed.
2019-11-04 06:58:35,220 (hdfs-k1-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:393)] Closing hdfs://bigdata111:9000/ToUpCase1/events-.1572879502106.tmp
2019-11-04 06:58:35,353 (hdfs-k1-call-runner-6) [INFO - org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:655)] Renaming hdfs://bigdata111:9000/ToUpCase1/events-.1572879502106.tmp to hdfs://bigdata111:9000/ToUpCase1/events-.1572879502106

Appending the text "i love you" produces a new file.

The file is still a .tmp file at the 35-second mark, and by the 38-second mark it has become the final file: because rollInterval is set to 3, the temporary file is rolled into the final target file 3 seconds after it is created.

2019-11-04 07:01:34,975 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:57)] Serializer = TEXT, UseRawLocalFileSystem = false
2019-11-04 07:01:35,256 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://bigdata111:9000/ToUpCase1/events-.1572879694976.tmp
2019-11-04 07:01:35,590 (hdfs-k1-call-runner-7) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas(AbstractHDFSWriter.java:200)] Using getNumCurrentReplicas--HDFS-826
2019-11-04 07:01:35,591 (hdfs-k1-call-runner-7) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetDefaultReplication(AbstractHDFSWriter.java:228)] Using FileSystem.getDefaultReplication(Path) from HADOOP-8014
2019-11-04 07:01:38,592 (hdfs-k1-roll-timer-0) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter$2.call(BucketWriter.java:291)] Rolling file (hdfs://bigdata111:9000/ToUpCase1/events-.1572879694976.tmp): Roll scheduled after 3 sec elapsed.
2019-11-04 07:01:38,593 (hdfs-k1-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:393)] Closing hdfs://bigdata111:9000/ToUpCase1/events-.1572879694976.tmp
2019-11-04 07:01:38,665 (hdfs-k1-call-runner-1) [INFO - org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:655)] Renaming hdfs://bigdata111:9000/ToUpCase1/events-.1572879694976.tmp to hdfs://bigdata111:9000/ToUpCase1/events-.1572879694976
2019-11-04 07:01:38,684 (hdfs-k1-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:382)] Writer callback called.
2019-11-04 07:01:48,010 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:127)] Checking file:jobconf/ToUpCase.conf for changes

[root@bigdata111 opt]# hadoop fs -cat /ToUpCase1/*

"KEY1":"A"
"KEY2":"LOVE"
I LOVE YOU

 

(8) Testing rollInterval

If rollInterval is changed to 300 seconds and new text is appended before the 300 seconds are up, events-.**56 still changes automatically from events-.**56.tmp to events-.**56, and a new file events-.**57.tmp is created (presumably because the rollSize of 20 bytes or the rollCount of 5 events was exceeded first). The rolling behavior is unrelated to the round* settings.

Once the full 300 seconds have elapsed, the .tmp file becomes the final file.

2019-11-04 07:10:19,079  Creating hdfs://bigdata111:9000/ToUpCase1/events-.1572880101357.tmp
2019-11-04 07:15:19,436  Renaming hdfs://bigdata111:9000/ToUpCase1/events-.1572880101357.tmp to hdfs://bigdata111:9000/ToUpCase1/events-.1572880101357
