Flume Programming: Custom Agent Components

1.1 Custom Interceptor

You can write a custom Interceptor that adds key/value pairs to the headers of the events an agent receives. A channel selector downstream can then use those values to decide which channel each event is routed to.

For example, the interceptor below checks whether an event's body contains the string Hello: if it does, the pair type:hello is added to the header; otherwise type:nonhello is added. The multiplexing selector then sends events whose type header is hello to channel c1, and all others to channel c2. A custom interceptor implements the method Event intercept(Event event), which processes each event coming from the source and returns the modified event.

package com.fyk.flume.Interceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CustomInterceptor implements Interceptor {
    private List<Event> addHeaderEvents;

    public void initialize() {
        addHeaderEvents = new ArrayList<Event>();
    }

    // Single-event hook: tag each event according to its body content.
    public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        String body = new String(event.getBody());

        if(body.contains("Hello")){
            headers.put("type", "hello");
        }
        else{
            headers.put("type", "nonhello");
        }

        return event;
    }

    // Batch hook: delegate each event to the single-event method.
    public List<Event> intercept(List<Event> events) {

        addHeaderEvents.clear();

        for(Event event : events){
            addHeaderEvents.add(intercept(event));
        }

        return addHeaderEvents;
    }

    public void close() {

    }

    public static class Builder implements Interceptor.Builder {


        public Interceptor build() {
            return new CustomInterceptor();
        }

        public void configure(Context context) {

        }
    }
}
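
The intercept-and-route decision above can be sketched without any Flume dependencies. The following plain-Java illustration (class and method names here are made up for the sketch) mirrors the body check in CustomInterceptor and the multiplexing selector's header-to-channel mapping:

```java
import java.util.HashMap;
import java.util.Map;

public class RoutingSketch {
    // Mirrors CustomInterceptor.intercept: tag the event by body content.
    static Map<String, String> tagHeaders(String body) {
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("type", body.contains("Hello") ? "hello" : "nonhello");
        return headers;
    }

    // Mirrors the multiplexing selector: header value -> channel name.
    static String selectChannel(Map<String, String> headers) {
        return "hello".equals(headers.get("type")) ? "c1" : "c2";
    }

    public static void main(String[] args) {
        System.out.println(selectChannel(tagHeaders("Hello world"))); // -> c1
        System.out.println(selectChannel(tagHeaders("hello ghi")));   // -> c2
    }
}
```

Note that the check is case-sensitive: "hello ghi" does not contain "Hello" and is therefore routed to c2, which matches the slave2 output in the test run below.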

The test results are as follows:

### Start the agents
[root@master flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-custominteceptor-master.conf --name a1 -Dflume.root.logger=INFO,console
[root@slave1 flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-custominterceptor-slave1.conf --name a1 -Dflume.root.logger=INFO,console
[root@slave2 flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-custominteceptor-slave2.conf --name a1 -Dflume.root.logger=INFO,console


### Test results
[root@slave1 ~]# telnet master 50000
Trying 192.168.56.100...
Connected to master.
Escape character is '^]'.
abcdef
OK
hello ghi
OK
Hello world
OK
Hello python
OK
Hello AI
OK
hello Bigdata
OK

#slave1
2020-12-22 07:21:37,117 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=hello} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 0D             Hello world. }
2020-12-22 07:22:07,125 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=hello} body: 48 65 6C 6C 6F 20 70 79 74 68 6F 6E 0D          Hello python. }
2020-12-22 07:22:07,125 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=hello} body: 48 65 6C 6C 6F 20 41 49 0D                      Hello AI. }

#slave2
2020-12-22 07:20:01,966 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=nonhello} body: 61 62 63 64 65 66 0D                            abcdef. }
2020-12-22 07:20:26,254 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=nonhello} body: 68 65 6C 6C 6F 20 67 68 69 0D                   hello ghi. }
2020-12-22 07:22:11,298 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=nonhello} body: 68 65 6C 6C 6F 20 42 69 67 64 61 74 61 0D       hello Bigdata. }




### Configuration files
#master
[root@master flume-1.6.0]# cat ./conf/flume-custom/flume-custominteceptor-master.conf
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

a1.sources.r1.type = netcat
a1.sources.r1.bind = master
a1.sources.r1.port = 50000

#Interceptor
a1.sources.r1.interceptors = i1
# Fully qualified name of the custom interceptor's Builder class
a1.sources.r1.interceptors.i1.type = com.fyk.flume.Interceptor.CustomInterceptor$Builder

# Channel selector: route events by the header value set by the custom interceptor
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.hello = c1
a1.sources.r1.selector.mapping.nonhello = c2

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 50000

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 50000

a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

#slave1
[root@slave1 flume-1.6.0]# cat ./conf/flume-custom/flume-custominterceptor-slave1.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = avro
a1.sources.r1.bind = slave1
a1.sources.r1.port = 50000

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1


#slave2
[root@slave2 flume-1.6.0]# cat ./conf/flume-custom/flume-custominteceptor-slave2.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = avro
a1.sources.r1.bind = slave2
a1.sources.r1.port = 50000

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

1.2 Custom Source

The code below defines a custom source that wraps data into events and hands each one to getChannelProcessor().processEvent(event). The channel processor delivers the event to the configured channels inside a transaction.

package com.fyk.flume.Source;

import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;

public class CustomSource extends AbstractSource implements Configurable, PollableSource {

    //global variable
    private String prefix;
    private String suffix;

    public void configure(Context context) {
        prefix = context.getString("prefix");
        suffix = context.getString("suffix", "Flume");
    }

    /*
     * 1. Receive external data
     * 2. Wrap it into an event
     * 3. Send the event to the channel
     */
    public Status process() throws EventDeliveryException {
        Status status = null;
        try{
            for(int i = 0; i < 5; i++)
            {
                SimpleEvent event = new SimpleEvent();
                event.setBody((prefix + "--"+ i + suffix).getBytes());
                getChannelProcessor().processEvent(event);
                status = Status.READY;
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
            status = Status.BACKOFF;
        }

        try{
            Thread.sleep(2000);
        }catch(InterruptedException e)
        {
            e.printStackTrace();
        }
        return status;
    }

    public long getBackOffSleepIncrement() {
        return 0;
    }

    public long getMaxBackOffSleepInterval() {
        return 0;
    }


}
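
Stripped of the Flume API, the body-building loop in process() amounts to the following (class and method names here are illustrative, not part of Flume):

```java
import java.util.ArrayList;
import java.util.List;

public class SourceSketch {
    // Mirrors the loop in CustomSource.process(): each poll produces five
    // bodies of the form prefix + "--" + i + suffix.
    static List<String> poll(String prefix, String suffix) {
        List<String> bodies = new ArrayList<String>();
        for (int i = 0; i < 5; i++) {
            bodies.add(prefix + "--" + i + suffix);
        }
        return bodies;
    }

    public static void main(String[] args) {
        for (String body : poll("Bigdata", "spark")) {
            System.out.println(body); // Bigdata--0spark ... Bigdata--4spark
        }
    }
}
```

With prefix = Bigdata and suffix = spark (as in the configuration below), each poll yields Bigdata--0spark through Bigdata--4spark, matching the logged output.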
## Start the agent
[root@master flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-customsource-master.conf --name a1 -Dflume.root.logger=INFO,console


## Configuration file
[root@master flume-1.6.0]# cat ./conf/flume-custom/flume-customsource-master.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = com.fyk.flume.Source.CustomSource
a1.sources.r1.prefix = Bigdata
a1.sources.r1.suffix = spark

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1



## Test results
2020-12-22 23:54:01,951 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 30 73 70 61 72 6B    Bigdata--0spark }
2020-12-22 23:54:01,951 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 31 73 70 61 72 6B    Bigdata--1spark }
2020-12-22 23:54:01,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 32 73 70 61 72 6B    Bigdata--2spark }
2020-12-22 23:54:01,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 33 73 70 61 72 6B    Bigdata--3spark }
2020-12-22 23:54:01,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 34 73 70 61 72 6B    Bigdata--4spark }
2020-12-22 23:54:03,950 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 30 73 70 61 72 6B    Bigdata--0spark }
2020-12-22 23:54:03,951 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 31 73 70 61 72 6B    Bigdata--1spark }
2020-12-22 23:54:03,951 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 32 73 70 61 72 6B    Bigdata--2spark }
2020-12-22 23:54:03,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 33 73 70 61 72 6B    Bigdata--3spark }
2020-12-22 23:54:03,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 34 73 70 61 72 6B    Bigdata--4spark }

1.3 Custom Sink

The code below defines a custom sink that pulls events from its channel using the transaction mechanism: it begins a transaction, takes an event, and commits on success or rolls back on failure.

package com.fyk.flume.Sink;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomSink extends AbstractSink implements Configurable {
    private Logger logger = LoggerFactory.getLogger(CustomSink.class);
    private String prefix;
    private String suffix;

    public Status process() throws EventDeliveryException {
        Status status = null;

        Channel channel = getChannel();

        Transaction txn = channel.getTransaction();

        txn.begin();
        try{
            Event event = channel.take();
            if(event != null) {
                String body = new String(event.getBody());
                logger.info(prefix + body + suffix);
            }

            txn.commit();
            status = Status.READY;
        }
        catch (Throwable t) {
            txn.rollback();
            status = Status.BACKOFF;
            // Re-throw Errors so fatal JVM problems are not swallowed;
            // ordinary Exceptions are reported to the runner via BACKOFF.
            if (t instanceof Error) {
                throw (Error) t;
            }
        }
        finally {
            txn.close();
        }

        return status;
    }

    public void configure(Context context) {
        prefix = context.getString("prefix");
        suffix = context.getString("suffix", "BigData");
    }
}
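
The transaction lifecycle used in process() above can be illustrated without the Flume API. In this sketch, RecordingTxn and deliver() are stand-ins invented for the illustration; they mirror the begin -> take -> commit (or rollback) -> close ordering of the real sink:

```java
import java.util.ArrayList;
import java.util.List;

public class SinkTxnSketch {
    // Minimal stand-in for Flume's Transaction that records lifecycle calls.
    static class RecordingTxn {
        List<String> calls = new ArrayList<String>();
        void begin()    { calls.add("begin"); }
        void commit()   { calls.add("commit"); }
        void rollback() { calls.add("rollback"); }
        void close()    { calls.add("close"); }
    }

    // Decorate one "taken" body; commit on success, roll back on failure.
    // close() runs in either case, exactly as in CustomSink.process().
    static String deliver(RecordingTxn txn, String body, String prefix, String suffix) {
        txn.begin();
        try {
            String line = (body == null) ? null : prefix + body + suffix;
            txn.commit();
            return line;
        } catch (RuntimeException e) {
            txn.rollback();
            throw e;
        } finally {
            txn.close();
        }
    }

    public static void main(String[] args) {
        RecordingTxn txn = new RecordingTxn();
        System.out.println(deliver(txn, "Hello Bigdata", "sleep--", "--boy"));
        System.out.println(txn.calls);
    }
}
```

On failure, rollback() leaves the event in the channel for redelivery, which is why close() sits in a finally block: the transaction must be released whether or not the commit succeeded.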

The test results are as follows:

# Start the agent
[root@master flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-customsink-master.conf --name a1 -Dflume.root.logger=INFO,console

## Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = netcat
a1.sources.r1.bind = master
a1.sources.r1.port = 50000

a1.sinks.k1.type = com.fyk.flume.Sink.CustomSink
a1.sinks.k1.prefix = sleep--
a1.sinks.k1.suffix = --boy

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

## Test results
#slave1
[root@slave1 ~]# telnet master 50000
Trying 192.168.56.100...
Connected to master.
Escape character is '^]'.
abc def ghi
OK
Hello Bigdata
OK



#master
2020-12-23 00:03:51,530 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.fyk.flume.Sink.CustomSink.process(CustomSink.java:26)] sleep--abc def ghi--boy
2020-12-23 00:05:12,854 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.fyk.flume.Sink.CustomSink.process(CustomSink.java:26)] sleep--Hello Bigdata--boy



 
