Flume in Practice, Part 9 (Custom Sink)

TANCHISE

Category: Flume. Tags: flume, big data

Article link: https://blog.csdn.net/weixin_48067943/article/details/108254987

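7. Custom Sink

1) Introduction

A Sink continuously polls its Channel for events, removes them in batches, and either writes each batch to a storage or indexing system or forwards it to another Flume Agent.
A Sink is fully transactional. Before removing a batch of events from the Channel, the Sink opens a transaction with that Channel. Once the batch has been written successfully to the storage system or to the next Flume Agent, the Sink commits the transaction. Only after the commit does the Channel delete those events from its internal buffer.
Sink destinations include hdfs, logger, avro, thrift, ipc, file, null, HBase, solr, and custom implementations. Flume already ships many Sink types, but they do not always cover what a real project needs, and in that case we write a Sink of our own.
The official documentation describes the interface for custom sinks:
https://flume.apache.org/FlumeDeveloperGuide.html#sink
According to that documentation, a custom sink such as MySink extends the AbstractSink class and implements the Configurable interface.
It implements the following methods:
- configure(Context context): reads the sink's settings from the agent configuration file.
- process(): takes events from the Channel and handles them; Flume calls this method in a loop.

Typical use case: reading data from a Channel and writing it to MySQL or to a file system.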
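
The commit and rollback behaviour described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: ToyChannel and ToySinkDemo are hypothetical names invented for illustration and are not part of the Flume API. The real Channel and Transaction classes are more involved, but the idea is the same: events taken in a transaction leave the channel for good only on commit, and a rollback puts them back.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy stand-in for a channel with one open transaction; NOT the real Flume Channel. */
final class ToyChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> inFlight = new ArrayList<>();

    void put(String event) {
        queue.addLast(event);
    }

    /** Takes the next event inside the current transaction, or null if the channel is empty. */
    String take() {
        String event = queue.pollFirst();
        if (event != null) {
            inFlight.add(event);
        }
        return event;
    }

    /** Commit: the taken events are removed for good. */
    void commit() {
        inFlight.clear();
    }

    /** Rollback: the taken events return to the head of the queue in their original order. */
    void rollback() {
        for (int i = inFlight.size() - 1; i >= 0; i--) {
            queue.addFirst(inFlight.get(i));
        }
        inFlight.clear();
    }

    int size() {
        return queue.size();
    }
}

/** Hypothetical sink loop body: drain one batch, commit on success, roll back on failure. */
final class ToySinkDemo {
    static boolean drain(ToyChannel channel, int batchSize, List<String> out, boolean failWrite) {
        List<String> batch = new ArrayList<>();
        try {
            for (int i = 0; i < batchSize; i++) {
                String event = channel.take();
                if (event == null) {
                    break; // channel is empty, stop early
                }
                batch.add(event);
            }
            if (failWrite) {
                throw new IllegalStateException("simulated write failure");
            }
            out.addAll(batch);  // "write" the batch to the destination
            channel.commit();   // only now are the events really gone
            return true;
        } catch (RuntimeException e) {
            channel.rollback(); // the events stay in the channel for a later retry
            return false;
        }
    }
}
```

With three events in the channel, a failed drain of two leaves all three in place, and a later successful drain delivers the first two in order.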

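2) Requirements

Use Flume to receive data, have the Sink add a prefix and a suffix to every event, and print the result to the console. The prefix and suffix are set in the Flume job configuration file.

Flow analysis:

A netcat source receives the data and puts it on a memory channel. MySink takes events from that channel, wraps each body in the configured prefix and suffix, and writes it to the console through the logger.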

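3) Implementation

package com.xiaoxq.sink;

import java.nio.charset.StandardCharsets;

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Prints each event through the log, in the style of the logger sink,
 * wrapped in a configurable prefix and suffix.
 */
public class MySink extends AbstractSink implements Configurable {
    private static final Logger logger = LoggerFactory.getLogger(MySink.class);

    // Prefix and suffix, read from the agent configuration
    private String prefix;
    private String suffix;

    // Main logic: take one event from the channel and write it out
    @Override
    public Status process() throws EventDeliveryException {
        // 1. Status to return
        Status status;
        // 2. Get the channel bound to this sink
        Channel channel = getChannel();
        // 3. Get a transaction from the channel
        Transaction transaction = channel.getTransaction();
        // 4. Begin the transaction
        transaction.begin();

        try {
            // 5. Take one event from the channel
            Event event = channel.take();
            // 6. The event is null when the channel is empty
            if (event != null) {
                String body = new String(event.getBody(), StandardCharsets.UTF_8);
                logger.info(prefix + "--" + body + "--" + suffix);
                // 7. Ready for the next event
                status = Status.READY;
            } else {
                // Nothing to do: back off so the sink runner does not spin
                status = Status.BACKOFF;
            }
            // 8. Commit the transaction
            transaction.commit();
        } catch (Exception e) {
            // 9. Log the failure with its stack trace
            logger.error("Failed to process event", e);
            // 10. Back off before the next attempt
            status = Status.BACKOFF;
            // 11. Roll back so the event stays in the channel
            transaction.rollback();
        } finally {
            // 12. Close the transaction
            transaction.close();
        }
        return status;
    }

    // Read the settings; prefix has no default, suffix defaults to "bajie"
    @Override
    public void configure(Context context) {
        prefix = context.getString("prefix");
        suffix = context.getString("suffix", "bajie");
    }
}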
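
The formatting step is the only part of the sink that does not depend on Flume, so it can be pulled into a pure function and checked without starting an agent. The sketch below is one possible way to do that. EventDecorator is a hypothetical helper name, and the choices to decode the body as UTF-8 and to treat a missing prefix or suffix as an empty string are assumptions, not something the original sink does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

/** Hypothetical helper: builds the line that the sink would log for one event body. */
final class EventDecorator {
    private EventDecorator() {
    }

    static String decorate(String prefix, byte[] body, String suffix) {
        // Assumption: event bodies are UTF-8 text
        String text = new String(body, StandardCharsets.UTF_8);
        // Assumption: an unset prefix or suffix becomes "" instead of the string "null"
        return Objects.toString(prefix, "") + "--" + text + "--" + Objects.toString(suffix, "");
    }
}
```

Inside process(), the logging call could then become logger.info(EventDecorator.decorate(prefix, event.getBody(), suffix)), which keeps the transaction handling and the string handling separate.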

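4) Testing

(1) Packaging

Package the code into a jar and put it in the lib directory of the Flume installation (/opt/module/flume-1.9.0).

(2) Configuration file

[xiaoxq@hadoop105 jobs]$ vim myselfsink.conf
[xiaoxq@hadoop105 jobs]$ pwd
/opt/module/flume-1.9.0/jobs

Add the following content: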

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1 

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = localhost
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = com.xiaoxq.sink.MySink
a2.sinks.k1.prefix = hello
a2.sinks.k1.suffix = bye

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
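
To see which of these keys end up in front of the sink, the following stand-alone sketch loads the same kind of properties text and keeps only the entries under a2.sinks.k1, with that leading part removed. It only approximates how the settings reach configure(Context). SinkPropsDemo and sinkSettings are hypothetical names, and Flume's own configuration loading is not used here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper: extracts the sub-properties of one sink from agent properties text. */
final class SinkPropsDemo {
    static Map<String, String> sinkSettings(String conf, String agent, String sink) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(conf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // For agent a2 and sink k1 this is "a2.sinks.k1."
        String keyPrefix = agent + ".sinks." + sink + ".";
        Map<String, String> settings = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(keyPrefix)) {
                // Keep only the part after the component name, e.g. "prefix"
                settings.put(key.substring(keyPrefix.length()), props.getProperty(key));
            }
        }
        return settings;
    }
}
```

For the file above, sinkSettings(conf, "a2", "k1") yields the keys channel, prefix, suffix, and type. The names prefix and suffix are the ones MySink asks for with context.getString.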

(3) Start the agent, then send test data with nc from a second terminal

[xiaoxq@hadoop105 flume-1.9.0]$ bin/flume-ng agent -c conf/ -f jobs/myselfsink.conf -n a2 -Dflume.root.logger=INFO,console

[xiaoxq@hadoop105 jobs]$ nc localhost 44444
flume
OK
hadoop
OK
hive
OK

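(4) Results

The agent console logs every input line wrapped in the configured prefix and suffix, for example hello--flume--bye for the input flume.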
