Custom Flume Sink

SuperQiu~

Category: Flume. Tags: flume, solr, big data

Article link: https://blog.csdn.net/weixin_44966780/article/details/121952376


1) Introduction

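A sink continuously polls the channel for events, removes them in batches, and either writes each batch to a storage or indexing system or forwards it to another Flume agent. Sinks are fully transactional. Before removing a batch of events from the channel, the sink opens a transaction with the channel. Once the batch has been written successfully to the storage system or to the next Flume agent, the sink commits the transaction. After the commit, the channel deletes those events from its internal buffer.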
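
To make the commit and rollback semantics concrete, here is a minimal toy model in plain Java. It is only a sketch: ToyChannel is a hypothetical class invented for illustration and is not Flume's Channel or Transaction API. It shows the one property that matters here, namely that taken events leave the channel for good only on commit and return to it on rollback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: ToyChannel is hypothetical and not part of Flume's API.
final class ToyChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    // Events taken inside the currently open transaction
    private final List<String> taken = new ArrayList<>();

    void put(String event) {
        queue.addLast(event);
    }

    // Take one event inside the current transaction; null when the channel is empty.
    String take() {
        String event = queue.pollFirst();
        if (event != null) {
            taken.add(event);
        }
        return event;
    }

    // Commit: the taken events are dropped permanently.
    void commit() {
        taken.clear();
    }

    // Rollback: the taken events go back to the head of the queue in their original order.
    void rollback() {
        for (int i = taken.size() - 1; i >= 0; i--) {
            queue.addFirst(taken.get(i));
        }
        taken.clear();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        ToyChannel ch = new ToyChannel();
        ch.put("a");
        ch.put("b");
        ch.take();
        ch.take();
        ch.rollback();                 // delivery failed: both events are back
        System.out.println(ch.size()); // 2
        ch.take();
        ch.commit();                   // delivery succeeded: "a" is gone
        System.out.println(ch.size()); // 1
    }
}
```

A real channel also has to deal with concurrency and capacity limits, which this sketch leaves out on purpose.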

Sink destinations include hdfs, logger, avro, thrift, ipc, file, null, HBase, solr, and custom sinks.

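Flume ships with many sink types, but they do not always meet the needs of a real project. In that case you have to write a custom sink for your own requirements.
The official documentation describes the interface for custom sinks:
https://flume.apache.org/FlumeDeveloperGuide.html#sink

According to the official documentation, a custom sink such as MySink must extend the AbstractSink class and implement the Configurable interface.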

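It implements the following methods:
configure(Context context): initializes the sink from the context, that is, reads the settings from the configuration file.
process(): takes data (events) from the channel and handles it. Flume calls this method in a loop.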
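
The value returned by process() tells the caller how to pace that loop: READY means "call me again right away", BACKOFF means "wait a little first". The following plain Java sketch models that idea. ToyRunner, its Status enum, and the delay values are illustrative assumptions, not Flume's actual runner code.

```java
// Toy model only: ToyRunner and its delay values are illustrative assumptions,
// not Flume's actual runner implementation.
final class ToyRunner {
    enum Status { READY, BACKOFF }

    static final long STEP_MILLIS = 1000; // assumed delay added per consecutive BACKOFF
    static final long MAX_MILLIS = 5000;  // assumed upper bound on the delay

    // Delay before the next process() call after a run of consecutive BACKOFF results.
    static long delayMillis(int consecutiveBackoffs) {
        return Math.min(consecutiveBackoffs * STEP_MILLIS, MAX_MILLIS);
    }

    // Total time a runner would spend waiting for the given sequence of results.
    static long totalDelay(Status... results) {
        int consecutive = 0;
        long total = 0;
        for (Status s : results) {
            if (s == Status.READY) {
                consecutive = 0; // progress was made, so reset the backoff counter
            } else {
                total += delayMillis(++consecutive);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Two backoffs (1000 + 2000), a reset, then one more backoff (1000)
        System.out.println(totalDelay(
                Status.BACKOFF, Status.BACKOFF, Status.READY, Status.BACKOFF)); // 4000
    }
}
```

This is why a sink should return BACKOFF when the channel is empty or delivery fails: it lets the caller slow down instead of spinning.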

Typical use case: reading data from a channel and writing it to MySQL or to a file system.

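2) Requirements
Use Flume to receive data, add a prefix and a suffix to each event in the sink, and print the result to the console. The prefix and suffix can be set in the Flume job configuration file.
Flow analysis (diagram not available): data enters through a netcat source, passes through a memory channel, and is taken by MySink, which logs each event with the prefix and suffix added.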

3) Implementation

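package com.xiaoqiu;

import java.nio.charset.StandardCharsets;

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MySink extends AbstractSink implements Configurable {

    // Log under this class's name rather than AbstractSink's
    private static final Logger logger = LoggerFactory.getLogger(MySink.class);

    private String prefix;
    private String suffix;

    @Override
    public void configure(Context context) {
        // "pre" and "sub" are the property names used in the agent configuration file
        prefix = context.getString("pre", "pre-");
        // Default to an empty suffix so that a missing "sub" does not print "null"
        suffix = context.getString("sub", "");
    }

    @Override
    public Status process() throws EventDeliveryException {
        // 1. Get the channel and begin a transaction
        Channel channel = getChannel();
        Transaction transaction = channel.getTransaction();
        transaction.begin();

        try {
            // 2. Take one event; take() returns null when the channel is empty
            Event event = channel.take();
            if (event == null) {
                // Nothing to deliver: end the transaction and back off
                // instead of looping inside an open transaction
                transaction.commit();
                return Status.BACKOFF;
            }

            // 3. Add the prefix and suffix and print the result to the console log
            logger.info(prefix + new String(event.getBody(), StandardCharsets.UTF_8) + suffix);

            // 4. Commit so that the channel can drop the event
            transaction.commit();
            return Status.READY;
        } catch (Exception e) {
            // Delivery failed: roll back so that the event stays in the channel
            logger.error("Failed to process event", e);
            transaction.rollback();
            return Status.BACKOFF;
        } finally {
            transaction.close();
        }
    }
}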
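
The formatting step inside process() can be pulled out into a pure function, which makes it possible to check the prefix and suffix logic without starting an agent. The sketch below uses only the JDK. EventFormatter and decorate are hypothetical names and are not part of Flume or of the sink above. It assumes the event body is UTF-8 text.

```java
import java.nio.charset.StandardCharsets;

// EventFormatter is a hypothetical helper, not part of Flume.
final class EventFormatter {

    // Wraps a UTF-8 event body in the given prefix and suffix.
    // A null prefix or suffix is treated as an empty string.
    static String decorate(String prefix, byte[] body, String suffix) {
        String p = prefix == null ? "" : prefix;
        String s = suffix == null ? "" : suffix;
        return p + new String(body, StandardCharsets.UTF_8) + s;
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(decorate("2021-", body, "-welcome")); // 2021-hello-welcome
    }
}
```

Decoding with an explicit charset avoids depending on the platform default, which can differ between the development machine and the server running the agent.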

4) Testing
(1) Packaging
Package the code as a jar and copy it into the lib directory of the Flume installation (/opt/module/flume).
(2) Configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = com.xiaoqiu.MySink
a1.sinks.k1.pre = 2021-
a1.sinks.k1.sub = -welcome

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

(3) Start the job
Start the agent in one terminal, then send test lines with nc from a second terminal:
[hadoop@hadoop102 flume]$ bin/flume-ng agent -c conf/ -f job/mysink.conf -n a1 -Dflume.root.logger=INFO,console
[hadoop@hadoop102 ~]$ nc localhost 44444
hello
OK
atguigu
OK
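(4) Results
(Screenshot not available.) With the configuration above, each line received should appear in the agent's console log wrapped in the configured prefix and suffix, for example 2021-hello-welcome.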
