Developing a custom Flume sink: a Flume ClickHouse sink

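One of Flume's strengths is that it can be extended with plugins. ClickHouse is popular right now, and we wanted to write data straight into it, but the official Flume site does not list a ClickHouse sink. Since there isn't one, let's write our own.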

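There are plenty of articles online about writing custom interceptors, but hardly any about writing sinks, so I am recording the process here for reference.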

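I. Development workflow
  1. Set up a Flume development environment.
  2. Create a class that extends AbstractSink and implements the Configurable interface.
  3. Override the configure, start, stop, and process methods.
  4. Compile and package the jar, then copy it into Flume's lib directory.
  5. Add the ClickHouse sink settings to flume.conf.
II. Setting up the Flume development environment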

Create a new Maven project and add the following dependencies to pom.xml.

<dependencies>
    <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
        <!-- Add a <version> matching your Flume installation unless a parent POM manages it. -->
    </dependency>

    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>19.0</version>
    </dependency>

    <dependency>
        <groupId>ru.yandex.clickhouse</groupId>
        <artifactId>clickhouse-jdbc</artifactId>
        <version>0.2</version>
    </dependency>
</dependencies>
III. Skeleton of the custom ClickHouseSink class
package org.apache.flume.sink.clickhouse;

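import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class ClickHouseSink extends AbstractSink implements Configurable {
    @Override
    public void configure(Context context) {
        // Read the sink's configuration parameters.
    }

    @Override
    public void start() {
        // Create the ClickHouse connection and start the counters.
        super.start();
    }

    @Override
    public void stop() {
        // Release resources before shutdown.
        super.stop();
    }

    @Override
    public Status process() throws EventDeliveryException {
        // Take events from the channel and write them to ClickHouse.
        return Status.READY;
    }
}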
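IV. Implementation notes

The configure method parses the configuration parameters. It receives a Context object, and context.getString("xxx") returns the value of a setting. For example, context.getString("host") returns the host parameter configured for ClickHouseSink.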

    @Override
    public void configure(Context context) {
        Preconditions.checkArgument(context.getString(HOST) != null && context.getString(HOST).length() > 0, "ClickHouse host must be specified!");
        this.host = context.getString(HOST);
        if (!this.host.startsWith("jdbc:clickhouse://")) {
            this.host = "jdbc:clickhouse://" + this.host;
        }
    }
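
The host check above, together with the URL format that start uses, can be tried outside Flume. The following is a minimal sketch using only the standard library; the class name JdbcUrlSketch and the method buildJdbcUrl are hypothetical and are not part of the sink itself.

```java
final class JdbcUrlSketch {
    private static final String PREFIX = "jdbc:clickhouse://";

    // Mirrors the host check in configure() and the "%s:%s/%s" format used in start().
    static String buildJdbcUrl(String host, String port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("ClickHouse host must be specified!");
        }
        // Add the JDBC prefix only when the configured host does not already carry it.
        String base = host.startsWith(PREFIX) ? host : PREFIX + host;
        return String.format("%s:%s/%s", base, port, database);
    }

    public static void main(String[] args) {
        // Example values are placeholders, not real hosts.
        System.out.println(buildJdbcUrl("ch.example.com", "8123", "logs"));
    }
}
```

With this shape, a host may be configured either as a bare hostname or as a full jdbc:clickhouse:// prefix, and both produce the same URL.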

The start method performs initialization at startup: it creates the ClickHouse data source and starts Flume's built-in counters.

    @Override
    public void start() {
        String jdbcUrl = String.format("%s:%s/%s", this.host, this.port, this.database);
        ClickHouseProperties properties = new ClickHouseProperties().withCredentials(this.user, this.password);
        this.dataSource = new BalancedClickhouseDataSource(jdbcUrl, properties);
        sinkCounter.start();
        super.start();
    }

The stop method handles cleanup before the sink exits.

    @Override
    public void stop() {
        logger.info("ClickHouse sink {} stopping", getName());
        sinkCounter.incrementConnectionClosedCount();
        sinkCounter.stop();
        super.stop();
    }

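The process method is the core of the sink. Note that Flume moves data transactionally, which is how it avoids losing data, so a custom sink must also consume events from the channel inside a transaction.

    @Override
    public Status process() throws EventDeliveryException {
        Status status = null;
        Channel ch = getChannel();
        Transaction txn = ch.getTransaction();
        txn.begin();
        try {
            // Take events from the channel and write them to ClickHouse here.
            txn.commit();
            status = Status.READY;
        } catch (Throwable t) {
            // Rolling back returns the taken events to the channel.
            txn.rollback();
            status = Status.BACKOFF;
        } finally {
            txn.close();
        }
        return status;
    }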
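
To make the commit-or-rollback behaviour concrete without a running Flume agent, here is a rough sketch that uses only the standard library. InMemoryTxn, BatchWriter, and drain are hypothetical stand-ins for Flume's channel transaction and the ClickHouse write; they imitate the pattern and are not Flume APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class TxnDrainSketch {
    /** Hypothetical stand-in for a channel transaction: taken events go back on rollback. */
    static final class InMemoryTxn {
        private final Deque<String> queue;
        private final List<String> taken = new ArrayList<>();

        InMemoryTxn(Deque<String> queue) { this.queue = queue; }

        String take() {
            String event = queue.pollFirst();
            if (event != null) taken.add(event);
            return event;
        }

        void commit() { taken.clear(); }

        void rollback() {
            // Re-insert in reverse so the original order is preserved.
            for (int i = taken.size() - 1; i >= 0; i--) queue.addFirst(taken.get(i));
            taken.clear();
        }
    }

    /** Hypothetical stand-in for the write to ClickHouse. */
    interface BatchWriter { void write(List<String> batch) throws Exception; }

    /** Takes up to batchSize events, writes them, and commits only if the write succeeded. */
    static boolean drain(Deque<String> queue, int batchSize, BatchWriter writer) {
        InMemoryTxn txn = new InMemoryTxn(queue);
        List<String> batch = new ArrayList<>();
        try {
            String event;
            while (batch.size() < batchSize && (event = txn.take()) != null) {
                batch.add(event);
            }
            if (!batch.isEmpty()) writer.write(batch);
            txn.commit();
            return !batch.isEmpty();
        } catch (Exception ex) {
            txn.rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(Arrays.asList("a", "b", "c"));
        drain(queue, 2, batch -> { throw new IllegalStateException("write failed"); });
        System.out.println(queue.size()); // prints 3: the failed batch was rolled back
        drain(queue, 2, batch -> { });
        System.out.println(queue.size()); // prints 1: two events were committed
    }
}
```

The point of the pattern is that a failed write leaves the events in the channel, so the next call to process can retry them.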

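ClickHouse supports many ways of writing data; see the official documentation for details. Here we write JSON, using the JSONEachRow format.

ClickHouseStatement sth = conn.createStatement();
sth.write()
    .table(String.format(" %s.%s", database, table))
    // Encode explicitly as UTF-8 instead of relying on the platform default charset.
    .data(new ByteArrayInputStream(batch.toString().getBytes("UTF-8")), ClickHouseFormat.JSONEachRow)
    .addDbParam(ClickHouseQueryParam.MAX_PARALLEL_REPLICAS, "2")
    .send();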
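
For illustration, the payload handed to the writer is just the event bodies joined with newlines, one JSON object per line. The sketch below assumes that every event body is already a complete single-line JSON object whose keys match the target table's column names; JsonEachRowSketch and toPayload are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class JsonEachRowSketch {
    // Joins event bodies into a JSONEachRow payload: one JSON object per line.
    static byte[] toPayload(List<byte[]> bodies) {
        StringBuilder batch = new StringBuilder();
        for (byte[] body : bodies) {
            batch.append(new String(body, StandardCharsets.UTF_8)).append('\n');
        }
        return batch.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample bodies are made up for the example.
        List<byte[]> bodies = Arrays.asList(
                "{\"id\":1,\"msg\":\"hello\"}".getBytes(StandardCharsets.UTF_8),
                "{\"id\":2,\"msg\":\"world\"}".getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(toPayload(bodies), StandardCharsets.UTF_8));
    }
}
```

If upstream events are not single-line JSON, they would need to be converted (for example in an interceptor) before reaching the sink.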

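Each batch should also update Flume's built-in counters, for example:

sinkCounter.addToEventDrainAttemptCount(count); // number of events about to be written
sinkCounter.addToEventDrainSuccessCount(count); // on success, add the same number to the success count
V. Building and packaging the jar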

Build with mvn package to produce flume-ng-clickhouse-sink-1.0.jar, then copy it into Flume's lib directory.

VI. Flume ClickHouse sink configuration
standard_storage.sinks.sink2ch.type = org.apache.flume.sink.clickhouse.ClickHouseSink
standard_storage.sinks.sink2ch.channel = channel2ch
standard_storage.sinks.sink2ch.host = xxxx.xxxxx.com
standard_storage.sinks.sink2ch.port = 8123
standard_storage.sinks.sink2ch.database = xxxxx
standard_storage.sinks.sink2ch.table = xxxxx
standard_storage.sinks.sink2ch.batchSize = 10000
standard_storage.sinks.sink2ch.user = xxxxxx
standard_storage.sinks.sink2ch.password = xxxxxxxxxxxx

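As in the example above, it is enough to configure the ClickHouse host, port, database, table, batchSize, user, and password. The type must be set to the fully qualified class name of ClickHouseSink.

VII. Complete code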

ClickHouseSink.java

package org.apache.flume.sink.clickhouse;

import com.google.common.base.Preconditions;
import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.instrumentation.SinkCounter;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import ru.yandex.clickhouse.BalancedClickhouseDataSource;
import ru.yandex.clickhouse.ClickHouseConnectionImpl;
import ru.yandex.clickhouse.ClickHouseStatement;
import ru.yandex.clickhouse.domain.ClickHouseFormat;
import ru.yandex.clickhouse.settings.ClickHouseProperties;
import ru.yandex.clickhouse.settings.ClickHouseQueryParam;

import java.io.ByteArrayInputStream;

import static org.apache.flume.sink.clickhouse.ClickHouseSinkConstants.*;

public class ClickHouseSink extends AbstractSink implements Configurable {

    private static final Logger logger = LoggerFactory.getLogger(ClickHouseSink.class);

    private BalancedClickhouseDataSource dataSource = null;
    private SinkCounter sinkCounter = null;
    private String host = null;
    private String port = null;
    private String user = null;
    private String password = null;
    private String database = null;
    private String table = null;

    private int batchSize;


    @Override
    public void configure(Context context) {

        if (sinkCounter == null) {
            sinkCounter = new SinkCounter(getName());
        }

        Preconditions.checkArgument(context.getString(HOST) != null && context.getString(HOST).length() > 0, "ClickHouse host must be specified!");
        this.host = context.getString(HOST);
        if (!this.host.startsWith("jdbc:clickhouse://")) {
            this.host = "jdbc:clickhouse://" + this.host;
        }

        Preconditions.checkArgument(context.getString(DATABASE) != null && context.getString(DATABASE).length() > 0, "ClickHouse database must be specified!");
        this.database = context.getString(DATABASE);
        Preconditions.checkArgument(context.getString(TABLE) != null && context.getString(TABLE).length() > 0, "ClickHouse table must be specified!");
        this.table = context.getString(TABLE);
        this.port = context.getString(PORT, DEFAULT_PORT);
        this.user = context.getString(USER, DEFAULT_USER);
        this.password = context.getString(PASSWORD, DEFAULT_PASSWORD);
        this.batchSize = context.getInteger(BATCH_SIZE, DEFAULT_BATCH_SIZE);
    }

    @Override
    public void start() {
        String jdbcUrl = String.format("%s:%s/%s", this.host, this.port, this.database);
        ClickHouseProperties properties = new ClickHouseProperties().withCredentials(this.user, this.password);
        //properties.setUseServerTimeZone(false);
        this.dataSource = new BalancedClickhouseDataSource(jdbcUrl, properties);
        sinkCounter.start();
        super.start();
    }


    @Override
    public void stop() {
        logger.info("ClickHouse sink {} stopping", getName());
        sinkCounter.incrementConnectionClosedCount();
        sinkCounter.stop();
        super.stop();
    }


    @Override
    public Status process() throws EventDeliveryException {
        Status status = null;

        // Start transaction
        Channel ch = getChannel();
        Transaction txn = ch.getTransaction();
        txn.begin();
        // try-with-resources closes the connection and statement even when the write fails
        try (ClickHouseConnectionImpl conn = (ClickHouseConnectionImpl) dataSource.getConnection();
             ClickHouseStatement sth = conn.createStatement()) {
            int count;
            StringBuilder batch = new StringBuilder();
            // Take up to batchSize events; each body becomes one JSONEachRow line
            for (count = 0; count < batchSize; ++count) {
                Event event = ch.take();
                if (event == null) {
                    break;
                }
                batch.append(new String(event.getBody(), "UTF-8")).append("\n");
            }
            if (count <= 0) {
                // Nothing to write: commit the empty transaction and back off
                sinkCounter.incrementBatchEmptyCount();
                txn.commit();
                return Status.BACKOFF;
            } else if (count < batchSize) {
                sinkCounter.incrementBatchUnderflowCount();
            } else {
                sinkCounter.incrementBatchCompleteCount();
            }
            sinkCounter.addToEventDrainAttemptCount(count);
            sth.write()
                .table(String.format(" %s.%s", database, table))
                // Encode explicitly as UTF-8 to match how the bodies were decoded
                .data(new ByteArrayInputStream(batch.toString().getBytes("UTF-8")), ClickHouseFormat.JSONEachRow)
                .addDbParam(ClickHouseQueryParam.MAX_PARALLEL_REPLICAS, "2")
                .send();
            // Count every event in the batch as drained, not just one
            sinkCounter.addToEventDrainSuccessCount(count);
            status = Status.READY;
            txn.commit();

        } catch (Throwable t) {
            txn.rollback();
            logger.error(t.getMessage(), t);
            status = Status.BACKOFF;
            // re-throw all Errors
            if (t instanceof Error) {
                throw (Error) t;
            }
        } finally {
            txn.close();
        }
        return status;
    }
}

ClickHouseSinkConstants.java

package org.apache.flume.sink.clickhouse;

public class ClickHouseSinkConstants {
    public static final String HOST = "host";
    public static final String PORT = "port";
    public static final String BATCH_SIZE = "batchSize";
    public static final String USER = "user";
    public static final String PASSWORD = "password";
    public static final String DATABASE = "database";
    public static final String TABLE = "table";
    public static final String DEFAULT_PORT = "8123";
    public static final int DEFAULT_BATCH_SIZE = 10000;
    public static final String DEFAULT_USER = "";
    public static final String DEFAULT_PASSWORD = "";
}