Writing to ClickHouse from Flink

1. Background

Querying tens of billions of log records per day in real time is a challenge. The architecture uses Kafka + Flink + ClickHouse + Redash to deliver real-time analysis over this volume of data. At the compute layer we built a real-time data platform on top of the Flink engine that simplifies development: schemas are generated dynamically from configuration, low-level data parsing is unified, and there is no need to reinvent the wheel. The whole pipeline, from ingestion through transformation, storage, and visualization, is assembled through configuration without writing a single line of code. This article focuses on the practice of writing real-time log data into ClickHouse.

2. Flink ClickHouse Sink

Add the ClickHouse JDBC driver to the project's pom.xml:

```xml
<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.1.50</version>
</dependency>
```

The sink extends RichSinkFunction<Row>. It buffers incoming rows and flushes them to ClickHouse in batches, either when the buffer reaches insertCkBatchSize rows or when insertCkTimeInterval milliseconds have passed since the last flush:

```java
public class ClickhouseSink extends RichSinkFunction<Row> implements Serializable {
    private String tablename;
    private String[] tableColums;
    private List<String> types;
    private String[] columns;
    private String username;
    private String password;
    private String[] ips;
    private String drivername = "ru.yandex.clickhouse.ClickHouseDriver";
    private List<Row> list = new ArrayList<>();
    private List<PreparedStatement> preparedStatementList = new ArrayList<>();
    private List<Connection> connectionList = new ArrayList<>();

    private long lastInsertTime = 0L;
    private long insertCkTimeInterval = 4000L; // flush interval in milliseconds
    private int insertCkBatchSize = 10000;     // rows per insert batch

    public ClickhouseSink(String tablename, String username, String password, String[] ips, String[] tableColums, List<String> types, String[] columns) {
        this.tablename = tablename;
        this.username = username;
        this.password = password;
        this.ips = ips;
        this.tableColums = tableColums;
        this.types = types;
        this.columns = columns;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        Class.forName(drivername);
        // One connection and one prepared INSERT per ClickHouse node (assumes HTTP port 8123 and the default database).
        String placeholders = String.join(",", Collections.nCopies(tableColums.length, "?"));
        String sql = "INSERT INTO " + tablename + " (" + String.join(",", tableColums) + ") VALUES (" + placeholders + ")";
        for (String ip : ips) {
            Connection connection = DriverManager.getConnection("jdbc:clickhouse://" + ip + ":8123/default", username, password);
            connectionList.add(connection);
            preparedStatementList.add(connection.prepareStatement(sql));
        }
    }

    @Override
    public void invoke(Row value, Context context) throws Exception {
        list.add(value);
        long now = System.currentTimeMillis();
        // Flush when the buffer is full or the flush interval has elapsed.
        if (list.size() >= insertCkBatchSize || now - lastInsertTime >= insertCkTimeInterval) {
            int index = (int) (now % connectionList.size()); // rotate writes across the nodes
            insertData(list, preparedStatementList.get(index), connectionList.get(index));
            list.clear();
            lastInsertTime = now;
        }
    }

    // Write one buffered batch of rows with a single executeBatch call.
    public void insertData(List<Row> rows, PreparedStatement preparedStatement, Connection connection) throws SQLException {
        for (int i = 0; i < rows.size(); ++i) {
            Row row = rows.get(i);
            for (int j = 0; j < this.tableColums.length; ++j) {
                if (null != row.getField(j)) {
                    preparedStatement.setObject(j + 1, row.getField(j));
                } else {
                    // Fall back to an empty string; a production sink would pick a default per column type (see this.types).
                    preparedStatement.setObject(j + 1, "");
                }
            }
            preparedStatement.addBatch();
        }
        preparedStatement.executeBatch();
    }

    @Override
    public void close() throws Exception {
        super.close();
        for (PreparedStatement ps : preparedStatementList) { ps.close(); }
        for (Connection connection : connectionList) { connection.close(); }
    }
}
```
3. A simpler JDBC-based sink

Data can also be written to ClickHouse through a plain JDBC connection. The steps are as follows:

1. Add the ClickHouse JDBC driver dependency to pom.xml:

```xml
<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.3.0</version>
</dependency>
```

2. Create the ClickHouse sink in the Flink program:

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import ru.yandex.clickhouse.ClickHouseConnection;
import ru.yandex.clickhouse.ClickHouseDataSource;

public class ClickHouseSink extends RichSinkFunction<String> {

    private static final long serialVersionUID = 1L;
    private static final Logger LOG = LoggerFactory.getLogger(ClickHouseSink.class);

    private ClickHouseConnection connection;
    private PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        // Initialize the ClickHouse connection
        ClickHouseDataSource dataSource =
                new ClickHouseDataSource("jdbc:clickhouse://<clickhouse-host>:<clickhouse-port>/<clickhouse-database>");
        connection = dataSource.getConnection();
        statement = connection.prepareStatement("INSERT INTO <clickhouse-table> (col1, col2, ...) VALUES (?, ?, ...)");
    }

    @Override
    public void invoke(String value, Context context) throws Exception {
        String[] fields = value.split(",");
        // Bind the PreparedStatement parameters
        statement.setString(1, fields[0]);
        statement.setInt(2, Integer.parseInt(fields[1]));
        // ... bind the remaining columns according to the table schema
        // Execute the insert
        statement.executeUpdate();
    }

    @Override
    public void close() throws Exception {
        super.close();
        // Close the ClickHouse connection
        if (statement != null) {
            statement.close();
        }
        if (connection != null) {
            connection.close();
        }
    }
}
```

3. Attach the sink to a data stream in the Flink program:

```java
DataStream<String> dataStream = ... // obtain the input stream
dataStream.addSink(new ClickHouseSink());
```

Here `<clickhouse-host>`, `<clickhouse-port>`, `<clickhouse-database>` and `<clickhouse-table>` are the ClickHouse host name, port, database name, and table name. The PreparedStatement parameters must be bound according to the actual table schema. Note that this variant issues one executeUpdate per record; for high-volume log ingestion, batched inserts (as in the sink above) are much friendlier to ClickHouse.
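The flink-connector-jdbc module shipped with newer Flink releases (1.11+) also provides a ready-made, batching JdbcSink. The snippet below is a minimal sketch, not taken from the original article; it assumes that dependency is on the classpath and uses a hypothetical log_table(col1, col2) schema with the same comma-separated string records as above.

```java
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JdbcSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Toy "col1,col2" CSV records standing in for the real input stream.
        DataStream<String> dataStream = env.fromElements("a,1", "b,2");

        dataStream.addSink(JdbcSink.sink(
                // Hypothetical table and columns; adjust to the real schema.
                "INSERT INTO log_table (col1, col2) VALUES (?, ?)",
                (statement, value) -> {
                    String[] fields = value.split(",");
                    statement.setString(1, fields[0]);
                    statement.setInt(2, Integer.parseInt(fields[1]));
                },
                JdbcExecutionOptions.builder()
                        .withBatchSize(1000)        // rows per batch
                        .withBatchIntervalMs(2000)  // flush at least every 2 s
                        .withMaxRetries(3)
                        .build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withUrl("jdbc:clickhouse://<clickhouse-host>:8123/<clickhouse-database>")
                        .withDriverName("ru.yandex.clickhouse.ClickHouseDriver")
                        .withUsername("<user>")
                        .withPassword("<password>")
                        .build()));

        env.execute("jdbc-sink-to-clickhouse");
    }
}
```

JdbcExecutionOptions takes over the batch-size and flush-interval bookkeeping that the hand-written sink in section 2 implements manually.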
