Flink SQL: Streaming Kafka Data into Iceberg in Real Time (Tested in Practice)
Preface
This post records a hands-on test: an HDFS path backs an Iceberg result table, Flink SQL consumes Kafka data in real time and writes it into the Iceberg table, and Hive serves as the client for real-time reads. Enough talk, on to the code!
Test environment:
JDK 1.8
Flink 1.11.1
Iceberg 0.11.0
Hadoop 3.0.0
Hive 2.1.1
Preparation before testing:
1. Install Flink 1.11.1
2. Download the Kafka and Hive integration JARs for Flink
3. Download iceberg-flink-runtime-0.11.0.jar
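For reference, after steps 2 and 3 the Flink lib/ directory should contain roughly the following. The exact artifact names, Scala suffix, and Hive-bundle version are assumptions that depend on what you download; the startup command below refers to the connector JARs by shortened names:

lib/flink-sql-connector-hive-2.2.0_2.11-1.11.1.jar
lib/flink-sql-connector-kafka_2.11-1.11.1.jar
lib/iceberg-flink-runtime-0.11.0.jar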
Test steps:
Start the Flink SQL client:
bin/sql-client.sh embedded -j lib/iceberg-flink-runtime-0.11.0.jar -j lib/flink-connector-hive.jar -j lib/flink-sql-connector-kafka.jar shell
Create a hive_catalog:
CREATE CATALOG hive_catalog WITH (
  'type'='iceberg',
  'catalog-type'='hive',
  'uri'='thrift://xxx:9083',
  'warehouse'='hdfs://nameservice1/user/hive/warehouse',
  'clients'='5',
  'property-version'='1'
);
Use the hive_catalog:
use catalog hive_catalog;
Create and use a database:
create database iceberg_db;
use iceberg_db;
Create the Iceberg target table:
CREATE TABLE hive_catalog.iceberg_db.iceberg01 (
  user_id STRING COMMENT 'user_id',
  order_amount DOUBLE COMMENT 'order_amount',
  log_ts STRING
);
Create a topic named source_kafka01 and write test data into it:
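If the brokers do not auto-create topics, create it explicitly first; a sketch (the partition and replication counts are arbitrary examples, and older clients use --zookeeper instead of --bootstrap-server):

kafka-topics.sh --create --bootstrap-server kafka-server:9092 --replication-factor 1 --partitions 1 --topic source_kafka01

Then produce the test records: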
kafka-console-producer.sh --broker-list kafka-server:9092 --topic source_kafka01
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:12:12"}
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:15:00"}
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:20:00"}
{"user_id":"a1111","order_amount":11.0,"log_ts":"2020-06-29 12:30:00"}
{"user_id":"a1111","order_amount":13.0,"log_ts":"2020-06-29 12:32:00"}
{"user_id":"a1112","order_amount":15.0,"log_ts":"2020-11-26 12:12:12"}
Create the Kafka streaming table using a Hive catalog:
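Note that the DDL below registers the table under a catalog named myhive, which has not been defined above; a minimal sketch of registering such a Flink Hive catalog and database (the hive-conf-dir path is an assumption for your environment):

CREATE CATALOG myhive WITH (
  'type'='hive',
  'hive-conf-dir'='/etc/hive/conf'
);
CREATE DATABASE IF NOT EXISTS myhive.mydatabase;

With that in place, the source table itself: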
CREATE TABLE myhive.mydatabase.source_kafka01 (
  user_id STRING,
  order_amount DOUBLE,
  log_ts TIMESTAMP(3)
) WITH (
  'connector'='kafka',
  'topic'='source_kafka01',
  'scan.startup.mode'='earliest-offset',
  'properties.bootstrap.servers'='kafka-server:9092',
  'properties.group.id'='testGroup',
  'format'='json'
);
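Before wiring it to the sink, you can sanity-check that the source table parses the JSON records:

select * from myhive.mydatabase.source_kafka01;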
Stream the Kafka source table into the Iceberg target table with an INSERT job (log_ts is TIMESTAMP(3) in the source but STRING in the sink, so cast it explicitly):
insert into hive_catalog.iceberg_db.iceberg01 select user_id, order_amount, CAST(log_ts AS STRING) from myhive.mydatabase.source_kafka01;
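One detail worth calling out: the Iceberg sink commits data files only when a Flink checkpoint completes, so the table stays empty unless checkpointing is enabled for the streaming job. In Flink 1.11 the simplest place to set it is flink-conf.yaml before starting the cluster (the interval is an arbitrary example):

execution.checkpointing.interval: 60s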
Check whether the data has been written:
select * from hive_catalog.iceberg_db.iceberg01;
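The preface also promises real-time reads through Hive. The steps below are a sketch based on the Iceberg 0.11 Hive integration; the JAR path is an assumption, the property names should be verified against your Iceberg version, and the table may additionally need the engine.hive.enabled=true property:

-- in the Hive CLI or beeline
ADD JAR /path/to/iceberg-hive-runtime-0.11.0.jar;
SET iceberg.mr.catalog=hive;
SELECT * FROM iceberg_db.iceberg01;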
Check the Iceberg table's data directory:

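A sketch of how to inspect the table's layout on HDFS (the path follows the warehouse location in the catalog definition above; actual file names will differ):

hdfs dfs -ls /user/hive/warehouse/iceberg_db.db/iceberg01
# data/      -- Parquet data files committed by the Flink job
# metadata/  -- *.metadata.json files, manifest lists, and manifest files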
Summary:
The Flink SQL integration with Iceberg still has rough edges, for example around column schema changes and controlling the size of generated files, but the community is improving it steadily. Stay tuned.
