1. Prerequisites
1) Download the flink-sql-connector-kafka jar and place it in Flink's lib directory, picking the build that matches your Flink and Kafka versions.
Central Repository: org/apache/flink/flink-sql-connector-kafka
2) Download the Flink Doris Connector jar and place it in Flink's lib directory, picking the build that matches your Flink and Doris versions (a sketch for registering the jars from Python follows the compatibility table below).
Central Repository: org/apache/doris
Version compatibility:
Connector Version | Flink Version | Doris Version | Java Version | Scala Version |
---|---|---|---|---|
1.0.3 | 1.11,1.12,1.13,1.14 | 0.15+ | 8 | 2.11,2.12 |
1.1.1 | 1.14 | 1.0+ | 8 | 2.11,2.12 |
1.2.1 | 1.15 | 1.0+ | 8 | - |
1.3.0 | 1.16 | 1.0+ | 8 | - |
1.4.0 | 1.15,1.16,1.17 | 1.0+ | 8 | - |
1.5.2 | 1.15,1.16,1.17,1.18 | 1.0+ | 8 | - |
1.6.2 | 1.15,1.16,1.17,1.18,1.19 | 1.0+ | 8 | - |
24.0.1 | 1.15,1.16,1.17,1.18,1.19,1.20 | 1.0+ | 8 | - |
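If you run the script on a local PyFlink mini-cluster instead of submitting it to a cluster whose lib directory already holds the jars, you can also register the jars from Python. A minimal sketch, assuming hypothetical paths and jar versions (substitute the files you actually downloaded):

from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# add_jars() expects file:// URLs; the jar names below are examples only
env.add_jars(
    "file:///opt/flink/lib/flink-sql-connector-kafka-3.1.0-1.18.jar",
    "file:///opt/flink/lib/flink-doris-connector-1.18-1.5.2.jar",
)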
2. Install PyFlink
pip install apache-flink
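The apache-flink package version should match your Flink cluster and the connector builds chosen above; for example, assuming a Flink 1.18 cluster (adjust to your version):

pip install apache-flink==1.18.1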
3. Read data from Kafka and write it to Doris
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

# Create the StreamExecutionEnvironment
env = StreamExecutionEnvironment.get_execution_environment()
# Create the StreamTableEnvironment for SQL/Table API work
t_env = StreamTableEnvironment.create(env)

# Define the Kafka source table (JSON records with fields f0..f3)
kafka_source_ddl = '''
CREATE TABLE kafka_source (
    f0 INT,
    f1 STRING,
    f2 STRING,
    f3 STRING
) WITH (
    'connector' = 'kafka',
    'topic' = 'ods_mid_test',
    'properties.bootstrap.servers' = '192.168.195.130:9092',
    'properties.group.id' = 'ods_group',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
)
'''
# Register the Kafka source table
t_env.execute_sql(kafka_source_ddl)
# Define the Doris sink table; `desc` is a reserved keyword in Flink SQL
# and must be quoted with backticks
doris_sink_ddl = """
CREATE TABLE doris_sink (
    id BIGINT,
    name STRING,
    `desc` STRING,
    email STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = '192.168.195.129:8030',
    'table.identifier' = 'test.user_access_controls',
    'username' = 'root',
    'password' = '123456',
    'sink.label-prefix' = 'doris_label_02'
)
"""
# Register the Doris sink table
t_env.execute_sql(doris_sink_ddl)
# Submit the INSERT job; execute_sql() submits it directly, so no
# separate env.execute() call is needed. wait() blocks for the lifetime
# of the (unbounded) streaming job.
t_env.execute_sql(
    "INSERT INTO doris_sink (id, name, `desc`, email) "
    "SELECT f0, f1, f2, f3 FROM kafka_source"
).wait()
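Two assumptions worth calling out: the target table test.user_access_controls must already exist in Doris with columns compatible with the sink schema, and sink.label-prefix should be unique per job run, since Doris uses it to label stream-load transactions. Because the source declares 'format' = 'json', fields are matched by name, so a message in the ods_mid_test topic would look like this hypothetical sample:

{"f0": 1, "f1": "alice", "f2": "test user", "f3": "alice@example.com"}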