Reading Data from Kafka and Writing It to Doris with PyFlink

I. Prerequisites

1. Download the flink-sql-connector-kafka jar and place it in Flink's lib directory, choosing the build that matches your Flink and Kafka versions:

Central Repository: org/apache/flink/flink-sql-connector-kafka

2. Download the Flink Doris Connector jar and place it in Flink's lib directory, choosing the build that matches your Flink and Doris versions (alternatively, the jars can be attached from PyFlink itself; see the snippet after the compatibility table):

Central Repository: org/apache/doris

Version compatibility:

| Connector Version | Flink Version                      | Doris Version | Java Version | Scala Version |
|-------------------|------------------------------------|---------------|--------------|---------------|
| 1.0.3             | 1.11, 1.12, 1.13, 1.14             | 0.15+         | 8            | 2.11, 2.12    |
| 1.1.1             | 1.14                               | 1.0+          | 8            | 2.11, 2.12    |
| 1.2.1             | 1.15                               | 1.0+          | 8            | -             |
| 1.3.0             | 1.16                               | 1.0+          | 8            | -             |
| 1.4.0             | 1.15, 1.16, 1.17                   | 1.0+          | 8            | -             |
| 1.5.2             | 1.15, 1.16, 1.17, 1.18             | 1.0+          | 8            | -             |
| 1.6.2             | 1.15, 1.16, 1.17, 1.18, 1.19       | 1.0+          | 8            | -             |
| 24.0.1            | 1.15, 1.16, 1.17, 1.18, 1.19, 1.20 | 1.0+          | 8            | -             |
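When running PyFlink locally, an alternative to copying the jars into $FLINK_HOME/lib is to attach them to the job from Python with add_jars(). A minimal sketch; the jar paths and file names below are assumptions, so substitute the builds you actually downloaded:

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# add_jars() ships the listed jars with the job, so they do not have to be
# copied into Flink's lib directory. Paths and versions are examples only.
env.add_jars(
    "file:///opt/jars/flink-sql-connector-kafka-3.2.0-1.19.jar",
    "file:///opt/jars/flink-doris-connector-1.19-24.0.1.jar",
)
```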

II. Install PyFlink

pip install apache-flink
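Note that pip-installed PyFlink bundles its own Flink distribution; for local runs, connector jars can also be dropped into that bundled lib directory instead of a separate $FLINK_HOME/lib. A quick way to locate it:

```python
# Print the lib directory of the Flink distribution bundled with PyFlink;
# connector jars copied here are picked up by local PyFlink jobs.
import os
import pyflink

print(os.path.join(os.path.dirname(os.path.abspath(pyflink.__file__)), "lib"))
```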

III. Reading from Kafka and Writing to Doris

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment


# Create the StreamExecutionEnvironment
env = StreamExecutionEnvironment.get_execution_environment()
# Create the StreamTableEnvironment on top of it
t_env = StreamTableEnvironment.create(env)

# Define the Kafka source table
kafka_source_ddl = '''
CREATE TABLE kafka_source (
    f0 INT,
    f1 STRING,
    f2 STRING,
    f3 STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'ods_mid_test',
  'properties.bootstrap.servers' = '192.168.195.130:9092',
  'properties.group.id' = 'ods_group',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
)
'''


# Execute the DDL to register the Kafka source table
t_env.execute_sql(kafka_source_ddl)


doris_sink_ddl = """
CREATE TABLE doris_sink (
id BIGINT,
name STRING,
desc STRING,
email STRING
)
WITH (
'connector' = 'doris',
'fenodes' = '192.168.195.129:8030',
'table.identifier' = 'test.user_access_controls',
'username' = 'root',
'password' = '123456',
'sink.label-prefix' = 'doris_label_02'
);

"""

# Execute the DDL to register the Doris sink table
t_env.execute_sql(doris_sink_ddl)




# Read from Kafka and continuously insert into Doris. Note that `desc` is a
# reserved keyword in Flink SQL and must be quoted with backticks.
# execute_sql() already submits the INSERT job, so no separate env.execute()
# is needed; wait() blocks until the (unbounded) job terminates.
t_env.execute_sql(
    "INSERT INTO doris_sink (id, name, `desc`, email) "
    "SELECT f0, f1, f2, f3 FROM kafka_source"
).wait()
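With 'format' = 'json', the source expects one JSON object per Kafka record, with field names matching the column names (f0 through f3). A minimal sketch of a test producer, assuming the third-party kafka-python package (pip install kafka-python):

```python
import json

from kafka import KafkaProducer

# Serialize each dict to a single-line JSON object, as the Flink source expects.
producer = KafkaProducer(
    bootstrap_servers="192.168.195.130:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(
    "ods_mid_test",
    {"f0": 1, "f1": "alice", "f2": "test user", "f3": "alice@example.com"},
)
producer.flush()
```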

The same pipeline can also be written in Java with Flink's DataStream API, using the Kafka connector as the source and the Doris connector's DorisSink as the sink. Since the topic already carries JSON, the simplest approach is to forward the raw JSON strings and let Doris stream load parse them (stream-load properties format=json and read_json_by_line=true). A sketch reusing the addresses and table from the PyFlink example; the builder and class names follow the Flink Doris Connector's DataStream API, but package paths have moved between connector releases, so check the version you installed:

```java
import java.util.Properties;

import org.apache.doris.flink.cfg.DorisExecutionOptions;
import org.apache.doris.flink.cfg.DorisOptions;
import org.apache.doris.flink.cfg.DorisReadOptions;
import org.apache.doris.flink.sink.DorisSink;
import org.apache.doris.flink.sink.writer.serializer.SimpleStringSerializer;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaToDoris {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The Doris sink commits data on checkpoints, so checkpointing must be enabled.
        env.enableCheckpointing(10_000);

        // Kafka source: consume the raw JSON strings from the topic.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("192.168.195.130:9092")
                .setTopics("ods_mid_test")
                .setGroupId("ods_group")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        DataStream<String> jsonStream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka_source");

        // Tell Doris stream load to parse each record as one JSON line.
        Properties loadProps = new Properties();
        loadProps.setProperty("format", "json");
        loadProps.setProperty("read_json_by_line", "true");

        DorisSink<String> dorisSink = DorisSink.<String>builder()
                .setDorisOptions(DorisOptions.builder()
                        .setFenodes("192.168.195.129:8030")
                        .setTableIdentifier("test.user_access_controls")
                        .setUsername("root")
                        .setPassword("123456")
                        .build())
                .setDorisReadOptions(DorisReadOptions.builder().build())
                .setDorisExecutionOptions(DorisExecutionOptions.builder()
                        .setLabelPrefix("doris_label_02")
                        .setStreamLoadProp(loadProps)
                        .build())
                .setSerializer(new SimpleStringSerializer())
                .build();

        jsonStream.sinkTo(dorisSink);
        env.execute("Kafka to Doris");
    }
}
```

If the records need reshaping before loading, map each JSON string into your own POJO (for example with Jackson's ObjectMapper), transform it, and serialize it back to a JSON string before the sink; the JSON field names must match the Doris column names.