Hudi CDC Ingestion Code Template

Write Method

CDC Data Synchronization

CDC data captures the complete change history of a database. There are currently two ways to ingest it into Hudi:

Option 1: use a cdc-connector to read the DB's binlog directly and write into Hudi. Advantage: no dependency on a message queue. Disadvantage: it puts load on the DB server.

Option 2: consume Kafka data with a CDC format and write into Hudi. Advantage: good scalability. Disadvantage: it depends on Kafka.

Note: if the upstream data cannot guarantee ordering, you need to specify the write.precombine.field option.
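As a sketch of what that looks like: when two records arrive with the same primary key, Hudi keeps the one with the larger precombine value. The `update_time` column below is hypothetical (not part of the schema used later) and stands in for whatever event-time or version column your source carries:

```sql
-- minimal sketch, assuming the source exposes an 'update_time' ordering column
create table hudi_sink_sketch (
  id bigint,
  update_time timestamp(3),
  primary key (id) not enforced
) with (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_flink/hudi_sink_sketch',
  'write.precombine.field' = 'update_time'
);
```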

1) Prepare the MySQL table

(1) Enable binlog in MySQL

(2) Create the table

create table stu3 (
  id int unsigned auto_increment primary key comment 'auto-increment id',
  name varchar(20) not null comment 'student name',
  school varchar(20) not null comment 'school name',
  nickname varchar(20) not null comment 'student nickname',
  age int not null comment 'student age',
  class_num int not null comment 'class size',
  phone bigint not null comment 'phone number',
  email varchar(64) comment 'home email address',
  ip varchar(32) comment 'IP address'
) engine=InnoDB default charset=utf8;
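To generate some change events to test the pipeline with, a few illustrative rows can be inserted and then modified (all values below are made up):

```sql
-- sample data; every statement produces binlog events
insert into stu3 (name, school, nickname, age, class_num, phone, email, ip)
values ('zhangsan', 'hdu', 'zs', 20, 30, 13800000001, 'zs@example.com', '10.0.0.1'),
       ('lisi',     'zju', 'ls', 21, 30, 13800000002, 'ls@example.com', '10.0.0.2');

-- an update produces an UPDATE change event
update stu3 set age = 22 where name = 'zhangsan';
```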

2) Add the Flink CDC dependencies

Upload the Flink connector jars to Flink's lib directory:
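The exact jars depend on your Flink, Kafka, and Hudi versions; the file names and version numbers below are placeholders, not the versions this article was written against:

```shell
# assumed paths and versions -- adjust to your environment
cp flink-sql-connector-mysql-cdc-2.4.1.jar  $FLINK_HOME/lib/
cp flink-sql-connector-kafka-1.17.1.jar     $FLINK_HOME/lib/
cp hudi-flink1.17-bundle-0.14.0.jar         $FLINK_HOME/lib/
# restart the Flink cluster so the new jars are picked up
```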

3) Read the MySQL binlog with Flink and write it to Kafka

(1) Create the MySQL CDC source table

create table stu3_binlog(
  id bigint not null,
  name string,
  school string,
  nickname string,
  age int not null,
  class_num int not null,
  phone bigint not null,
  email string,
  ip string,
  primary key (id) not enforced
) with (
  'connector' = 'mysql-cdc',
  'hostname' = 'hdp1',
  'port' = '3306',
  'username' = 'root',
  'password' = '',
  'database-name' = 'gmall',
  'table-name' = 'stu3'
);

(2) Create the Kafka sink table

create table stu3_binlog_sink_kafka(
  id bigint not null,
  name string,
  school string,
  nickname string,
  age int not null,
  class_num int not null,
  phone bigint not null,
  email string,
  ip string,
  primary key (id) not enforced
) with (
  'connector' = 'upsert-kafka',
  'topic' = 'cdc_mysql_stu3_sink',
  'properties.bootstrap.servers' = 'hdp1:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);

(3) Write the MySQL binlog to Kafka

insert into stu3_binlog_sink_kafka
select * from stu3_binlog;
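Before building the downstream job, it is worth confirming that change records are actually arriving in the topic. Assuming a standard Kafka installation on hdp1, the console consumer can tail it:

```shell
# inspect the topic; each record is the JSON-encoded row written by upsert-kafka
kafka-console-consumer.sh \
  --bootstrap-server hdp1:9092 \
  --topic cdc_mysql_stu3_sink \
  --from-beginning
```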

4) Read the Kafka data with Flink and write it into the Hudi data lake

(1) Create the Kafka source table

create table stu3_binlog_source_kafka(
  id bigint not null,
  name string,
  school string,
  nickname string,
  age int not null,
  class_num int not null,
  phone bigint not null,
  email string,
  ip string
) with (
  'connector' = 'kafka',
  'topic' = 'cdc_mysql_stu3_sink',
  'properties.bootstrap.servers' = 'hdp1:9092',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset',
  'properties.group.id' = 'testGroup'
);

(2) Create the Hudi target table

create table stu3_binlog_sink_hudi(
  id bigint not null,
  name string,
  `school` string,
  nickname string,
  age int not null,
  class_num int not null,
  phone bigint not null,
  email string,
  ip string,
  primary key (id) not enforced
)
partitioned by (`school`)
with (
  'connector' = 'hudi',
  'path' = 'hdfs://hdp1:8020/tmp/hudi_flink/stu3_binlog_sink_hudi',
  'table.type' = 'MERGE_ON_READ',
  'write.operation' = 'upsert',
  'write.precombine.field' = 'school'
);
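Since the table is MERGE_ON_READ, compaction behavior can also be tuned. The options below are optional extras for the WITH clause above, shown with their usual defaults; verify them against your Hudi version:

```sql
-- optional compaction tuning for a MOR table (values are the common defaults)
'compaction.async.enabled' = 'true',       -- run compaction inside the streaming job
'compaction.trigger.strategy' = 'num_commits',
'compaction.delta_commits' = '5'           -- compact every 5 delta commits
```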

(3) Write the Kafka data into Hudi

insert into stu3_binlog_sink_hudi
select * from stu3_binlog_source_kafka;
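Once the job is running, the Hudi table can be queried from the same SQL client to verify the pipeline end to end. The streaming-read hint below uses options supported by the Hudi Flink connector and assumes dynamic table options are enabled in your Flink SQL session:

```sql
-- batch read of the current snapshot
select * from stu3_binlog_sink_hudi;

-- continuous (streaming) read that follows new commits
select * from stu3_binlog_sink_hudi
/*+ OPTIONS('read.streaming.enabled' = 'true', 'read.streaming.check-interval' = '4') */;
```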
