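Common pitfalls with flink-cdc-doris
Background
I have recently been working on MySQL-to-Doris CDC. Whatever I tried, the Flink job started normally and reported no errors, yet no data was ever synchronized. I was stuck on this for 3 to 4 working days, which was frustrating!
The conclusion first: you must enable checkpointing, because Flink CDC for Doris submits its transaction requests based on checkpoints.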
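Checkpointing can be switched on in the Flink SQL client before the job is submitted. This is a minimal sketch, assuming a Flink release whose SQL client accepts quoted SET statements; the 10-second interval is an arbitrary example value, not something taken from the original setup.

```sql
-- Enable periodic checkpoints so the Doris sink has a point at which to commit.
-- '10s' is an assumed example interval; tune it for your own workload.
SET 'execution.checkpointing.interval' = '10s';
```

The same key can instead be set cluster-wide in the Flink configuration file, in which case every job picks it up without a SET statement.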
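MySQL to Doris
1 Enable the MySQL binlog. The steps are skipped here; there is plenty of material on this elsewhere.
2 Install and configure the Doris environment. This is also skipped, as it is outside the scope of this post.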
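Although the binlog setup itself is skipped, a quick check on the MySQL side can rule it out as the cause before looking at Flink. A small sketch using standard MySQL statements, run from any MySQL client connected to the source server:

```sql
-- Should report ON; otherwise there is no binlog for the CDC source to read.
SHOW VARIABLES LIKE 'log_bin';
-- The mysql-cdc connector reads row-level change events, so this should report ROW.
SHOW VARIABLES LIKE 'binlog_format';
```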
Creating the Doris database and table
Connect to port 9030 of the Doris FE with a MySQL client
CREATE DATABASE dfcf;
CREATE TABLE IF NOT EXISTS dfcf.expamle_tbl
(
id LARGEINT NOT NULL COMMENT "id",
code varchar(64),
name varchar(64),
type varchar(64)
)
UNIQUE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
Downloading the jar
With CDC, in my personal experience, the versions must match!
Download it from here:
https://repo.maven.apache.org/maven2/org/apache/doris/
Flink SQL
-- MySQL connection settings
create table data_source (
id int,
code varchar,
name varchar,
type varchar,
PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'server-time-zone' = 'Asia/Shanghai',
'hostname' = '*.*.*.*',
'port' = '3306',
'database-name' = 'dfcf',
'table-name' = 'stock_code',
'username' = 'root',
'password' = '****',
'jdbc.properties.useSSL' = 'false'
);
-- Doris connection settings
create table data_sink (
id int,
code varchar(110),
name varchar(110),
type varchar(110)
) WITH (
'connector' = 'doris',
'fenodes' = '192.168.1.210:8030',
'table.identifier' = 'dfcf2.stock_code',
'username' = 'root',
'password' = '',
'sink.label-prefix'='test_04'
);
insert into data_sink select * from data_source;
Note the 'server-time-zone' = 'Asia/Shanghai' setting above. Also, sink.label-prefix must be different for every submission.
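Troubleshooting
If CDC is not working and no error is reported, first change the Doris sink to print and check whether the MySQL data picked up by the job gets printed.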
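Swapping in a print sink might look like the sketch below. The table name debug_sink is hypothetical, and the column names are backtick-quoted only as a precaution against keyword clashes; the columns mirror data_source from the Flink SQL section.

```sql
-- Hypothetical debug table: same columns as data_source, written to stdout.
CREATE TABLE debug_sink (
  `id` INT,
  `code` STRING,
  `name` STRING,
  `type` STRING
) WITH (
  'connector' = 'print'
);

-- Route the CDC stream to the print sink instead of Doris.
INSERT INTO debug_sink SELECT * FROM data_source;
```

The printed rows go to the TaskManager's standard output. If rows appear there, the MySQL side is working and the problem lies on the Doris sink side.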
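You can also search the taskmanager logs for the keyword streamload, as in the figure below.
The log in the figure above is the full-sync log written when the Flink job starts. On its first start, Flink CDC synchronizes all historical data once, and it starts the incremental sync task after the full sync has finished.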