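Project scenario:
The project is a performance stress test of the data synchronization tool Flink CDC. During the test we found that the write parallelism was forced to 1 in the incremental sync phase. This happened with both the MySQL CDC connector and the MongoDB CDC connector, and on both Flink 1.13 and Flink 1.16. Even with a high default parallelism, only one subtask was writing, so throughput could not be scaled any further.
Problem description
In the incremental sync phase the write parallelism is forced to 1. The full (snapshot) sync phase is not affected.
The CDC test code is as follows: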
# Create a Flink yarn-session cluster
/opt/module/flink-1.13.6/bin/yarn-session.sh -nm flinkcdc_poc -d -s 8 -jm 2g -tm 2g
# Start the SQL client
# Load the jars for the CDC connectors and the sink connectors
/opt/module/flink-1.13.6/bin/sql-client.sh embedded -s yarn-session -yid application_1682239076448_0001 -j /opt/module/flink-1.13.6/lib/flink-sql-connector-mysql-cdc-2.2.1.jar -j /opt/module/flink-1.13.6/lib/mysql-connector-java-5.1.47.jar -j /opt/module/flink-1.13.6/lib/flink-connector-jdbc_2.11-1.13.0.jar -j /opt/module/flink-1.13.6/lib/flink-sql-connector-mongodb-cdc-2.2.1.jar -j /opt/module/flink-1.13.6/lib/flink-sql-connector-sqlserver-cdc-2.2.1.jar -j /opt/module/flink-1.13.6/lib/flink-connector-clickhouse-1.13.2-SNAPSHOT.jar
# Set the basic environment options
set 'parallelism.default' = '8';
set 'execution.checkpointing.interval' = '3s';
set 'execution.checkpointing.tolerable-failed-checkpoints' = '20';
set 'table.local-time-zone' = 'Asia/Shanghai';
set 'table.dynamic-table-options.enabled' = 'true';
set 'pipeline.operator-chaining'='false';
# Create the source table
CREATE TABLE sbtest1_source (
`id` int,
`k` int,
`c` char(120),
`pad` char(60),
PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc',
'hostname' = 'XXXXXX',
'port' = '3306',
'username' = 'XXXXXX',
'password' = 'XXXXXX',
'server-time-zone' = 'Asia/Shanghai',
'database-name' = 'sysbench',
'table-name' = 'sbtest1'
);
# Create the sink table
CREATE TABLE `sbtest1_sink` (
`id` int,
`k` int,
`c` char(120),
`pad` char(60),
PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
'url' = 'jdbc:mysql://XXXXXX:3306/sysbench_sink?useSSL=false&autoReconnect=true',
'driver' = 'com.mysql.cj.jdbc.Driver',
'table-name' = 'sbtest1',
'username' = 'XXXXXX',
'password' = 'XXXXXX',
'lookup.cache.max-rows' = '3000',
'lookup.cache.ttl' = '10s',
'lookup.max-retries' = '3',
'sink.parallelism' = '8'
);
insert into sbtest1_sink(id,k,c,pad) select id,k,c,pad from sbtest1_source;
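Cause analysis:
In the incremental phase the binlog is read by a single source subtask, because the embedded Debezium engine must preserve binlog order (this ordering constraint does not apply to the snapshot phase, which is why that phase is unaffected). Since the source and the sink run with the same parallelism, Flink connects them with FORWARD partitioning: each source subtask sends records only to the sink subtask with the same index. As a result only one sink subtask receives incremental data, and the sink effectively also runs at a parallelism of 1.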
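Solution:
1. If the full data volume is small (at a 1-core/2 GB spec, a single thread can sync a table with tens of millions of rows within minutes), set the source parallelism to 1 and set the sink parallelism as needed.
2. If the full data volume is large, set the source parallelism to a value smaller than the sink parallelism (by changing sink.parallelism).
In both cases the point is that the two parallelisms differ, so Flink can no longer use FORWARD partitioning and has to redistribute the incremental records across all sink subtasks.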
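Both solutions can be sketched in the SQL client session above. This is a minimal sketch, assuming the mysql-cdc 2.2.1 source simply runs at 'parallelism.default' (we are not aware of a separate source-parallelism option for it in that version). The values 1 and 2 are illustrative, not tuned.

```sql
-- Solution 1: read with one source subtask (the binlog phase is
-- single-threaded anyway) and write with 8 sink subtasks.
-- Assumption: the mysql-cdc source takes its parallelism from parallelism.default.
SET 'parallelism.default' = '1';

-- Solution 2 (large snapshot): keep the source parallelism above 1 but
-- still below the sink's, e.g. 2 instead of 1.
-- SET 'parallelism.default' = '2';

-- sbtest1_sink already declares 'sink.parallelism' = '8'; the OPTIONS hint
-- only makes the per-statement override explicit and relies on
-- 'table.dynamic-table-options.enabled' = 'true' being set earlier.
INSERT INTO sbtest1_sink /*+ OPTIONS('sink.parallelism' = '8') */
SELECT id, k, c, pad FROM sbtest1_source;
```

Because sbtest1_sink declares a primary key and the CDC stream carries updates and deletes, we expect Flink 1.13 to insert a hash shuffle on `id` between source and sink once their parallelisms differ, which keeps all changes to one row on the same sink subtask. It is worth confirming in the job graph of the Flink web UI that the edge is now HASH rather than FORWARD.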
Reference link: https://blog.51cto.com/u_15127651/3800041