Flink CDC
The Flink community developed the flink-cdc-connectors component, a source connector that reads both full snapshot data and incremental change data directly from databases such as MySQL and PostgreSQL. It is open source.
Flink CDC is built on top of Debezium.
Advantages of Flink CDC over other tools:
① It captures change data directly into the Flink program as a stream, avoiding an extra hop through Kafka or another message queue, and it also supports syncing historical data, which makes it more convenient to use.
② Resumable reads (exactly-once position recovery):
Flink CDC stores the binlog read position as Flink state in checkpoints (CK). To resume from where it left off, restart the program from a Checkpoint or Savepoint.
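As a sketch, resuming from a savepoint is done with the standard Flink CLI. The job ID, savepoint directory, class name, and jar path below are placeholders, not values from this article:

```shell
# Trigger a savepoint for the running job; the CDC source's binlog
# position is part of the state that gets written out.
flink savepoint <jobId> hdfs:///flink/savepoints

# Restart the job from that savepoint; Flink CDC resumes reading
# the binlog from the saved position instead of starting over.
flink run -s hdfs:///flink/savepoints/savepoint-xxxx -c CDCTest myJob.jar
```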
Test demo
// Imports assume the flink-cdc-connectors 2.x legacy SourceFunction API
// (com.ververica.cdc.*); adjust the package prefix for other versions.
import com.ververica.cdc.connectors.mysql.MySqlSource
import com.ververica.cdc.connectors.mysql.table.StartupOptions
import com.ververica.cdc.debezium.StringDebeziumDeserializationSchema
import org.apache.flink.streaming.api.scala._

object CDCTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    val mySqlSource = MySqlSource.builder[String]()
      .hostname("localhost")
      .port(3306)
      .username("root")
      .password("dylan")
      .databaseList("default")
      .tableList("default.test")
      .startupOptions(StartupOptions.latest())
      .deserializer(new StringDebeziumDeserializationSchema)
      .build()

    val dataStream = env.addSource(mySqlSource)
    dataStream.print()
    env.execute()
  }
}
-- databaseList can list multiple databases
-- tableList may be omitted, in which case all tables in the databaseList databases are monitored
-- tableList entries use the database.tableName format
-- startupOptions
.initial() -- take an initial full snapshot, then continuously monitor changes
.earliest() -- replay changes from the earliest available binlog offset; requires the binlog to contain the entire history of the monitored databases/tables
.latest() -- start from the latest record and continuously monitor changes
.specificOffset(String specificOffsetFile, int specificOffsetPos)
-- start from the given binlog offset and continuously monitor changes
.timestamp(long startupTimestampMillis)
-- start from the given timestamp and continuously monitor changes
Flink CDC SQL
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tabEnv = StreamTableEnvironment.create(env)

tabEnv.executeSql(
  """CREATE TABLE user_info (
    |  id INT,
    |  name STRING,
    |  phone_num STRING
    |) WITH (
    |  'connector' = 'mysql-cdc',
    |  'scan.startup.mode' = 'latest-offset',
    |  'hostname' = 'localhost',
    |  'port' = '3306',
    |  'username' = 'root',
    |  'password' = '000000',
    |  'database-name' = 'gmall-flink',
    |  'table-name' = 'z_user_info'
    |)""".stripMargin)

tabEnv.executeSql("SELECT * FROM user_info").print()
// Only the latest-offset and initial startup modes are supported
// Only a single database and single table per CREATE TABLE
// Supported in Flink 1.13 and later