Flink CDC (Part 1): Basics

Flink CDC

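1. Overview

CDC stands for Change Data Capture. The core idea is to monitor and capture changes in a database (inserts, updates, and deletes of rows or tables), record those changes completely and in the order they occurred, and write them to message middleware so that other services can subscribe to and consume them.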
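To make the idea concrete, here is a minimal sketch in plain Java, assuming a simplified change event that carries only an operation type, a primary key, and the new value. The `ChangeLogDemo`, `ChangeEvent`, and `Op` names are hypothetical and are not part of Flink CDC or Debezium. It shows why the order of the captured changes matters: a consumer that replays the log in order arrives at the same state as the source table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these types do not exist in Flink CDC.
final class ChangeLogDemo {
    enum Op { INSERT, UPDATE, DELETE }

    // One captured change: the operation, the primary key, and the new value (null for DELETE).
    static final class ChangeEvent {
        final Op op;
        final long id;
        final String value;

        ChangeEvent(Op op, long id, String value) {
            this.op = op;
            this.id = id;
            this.value = value;
        }
    }

    // Replays the events in capture order and returns the resulting table contents.
    static Map<Long, String> replay(List<ChangeEvent> log) {
        Map<Long, String> table = new LinkedHashMap<>();
        for (ChangeEvent e : log) {
            switch (e.op) {
                case INSERT:
                case UPDATE:
                    table.put(e.id, e.value);
                    break;
                case DELETE:
                    table.remove(e.id);
                    break;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<ChangeEvent> log = new ArrayList<>();
        log.add(new ChangeEvent(Op.INSERT, 1L, "apple"));
        log.add(new ChangeEvent(Op.INSERT, 2L, "pear"));
        log.add(new ChangeEvent(Op.UPDATE, 1L, "apple-v2"));
        log.add(new ChangeEvent(Op.DELETE, 2L, null));
        System.out.println(replay(log)); // prints {1=apple-v2}
    }
}
```

Real CDC events usually also carry the row image before the change, the source position, and a timestamp; those are left out here to keep the sketch short.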


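The Flink community developed the flink-cdc-connectors component, a source connector that reads both the full snapshot and the incremental change data directly from databases such as MySQL and PostgreSQL. It is open source.
Source code: https://github.com/ververica/flink-cdc-connectors

Documentation: https://ververica.github.io/flink-cdc-connectors/master/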

Supported databases:

[Figure: table of supported databases]

Version information:

[Figure: version information table]

2. DataStream API

package com.yyds;

import com.alibaba.ververica.cdc.connectors.mysql.MySQLSource;
import com.alibaba.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.alibaba.ververica.cdc.debezium.DebeziumSourceFunction;
import com.alibaba.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FlinkCDC {


    public static void main(String[] args) throws Exception {
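        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Enable checkpointing, with one checkpoint every 5 seconds
        env.enableCheckpointing(5000L);

        // Set the consistency semantics of the checkpoints
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

        // Set the checkpoint timeout (the 10-second value is an assumed placeholder; tune it for your job)
        env.getCheckpointConfig().setCheckpointTimeout(10000L);

        // Retain the last checkpoint when the job is cancelled
        env.getCheckpointConfig().enableExternalizedCheckpoints(
            CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Set the state backend
        env.setStateBackend(new FsStateBackend("hdfs://centos01:8020/flinkCDC"));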


        // 2. Build the SourceFunction with CDC and read the data
        /*
         * Startup modes:
         *
         * initial (default): takes an initial snapshot of the monitored tables on first startup,
         * then continues reading the latest binlog.
         *
         * latest-offset: never takes a snapshot on first startup; reads from the end of the binlog,
         * so only changes made after the connector started are captured.
         *
         * timestamp: never takes a snapshot on first startup; reads the binlog from the specified timestamp.
         * The consumer traverses the binlog from the beginning and ignores change events whose timestamp
         * is smaller than the specified timestamp.
         *
         * specific-offset: never takes a snapshot on first startup; reads the binlog from the specified offset.
         */
        DebeziumSourceFunction<String> mySQLSource = MySQLSource.<String>builder()
                .hostname("centos01")
                .port(3306)
                .username("root")
                .password("123456")
                .databaseList("flink")
                // Optional. If omitted, all tables in the databases configured above are read.
                // Note: table names must be given in the "db.table" form.
                .tableList("flink.base_trademark")
                .deserializer(new StringDebeziumDeserializationSchema())
                .startupOptions(StartupOptions.initial())
                .build();


        DataStreamSource<String> streamSource = env.addSource(mySQLSource);


        // 3. Print the data
        streamSource.print();

        // 4. Start the job
        env.execute("FlinkCDC");

    }
}
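The four startup modes described in the comment of the program above can be pictured with a small simulation. This is a toy model in plain Java, not connector code: `StartupModeDemo`, `Mode`, `BinlogEvent`, and `emitted` are hypothetical names, and the binlog is reduced to a list of events with an offset and a timestamp. It only illustrates which records each mode would deliver under those assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the startup modes; nothing here calls Flink CDC.
final class StartupModeDemo {
    enum Mode { INITIAL, LATEST_OFFSET, TIMESTAMP, SPECIFIC_OFFSET }

    // One binlog entry: its position in the log, its commit time, and a description of the change.
    static final class BinlogEvent {
        final long offset;
        final long timestampMs;
        final String change;

        BinlogEvent(long offset, long timestampMs, String change) {
            this.offset = offset;
            this.timestampMs = timestampMs;
            this.change = change;
        }
    }

    /**
     * Returns what a source would emit in the given mode.
     * snapshotRows: table contents at startup.
     * endOffsetAtStartup: offset of the first binlog event written after startup.
     * startParam: the timestamp for TIMESTAMP, the offset for SPECIFIC_OFFSET, ignored otherwise.
     */
    static List<String> emitted(Mode mode, List<String> snapshotRows, List<BinlogEvent> binlog,
                                long endOffsetAtStartup, long startParam) {
        List<String> out = new ArrayList<>();
        if (mode == Mode.INITIAL) {
            // Only the initial mode reads the existing rows first.
            for (String row : snapshotRows) {
                out.add("snapshot:" + row);
            }
        }
        for (BinlogEvent e : binlog) {
            boolean keep;
            switch (mode) {
                case INITIAL:
                case LATEST_OFFSET:
                    keep = e.offset >= endOffsetAtStartup;
                    break;
                case TIMESTAMP:
                    keep = e.timestampMs >= startParam;
                    break;
                default: // SPECIFIC_OFFSET
                    keep = e.offset >= startParam;
                    break;
            }
            if (keep) {
                out.add("binlog:" + e.change);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> snapshot = Arrays.asList("a", "b");
        List<BinlogEvent> binlog = Arrays.asList(
                new BinlogEvent(0, 1000L, "insert a"),
                new BinlogEvent(1, 2000L, "insert b"),
                new BinlogEvent(2, 3000L, "update a")); // written after the connector started
        // prints [snapshot:a, snapshot:b, binlog:update a]
        System.out.println(emitted(Mode.INITIAL, snapshot, binlog, 2, 0));
        // prints [binlog:update a]
        System.out.println(emitted(Mode.LATEST_OFFSET, snapshot, binlog, 2, 0));
        // prints [binlog:insert b, binlog:update a]
        System.out.println(emitted(Mode.TIMESTAMP, snapshot, binlog, 2, 2000L));
    }
}
```

In the program above the mode is chosen with `StartupOptions.initial()`, which corresponds to the first case of this model.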
```shell
# Start Flink in standalone mode (port 8081)
[root@centos01 flink-1.13.1]# bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host centos01.
Starting taskexecutor daemon on host centos02.
Starting taskexecutor daemon on host centos03.

# Run the jar
[root@centos01 flink-1.13.1]# bin/flink run -m centos01:8081 -c com.yyds.FlinkCDC ./flink-cdc-1.0.jar

# Trigger a savepoint
[root@centos01 flink-1.13.1]# bin/flink savepoint 88acb63b3d39ab4b6749e7378259676c hdfs://centos01:8020/flinkCDC/savepoint
Triggering savepoint for job 88acb63b3d39ab4b6749e7378259676c.
Waiting for response...
Savepoint completed. Path: hdfs://centos01:8020/flinkCDC/savepoint/savepoint-88acb6-97b82909494b
You can resume your program from this savepoint with the run command.

# Restart from the savepoint so the job resumes where it left off
[root@centos01 flink-1.13.1]# bin/flink run -m centos01:8081 -s hdfs://centos01:8020/flinkCDC/savepoint/savepoint-88acb6-97b82909494b -c com.yyds.FlinkCDC ./flink-cdc-1.0.jar
```
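Resuming from a savepoint works because the source's read position is part of the saved state. The sketch below imitates that in plain Java, assuming the only state is the next position to read. `ResumeDemo`, `OffsetSource`, `poll`, and `snapshotState` are hypothetical names chosen for this illustration and are not the connector's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ResumeDemo {
    // Hypothetical stand-in for a source whose only state is the next binlog position to read.
    static final class OffsetSource {
        private final List<String> binlog;
        private int nextOffset;

        OffsetSource(List<String> binlog, int restoredOffset) {
            this.binlog = binlog;
            this.nextOffset = restoredOffset;
        }

        // Reads up to max events and advances the offset past them.
        List<String> poll(int max) {
            List<String> out = new ArrayList<>();
            while (out.size() < max && nextOffset < binlog.size()) {
                out.add(binlog.get(nextOffset++));
            }
            return out;
        }

        // The value a savepoint would persist for this source.
        int snapshotState() {
            return nextOffset;
        }
    }

    public static void main(String[] args) {
        List<String> binlog = Arrays.asList("e0", "e1", "e2", "e3", "e4");

        OffsetSource first = new OffsetSource(binlog, 0);
        System.out.println(first.poll(2)); // prints [e0, e1]
        int savepoint = first.snapshotState(); // the job is stopped after the savepoint

        // A new job restored from the savepoint continues with the next unread event.
        OffsetSource restored = new OffsetSource(binlog, savepoint);
        System.out.println(restored.poll(10)); // prints [e2, e3, e4]
    }
}
```

No event is read twice and none is skipped, which is the behaviour the `-s` option above is meant to provide for the real job.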

3. Flink SQL

package com.yyds;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class FlinkCDCWithSql {
    public static void main(String[] args) throws Exception {
        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        // 2. Create the table with DDL
        tableEnv.executeSql("CREATE TABLE binlog (" +
                " id BIGINT NOT NULL," +
                " tm_name STRING," +
                " logo_url STRING" +
                ") WITH (" +
                " 'connector' = 'mysql-cdc'," +
                " 'hostname' = 'centos01'," +
                " 'port' = '3306'," +
                " 'username' = 'root'," +
                " 'password' = '123456'," +
                " 'database-name' = 'flink'," +
                " 'table-name' = 'base_trademark'" +
                ")");

        // 3. Query the data
        Table table = tableEnv.sqlQuery("select * from binlog");

        // 4. Convert the dynamic table into a retract stream
        DataStream<Tuple2<Boolean, Row>> retractStream = tableEnv.toRetractStream(table, Row.class);

        retractStream.print();
//        tableEnv.executeSql("select * from binlog").print();

        // 5. Start the job
        env.execute("FlinkCDCSQL");

    }
}
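The `Tuple2<Boolean, Row>` elements of the retract stream in the program above pair each row with a flag: `true` adds the row to the result and `false` retracts a row emitted earlier, so an update arrives as a retraction of the old row followed by an addition of the new one. The following plain-Java sketch shows how a consumer could materialize such a stream. `RetractDemo`, `msg`, and `materialize` are hypothetical names, and rows are simplified to strings.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RetractDemo {
    // Builds one (flag, row) message; stands in for Flink's Tuple2<Boolean, Row>.
    static Map.Entry<Boolean, String> msg(boolean add, String row) {
        return new SimpleEntry<Boolean, String>(add, row);
    }

    // Applies the messages in order: true adds the row, false removes one earlier copy of it.
    static List<String> materialize(List<Map.Entry<Boolean, String>> stream) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Boolean, String> m : stream) {
            if (m.getKey()) {
                result.add(m.getValue());
            } else {
                result.remove(m.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<Boolean, String>> stream = new ArrayList<>();
        stream.add(msg(true, "1:Apple"));      // insert of row 1
        stream.add(msg(true, "2:Pear"));       // insert of row 2
        stream.add(msg(false, "1:Apple"));     // update of row 1: retract the old version...
        stream.add(msg(true, "1:Apple Inc"));  // ...then add the new version
        stream.add(msg(false, "2:Pear"));      // delete of row 2
        System.out.println(materialize(stream)); // prints [1:Apple Inc]
    }
}
```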

pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>gmall2021</artifactId>
        <groupId>org.yyds</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>flink-cdc</artifactId>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.13.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.13.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>1.13.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.1.3</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.49</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba.ververica</groupId>
            <artifactId>flink-connector-mysql-cdc</artifactId>
            <version>1.2.0</version>
        </dependency>


        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.12</artifactId>
            <version>1.13.1</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.75</version>
        </dependency>

    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>


</project>