Getting Started with Flink CDC

Definition

CDC stands for Change Data Capture. The core idea is to monitor and capture changes in a database (inserts, updates, and deletes to data or tables), record those changes in full in the order they occur, and write them to a message broker so that other services can subscribe to and consume them.
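A captured change is typically modeled as an event carrying the row's before/after images plus the operation type. A minimal sketch (field names follow the Debezium-style format that appears in the output later in this article):

```json
{
  "op": "u",
  "before": { "id": "1002", "name": "java页" },
  "after":  { "id": "1002", "name": "java-ye" },
  "ts_ms": 1697376064189
}
```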

Prerequisites

1. Start ZooKeeper

[root@only ~]# /opt/zookeeper-3.8.0/bin/zkServer.sh start

2. Start Hadoop

[root@only conf]# /opt/hadoop-3.3.2/sbin/start-all.sh

3. Complete the Flink configuration prerequisites (this is my setup; no need to copy it verbatim) - create the corresponding HDFS directories

hdfs dfs -mkdir -p /flink/dan /flink-checkpoints /flink-savepoints /flink_history/

4. Start Flink

[root@only bin]# /opt/flink-1.14.4/bin/start-cluster.sh
Starting HA cluster with 1 masters.
Starting standalonesession daemon on host only.
Starting taskexecutor daemon on host only.

5. Check the processes

48274 org.apache.flink.runtime.taskexecutor.TaskManagerRunner --configDir /opt/flink-1.14.4/conf -D taskmanager.memory.network.min=134217730b -D taskmanager.cpu.cores=4.0 -D taskmanager.memory.task.off-heap.size=0b -D taskmanager.memory.jvm-metaspace.size=268435456b -D external-resources=none -D taskmanager.memory.jvm-overhead.min=201326592b -D taskmanager.memory.framework.off-heap.size=134217728b -D taskmanager.memory.network.max=134217730b -D taskmanager.memory.framework.heap.size=134217728b -D taskmanager.memory.managed.size=536870920b -D taskmanager.memory.task.heap.size=402653174b -D taskmanager.numberOfTaskSlots=4 -D taskmanager.memory.jvm-overhead.max=201326592b
22006 org.apache.hadoop.yarn.server.nodemanager.NodeManager
20696 org.apache.hadoop.hdfs.server.namenode.NameNode
47960 org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint --configDir /opt/flink-1.14.4/conf --executionMode cluster --host only --webui-port 8081 -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=1073741824b -D jobmanager.memory.jvm-overhead.max=201326592b
45369 org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/zookeeper-3.8.0/bin/../conf/zoo.cfg
21370 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
57850 sun.tools.jps.Jps -ml
20959 org.apache.hadoop.hdfs.server.datanode.DataNode
21807 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

Create a database user and grant privileges

1. Create a database for Flink CDC

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| addToDoris         |
| canalShit          |
| dolphinscheduler   |
| hivedb             |
| kylin              |
| mysql              |
| performance_schema |
| shit               |
| sys                |
+--------------------+
10 rows in set (0.03 sec)

mysql> create database if not exists flinkcdc;
Query OK, 1 row affected (0.02 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| addToDoris         |
| canalShit          |
| dolphinscheduler   |
| flinkcdc           |
| hivedb             |
| kylin              |
| mysql              |
| performance_schema |
| shit               |
| sys                |
+--------------------+
11 rows in set (0.00 sec)

2. Create the user

mysql> create user 'flinkcdc'@'%' identified by '$PASSWORD';
Query OK, 0 rows affected (0.03 sec)

3. Grant privileges to the user

mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'flinkcdc' IDENTIFIED BY '$PASSWORD';
ERROR 1044 (42000): Access denied for user 'root'@'localhost' to database 'flinkcdc'
# Log in again, specifying the host with -h
[root@only hadoop-3.3.2]# mysql -uroot -p${PASSWORD} -h only
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.7.36-log MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

-- Without login privileges
mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'flinkcdc' IDENTIFIED BY '$PASSWORD';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)


-- With login privileges
mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'flinkcdc' IDENTIFIED BY '$PASSWORD' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)
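To verify the grants took effect, you can inspect them from any session (sketch):

```sql
SHOW GRANTS FOR 'flinkcdc'@'%';
```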

4. Log in as the new user and list the databases

[root@only hadoop-3.3.2]# mysql -uflinkcdc -p${PASSWORD} -h only
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.7.36-log MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| addToDoris         |
| canalShit          |
| dolphinscheduler   |
| flinkcdc           |
| hivedb             |
| kylin              |
| mysql              |
| performance_schema |
| shit               |
| sys                |
+--------------------+
11 rows in set (0.03 sec)

5. Create a table

CREATE TABLE flinkcdc.info(
`id` varchar(255) primary key,
`name` varchar(255),
`sex` varchar(255)
);
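Optionally seed a row so the initial snapshot has something to emit (the id/name values here match the output shown later and are otherwise arbitrary):

```sql
INSERT INTO flinkcdc.info (id, name, sex)
VALUES ('1000', 'javaye', 'male');
```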

6. Update the MySQL configuration

[root@only ~]# cat /etc/my.cnf
[client]
port = 3306
default-character-set=utf8
[mysqld]
basedir = /home/mysql/mysql-5.7.36
datadir = /home/mysql/mysql-5.7.36/data
port = 3306
character-set-server=utf8
default_storage_engine = InnoDB
server-id=1
log-bin=mysql-bin
binlog_format=row
# binlog-do-db enables row-level logging for the named databases; if omitted, all databases get row-level logging
binlog-do-db=canalShit
binlog-do-db=flinkcdc
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

7. Restart the MySQL service

[root@only ~]# systemctl restart mysql
[root@only ~]# systemctl status mysql
● mysqld.service - LSB: start and stop MySQL
   Loaded: loaded (/etc/rc.d/init.d/mysqld; bad; vendor preset: disabled)
   Active: active (running) since 一 2023-10-16 14:47:05 CST; 7s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 56499 ExecStop=/etc/rc.d/init.d/mysqld stop (code=exited, status=0/SUCCESS)
  Process: 56945 ExecStart=/etc/rc.d/init.d/mysqld start (code=exited, status=0/SUCCESS)
    Tasks: 30
   Memory: 201.6M
   CGroup: /system.slice/mysqld.service
           ├─56960 /bin/sh /home/mysql/mysql-5.7.36/bin/mysqld_safe --datadir...
           └─57192 /home/mysql/mysql-5.7.36/bin/mysqld --basedir=/home/mysql/...

1016 14:47:04 only systemd[1]: Starting LSB: start and stop MySQL...
1016 14:47:05 only mysqld[56945]: Starting MySQL. SUCCESS!
1016 14:47:05 only systemd[1]: Started LSB: start and stop MySQL.

8. Verify the configuration took effect

[root@only ~]# ls /home/mysql/mysql-5.7.36/data
addToDoris        kylin             mysql-bin.000014  mysql-bin.000029
auto.cnf          mysql             mysql-bin.000015  mysql-bin.000030
ca-key.pem        mysql-bin.000001  mysql-bin.000016  mysql-bin.000031
canalShit         mysql-bin.000002  mysql-bin.000017  mysql-bin.index
ca.pem            mysql-bin.000003  mysql-bin.000018  only.err
client-cert.pem   mysql-bin.000004  mysql-bin.000019  only.pid
client-key.pem    mysql-bin.000005  mysql-bin.000020  performance_schema
dolphinscheduler  mysql-bin.000006  mysql-bin.000021  private_key.pem
flinkcdc          mysql-bin.000007  mysql-bin.000022  public_key.pem
hivedb            mysql-bin.000008  mysql-bin.000023  server-cert.pem
ib_buffer_pool    mysql-bin.000009  mysql-bin.000024  server-key.pem
ibdata1           mysql-bin.000010  mysql-bin.000025  shit
ib_logfile0       mysql-bin.000011  mysql-bin.000026  sys
ib_logfile1       mysql-bin.000012  mysql-bin.000027
ibtmp1            mysql-bin.000013  mysql-bin.000028
# The presence of mysql-bin.* files means the binlog is enabled!
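The same check can also be done from inside MySQL (sketch):

```sql
SHOW VARIABLES LIKE 'log_bin';  -- should report ON
SHOW MASTER STATUS;             -- shows the current binlog file and position
```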

Coding Flink CDC in IDEA

1. Add the corresponding dependencies to the pom file

<properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <flinkcdc-version>2.4.0</flinkcdc-version>
        <flink-version>1.14.0</flink-version>
        <hadoop-version>3.3.2</hadoop-version>
        <mysql-version>8.0.31</mysql-version>
        <fastjson-version>1.2.83</fastjson-version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink-version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>${flink-version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>${flink-version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop-version}</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysql-version}</version>
        </dependency>

<!--        <dependency>-->
<!--            <groupId>org.apache.flink</groupId>-->
<!--            <artifactId>flink-table-planner-blink_2.12</artifactId>-->
<!--            <version>${flink-version}</version>-->
<!--        </dependency>-->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.12</artifactId>
            <version>${flink-version}</version>
        </dependency>

        <dependency>
            <groupId>com.ververica</groupId>
            <artifactId>flink-connector-mysql-cdc</artifactId>
<!--            <version>2.0.0</version>-->
            <version>${flinkcdc-version}</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>${fastjson-version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.3.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

2. Flink CDC demo code (DataStream API)

package com.javaye.flinkcdc;

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-14:16:55
 * @Describe:
 */
public class FlinkcdcDemo {
    public static void main(String[] args) throws Exception {

        String hostname = "only";
        int port = 3306;
        String username = "flinkcdc";
        String password = "$PASSWORD";
        String databaseList = "flinkcdc";
        String tableList= "flinkcdc.info";
        int parallelism = 1;

//        Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

//        Create the MySQL source
        MySqlSource<String> mysqlSource
                = MySqlSource.<String>builder()
                .hostname(hostname)
                .port(port)
                .username(username)
                .password(password)
                .databaseList(databaseList)
                .tableList(tableList)
//                Deserialization: the default String schema is not very practical
//                .deserializer(new StringDebeziumDeserializationSchema())
                .deserializer(new JsonDebeziumDeserializationSchema())
//                Startup options
                .startupOptions(StartupOptions.initial())
                .build();

//        Process the data
        env.fromSource(mysqlSource, WatermarkStrategy.noWatermarks(), "flinkcdc's mysql source")
                .setParallelism(parallelism)
                .print();

//        Execute the Flink job
        env.execute("flinkcdc mysql source demo");

    }
}

3. Run the corresponding operations in MySQL

SELECT id, name, sex
FROM flinkcdc.info;

INSERT INTO flinkcdc.info
(id, name, sex)
VALUES('1002', 'java页', 'male');

UPDATE flinkcdc.info
SET name='java-ye', sex='male'
WHERE id='1002';

DELETE FROM flinkcdc.info
WHERE id='1002';

4. Output in IDEA

8> {"before":null,"after":{"id":"1002","name":"java-ye","sex":"male"},"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":0,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":0,"gtid":null,"file":"","pos":0,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1697376064189,"transaction":null}
9> {"before":null,"after":{"id":"1000","name":"javaye","sex":"male"},"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":0,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":0,"gtid":null,"file":"","pos":0,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1697376064188,"transaction":null}
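Each record carries a Debezium `op` code: `r` (snapshot read), `c` (create/insert), `u` (update), `d` (delete). A tiny standalone helper that names them (hypothetical, for illustration only; not part of the Flink CDC API):

```java
// Hypothetical helper: maps the Debezium "op" field in the records above
// to a human-readable operation name.
public class DebeziumOp {
    public static String describe(String op) {
        switch (op) {
            case "r": return "snapshot read";
            case "c": return "insert";
            case "u": return "update";
            case "d": return "delete";
            default:  return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("r")); // prints "snapshot read"
    }
}
```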

5. Use HDFS for checkpoints

5.1. Code changes

package com.javaye.flinkcdc;

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-14:16:55
 * @Describe:
 */
public class FlinkcdcDemo {
    public static void main(String[] args) throws Exception {

        String hostname = "only";
        int port = 3306;
        String username = "flinkcdc";
        String password = "$PASSWORD";
        String databaseList = "flinkcdc";
        String tableList= "flinkcdc.info";
        int parallelism = 1;

//        Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

//        Enable checkpointing
        env.enableCheckpointing(5000);
        env.getCheckpointConfig().setCheckpointTimeout(10000);
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

//        Use HDFS as the state backend
        env.setStateBackend(new FsStateBackend("hdfs://only:9000/flinkcdc-backend/checkpoint"));

//        Create the MySQL source
        MySqlSource<String> mysqlSource
                = MySqlSource.<String>builder()
                .hostname(hostname)
                .port(port)
                .username(username)
                .password(password)
                .databaseList(databaseList)
                .tableList(tableList)
//                Deserialization: the default String schema is not very practical
//                .deserializer(new StringDebeziumDeserializationSchema())
                .deserializer(new JsonDebeziumDeserializationSchema())
//                Startup options
                .startupOptions(StartupOptions.initial())
                .build();

//        Process the data
        env.fromSource(mysqlSource, WatermarkStrategy.noWatermarks(), "flinkcdc's mysql source")
                .setParallelism(parallelism)
                .print();

//        Execute the Flink job
        env.execute("flinkcdc mysql source demo");

    }
}

5.2. Package, upload to the server, and start

[root@only cdc]# /opt/flink-1.14.4/bin/flink run -m only:8081 -c com.javaye.flinkcdc.FlinkcdcDemo ./flinkcdc-project-1.0-SNAPSHOT-jar-with-dependencies.jar

Job has been submitted with JobID 8a4f2cf7315e71587aa1ebe703ab46b9
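Once the job is running, checkpoints should start appearing under the backend path configured in the code (sketch; path matches this article's setup):

```shell
hdfs dfs -ls /flinkcdc-backend/checkpoint
```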

5.3. Check the Web UI

(Flink Web UI screenshots omitted)

5.4. Trigger a savepoint

# 8a4f2cf7315e71587aa1ebe703ab46b9 is the JobID
[root@only ~]# /opt/flink-1.14.4/bin/flink savepoint 8a4f2cf7315e71587aa1ebe703ab46b9 hdfs://only:9000/flinkcdc-backend/savepoint/
Triggering savepoint for job 8a4f2cf7315e71587aa1ebe703ab46b9.
Waiting for response...
Savepoint completed. Path: hdfs://only:9000/flinkcdc-backend/savepoint/savepoint-8a4f2c-29f70090243e
You can resume your program from this savepoint with the run command.

5.5. Cancel the Flink job
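The job can be cancelled from the CLI using the JobID from step 5.2 (sketch):

```shell
/opt/flink-1.14.4/bin/flink cancel 8a4f2cf7315e71587aa1ebe703ab46b9
```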

5.6. Run the following operations on the table in MySQL

SELECT id, name, sex
FROM flinkcdc.info;

INSERT INTO flinkcdc.info
(id, name, sex)
VALUES('1004', 'javaye-flinkcdc', 'male');

UPDATE flinkcdc.info
SET name='java-ye-------------', sex='male'
WHERE id='1004';

DELETE FROM flinkcdc.info
WHERE id='1004';

5.7. Start the Flink job from the savepoint

# The savepoint path must be specified
[root@only cdc]# /opt/flink-1.14.4/bin/flink run -m only:8081 -s hdfs://only:9000/flinkcdc-backend/savepoint/savepoint-8a4f2c-29f70090243e  -c com.javaye.flinkcdc.FlinkcdcDemo ./flinkcdc-project-1.0-SNAPSHOT-jar-with-dependencies.jar
Job has been submitted with JobID e50348cd0e35a055c62a25f0272a2223

5.8. Check the logs

2> {"before":null,"after":{"id":"1004","name":"javaye-flinkcdc","sex":"male"},"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":1697378263000,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":1,"gtid":null,"file":"mysql-bin.000030","pos":4937,"row":0,"thread":4,"query":null},"op":"c","ts_ms":1697378414226,"transaction":null}
1> {"before":{"id":"1004","name":"javaye-flinkcdc","sex":"male"},"after":{"id":"1004","name":"java-ye-------------","sex":"male"},"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":1697378271000,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":1,"gtid":null,"file":"mysql-bin.000030","pos":5233,"row":0,"thread":4,"query":null},"op":"u","ts_ms":1697378414231,"transaction":null}
2> {"before":{"id":"1004","name":"java-ye-------------","sex":"male"},"after":null,"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":1697378278000,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":1,"gtid":null,"file":"mysql-bin.000030","pos":5565,"row":0,"thread":4,"query":null},"op":"d","ts_ms":1697378414234,"transaction":null}

6. Custom deserializer

6.1. The custom class

package com.javaye.flinkcdc.seri;

import com.alibaba.fastjson.JSONObject;
import com.ververica.cdc.debezium.DebeziumDeserializationSchema;
import io.debezium.data.Envelope;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.util.Collector;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;
import java.util.List;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-16:10:12
 * @Describe:
 */
public class Seri01 implements DebeziumDeserializationSchema<String> {
    @Override
    public void deserialize(SourceRecord sourceRecord, Collector<String> collector) throws Exception {
        JSONObject jsonObject = new JSONObject();
//        String sourceServer = (String) sourceRecord.sourcePartition().get("server");
//        Map<String, ?> sourceOffset = sourceRecord.sourceOffset();
//        String binlogFile = (String) sourceOffset.get("file");
//        String position = (String) sourceOffset.get("pos");

//        String topic = sourceRecord.topic();
//        String[] databaseAndTable = topic.split("\\.");
//        String database = databaseAndTable[0];
//        String table = databaseAndTable[1];
        Struct value = (Struct) sourceRecord.value();
        Struct before = value.getStruct("before");
        JSONObject beforeJson = new JSONObject();
        if (before != null){
            Schema schema = before.schema();
            List<Field> fields = schema.fields();
            for (Field field : fields) {
                beforeJson.put(field.name(), before.get(field));
            }
        }
        jsonObject.put("before", beforeJson);

        Struct after = value.getStruct("after");
        JSONObject afterJson = new JSONObject();
        if (after != null){
            Schema schema = after.schema();
            List<Field> fields = schema.fields();
            for (Field field : fields) {
                afterJson.put(field.name(), after.get(field));
            }
        }
        jsonObject.put("after", afterJson);

        Struct source = value.getStruct("source");
        String lastSource = source.getString("connector");
        String lastSourceName = source.getString("name");
        Long lastSourceTs = source.getInt64("ts_ms");
        String dbName = source.getString("db");
        String tableName = source.getString("table");
        Long serverId = source.getInt64("server_id");
        String binlogFileName = source.getString("file");
        Long position = source.getInt64("pos");
        String op = value.getString("op");
        JSONObject sourceJson = new JSONObject();
        sourceJson.put("lastSource", lastSource);
        sourceJson.put("lastSourceName", lastSourceName);
        sourceJson.put("lastSourceTs", lastSourceTs);
        sourceJson.put("dbName", dbName);
        sourceJson.put("tableName", tableName);
        sourceJson.put("serverId", serverId);
        sourceJson.put("binlogFileName", binlogFileName);
        sourceJson.put("position", position);
        sourceJson.put("op", op);
        jsonObject.put("source", sourceJson);

        Envelope.Operation operation = Envelope.operationFor(sourceRecord);
        jsonObject.put("op-",operation);

        collector.collect(jsonObject.toJSONString());
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}
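For reference, a record emitted by Seri01 for an insert would look roughly like this (illustrative values; the keys match the code above):

```json
{
  "before": {},
  "after": { "id": "1004", "name": "javaye-flinkcdc", "sex": "male" },
  "source": {
    "lastSource": "mysql",
    "lastSourceName": "mysql_binlog_source",
    "dbName": "flinkcdc",
    "tableName": "info",
    "binlogFileName": "mysql-bin.000030",
    "op": "c"
  },
  "op-": "CREATE"
}
```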

6.2. Use the custom deserializer

package com.javaye.flinkcdc;

import com.javaye.flinkcdc.seri.Seri01;
import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import com.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-14:16:55
 * @Describe:
 */
public class FlinkcdcDemo {
    public static void main(String[] args) throws Exception {

        String hostname = "only";
        int port = 3306;
        String username = "flinkcdc";
        String password = "$PASSWORD";
        String databaseList = "flinkcdc";
        String tableList= "flinkcdc.info";
        int parallelism = 1;

//        Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

//        Enable checkpointing on HDFS (disabled in this run)
//        env.enableCheckpointing(5000);
//        env.getCheckpointConfig().setCheckpointTimeout(10000);
//        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
//        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
//        Use HDFS as the state backend

        env.setStateBackend(new FsStateBackend("hdfs://only:8020/flinkcdc-backend/checkpoint"));
//        env.setStateBackend(new FsStateBackend("hdfs://only:9000/flinkcdc-backend/checkpoint"));

//        Create the MySQL source
        MySqlSource<String> mysqlSource
                = MySqlSource.<String>builder()
                .hostname(hostname)
                .port(port)
                .username(username)
                .password(password)
                .databaseList(databaseList)
                .tableList(tableList)
//                Deserialization: the default String schema is not very practical
//                .deserializer(new StringDebeziumDeserializationSchema())
//                .deserializer(new JsonDebeziumDeserializationSchema())
                .deserializer(new Seri01())
//                Startup options
                .startupOptions(StartupOptions.initial())
                .build();


//        Process the data
        DataStreamSource<String> res = env.fromSource(mysqlSource, WatermarkStrategy.noWatermarks(), "flinkcdc's mysql source")
                .setParallelism(parallelism);

        res.print();



//        Execute the Flink job
        env.execute("flinkcdc mysql source demo");

    }
}

Writing Flink CDC SQL in IDEA

1. Using Flink CDC via Flink SQL

package com.javaye.flinkcdc;

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-15:22:14
 * @Describe:
 */
public class FlinkcdcBySql {
    public static void main(String[] args) throws Exception {

//        Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        tableEnv.executeSql("" +
                "CREATE TABLE info (\n" +
                " id STRING PRIMARY KEY NOT ENFORCED,\n" +
                " name STRING,\n" +
                " sex STRING\n" +
                ") WITH (\n" +
                " 'connector' = 'mysql-cdc',\n" +
                " 'scan.startup.mode' = 'initial',\n" +
                " 'hostname' = 'only',\n" +
                " 'port' = '3306',\n" +
                " 'username' = 'flinkcdc',\n" +
                " 'password' = '$PASSWORD',\n" +
                " 'database-name' = 'flinkcdc',\n" +
                " 'table-name' = 'info'\n" +
                ")");

        Table table = tableEnv.sqlQuery("select * from info");
        DataStream<Tuple2<Boolean, Row>> tuple2DataStream = tableEnv.toRetractStream(table, Row.class);
        tuple2DataStream.print();
        env.execute();
    }
}

2. Output screenshot

(console output screenshot omitted)

Flink CDC documentation

  • https://ververica.github.io/flink-cdc-connectors/master/content/about.html