Definition
CDC is short for Change Data Capture. The core idea is to monitor and capture database changes (inserts, updates, and deletes of rows or tables), record those changes completely and in the order they occur, and write them to message middleware so that other services can subscribe to and consume them.
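For intuition, each captured change is carried as a before/after image of the row plus an operation code. A trimmed-down sketch of one insert event, in the Debezium-style JSON format that the demo output later in this post uses (values are illustrative):

{"before": null, "after": {"id": "1001", "name": "javaye"}, "op": "c", "ts_ms": 1697376064189}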
Prerequisites
1. Start ZooKeeper
[root@only ~]# /opt/zookeeper-3.8.0/bin/zkServer.sh start
2. Start Hadoop
[root@only conf]# /opt/hadoop-3.3.2/sbin/start-all.sh
3. Prepare Flink's configuration (this is my own setup; no need to copy it verbatim) - create the corresponding HDFS directories
hdfs dfs -mkdir -p /flink/dan /flink-checkpoints /flink-savepoints /flink_history/
4. Start Flink
[root@only bin]# /opt/flink-1.14.4/bin/start-cluster.sh
Starting HA cluster with 1 masters.
Starting standalonesession daemon on host only.
Starting taskexecutor daemon on host only.
5. Check the processes (listed with jps -ml)
48274 org.apache.flink.runtime.taskexecutor.TaskManagerRunner --configDir /opt/flink-1.14.4/conf -D taskmanager.memory.network.min=134217730b -D taskmanager.cpu.cores=4.0 -D taskmanager.memory.task.off-heap.size=0b -D taskmanager.memory.jvm-metaspace.size=268435456b -D external-resources=none -D taskmanager.memory.jvm-overhead.min=201326592b -D taskmanager.memory.framework.off-heap.size=134217728b -D taskmanager.memory.network.max=134217730b -D taskmanager.memory.framework.heap.size=134217728b -D taskmanager.memory.managed.size=536870920b -D taskmanager.memory.task.heap.size=402653174b -D taskmanager.numberOfTaskSlots=4 -D taskmanager.memory.jvm-overhead.max=201326592b
22006 org.apache.hadoop.yarn.server.nodemanager.NodeManager
20696 org.apache.hadoop.hdfs.server.namenode.NameNode
47960 org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint --configDir /opt/flink-1.14.4/conf --executionMode cluster --host only --webui-port 8081 -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=1073741824b -D jobmanager.memory.jvm-overhead.max=201326592b
45369 org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/zookeeper-3.8.0/bin/../conf/zoo.cfg
21370 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
57850 sun.tools.jps.Jps -ml
20959 org.apache.hadoop.hdfs.server.datanode.DataNode
21807 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
Create a database user and grant privileges
1. Create a database for Flink CDC
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| addToDoris |
| canalShit |
| dolphinscheduler |
| hivedb |
| kylin |
| mysql |
| performance_schema |
| shit |
| sys |
+--------------------+
10 rows in set (0.03 sec)
mysql> create database if not exists flinkcdc;
Query OK, 1 row affected (0.02 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| addToDoris |
| canalShit |
| dolphinscheduler |
| flinkcdc |
| hivedb |
| kylin |
| mysql |
| performance_schema |
| shit |
| sys |
+--------------------+
11 rows in set (0.00 sec)
2. Create the user
mysql> create user 'flinkcdc'@'%' identified by '$PASSWORD';
Query OK, 0 rows affected (0.03 sec)
3. Grant privileges to the user
mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'flinkcdc' IDENTIFIED BY '$PASSWORD';
ERROR 1044 (42000): Access denied for user 'root'@'localhost' to database 'flinkcdc'
# Log in again, this time specifying the host with -h (root@'localhost' is denied above)
[root@only hadoop-3.3.2]# mysql -uroot -p${PASSWORD} -h only
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.7.36-log MySQL Community Server (GPL)
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
-- without login permission (grant without GRANT OPTION)
mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'flinkcdc' IDENTIFIED BY '$PASSWORD';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
-- with login permission (grant WITH GRANT OPTION, which also lets the user pass these privileges on)
mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'flinkcdc' IDENTIFIED BY '$PASSWORD' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)
4. Log in as the new user and list the databases
[root@only hadoop-3.3.2]# mysql -uflinkcdc -p${PASSWORD} -h only
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.7.36-log MySQL Community Server (GPL)
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| addToDoris |
| canalShit |
| dolphinscheduler |
| flinkcdc |
| hivedb |
| kylin |
| mysql |
| performance_schema |
| shit |
| sys |
+--------------------+
11 rows in set (0.03 sec)
5. Create the table
CREATE TABLE flinkcdc.info(
`id` varchar(255) primary key,
`name` varchar(255),
`sex` varchar(255)
);
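The snapshot output in the DataStream demo below shows a row with id 1000 already present, so the table was presumably seeded beforehand with something like:

INSERT INTO flinkcdc.info (id, name, sex) VALUES ('1000', 'javaye', 'male');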
6. Update the MySQL configuration
[root@only ~]# cat /etc/my.cnf
[client]
port = 3306
default-character-set=utf8
[mysqld]
basedir = /home/mysql/mysql-5.7.36
datadir = /home/mysql/mysql-5.7.36/data
port = 3306
character-set-server=utf8
default_storage_engine = InnoDB
server-id=1
log-bin=mysql-bin
binlog_format=row
# binlog-do-db limits row-level binlog to the named databases; if omitted, all databases get row-level binlog
binlog-do-db=canalShit
binlog-do-db=flinkcdc
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
7. Restart the MySQL service
[root@only ~]# systemctl restart mysql
[root@only ~]# systemctl status mysql
● mysqld.service - LSB: start and stop MySQL
Loaded: loaded (/etc/rc.d/init.d/mysqld; bad; vendor preset: disabled)
Active: active (running) since Mon 2023-10-16 14:47:05 CST; 7s ago
Docs: man:systemd-sysv-generator(8)
Process: 56499 ExecStop=/etc/rc.d/init.d/mysqld stop (code=exited, status=0/SUCCESS)
Process: 56945 ExecStart=/etc/rc.d/init.d/mysqld start (code=exited, status=0/SUCCESS)
Tasks: 30
Memory: 201.6M
CGroup: /system.slice/mysqld.service
├─56960 /bin/sh /home/mysql/mysql-5.7.36/bin/mysqld_safe --datadir...
└─57192 /home/mysql/mysql-5.7.36/bin/mysqld --basedir=/home/mysql/...
Oct 16 14:47:04 only systemd[1]: Starting LSB: start and stop MySQL...
Oct 16 14:47:05 only mysqld[56945]: Starting MySQL. SUCCESS!
Oct 16 14:47:05 only systemd[1]: Started LSB: start and stop MySQL.
8. Verify the configuration took effect
[root@only ~]# ls /home/mysql/mysql-5.7.36/data
addToDoris kylin mysql-bin.000014 mysql-bin.000029
auto.cnf mysql mysql-bin.000015 mysql-bin.000030
ca-key.pem mysql-bin.000001 mysql-bin.000016 mysql-bin.000031
canalShit mysql-bin.000002 mysql-bin.000017 mysql-bin.index
ca.pem mysql-bin.000003 mysql-bin.000018 only.err
client-cert.pem mysql-bin.000004 mysql-bin.000019 only.pid
client-key.pem mysql-bin.000005 mysql-bin.000020 performance_schema
dolphinscheduler mysql-bin.000006 mysql-bin.000021 private_key.pem
flinkcdc mysql-bin.000007 mysql-bin.000022 public_key.pem
hivedb mysql-bin.000008 mysql-bin.000023 server-cert.pem
ib_buffer_pool mysql-bin.000009 mysql-bin.000024 server-key.pem
ibdata1 mysql-bin.000010 mysql-bin.000025 shit
ib_logfile0 mysql-bin.000011 mysql-bin.000026 sys
ib_logfile1 mysql-bin.000012 mysql-bin.000027
ibtmp1 mysql-bin.000013 mysql-bin.000028
# The presence of the mysql-bin.* files means the binlog is active!
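You can also verify from inside MySQL with standard commands:

mysql> SHOW VARIABLES LIKE 'log_bin';        -- should report ON
mysql> SHOW VARIABLES LIKE 'binlog_format';  -- should report ROW
mysql> SHOW MASTER STATUS;                   -- shows the current binlog file and position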
Flink CDC coding - IDEA
1. Add the required dependencies to the pom file (the POM pins Flink 1.14.0 while the cluster above runs 1.14.4; keeping the two aligned avoids surprises)
<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <flinkcdc-version>2.4.0</flinkcdc-version>
    <flink-version>1.14.0</flink-version>
    <hadoop-version>3.3.2</hadoop-version>
    <mysql-version>8.0.31</mysql-version>
    <fastjson-version>1.2.83</fastjson-version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.12</artifactId>
        <version>${flink-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.12</artifactId>
        <version>${flink-version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop-version}</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>${mysql-version}</version>
    </dependency>
    <!-- flink-table-planner-blink was merged into flink-table-planner in Flink 1.14 -->
    <!--
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner-blink_2.12</artifactId>
        <version>${flink-version}</version>
    </dependency>
    -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner_2.12</artifactId>
        <version>${flink-version}</version>
    </dependency>
    <dependency>
        <groupId>com.ververica</groupId>
        <artifactId>flink-connector-mysql-cdc</artifactId>
        <!-- <version>2.0.0</version> -->
        <version>${flinkcdc-version}</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>${fastjson-version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <!-- Build a fat jar so the job can be submitted to the cluster with all dependencies -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.3.0</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
2. Flink CDC demo code - DataStream API
package com.javaye.flinkcdc;

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-14:16:55
 * @Describe:
 */
public class FlinkcdcDemo {
    public static void main(String[] args) throws Exception {
        String hostname = "only";
        int port = 3306;
        String username = "flinkcdc";
        String password = "$PASSWORD";
        String databaseList = "flinkcdc";
        String tableList = "flinkcdc.info";
        int parallelism = 1;
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the MySQL CDC source
        MySqlSource<String> mysqlSource
                = MySqlSource.<String>builder()
                .hostname(hostname)
                .port(port)
                .username(username)
                .password(password)
                .databaseList(databaseList)
                .tableList(tableList)
                // Deserializer: the stock String one is not very practical
                // .deserializer(new StringDebeziumDeserializationSchema())
                .deserializer(new JsonDebeziumDeserializationSchema())
                // Startup mode: initial = snapshot first, then read the binlog
                .startupOptions(StartupOptions.initial())
                .build();
        // Process the data
        env.fromSource(mysqlSource, WatermarkStrategy.noWatermarks(), "flinkcdc's mysql source")
                .setParallelism(parallelism)
                .print();
        // Run the Flink job
        env.execute("flinkcdc mysql source demo");
    }
}
3. Run the corresponding operations in MySQL
SELECT id, name, sex
FROM flinkcdc.info;
INSERT INTO flinkcdc.info
(id, name, sex)
VALUES('1002', 'java页', 'male');
UPDATE flinkcdc.info
SET name='java-ye', sex='male'
WHERE id='1002';
DELETE FROM flinkcdc.info
WHERE id='1002';
4. Output in IDEA (op codes: r = snapshot read, c = create, u = update, d = delete)
8> {"before":null,"after":{"id":"1002","name":"java-ye","sex":"male"},"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":0,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":0,"gtid":null,"file":"","pos":0,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1697376064189,"transaction":null}
9> {"before":null,"after":{"id":"1000","name":"javaye","sex":"male"},"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":0,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":0,"gtid":null,"file":"","pos":0,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1697376064188,"transaction":null}
5. Use HDFS for checkpoints
5.1. Code changes
package com.javaye.flinkcdc;

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-14:16:55
 * @Describe:
 */
public class FlinkcdcDemo {
    public static void main(String[] args) throws Exception {
        String hostname = "only";
        int port = 3306;
        String username = "flinkcdc";
        String password = "$PASSWORD";
        String databaseList = "flinkcdc";
        String tableList = "flinkcdc.info";
        int parallelism = 1;
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Enable checkpointing
        env.enableCheckpointing(5000);
        env.getCheckpointConfig().setCheckpointTimeout(10000);
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
        // Store checkpoint state on HDFS
        env.setStateBackend(new FsStateBackend("hdfs://only:9000/flinkcdc-backend/checkpoint"));
        // Build the MySQL CDC source
        MySqlSource<String> mysqlSource
                = MySqlSource.<String>builder()
                .hostname(hostname)
                .port(port)
                .username(username)
                .password(password)
                .databaseList(databaseList)
                .tableList(tableList)
                // Deserializer: the stock String one is not very practical
                // .deserializer(new StringDebeziumDeserializationSchema())
                .deserializer(new JsonDebeziumDeserializationSchema())
                // Startup mode: initial = snapshot first, then read the binlog
                .startupOptions(StartupOptions.initial())
                .build();
        // Process the data
        env.fromSource(mysqlSource, WatermarkStrategy.noWatermarks(), "flinkcdc's mysql source")
                .setParallelism(parallelism)
                .print();
        // Run the Flink job
        env.execute("flinkcdc mysql source demo");
    }
}
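Side note: FsStateBackend is deprecated as of Flink 1.14. An equivalent setup with the newer API (a sketch, assuming the same HDFS path) separates the state backend from the checkpoint storage:

// import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
env.setStateBackend(new HashMapStateBackend());
env.getCheckpointConfig().setCheckpointStorage("hdfs://only:9000/flinkcdc-backend/checkpoint");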
5.2. Package, upload to the server, and submit
[root@only cdc]# /opt/flink-1.14.4/bin/flink run -m only:8081 -c com.javaye.flinkcdc.FlinkcdcDemo ./flinkcdc-project-1.0-SNAPSHOT-jar-with-dependencies.jar
Job has been submitted with JobID 8a4f2cf7315e71587aa1ebe703ab46b9
5.3. Check the job in the Web UI



5.4. Trigger a savepoint
# 8a4f2cf7315e71587aa1ebe703ab46b9 is the JobID
[root@only ~]# /opt/flink-1.14.4/bin/flink savepoint 8a4f2cf7315e71587aa1ebe703ab46b9 hdfs://only:9000/flinkcdc-backend/savepoint/
Triggering savepoint for job 8a4f2cf7315e71587aa1ebe703ab46b9.
Waiting for response...
Savepoint completed. Path: hdfs://only:9000/flinkcdc-backend/savepoint/savepoint-8a4f2c-29f70090243e
You can resume your program from this savepoint with the run command.
5.5. Cancel the Flink job
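With Flink's standard CLI, the job can be cancelled by JobID (alternatively, use the Web UI):

# cancel by the JobID from step 5.2
[root@only ~]# /opt/flink-1.14.4/bin/flink cancel 8a4f2cf7315e71587aa1ebe703ab46b9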
5.6. Run the following operations on the table in MySQL
SELECT id, name, sex
FROM flinkcdc.info;
INSERT INTO flinkcdc.info
(id, name, sex)
VALUES('1004', 'javaye-flinkcdc', 'male');
UPDATE flinkcdc.info
SET name='java-ye-------------', sex='male'
WHERE id='1004';
DELETE FROM flinkcdc.info
WHERE id='1004';
5.7. Start the Flink job from the savepoint
# The savepoint path must be passed with -s
[root@only cdc]# /opt/flink-1.14.4/bin/flink run -m only:8081 -s hdfs://only:9000/flinkcdc-backend/savepoint/savepoint-8a4f2c-29f70090243e -c com.javaye.flinkcdc.FlinkcdcDemo ./flinkcdc-project-1.0-SNAPSHOT-jar-with-dependencies.jar
Job has been submitted with JobID e50348cd0e35a055c62a25f0272a2223
5.8. Check the logs - the insert, update, and delete performed while the job was down are replayed from the binlog position recorded in the savepoint
2> {"before":null,"after":{"id":"1004","name":"javaye-flinkcdc","sex":"male"},"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":1697378263000,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":1,"gtid":null,"file":"mysql-bin.000030","pos":4937,"row":0,"thread":4,"query":null},"op":"c","ts_ms":1697378414226,"transaction":null}
1> {"before":{"id":"1004","name":"javaye-flinkcdc","sex":"male"},"after":{"id":"1004","name":"java-ye-------------","sex":"male"},"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":1697378271000,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":1,"gtid":null,"file":"mysql-bin.000030","pos":5233,"row":0,"thread":4,"query":null},"op":"u","ts_ms":1697378414231,"transaction":null}
2> {"before":{"id":"1004","name":"java-ye-------------","sex":"male"},"after":null,"source":{"version":"1.9.7.Final","connector":"mysql","name":"mysql_binlog_source","ts_ms":1697378278000,"snapshot":"false","db":"flinkcdc","sequence":null,"table":"info","server_id":1,"gtid":null,"file":"mysql-bin.000030","pos":5565,"row":0,"thread":4,"query":null},"op":"d","ts_ms":1697378414234,"transaction":null}
6. Custom deserializer
6.1. The custom class
package com.javaye.flinkcdc.seri;

import com.alibaba.fastjson.JSONObject;
import com.ververica.cdc.debezium.DebeziumDeserializationSchema;
import io.debezium.data.Envelope;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.util.Collector;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;

import java.util.List;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-16:10:12
 * @Describe:
 */
public class Seri01 implements DebeziumDeserializationSchema<String> {
    @Override
    public void deserialize(SourceRecord sourceRecord, Collector<String> collector) throws Exception {
        JSONObject jsonObject = new JSONObject();
        // Alternative ways to extract metadata, kept for reference:
        // String sourceServer = (String) sourceRecord.sourcePartition().get("server");
        // Map<String, ?> sourceOffset = sourceRecord.sourceOffset();
        // String binlogFile = (String) sourceOffset.get("file");
        // String position = (String) sourceOffset.get("pos");
        // String topic = sourceRecord.topic();
        // String[] databaseAndTable = topic.split("\\.");
        // String database = databaseAndTable[0];
        // String table = databaseAndTable[1];
        Struct value = (Struct) sourceRecord.value();
        // "before" image of the row (null for inserts and snapshot reads)
        Struct before = value.getStruct("before");
        JSONObject beforeJson = new JSONObject();
        if (before != null) {
            Schema schema = before.schema();
            List<Field> fields = schema.fields();
            for (Field field : fields) {
                beforeJson.put(field.name(), before.get(field));
            }
        }
        jsonObject.put("before", beforeJson);
        // "after" image of the row (null for deletes)
        Struct after = value.getStruct("after");
        JSONObject afterJson = new JSONObject();
        if (after != null) {
            Schema schema = after.schema();
            List<Field> fields = schema.fields();
            for (Field field : fields) {
                afterJson.put(field.name(), after.get(field));
            }
        }
        jsonObject.put("after", afterJson);
        // Source metadata: connector, database, table, binlog position, etc.
        Struct source = value.getStruct("source");
        String lastSource = source.getString("connector");
        String lastSourceName = source.getString("name");
        Long lastSourceTs = source.getInt64("ts_ms");
        String dbName = source.getString("db");
        String tableName = source.getString("table");
        Long serverId = source.getInt64("server_id");
        String binlogFileName = source.getString("file");
        Long position = source.getInt64("pos");
        String op = value.getString("op");
        JSONObject sourceJson = new JSONObject();
        sourceJson.put("lastSource", lastSource);
        sourceJson.put("lastSourceName", lastSourceName);
        sourceJson.put("lastSourceTs", lastSourceTs);
        sourceJson.put("dbName", dbName);
        sourceJson.put("tableName", tableName);
        sourceJson.put("serverId", serverId);
        sourceJson.put("binlogFileName", binlogFileName);
        sourceJson.put("position", position);
        sourceJson.put("op", op);
        jsonObject.put("source", sourceJson);
        // Operation type resolved via Debezium's Envelope (CREATE/UPDATE/DELETE/READ)
        Envelope.Operation operation = Envelope.operationFor(sourceRecord);
        jsonObject.put("op-", operation);
        collector.collect(jsonObject.toJSONString());
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}
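For an update, a record emitted by this class looks roughly like the following (illustrative values based on the earlier logs; the source metadata is flattened into the custom field names chosen above):

{"before":{"id":"1004","name":"javaye-flinkcdc","sex":"male"},"after":{"id":"1004","name":"java-ye-------------","sex":"male"},"source":{"lastSource":"mysql","lastSourceName":"mysql_binlog_source","lastSourceTs":1697378271000,"dbName":"flinkcdc","tableName":"info","serverId":1,"binlogFileName":"mysql-bin.000030","position":5233,"op":"u"},"op-":"UPDATE"}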
6.2. Use the custom deserializer
package com.javaye.flinkcdc;

import com.javaye.flinkcdc.seri.Seri01;
import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import com.ververica.cdc.debezium.StringDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-14:16:55
 * @Describe:
 */
public class FlinkcdcDemo {
    public static void main(String[] args) throws Exception {
        String hostname = "only";
        int port = 3306;
        String username = "flinkcdc";
        String password = "$PASSWORD";
        String databaseList = "flinkcdc";
        String tableList = "flinkcdc.info";
        int parallelism = 1;
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Enable HDFS checkpointing (commented out for this run)
        // env.enableCheckpointing(5000);
        // env.getCheckpointConfig().setCheckpointTimeout(10000);
        // env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        // env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
        // State backend on HDFS
        env.setStateBackend(new FsStateBackend("hdfs://only:8020/flinkcdc-backend/checkpoint"));
        // env.setStateBackend(new FsStateBackend("hdfs://only:9000/flinkcdc-backend/checkpoint"));
        // Build the MySQL CDC source
        MySqlSource<String> mysqlSource
                = MySqlSource.<String>builder()
                .hostname(hostname)
                .port(port)
                .username(username)
                .password(password)
                .databaseList(databaseList)
                .tableList(tableList)
                // Deserializer: swap in the custom implementation
                // .deserializer(new StringDebeziumDeserializationSchema())
                // .deserializer(new JsonDebeziumDeserializationSchema())
                .deserializer(new Seri01())
                // Startup mode: initial = snapshot first, then read the binlog
                .startupOptions(StartupOptions.initial())
                .build();
        // Process the data
        DataStreamSource<String> res = env.fromSource(mysqlSource, WatermarkStrategy.noWatermarks(), "flinkcdc's mysql source")
                .setParallelism(parallelism);
        res.print();
        // Run the Flink job
        env.execute("flinkcdc mysql source demo");
    }
}
Writing flinkcdc-sql in IDEA
1. Using Flink CDC via Flink SQL
package com.javaye.flinkcdc;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

/**
 * @Author: Java页大数据
 * @Date: 2023-10-15:22:14
 * @Describe:
 */
public class FlinkcdcBySql {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        // Register the MySQL CDC table (Flink SQL requires PRIMARY KEY ... NOT ENFORCED)
        tableEnv.executeSql("" +
                "CREATE TABLE info (\n" +
                "    id STRING,\n" +
                "    name STRING,\n" +
                "    sex STRING,\n" +
                "    PRIMARY KEY (id) NOT ENFORCED\n" +
                ") WITH (\n" +
                "    'connector' = 'mysql-cdc',\n" +
                "    'scan.startup.mode' = 'initial',\n" +
                "    'hostname' = 'only',\n" +
                "    'port' = '3306',\n" +
                "    'username' = 'flinkcdc',\n" +
                "    'password' = '$PASSWORD',\n" +
                "    'database-name' = 'flinkcdc',\n" +
                "    'table-name' = 'info'\n" +
                ")");
        Table table = tableEnv.sqlQuery("select * from info");
        // Convert the changelog to a retract stream of (flag, row) pairs
        DataStream<Tuple2<Boolean, Row>> tuple2DataStream = tableEnv.toRetractStream(table, Row.class);
        tuple2DataStream.print();
        env.execute();
    }
}
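In the retract stream printed above, the Boolean flag marks additions (true) versus retractions (false): an insert arrives as a single (true, row) record, while an update from the CDC source arrives as a (false, oldRow) retraction followed by a (true, newRow) addition.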
2. Output screenshot

Flink CDC official documentation
- https://ververica.github.io/flink-cdc-connectors/master/content/about.html