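❝
Debezium is a change data capture (CDC) platform that relies on Kafka and Kafka Connect for durability, reliability, and fault tolerance. CDC is usually implemented by reading the database's change log, for example the MySQL binlog.
The benefit of this approach is that you only add a dependency: no extra component is introduced and existing code does not need to change. The two sides are fully decoupled and do not interfere with each other.❞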
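Why Debezium
With so many frameworks available, why choose Debezium?
There seem to be many options, but after ruling them out one by one only Debezium and canal remain.
Tools such as sqoop, kettle, and datax are products of the pre-big-data era, with a standing similar to Struts2 in the web world. They are also query-based rather than binlog-based, so they are not really CDC. They are ruled out first.
Flink CDC is a big-data framework; for the data volume of a typical web project it is overkill.
databus and maxwell are relatively niche and rarely used.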
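Finally, the reasons for not using canal:
- canal has to be installed as a separate service, which violates the principle "do not multiply entities beyond necessity".
- canal can only do CDC for MySQL, which is a significant limitation.
- Flink CDC, which is very popular in the big-data world and led by an Alibaba team, uses Debezium under the hood rather than canal, even though canal is also an Alibaba product.
- Debezium can use Kafka to publish changed data to Kafka topics, so downstream readers only read from Kafka, which effectively reduces read load on the database. It guarantees at-least-once delivery; whether processing is effectively exactly-once then depends on how the Kafka side is set up.
- Debezium can also run in embedded mode, so there is no need to deploy a Kafka cluster by hand, which satisfies the same "do not multiply entities beyond necessity" principle.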
Integrating Debezium with Spring Boot: MySQL
Dependencies
<debezium.version>1.7.0.Final</debezium.version>
<mysql.connector.version>8.0.26</mysql.connector.version>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql.connector.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>io.debezium</groupId>
<artifactId>debezium-api</artifactId>
<version>${debezium.version}</version>
</dependency>
<dependency>
<groupId>io.debezium</groupId>
<artifactId>debezium-embedded</artifactId>
<version>${debezium.version}</version>
</dependency>
<dependency>
<groupId>io.debezium</groupId>
<artifactId>debezium-connector-mysql</artifactId>
<version>${debezium.version}</version>
<exclusions>
<exclusion>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
</exclusion>
</exclusions>
</dependency>
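Note that Debezium 1.7.0.Final is paired here with MySQL driver 8.0.26; an older driver version causes compatibility errors.
Configuration
The corresponding configuration: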
debezium.datasource.hostname = localhost
debezium.datasource.port = 3306
debezium.datasource.user = root
debezium.datasource.password = 123456
debezium.datasource.tableWhitelist = test.test
debezium.datasource.storageFile = E:/debezium/test/offsets/offset.dat
debezium.datasource.historyFile = E:/debezium/test/history/custom-file-db-history.dat
debezium.datasource.flushInterval = 10000
debezium.datasource.serverId = 1
debezium.datasource.serverName = name-1
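Next, initialize the configuration.
Main configuration items:
connector.class
The type of database to monitor; here it is MySQL.
offset.storage
With FileOffsetBackingStore, the read progress is stored in a local file, because we are not using Kafka. When using Kafka, choose KafkaOffsetBackingStore.
offset.storage.file.filename
Path of the local file that stores the read progress.
offset.flush.interval.ms
How often the read progress is flushed to storage; the default is 1 minute. Without Kafka there is presumably no exactly-once semantics, only at-least-once, which means events may be read more than once. If the web container crashes before the latest progress has been flushed to the file, the binlog is read again from the last saved position after the restart.
table.whitelist
Whitelist of tables to monitor. Setting it is recommended so that only the binlog changes of these tables are captured. Newer Debezium releases name this option table.include.list; the whitelist name is kept as a deprecated alias.
database.whitelist
Whitelist of databases to monitor. When only this option is set, the binlog changes of all tables in the listed databases are captured. The newer name is database.include.list.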
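Because delivery is at-least-once, a handler may see the same binlog event again after a restart. One way to tolerate that is to remember the last applied position and skip anything at or before it. The following is a minimal sketch, not part of Debezium: the class name BinlogPositionTracker is hypothetical, it assumes that file, pos, and row are taken from the source block of a MySQL change event, and it keeps its state in memory only, whereas a real application would persist the position together with its own writes.

```java
/** Hypothetical helper: detects change events that were already applied. */
public class BinlogPositionTracker {
    // Last applied position; in-memory stand-in for a persistent store (assumption).
    private long lastFileSeq = -1L;
    private long lastPos = -1L;
    private long lastRow = -1L;

    /**
     * Returns true and records the position if (file, pos, row) is newer than
     * everything applied so far; returns false for a replayed event.
     * The file name is assumed to end in a numeric sequence, e.g. "binlog.000013".
     */
    public synchronized boolean markIfNew(String file, long pos, long row) {
        long fileSeq = Long.parseLong(file.substring(file.lastIndexOf('.') + 1));
        boolean newer = fileSeq > lastFileSeq
                || (fileSeq == lastFileSeq && pos > lastPos)
                || (fileSeq == lastFileSeq && pos == lastPos && row > lastRow);
        if (newer) {
            lastFileSeq = fileSeq;
            lastPos = pos;
            lastRow = row;
        }
        return newer;
    }
}
```

The row index is part of the comparison because one binlog event can carry several rows that share the same position.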
import io.debezium.connector.mysql.MySqlConnector;
import io.debezium.relational.history.FileDatabaseHistory;
import lombok.Data;
import org.apache.kafka.connect.storage.FileOffsetBackingStore;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.io.File;
import java.io.IOException;
/**
* @className: MysqlConfig
* @author: nyp
* @description: TODO
* @date: 2023/8/7 13:53
* @version: 1.0
*/
@Configuration
@ConfigurationProperties(prefix ="debezium.datasource")
@Data
public class MysqlBinlogConfig {
private String hostname;
private String port;
private String user;
private String password;
private String tableWhitelist;
private String storageFile;
private String historyFile;
private Long flushInterval;
private String serverId;
private String serverName;
@Bean
// The method name is the bean name; it must match @Qualifier("mysqlConnector") in the listener
public io.debezium.config.Configuration mysqlConnector() throws Exception {
checkFile();
io.debezium.config.Configuration configuration = io.debezium.config.Configuration.create()
.with("name", "mysql_connector")
.with("connector.class", MySqlConnector.class)
// .with("offset.storage", KafkaOffsetBackingStore.class)
.with("offset.storage", FileOffsetBackingStore.class)
.with("offset.storage.file.filename", storageFile)
.with("offset.flush.interval.ms", flushInterval)
.with("database.history", FileDatabaseHistory.class.getName())
.with("database.history.file.filename", historyFile)
.with("snapshot.mode", "schema_only")
.with("database.server.id", serverId)
.with("database.server.name", serverName)
.with("database.hostname", hostname)
// .with("database.dbname", dbname)
.with("database.port", port)
.with("database.user", user)
.with("database.password", password)
// .with("database.whitelist", "test")
.with("table.whitelist", tableWhitelist)
.build();
return configuration;
}
private void checkFile() throws IOException {
String dir = storageFile.substring(0, storageFile.lastIndexOf("/"));
File dirFile = new File(dir);
if(!dirFile.exists()){
dirFile.mkdirs();
}
File file = new File(storageFile);
if(!file.exists()){
file.createNewFile();
}
}
}
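The checkFile helper above splits the path on "/" by hand and only prepares the offset file. As an alternative, here is a minimal sketch using java.nio.file that works with either path separator and can be called for both storageFile and historyFile. The class and method names (OffsetFiles, ensureFile) are hypothetical and only serve the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that prepares a file location before the engine starts. */
public class OffsetFiles {

    /** Creates missing parent directories and an empty file, then returns the absolute path. */
    public static Path ensureFile(String location) {
        Path file = Paths.get(location).toAbsolutePath();
        try {
            Path parent = file.getParent();
            if (parent != null) {
                Files.createDirectories(parent); // does nothing if the directory exists
            }
            if (Files.notExists(file)) {
                Files.createFile(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("cannot prepare " + file, e);
        }
        return file;
    }
}
```

Calling it twice with the same location is harmless, so it can run on every application start.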
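snapshot.mode
Snapshot mode: specifies under which conditions the connector takes a snapshot at startup. Possible values:
initial
The connector runs a snapshot only when no offsets have been recorded for the logical server name.
when_needed
The connector runs a snapshot at startup whenever it deems it necessary, that is, when no offsets are available, or when the previously recorded offsets point to a binlog position or GTID that is not available on the server.
never
The connector never uses snapshots. On first startup with a logical server name, it reads from the beginning of the binlog. Configure this with care: it is only valid when the binlog is guaranteed to contain the entire history of the database.
schema_only
The connector snapshots the schema but not the data. This is useful when topics do not need a consistent snapshot of the data, only the changes made since the connector started.
schema_only_recovery
A recovery setting for a connector that has already been capturing changes. When you restart the connector, this setting allows recovery of a corrupted or lost database history topic. You can also set it periodically to "clean up" a database history topic that has grown unexpectedly. Database history topics require infinite retention.
database.server.id
The ID that the Debezium service uses when it poses as a MySQL replica (slave). You choose it yourself. If several Debezium services are running, their IDs must not collide; otherwise the following exception is thrown: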
io.debezium.DebeziumException: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event 'binlog.000013' at 46647257, the last event read from './binlog.000013' at 125, the last byte read from './binlog.000013' at 46647257. Error code: 1236; SQLSTATE: HY000.
at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.wrap(MySqlStreamingChangeEventSource.java:1167)
at io.debezium.connector.mysql.MySqlStreamingChangeEventSource$ReaderThreadLifecycleListener.onCommunicationFailure(MySqlStreamingChangeEventSource.java:1212)
at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:980)
at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:599)
at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:857)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.github.shyiko.mysql.binlog.network.ServerException: A slave with the same server_uuid/server_id as this slave has connected to the master; the first event 'binlog.000013' at 46647257, the last event read from './binlog.000013' at 125, the last byte read from './binlog.000013' at 46647257.
at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:944)
... 3 common frames omitted
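To avoid this collision when several application instances run the embedded engine, each instance needs its own database.server.id. The sketch below derives one from a base value plus an instance index. It is only an illustration: the class name ServerIds, the base value, and the idea of an instance index are assumptions, not something Debezium provides. Note also that, as far as I know, 1 is the default server_id of a MySQL 8.0 server itself, so a larger base is probably safer than the value 1 used in the sample properties.

```java
/** Hypothetical helper that assigns a distinct database.server.id to each instance. */
public class ServerIds {
    // MySQL stores server_id as an unsigned 32-bit integer.
    private static final long MAX_SERVER_ID = 4294967295L;

    /** Returns base + instanceIndex, rejecting values outside the valid server_id range. */
    public static long forInstance(long base, int instanceIndex) {
        if (base < 1 || instanceIndex < 0) {
            throw new IllegalArgumentException("base must be >= 1 and instanceIndex >= 0");
        }
        long id = base + instanceIndex;
        if (id > MAX_SERVER_ID) {
            throw new IllegalArgumentException("server id out of range: " + id);
        }
        return id;
    }
}
```

The result can be converted with String.valueOf and passed as the database.server.id value when building the configuration.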
Listener
Configure the listener service:
import com.alibaba.fastjson.JSON;
import io.debezium.config.Configuration;
import io.debezium.data.Envelope;
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;
import lombok.Builder;
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import javax.annotation.Resource;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.Executor;
/**
* @projectName: test
* @package: com.test.config
* @className: MysqlBinlogListener
* @author: nyp
* @description: TODO
* @date: 2023/8/7 13:56
* @version: 1.0
*/
@Component
@Slf4j
public class MysqlBinlogListener {
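@Resource
private Executor taskExecutor;
// Application-defined service that handles each change; the type name MysqlBinlogService is assumed, its source is not shown in this article
@Resource
private MysqlBinlogService mysqlBinlogService;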
private final List<DebeziumEngine<ChangeEvent<String, String>>> engineList = new ArrayList<>();
private MysqlBinlogListener (@Qualifier("mysqlConnector") Configuration configuration) {
this.engineList.add(DebeziumEngine.create(Json.class)
.using(configuration.asProperties())
.notifying(record -> receiveChangeEvent(record.value()))
.build());
}
private void receiveChangeEvent(String value) {
if (Objects.nonNull(value)) {
Map<String, Object> payload = getPayload(value);
String op = JSON.parseObject(JSON.toJSONString(payload.get("op")), String.class);
// Compare against the operation code ("r"); an enum never equals a String
if (!(StringUtils.isBlank(op) || Envelope.Operation.READ.code().equals(op))) {
ChangeData changeData = getChangeData(payload);
// An exception thrown here would break the listening for subsequent binlog events, so catch it
try {
mysqlBinlogService.service(changeData);
}catch (Exception e){
log.error("binlog handling failed, original data: " + changeData, e);
}
}
}
}
@PostConstruct
private void start() {
for (DebeziumEngine<ChangeEvent<String, String>> engine : engineList) {
taskExecutor.execute(engine);
}
}
@PreDestroy
private void stop() {
for (DebeziumEngine<ChangeEvent<String, String>> engine : engineList) {
if (engine != null) {
try {
engine.close();
} catch (IOException e) {
log.error("", e);
}
}
}
}
public static Map<String, Object> getPayload(String value) {
Map<String, Object> map = JSON.parseObject(value, Map.class);
Map<String, Object> payload = JSON.parseObject(JSON.toJSONString(map.get("payload")), Map.class);
return payload;
}
public static ChangeData getChangeData(Map<String, Object> payload) {
Map<String, Object> source = JSON.parseObject(JSON.toJSONString(payload.get("source")), Map.class);
return ChangeData.builder()
.op(payload.get("op").toString())
.table(source.get("table").toString())
.after(JSON.parseObject(JSON.toJSONString(payload.get("after")), Map.class))
.source(JSON.parseObject(JSON.toJSONString(payload.get("source")), Map.class))
.before(JSON.parseObject(JSON.toJSONString(payload.get("before")), Map.class))
.build();
}
@Data
@Builder
public static class ChangeData {
/**
* Data after the change
*/
private Map<String, Object> after;
private Map<String, Object> source;
/**
* Data before the change
*/
private Map<String, Object> before;
/**
* Name of the changed table
*/
private String table;
/**
* Operation type, see the enum Envelope.Operation
*/
private String op;
}
}
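Each captured binlog event is wrapped in a ChangeData object that contains the table name, the data before and after the change,
and the operation type: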
READ("r"),
CREATE("c"),
UPDATE("u"),
DELETE("d"),
TRUNCATE("t");
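Which row image is populated depends on the operation: an insert carries only after, a delete carries only before, and an update carries both. A handler such as the mysqlBinlogService referenced above could pick the relevant image as in the following minimal sketch. The class name ChangeDispatcher and the method relevantRow are hypothetical, and a truncate is treated here as having no row image.

```java
import java.util.Map;

/** Hypothetical helper that selects the row image to act on for a change event. */
public class ChangeDispatcher {

    /** Returns the row image relevant for the op code, or null if the event has none. */
    public static Map<String, Object> relevantRow(String op,
                                                  Map<String, Object> before,
                                                  Map<String, Object> after) {
        if (op == null) {
            return null;
        }
        switch (op) {
            case "r": // snapshot read
            case "c": // insert
            case "u": // update: the new state
                return after;
            case "d": // delete: only the old state exists
                return before;
            default:  // "t" (truncate) or unknown codes
                return null;
        }
    }
}
```

With the ChangeData class above this would be called as relevantRow(changeData.getOp(), changeData.getBefore(), changeData.getAfter()).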
Integrating Debezium with Spring Boot: MongoDB
Add dependencies
<debezium.version>1.6.1.Final</debezium.version>
<dependency>
<groupId>io.debezium</groupId>
<artifactId>debezium-api</artifactId>
<version>${debezium.version}</version>
</dependency>
<dependency>
<groupId>io.debezium</groupId>
<artifactId>debezium-embedded</artifactId>
<version>${debezium.version}</version>
</dependency>
<dependency>
<groupId>io.debezium</groupId>
<artifactId>debezium-connector-mongodb</artifactId>
<version>${debezium.version}</version>
</dependency>
<dependency>
<groupId>io.debezium</groupId>
<artifactId>debezium-scripting</artifactId>
<version>${debezium.version}</version>
</dependency>
Start the engine when the application starts
package net.commchina.task.config;
import io.debezium.engine.DebeziumEngine;
import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import java.io.IOException;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
@Slf4j
public class DebeziumServerBootstrap implements ApplicationRunner {
private final Executor executor = Executors.newSingleThreadExecutor();
private DebeziumEngine<?> debeziumEngine;
public Executor getExecutor()
{
return executor;
}
public DebeziumEngine<?> getDebeziumEngine()
{
return debeziumEngine;
}
public void setDebeziumEngine(DebeziumEngine<?> debeziumEngine)
{
this.debeziumEngine = debeziumEngine;
}
@Override
public void run(ApplicationArguments args) throws Exception {
executor.execute(debeziumEngine);
Runtime.getRuntime().addShutdownHook(new Thread(()->{
try {
debeziumEngine.close();
} catch (IOException e) {
// Pass the exception as the last argument so that SLF4J logs the stack trace
log.error("addShutdownHook error", e);
}
}));
}
}
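The shutdown hook above closes the engine but leaves the single-thread executor running. A fuller shutdown would close the engine, stop the executor, and wait a bounded time for the worker thread to finish. The following is a minimal sketch under the assumption that the executor is held as an ExecutorService (which is what Executors.newSingleThreadExecutor() returns); the class name EngineShutdown is hypothetical. The engine is passed as a Closeable, an interface that DebeziumEngine extends.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that stops an engine and the executor running it. */
public class EngineShutdown {

    /** Returns true if the executor terminated within the timeout. */
    public static boolean stop(Closeable engine, ExecutorService executor, long timeoutSeconds) {
        try {
            engine.close(); // asks the engine to stop polling
        } catch (IOException e) {
            System.err.println("engine close failed: " + e.getMessage());
        }
        executor.shutdown(); // no new tasks; lets the running task finish
        try {
            if (!executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                executor.shutdownNow(); // interrupt the worker as a last resort
                return false;
            }
            return true;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

In a Spring application the same call could live in a @PreDestroy method instead of a JVM shutdown hook, as the MySQL listener does.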
Debezium configuration class
package net.commchina.task.config;
import com.alibaba.fastjson.JSON;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;
import lombok.extern.slf4j.Slf4j;
import net.commchina.task.listener.TaskEventListener;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.Map;
@Slf4j
@Configuration
@ConditionalOnProperty(name = "mongodb.sync.data", havingValue = "true")
public class DebeziumConfiguration {
@Value("${mongodb.hosts}")
private String mongodbHosts;
@Value("${mongodb.name}")
private String mongodbName;
@Value("${mongodb.user}")
private String mongodbUser;
@Value("${mongodb.password}")
private String mongodbPassword;
@Value("${mongodb.authsource}")
private String mongodbAuthsource;
@Value("${database.include.list}")
private String databaseIncludeList;
@Value("${collection.include.list}")
private String collectionIncludeList;
private static final ObjectMapper objectMapper = new ObjectMapper();
@Autowired
private TaskEventListener taskEventListener;
/**
* Debezium configuration.
*
* @return configuration
*/
@Bean
io.debezium.config.Configuration debeziumConfig()
{
return io.debezium.config.Configuration.create()
// Unique name of the connector
.with("name", "inventory-connector")
// Java class name of the connector
.with("connector.class", "io.debezium.connector.mongodb.MongoDbConnector")
// Comma-separated list of host and port pairs (in the form "host" or "host:port") of the MongoDB servers in the replica set
//.with("mongodb.hosts", "Replica-Set/192.168.188.152:27017,Replica-Set/192.168.188.152:27018,Replica-Set/192.168.188.152:27019")
.with("mongodb.hosts", mongodbHosts)
.with("mongodb.name",mongodbName)
.with("mongodb.user",mongodbUser)
.with("mongodb.password",mongodbPassword)
.with("mongodb.authsource",mongodbAuthsource)
// Databases to include
.with("database.include.list", databaseIncludeList)
.with("collection.include.list",collectionIncludeList)
// Offset persistence, used for fault tolerance (this is the default value)
.with("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore")
// Path of the offset file, default /tmp/offsets.dat. A wrong path may prevent offsets from being stored, which can cause changes to be consumed more than once.
// If the connector restarts, it uses the last recorded offset to know where in the source it should resume reading.
.with("offset.storage.file.filename", "/logs/offsets.dat")
// Interval at which offsets are flushed
.with("offset.flush.interval.ms", "6000")
// Whether to include schema-level changes; the default value true is recommended, although this example sets false
.with("include.schema.changes", "false")
// Schema change history (an option of the relational connectors)
.with("database.history", "io.debezium.relational.history.FileDatabaseHistory")
// Where the schema change history is stored; it holds DDL statements
.with("database.history.file.filename", "/logs/dbhistory.dat")
.with("transforms","filter")
.with("transforms.filter.type","io.debezium.transforms.Filter")
.with("transforms.filter.language","jsr223.groovy")
.with("transforms.filter.condition","value.op == 'u' || value.op == 'c' || value.op == 'd'")
.build();
}
/**
* Debezium server bootstrap debezium server bootstrap.
*
* @param configuration the configuration
* @return the debezium server bootstrap
*/
@Bean
DebeziumServerBootstrap debeziumServerBootstrap(io.debezium.config.Configuration configuration)
{
DebeziumServerBootstrap debeziumServerBootstrap = new DebeziumServerBootstrap();
DebeziumEngine<ChangeEvent<String,String>> debeziumEngine = DebeziumEngine.create(Json.class)
.using(configuration.asProperties())
.notifying((records, committer)->{
records.forEach(r->{
try {
log.debug("key:{}---value:{}",r.key(),r.value());
if(null!=r.value()){
Map<String, Object> map = objectMapper.readValue(r.value(), new TypeReference<Map<String, Object>>() {});
Object payload = map.get("payload");
taskEventListener.syncData(JSON.toJSONString(payload));
}
committer.markProcessed(r);
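} catch (Exception e) {
log.error("failed to process change event", e);
}
});
// Signal that the whole batch has been handled so the engine can flush offsets
committer.markBatchFinished();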
})
.build();
debeziumServerBootstrap.setDebeziumEngine(debeziumEngine);
return debeziumServerBootstrap;
}
}
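The Filter transform above evaluates a Groovy expression, which, as far as I can tell, requires a Groovy JSR 223 script engine on the classpath in addition to debezium-scripting. If that dependency is unwanted, the same check can be done inside the consumer before calling the listener. The sketch below is a plain-Java equivalent of the condition value.op == 'u' || value.op == 'c' || value.op == 'd'; the class name OpFilter is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-consumer replacement for the Groovy filter condition. */
public class OpFilter {
    // Operation codes to keep: create, update, delete.
    private static final Set<String> CAPTURED =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("c", "u", "d")));

    /** Returns true if an event with this op code should be forwarded. */
    public static boolean accept(String op) {
        return op != null && CAPTURED.contains(op);
    }
}
```

The op value would be read from the payload map of each record before deciding whether to call taskEventListener.syncData.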
Testing
Change some data in the database and check that the listener receives the change.