go 捕获数据库新增数据_更改数据捕获的经验教训

本文介绍了使用Go语言如何捕获数据库中的新增数据,内容源于对原文的翻译,分享了在数据库变更数据捕获过程中的经验教训。
摘要由CSDN通过智能技术生成

go 捕获数据库新增数据

The article summarizes experiences from various projects with a log-based change data capture (CDC). There are many use cases for which CDC is beneficial. Some DBs even have CDC functionality integrated without requiring a separate tool.

本文总结了使用基于日志的变更数据捕获(CDC)的各种项目的经验。 CDC有益于很多用例。 一些DB甚至集成了CDC功能,而无需单独的工具。

变更检测 (Change detection)

Detecting changes in transactional source systems often turn out as slow and not scalable. Lightweight, high-performance change detection is required. Well-known approaches are: change columns, triggers and log-based CDC.

检测事务性源系统中的更改通常会很慢且不可扩展。 需要轻巧,高性能的变化检测。 众所周知的方法是:更改列,触发器和基于日志的CDC。

  • Change columns

    变更栏

    Tables in the source system get created with an additional column, such as a timestamp. Such a column can be used to identify new or changed data records.

    源系统中的表是使用附加列(例如时间戳)创建的。 这样的列可用于标识新的或更改的数据记录。

    However, this solution does not work for deleted records. Another disadvantage is that the application needs to be changed with update logic for the column.

    但是,此解决方案不适用于已删除的记录。 另一个缺点是应用程序需要使用列的更新逻辑进行更改。

  • Triggers

    扳机

    Database triggers can protocol any data changes to a log table. A process transfers the data from the log table to the target.

    数据库触发器可以将任何数据更改记录到日志表中。 进程将数据从日志表传输到目标。

    This approach recognizes deletes. Additionally, the application does not need to be changed.

    这种方法识别删除。 此外,不需要更改应用程序。

    However, triggers put a strain on the source database from a performance point of view. If a transaction commits, it must wait until the trigger finished, too.

    但是,从性能的角度来看,触发器对源数据库造成了压力。 如果事务提交,它也必须等到触发器完成。

  • Log-based CDC

    基于日志的CDC

    Databases write changes into their transaction log. Backup and recovery need transaction logs. Additionally, sequential writes into a transaction log are much faster compared to random writes into data files.

    数据库将更改写入其事务日志。 备份和恢复需要事务日志。 此外,与对数据文件的随机写入相比,对事务日志的顺序写入要快得多。

    Open source tools like Debezium or commercial tools like Oracle Goldengate, IBM IIDR read changes from the log and replicate the changes to the target system.

    开源工具(如Debezium)或商业工具(如Oracle Goldengate,IBM IIDR)从日志中读取更改,并将更改复制到目标系统。

基于日志的CDC的优缺点 (Pros and Cons of log-based CDC)

Log-based CDC has fundamental advantages over the other change detection methods listed:

与其他列出的变更检测方法相比,基于日志的CDC具有根本优势:

  • The additional performance impact on the source system is low.

    对源系统的附加性能影响很小。

  • CDC enables the implementation of near real-time architectures with the possibility of faster and more accurate decisions based on the target systems.

    CDC可以实现近实时架构,并有可能根据目标系统做出更快,更准确的决策。

  • No significant changes to the application in the source system are necessary. Changes relate to, for example, the setting of database parameters to generate extended transaction log entries, or installing the CDC software.

    无需对源系统中的应用程序进行重大更改 。 更改涉及例如设置数据库参数以生成扩展的事务日志条目或安装CDC软件。

  • CDC reduces the amount of data transmitted over the network compared to the other detection methods described above.

    与上述其他检测方法相比,CDC 减少了通过网络传输的数据量

  • The coupling between source and target is asynchronous. A slow consumer does not affect the source database.

    源与目标之间的耦合是异步的 。 缓慢的使用者不会影响源数据库。

Log-based CDC also has some disadvantages you must be aware of:

基于日志的CDC还具有一些必须意识到的缺点:

  • Many commercial tools require an additional license.

    许多商业工具需要额外的许可证

  • Avoid no-logging transactions as these bypasses the transaction log. However, most databases allow force logging through suitable parameter settings.

    避免不记录事务,因为它们会绕过事务日志。 但是,大多数数据库都允许通过适当的参数设置进行强制日志记录。

  • Separate tools require operations and require additional know-how.

    单独的工具需要操作,并且需要其他专门知识

  • Primary or unique keys are required for many log-based CDC tools — but a good database design will guarantee this requirement anyway.

    许多基于日志的CDC工具都需要主键或唯一键 -但是良好的数据库设计将始终保证这一要求。

  • If the target system is down, transaction logs must be kept until the target absorbed the changes.

    如果目标系统关闭,则必须保留事务日志,直到目标吸收了更改为止。

基于日志的CDC如何工作? (How does log-based CDC work?)

The diagram above shows several uses of log-based CDC. Users or applications change data in the source database, e.g. insert, update, or delete data. The database writes all changes into the transaction log (or Write-ahead log) before integrating the changes into the data files.

上图显示了基于日志的CDC的几种用法。 用户或应用程序更改源数据库中的数据,例如插入,更新或删除数据。 在将更改集成到数据文件之前,数据库将所有更改写入事务日志(或预写日志)。

The RDBMS example shows a user inserting data and a second user updating the same data in the next transaction. The transaction log contains the change protocol until some retention-interval clears the entries. The data file contains just the current state.

RDBMS示例显示了一个用户插入数据,而第二个用户在下一个事务中更新了相同的数据。 事务日志包含更改协议,直到某个保留间隔清除条目为止。 数据文件仅包含当前状态。

Databases use transaction logs primarily for backup and recovery. But the data can also be used to replicate changes into a target system. CDC tools use the database API to read from the transaction logs (or from the memory if the data is still in the DB cache) and transfer the data to a target system.

数据库将事务日志主要用于备份和恢复。 但是数据还可以用于将更改复制到目标系统中。 CDC工具使用数据库API从事务日志中读取(如果数据仍在数据库高速缓存中,则从内存中读取)并将数据传输到目标系统。

The example shows a logical view of a CDC tool reading data from the transaction log of the source system and submitting the changes to target databases or a target data lake. There is no persistent storing of the data stream within the CDC software. Kafka could be used as a log-centric approach to capture the changes long-time and submitting the changes to further target systems.

该示例显示了CDC工具的逻辑视图,该工具从源系统的事务日志中读取数据,并将更改提交给目标数据库或目标数据湖。 CDC软件中没有持久存储数据流。 Kafka可以用作以日志为中心的方法,以长期捕获更改并将更改提交给其他目标系统。

The diagram shows two cases of data propagation:

该图显示了两种数据传播情况:

  • Replication-mode (“Target Mirror DB”)

    复制模式(“目标镜像数据库”)

    Replication creates a copy of the source: updates will change the data in the target. Deletes will remove data from the target.

    复制将创建源的副本:更新将更改目标中的数据。 删除将从目标中删除数据。

    The target system does not necessarily have the same structure as the source system. It is possible to only choose a subset of tables or to replicate some selected columns within a table or filter data.

    目标系统不一定具有与源系统相同的结构。 可以仅选择表的子集,也可以复制表中的某些选定列或过滤数据。

    The example shows the target DB, which contains the current state of the changes.

    该示例显示了目标数据库,其中包含更改的当前状态。

  • Auditing-mode (“Target DWH DB” and “Target Data Lake”)

    审核模式(“ Target DWH DB”和“ Target Data Lake”)

    Auditing-mode keeps the whole data history. The CDC tool converts updates and deletes into inserts in the “Target DWH DB” or “Target Data Lake”. A flag indicates the kind of operation (insert, update, delete) and a timestamp field tracks when the event did happen.

    审核模式保留整个数据历史记录。 CDC工具将更新和删除转换为“ Target DWH DB”或“ Target Data Lake”中的插入。 标志指示操作的类型(插入,更新,删除),时间戳字段跟踪事件的发生时间。

    The example shows the “Target DWH DB” and “Target Data Lake”, which contain the complete protocol of changes, including operation (insert, update, delete) and timestamp.

    该示例显示了“ Target DWH DB”和“ Target Data Lake”,其中包含完整的更改协议,包括操作(插入,更新,删除)和时间戳。

用例 (Use Cases)

There are a variety of use cases for CDC:

CDC有多种用例:

  • Replicate changes into a DWH or data lake.

    将更改复制到DWH或数据湖中

  • Replicate changes into Kafka in a microservices architecture.

    将更改复制到微服务架构中的Kafka中

  • Upgrade a database to a higher or the latest version with minimal downtime, e.g. upgrade PostgreSQL 11.9 to PostgreSQL 12.4.

    以最少的停机时间将数据库升级到更高版本或最新版本,例如将PostgreSQL 11.9升级到PostgreSQL 12.4。

  • Migrate data from database X to database Y with minimal downtime, e.g. migrate MariaDB to PostgreSQL or vice versa.

    将数据从数据库X 迁移到数据库Y的停机时间最少,例如,将MariaDB迁移到PostgreSQL,反之亦然。

  • Transfer data into the cloud over a secure connection.

    通过安全连接 数据传输到云中。

工装 (Tooling)

Commercial tools like Oracle Golden Gate, IBM IIDR, and many others gained a high maturity, and the performance is excellent even during peak loads. These tools can replicate data between heterogeneous database systems.

诸如Oracle Golden Gate,IBM IIDR等许多商业工具获得了很高的成熟度,即使在高峰负载期间,其性能也非常出色。 这些工具可以在异构数据库系统之间复制数据。

Additionally, there are some other interesting products:

此外,还有其他一些有趣的产品:

  • Debezium is an open-source product built on top of Kafka. Debezium has connectors to pull a change stream from DBs like PostgreSQL, MySQL, MongoDB, Cassandra, etc. and send the data to Kafka. There are some exciting activities in the open-source community going on.

    Debezium 建立在Kafka之上的开源产品。 Debezium具有连接器,可从PostgreSQL,MySQL,MongoDB,Cassandra数据库中提取更改流,并将数据发送到Kafka。 开源社区中正在进行一些激动人心的活动。

  • DynamoDB Streams can be enabled for the NoSQL DB DynamoDB. DynamoDB Streams captures table activities and provides JSON streams.

    可以为NoSQL DB DynamoDB启用DynamoDB流 。 DynamoDB Streams捕获表活动并提供JSON流。

  • Tables in VoltDB can be declared as export targets. The declaration means that VoltDB will send a stream of changes into Kafka for such tables.

    VoltDB中的表可以声明为导出目标。 该声明意味着VoltDB将针对此类表的更改流发送到Kafka中。

结论 (Conclusion)

CDC is an essential component in modern architectures for transferring transactional data from systems into a data stream.

CDC是现代体系结构中必不可少的组件,用于将事务数据从系统传输到数据流中。

CDC enables the provisioning of transactional data in real-time without causing a significant load to the source system, requires no changes in the source application and reduces the transferred amount of data to a minimum.

CDC可以实时提供事务数据,而不会给源系统造成很大的负担,不需要更改源应用程序,并将传输的数据量减少到最低限度。

Keeping data in only one system is not always possible. Be cautious with personal data. You have to know all appearances of personal data for deletion requirements according to GDPR, for example. A data catalog can help to keep track of replicated sensitive data.

并非始终只能将数据保存在一个系统中。 注意个人数据。 例如,您必须了解所有出现的个人数据,以符合GDPR的删除要求。 数据目录可以帮助跟踪复制的敏感数据。

翻译自: https://medium.com/daimler-tss-tech/change-data-capture-lessons-learnt-7976391cf78d

go 捕获数据库新增数据

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值