如何跟踪SQL Server中的更改

As a part of a Big Data project, we are often asked to find the best way to track the changes applied to the database tables, so that, no requirement is created to load all the huge tables to the data warehouse database at the end of the day, if not all of the data was changed.

作为大数据项目的一部分,经常会要求我们找到跟踪应用于数据库表的更改的最佳方法,因此,无需创建将所有大表加载到数据仓库数据库的要求。当天,如果不是所有数据都被更改。

The first available option in SQL server for tracking the changes are the After Insert, After Update and After Delete triggers, that requires coding effort from your side in order to handle these changes or added data, in addition to the performance impact of the triggers on the system.

SQL Server中用于跟踪更改的第一个可用选项是“插入后”,“更新后”和“删除后”触发器,除了触发器对性能的影响外,还需要您编写代码才能处理这些更改或添加的数据。系统。

更改数据捕获 (Change Data Capture)

Another tracking and capturing solution introduced in SQL Server 2008 Enterprise, that tracks any Insert, Update or Delete operation applied to the user tables , with What , When and Where these changes applied, without any extra coding effort and keep it in system tables that can be easily accessed using normal queries. This new feature is called SQL Server Change Data Capture, or CDC.

SQL Server 2008 Enterprise中引入的另一种跟踪和捕获解决方案,跟踪应用到用户表的所有插入,更新或删除操作,并应用了这些更改,时间和位置,而无需进行任何额外的编码工作并将其保存在系统表中使用普通查询即可轻松访问。 此新功能称为SQL Server更改数据捕获或CDC。

When Change Data Capture is enabled on a user table, a new system table will be created with the same structure of that source table, with extra columns to include the changes metadata.

在用户表上启用更改数据捕获后,将创建一个具有与源表相同结构的新系统表,并带有额外的列以包含更改元数据。

SQL Server Change Data Capture uses the SQL Server transaction log as the source of the changed data using asynchronous capture mechanism. Any DML change applied to the tracked table will be written to the transaction log. The CDC capture process reads these logs and copy it to the capture table and finally adding the associated changes information as the change metadata to the same table.

SQL Server更改数据捕获使用异步捕获机制将SQL Server事务日志用作更改数据的源。 应用于跟踪表的任何DML更改都将写入事务日志。 CDC捕获过程读取这些日志并将其复制到捕获表中,最后将关联的更改信息作为更改元数据添加到同一表中。

Below we will have a small demo showing how to configure the CDC on one of the SQLShackDemo database tables. As recommended by Microsoft, we will create a separate Filegroup and database file to host the Change Data Capture change tables:

下面,我们将有一个小型演示,演示如何在其中一个SQLShackDemo数据库表上配置CDC。 根据Microsoft的建议,我们将创建一个单独的Filegroup和数据库文件来承载Change Data Capture更改表:

First we will create a new Filegroup using the ALTER DATABASE ADD FILEGROUP SQL statement:

首先,我们将使用ALTER DATABASE ADD FILEGROUP SQL语句创建一个新的文件组:

 
USE [master]
GO
ALTER DATABASE [SQLShackDemo] ADD FILEGROUP [CDC]
GO
 

Once the Filegroup is created, a new database file will be created in this filegroup using ALTER DATABASE ADD FILE SQL statement:

创建文件组后,将使用ALTER DATABASE ADD FILE SQL语句在此文件组中创建一个新的数据库文件:

 
ALTER DATABASE [SQLShackDemo] ADD FILE ( NAME = N'SQLShackDemo_cdc', FILENAME = N'D:\CDC File\SQLShackDemo_cdc.ndf' , SIZE = 2048MB , FILEGROWTH = 128MB ) TO FILEGROUP [CDC]
GO
 

As the cdc.lsn_time_mapping system table will grow to a significant size and will have many I/O operations due to the table’s changes, it is also recommended to change the default filegroup for the database before you execute sys.sp_cdc_enble_db to the CDC filegroup created previously and change it back to the Primary filegroup once the metadata tables are created.

由于cdc.lsn_time_mapping系统表将增长到很大的大小,并且由于表的更改而将具有许多I / O操作,因此建议您在执行sys.sp_cdc_enble_db到创建的CDC文件组之前更改数据库的默认文件组。创建元数据表后,将其更改为“主”文件组。

To make the CDC Filegroup as the default filegroup, we will use the MODIFY FILEGROUP statement below:

为了使CDC文件组成为默认文件组,我们将在下面使用MODIFY FILEGROUP语句:

 
USE [SQLShackDemo]
GO
ALTER DATABASE [SQLShackDemo] MODIFY FILEGROUP [CDC] DEFAULT
GO
 

In order to enable the Change Data capture on the tables that you need to track and capture its DML changes, you need first to enable it on the database level. This can be done by executing the sys.sp_cdc_enable_db system stored procedure as follows:

为了在需要跟踪并捕获其DML更改的表上启用“更改数据”捕获,首先需要在数据库级别启用它。 这可以通过执行sys.sp_cdc_enable_db系统存储过程来完成,如下所示:

 
USE [SQLShackDemo]
GO
EXEC sys.sp_cdc_enable_db 
GO
 

We can change back the default FileGroup now to the PRIMARY filegroup as follows:

现在,我们可以将默认的FileGroup改回PRIMARY文件组,如下所示:

 
USE [SQLShackDemo]
GO
ALTER DATABASE [SQLShackDemo] MODIFY FILEGROUP [PRIMARY] DEFAULT
GO
 

To make sure that the CDC is enabled in the SQLShackDemo database, we will query the sys.databases table as below:

为了确保在SQLShackDemo数据库中启用了CDC,我们将查询sys.databases表,如下所示:

 
USE master 
GO 
SELECT [name], database_id, is_cdc_enabled  
FROM sys.databases
WHERE is_cdc_enabled = 1    
GO 
 

One the CDC is enabled on the database level, a new schema will be created in that database with the “CDC” name:

一个CDC在数据库级别启用,将在该数据库中创建一个新的模式,名称为“ CDC”:

Also, new system tables will be created under the CDC schema:

另外,将在CDC模式下创建新的系统表:

The role of these tables are as each table’s name indicates. The cdc.captured_columns contains the list of captured columns. The cdc.change_tables contains list of database tables with CDC enabled on it. The cdc.ddl_history contains the history of the DDL changes applied on the tracked table. The cdc.index_columns contains the tracked table’s indexes. And the cdc.lsn_time_mapping that maps LSN number.

这些表的作用如每个表的名称所指示。 cdc.captured_columns包含已捕获列的列表。 cdc.change_tables包含已启用CDC的数据库表的列表。 cdc.ddl_history包含在跟踪表上应用的DDL更改的历史记录。 cdc.index_columns包含被跟踪表的索引。 和cdc.lsn_time_mapping映射LSN编号。

Now we will enable the Change Data Capture on the table’s level. As the CDC is a table-level feature, you need to enable it on each table you need to track and capture its DML changes.

现在,我们将在表级别启用“更改数据捕获”。 由于CDC是表级功能,因此需要在每个表上启用它,以跟踪和捕获其DML更改。

To enable the CDC on the AWBuildVersion table from the SQLShackDemo database, we will run the sys.sp_cdc_enable_table system stored procedure:

要从SQLShackDemo数据库在AWBuildVersion表上启用CDC,我们将运行sys.sp_cdc_enable_table系统存储过程:

 
EXEC sys.sp_cdc_enable_table   
@source_schema = N'dbo',
@source_name   = N'AWBuildVersion',
@role_name     = NULL,
@filegroup_name = N'CDC',
@supports_net_changes = 0
GO
 

It is better to limit the number of columns to be captured by the CDC to only the ones you really need to track. You can use the @captured_column_list parameter of the sys.sp_cdc_enable_table system SP to specify the list of columns that will be included in the change table to be tracked.

最好将CDC捕获的列数限制为仅真正需要跟踪的列数。 可以使用sys .sp_cdc_enable_table系统 SP的@captured_column_list参数来指定要包含在要跟踪的变更表中的列的列表。

Once the change data capture is enabled on the table level, a new capture instance associated to that source tables is created to support the propagation of the source table’s changes. This capture instance contains the change table. By default, the name of this change table is SchemaName_SourceTableName_CT.

一旦在表级别启用了更改数据捕获,便会创建与该源表关联的新捕获实例,以支持源表的更改的传播。 此捕获实例包含更改表。 默认情况下,此更改表的名称为SchemaName_SourceTableName_CT。

To make sure that the CDC is enabled in our table successfully, we will query the sys.tables system tables for the is_tracked_by_cdc property as below:

为了确保成功在表中启用CDC,我们将查询sys.tables系统表中的is_tracked_by_cdc属性,如下所示:

 
SELECT [name], is_tracked_by_cdc  
FROM sys.tables 
WHERE is_tracked_by_cdc=1
GO   
 

Also you can check the tracked tables by querying the [cdc].[change_tables]

您也可以通过查询[cdc]来检查跟踪表。[change_tables]

CDC table with the result same as follows:

CDC表的结果如下:

And the list of columns captured in that table can be viewed by querying the [cdc].[captured_columns] CDC table. The output will be like:

并且可以通过查询[cdc]。[captured_columns] CDC表来查看该表中捕获的列的列表。 输出将如下所示:

You should make sure that the SQL Server Agent Service is enabled before enabling the CDC at the table level, as the CDC will create two new SQL Server jobs for each CDC- enabled database as follows:

您应该确保在启用表级别的CDC之前已启用SQL Server代理服务,因为CDC将为每个启用CDC的数据库创建两个新SQL Server作业,如下所示:

The capture job will run the sys.sp_MScdc_capture_job. System SP to capture the changes, and the cleanup job will call the sys.sp_MScdc_cleanup_job system SP to clean up the change table.

捕获作业将运行sys.sp_MScdc_capture_job 。 系统SP捕获更改,清理作业将调用sys.sp_MScdc_cleanup_job系统SP清理更改表。

The default retention period is three days, this means that the data will be kept in the change table for three days before removing it. You can override this value by executing the sys.sp_cdc_change_job system SP specifying a new retention value in minutes:

默认保留期为三天,这意味着在删除数据之前,数据将在更改表中保留三天。 您可以通过执行sys .sp_cdc_change_job系统SP以分钟为单位指定新的保留值来覆盖此值:

 
EXECUTE sys.sp_cdc_change_job 
    @job_type = N'cleanup',
    @retention = 2880;  
 

Now we reach to the point where the change data capture is enabled on the AWBuildVersion table from the SQLShackDemo database. This means that any change that will be applied to that table will be captured and written to the CDC capture table. If you perform INSERT operation, then the new value after the INSERT operation will be written in the capture table as one record. If you perform DELETE operation, the value before the DELETE operation will be written to the capture table as one record. Any UPDATE operation performed on the tracked table, two records will be written to the capture table, one for the value before the UPDATE and one for the value after UPDATE.

现在我们到了从SQLShackDemo数据库在AWBuildVersion表上启用更改数据捕获的地步。 这意味着将捕获将应用于该表的任何更改并将其写入CDC捕获表。 如果执行INSERT操作,则INSERT操作之后的新值将作为一条记录写入捕获表中。 如果执行DELETE操作,则DELETE操作之前的值将作为一条记录写入捕获表。 在跟踪表上执行的任何UPDATE操作,两条记录将被写入捕获表,一条记录用于UPDATE之前的值,一条记录在UPDATE之后的值。

The _$operation column from the CDC change table contains the DML operation type, where 1 indicates DELETE operation, 2 indicates INSERT operation, 3 indicates the value before the update process and 4 the value after the update process.

CDC更改表中的_ $ operation列包含DML操作类型,其中1表示DELETE操作,2表示INSERT操作,3表示更新过程之前的值,4表示更新过程之后的值。

For example. If we apply the below insert statement on the SQLShackDemo database, and try to query the dbo_ AWBuildVersion_CT change table, we will find a new record written on it showing the new inserted value:

例如。 如果我们在SQLShackDemo数据库上应用以下插入语句,并尝试查询dbo_ AWBuildVersion_CT更改表,则会发现上面写有一条显示新插入值的新记录:

 
USE [SQLShackDemo]
GO
INSERT INTO [dbo].[AWBuildVersion]
           ([Database Version]
           ,[VersionDate]
           ,[ModifiedDate])
     VALUES
           ('11.0.2100.60','2012-03-14 00:00:00.000','2016-03-02 00:00:00.000')
GO
 

On the other hand, if we apply the below update query on the same table, a new two records will be written to the capture table showing the value before the update with _$operation =3 and the value after the update with _$operation =4:

另一方面,如果我们在同一表上应用以下更新查询,则会将新的两个记录写入捕获表,该记录显示使用_ $ operation = 3更新之前的值和使用_ $ operation更新之后的值。 = 4:

 
UPDATE [SQLShackDemo].[dbo].[AWBuildVersion] 
  SET [Database Version] ='11.0.2100.80' 
  WHERE [SystemInformationID]=5
 

It is good to know that enabling change data capture in your SQL Server database table will not prevent you from applying any DDL changes on that table. But will this new change be reflected to the CDC change table? The answer depends on the change type; if you change the data type of a CDC- enabled table’s column, the new data type will be reflected to the change table and the tracking and capturing process will not be affected.

很高兴知道,在SQL Server数据库表中启用更改数据捕获不会阻止您在该表上应用任何DDL更改。 但是,此新更改会反映到CDC更改表中吗? 答案取决于变更类型。 如果更改启用CDC的表的列的数据类型,则新的数据类型将反映到更改表中,并且跟踪和捕获过程也不会受到影响。

If you drop a CDC- enabled table’s column, NULL values will be inserted for that column for each change entry in the change table. But if you add new column to the CDC- enabled table, this change will not be reflected to change table and any change on this column will not be captured.

如果删除启用CDC的表的列,将为变更表中的每个变更条目为该列插入NULL值。 但是,如果将新列添加到启用CDC的表中,则此更改将不会反映到更改表中,并且不会捕获此列上的任何更改。

Let’s add a new column to our AWBuildVersion table, that we enabled the CDC on it:

让我们在AWBuildVersion表中添加一个新列,在其上启用CDC:

 
ALTER TABLE dbo.AWBuildVersion ADD
	TS datetime NULL
GO
 

To check if this change is reflected to the CDC change table, we will query the cdc.captured_columns system table, which unfortunately shows that the change is not reflected to the CDC change table:

要检查此更改是否反映在CDC更改表中,我们将查询cdc.captured_columns系统表,不幸的是,该表显示该更改未反映在CDC更改表中:

And if we try to insert a new value to our table, using the insert statement below, the new column will not be shown:

而且,如果我们尝试使用下面的insert语句向表中插入新值,则不会显示新列:

 
 USE [SQLShackDemo]
GO
INSERT INTO [dbo].[AWBuildVersion]
           ([Database Version]
           ,[VersionDate]
           ,[ModifiedDate]
		   ,[TS])
     VALUES
           ('11.0.2100.80','2016-01-01 00:00:00.000','2016-03-02 00:00:00.000',Getdate())
GO
 

As you can see, the CDC will keep tracking the changes applied on the table but ignoring the new added column.

如您所见,CDC将继续跟踪对表应用的更改,但会忽略新添加的列。

To overcome this issue without losing the old captured data, we can create a new capture instance for the same source table that will reflect the new structure for that source table, where a new change table will be created associated with that new capture instance, then copy the change data from the old change table to the new one and finally disable the old capture instance. You can create up to two capture instances associated with the same source table at the same time.

为了在不丢失旧捕获数据的情况下克服此问题,我们可以为同一源表创建一个新捕获实例,该实例将反映该源表的新结构,在该实例中将创建一个与该新捕获实例相关联的新变更表,然后将更改数据从旧的更改表复制到新的更改表,最后禁用旧的捕获实例。 您最多可以同时创建两个与同一源表关联的捕获实例。

In order to disable the CDC on the database level, you should disable it on all the tables that you enabled the CDC on it, then disable it on the database level as follows:

为了在数据库级别禁用CDC,您应该在启用CDC的所有表上禁用它,然后在数据库级别禁用它,如下所示:

 
EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name   = N'AWBuildVersion',
@capture_instance = N'dbo_AWBuildVersion'
GO
 
USE SQLShackDemo
GO
EXEC sys.sp_cdc_disable_db
GO
 

变更追踪 (Change Tracking)

Another lightweight tracking solution introduced in SQL Server 2008 is the Change Tracking or CT. What makes it light is that it only captured that this row in that table is changed, without capturing the data that is changed or keeping historical data for the value before the change, with the least storage overhead. CT works in all SQL Server editions such as Express, Workgroup, Web, Standard, Enterprise and DataCenter.

SQL Server 2008中引入的另一种轻量级跟踪解决方案是变更跟踪或CT。 显而易见的是,它仅捕获该表中的该行已更改,而没有捕获更改的数据或保留更改前的值的历史数据,而存储开销最少。 CT适用于所有SQL Server版本,例如Express,Workgroup,Web,Standard,Enterprise和DataCenter。

Change Tracking uses a synchronous tracking mechanism to track the table changes. The only information provided by the CT about the tracked table is the changed record’s primary key. To obtain the new data after the change, coding effort required from the application side to join the source table with the tracking table using the primary key value.

更改跟踪使用同步跟踪机制来跟踪表更改。 CT提供的有关被跟踪表的唯一信息是更改的记录的主键。 为了获得更改后的新数据,需要应用程序端使用主键值进行编码,以将源表与跟踪表连接起来。

Same as the CDC, in order to enable the change tracking on the table level, you should enable it at the database level first:

与CDC相同,为了在表级别启用更改跟踪,您应该首先在数据库级别启用它:

 
ALTER DATABASE SQLShackDemo
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
 

Change tracking information that is older than Retention Period value specified previously will be removed automatically. The AUTO_CLEANUP option is used to enable or disable the cleanup task that delete the old CT information.

早于先前指定的保留期限值的变更跟踪信息将被自动删除。 AUTO_CLEANUP选项用于启用或禁用删除旧CT信息的清理任务。

To make sure that the CT is enabled at the database level, you can browse the Change Tracking tape of the database properties window. You can also enable the CT from here:

要确保在数据库级别启用了CT,您可以浏览数据库属性窗口的“更改跟踪”磁带。 您还可以从此处启用CT:

Again, the CT is a table-level feature, you need to enable it on each table you need to track and capture its DML changes. In order to enable the CT on the table level, the table should have primary key constraint defined previously. If you try to enable the CT on a table without primary key, you will get the error below:

同样,CT是表级别的功能,您需要在需要跟踪和捕获其DML更改的每个表上启用它。 为了在表级别启用CT,表应具有先前定义的主键约束。 如果您尝试在没有主键的表上启用CT,则会出现以下错误:

Cannot enable change tracking on table ‘XXX’. Change tracking requires a primary key on the table. Create a primary key on the table before enabling change tracking.

无法在表“ XXX”上启用更改跟踪。 变更跟踪需要表上的主键。 在启用更改跟踪之前,请在表上创建一个主键。

Enabling the CT on the table level is achieved by running the ALTER TABLE ENABLE CHANGE_TRACKING

通过运行ALTER TABLE ENABLE CHANGE_TRACKING可以在表级别启用CT

 
USE SQLShackDemo
GO
ALTER TABLE CountryInfo
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)
 

Or using the SQL Server Management Studio, from the Table Properties window:

或使用SQL Server Management Studio,从“表属性”窗口中:

For example, if we apply the below insert statement to our table, we will find that the tacking current version value is changed from 0 to 1:

例如,如果将下面的插入语句应用于表,则会发现将当前版本的值从0更改为1:

 
USE [SQLShackDemo]
GO
 
INSERT INTO [dbo].[CountryInfo]
           ([CountyCode])
     VALUES
           ('AMM')
GO
 

In order to get the value that is changed, we will query the CHANGETABLE system table that will return the primary key for the inserted value:

为了获得更改的值,我们将查询CHANGETABLE系统表,该表将返回插入值的主键:

 
SELECT * FROM CHANGETABLE 
(CHANGES CountryInfo,0) as CT ORDER BY SYS_CHANGE_VERSION
 

To get the complete changed record, it is easily to join the CHANGETABLE with the source table:

为了获得完整的更改记录,可以很容易地将CHANGETABLE与源表连接起来:

 
SELECT CI.* FROM CHANGETABLE 
(CHANGES CountryInfo,0) as CT 
JOIN CountryInfo CI 
on CT.CountyCode=CI.CountyCode 
 

Opposite to CDC, you can apply any DDL changes on the source table without affecting the CT except for the changes on the primary key will fail unless you disable the CT on that table.

与CDC相反,您可以在源表上应用任何DDL更改而不影响CT,除非主键上的更改将失败,除非您在该表上禁用了CT。

If you try to change the primary key of the CountryInfo table, with the CT enabled on it, you will get the below error:

如果尝试在启用了CT的情况下更改CountryInfo表的主键,则会出现以下错误:

The primary key constraint ‘PK_CountryInfo_1’ on table ‘CountryInfo’ cannot be dropped because change tracking is enabled on the table. Change tracking requires a primary key constraint on the table. Disable change tracking before dropping the constraint.

由于在表上启用了更改跟踪,因此无法删除表“ CountryInfo”上的主键约束“ PK_CountryInfo_1”。 更改跟踪需要在表上使用主键约束。 在删除约束之前,请禁用更改跟踪。

In order to disable the CT on the database level, you should disable it on all the tables that you enablee the CT on it then disable it on the database level as follows:

为了在数据库级别禁用CT,您应该在启用了CT的所有表上禁用它,然后在数据库级别禁用它,如下所示:

 
ALTER TABLE [dbo].CountryInfo
DISABLE CHANGE_TRACKING
GO
ALTER DATABASE SQLShackDemo
SET CHANGE_TRACKING = OFF
GO
 

结论 (Conclusion)

As you can see, there are many options available to track and capture the changes performed in your database. Choosing the suitable one depends on your requirements. You should compromise between the system performance, the IO load and the available storage to decide which one of the mentioned methods you will use. It is better to test these methods in your DEV environment applying a heavy load, so you will decide if it is suitable for your situation or not.

如您所见,有许多选项可用来跟踪和捕获数据库中执行的更改。 选择合适的一个取决于您的要求。 您应该在系统性能,IO负载和可用存储之间进行折衷,以决定将使用上述方法之一。 最好在重负载的DEV环境中测试这些方法,以便您确定是否适合您的情况。

有用的链接: ( Useful Links: )

翻译自: https://www.sqlshack.com/how-to-track-changes-in-sql-server/

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值