SQL Server feature:Change Data Capture(CDC) and Change Tracking

 

Change Data Capture(CDC):

1. 概述

CDC 为数据仓库做数据抽取(ETL)提供了一个很好的捕获源数据(Source Data)数据变化的方案,它记录了SQL Server数据表所有的insert, update 和delete操作。

CDC是从SQL Server的log里获取表数据的变化,然后记录变化的数据到相关联的change table里。用相关的函数即可查看这些变化。下图显示了利用CDC进行ETL的Data flow,可以看到数据流是从source table -> CDC chagne table -> ETL -> DW.

 

2. 概念

2. 概念

1)CDC Capture Instance(捕获实例)

要对一张数据表进行变化数据捕捉,首先要打开DB和表的CDC配置,即要enable CDC on database and the source table,当一张source table的CDC配置打开后,SQL Server会为这张表建一个相关的Capture Instance来监视source table的数据变化。

Capture Instance包含一张change table和最多2个查询函数(query function, a function returned a data set). 这些信息可以从下面这些视图(view)中查到:cdc.change_tables, cdc.index_columns, and cdc.captured_columns。

与Capture Instance相关的对象都创建在cdc schema下,默认实例名为源表的<schema name_table name>,change table名字为<schema name_table name>_CT, 数据变化查询函数为fn_cdc_get_all_changes_<schema name_table name>。

2)CDC变化表(CDC change table)

Change Table的每一行包含了一些附加信息用来翻译该行数据,这些信息有。

             __$start_lsn标识出和这个change相关的SQL Server日志的LSN(the commit log sequence number (LSN),The commit LSN both identifies changes that were committed within the same transaction, and orders those transactions.)。

             __$seqval 用来对同一个transaction里发生的change进行排序。

             __$operation 记录变化的操作:1 = delete, 2 = insert, 3 = update (before image), and 4 = update (after image).

             __$update_mask 是由所有被Capture列组成的一个二进制为掩码(bit mask with one defined bit for each captured column)。

Change Table其他列为source Table中被Capture变化的列的一个镜像copy。

Change Table记录了源表(source Table)的所有Insert,delete和update操作时的数据变化。Insert操作记录Insert的数据;delete操作记录delete前的数据;update操作记录update前后的数据,因此有2条数据。(我测试只有一条数据,疑惑中.......)

 

3. CDC配置

 

Step 1:  数据库配置. Enable CDC for a database.

执行procedrue sys.sp_cdc_enable_db允许数据库CDC feature。(run procedrue sys.sp_cdc_enable_db to enable the database for cdc). To  determine if a database is already enabled, query theis_cdc_enabled column in thesys.databases catalog view.

E.g.

USE MyDB
GO

EXEC sys.sp_cdc_enable_db
GO

 

Step 2: Enable CDC for a table.(此处没有包含网络变化的配置(net chagne,以后编辑的时候加进来,相关配置也很简单)

After a database has been enabled for change data capture, members of the db_owner fixed database role can create a capture instance for individual source tables by using the stored proceduresys.sp_cdc_enable_table. To determine whether a source table has already been enabled for change data capture, examine theis_tracked_by_cdc column in the sys.tables catalog view.

The following options can be specified when creating a capture instance:

Columns in the source table to be captured.

By default, all of the columns in the source table are identified as captured columns. If only a subset of columns need to be tracked, such as for privacy or performance reasons, use the@captured_column_list parameter to specify the subset of columns.

A filegroup to contain the change table.

By default, the change table is located in the default filegroup of the database. Database owners who want to control the placement of individual change tables can use the@filegroup_name parameter to specify a particular filegroup for the change table associated with the capture instance. The named filegroup must already exist. Generally, it is recommended that change tables be placed in a filegroup separate from source tables. See theEnable a Table Specifying Filegroup Option template for an example showing use of the@filegroup_name parameter.

 

E.g.

USE MyDB
GO

EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'MyTable',
@role_name     = N'MyRole',
@filegroup_name = N'MyDB_CT',
@supports_net_changes = 1
GO

The purpose of the named role is to control access to the change data. The specified role can be an existing fixed server role or a database role. If the specified role does not already exist, a database role of that name is created automatically. Members of either the sysadmin or db_owner role have full access to the data in the change tables. All other users must have SELECT permission on all the captured columns of the source table. In addition, when a role is specified, users who are not members of either the sysadmin or db_owner role must also be members of the specified role.

If you do not want to use a gating role, explicitly set the @role_name parameter to NULL.

 

4. CDC数据使用

CDC记录的所有变化数据都是通过函数cdc.fn_cdc_get_all_changes_<capture instance>来查看,这个函数需要一个最大最小的sql log的序列号(LSN, Log Sequence Number),这2个值分别通过函数sys.fn_cdc_get_max_lsnsys.fn_cdc_get_min_lsn得到。函数sys.fn_cdc_map_time_to_lsnsys.fn_cdc_map_lsn_to_time可以返回一段时间内的最大/最小LSN或者根据LSN确定log记录的时间。

E.g

-- create database and table

create database test;

go

use test;

go

create table bcd ( b int, c varchar(10), d datetime)

go

 

-- enable cdc

EXEC sys.sp_cdc_enable_db

GO

EXEC sys.sp_cdc_enable_table

@source_schema = N'dbo',

@source_name = N'bcd',

@role_name = NULL,

@supports_net_changes = 0

GO

-- insert one data, then update it

insert into bcd (b,c,d) values (3, 'c', getdate())

update bcd set c = 'c updated' where b = 3

go

-- Check cdc data

DECLARE @from_lsn binary(10),  @to_lsn binary(10)

SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_bcd')

SET @to_lsn = sys.fn_cdc_get_max_lsn()

 

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_bcd (@from_lsn, @to_lsn, N'all')

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_bcd (@from_lsn, @to_lsn, N'all update old')

 

-- the 2nd way to get LSN when checking CDC data

DECLARE @begin_time datetime,

               @end_time datetime,

               @begin_lsn binary(10),

                    @end_lsn binary(10);

SET @begin_time = '2011-01-01 00:00:00.000', @end_time '2011-01-02 00:00:00.000';

SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);

SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

 

SELECT * FROM cdc.fn_cdc_get_net_changes_HR_Department(@begin_lsn, @end_lsn, 'all');

SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_bcd (@from_lsn, @to_lsn, N'all update old')

查询结果中有个一个__$update_mask列,这是一个位掩码,每一列用一个bit来表示,要想知道source table里的某列是否被修改,可以通过函数sys.fn_cdc_is_bit_set 和sys.fn_cdc_get_column_ordinal组合使用来实现,例子如下:

SELECT *,

             sys.fn_cdc_is_bit_set( sys.fn_cdc_get_column_ordinal('dbo_bcd','b'), __$update_mask),

             sys.fn_cdc_is_bit_set( sys.fn_cdc_get_column_ordinal('dbo_bcd','c'), __$update_mask),

             sys.fn_cdc_is_bit_set( sys.fn_cdc_get_column_ordinal('dbo_bcd','d'), __$update_mask)

FROM cdc.fn_cdc_get_all_changes_dbo_bcd (@from_lsn, @to_lsn, N'all')

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值