最新项目稍有空隙,开始研究SQL Server 2012和2014的一些BI特性,参照(Matt)的一个示例,我们开始体验SSIS中的CDC(Change Data Capture,变更数据捕获)。
注:如果需要了解关于SQL Server 2008中的CDC,请看这里http://www.cnblogs.com/downmoon/archive/2012/04/10/2439462.html),本文假定读者对CDC的工作方式已有所了解。^_^。
我们分三步完成实例:
1、准备基础
2、设计一个初始包;
3、在2的基础上
首先请完成以下准备安装:
(1)Visual studio 2012或Visual Studio 2012 Shell (Isolated) Redistributable Package
http://www.microsoft.com/en-us/download/details.aspx?id=30678
http://www.microsoft.com/en-us/download/details.aspx?id=30670
(2)SQL Server Data Tools - Business Intelligence for Visual Studio 2012
http://www.microsoft.com/zh-cn/download/details.aspx?id=36843
(2)SQL Server 2012企业版或开发版
http://www.microsoft.com/en-us/download/details.aspx?id=29066
(3)示例
http://msftdbprodsamples.codeplex.com/releases/view/55330
好了,开始第一步:/*-- =============================================-- 创建测试数据库及*/--Create database CDCTest--GO--USE [CDCTest]--GO--SELECT * INTO DimCustomer_CDC--FROM [AdventureWorksDW2012].[dbo].[DimCustomer]--WHERE CustomerKey < 11500;--select * from DimCustomer_CDC;/*-- =============================================-- 启用*/USE [CDCTest]GOEXECsys.sp_GO--add a primary key to the DimCustomer_CDC table so we can enable support for net changesIF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[DimCustomer_CDC]') AND name = N'PK_DimCustomer_CDC')ALTER TABLE [dbo].[DimCustomer_CDC] ADD CONSTRAINT [PK_DimCustomer_CDC] PRIMARY KEY CLUSTERED([CustomerKey] ASC)GO/*-- =============================================-- 启用表级别CDC---Generate By downmoon(邀月),3w@live.cn-- =============================================*/EXECsys.sp_cdc_enable_table@source_schema = N'dbo',@source_name = N'DimCustomer_CDC',@role_name = N'',@supports_net_changes = 1GO/*-- =============================================-- 创建一个目标表,与源表(Source)有相同的表结构--注意,在生产环境中,完全可以是不同的实例或服务器,本例为了方便,在同一个*/SELECT TOP 0 * INTODimCustomer_DestinationFROMDimCustomer_CDC--select @@version;select * from DimCustomer_Destination;
第二步:创建初始包--=============================================--我们使用两个包来完成示例,一个初始包完成---Generate By downmoon(邀月),3w@live.cn--=============================================
初始包包含如下逻辑:
(1)使用CDC Control Task标记初始加载开始LSN(Use the CDC Control Task to mark the initial load start LSN)
(2)转换所有源表
新建一个SSIS项目,创建一个包“Initial Load”,如下图:
新建两个CDC Control Task,分别命名为“CDC Control Task Start”和“CDC Control Task End”,分别对应属性为“Mark initial load start”和""Mark initial load end"
连接管理器均为ADO.NET方式,其他属性如下图:
中间加入一个“Data Flow Task”,属性默认。
此时,运行包,可见CDC_States有初始标记。
第三步:创建增量包
增量包包含如下逻辑:
(1)创建一个源数据库的连接管理器(Create a connection manager for the Source database)
(2)设置CDC运算符以获取处理边界(Set the CDC Control Operation to Get processing range)
(3)创建一个新的CDC状态变量(CDC_state)(Create a new CDC state variable (CDC_state))
(4)创建一个目标
在项目中创建一个新包,命名为“Incremental Load”
在包的"Control Flow"视图中,自上而下分别手动6个Task,顺序如下图,除去上面用到的三个Task,其余均为Execute SQL Task
注意:CDC Control Task End的CDC运算符为MARK Process Range,CDC Control Task Start的CDC运算符为Get Process Range
其余4个Execute SQL Task的SQL语句如下:--Create stage TablesIF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[stg_DimCustomer_UPDATES]') AND type in (N'U'))BEGIN SELECT TOP 0 * INTOstg_DimCustomer_UPDATESFROMDimCustomer_DestinationENDIF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[stg_DimCustomer_DELETES]') AND type in (N'U'))BEGIN SELECT TOP 0 * INTOstg_DimCustomer_DELETESFROMDimCustomer_DestinationEND--batch updateUPDATEdestSETdest.FirstName=stg.FirstName, dest.MiddleName=stg.MiddleName, dest.LastName=stg.LastName, dest.YearlyIncome=stg.YearlyIncomeFROM [DimCustomer_Destination]dest,[stg_DimCustomer_UPDATES]stgWHEREstg.[CustomerKey] = dest.[CustomerKey]--batch deleteDELETE FROM [DimCustomer_Destination] WHERE[CustomerKey] IN(SELECT [CustomerKey] FROM [dbo].[stg_DimCustomer_DELETES])--truncate tabletruncate table [dbo].[stg_DimCustomer_DELETES]truncate table [dbo].[stg_DimCustomer_UPDATES]
最关键的一步,选中CDC Control Task Start,并切换到Data Flow,自上而下分别拖动CDC Source,CDC Splitter Transformer,三个ADO.NET Destination,如下图:
其中三个的目标表分别为:[DimCustomer_Destination],stg_DimCustomer_DELETES,stg_DimCustomer_UPDATES。
而CDC Source的连接管理器属性如下图:
此时,可运行增量包,但我们不会看到任何运行结果,因为此时我们还没有进行
下来我们提供一个脚本,测试下效果:--=============================================--更新一些---Generate By downmoon(邀月),3w@live.cn--=============================================USE [CDCTest]GO --Transfer the remaining customer rowsSET IDENTITY_INSERT DimCustomer_CDC ON INSERT INTODimCustomer_CDC( CustomerKey, GeographyKey, CustomerAlternateKey, Title, FirstName, MiddleName, LastName, NameStyle, BirthDate, MaritalStatus, Suffix, Gender, EmailAddress, YearlyIncome, TotalChildren, NumberChildrenAtHome, EnglishEducation, SpanishEducation, FrenchEducation, EnglishOccupation, SpanishOccupation, FrenchOccupation, HouseOwnerFlag, NumberCarsOwned, AddressLine1, AddressLine2, Phone, DateFirstPurchase, CommuteDistance)SELECTCustomerKey, GeographyKey, CustomerAlternateKey, Title, FirstName, MiddleName, LastName, NameStyle, BirthDate, MaritalStatus, Suffix, Gender, EmailAddress, YearlyIncome, TotalChildren, NumberChildrenAtHome, EnglishEducation, SpanishEducation, FrenchEducation, EnglishOccupation, SpanishOccupation, FrenchOccupation, HouseOwnerFlag, NumberCarsOwned, AddressLine1, AddressLine2, Phone, DateFirstPurchase, CommuteDistanceFROM [AdventureWorksDW2012].[dbo].[DimCustomer]WHERE CustomerKey =11502 SET IDENTITY_INSERT DimCustomer_CDC OFFGO --give 10 people a raiseUPDATEDimCustomer_CDCSETYearlyIncome= YearlyIncome + 10WHERECustomerKey>= 11000 AND CustomerKey <= 11010 GO
此时,我们可以看到变更捕获的结果:
如果您觉得还不够直观,请"Enable Data Viewer",
至此,一个SSIS 2012中CDC的实例演示结束,如果还有进一步的研究,请移驾MSDN,下面有链接。本文也提供示例项目包,以作研究之用。
项目文件下载1,项目文件下载2
本文参考:
http://msdn.microsoft.com/en-us/library/bb895315.aspx
http://www.mattmasson.com/index.php/2011/12/