在SQL Server Integration Services中管理变化缓慢的维度

A data warehouse has to be historically correct. This becomes an issue when data like the Product List Price for a previous year needs to be saved historically. Dimensional Modeling methodologies provide a solution for the situation. The Slowly Changing method integrated with components from SQL Server Integration Services solves the issue. This article will look at updating a product dimension table using the Slowly Changing Type 2 Dimension while maintaining the Type 1 columns.

数据仓库必须在历史上是正确的。 当需要历史保存上一年度的产品清单价格之类的数据时,这将成为一个问题。 维度建模方法论为这种情况提供了解决方案。 与SQL Server Integration Services中的组件集成的“缓慢更改”方法解决了该问题。 本文将着眼于在保持类型1列的同时,使用缓慢变化的类型2维更新产品维表的方法。

Slowly Changing Type 1 (SC1) refers to columns in a dimension table that are overwritten with new data. Say the color was entered incorrectly. The dimension process will need to update the incorrect value. The historical reporting will change but the business wants this. Slowly Changing Type 2 (SC2) refers to the example of the ListPrice changing from year to year. The reports from the previous year will need to include the List Price for that year. The dimension table will track multiple rows for the products with historical data in the previous rows based on a date range.

缓慢更改类型1(SC1)引用维表中被新数据覆盖的列。 假设颜色输入错误。 标注过程将需要更新不正确的值。 历史报告将更改,但是企业需要这样做。 缓慢更改类型2(SC2)是指ListPrice逐年更改的示例。 上一年的报告将需要包含该年的标价。 维度表将根据日期范围跟踪产品的多行以及前几行中的历史数据。

The regular product dimension table used in this article appears in Code Block 1.

本文中使用的常规产品尺寸表显示在代码块1中。

 
CREATE TABLE [dbo].[DimProduct](
	[ProductSKey] [int] IDENTITY(1,1) NOT NULL,
	[ProductID] [nvarchar](25) NULL,
	[ProductName] [nvarchar](50) NOT NULL,
	[Color] [nvarchar](15) NOT NULL,
	[ListPrice] [money] NULL,
	[Class] [nchar](2) NULL,
	[ProductSubcategorySKey] [int] NULL
 CONSTRAINT [PK_DimProduct_ProductSKey] PRIMARY KEY CLUSTERED 
(
	[ProductSKey] ASC
) ON [PRIMARY]) ON [PRIMARY]
GO
 
ALTER TABLE [dbo].[DimProduct]  WITH CHECK 
  ADD  CONSTRAINT [FK_DimProduct_DimProductSubcategory] 
  FOREIGN KEY([ProductSubcategorySKey])
REFERENCES [dbo].[DimProductSubcategory] ([ProductSubcategorySKey])
GO
ALTER TABLE [dbo].[DimProduct] CHECK CONSTRAINT [FK_DimProduct_DimProductSubcategory]
GO
 

The following columns will be added to the product dimension table.

以下列将添加到产品尺寸表中。

  1. StartDate (DateTime) – Start date for this product to being active and related to new sales

    StartDate(DateTime)–该产品有效并与新销售相关的开始日期
  2. Status (Boolean) – 0 means not active, 1 means active

    状态(布尔)– 0表示未激活,1表示已激活
  3. Enddate (DateTime) – End date for this row to be inactive but still related to historical sales

    结束日期(DateTime)–该行的结束日期无效,但仍与历史销售相关

The data would look like the following for Slowly Change Type 2 (SC2) products.

缓慢更改类型2(SC2)产品的数据如下所示。

Key ID Name Color ListPrice SubcatSKey StartDate EndDate Status
243 FR-R92R-44 HL Road Frame – Red, 44 Red 1431.50 14 2016-07-01 NULL 1
242 FR-R92R-44 HL Road Frame – Red, 44 Red 1263.4598 14 2014-07-01 2015-06-30 0
241 FR-R92R-44 HL Road Frame – Red, 44 Red 1301.3636 14 2015-07-01 2016-06-30 0
246 FR-R92R-48 HL Road Frame – Red, 48 Red 1431.50 14 2016-07-01 NULL 1
245 FR-R92R-48 HL Road Frame – Red, 48 Red 1263.4598 14 2014-07-01 2015-06-30 0
244 FR-R92R-48 HL Road Frame – Red, 48 Red 1301.3636 14 2015-07-01 2016-06-30 0
ID 名称 颜色 价格表 SubcatSKey 开始日期 结束日期 状态
243 FR-R92R-44 HL公路车架–红色,44 1431.50 14 2016-07-01 空值 1个
242 FR-R92R-44 HL公路车架–红色,44 1263.4598 14 2014-07-01 2015-06-30 0
241 FR-R92R-44 HL公路车架–红色,44 1301.3636 14 2015-07-01 2016-06-30 0
246 FR-R92R-48 HL公路车架–红色,48 1431.50 14 2016-07-01 空值 1个
245 FR-R92R-48 HL公路车架–红色,48 1263.4598 14 2014-07-01 2015-06-30 0
244 FR-R92R-48 HL公路车架–红色,48 1301.3636 14 2015-07-01 2016-06-30 0

In Table 1, the Product ID FR-R92R-44 (HL Road Frame – Red 44) has 3 rows. The data seems to be duplicated in the table, but the StartDate, EndDate, and Status do label each row for when it is valid. The Null in the EndDate along with the Status of 1, shows the Key row 243 to be the active row for this product.

在表1中,产品ID FR-R92R-44HL公路车架-红色44 )有3行。 数据似乎在表中重复,但是StartDate,EndDate和Status确实在有效的每一行都标记了标签。 EndDate中的Null以及Status 1(状态1)表示密钥行243是此产品的活动行。

SQL Server Integration Services provides a Slowly Changing Dimension component (it is actually a wizard), but sometimes it is better to build it with other components. This gives the package more flexibility when updating the dimension table with additional columns. The package will look like any dimension table import. Figure 2 shows the Control Flow for the product dimension.

SQL Server Integration Services提供了一个“尺寸变化缓慢”组件(它实际上是一个向导),但是有时最好与其他组件一起构建它。 当使用其他列更新维表时,这为程序包提供了更大的灵活性。 该软件包将看起来像任何维度表导入。 图2显示了产品尺寸的控制流。

The additional Execute SQL Task, named Update Product Changes, allows a set base update on the DimProduct table instead of using an OLEDB Command component in a Data Flow Task. An OLEDB Command used to execute a T-SQL UPDATE might be a performance problem when there are many rows to be updated. This is referred to as “Rows by Agonizing Rows” or RBAR.

名为“更新产品更改”的附加执行SQL任务允许在DimProduct表上进行设置的基础更新,而不是在数据流任务中使用OLEDB命令组件。 当有许多行要更新时,用于执行T-SQL UPDATE的OLEDB命令可能是性能问题。 这被称为“行化行”或RBAR。

Figure 2 shows the Data Flow Task for the Product Dimension. The part that needs to be modified is the Conditional Split. Here, a Multicast needs to be added to insert a new row for the Slowly Changing Type 2 (SC2) data in the Product table plus a pipe to a check for Slowly Changing Type 1 (SC1) changes. The code for SC2 needs to check for ListPrice changes. If the ListPrice changes, then a new row will be added to the product dimension table and the existing current product row needs the EndDate to be updated to the end of yesterday and the Status changed to 0 (Inactive).

图2显示了产品维度的数据流任务。 需要修改的部分是条件拆分。 在这里,需要添加一个多播,以在“产品”表中为“缓慢变化的类型2(SC2)”数据插入新行,以及用于检查“缓慢变化的类型1(SC1)”更改的管道。 SC2的代码需要检查ListPrice的更改。 如果ListPrice更改,则将在产品尺寸表中添加新行,并且现有的当前产品行需要将EndDate更新到昨天结束,并且Status更改为0(无效)。

Figure 3 shows the Change to the Lookup to see if an existing Product is in the Dimension table. The Status = 1 WHERE clause will only return active Products in the Data Warehouse. The columns selected (ListPrice, Color, etc.) will be compared to the existing dimension table for changes – SC1 and SC2.

图3显示了对Lookup的更改,以查看Dimension表中是否存在现有产品。 Status = 1 WHERE子句将仅返回数据仓库中的活动产品。 所选的列(标价,颜色等)将与现有维度表进行比较以进行更改-SC1和SC2。

The Multicast will be placed before the Conditional Split and this Conditional Split (SC1) will not look at ListPrice, only Color, and Class. These 2 columns have been determined to be Slowly Changing Type 1 which means overwrite existing Product Dimension rows whether active or inactive.

组播将放置在条件拆分之前,并且此条件拆分(SC1)不会查看ListPrice,而只会查看颜色和类。 这2列已确定为缓慢更改类型1,这意味着将覆盖现有的产品维度行,无论它们是活动的还是非活动的。

An additional Conditional Split (Figure 4) will be added after the Multicast to compare the ListPrice. A Union ALL component (Figure 5) will be added to merge existing new rows from the transactional database and the new SC2 rows from the new Conditional Split. Another Multicast will pipe the new SC2 rows to Union All and a Staging table.

在组播之后,将添加一个附加的条件拆分(图4)以比较ListPrice。 将添加Union ALL组件(图5)以合并事务数据库中现有的新行和新的Conditional Split中的新SC2行。 另一个“多播”会将新的SC2行通过管道传输到Union All和Staging表。

Figure 4 shows the Conditional Split comparison for ListPrice column. Note here that there is no check for Nulls. This Data Warehouse does not allow Nulls and the source query will return NA or Unknown for Null values in the source data like the code in Code Block 2.

图4显示了ListPrice列的条件拆分比较。 请注意,此处不检查Null。 此数据仓库不允许使用Null,并且源查询将返回源数据中的Null值(如代码块2中的代码)为NA或Unknown。

 
SELECT p.ProductNumber AS ProductID, p.Name AS ProductName
    , IsNull( p.Color, 'NA') AS Color, IsNull( p.Class, 'NA') AS Class
    , IsNull( p.ListPrice, 0) AS ListPrice
    , ISNULL(p.ProductSubcategoryID, -1) AS ProductSubcategoryID
    , CAST( CAST( GETDATE() AS DATE) AS DATETIME) AS StartDate 
    , DATEADD( SECOND, -1, CAST( CAST( GETDATE() AS DATE) AS DATETIME) ) AS EndDate 
    , 1 AS ProductActive
  FROM Production.Product p
 

The Data Flow Task (Figure 5) is now a little more complicated, but manageable.

数据流任务(图5)现在稍微复杂一些,但是可以管理。

Going back to the Control Flow, there are 2 Execute T-SQL Statement components rather than one as seen in Figure 6.

回到控制流,有2个Execute T-SQL Statement组件,而不是如图6所示的一个。

The Update statements for the SC1 changes are in Table 2 and the SC2 updates are in Code Blocks 3 and 4. They are updated based on a join to the staging table for the changed data.

SC2更改的Update语句在表2中,而SC2更新在代码块3和4中。它们是根据对变更数据的登台表的联接而更新的。

 
UPDATE dbo.DimProduct
  SET Color = s1.Color, Class = s1.Class
  FROM dbo.DimProduct dp
    INNER JOIN 
      StageDW.dbo.StageDimproduct sdp 
        ON sdp.matchProductSKey = dp.ProductSKey
 
 
UPDATE dbo.DimProduct
  SET EndDate = s1.EndDate
    , [Status] = 0
  FROM dbo.DimProduct dp
    INNER JOIN 
      StageDW.dbo.StageProduct_SC2 s2 
        ON sdp.matchProductSKey = dp.ProductSKey
 

SQL Server Integration Services can handle almost any data import situation that is given it. There are so many options to accomplish things like Slowly Changing Dimensions. Starting with an opportunity like this helps individuals dig deeper into the functionality of SQL Server. As newer releases become available, it is noticeable that Microsoft is improving the tools that make Data Management more attainable.

SQL Server Integration Services几乎可以处理给定的任何数据导入情况。 有许多选项可以完成诸如“缓慢更改尺寸”之类的事情。 从这样的机会开始,可以帮助个人更深入地研究SQL Server的功能。 随着较新版本的发布,Microsoft不断改进其使数据管理更易实现的工具。

有用的链接 (Useful links)

翻译自: https://www.sqlshack.com/managing-a-slowly-changing-dimension-in-sql-server-integration-services/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值