When to use temporal tables in SQL
Today's subject of investigation is the temporal table, a new feature in SQL Server 2016. My focus leans towards how to use it in data warehouse environments, but some general information comes along as I write.
I want to cover the following topics:
- What is a temporal table (in short)?
- Can I use a temporal table for a table in the PSA (Persistent Staging Area)?
- Can I use a temporal table for a Data Vault Satellite?
- Is using temporal tables for full auditing in an OLTP system a good idea?
What is a temporal table (in short)?
In short, a temporal table is a table whose changes are tracked in a second “history” table. SQL Server 2016 manages this process itself, so you do not have to write extra code for it.
The free eBook Introducing Microsoft SQL Server 2016 states the following:
“When you create a temporal table, you must include a primary key and non-nullable period columns, a pair of columns having a datetime2 data type that you use as the start and end periods for which a row is valid.”
For more details please read the book.
You can also read more about temporal tables on MSDN.
Here is an example of how a temporal table looks in SQL Server Management Studio. You can tell that this is a temporal table by the icon and the suffix (System-Versioned) (both marked red). The history table is shown underneath the table node (marked green). The start and end datetime columns are required, but you can give them custom names (marked blue).
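To make these requirements concrete, here is a minimal sketch of a system-versioned table. The table [dbo].[Product], its columns and the period column names [ValidFrom] and [ValidTo] are made up for illustration and are not part of the test scripts.

```sql
-- Hypothetical demo table, not part of the test scripts.
CREATE TABLE [dbo].[Product]
(
    [ProductID]   INT          NOT NULL,
    [ProductName] NVARCHAR(50) NOT NULL,
    -- The period columns may have any name, but must be DATETIME2 and NOT NULL.
    [ValidFrom]   DATETIME2(2) GENERATED ALWAYS AS ROW START NOT NULL,
    [ValidTo]     DATETIME2(2) GENERATED ALWAYS AS ROW END NOT NULL,
    CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED ([ProductID] ASC),
    PERIOD FOR SYSTEM_TIME ([ValidFrom], [ValidTo])
)
WITH (SYSTEM_VERSIONING = ON); -- no history table named: SQL Server creates one
GO

INSERT INTO [dbo].[Product] ([ProductID], [ProductName]) VALUES (1, N'Bicycle');
GO
UPDATE [dbo].[Product] SET [ProductName] = N'Racing bicycle' WHERE [ProductID] = 1;
GO

-- Returns the current row together with any earlier versions from the history table.
SELECT [ProductID], [ProductName], [ValidFrom], [ValidTo]
FROM [dbo].[Product] FOR SYSTEM_TIME ALL
WHERE [ProductID] = 1
ORDER BY [ValidFrom];
```

Note that the INSERT and UPDATE statements never mention the period columns; SQL Server fills them.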
There are three ways to create the history table for a temporal table:
- Create a temporal table with an anonymous history table: you do not really bother about the history table; SQL Server creates it and gives it a name.
- Create a temporal table with a default history table: same as anonymous, but you provide the name for the history table to use.
- Create a temporal table with an existing history table: you create the history table first, so you can optimize its storage and/or indexes. Then you create the temporal table and provide the name of the existing history table in the CREATE statement.
So the first two are the “lazy” options, and they might be good enough for smaller tables. The third option allows you to fully tweak the history table.
I have used the third option in my Persistent Staging Area, see below.
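As a small sketch of the second option, the statement below names the history table but lets SQL Server create it. The table [dbo].[Supplier] and its columns are hypothetical. The query at the end is one way to look up which history table belongs to which temporal table, which is handy for the anonymous option in particular.

```sql
-- Hypothetical demo table; only the HISTORY_TABLE clause matters here.
CREATE TABLE [dbo].[Supplier]
(
    [SupplierID]   INT          NOT NULL,
    [SupplierName] NVARCHAR(50) NOT NULL,
    [ValidFrom]    DATETIME2(2) GENERATED ALWAYS AS ROW START NOT NULL,
    [ValidTo]      DATETIME2(2) GENERATED ALWAYS AS ROW END NOT NULL,
    CONSTRAINT [PK_Supplier] PRIMARY KEY CLUSTERED ([SupplierID] ASC),
    PERIOD FOR SYSTEM_TIME ([ValidFrom], [ValidTo])
)
-- Option 1 (anonymous) would be: WITH (SYSTEM_VERSIONING = ON);
-- Option 2 (default history table): SQL Server creates the table with this name.
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[SupplierHistory]));
GO

-- Look up which history table belongs to which temporal table.
SELECT
    SCHEMA_NAME(t.[schema_id]) AS [TemporalSchema],
    t.[name]                   AS [TemporalTable],
    SCHEMA_NAME(h.[schema_id]) AS [HistorySchema],
    h.[name]                   AS [HistoryTable]
FROM sys.tables AS t
JOIN sys.tables AS h
    ON h.[object_id] = t.[history_table_id]
WHERE t.[temporal_type] = 2; -- 2 = SYSTEM_VERSIONED_TEMPORAL_TABLE
```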
Can I use a temporal table for a table in the PSA (Persistent Staging Area)?
In my previous blog post, Using a Persistent Staging Area: What, Why, and How, you could read what a Persistent Staging Area (or PSA for short) is.
Today I want to share my experiences from my lab tests using temporal tables in the PSA.
Besides the temporal table, I have also created a “normal” staging table for loading the data. This is because:
- A temporal table cannot be truncated, and because truncate is much faster than delete, I create a normal staging table to load the data from the source.
- I want to load the source data as fast as possible, so I prefer a plain insert over doing change detection against the rows currently in the temporal table. Change detection would be slower, and I prefer to do that later, in parallel with loading the rest of the EDW.
- I want the PSA to stay optional and not become a core part of the EDW. If the PSA is an addition to a normal Staging Area, it is easier to switch off later.
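The truncate-and-insert pattern described in the list above could look roughly like this. It is a sketch under assumptions: [src].[Customer] is a hypothetical source table, and the staging table [stg_internal].[Customer] is assumed to have the same business columns as the temporal table shown later.

```sql
-- Sketch only: [src].[Customer] is a hypothetical source table.
TRUNCATE TABLE [stg_internal].[Customer]; -- allowed, because it is a normal table

-- Plain insert, no change detection at this stage.
INSERT INTO [stg_internal].[Customer]
    ([CustomerID], [FirstName], [Initials], [MiddleName], [SurName], [DateOfBirth],
     [Gender], [SocialSecurityNumber], [Address], [PostalCode], [Residence],
     [StateOrProvince], [Country])
SELECT
    [CustomerID], [FirstName], [Initials], [MiddleName], [SurName], [DateOfBirth],
    [Gender], [SocialSecurityNumber], [Address], [PostalCode], [Residence],
    [StateOrProvince], [Country]
FROM [src].[Customer];

-- By contrast, this statement fails as long as system versioning is switched on:
-- TRUNCATE TABLE [psa].[Customer_Temporal];
```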
Here is the script I used to create the temporal table:
--\
---) Create the history table ourselves, to be used as a backing table
---) for a Temporal table, so we can tweak it for optimal performance.
---) Please note that I use the datatype DATETIME2(2) because it
---) uses 6 bytes storage, whereas DATETIME2(7) uses 8 bytes.
---) If the centiseconds precision of DATETIME2(2) is not enough
---) in your data warehouse, you can change it to DATETIME2(7).
--/
CREATE TABLE [psa].[Customer_TemporalHistory]
(
[CustomerID] INT NOT NULL,
[FirstName] NVARCHAR(20) NULL,
[Initials] NVARCHAR(20) NULL,
[MiddleName] NVARCHAR(20) NULL,
[SurName] NVARCHAR(50) NOT NULL,
[DateOfBirth] DATE NOT NULL,
[Gender] CHAR(1) NOT NULL,
[SocialSecurityNumber] CHAR(12) NOT NULL,
[Address] NVARCHAR(60) NOT NULL,
[PostalCode] CHAR(10) NULL,
[Residence] NVARCHAR(60) NULL,
[StateOrProvince] NVARCHAR(20) NULL,
[Country] NVARCHAR(60) NULL,
[RowHash] BINARY(16),
[SessionStartDts] DATETIME2(2) NOT NULL,
[EffectiveStartDts] DATETIME2(2) NOT NULL,
[EffectiveEndDts] DATETIME2(2) NOT NULL
);
GO
--\
---) Add indexes to history table
--/
CREATE CLUSTERED COLUMNSTORE INDEX [IXCS_Customer_TemporalHistory]
ON [psa].[Customer_TemporalHistory];
CREATE NONCLUSTERED INDEX [IXNC_Customer_TemporalHistory__EffectiveEndDts_EffectiveStartDts_CustomerID]
ON [psa].[Customer_TemporalHistory]
([EffectiveEndDts], [EffectiveStartDts], [CustomerID]);
GO
--\
---) Now create the temporal table
--/
CREATE TABLE [psa].[Customer_Temporal]
(
[CustomerID] INT NOT NULL,
[FirstName] NVARCHAR(20) NULL,
[Initials] NVARCHAR(20) NULL,
[MiddleName] NVARCHAR(20) NULL,
[SurName] NVARCHAR(50) NOT NULL,
[DateOfBirth] DATE NOT NULL,
[Gender] CHAR(1) NOT NULL,
[SocialSecurityNumber] CHAR(12) NOT NULL,
[Address] NVARCHAR(60) NOT NULL,
[PostalCode] CHAR(10) NULL,
[Residence] NVARCHAR(60) NULL,
[StateOrProvince] NVARCHAR(20) NULL,
[Country] NVARCHAR(60) NULL,
[RowHash] BINARY(16),
[SessionStartDts] DATETIME2(2) NOT NULL,
-- SessionStartDts is manually set, and is the same for all
-- rows of the same session/loadcycle.
CONSTRAINT [PK_Customer_Temporal]
PRIMARY KEY CLUSTERED ([CustomerID] ASC),
[EffectiveStartDts] DATETIME2(2) GENERATED ALWAYS AS ROW START NOT NULL,
[EffectiveEndDts] DATETIME2(2) GENERATED ALWAYS AS ROW END NOT NULL,
PERIOD FOR SYSTEM_TIME ([EffectiveStartDts], [EffectiveEndDts])
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = [psa].[Customer_TemporalHistory]));
GO
-- Add a few indexes.
CREATE NONCLUSTERED INDEX [IXNC_Customer_Temporal__EffectiveEndDts_EffectiveStartDts_CustomerID]
ON [psa].[Customer_Temporal]
([EffectiveEndDts], [EffectiveStartDts], [CustomerID]);
CREATE NONCLUSTERED INDEX [IXNC_Customer_Temporal__CustomerID_RowHash]
ON [psa].[Customer_Temporal]
([CustomerID], [RowHash]);
GO
Note: I have not included all the scripts that I used for my test in this article, because that could be overwhelming. But if you are interested, you can download all the scripts and the two SSIS test packages here.
Maybe you are as curious as I am to know whether using temporal tables for a PSA is a good idea.
Considerations are the following:
- Speed of data loading.
- Speed of reading rows at a certain moment in time (time travel mechanism).
- Ability to adopt changes in the datamodel.
- Simplicity and reliability of the solution.
- Ability to do historic loads, for instance from archived source files or an older data warehouse.
And of course we need something to compare with. Let that be a plain SQL Server table with a start and end datetime.
Before I present the test results, I want to give a few details about the test:
- For testing I use a “Customer” table that is filled with half a million rows of dummy data.
- I simulate 50 daily loads with deletes, inserts and updates in the staging table. After those 50 runs, the total number of rows has more than quadrupled to just over 2.2 million (2237460 to be exact).
- For the DATETIME2 columns, I use a precision of centiseconds, so DATETIME2(2). For justification see one of my older blog posts: Stop being so precise! and more about using Load(end)dates (Datavault Series). If needed you can use a higher precision for your data warehouse.
- I use a [RowHash] column, which is an MD5 hash value of all columns of a row that are relevant for a change (so the start and end date are not used for the hash value). This is done primarily to get better performance when comparing new and existing rows.
- I have compared all data in the non-temporal PSA table with the data in the temporal table and its backing history table to check that the rows were exactly the same, and this was the case (except for the start and end dates).
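A [RowHash] like the one mentioned in the list above could be calculated as sketched below. The exact expression used in the test is in the downloadable scripts; the column choice, the '|' delimiter and the use of CONCAT here are assumptions. Keep in mind that CONCAT turns NULL into an empty string, so with this sketch a NULL and an empty value produce the same hash.

```sql
-- Sketch only: the column choice and the '|' delimiter are assumptions.
SELECT
    stg.[CustomerID],
    CONVERT(BINARY(16), HASHBYTES('MD5', CONCAT(
        stg.[FirstName], N'|', stg.[Initials], N'|', stg.[MiddleName], N'|',
        stg.[SurName], N'|', CONVERT(CHAR(10), stg.[DateOfBirth], 126), N'|',
        stg.[Gender], N'|', stg.[SocialSecurityNumber], N'|', stg.[Address], N'|',
        stg.[PostalCode], N'|', stg.[Residence], N'|', stg.[StateOrProvince], N'|',
        stg.[Country]
    ))) AS [RowHash]
FROM [stg_internal].[Customer] AS stg;
```

The delimiter prevents two different rows, such as ('ab', 'c') and ('a', 'bc'), from producing the same input string for the hash.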
Speed of data loading
Using T-SQL for synchronizing data from a staging table to a PSA table, I got the following test results:
Testcase | Average duration (50 loads, in ms) |
Synchronize PSA temporal | 6159 |
Synchronize PSA Non-temporal | 24590 |
So we have a winner here: the temporal table! It is about four times faster!
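To give an idea of what synchronizing to the temporal table involves, here is a rough sketch. It is not the script behind the measurements above (that one is in the downloadable materials). It assumes that [stg_internal].[Customer] holds the same business columns as the temporal table plus a precomputed, non-NULL [RowHash].

```sql
-- Sketch only, not the script behind the measurements above.
DECLARE @SessionStartDts DATETIME2(2) = SYSUTCDATETIME();

BEGIN TRANSACTION;

-- 1. Rows that disappeared from the source: SQL Server moves them to the history table.
DELETE tgt
FROM [psa].[Customer_Temporal] AS tgt
WHERE NOT EXISTS (SELECT 1
                  FROM [stg_internal].[Customer] AS stg
                  WHERE stg.[CustomerID] = tgt.[CustomerID]);

-- 2. Changed rows: the previous version is written to the history table automatically.
UPDATE tgt
SET tgt.[FirstName]            = stg.[FirstName],
    tgt.[Initials]             = stg.[Initials],
    tgt.[MiddleName]           = stg.[MiddleName],
    tgt.[SurName]              = stg.[SurName],
    tgt.[DateOfBirth]          = stg.[DateOfBirth],
    tgt.[Gender]               = stg.[Gender],
    tgt.[SocialSecurityNumber] = stg.[SocialSecurityNumber],
    tgt.[Address]              = stg.[Address],
    tgt.[PostalCode]           = stg.[PostalCode],
    tgt.[Residence]            = stg.[Residence],
    tgt.[StateOrProvince]      = stg.[StateOrProvince],
    tgt.[Country]              = stg.[Country],
    tgt.[RowHash]              = stg.[RowHash],
    tgt.[SessionStartDts]      = @SessionStartDts
FROM [psa].[Customer_Temporal] AS tgt
JOIN [stg_internal].[Customer] AS stg
    ON stg.[CustomerID] = tgt.[CustomerID]
WHERE stg.[RowHash] <> tgt.[RowHash];

-- 3. New rows: the period columns are left out, SQL Server fills them.
INSERT INTO [psa].[Customer_Temporal]
    ([CustomerID], [FirstName], [Initials], [MiddleName], [SurName], [DateOfBirth],
     [Gender], [SocialSecurityNumber], [Address], [PostalCode], [Residence],
     [StateOrProvince], [Country], [RowHash], [SessionStartDts])
SELECT
    stg.[CustomerID], stg.[FirstName], stg.[Initials], stg.[MiddleName], stg.[SurName],
    stg.[DateOfBirth], stg.[Gender], stg.[SocialSecurityNumber], stg.[Address],
    stg.[PostalCode], stg.[Residence], stg.[StateOrProvince], stg.[Country],
    stg.[RowHash], @SessionStartDts
FROM [stg_internal].[Customer] AS stg
WHERE NOT EXISTS (SELECT 1
                  FROM [psa].[Customer_Temporal] AS tgt
                  WHERE tgt.[CustomerID] = stg.[CustomerID]);

COMMIT TRANSACTION;
```

Nowhere in this sketch is an end date set; that part is left entirely to SQL Server.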
Speed of reading rows at a certain moment in time (time travel mechanism)
For reading, I used two views and two SSIS packages with a time travel mechanism and a data flow task.
The views return the rows valid at a certain point in time, selected from the temporal and the non-temporal history table, respectively.
The data flow tasks in the SSIS packages have a conditional split that prevents the rows from actually being inserted into the OLE DB Destination. This makes it a purer read test.
Here are the views that were used:
--\
---) For the demo, the virtualization layer is slightly different from a real life scenario.
---) Normally you would create separate databases for the Staging Area and the PSA, so you could do
---) the connection swap as explained here:
---) http://www.hansmichiels.com/2017/02/18/using-a-persistent-staging-area-what-why-and-how/
---) Normally you would also either have a normal history table or a temporal one, but not both.
---) As I now have three objects that would like to have the virtual name [stg].[Customer], I use a
---) suffix for the PSA versions, as this is workable for the demo.
---) So:
---) [stg].[Customer]: view on the normal [stg_internal].[Customer] table (only in downloadable materials).
---) [stg].[Customer_H]: view on the [psa].[Customer_History] table.
---) [stg].[Customer_TH]: view on the [psa].[Customer_Temporal] table.
--/
-------------- [Customer_H] --------------
IF OBJECT_ID('[stg].[Customer_H]', 'V') IS NOT NULL DROP VIEW [stg].[Customer_H];
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [stg].[Customer_H]
AS
/*
==========================================================================================
Author: Hans Michiels
Create date: 15-FEB-2017
Description: Virtualization view in order to be able to do full reloads from the PSA.
==========================================================================================
*/
SELECT
hist.[CustomerID],
hist.[FirstName],
hist.[Initials],
hist.[MiddleName],
hist.[SurName],
hist.[DateOfBirth],
hist.[Gender],
hist.[SocialSecurityNumber],
hist.[Address],
hist.[PostalCode],
hist.[Residence],
hist.[StateOrProvince],
hist.[Country],
hist.[RowHash],
hist.[SessionStartDts]
FROM
[psa].[PointInTime] AS pit
JOIN
[psa].[Customer_History] AS hist
ON hist.EffectiveStartDts <= pit.CurrentPointInTime
AND hist.EffectiveEndDts > pit.CurrentPointInTime
GO
-------------- [Customer_TH] --------------
IF OBJECT_ID('[stg].[Customer_TH]', 'V') IS NOT NULL DROP VIEW [stg].[Customer_TH];
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [stg].[Customer_TH]
AS
/*
==========================================================================================
Author: Hans Michiels
Create date: 15-FEB-2017
Description: Virtualization view in order to be able to do full reloads from the PSA.
==========================================================================================
*/
SELECT
hist.[CustomerID],
hist.[FirstName],
hist.[Initials],
hist.[MiddleName],
hist.[SurName],
hist.[DateOfBirth],
hist.[Gender],
hist.[SocialSecurityNumber],
hist.[Address],
hist.[PostalCode],
hist.[Residence],
hist.[StateOrProvince],
hist.[Country],
hist.[RowHash],
hist.[SessionStartDts]
FROM
[psa].[PointInTime] AS pit
JOIN
[psa].[Customer_Temporal] FOR SYSTEM_TIME ALL AS hist
-- "FOR SYSTEM_TIME AS OF" only works with a constant value or a variable,
-- not with a column from a joined table, e.g. pit.[CurrentPointInTime].
-- So unfortunately we have to select all rows, and then do the date logic ourselves.
--
-- Under the hood, a temporal table uses EXCLUSIVE enddating:
-- the enddate of a row is equal to the startdate of the row that replaces it.
-- Therefore we cannot use BETWEEN, as this includes the enddatetime.
ON hist.EffectiveStartDts <= pit.CurrentPointInTime
AND hist.EffectiveEndDts > pit.CurrentPointInTime
-- By the way, there are more ways to do this: you could also use a CROSS JOIN and
-- a WHERE clause here, instead of doing the datetime filtering in the join.
GO
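For comparison, this is what time travel with FOR SYSTEM_TIME AS OF looks like when the moment is available as a variable, for instance in an ad hoc query rather than in a view. It is a sketch; the chosen moment is an arbitrary example value and must be given in UTC.

```sql
-- Sketch: time travel with a variable instead of the [psa].[PointInTime] join.
DECLARE @PointInTime DATETIME2(2) = '2017-02-15 12:00:00.00';

SELECT cust.[CustomerID], cust.[SurName], cust.[RowHash], cust.[SessionStartDts]
FROM [psa].[Customer_Temporal] FOR SYSTEM_TIME AS OF @PointInTime AS cust;
-- AS OF returns the rows for which
--   EffectiveStartDts <= @PointInTime AND EffectiveEndDts > @PointInTime
-- so it applies the same exclusive end date logic as the join in the view.
```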
For measuring the duration, I simply used my logging framework (see A Plug and Play Logging Solution) and selected the start and end datetime of the package executions from the [logdb].[log].[Execution] table.
Here are the results:
Testcase | Total duration (50 reads, in seconds) |
Read PSA temporal | 164 |
Read PSA Non-temporal | 2686 |
And again we have a very convincing winner: the temporal table. It is even 16 times faster! I am still wondering how this is possible. Both tables have similar indexes, one of which is a columnstore index, and whatever I tried, I kept getting the same difference.
Ability to adopt changes in the datamodel
Change happens. So if a column is deleted or added in the source, we want to make a change in the PSA:
- if a column is deleted, we keep it in the PSA to retain history (and make it NULLABLE when required).
- if a column is added, we also add a column.
I have tested the cases above, plus the deletion of a column, for the temporal table.
And yes, this works. You only have to change the temporal table (add, alter or drop a column); the backing history table is changed automatically by SQL Server.
There are, however, a few exceptions where this is not the case, e.g. IDENTITY columns. You can read more about this on MSDN.
--\
---) Adapt Changes
---) After running this script, most other scripts are broken!!
--/
--\
---) Add a column
--/
-- Staging table
ALTER TABLE [stg_internal].[Customer] ADD [CreditRating] CHAR(5) NULL;
GO
-- Non-temporal history table
ALTER TABLE [psa].[Customer_History] ADD [CreditRating] CHAR(5) NULL;
GO
-- Temporal table and its backing history table.
ALTER TABLE [psa].[Customer_Temporal] ADD [CreditRating] CHAR(5) NULL;
-- Not needed, SQL Server will do this behind the scenes:
-- ALTER TABLE [psa].[Customer_TemporalHistory] ADD [CreditRating] CHAR(5) NULL;
GO
--\
---) Make a column NULLABLE
--/
-- Staging table
ALTER TABLE [stg_internal].[Customer] ALTER COLUMN [SocialSecurityNumber] CHAR(12) NULL;
GO
-- Non-temporal history table
ALTER TABLE [psa].[Customer_History] ALTER COLUMN [SocialSecurityNumber] CHAR(12) NULL;
GO
-- Temporal table and its backing history table.
ALTER TABLE [psa].[Customer_Temporal] ALTER COLUMN [SocialSecurityNumber] CHAR(12) NULL;
-- Not needed, SQL Server will do this behind the scenes:
-- ALTER TABLE [psa].[Customer_TemporalHistory] ALTER COLUMN [SocialSecurityNumber] CHAR(12) NULL;
GO
--\
---) Delete a column (not advised since you will then lose history).
--/
-- Staging table
ALTER TABLE [stg_internal].[Customer] DROP COLUMN [StateOrProvince];
GO
-- Non-temporal history table
ALTER TABLE [psa].[Customer_History] DROP COLUMN [StateOrProvince];
GO
-- Temporal table and its backing history table.
ALTER TABLE [psa].[Customer_Temporal] DROP COLUMN [StateOrProvince];
GO
Temporal table with added column “CreditRating”: when added to the temporal table, the column is also automatically added to the backing history table. (I removed some other columns from the picture for simplicity.)
The conclusion is that a temporal table structure can be changed when needed. This is what I wanted to know.
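If you prefer a query over a screenshot to double-check that a schema change reached the history table, comparing the column lists in the catalog views is one option. A sketch, using the table names from the script above:

```sql
-- Sketch: compare the column lists of the temporal table and its history table.
SELECT
    cur.[name]  AS [ColumnInTemporalTable],
    hist.[name] AS [ColumnInHistoryTable]
FROM
    (SELECT [name] FROM sys.columns
     WHERE [object_id] = OBJECT_ID(N'[psa].[Customer_Temporal]')) AS cur
FULL OUTER JOIN
    (SELECT [name] FROM sys.columns
     WHERE [object_id] = OBJECT_ID(N'[psa].[Customer_TemporalHistory]')) AS hist
    ON hist.[name] = cur.[name]
ORDER BY COALESCE(cur.[name], hist.[name]);
-- A NULL in one of the two result columns points at a column
-- that exists in only one of the two tables.
```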
Simplicity and reliability of the solution
Unless you use code generation tools that generate the loading process for you, and the code that comes out is thoroughly tested, I would say the code to keep track of changes using a temporal table is less complex and thus less prone to errors. In particular, the end-dating mechanism is handled by SQL Server, and that sounds nice to me.
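To illustrate what SQL Server takes off your hands, here is a rough sketch of manual end-dating for the non-temporal table. It is not the script from the test. It assumes that [psa].[Customer_History] uses an exclusive end date, that open rows carry the end date 9999-12-31, and that the staging table has a precomputed, non-NULL [RowHash].

```sql
-- Sketch only, under the assumptions named above.
DECLARE @LoadDts    DATETIME2(2) = SYSUTCDATETIME();
DECLARE @OpenEndDts DATETIME2(2) = '9999-12-31';

BEGIN TRANSACTION;

-- 1. Close the open version of rows that changed or disappeared from the source.
UPDATE hist
SET hist.[EffectiveEndDts] = @LoadDts
FROM [psa].[Customer_History] AS hist
LEFT JOIN [stg_internal].[Customer] AS stg
    ON stg.[CustomerID] = hist.[CustomerID]
WHERE hist.[EffectiveEndDts] = @OpenEndDts
  AND (stg.[CustomerID] IS NULL OR stg.[RowHash] <> hist.[RowHash]);

-- 2. Insert an open version for every staging row that has none (new and changed rows).
INSERT INTO [psa].[Customer_History]
    ([CustomerID], [FirstName], [Initials], [MiddleName], [SurName], [DateOfBirth],
     [Gender], [SocialSecurityNumber], [Address], [PostalCode], [Residence],
     [StateOrProvince], [Country], [RowHash], [SessionStartDts],
     [EffectiveStartDts], [EffectiveEndDts])
SELECT
    stg.[CustomerID], stg.[FirstName], stg.[Initials], stg.[MiddleName], stg.[SurName],
    stg.[DateOfBirth], stg.[Gender], stg.[SocialSecurityNumber], stg.[Address],
    stg.[PostalCode], stg.[Residence], stg.[StateOrProvince], stg.[Country],
    stg.[RowHash], @LoadDts, @LoadDts, @OpenEndDts
FROM [stg_internal].[Customer] AS stg
WHERE NOT EXISTS (SELECT 1
                  FROM [psa].[Customer_History] AS hist
                  WHERE hist.[CustomerID] = stg.[CustomerID]
                    AND hist.[EffectiveEndDts] = @OpenEndDts);

COMMIT TRANSACTION;
```

The two steps depend on each other: step 2 only picks up changed rows because step 1 has closed their open version first. That kind of coupling is exactly where errors tend to creep in.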
There is, however, also a disadvantage of using a temporal table: the start and end datetime are out of your control. SQL Server gives them a value and there is nothing you can do about that. For Data Vault loading it is a common best practice to set the LoadDts of a Satellite to the same value for the entire load, and you could argue that this would also be a good idea for a PSA table.
But, as you might have noticed, my solution for that is to just add a SessionStartDts to the table in addition to the start and end Dts that SQL Server controls. I think this is an acceptable workaround.
By the way, SQL Server always uses UTC for the start and end datetimes of a temporal table, keep that in mind!
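If you need to show these UTC values in local time, AT TIME ZONE (also new in SQL Server 2016) can do the conversion. A sketch; 'W. Europe Standard Time' is just an example, the available names can be found in sys.time_zone_info.

```sql
-- Sketch: 'W. Europe Standard Time' is just an example time zone name.
SELECT
    cust.[CustomerID],
    cust.[EffectiveStartDts] AS [EffectiveStartDtsUtc],
    -- First label the value as UTC, then convert it to the local time zone.
    cust.[EffectiveStartDts] AT TIME ZONE 'UTC'
                             AT TIME ZONE 'W. Europe Standard Time' AS [EffectiveStartDtsLocal]
FROM [psa].[Customer_Temporal] AS cust;
```

The result of the conversion is a DATETIMEOFFSET value, so the offset stays visible.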
Ability to do historic loads
For this topic I refer to Data Vault best practices again. When using Data Vault, the LoadDateTimeStamp always reflects the current system time, except when historic data is loaded: then the LoadDateTimeStamp is changed to the value of the (estimated) original date/time of the delivery of the data row.
This can be a bit problematic when you use a PSA with system generated start and end dates, at least that is what I thought for a while. I thought this was spoiling all the fun of the temporal table.
But suddenly I realized it is not!
Let me explain this. Suppose you have a staging table column SessionStartDts (or LoadDts if you like) for which you provide the value.
Besides that you have the EffectiveStartDts and EffectiveEndDts (or whatever name you give to these columns) of the temporal table that SQL Server controls.
Be aware of the role that both “timelines” must play:
- EffectiveStartDts and EffectiveEndDts are only used to select the staging rows at a point in time. They are ignored further down the way into the EDW.
- SessionStartDts, which can be set to a historic date/time, is used further down the way into the EDW to do the enddating of satellites and so on.
How would this work? As an example, here are the views that I used for the read test, which contain both the [SessionStartDts] and the [PointInTimeDts] (for technical reasons converted to VARCHAR). The math to get the right rows out works on the ‘technical timeline’ (SQL Server controlled columns), while the [SessionStartDts] is available later for creating timelines in satellites.
CREATE VIEW [psa].[Timeline_H]
AS
/*
==========================================================================================
Author: Hans Michiels
Create date: 15-FEB-2017
Description: View used for time travelling.
==========================================================================================
*/
SELECT TOP 2147483647
CONVERT(VARCHAR(30), alltables_timeline.[SessionStartDts], 126) AS [SessionStartDtsString],
-- If the point in time is the maximum value of the [EffectiveStartDts] of the applicable [SessionStartDts]
-- you will select all rows that were effective/valid after this session/load.
CONVERT(VARCHAR(30), MAX(alltables_timeline.[EffectiveStartDts]), 126) AS [PointInTimeDtsString]
FROM
(
SELECT DISTINCT
subh.[SessionStartDts],
subh.[EffectiveStartDts]
FROM
[psa].[Customer_History] subh WITH (READPAST)
-- UNION MORE STAGING TABLES HERE WHEN APPLICABLE
) alltables_timeline
GROUP BY
CONVERT(VARCHAR(30), alltables_timeline.[SessionStartDts], 126)
ORDER BY
CONVERT(VARCHAR(30), alltables_timeline.[SessionStartDts], 126)
GO
CREATE VIEW [psa].[Timeline_TH]
AS
/*
==========================================================================================
Author: Hans Michiels
Create date: 15-FEB-2017
Description: View used for time travelling.
==========================================================================================
*/
SELECT TOP 2147483647
CONVERT(VARCHAR(30), alltables_timeline.[SessionStartDts], 126) AS [SessionStartDtsString],
-- If the point in time is the maximum value of the [EffectiveStartDts] of the applicable [SessionStartDts]
-- you will select all rows that were effective/valid after this session/load.
CONVERT(VARCHAR(30), MAX(alltables_timeline.[EffectiveStartDts]), 126) AS [PointInTimeDtsString]
FROM
(
SELECT DISTINCT
subh.[SessionStartDts],
subh.[EffectiveStartDts]
FROM
[psa].[Customer_Temporal] FOR SYSTEM_TIME ALL AS subh WITH (READPAST)
-- UNION MORE STAGING TABLES HERE WHEN APPLICABLE
) alltables_timeline
GROUP BY
CONVERT(VARCHAR(30), alltables_timeline.[SessionStartDts], 126)
ORDER BY
CONVERT(VARCHAR(30), alltables_timeline.[SessionStartDts], 126)
GO
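In the test the timeline views were consumed by the SSIS packages. As a T-SQL-only sketch of the same idea, the loop below walks through the timeline and moves the point in time forward load by load. It assumes that [psa].[PointInTime] holds exactly one row with a [CurrentPointInTime] column; the cursor and variable names are made up for this example.

```sql
-- Sketch only. Assumption: [psa].[PointInTime] holds exactly one row.
DECLARE @SessionStartDtsString VARCHAR(30), @PointInTimeDtsString VARCHAR(30);

DECLARE timeline_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT [SessionStartDtsString], [PointInTimeDtsString]
    FROM [psa].[Timeline_TH]
    ORDER BY [SessionStartDtsString];

OPEN timeline_cursor;
FETCH NEXT FROM timeline_cursor INTO @SessionStartDtsString, @PointInTimeDtsString;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Move the "clock" to the end of this session/load (technical timeline).
    UPDATE [psa].[PointInTime]
    SET [CurrentPointInTime] = CONVERT(DATETIME2(2), @PointInTimeDtsString, 126);

    -- The virtualization view now returns the rows as they were after that load.
    -- A real reload would feed these rows into the EDW; here we only count them.
    SELECT @SessionStartDtsString AS [SessionStartDts],
           COUNT(*)               AS [RowsAtPointInTime]
    FROM [stg].[Customer_TH];

    FETCH NEXT FROM timeline_cursor INTO @SessionStartDtsString, @PointInTimeDtsString;
END;

CLOSE timeline_cursor;
DEALLOCATE timeline_cursor;
```

The selection of rows works on the SQL Server controlled columns, while [SessionStartDts] travels along as the date to use further down the way into the EDW.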
Drawing a conclusion about using temporal tables for a PSA
Consideration | And the winner is .. |
Speed of data loading | Temporal table (4 times faster!) |
Speed of reading rows at a certain moment in time (time travel mechanism) | Temporal table (16 times faster!) |
Ability to adopt changes in the datamodel | Ex aequo (only in exceptional cases changing the temporal table is more complex). |
Simplicity and reliability of the solution | Temporal table. |
Ability to do historic loads | Ex aequo, if you know what you are doing. |
I think there are enough reasons for using temporal tables for a PSA! Do you agree?
Can I use a temporal table for a Data Vault Satellite?
Due to the similarities with a table in the Persistent Staging Area, I think the test results on read and write performance also hold true for satellites.
However, in satellites you cannot get away with the system generated start and end datetime stamps when you have to deal with historic loads, unless you make serious compromises on the technical design.
What does not work is switching off SYSTEM_VERSIONING temporarily (ALTER TABLE [psa].[Customer_Temporal] SET (SYSTEM_VERSIONING = OFF)) and then updating the dates. Because the columns are created as GENERATED ALWAYS, this is not allowed.
Besides that, this would be a clumsy solution that still requires manual management of the timeline, in a more complex way than when normal satellites were used!
So that leaves only one other solution, which requires, as said, a serious compromise on the technical design.
If you make the use of point in time tables mandatory for every hub and its satellites, you could decouple the historical and technical timelines. Using a mechanism similar to the view for time travelling, you could attach the point in time date 2013-02-02 (historical) to the EffectiveStartDts (or LoadDts if you like) of 2017-02-28 06:34:42.98 (technical date from the temporal table) of a certain satellite row.
And if you follow the holy rule that the Business Vault (in which the point in time tables exist) should always be rebuildable from the Raw Vault, you must also store the historical start date as an additional attribute in the satellite, but exclude it from change detection.
Is it worth this sacrifice in order to be able to use temporal tables?
I don’t know, “it depends”. It feels like bending the Data Vault rules, but at least it can be done, keep that in mind.
Is using temporal tables for full auditing in an OLTP system a good idea?
When auditing is needed due to legislation or internal audit requirements, I certainly think it is a good idea to use temporal tables. They are transparent to front end applications that write to the database, and the performance seems quite okay (see above). Obviously the performance will always be a bit worse than that of non-temporal tables in an OLTP scenario, but that is not unique to temporal tables. Every solution to track history will cost performance.
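A typical audit question is "which versions of this record existed in a given period?". As a sketch, using the Customer table from this article as a stand-in for an OLTP table; the CustomerID and the date range are arbitrary example values, and the dates are in UTC.

```sql
-- Sketch: CustomerID 42 and the date range are arbitrary example values (UTC).
DECLARE @From DATETIME2(2) = '2017-01-01';
DECLARE @To   DATETIME2(2) = '2017-03-01';

-- Returns every version of the row that was active at some time in the range,
-- from the current table as well as from the history table.
SELECT cust.[CustomerID], cust.[SurName], cust.[Address],
       cust.[EffectiveStartDts], cust.[EffectiveEndDts]
FROM [psa].[Customer_Temporal] FOR SYSTEM_TIME BETWEEN @From AND @To AS cust
WHERE cust.[CustomerID] = 42
ORDER BY cust.[EffectiveStartDts];
```

The application that writes to the table does not have to change for this; only the audit query uses the FOR SYSTEM_TIME clause.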
Conclusion / Wrap up
In this article I discussed some possible applications for the temporal table, a new feature in SQL Server 2016.
It can be used for PSA (Persistent Staging Area) tables, Data Vault Satellites and tables in OLTP systems. If you know what you are doing, temporal tables can be of great value. That’s at least what I think.
Resources on the web
- Free ebook: Introducing Microsoft SQL Server 2016 (on Microsoft web site)
- Temporal tables (MSDN)
- Changing the Schema of a System-Versioned temporal table (MSDN)
- Using a Persistent Staging Area: What, Why, and How (blog post)
- Stop being so precise! and more about using Load(end)dates (blog post)
- A Plug and Play Logging Solution (blog post)
And again, if you are interested you can download all scripts and SSIS packages used for my test here, including the ones not published inline in this article.
Translated from: https://www.sqlshack.com/temporal-table-applications-in-sql-data-warehouse-environments/