摘录本的摘录内容_使用tSQLtSQL单元测试数据仓库摘录

摘录本的摘录内容

This article talks about basic concepts of SQL unit testing from a Data Warehouse point of view using tSQLt, a highly productive and acclaimed SQL unit testing framework.

本文从数据仓库的角度使用tSQLt讨论了SQL单元测试的基本概念,tSQLt是一种高效且广受赞誉SQL单元测试框架。

This article is both for beginners to learn the basics and experienced Data Warehouse professionals to review their Data Warehouse unit testing approach in favor of the tSQLt framework.

本文既供初学者学习基础知识,又供有经验的数据仓库专业人员回顾以支持tSQLt框架的数据仓库单元测试方法。

Additionally, the readers of this article are going to learn the basic tips of applying tSQLt in the SQL unit testing of Data Warehouse Extracts alongside a quick overview of Data Warehouse concepts.

此外,本文的读者还将学习在数据仓库摘录SQL单元测试中应用tSQLt的基本技巧,以及对数据仓库概念的快速概述。

关于数据仓库 (About Data Warehouse )

It is very important to first get familiar with Data Warehouse concepts if you are not already familiar.

如果您还不熟悉,那么首先要熟悉数据仓库的概念非常重要。

Bear in mind that Data Warehouse is a massive subject that cannot be covered in one or two articles. However, it is not difficult to get an overview of Data Warehouse keeping in mind the scope of this article.

请记住,数据仓库是一个庞大的主题,一两个文章无法涵盖。 但是,请牢记本文的范围,不难了解Data Warehouse的概况。

什么是数据仓库? (What is Data Warehouse?)

Data Warehouse is a centralized repository or storage of data that is highly optimized for reporting and analysis purposes.

数据仓库是高度集中化的数据存储库或数据存储,可用于报告和分析。

数据仓库做什么? (What does Data Warehouse do?)

Data Warehouse helps businesses to understand their weaknesses and strengths by providing deep insights into their data by using special storage, architecture and processes.

数据仓库通过使用特殊的存储,架构和流程提供对数据的深刻见解,从而帮助企业了解其弱点和优势。

数据仓库还做什么? (What else Data Warehouse does?)

Data Warehouse also helps in making quick decisions based on the facts provided by it, thereby making business more productive and less susceptible.

数据仓库还有助于根据其提供的事实做出快速决策,从而使业务效率更高,更不易受到影响。

数据仓库的类型 (Types of Data Warehouse)

A traditional Data Warehouse can be built in the form of cubes or data models that can be accessed for reporting and real-time analysis by the business users.

传统的数据仓库可以以多维数据集或数据模型的形式构建,可以由业务用户访问以进行报告和实时分析。

维度和事实 (Dimension and facts )

A dimension is anything that is of interest to the business such as Product, Customer or Supplier, whereas FACT, as the name indicates, contains facts and figures alongside keys to link with dimensions.

维度是产品,客户或供应商等企业感兴趣的任何事物,而FACT,顾名思义,包含事实和数字以及与维度链接的键。

数据仓库的架构 (The architecture of Data warehouse )

A typical Data Warehouse either follows a star-like schema where a FACT is surrounded by dimensions (like a star) or a snowflake-like schema where a dimension can be directly linked with Fact or can also be linked with another dimension. However, their further details are beyond the scope of this article:

典型的数据仓库要么遵循FACT被维(例如星形)包围的星形模式,要么遵循可以与Fact直接链接或也可以与另一个维链接的雪花模式。 但是,它们的更多详细信息超出了本文的范围:

Fact and Dimensions in a Data Warehouse

数据仓库分段提取 (Data Warehouse Staging Extracts )

Let us now talk about Data Warehouse Extracts, particularly staging extracts.

现在让我们讨论数据仓库摘录,尤其是阶段摘录。

什么是数据仓库分段提取? (What are Data Warehouse Staging Extracts?)

Data Warehouse staging extract is basically a process that copies data from source to Data Warehouse Staging Area where it becomes available for further processing into dimensions and fact(s).

数据仓库分段提取基本上是一个将数据从源复制到数据仓库分段区域的过程,在该区域中,数据可用于进一步处理成维度和事实。

什么是暂存区? (What is Staging Area?)

Staging from a Data Warehouse perspective is a one-to-one mapping of data from source to destination where data from multiple sources is stored to be processed further.

从数据仓库的角度来看,分段是数据从源到目标的一对一映射,其中存储了来自多个源的数据以进行进一步处理。

分期示例 (Example of Staging)

For example, if you have a table named Customer, which consists of only two columns CustomerId and Name, then this table must also be present in staging with the same columns to be populated from the source or multiple sources.

例如,如果您有一个名为Customer的表该表仅由CustomerIdName两列组成则此表还必须与要从源或多个源填充的相同列一起显示在登台中。

临时区域的好处 (Benefits of Staging Area)

The main benefit of the Staging Area is the independence from the source once the data is extracted, which means the Data Warehouse further processing does not need to refer to the original source as long as its staging extract has been captured.

临时区域的主要好处是,一旦提取数据,便与源保持独立,这意味着只要已捕获其临时提取,数据仓库就无需参考原始源。

分期提取的重要性 (Importance of Staging Extract)

The most important part of the initial phase of Data Warehouse processing is the staging extract because this is the entrance to the Data Warehouse database, which then serves the Data Warehouse business intelligence needs.

数据仓库处理初始阶段最重要的部分是分段提取,因为这是数据仓库数据库的入口,然后数据库可以满足数据仓库业务智能需求。

Data Warehouse Staging Extract

SQL单元测试登台摘要 (SQL Unit Testing Staging Extract)

As mentioned earlier, staging extract(s) is the most important starting point as far as Data Warehouse workflows are concerned, so we should be then somehow SQL unit testing these extracts.

如前所述,对于数据仓库工作流而言,分段提取是最重要的起点,因此我们应该以某种方式对这些提取进行SQL单元测试。

However, it is also crucial to identify what needs to be unit tested and what can be excluded from SQL Unit Testing.

但是,确定哪些需要进行单元测试以及哪些可以从SQL单元测试中排除也是至关重要的。

We are first going to clarify the following two things in the context of SQL Unit Testing Data Warehouse Extracts:

我们首先要在SQL单元测试数据仓库摘录的上下文中阐明以下两件事:

  1. What is that we are going to write our SQL Unit Tests against?

    我们打算针对什么编写SQL单元测试?
  2. What SQL Unit Testing tool or framework is going to be used?

    将使用哪种SQL单元测试工具或框架?

暂存摘录和源脚本 (Staging Extract and Source Script )

The process of running staging extract (copying data from source to destination) typically depends on the source script, which runs against the source database to extract data, which is then copied over to the staging area.

运行暂存提取(将数据从源复制到目标)的过程通常取决于源脚本,该脚本针对源数据库运行以提取数据,然后将其复制到暂存区域。

A traditional source script selects some or all the columns of a table from source (database).

传统的源脚本从源(数据库)中选择表的某些或所有列。

So, the source script is actually the potential object against which we are going to write and run our SQL Unit Tests.

因此,源脚本实际上是我们要针对其编写和运行SQL单元测试的潜在对象。

TSQLt – SQL单元测试框架 (TSQLt – SQL unit testing framework )

The most suitable testing framework for SQL Unit Testing Data Warehouse Staging Extract is tSQLt, which is, by default, purpose-built and feature-rich.

用于SQL单元测试数据仓库分段提取的最合适的测试框架是tSQLt,默认情况下,tSQLt是专用的且功能丰富。

Please read the following articles to get started with tSQLt if you are not already familiar with it:

如果您还不熟悉tSQLt,请阅读以下文章以开始使用它:

复制SQL单元测试方案 (Replicating SQL Unit Testing scenario )

As already mentioned in this article that a Data Warehouse extract runs between source and Data Warehouse database, then it is easy to guess that at least two sample databases are required to implement the walkthrough to get a better understanding.

正如本文已经提到的,数据仓库提取在源数据库和数据仓库数据库之间运行,因此很容易猜测至少需要两个示例数据库才能实现本演练,以便更好地理解。

设置源样本数据库 (Setup source sample database)

Let us first setup source sample database called SQLDevArticlesV4 by running the following T-SQL script against the master database:

让我们首先通过对主数据库运行以下T-SQL脚本来设置名为SQLDevArticlesV4的源示例数据库:

-- 1 Create SQLDevArticlesV4 source database
CREATE DATABASE SQLDevArticlesV4;
GO
 
USE SQLDevArticlesV4;
GO
 
-- 2 Create author table
CREATE TABLE [dbo].[Author] (
    [AuthorId]         INT           IDENTITY (1, 1) NOT NULL,
    [Name]             VARCHAR (40)  NOT NULL,
    [RegistrationDate] DATETIME2 (7) NULL
);
 
-- 3 Create article tables
CREATE TABLE [dbo].[Article] (
    [ArticleId]      INT           IDENTITY (1, 1) NOT NULL,
    [Title]          VARCHAR (300) NOT NULL,
    [Published_Date] DATETIME2 (7) NOT NULL
);
 
-- 4 Populate author table
SET IDENTITY_INSERT [dbo].[Author] ON
INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (1, N'Asif', N'2018-01-01 00:00:00')
INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (2, N'Peter', N'2018-02-01 00:00:00')
INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (3, N'Sarah', N'2018-03-02 00:00:00')
INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (4, N'Adil', N'2018-04-02 00:00:00')
INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (5, N'Sam', N'2019-01-01 00:00:00')
SET IDENTITY_INSERT [dbo].[Author] OFF
 
-- 5 Populate article table
SET IDENTITY_INSERT [dbo].[Article] ON
INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (1, N'Fundamentals of Database Programming', N'2018-01-02 00:00:00')
INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (2, N'Advanced Database Programming', N'2018-01-03 00:00:00')
INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (3, N'Understanding SQL Stored Procedures ', N'2018-02-02 00:00:00')
INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (4, N'Database Design Concepts', N'2018-03-02 00:00:00')
INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (5, N'Power BI Desktop Fundamentals', N'2019-01-02 00:00:00')
SET IDENTITY_INSERT [dbo].[Article] OFF;
GO

设置数据仓库样本数据库 (Setup Data Warehouse sample database)

Once the source database sample is set up the next step is to create the Data Warehouse database SQLDevArticlesDW sample with staging area (schema) and tables similar to the ones in source by running the following T-SQL script:

设置源数据库样本后,下一步是通过运行以下T-SQL脚本来创建具有暂存区(模式)和与源中的表相似的表的数据仓库数据库SQLDevArticlesDW示例:

-- 1 Create SQLDevArticlesDW database
CREATE DATABASE SQLDevArticlesDW;
GO
 
USE SQLDevArticlesDW;
GO
 
-- 2 Create Staging schema
CREATE SCHEMA Staging
AUTHORIZATION dbo;
GO
 
 
-- 2 Create staging author table
CREATE TABLE [Staging].[Author] (
    [AuthorId]         INT           IDENTITY (1, 1) NOT NULL,
    [Name]             VARCHAR (40)  NOT NULL,
    [RegistrationDate] DATETIME2 (7) NULL
);
 
-- 3 Create article tables
CREATE TABLE [Staging].[Article] (
    [ArticleId]      INT           IDENTITY (1, 1) NOT NULL,
    [Title]          VARCHAR (300) NOT NULL,
    [Published_Date] DATETIME2 (7) NOT NULL
);

快速检查 (Quick check)

Have a quick look at both source and Data Warehouse sample (database) that you have just created:

快速浏览一下刚刚创建的源和数据仓库样本(数据库):

Source and Data Warehouse Sample Databases

接下来是什么 (What is next)

We have both samples ready to be used, so could you please guess what next step is?

我们已经准备好使用这两个样本,所以您能猜出下一步是什么吗?

Well, the answer is to think about the potential object, which helps us to get the source data to be copied over to Data Warehouse.

好吧,答案是考虑潜在的对象,这有助于我们获得要复制到数据仓库的源数据。

Those who are already familiar with Data Warehouse practices are well aware that we actually need here Integration Services Packages also called SSIS Packages to perform this Data Warehouse act, but we are more focused to author (build) an object which sources the required data which can then be initiated or activated via SSIS Packages and for which SQL Unit Testing can be applied.

那些已经熟悉数据仓库实践的人都清楚地知道,在这里我们实际上需要Integration Services软件包(也称为SSIS软件包)来执行此数据仓库操作,但是我们更专注于创作(构建)一个对象,该对象可以获取所需的数据。然后通过SSIS软件包启动或激活,并且可以对其应用SQL单元测试。

分析源脚本 (Analyze source script)

Analyze the source script for author extract, which is simply selecting all the columns from author tables from the source database as follows:

分析作者提取的源脚本,该脚本只需从源数据库的作者表中选择所有列,如下所示:

USE SQLDevArticlesV4
 
--Analyse auhtor extract source  script
SELECT a.[AuthorId],a.[Name],a.[RegistrationDate] FROM dbo.Author a

The output is as follows:

输出如下:

Source Script Output

源脚本映射选项 (Source script mapping options)

Please remember you can map this simple source script to any one of the following objects in the database:

请记住,您可以将此简单的源脚本映射到数据库中的以下任一对象:

  • SQL view

    SQL视图
  • Stored procedure

    存储过程

从源脚本创建GetAuthorExtract对象 (Creating GetAuthorExtract object from source script )

We are going to create a stored procedure called GetAuthorExtract (database) in the source database as follows:

我们将在源数据库中创建一个名为GetAuthorExtract (数据库)的存储过程,如下所示:

USE SQLDevArticlesV4;
GO
 
--Create GetAuthorExtract procedure
CREATE PROCEDURE GetAuthorExtract
AS
SELECT a.[AuthorId],a.[Name],a.[RegistrationDate] FROM dbo.Author a
RETURN 0

试运行程序 (Test run the procedure)

Go for a quick test run of the stored procedure as follows:

对存储过程进行快速测试,如下所示:

USE [SQLDevArticlesV4]
GO
 
-- Run author staging extract stored procedure
EXEC  [dbo].[GetAuthorExtract]
GO

The results should match the output shown below:

结果应与如下所示的输出匹配:

Test Run the Procedure

So now, you are ready to write a tSQLt Unit Test against this procedure because this is actually the object of interest that takes data from the source and loads it into the destination and we would like to make sure that it must pass SQL unit test to prove it serves the purpose.

现在,您就可以准备针对此过程编写tSQLt单元测试,因为实际上这是感兴趣的对象,它从源中获取数据并将其加载到目标中,并且我们希望确保它必须将SQL单元测试传递给证明它达到了目的。

设置tSQLt框架 (Setup tSQLt framework )

Run tSQLt setup (by running tSQLt.class.sql script you get when you download it) to install the framework in the same source database (SQLDevArticlesV4) where GetAuthorExtract stored procedure was created.

运行tSQLt安装程序(通过运行在下载时获得的tSQLt.class.sql脚本)将框架安装在创建GetAuthorExtract存储过程的同一源数据库(SQLDevArticlesV4)中。

Please read the article Conventional SQL Unit Testing with tSQLt in Simple Words to install tSQLt if you are not familiar with tSQLt installation.

如果您不熟悉tSQLt的安装,请阅读简单单词中的使用tSQLt进行常规SQL单元测试的文章以安装tSQLt

However, I strongly recommend to please use a test-driven database development approach in your day to day professional SQL unit testing work.

但是,我强烈建议您在日常的专业SQL单元测试工作中使用测试驱动的数据库开发方法

快速检查 (Quick check)

In Object Explorer, navigate to SQLDevArticlesV4 | Tables node to view the tSQLt installed objects:

在对象资源管理器中,导航到SQLDevArticlesV4 | 用于查看tSQLt安装对象的Tables节点:

tSQLt Installed

创建单元测试课程 (Create unit test class)

You have to create a new class (schema) for SQL unit testing with tSQLt as follows:

您必须使用tSQLt创建一个新类(模式)以进行SQL单元测试,如下所示:

USE SQLDevArticlesV4;
GO
 
-- Creating unit test class ArticleTests
CREATE SCHEMA [AuthorTests]
Authorization dbo
GO
EXECUTE sp_addextendedproperty @name='tSQLt.TestClass',@value=1,@level0type='SCHEMA',@level0name='AuthorTests'

创建一个单元测试以检查GetAuthorExtract是否正常运行 (Create a unit test to check GetAuthorExtract works)

In order to make sure that the procedure which extracts data from source works, we have to create a SQL Unit Test, which helps us to determine the reliability of the procedure prior to running it to copy the data from source to Data Warehouse.

为了确保从源中提取数据的过程能够正常工作,我们必须创建一个SQL单元测试,这有助于我们在运行该过程以将数据从源复制到数据仓库之前确定过程的可靠性。

Create a SQL Unit Test to check GetAuthorExtract object as follows:

创建一个SQL单元测试以检查GetAuthorExtract对象,如下所示:

USE SQLDevArticlesV4
GO
 
--Create unit test to check GetAuthorExtract works
ALTER PROCEDURE [AuthorTests].[test to check GetAuthorExtract pulls all data]
AS
-- Assemble
EXEC tSQLt.FakeTable @TableName='dbo.Author',@Identity=0 -- Fake Author table 
 
 
Create TABLE [AuthorTests].[Expected] -- Create expected table
(
  [AuthorId]         INT        NOT NULL,
    [Name]             VARCHAR (40)  NOT NULL,
    [RegistrationDate] DATETIME2 (7) NULL
 
)
 
INSERT INTO AuthorTests.Expected -- Insert data into exepcted table
(AuthorId,Name,RegistrationDate)
VALUES
(1,'Asim','01 Jul 2019'),
(2,'Ali','01 Nov 2019'),
(3,'Mike','02 Dec 2019')
 
SELECT * INTO AuthorTests.Actual FROM dbo.Author -- Create an actual table from the Author table (no data)
 
INSERT INTO dbo.Author -- Insert data into Author table
(AuthorId,Name,RegistrationDate)
VALUES
(1,'Asim','01 Jul 2019'),
(2,'Ali','01 Nov 2019'),
(3,'Mike','02 Dec 2019')
 
 
 
-- Act
INSERT INTO AuthorTests.Actual -- Insert data into an actual table by running the stored procedure
(AuthorId,Name,RegistrationDate)
EXEC dbo.GetAuthorExtract 
 
 
 
 -- Assert (compare expected table with actual table results)
EXEC tSQLt.AssertEqualsTable @Expected='AuthorTests.Expected',@Actual='AuthorTests.Actual'

运行SQL单元测试以检查GetAuthorExtract (Run SQL Unit Test to check GetAuthorExtract)

Finally, run the SQL unit test by the following command:

最后,通过以下命令运行SQL单元测试:

USE SQLDevArticlesV4
GO
 
-- Run all unit tests related to AuthorTests test class
EXEC tsqlt.RunTestClass "AuthorTests"

If you have followed all the steps of the walkthrough then we are expecting this SQL Unit Test to pass as shown below:

如果您已经完成了演练的所有步骤,那么我们期望该SQL单元测试能够通过,如下所示:

SQL Unit Test Passed

Congratulations! You have just learned how to write and run SQL Unit Test against Data Warehouse staging extract and this is also applicable to any other similar scenario where data needs to be extracted from source to destination.

恭喜你! 您已经了解了如何针对数据仓库分段提取编写和运行SQL单元测试,这也适用于需要从源到目标提取数据的任何其他类似情况。

翻译自: https://www.sqlshack.com/sql-unit-testing-data-warehouse-extracts-with-tsqlt/

摘录本的摘录内容

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值