SQL Server business intelligence features – SQL Server Data Tools – Business Intelligence

Introduction

In a previous article, the introduction to SQL Server business intelligence, we covered the general structure of an enterprise business intelligence solution and briefly mentioned the tools needed to build these solutions. The purpose of this article is to give you a deeper understanding of how to create an ETL (Extract, Transform and Load) dataflow. To do this, one needs to use SQL Server Data Tools – Business Intelligence (previously known as BIDS, or Business Intelligence Development Studio). In this article we’ll take a look at the basic functionality of SQL Server Data Tools and how to use it to keep your data warehouse up to date. It’s worth noting that there are many different ways to go about building your ETL solution; this article gives sound advice and pointers as to how to approach the problem.

Feeding the Data Warehouse

This article presumes that you have already created a de-normalized data warehouse. If you need more information on how to create a star-schema based database, see Designing the Star Schema Database in the Resources section below.


SQL Server Data Tools - Feeding the Data Warehouse

Once you have created a de-normalized database model for your data warehouse, you will need to fill it with data and schedule regular updates to keep your SQL Server business intelligence data up to date. Depending on your version of SQL Server, you will be able to do this with either BIDS or SQL Server Data Tools.


SQL Server Data Tools is an extremely handy ETL and automation tool. You can do anything from executing system processes and PowerShell scripts, running raw T-SQL and custom C# scripts, to sending e-mails or connecting to FTP servers or Web Services. The most commonly used tool, however, is the Data Flow Task.


Keeping track of updates

A data warehouse does not need to be updated often. As business intelligence is about analyzing historical data, the current day’s data is not critically important. It is also obvious that one should not empty and refill a database every day. It is therefore necessary to keep a log table recording the processing history of the data warehouse.


 
CREATE TABLE [dbo].[ProcessingLog](
	[ProcessingLogID] [int] IDENTITY(1,1) NOT NULL,
	[ProcessingTime] [datetime] NOT NULL,
	[Object] [sysname] NULL
)
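
Since the incremental load will query this table for the most recent processing time, it helps to seed it with one starting row per tracked object; otherwise the very first lookup returns nothing. A minimal sketch, using an arbitrary far-past date and the FactFinance object name used later in this article:

INSERT INTO [dbo].[ProcessingLog] ([ProcessingTime], [Object])
VALUES ('19000101', 'FactFinance');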
    

Connection managers

OLE DB connections

Allowing for connections between one or many sources and/or one or many destinations, a connection manager is very often the starting point of an ETL package. There are ordinary database connections (ODBC, OLE DB, ADO.NET etc.), connections to HTTP, FTP and SMTP servers, and connections to various file types like CSV, flat file or even Excel. For a simple BI data flow, however, two OLE DB connections suffice: one for the data warehouse and another for the OLTP/production database. You can set them up in the following way.


Connection Manager dialog - SQL Server Data Tools

(Note that this connection is on the local machine and uses Windows Authentication. You can always connect directly to an instance in IP_ADDRESS\INSTANCE format and you may use SQL Server authentication if the instance does not support Active Directory accounts.)

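For reference, the same connection expressed as an OLE DB connection string might look like the sketch below; the provider version, instance name and catalog name are placeholders to adapt to your own environment.

Data Source=IP_ADDRESS\INSTANCE;Initial Catalog=DataWarehouseDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;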

Now that you have your connections you will need to make your first Data Flow Task.


The Data Flow Task

Data flow task

To describe this task very simply: it is a unit of work that links a data source with a destination and provides the possibility of transforming the dataflow in a number of ways. Sorting, conversion, merging, counting and aggregating are just some of the tools that can be used within the Data Flow Task.


For SQL Server business intelligence solutions it is usually about sourcing data from the OLTP database, converting it from its relational model, and inserting it into the star or snowflake model of your data warehouse. A tidy approach, if you’re comfortable with writing T-SQL, is to write SELECT statements that can fill your destination fact and dimension tables based on your OLTP database tables, and save them as views in the source database. This allows you to have a direct mapping from an object in your OLTP database to an object in your data warehouse without having to create complex ETL packages.

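As a minimal sketch, such a view might look like the following; the OLTP table and column names (Transactions, Departments, Accounts and so on) are hypothetical stand-ins for your own schema.

-- Shapes relational OLTP data into the grain of the destination fact table
CREATE VIEW dbo.vFactFinanceSource
AS
SELECT
	t.TransactionDate,
	d.DepartmentKey,
	a.AccountKey,
	t.Amount
FROM dbo.Transactions AS t
JOIN dbo.Departments AS d ON d.DepartmentID = t.DepartmentID
JOIN dbo.Accounts AS a ON a.AccountID = t.AccountID;

The data flow task can then read from this view as if it were a table already shaped like the destination fact table.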

Sourcing data from the OLTP database and inserting it into the model of your data warehouse

For fact tables you will normally have date/time fields, and you will be able to create a simple data flow from your data warehouse view in the OLTP database to the fact table in the data warehouse. You can do this by running an Execute SQL task just before your data flow task and mapping the result set to a variable storing the last processing time for that table. You can use the following query for this:


 
SELECT TOP 1 ProcessingTime 
FROM ProcessingLog
WHERE object = 'YourFactTable'
ORDER BY ProcessingTime DESC
    

OLE DB source editor dialog

As you can see above, you have added a “?” parameter to the source query. This allows you to add the last processed time to the query. You can do this by clicking on “Parameters…” and setting Parameter0 to your variable.


Setting Parameter0 to your variable
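
Putting it together, the parameterized source query might look like the sketch below, continuing the hypothetical vFactFinanceSource view from earlier; the “?” placeholder is bound to the last-processing-time variable via Parameter0.

SELECT TransactionDate, DepartmentKey, AccountKey, Amount
FROM dbo.vFactFinanceSource
WHERE TransactionDate > ?  -- only rows newer than the last load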

Once you have a source that pulls only the most recent data not yet in your data warehouse, you can create your destination connection and map the fields to your fact table fields.


OLE DB destination editor dialog

Once you have completed the data flow task, you must add an Execute SQL task that updates the log table to prepare for the next day’s ETL processing:


 
INSERT INTO [ProcessingLog] ([ProcessingTime], [Object])
	SELECT
		GETDATE(),
		'FactFinance'
    

Dimension tables often have to be dealt with differently, but two common options involve comparing the source and destination objects and transferring the difference. This can be done by using the Slowly Changing Dimension transformation, or by using an Execute SQL task with a MERGE statement if the two databases are in the same instance.

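A minimal sketch of the MERGE approach follows, assuming the warehouse and OLTP databases (here called dw and oltp) sit in the same instance; the dimension table and source view names are hypothetical.

-- Upsert the dimension from an OLTP view: update changed rows, insert new ones
MERGE INTO dw.dbo.DimDepartment AS tgt
USING oltp.dbo.vDimDepartmentSource AS src
	ON tgt.DepartmentID = src.DepartmentID
WHEN MATCHED AND tgt.DepartmentName <> src.DepartmentName THEN
	UPDATE SET tgt.DepartmentName = src.DepartmentName
WHEN NOT MATCHED BY TARGET THEN
	INSERT (DepartmentID, DepartmentName)
	VALUES (src.DepartmentID, src.DepartmentName);

Note that this overwrites changed attribute values in place, which corresponds to a Type 1 slowly changing dimension; preserving history (Type 2) requires extra machinery such as row validity dates or a current-row flag.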

SQL Server Agent

It may seem strange to mention the SQL Server Agent when focusing on business intelligence; however, it is central to keeping the structure up to date. Once you have a functioning ETL package, you need to automate it and run it regularly to make sure your data warehouse gets updated.

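As a sketch, a nightly Agent job for the package could be created in T-SQL as below; the job name, package path and schedule are assumptions to adapt to your environment.

USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Nightly DW Load';
EXEC dbo.sp_add_jobstep
	@job_name = N'Nightly DW Load',
	@step_name = N'Run ETL package',
	@subsystem = N'SSIS',
	@command = N'/FILE "C:\ETL\LoadDataWarehouse.dtsx"';  -- hypothetical package path
EXEC dbo.sp_add_jobschedule
	@job_name = N'Nightly DW Load',
	@name = N'Daily at 02:00',
	@freq_type = 4,        -- daily
	@freq_interval = 1,    -- every day
	@active_start_time = 020000;
EXEC dbo.sp_add_jobserver @job_name = N'Nightly DW Load';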

Where to from here?

Now that you have a running ETL solution that feeds your data warehouse with interesting business intelligence data, you can take one of two routes: build a multidimensional cube on top of it with SSAS, or, if for any reason you cannot use SSAS, start building SSRS reports using SQL queries based directly on your new data warehouse system.


Resources

Designing the Star Schema Database
SQL Server Integration Services (SSIS) overview
Dimension Tables


Translated from: https://www.sqlshack.com/sql-server-business-intelligence-features-sql-server-data-tools-business-intelligence/
