ssis修改数据库数据_SSIS平衡数据分配器概述

ssis修改数据库数据

In this article, we will give a brief overview of SSIS Balanced Data Distributor (BDD). In addition, we will give some examples and illustrate alternatives.

在本文中,我们将简要概述SSIS平衡数据分发器(BDD)。 此外,我们将提供一些示例并说明替代方案。

介绍 (Introduction)

SSIS Balanced Data Distributor is a transform tool added in SQL Server 2008 that takes a single input and distributes the incoming rows to one or more outputs uniformly via multithreading.

SSIS平衡数据分发器是SQL Server 2008中添加的一种转换工具,该工具采用单个输入,并通过多线程将传入的行均匀地分配到一个或多个输出。

Before SQL Server 2016, this tool was not part of the SSIS toolbox; rather it was provided in a standalone installer hosted on the Microsoft website. As of SQL Server 2016, the tool is installed with SSIS.

在SQL Server 2016之前,此工具不属于SSIS工具箱。 而是由Microsoft网站上托管的独立安装程序提供的 。 从SQL Server 2016开始,该工具随SSIS一起安装。

When using BDD, it is recommended that all outputs be identical because if one output contains a transformation, it will affect the performance of all outputs. Having the same transformation components on each output will improve the performance.

使用BDD时,建议所有输出都相同,因为如果一个输出包含转换,则会影响所有输出的性能。 每个输出上具有相同的转换组件将提高性能。

Before deciding to use BDD, there are several things you should check:

在决定使用BDD之前,应检查以下几件事:

  1. The destination supports parallelism: if you are importing data to one destination table, BDD may be useless. However, if you are importing data to a staging database, it may improve the data import performance 目标支持并行性:如果将数据导入到一个目标表,则BDD可能没有用。 但是,如果要将数据导入到暂存数据库中,则可能会提高数据导入性能
  2. Machine resources: The SSIS package must be executed on a multi-processor and multi-core server 机器资源: SSIS包必须在多处理器和多核服务器上执行
  3. A large amount of data: This tool is useless when handling small amounts of data 大量数据:处理少量数据时此工具无用
  4. Ordering is not required: SSIS Balanced Data Distributor is used when data ordering is ignored since data is distributed without any consideration of row order 不需要排序:由于数据分发没有考虑行顺序,因此忽略了数据排序时,将使用SSIS平衡数据分发器

SSIS平衡数据分配器 (SSIS Balanced Data Distributor)

Balanced Data Distributor is a very simple component that doesn’t need configuration. You can simply add it into your data flow task:

平衡数据分发器是一个非常简单的组件,不需要配置。 您可以简单地将其添加到数据流任务中:

the description of SSIS balanced data distributor from the toolbox

Figure 1 – Description of SSIS Balanced Data Distributor from the toolbox

图1 –工具箱中的SSIS平衡数据分发器说明

BDD does not have an editor form, so when a double-click is registered, nothing happens. On the other hand, there are some properties that can be edited from the properties tab:

BDD没有编辑器表单,因此,双击注册后,将不会发生任何事情。 另一方面,有些属性可以在“属性”选项卡中进行编辑:

the properties of SSIS balanced data distributor

Figure 2 – Properties of Balanced Data Distributor

图2 –平衡数据分配器的属性

From the image above, there are 4 properties that can be edited:

在上图中,可以编辑4个属性:

  • Description: A description of the component 描述:组件描述
  • Name: The name of the component 名称:组件名称
  • LocaleID: The locale the component uses to interpret locale-sensitive data, such as date and time data LocaleID:组件用来解释对语言环境敏感的数据的语言环境,例如日期和时间数据
  • ValidateExternalMetadata: Whether the component is validated using external data sources at design time ValidateExternalMetadata:在设计时是否使用外部数据源验证了组件

These properties are not related to the BDD, but they are found in any data flow component. In general, you don’t need to change them.

这些属性与BDD不相关,但是可以在任何数据流组件中找到。 通常,您不需要更改它们。

To add output from the BDD, drag the output arrow (connector) and link it to another component:

要从BDD添加输出,请拖动输出箭头(连接器)并将其链接到另一个组件:

SSIS balanced data distributor component with the output connector

Figure 3 – Balanced Data Distributor component with connector

图3 –带有连接器的平衡数据分配器组件

(Example)

In this section, I will provide an example of using SSIS Balanced Data Distributor to import data from a relational database (SQL Server AdventureWorks2017 database) into 5 flat files.

在本节中,我将提供一个示例,该示例使用SSIS平衡数据分发器将关系数据库(SQL Server AdventureWorks2017数据库)中的数据导入5个平面文件中。

First, we have to add an OLE DB connection manager to connect to the SQL Server instance:

首先,我们必须添加一个OLE DB连接管理器以连接到SQL Server实例:

ole db connection manager used to connect to the source table

Figure 4 – OLE DB Connection Manager

图4 – OLE DB连接管理器

Within the control flow, we add a Data Flow task. Within this data flow, we add an OLE DB Source and a balanced data distributor. In the OLE DB Source, we selected Table or View name as data access mode, then we selected [Person].[Person] table from the drop-down list (this table contains 19,972 rows):

在控制流中,我们添加了数据流任务。 在此数据流中,我们添加了一个OLE DB源和一个平衡的数据分发器。 在OLE DB源中,我们选择“ 表”或“视图”名称作为数据访问模式,然后从下拉列表中选择[Person]。[Person]表(此表包含19,972行):

ole db source configuration

Figure 5 – OLE DB Source Editor

图5 – OLE DB源编辑器

In this example, uncheck AdditionalContactInfo and Demographics columns since they are binary columns. After configuring the OLE DB Source, we link it to the BDD.

在此示例中,取消选中AdditionalContactInfoDemographics 列,因为它们是二进制列。 配置OLE DB源后,我们将其链接到BDD。

Now, we add 5 Flat File destinations, and link these destinations to the BDD, as shown in the image below:

现在,我们添加5个平面文件目标,并将这些目标链接到BDD,如下图所示:

data flow task screenshot

Figure 6 – Data Flow task screenshot

图6 –数据流任务屏幕截图

Next, we open each Flat File Destination and create a Flat File connection manager for each one. As a result, we will have one OLE DB Connection manager for the source and 5 Flat File connection managers for the destination:

接下来,我们打开每个平面文件目标并为每个平面文件创建一个平面文件连接管理器。 结果,我们将有一个OLE DB连接管理器作为源,而5个平面文件连接管理器作为目标:

connection managers tab showing one ole db connection manager and 5 flat file connection managers

Figure 7 – Connection managers tab

图7 –连接管理器选项卡

After executing the package, note that rows are distributed into only 3 outputs while it should be 5 as shown in the image below:

执行该程序包后,请注意,行仅分配到3个输出中,而应为5个输出,如下图所示:

data flow task screenshot after execution

Figure 8 – Data Flow task screenshot after execution

图8 –执行后的数据流任务屏幕截图

After checking the number of rows distributed in each output, we see that the first two outputs contain 9,947 rows while the third one contains only 3 rows and the other outputs contain no rows.

在检查了每个输出中分配的行数之后,我们看到前两个输出包含9,947行,而第三个输出仅包含3行,其他输出不包含行。

This behavior is due to the data flow task DefaultBufferMaxRows and DefaultBufferSize properties:

此行为是由于数据流任务DefaultBufferMaxRowsDefaultBufferSize属性引起的:

data flow task properties

Figure 9 – Data Flow task properties tab

图9 –数据流任务属性选项卡

Since the number of rows (9,947) is less than the DefaultBufferMaxRows property (10,000), this means that the size of the row has reached the DefaultBufferSize value. To change this behavior, we will change the DefaultMaxRows to 3,995 and re-execute this package.

由于行数(9,947)小于DefaultBufferMaxRows属性(10,000),这意味着行的大小已达到DefaultBufferSize值。 若要更改此行为,我们将DefaultMaxRows更改为3995,然后重新执行此程序包。

The new behavior of the Balanced Data Distributor is shown in the image below:

下图显示了平衡数据分配器的新行为:

data flow task screenshot after changing the DefaultBufferMaxRows  property

Figure 10 – Data Flow task screenshot after changing the DefaultBufferMaxRows property value

图10 –更改DefaultBufferMaxRows属性值后的数据流任务屏幕快照

From this example, we conclude that to obtain an optimal situation, you must configure the data flow task-related properties such as DefaultBufferMaxRows, DefaultBufferSize, and EngineThreads based on the number of rows, row size, number of outputs and other factors.

从此示例中,我们得出结论,要获得最佳情况,必须基于行数,行大小,输出数和其他因素来配置与数据流任务相关的属性,例如DefaultBufferMaxRowsDefaultBufferSize和EngineThreads

备择方案 (Alternatives)

If you are using a previous version of SQL Server 2016, and you need a solution to distribute rows into multiple outputs without using Balanced Data Distributor, you can use an alternative. In this section, I will provide two alternatives to SSIS Balanced Data Distributor. Note that all alternatives require that the data contains a row number column or it must be generated using a script component. Refer to the following article for more information: How to add a row number to a data flow task in SSIS 2012

如果您使用的是SQL Server 2016的早期版本,并且需要一种无需使用平衡数据分配器即可将行分配到多个输出中的解决方案,则可以使用替代方法。 在本节中,我将提供SSIS平衡数据分配器的两种替代方法。 请注意,所有替代方法均要求数据包含行号列,或者必须使用脚本组件来生成。 有关更多信息,请参考以下文章: 如何在SSIS 2012中向数据流任务添加行号

使用条件拆分 (Using conditional split)

To split a single input into multiple outputs, you can add a conditional split, then add an expression for each output where you use a modulo operator to split rows using the following expression pattern:

要将单个输入拆分为多个输出,可以添加条件拆分,然后为每个输出添加一个表达式,在其中使用模运算符使用以下表达式模式拆分行:

[IdentityColumn] % <number of outputs needed> == <output number (zero based)>

[IdentityColumn]%<所需的输出数量> == <输出数量(从零开始)>

As an example, if you need to split one input into 5 outputs then you must add the following expressions within the conditional split:

例如,如果需要将一个输入拆分为5个输出,则必须在条件拆分中添加以下表达式:

  • [IdentityColumn] % 5 == 0

    [IdentityColumn]%5 == 0
  • [IdentityColumn] % 5 == 1

    [IdentityColumn]%5 == 1
  • [IdentityColumn] % 5 == 2

    [IdentityColumn]%5 == 2
  • [IdentityColumn] % 5 == 3

    [IdentityColumn]%5 == 3
  • [IdentityColumn] % 5 == 4

    [IdentityColumn]%5 == 4

See the image below:

参见下图:

conditional split used to distributed on input into different outputs

Figure 11 – Conditional split with 5 outputs

图11 –有5个输出的条件拆分

Or, you can add only 4 expressions and use the default output as the last output.

或者,您只能添加4个表达式并将默认输出用作最后一个输出。

使用脚本组件 (Using script component)

Similar to the previous approach, you can use a script component to split one input into multiple outputs. There are two approaches that you can use:

与以前的方法类似,可以使用脚本组件将一个输入拆分为多个输出。 您可以使用两种方法:

  1. Conditional Multicast Script 条件多播脚本
  2. Add a new column in the output buffer and use a script to store the column index within this column. Then, outside the script component, use a conditional split to distribute the script component output into multiple outputs

    在输出缓冲区中添加新列,并使用脚本在该列中存储列索引。 然后,在脚本组件之外,使用条件拆分将脚本组件输出分配到多个输出中

平衡数据分发器安装错误 (Balanced Data Distributor installation error)

We’ve often received questions about SSIS Balanced Data Distributor installation error since the standalone package throws the following error:

由于独立软件包会引发以下错误,因此我们经常收到有关SSIS平衡数据分发器安装错误的问题:

Problem with this Windows installer package. A program run as part of the installation did not finish as expected. Contact your support personnel or package vendor

Windows安装程序包出现问题。 安装过程中运行的程序未按预期完成。 与您的支持人员或包装供应商联系

In general, the package installation throws this exception in two cases:

通常,在两种情况下,程序包安装都会引发此异常:

  1. The installation requires administrative privileges (can be solved by running the package as an administrator)

    安装需要管理特权(可以通过以管理员身份运行软件包来解决)
  2. SSIS Balanced Data Distributor Installation Error SSIS平衡数据分发服务器安装错误

结论 (Conclusion)

SSIS Balanced Data Distributor is an efficient solution to implement multi-threading within SSIS packages introduced with SQL Server 2012. In order to maximize the BDD performance, we have to well configure DefaultBufferMaxRows and DefaultBufferSize properties in the Data Flow Task based on the machine resources, the data volume and number of BDD outputs.

SSIS平衡数据分发器是一种有效的解决方案,可以在SQL Server 2012引入的SSIS包中实现多线程。为了最大化BDD性能,我们必须根据计算机资源在数据流任务中很好地配置DefaultBufferMaxRowsDefaultBufferSize属性, BDD输出的数据量和数量。

When using the previous version of SQL Server 2012, there are many alternatives to implement multi-threading, such as using a conditional split or a script component transformation.

使用SQL Server 2012的早期版本时,有许多替代方法可以实现多线程,例如使用条件拆分或脚本组件转换。

翻译自: https://www.sqlshack.com/ssis-balanced-data-distributor-overview/

ssis修改数据库数据

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值