ssis 派生列_具有多个表达式与多个转换的SSIS派生列

ssis 派生列

In this article, we will first give an overview on SSIS derived column transformation, then we will run an experiment to check if there is any difference between adding multiple expressions within one derived column transformation and adding a derived column transformation for each expression.

在本文中,我们将首先概述SSIS派生的列转换,然后运行一个实验,以检查在一个派生的列转换中添加多个表达式与为每个表达式添加派生的列转换之间是否有任何区别。

This article is the sixth article in the SSIS feature series which aims to remove confusion and to illustrate some of the differences between similar features provided by SQL Server Integration Services.

本文是SSIS功能系列的第六篇文章,旨在消除混淆并说明SQL Server Integration Services提供的相似功能之间的某些区别。

派生列转换 (Derived Column Transformation)

SSIS Derived Column Transformation is one of the SQL Server Integration Services that can be added within a Data Flow Task, it is used to add a new column to the data pipeline by applying SSIS expressions. The developer can choose whether to add a new derived column or to replace an existing column.

SSIS派生列转换是可以在数据流任务中添加SQL Server集成服务之一,它用于通过应用SSIS表达式向数据管道添加新列。 开发人员可以选择添加新的派生列还是替换现有列。

SSIS expressions are a combination of literals, functions, operators that yields a single data value. An expression can be composed of a single value (“abc”) or a variable (@[User::Variable1]) or a function (GETDATE()), or it can be more complex and contains some conditionals (CASE statement) or some mathematical operations or some string manipulation. To learn more about SSIS expressions you can refer to the following article from earlier in the series: SSIS Expression Tasks Vs Evaluating variables as expressions

SSIS表达式是文字,函数和运算符的组合,产生单个数据值。 表达式可以由单个值(“ abc”)或变量(@ [User :: Variable1])或函数(GETDATE())组成,也可以更复杂并且包含一些条件(CASE语句)或一些数学运算或一些字符串操作。 要了解有关SSIS表达式的更多信息,可以参考本系列前面的以下文章: SSIS表达式任务Vs将变量作为表达式求值

SSIS Derived Column Transformation can be added from the SSIS toolbox.

可以从SSIS工具箱中添加SSIS派生列转换。

This image shows the descirption of SSIS dervied column from the SSIS tollbox

The SSIS Derived Column editor is composed of 4 parts as shown in the image below:

SSIS派生列编辑器由4个部分组成,如下图所示:

  1. Columns and Variables List

    列和变量列表
  2. Expressions built-in functions

    表达式内置函数
  3. Derived Columns Grid

    派生列网格
  4. Error Handling

    错误处理

This image shows the SSIS derived column transformation editor

The first two parts are explained in the previous article that we mentioned above. In this section, we will describe the other parts and we will try to give some of the most known expression and use cases.

我们前面提到的上一篇文章解释了前两个部分。 在本节中,我们将描述其他部分,并尝试给出一些最知名的表达式和用例。

派生列网格 (Derived Columns Grid)

The Derived Columns Grid contains the following configurations:

派生列网格包含以下配置:

  • Derived Column Name: Specify the derived column name 派生列名称:指定派生列名称
  • Derived Column: Select between <add new column> and <replace an existing column> options 派生列:在<添加新列>和<替换现有列>选项之间选择
  • Expression: Write the SSIS expression that produce the derived column 表达式:编写产生派生列的SSIS表达式
  • Data Type: This column is read-only and it shows the data type of the expression result 数据类型:此列是只读的,它显示表达式结果的数据类型
  • Length: This column is read =-only and it shows the column length based on the expression result 长度:此列为只读,仅显示,基于表达式结果显示列长
  • Scale: This column is read-only and it automatically sets the scale for numeric data based on the data type 此列为只读列,它会根据数据类型自动设置数字数据的小数位数
  • Precision: This column is read-only and it automatically sets the precision for numeric data based on the data type 精度:此列是只读的,它会根据数据类型自动设置数字数据的精度
  • Code Page: This column can be edited and it automatically sets code page for the 代码页:可以编辑此列,它会自动为DT_STR (string) data type DT_STR (字符串)数据类型设置代码页

The Derived Columns Grid contains an empty line that is used to add a new derived column. When start filling the derived column properties within this line a new empty line is added to allow the user to add more columns.

“派生列”网格包含一个空行,用于添加新的派生列。 当开始填充此行中的派生列属性时,将添加一个新的空行,以允许用户添加更多列。

If the expression cannot be applied, its color is changed to red as shown in the image below:

如果无法应用该表达式,则其颜色将变为红色,如下图所示:

This image shows the SSIS derived column grid from the transformation editor

When you put the mouse cursor over the expression, a tooltip appears with the error description:

当您将鼠标指针放在表达式上时,将显示一个工具提示,其中包含错误说明:

This image shows the error message tooltip when expression is invalid

To learn more about SSIS Derived Column Transformation you can refer to the following official documentation:

要了解有关SSIS派生列转换的更多信息,可以参考以下官方文档:

错误处理 (Error Handling)

In SSIS Derived column, errors are mostly caused by expression evaluation errors, which occur if expressions that are evaluated at a run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values.

在“ SSIS派生”列中,错误主要由表达式评估错误引起,如果在运行时评估的表达式执行无效操作或由于缺少或不正确的数据值而在语法上变得不正确,则会发生错误。

Error output configuration is an option that is included in most of the Data Flow Task components. In this configuration you can select how this component behaves when an evaluation error or a truncation occurs:

错误输出配置是大多数“数据流任务”组件中包含的选项。 在此配置中,您可以选择在发生评估错误或截断时该组件的行为:

  • Fail component

    失败组件
  • Ignore Failure

    忽略故障
  • (Error output Flow) (错误输出流)

This image shows the error output configuration form in the SSIS derived column transformation

In this article, we have considered error output configuration as a part of SSIS Derived Column Transformation since it is required to implement many popular use cases such as TRY_PARSE SQL function (will be described in the next section).

在本文中,我们考虑将错误输出配置作为SSIS派生列转换的一部分,因为它需要实现许多流行的用例,例如TRY_PARSE SQL函数(将在下一节中进行介绍)。

To learn more about Error output configuration in SSIS, you can refer to the following documentation:

要了解有关SSIS中错误输出配置的更多信息,可以参考以下文档:

示例和用例 (Examples and Use cases)

In this section, we will list some of the most popular SSIS Derived Column examples and use cases and provide an expression or an external link that contains an example:

在本节中,我们将列出一些最受欢迎的SSIS派生列示例和用例,并提供一个包含示例的表达式或外部链接:

  • Concatenating first name and last name into one column:

    将名字和姓氏连接到一个列中:

    [FirstName] + “ “ + [LastName]

    [名字] +““ + [姓氏]

  • Extract a part of a string:

    提取字符串的一部分:

    SUBSTRING([Address],1,5)

    SUBSTRING([地址],1,5)

  • Change date to integer in yyyyMMdd format:

    将日期更改为yyyyMMdd格式的整数:

    (DT_I4)((DT_WSTR,4)YEAR([DateColumn]) +

    (DT_I4)((DT_WSTR,4)YEAR([DateColumn])+

    RIGHT( “00” + (DT_WSTR,2)MONTH([DateColumn]),2) +

    RIGHT(“ 00” +(DT_WSTR,2)MONTH([DateColumn]),2)+

    RIGHT( “00” + (DT_WSTR,2)DAY([DateColumn]),2))

    RIGHT(“ 00” +(DT_WSTR,2)DAY([DateColumn]),2))

  • Remove Leading and Trailing spaces from string:

    从字符串中删除前导和尾随空格:

    LTRIM(RTRIM([Address]))

    LTRIM(RTRIM([地址]))

多表达式与多派生列转换 (Multiple Expressions Vs Multiple Derived Column Transformation)

After describing the SSIS Derived Column Transformation component, now we will try to make a comparison between adding multiple expressions into one Derived Column Transformation or adding multiple Derived Column Transformations.

在描述了SSIS派生列转换组件之后,现在我们将尝试在将多个表达式添加到一个派生列转换或添加多个派生列转换之间进行比较。

性能 (Performance )

To compare both approaches from a performance perspective we have run the following experiment; we created two packages each one contains one Data Flow Task. We have used the [Person].[Person] table from the AdventureWorks2017 database as source. And we applied the following derived columns expressions in each package:

为了从性能角度比较这两种方法,我们进行了以下实验; 我们创建了两个包,每个包包含一个数据流任务。 我们已使用AdventureWorks2017数据库中的[Person]。[Person]表作为源。 并且我们在每个包中应用了以下派生的列表达式:

  • Name:

    名称:

    [FirstName] + ” ” + [LastName]

    [名字] +”” + [姓氏]

  • Full Name:

    全名:

    [FirstName] + ” ” + [MiddleName] + ” ” + [LastName]

    [名字] +”” + [中间名] +” + [姓氏]

  • Father Full Name:

    父亲全名:

    [MiddleName] + ” ” + [LastName]

    [中间名] +”” + [姓氏]

  • NumericDate:

    数值日期:

    (DT_I4)((DT_WSTR,4)YEAR(ModifiedDate) + RIGHT(“00” + (DT_WSTR,2)MONTH(ModifiedDate),2) + RIGHT(“00” + (DT_WSTR,2)DAY(ModifiedDate),2))

    (DT_I4)((DT_WSTR,4)YEAR(ModifiedDate)+ RIGHT(“ 00” +(DT_WSTR,2)MONTH(ModifiedDate),2)+ RIGHT(“ 00” +(DT_WSTR,2)DAY(ModifiedDate),2 ))

In the first package we added only one Derived column transformation as shown in the images below:

在第一个包中,我们仅添加了一个“派生”列转换,如下图所示:

This image shows the expressions defined in the SSIS derived column editor

This image shows the data flow task of the first package

And in the second package, each expression was added in a separate SSIS derived column transformation as shown in the image below:

在第二个程序包中,每个表达式都添加到一个单独的SSIS派生列转换中,如下图所示:

This image shows the data flow task in the second package

After executing both packages, the results show that adding multiple expressions into one SSIS derived column transformation lasted about 02.797 seconds while adding multiple SSIS derived column transformation lasted 03.329 seconds.

执行完两个程序包后,结果表明,将多个表达式添加到一个SSIS派生的列转换中大约持续02.797秒,而添加多个SSIS派生的列转换则持续03.329秒。

The main cause was that each component has its own execution phases (Validation, Pre Execute, Execute, Post Execute) which requires more time.

主要原因是每个组件都有其自己的执行阶段(验证,执行前,执行,执行后),这需要更多时间。

相依性 (Dependency)

Note that, we can put all derived columns within one transformation component if they are independent, because if one derived column expression used another one, we must add a component for each one.

请注意,如果所有派生的列都是独立的 ,则可以将它们放在一个转换组件中 ,因为如果一个派生的列表达式使用了另一个,则我们必须为每个派生的组件添加一个组件。

错误处理 (Error Handling)

The last thing to mention is that using one Transformation component with multiple expression will make the error handling process more difficult when we need to redirect erroneous rows since the error is caused by one of the expressions and it is harder to know which expression caused the error. While when using multiple transformation components, error handling is easier.

最后要提到的是,当我们需要重定向错误的行时,将一个Transformation组件与多个表达式一起使用将使错误处理过程更加困难,因为该错误是由其中一个表达式引起的,并且更难知道哪个表达式导致了错误。 当使用多个转换组件时,错误处理更加容易。

结论 (Conclusion)

In this article, we have described the Derived Column Transformation in SSIS, and we ran an experiment to illustrate the difference between adding multiple expression within one derived column transformation and adding one derived column transformation component for each expression. The results showed that from a performance perspective using the first approach is more efficient while the second provide an easier error handling.

在本文中,我们描述了SSIS中的派生列转换,并且我们进行了一个实验来说明在一个派生列转换中添加多个表达式与为每个表达式添加一个派生列转换组件之间的区别。 结果表明,从性能的角度来看,使用第一种方法更有效,而第二种方法更易于处理错误。

目录 (Table of contents)

SSIS OLE DB Source: SQL Command vs Table or View
SSIS Expression Tasks vs Evaluating variables as expressions
SSIS OLE DB Destination vs SQL Server Destination
Execute SQL Task in SSIS: SqlStatementSource Expressions vs Variable Source Types
Execute SQL Task in SSIS: Output Parameters vs Result Sets
SSIS Derived Columns with Multiple Expressions vs Multiple Transformations
SSIS Data types: Change from the Advanced Editor vs Data Conversion Transformations
SSIS Connection Managers: OLE DB vs ODBC vs ADO.NET
SSIS Flat Files vs Raw Files
SSIS Foreach Loop vs For Loop Container
SSIS: Execute T-SQL Statement Task vs Execute SQL Task
SSIS OLE DB来源:SQL命令与表或视图
SSIS表达式任务与将变量作为表达式求值
SSIS OLE DB目标与SQL Server目标
在SSIS中执行SQL任务:SqlStatementSource表达式与可变源类型
在SSIS中执行SQL任务:输出参数与结果集
具有多个表达式与多个转换的SSIS派生列
SSIS数据类型:高级编辑器的更改与数据转换的转换
SSIS连接管理器:OLE DB与ODBC与ADO.NET
SSIS平面文件与原始文件
SSIS Foreach循环与For循环容器
SSIS:执行T-SQL语句任务与执行SQL任务

翻译自: https://www.sqlshack.com/ssis-derived-columns-with-multiple-expressions-vs-multiple-transformations/

ssis 派生列

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值