ssis zip压缩文件_SSIS平面文件与原始文件

ssis zip压缩文件

In this article, we will give an overview of using Flat Files and Raw Files in SSIS, then we will illustrate some of the differences between using these two types.

在本文中,我们将概述在SSIS中使用平面文件和原始文件,然后说明使用这两种类型之间的一些区别。

This is the ninth article in the SSIS features face-to-face series which aims to remove confusion around similar features in SQL Server Integration Services.

这是SSIS功能面对面系列文章中的第九篇,旨在消除对SQL Server Integration Services中类似功能的困惑。

平面文件 (Flat Files)

In general, a flat file doesn’t contain internal hierarchy; it may contain text, log entries or data in tabular form. More specifically, SSIS Flat Files are text files that store tabular data and are manipulated line-by-line.

通常,平面文件不包含内部层次结构。 它可能包含文本,日志条目或表格形式的数据。 更具体地说,SSIS平面文件是存储表格数据并逐行处理的文本文件。

To handle Flat Files, you must create an SSIS Flat File connection manager, where you define the Flat File metadata. There are different formats for Flat Files that can be handled by the SSIS Flat File connection manager, as follows:

若要处理平面文件,必须创建一个SSIS平面文件连接管理器,在其中定义平面文件元数据。 SSIS平面文件连接管理器可以处理平面文件的不同格式,如下所示:

  • Delimited: each row in the Flat File contains columns separated by a delimiter; the most popular format is comma-separated values (.csv) 带分隔符 :平面文件中的每一行都包含由分隔符分隔的列; 最受欢迎的格式是逗号分隔值(.csv)
  • Fixed width: each row contains multiple columns having a fixed number of characters 固定宽度:每行包含具有固定数量字符的多列
  • Ragged Right: This option is added to handle Flat Files that are not well-formatted. You can refer to the following article for more guidance: 标记参差不齐:添加此选项以处理格式不正确的平面文件。 您可以参考以下文章,以获取更多指导: Working with ragged right formatted files in SSIS 在SSIS中使用不正确的格式错误的文件

If you open the SSIS Flat File connection manager editor, it contains four tabs:

如果打开SSIS平面文件连接管理器编辑器,它将包含四个选项卡:

  1. General Tab: Here, the following options are configured: 常规选项卡:在此处,配置了以下选项:
    1. Connection manager name and description 连接管理器名称和描述
    2. Flat File path: Select the file path (if you need to create a new Flat File, then you should enter the desired name) 平面文件路径:选择文件路径(如果需要创建新的平面文件,则应输入所需的名称)
    3. Text encoding: Locale, Code page or Unicode 文本编码:语言环境,代码页或Unicode
    4. Flat file Format: Choose if columns are delimited, fixed-width or ragged right 平面文件格式:选择列是定界,固定宽度还是参差不齐
    5. Header row delimiter: Since some Flat Files are created from templates, you may have a file where the header row delimiter is different from other rows’ delimiters 标题行定界符:由于某些平面文件是根据模板创建的,因此您可能有一个文件,其中标题行定界符与其他行的定界符不同
    6. Header rows to skip: This option can be used to ignore header rows that contain meaningless information 要跳过的标题行:此选项可用于忽略包含无意义信息的标题行
    7. Column names in the first header: This option is to specify if the column names are found within the flat file. If this option is unchecked, then auto-generated column names are used (they can be changed in the advanced tab) 第一个标题中的列名称:此选项用于指定是否在平面文件中找到列名称。 如果未选中此选项,则使用自动生成的列名(可以在“高级”选项卡中更改它们)
  2. Columns Tab: Here, you can configure how columns are extracted from each row: 列选项卡:在这里,您可以配置如何从每一行提取列:
    1. In delimited file format, you have to specify the column and row delimiter

      在定界文件格式中,您必须指定列和行定界符
    2. In fixed-width, you have to specify each column length

      在固定宽度中,您必须指定每列长度
    3. In ragged right files, you have to specify the column markers

      在参差不齐的右文件中,您必须指定列标记
  3. Advanced Tab: Here, you can configure each column’s metadata such as length, data type, name and other attributes “高级”选项卡:在这里,您可以配置每列的元数据,例如长度,数据类型,名称和其他属性
  4. Preview Tab: Here, the data is reviewed based on the configurations you’ve made 预览标签:在这里,数据将根据您所做的配置进行审查

This image shows a screen shot of the SSIS flat file connection manager

Figure 1 – SSIS Flat File connection manager

图1 – SSIS平面文件连接管理器

In a Flat File, data are stored as text. When values with different data types are stored in a Flat File, you can implicitly convert them from the connection manager or the source component, or explicitly convert them using data conversion and derived column transformations. For more information, you can refer to the following article: SSIS Data types: Change from the Advanced Editor vs Data Conversion Transformations.

在平面文件中,数据存储为文本。 当具有不同数据类型的值存储在平面文件中时,可以从连接管理器或源组件隐式转换它们,或者使用数据转换和派生的列转换显式转换它们。 有关更多信息,您可以参考以下文章: SSIS数据类型:高级编辑器与数据转换的比较

this image shows the description of SSIS flat file source component from the toolbox

Figure 2 – SSIS Flat File source description from toolbox

图2 –工具箱中的SSIS平面文件源描述

To import or export data from Flat Files, you must use SSIS Flat File Source and SSIS Flat File Destination components within a Data Flow Task. Note, that if you are handling a Flat File that contains non-tabular data, you may need to read the Flat File using a script (task or component) and implement complex logic, or you may need to read each row as one column (length = 4000) and use a transformation to consume the file content. You can check the following links for examples of importing complex flat files in SSIS:

若要从平面文件导入或导出数据,必须在数据流任务中使用SSIS 平面文件源和SSIS 平面文件目标组件。 请注意,如果您要处理包含非表格数据的平面文件,则可能需要使用脚本(任务或组件)读取平面文件并实施复杂的逻辑,或者可能需要将每一行读为一列(长度= 4000)并使用转换来使用文件内容。 您可以检查以下链接,以获取在SSIS中导入复杂平面文件的示例:

this image shows the description of ssis flat file destination component from the toolbox

Figure 3 – SSIS Flat File destination description from toolbox

图3 –工具箱中的SSIS平面文件目标描述

原始文件 (Raw Files)

Raw Files are a kind of SSIS Flat File used to dump data between different ETL stages. The data is stored in binary format and can only be used by the SSIS Raw Files component.

原始文件是一种SSIS平面文件,用于在不同ETL阶段之间转储数据。 数据以二进制格式存储,并且只能由SSIS Raw Files组件使用。

To use Raw Files in SSIS, you don’t have to create a connection manager, since it can be defined within the Raw File source and Raw File destination components:

要在SSIS中使用原始文件,无需创建连接管理器,因为可以在“原始文件”源和“原始文件”目标组件中定义它:

this image shows the raw file destination component editor

Figure 4 – SSIS Raw File destination

图4 – SSIS原始文件目的地

To create a Raw File, just add a Raw File destination in a data flow task. When you open the Raw File destination editor, there are two tabs:

要创建原始文件,只需在数据流任务中添加原始文件目标。 打开原始文件目标编辑器时,有两个选项卡:

  • Connection Manager

    连接管理器
  • Columns

this image shows the raw file destination component description from the toolbox

Figure 5 – SSIS Raw File destination description from toolbox

图5 –工具箱中的SSIS Raw File目标描述

In the connection manager tab, specify the file name (directly or from a variable), and choose the write mode option:

在连接管理器选项卡中,指定文件名(直接或通过变量),然后选择写入模式选项:

  • Create always: Always creates a new file 始终创建一个新文件
  • Create once: Creates a new file. If the file exists, the component fails
  • 一次创建创建一个新文件。 如果文件存在,则组件失败
  • Append: Appends data to a previously created file with similar metadata 追加:将数据追加到具有类似元数据的先前创建的文件中
  • Truncate and Append: Overwrites the data in a previously created file with similar metadata 截断并追加:使用类似的元数据覆盖先前创建的文件中的数据

In the columns tab, select the columns you want to dump into the SSIS Raw File destination:

在“列”选项卡中,选择要转储到SSIS Raw File目标中的列:

this image shows the raw file source component description from the toolbox

Figure 6 – SSIS Raw File source description from toolbox

图6 –工具箱中的SSIS原始文件源描述

After dumping data into a raw file, you must use a Raw File Source to read this data. This component is very similar to the destination component, except that there is no Write mode option:

将数据转储到原始文件后,必须使用原始文件源读取此数据。 该组件与目标组件非常相似,不同之处在于没有写模式选项:

this image shows the ssis raw file source editor

Figure 7 – SSIS Raw File source

图7 – SSIS原始文件源

Note that the SSIS Raw File source can only be used to read a file created using a Raw File destination.

请注意,SSIS Raw File源只能用于读取使用Raw File目标创建的文件。

To read more about Raw Files, refer to the following official documentation:

要阅读有关原始文件的更多信息,请参阅以下官方文档:

平面文件与原始文件 (Flat Files vs. Raw Files)

Now I will illustrate the difference between both file types in SSIS.

现在,我将说明SSIS中两种文件类型之间的区别。

SSIS Flat Files are widely used to dump data from relational databases to be used later for other purposes, but most people don’t know that they are not recommended from a performance perspective. Even though comma-separated values files (.csv) are one of the most popular data sources used, Raw Files are designed to deliver higher performance when transferring data.

SSIS平面文件被广泛用于从关系数据库中转储数据,以供以后用于其他目的,但是大多数人都不知道从性能角度来看不建议使用它们。 即使逗号分隔值文件(.csv)是使用的最受欢迎的数据源之一,原始文件仍被设计为在传输数据时提供更高的性能。

SSIS Flat Files require parsing and validation, while the data in Raw Files are stored in native format and requires no translation and little parsing. In 2009, an experiment was conducted by John Welch to illustrate the difference between SSIS Flat Files and Raw Files from a performance perspective. You can read this article for more details:

SSIS平面文件需要解析和验证,而原始文件中的数据以本机格式存储,并且不需要转换和解析。 2009年,约翰·韦尔奇(John Welch)进行了一项实验,从性能的角度说明了SSIS平面文件和原始文件之间的区别。 您可以阅读本文以了解更多详细信息:

Raw Files are very useful for implementing parallel data import logic since you can split a file over multiple Raw Files then import them in parallel.

原始文件对于实现并行数据导入逻辑非常有用,因为您可以将一个文件拆分为多个原始文件,然后并行导入它们。

On the other hand, Raw Files cannot be edited or consumed outside of SSIS, which makes them only usable for data staging purposes.

另一方面,原始文件不能在SSIS外部进行编辑或使用,这使得它们仅可用于数据登台目的。

In conclusion, if you need to export data into a file for use in other systems or to be published, you will be best served with a data format that is widely used like SSIS Flat Files. But, if you need to dump data for use in a different ETL stage, Raw Files are recommended.

总之,如果您需要将数据导出到文件中以供其他系统使用或发布,则最好使用像SSIS Flat Files这样广泛使用的数据格式。 但是,如果您需要转储数据以用于其他ETL阶段,则建议使用Raw Files。

目录 (Table of contents)

SSIS OLE DB Source: SQL Command vs Table or View
SSIS Expression Tasks vs Evaluating variables as expressions
SSIS OLE DB Destination vs SQL Server Destination
Execute SQL Task in SSIS: SqlStatementSource Expressions vs Variable Source Types
Execute SQL Task in SSIS: Output Parameters vs Result Sets
SSIS Derived Columns with Multiple Expressions vs Multiple Transformations
SSIS Data types: Change from the Advanced Editor vs Data Conversion Transformations
SSIS Connection Managers: OLE DB vs ODBC vs ADO.NET
SSIS Flat Files vs Raw Files
SSIS Foreach Loop vs For Loop Container
SSIS: Execute T-SQL Statement Task vs Execute SQL Task
SSIS OLE DB来源:SQL命令与表或视图
SSIS表达式任务与将变量作为表达式求值
SSIS OLE DB目标与SQL Server目标
在SSIS中执行SQL任务:SqlStatementSource表达式与可变源类型
在SSIS中执行SQL任务:输出参数与结果集
具有多个表达式与多个转换的SSIS派生列
SSIS数据类型:高级编辑器的更改与数据转换的转换
SSIS连接管理器:OLE DB与ODBC与ADO.NET
SSIS平面文件与原始文件
SSIS Foreach循环与For循环容器
SSIS:执行T-SQL语句任务与执行SQL任务

翻译自: https://www.sqlshack.com/ssis-flat-files-vs-raw-files/

ssis zip压缩文件

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值