ssis什么情况下用到变量_了解SSIS内存使用情况

ssis什么情况下用到变量

In this article, I am going to explain in detail about SSIS memory usage and how can we optimize out tasks and the data flow to leverage maximum benefits from the in-memory operating tool. As you might be aware, SSIS also known as SQL Server Integration Services is a data integration tool, provided by Microsoft which comes shipped with the SQL Server editions. SSIS is an enterprise-scale, in-memory data integration tool which can be used to move data between different databases or different servers in a comfortable yet manageable way.

在本文中,我将详细说明SSIS的内存使用情况,以及如何优化任务和数据流,以充分利用内存操作工具的最大优势。 您可能已经知道,SSIS也称为SQL Server Integration Services是一种数据集成工具,由Microsoft提供,随SQL Server版本一起提供。 SSIS是企业级内存数据集成工具,可用于以舒适但可管理的方式在不同数据库或不同服务器之间移动数据。

SSIS需要多少内存? (How much memory is needed for SSIS?)

In ETL design, the often most important question that I face is how much memory we should put in for our production environment. While we work on configuring the environments or creating a new instance of SSIS, we must understand the jobs that are going to be set up, how many times are these jobs going to be executed etc. These parameters will help a BI Engineer to calculate and decide a correct figure for setting up the required memory requirements for the instance.

在ETL设计中,我经常遇到的最重要的问题是我们应该为生产环境放入多少内存。 在配置环境或创建SSIS的新实例时,我们必须了解要设置的作业,这些作业要执行多少次等。这些参数将帮助BI工程师计算和确定正确的数字以设置实例所需的内存要求。

Since SSIS is an enterprise tool, it can consume as much RAM as you can allocate to it. However, allocating more RAM also increases the budget on the infrastructure of your projects which might not be an ideal case always. So, you should consider the package design, whether it is ETL or ELT, if there is any parallelism involved within the packages, if there are any expensive operations like sort or merge within the packages and so on. All these factors contribute to the SSIS memory usage and we need to understand these in detail.

由于SSIS是企业工具,因此它可以消耗尽可能多的RAM。 但是,分配更多的RAM也会增加项目基础结构的预算,这可能并不总是理想的情况。 因此,您应该考虑软件包设计,无论是ETL还是ELT,如果软件包中涉及任何并行性,以及软件包中是否存在诸如排序或合并之类的昂贵操作,等等。 所有这些因素都会导致SSIS内存使用情况,我们需要详细了解它们。

数据流任务(DFT)的内存消耗 (Memory consumption by Data Flow Tasks (DFT))

Data Flow Tasks or DFTs as most of us call it, are one of the most commonly used tasks and we need to understand its SSIS memory usage. Using a Data Flow Task, we can quickly load data from a flat file to a table in SQL Server or vice versa. SSIS packages, if executed from the SSIS catalog, are executed on a separate process known as ISServerExec.exe. This process allocates most of the available memory from the system for each of the package that is being executed and release this memory shortly after the execution has been completed.

正如我们大多数人所称的, 数据流任务或DFT是最常用的任务之一,我们需要了解其SSIS内存使用情况。 使用数据流任务,我们可以将数据从平面文件快速加载到SQL Server中的表,反之亦然。 如果从SSIS目录执行SSIS程序包,则在称为ISServerExec.exe的单独进程上执行。 此过程为正在执行的每个程序包分配系统中的大部分可用内存,并在执行完成后不久释放该内存。

For example, if a package uses a Data Flow Task which loads 10M records from a flat file to a SQL Server table, then all these 10M records need to pass through the memory space that has been allocated to the SSIS server during execution.

例如,如果程序包使用数据流任务将1000万条记录从平面文件加载到SQL Server表中,则所有这1000万条记录都需要通过在执行过程中已分配给SSIS服务器的内存空间。

SSIS Memory Usage by Data Flow Tasks

Figure 1 – SSIS Memory Usage by Data Flow Tasks

图1 –按数据流任务的SSIS内存使用情况

However, loading all the records into SSIS memory while execution does not mean that all the 1M records will be loaded into the memory at once. If this was the case, then we would require a lot more memory to support 1M records at a single time. Instead, what SSIS does is, it creates much smaller chunks of data from the 1M records and then loads these chunks one after another. These chucks of data, also known as a buffer in SSIS are used so that we can optimize the memory and load a smaller amount of data and process it to the database before loading the next buffer.

但是,在执行时将所有记录加载到SSIS内存中并不意味着所有1M记录都将被立即加载到内存中。 如果真是这样,那么我们将需要更多内存才能一次支持1M记录。 相反,SSIS所做的是,它会从1M记录中创建更小的数据块,然后依次加载这些块。 使用这些数据块(在SSIS中也称为缓冲区),以便我们可以优化内存并加载少量数据,然后在加载下一个缓冲区之前将其处理到数据库中。

SSIS Memory Usage Buffer Example

Figure 2 – SSIS Buffer Example

图2 – SSIS缓冲区示例

通过查找转换的内存消耗 (Memory Consumption by Lookup Transformations)

As explained in the previous example about the buffers in SSIS which releases data as soon as they are loaded into the database, there are also scenarios in SSIS where data resides in the memory for the entire duration of the package execution. In a lookup transformation, the data is usually cached in the memory during the package execution. This enables SSIS to execute the package much faster because the lookup data is directly available in the memory and the SSIS engine does not need to make queries to the source data to fetch the lookup values.

如前面的示例中所述,SSIS中的缓冲区会在将数据加载到数据库后立即释放它们,在SSIS中也有一些方案,其中数据在包执行的整个过程中都驻留在内存中。 在查找转换中,通常在包执行期间将数据缓存在内存中。 这使SSIS可以更快地执行包,因为查找数据直接在内存中可用,并且SSIS引擎无需查询源数据即可获取查找值。

SSIS Memory Usage by Lookup Transformation

Figure 3 – SSIS Memory Usage by Lookup Transformation

图3 –通过查找转换的SSIS内存使用情况

We can alter this behavior of the lookup transformation by modifying the cache mode of the transformation while designing the package. There are three modes for defining the cache in a lookup transformation – full cache, partial cache, or no cache. By default, the full cache mode is active, which loads all the data into memory during the execution phase. Setting this to partial or no cache may reduce the load on the memory, however, it might increase the number of queries generated to fetch the lookup values from the data source every time.

我们可以在设计包时通过修改转换的缓存模式来更改查找转换的行为。 可以通过三种模式在查找转换中定义高速缓存-全高速缓存,部分高速缓存或无高速缓存。 默认情况下,完全高速缓存模式是活动的,它将在执行阶段将所有数据加载到内存中。 将其设置为部分缓存或不使用缓存可以减少内存的负载,但是,这可能会增加每次为从数据源获取查找值而生成的查询的数量。

完全阻止SSIS内存使用 (Fully blocking SSIS Memory Usage)

There are a few operations in SSIS like the Sort transformation, aggregate, the fuzzy lookup, etc., which loads the entire set of records into memory before performing the operation. These are exceptions to the buffering technique as I have explained in the first example. This is because the operation is performed on a set of data rather than the individual records. For example, in a sort operation, the SSIS engine needs to know what are all the values of a possible column on which the records are to be sorted before moving forward with the operation. For this reason, all the records are loaded into memory at the same time and the transformation is applied in memory. Once the transformation is completed, the data is then written to a database and the memory is released.

SSIS中有一些操作,例如排序转换,聚合,模糊查找等,它们会在执行操作之前将整个记录集加载到内存中。 正如我在第一个示例中所解释的那样,这些都是缓冲技术的例外。 这是因为操作是对一组数据而不是单个记录执行的。 例如,在排序操作中,SSIS引擎在继续进行操作之前,需要知道可能对记录进行排序的列的所有值是什么。 因此,所有记录都同时加载到内存中,并且转换应用到内存中。 转换完成后,然后将数据写入数据库并释放内存。

Fully blocked SSIS Memory Usage by Sort operator

Figure 4 – Fully blocked memory consumption by Sort operator

图4 –排序运算符完全阻塞的内存消耗

Although the above-explained sort operation is good for small-sized flat files, it is advised not to apply the same principle to a large data source. For large data sources, which require a hefty amount of memory to process such information, a good idea would be to use an ELT architecture instead of an ETL, where the data will be first loaded into a staging area and from there it can be finally loaded into the database. In such an operation, the sort transformation will be handled by the database engine and not by SSIS. This will also help to reduce some load on the memory as the processing will be shifted to the database.

尽管上面解释的排序操作对于小型平面文件而言是不错的选择,但建议不要对大型数据源应用相同的原理。 对于需要大量内存来处理此类信息的大型数据源,一个好主意是使用ELT架构而不是ETL,在该架构中,数据将首先加载到暂存区域中,然后可以从那里进行最终存储加载到数据库中。 在这种操作中,排序转换将由数据库引擎而不是SSIS处理。 当处理将转移到数据库时,这还将有助于减少内存的某些负载。

SQL Server和SSIS中的内存分配 (Memory allocation in SQL Server and SSIS)

Most of us, developers, and database administrators (DBAs) might be already aware of the maximum and minimum memory options that can be configured for a SQL Server instance. Please note, that this memory which is configured for SQL Server has got nothing to do with the SSIS memory. SSIS packages are executed on the different processes from the database engine and the memory is allocated from the available system memory during runtime. For example, let us consider an environment which has 150GB of RAM, out of which 128GB is being allocated to the SQL Server instance. Out of the remaining 22GB, let us assume that 8GB will be consumed by the Windows process which leaves us to 14GB of available memory. When the SSIS process starts, it will allocate this 14GB of memory to the SSIS engine and do all the processing using this.

我们大多数人,开发人员和数据库管理员(DBA)可能已经知道可以为SQL Server实例配置的最大和最小内存选项。 请注意,为SQL Server配置的此内存与SSIS内存无关。 SSIS包在与数据库引擎不同的进程上执行,并且在运行时从可用的系统内存中分配内存。 例如,让我们考虑一个具有150GB RAM的环境,其中128GB被分配给SQL Server实例。 在剩余的22GB中,让我们假设Windows进程将消耗8GB,从而使我们剩下14GB的可用内存。 当SSIS进程开始时,它将为SSIS引擎分配这14GB的内存,并使用该内存进行所有处理。

If you consider increasing the performance of your SSIS packages on such a machine without increasing the memory, you can deallocate some memory from the SQL Server instance, and then it will be available for the SSIS engine to use when required.

如果您考虑在不增加内存的情况下提高此类计算机上SSIS包的性能,则可以从SQL Server实例中分配一些内存,然后在需要时SSIS引擎可以使用它。

结论 (Conclusion)

In this article, we have understood the concept of SSIS memory usages. I have explained about the various execution tasks that are available within SSIS and how these tasks consume memory. There is no hard and fast rule to define how much memory is optimal for SSIS. It always depends on the volume of data that is being moved across different systems.

在本文中,我们了解了SSIS内存使用的概念。 我已经解释了SSIS中可用的各种执行任务以及这些任务如何消耗内存。 没有硬性规定可以定义SSIS的最佳内存大小。 它始终取决于在不同系统之间移动的数据量。

Usually, higher memory helps to achieve faster transformation as much more data can be loaded into memory while processing and thus helps in completing the job faster. As you grow as a developer, with your experience, you will be able to determine the correct required memory for an SSIS instance and configure the environment accordingly.

通常,更高的内存有助于实现更快的转换,因为在处理过程中可以将更多的数据加载到内存中,从而有助于更快地完成工作。 随着您成为开发人员的成长,凭借您的经验,您将能够确定SSIS实例所需的正确内存并相应地配置环境。

翻译自: https://www.sqlshack.com/understanding-ssis-memory-usage/

ssis什么情况下用到变量

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值