offset fetch_SQL OFFSET FETCH功能:使用SSIS使用有限的资源加载大量数据

本文介绍了如何使用SQL的OFFSET FETCH功能在内存有限的环境中分块加载大量数据,以避免内存不足异常。通过示例说明了在SSIS中实现OFFSET FETCH来分页加载数据的方法,以及在Oracle、SQLite、MySQL和DB2等其他数据库中的类似语法。同时,也指出了使用该方法的一些限制和潜在问题。
摘要由CSDN通过智能技术生成

offset fetch

In this article, we illustrate how to use the OFFSET FETCH feature as a solution for loading large volumes of data from a relational database using a machine with limited memory and preventing an out of memory exception. We describe how to load data in batches to avoid placing a large amount of data into memory.

在本文中,我们说明了如何使用OFFSET FETCH功能作为使用内存有限的机器从关系数据库加载大量数据并防止内存不足异常的解决方案。 我们描述了如何批量加载数据以避免将大量数据放入内存。

This article is the first in the SSIS Tips and Tricks series which aims to illustrate some best practices.

本文是SSIS技巧和窍门系列中的第一篇,旨在说明一些最佳实践。

介绍 (Introduction)

When searching online for problems related to SSIS data import, you’ll find solutions that can be used in optimal environments or tutorials for handling a small amount of data. Unfortunately, these solutions prove to be unsuitable in a real environment.

在线搜索与SSIS数据导入相关的问题时,您会找到可在最佳环境或教程中用于处理少量数据的解决方案。 不幸的是,这些解决方案被证明不适合在实际环境中使用。

In reality, smaller companies can’t always adopt new storage, processing equipment, and technologies although they must still handle an increasing amount of data. This is especially true for social media analysis since they must analyze the behavior of their target audience (customers).

实际上,尽管较小的公司仍必须处理越来越多的数据,但它们并不总是采用新的存储,处理设备和技术。 对于社交媒体分析而言,尤其如此,因为他们必须分析目标受众(客户)的行为。

Similarly, not all companies can upload their data to the cloud due to the high cost along with data privacy and confidentiality issues.

同样,由于成本高昂以及数据隐私和机密性问题,并非所有公司都可以将其数据上传到云。

偏移量功能 (OFFSET FETCH feature)

OFFSET FETCH is a feature added to the ORDER BY clause beginning with the SQL Server 2012 edition. It can be used to extract a specific number of rows starting from a specific index. As an example, we have a query that returns 40 rows and we need to extract 10 rows from the 10th row:

从SQL Server 2012版本开始,OFFSET FETCH是一项添加到ORDER BY子句中的功能。 它可用于从特定索引开始提取特定数量的行。 作为一个例子,我们有一个查询,返回40行,我们需要提取从 10行10行:

SELECT *
FROM Table
ORDER BY ID
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY

In the query above, OFFSET 10 is used to skip 10 rows and FETCH 10 ROWS ONLY is used to extract only 10 rows.

在上面的查询中,“偏移量10”用于跳过10行,而“仅捕获10行”仅用于提取10行。

To get additional information about the ORDER BY clause and OFFSET FETCH feature, refer to the official documentation: Using OFFSET and FETCH to limit the rows returned.

要获取有关ORDER BY子句和OFFSET FETCH功能的更多信息,请参考官方文档: 使用OFFSET和FETCH限制返回的行

使用OFFSET FETCH加载数据块(分页) (Using OFFSET FETCH to load data in chunks (pagination))

One of the main purposes of using the OFFSET FETCH feature is to load data in chunks. Let’s imagine we have an application that executes a SQL query and needs to show the results on several pages where each page contains only 10 results (similar to the Google search engine).

使用OFFSET FETCH功能的主要目的之一是分块加载数据。 假设我们有一个执行SQL查询的应用程序,需要在多个页面上显示结果,其中每个页面仅包含10个结果(类似于Google搜索引擎)。

The following query can be used as a paging query where @PageSize is the number of rows you need to show in each chunk and @PageNumber is the iteration (page) number:

以下查询可用作分页查询,其中@PageSize是您需要在每个块中显示的行数,而@PageNumber是迭代(页)数:

SELECT <some columns> 
FROM <table name>
ORDER BY <some columns>
OFFSET @PageSize * @PageNumber ROWS
FETCH NEXT @PageSize ROWS ONLY;

This article is not intended to illustrate all use cases of the OFFSET FETCH feature, nor does it discuss best practices. There are many articles online that you can refer to for more information:

本文无意于说明OFFSET FETCH功能的所有用例,也没有讨论最佳实践。 您可以参考许多在线文章以获取更多信息:

在SSIS中实现OFFSET FETCH功能以分块加载大量数据 (Implementing the OFFSET FETCH feature within SSIS to load a large volume of data in chunks)

We’ve often been asked to build an SSIS package that loads a huge amount of data from SQL Server with limited machine resources. Loading data using OLE DB Source using Table or View data access mode was causing an out of memory exception.

我们经常被要求构建一个SSIS包,以有限的计算机资源从SQL Server加载大量数据。 使用表或视图数据访问模式使用OLE DB Source加载数据导致内存不足异常。

One of the easiest solutions is to use the OFFSET FETCH feature to load data in chunks to prevent memory outage errors. In this section, we provide a step-by-step guide on implementing this logic within an SSIS package.

最简单的解决方案之一是使用OFFSET FETCH功能将数据分块加载,以防止出现内存中断错误。 在本节中,我们提供有关在SSIS包中实现此逻辑的分步指南。

First, we must create a new Integration Services package, then declare four variables as follows:

首先,我们必须创建一个新的Integration Services包,然后声明四个变量,如下所示:

  • RowCount (Int32): Stores the total number of rows in the source table RowCount (Int32):存储源表中的总行数
  • IncrementValue (Int32): Stores the number of rows we need to specify in the OFFSET clause (similar to @PageSize * @PageNumber in the example above) IncrementValue (Int32):存储我们需要在OFFSET子句中指定的行数(类似于上例中的@PageSize * @PageNumber)
  • RowsInChunk (Int32): Specifies the number of rows in each chunk of data (Similar to @PageSize in the example above) RowsInChunk (Int32):指定每个数据块中的行数(类似于上面的示例中的@PageSize)
  • SourceQuery (String): Stores the source SQL command used to fetch data SourceQuery (String):存储用于获取数据的源SQL命令

After declaring the variables, we assign a default value for the RowsInChunk variable; in this example, we will set it to 1000. Furthermore, we must set the Source Query expression, as follows:

声明变量后,我们为RowsInChunk变量分配一个默认值。 在此示例中,我们将其设置为1000。此外,我们必须设置Source Query表达式,如下所示:

"SELECT *
FROM [AdventureWorks2017].[Person].[Person]
ORDER BY [BusinessEntityID]
OFFSET " + (DT_WSTR,50)@[User::IncrementValue] + "  ROWS 
FETCH NEXT " + (DT_WSTR,50) @[User::RowsInChunk] + "  ROWS ONLY"

the variables added to the SSIS package

Figure 1 – Adding variables

图1 –添加变量

Next, we add an Execute SQL Task to get the total number of rows in the source table. In this example, we use the Person table stored in the AdventureWorks2017 database. In the Execute SQL Task, we used the following SQL Statement:

接下来,我们添加一个Execute SQL Task来获取源表中的总行数。 在此示例中,我们使用存储在AdventureWorks2017数据库中的Person表。 在执行SQL任务中,我们使用了以下SQL语句:

SELECT COUNT(*) FROM [AdventureWorks2017].[Person].[Person]

the general tab page of the execute sql task

Figure 2 – Setting Execute SQL Task

图2 –设置执行SQL任务

And, we must change the Result Set property to Single Row. Then, in the Result Set Tab, we select the RowCount variable to store the result set as shown in the image below:

并且,我们必须将结果集属性更改为单行。 然后,在“结果集”选项卡中,我们选择RowCount变量来存储结果集,如下图所示:

how to map the result set to a variable in the execute sql task

Figure 3 – Mapping result set to variable

图3 –映射结果集到变量

After configuring the Execute SQL Task, we add a For Loop Container, with the following specifications:

配置执行SQL任务后,我们添加一个具有以下规范的For循环容器

  • InitExpression: @IncrementValue = 0 InitExpression :@IncrementValue = 0
  • EvalExpression: @IncrementValue <= @RowCount EvalExpression :@IncrementValue <= @RowCount
  • AssignExpression: @IncrementValue = @IncrementValue + @RowsInChunk AssignExpression :@IncrementValue = @IncrementValue + @RowsInChunk

how to configure the for loop container in order to set the offset fetch values

Figure 4 – Configuring for loop container

图4 –配置循环容器

After configuring the For Loop Container, we add a Data Flow Task inside it. Then, within the Data Flow Task, we add an OLE DB Source and OLE DB Destination.

配置For循环容器之后 ,我们在其中添加一个数据流任务 。 然后,在数据流任务中,我们添加一个OLE DB源和OLE DB目标。

In the OLE DB Source we select SQL Command from variable data access mode, and select @User::SourceQuery variable as the source.

OLE DB Source中,我们从变量数据访问模式中选择SQL Command ,然后选择@User :: SourceQuery变量作为源。

how to use the sql command that contains the offset fetch feature in the ole db source

Figure 5 – Configuring OLE DB source

图5 –配置OLE DB源

We specify the destination table within the OLE DB Destination component:

我们在OLE DB目标组件中指定目标表:

a screenshot of the data flow task

Figure 6 – Data flow task screenshot

图6 –数据流任务屏幕截图

The package control flow should look like the following:

程序包控制流应如下所示:

a screenshot of the package control flow

Figure 7 – Control flow screenshot

图7 –控制流屏幕截图

局限性 (Limitations)

After illustrating how to load data in chunks using the OFFSET FETCH feature in SSIS, we’ll note that this logic has some limitations:

在说明了如何使用SSIS中的OFFSET FETCH功能加载数据后,我们将注意到此逻辑有一些局限性:

  1. You always need some columns to be used in the ORDER BY clause (Identity or Primary key is preferred), since OFFSET FETCH is a feature of the ORDER BY clause and it cannot be implemented separately

    您总是需要在ORDER BY子句中使用一些列(首选身份或主键),因为OFFSET FETCH是ORDER BY子句的功能,不能单独实现
  2. If an error occurs while loading data, all data exported to the destination is committed and only the current chunk of data is rolled back. This may require additional steps to prevent data duplication when running the package again

    如果在加载数据时发生错误,则将提交所有导出到目标的数据,并且仅回滚当前数据块。 这可能需要其他步骤来防止再次运行程序包时的数据重复

使用其他数据库提供程序的OFFSET FETCH (OFFSET FETCH using other database providers)

In the following section, we briefly cover the syntax used by other database providers:

在以下部分中,我们简要介绍其他数据库提供程序使用的语法:

Oracle (Oracle)

With Oracle, you can use the same syntax as SQL Server. Refer to the following link for more information: Oracle FETCH

使用Oracle,您可以使用与SQL Server相同的语法。 有关更多信息,请参考以下链接: Oracle FETCH

SQLite的 (SQLite)

In SQLite, the syntax is different from SQL Server, since you use the LIMIT OFFSET feature as mentioned below:

在SQLite中,语法与SQL Server不同,这是因为您使用了LIMIT OFFSET功能 ,如下所述:

SELECT * FROM MYTABLE ORDER BY ID_COLUMN
LIMIT 50
OFFSET 10

MySQL (MySQL)

In MySQL, the syntax is similar to SQLite, since you use LIMIT OFFSET instead of OFFSET Fetch.

在MySQL中,语法类似于SQLite, 因为您使用LIMIT OFFSET而不是OFFSET Fetch。

DB2 (DB2)

In DB2, the syntax is similar to SQLite, since you use LIMIT OFFSET instead of OFFSET FETCH.

在DB2中,语法类似于SQLite, 因为您使用LIMIT OFFSET而不是OFFSET FETCH。

结论 (Conclusion)

In this article, we’ve described the OFFSET FETCH feature found in SQL Server 2012 and higher. We illustrated how to use this feature to create a paging query, then provided a step-by-step guide on how to load data in chunks to allow extracting large amounts of data using a machine with limited resources. Finally, we mentioned some of the limitations and the syntax differences with other database providers.

在本文中,我们描述了SQL Server 2012及更高版本中的OFFSET FETCH功能。 我们说明了如何使用此功能创建分页查询,然后提供了有关如何按组加载数据以允许使用资源有限的机器提取大量数据的分步指南。 最后,我们提到了一些限制以及与其他数据库提供程序的语法差异。

翻译自: https://www.sqlshack.com/sql-offset-fetch-feature-loading-large-volumes-of-data-using-limited-resources-with-ssis/

offset fetch

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值