11.2 New Feature : Using DBMS_PARALLEL_EXECUTE to Update Large Tables in Parallel (Doc ID 1066555.1)
PURPOSE 目的
The purpose of this article is to introduce the 11.2.0.x database new package DBMS_PARALLEL_EXECUTE and how it can used to update large tables in parallel using different chunking mechanism.
本文的目的是介绍11.2.0.x数据库新程序包DBMS_PARALLEL_EXECUTE,以及如何使用不同的分块机制并行更新大型表。
SCOPE 范围
This is intended for PL/SQL programmers 适用于PL/SQL程序员
DETAILS 细节
Overview 总览
The DBMS_PARALLEL_EXECUTE package enables you to incrementally update the data in a table in parallel, in two high-level steps:
DBMS_PARALLEL_EXECUTE 软件包使您可以通过两个高级步骤来并行地更新表中的数据:
- Group the data into smaller chunks. 将数据分组为较小的块。
- Apply the desired DML statement on each chunk in parallel. 在每个块上并行应用所需的DML语句。并在处理完每个块后提交(commit)。
This technique is recommended whenever you are updating huge amount of data. Its advantages are:
每当您要更新大量数据时,建议使用此技术。它的优点是:
- It locks only one set of rows at a time, for a relatively short time, instead of locking the entire table.
- 它在相对较短的时间内一次仅锁定一组行,而不是锁定整个表。
- You do not lose work that has been done if something fails before the entire operation finishes.
- 如果在整个操作完成之前发生故障,您不会丢失已经完成的工作。
- It reduces the rollback space consumption.
- 它减少了回滚空间的消耗。
- It improves the performance.
- 它提高了性能。
NOTE: The DBMS_PARALLEL_EXECUTE package relies on tasks and requires that the job_queue_processes parameter is set to a value greater than 0. If job_queue_processes=0 and a call to DBMS_PARALLEL_EXECUTE is executed. The task will hang waiting on "PL/SQL lock timer".
注意:DBMS_PARALLEL_EXECUTE 程序包依赖于任务(tasks),并要求将 job_queue_processes 参数设置为大于0的值。如果job_queue_processes = 0,则执行对 DBMS_PARALLEL_EXECUTE 的调用。该任务将挂起,等待"PL/SQL lock timer"。
Different ways of chunking method that the DBMS_PARALLEL_EXECUTE package offers:
DBMS_PARALLEL_EXECUTE 软件包提供的分块方法的不同方式:
- CREATE_CHUNKS_BY_NUMBER_COL, This procedure chunks the table by a given column. The column specified must be a number column and it takes MIN and MAX value of the given column and divides the range evenly according to the chunk value specified. 此过程按给定列对表进行分块。指定的列必须是数字列(number),并且采用给定列的 MIN 和 MAX 值,并根据指定的块值平均划分范围。
- CREATE_CHUNKS_BY_ROWID, This procedure chunks the table by ROWID. The table to be chunked must be a physical table with physical ROWID. Index organized tables are not allowed. 此过程按 ROWID 对表进行分块。要分块的表必须是具有物理ROWID的物理表。不允许使用索引组织表。
- CREATE_CHUNKS_BY_SQL, This procedure chunks the table by means of user-provided SELECT statement. 此过程通过用户提供的SELECT语句对表进行分块。
Pre-requisites before using the DBMS_PARALLEL_EXECUTE package:
使用DBMS_PARALLEL_EXECUTE软件包之前的先决条件:
- To execute chunks in parallel, the user must have CREATE JOB system privilege.
- 要并行执行块,用户必须具有CREATE JOB系统特权。
- Execute privilege on DBMS_SQL package because CHUNK_BY_SQL, RUN_TASK, and RESUME_TASK subprograms require a query, and are executed using DBMS_SQL.
- 对DBMS_SQL程序包具有执行特权,因为CHUNK_BY_SQL,RUN_TASK和RESUME_TASK子程序需要查询,并使用DBMS_SQL执行。
This below code creates a dummy table and populates some records in it, which will be in the sample codes further.
下面的代码创建一个虚拟表并填充其中的一些记录,这些记录将在示例代码中进一步显示。
DROP TABLE TEST_TAB;
CREATE TABLE test_tab
(
REC_ID NUMBER,
DESCRIPTION VARCHAR2(50),
NUM_COL NUMBER,
CONSTRAINT TEST_TAB_PK PRIMARY KEY (REC_ID)
);
INSERT /*+ APPEND */ INTO
TEST_TAB
SELECT LEVEL,
'Description for ' || LEVEL,
CASE
WHEN MOD(level, 5) = 0 THEN 10
WHEN MOD(level, 3) = 0 THEN 20
ELSE 30
END
FROM DUAL
CONNECT BY LEVEL <= 500000;
COMMIT;
SELECT NUM_COL, COUNT(*) FROM TEST_TAB
GROUP BY NUM_COL ORDER BY NUM_COL;
NUM_COL COUNT(*)
---------- ----------
10 10000
20 13333
30 26667
1. Sample code to create the chunks using the CREATE_CHUNKS_BY_ROWID procedure:
使用CREATE_CHUNKS_BY_ROWID过程创建块的示例代码:
SET SERVEROUTPUT ON
BEGIN
DBMS_PARALLEL_EXECUTE.DROP_TASK ('test_task');
EXCEPTION WHEN OTHERS THEN
NULL;
END;
/
DECLARE
l_task VARCHAR2(30) := 'test_task';
l_sql_stmt VARCHAR2(32767);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- Create the TASK 创建任务
DBMS_PARALLEL_EXECUTE.CREATE_TASK (task_name => l_task);
-- Chunk the table by the ROWID 使用ROWID分块
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID
(
TASK_NAME => l_task,
TABLE_OWNER => 'SCOTT', /* Replace the TABLE_OWNER as appropriately 适当替换TABLE_OWNER*/
TABLE_NAME => 'TEST_TAB',
BY_ROW => TRUE, /*TRUE 表示行数,否则 表示块数*/
CHUNK_SIZE => 2500 /* Approximate number of rows/blocks to process for each commit cycle 每个提交周期要处理的行数或块数*/
);
-- DML to be execute in parallel 要并行执行的DML
l_sql_stmt := 'UPDATE test_tab t SET t.num_col = t.num_col + 10 WHERE rowid BETWEEN :start_id AND :end_id';
-- Run the task 执行任务
DBMS_PARALLEL_EXECUTE.RUN_TASK
(
TASK_NAME => l_task,
SQL_STMT => l_sql_stmt,
LANGUAGE_FLAG => DBMS_SQL.NATIVE,
PARALLEL_LEVEL => 10 /*指定并行度,0为串行,NULL为默认cpu_count*parallel_threads_per_cpu*/
);
-- If there is error, RESUME it for at most 2 times. 如果有错误,重新执行两次
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);
WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.RESUME_TASK(l_task);
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);
END LOOP;
-- Done with processing; drop the task 完成处理,删除任务
DBMS_PARALLEL_EXECUTE.DROP_TASK(l_task);
EXCEPTION WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE('Error in the code :' || SQLERRM);
END;
/
2. Sample code to create the chunks using the CREATE_CHUNKS_BY_NUMBER_COL procedure:
使用CREATE_CHUNKS_BY_NUMBER_COL过程创建块的示例代码:
CREATE OR REPLACE PROCEDURE
process_update
(
p_start_id IN NUMBER,
p_end_id IN NUMBER
)
AS
BEGIN
UPDATE TEST_TAB T
SET T.NUM_COL = T.NUM_COL + 10
WHERE REC_ID BETWEEN p_start_id AND p_end_id;
END;
/
SET SERVEROUTPUT ON
DECLARE
l_task VARCHAR2(30) := 'test_task';
l_sql_stmt VARCHAR2(32767);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK (task_name => l_task);
-- Chunk the table by the ID
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_NUMBER_COL
(
TASK_NAME => l_task,
TABLE_OWNER => 'SCOTT', /* Replace the TABLE_OWNER value as appropriately */
TABLE_NAME => 'TEST_TAB',
TABLE_COLUMN => 'REC_ID',
CHUNK_SIZE => 2500
);
-- Procedure to be execute in parallel
l_sql_stmt := 'BEGIN PROCESS_UPDATE(:start_id, :end_id); END;';
DBMS_PARALLEL_EXECUTE.RUN_TASK
(
TASK_NAME => l_task,
SQL_STMT => l_sql_stmt,
LANGUAGE_FLAG => DBMS_SQL.NATIVE,
PARALLEL_LEVEL => 10
);
-- If there is error, RESUME it for at most 2 times.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);
WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.RESUME_TASK(l_task);
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS(l_task);
END LOOP;
-- Done with processing; drop the task
DBMS_PARALLEL_EXECUTE.DROP_TASK(l_task);
EXCEPTION WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE('Error in the code :' || SQLERRM);
END;
/
3. Sample code to create the chunks using the CREATE_CHUNKS_BY_SQL procedure:
使用CREATE_CHUNKS_BY_SQL过程创建块的示例代码:
SET SERVEROUTPUT ON
DECLARE
l_chunk_sql VARCHAR2(1000);
l_sql_stmt VARCHAR2(1000);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK ('test_task');
-- Chunk the table by NUM_COL
l_chunk_sql := 'SELECT DISTINCT NUM_COL, NUM_COL FROM TEST_TAB';
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_SQL
(
TASK_NAME => 'test_task',
SQL_STMT => l_chunk_sql,
BY_ROWID => false
);
-- Execute the DML in parallel 并行执行DML
-- the WHERE clause contain a condition on num_col, which is the chunk
-- WHERE子句在num_col上包含一个条件,即块(chunk)
-- column. In this case, grouping rows is by num_col.
-- 列. 在这种情况下,按num_col对行进行分组。
l_sql_stmt := 'UPDATE test_tab t SET t.num_col = t.num_col + 10 WHERE num_col BETWEEN :start_id AND :end_id';
DBMS_PARALLEL_EXECUTE.RUN_TASK
(
TASK_NAME => 'test_task',
SQL_STMT => l_sql_stmt,
LANGUAGE_FLAG => DBMS_SQL.NATIVE,
PARALLEL_LEVEL => 10
);
-- If there is error, RESUME it for at most 2 times.
L_try := 0;
L_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('test_task');
WHILE(l_try < 2 and L_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
L_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.RESUME_TASK('test_task');
L_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('test_task');
END LOOP;
-- Done with processing; drop the task
DBMS_PARALLEL_EXECUTE.DROP_TASK('test_task');
EXCEPTION WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE('Error in the code :' || SQLERRM);
END;
/
References : 参考
For more information on the DBMS_PARALLEL_EXECUTE package, refer to the chapter "DBMS_PARALLEL_EXECUTE" in the "Database PL/SQL Packages and Types Reference" guide.
有关DBMS_PARALLEL_EXECUTE软件包的更多信息,请参阅"Database PL/SQL Packages and Types Reference"指南中的"DBMS_PARALLEL_EXECUTE" 一章。
https://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_parallel_ex.htm#ARPLS67331