在批量更新中,merge语句性能是最好的,因为merge可以多快读,而且可以启用并行,merge语句比一般update语句快,另外还有一种就是根据rowid来更新,这几种方法更新各有好处,首先使用merge更新,可以多快读,且速度是最快的,而且可以并行,速度就更快了,其次用rowid来更新,虽然是单块读,但如果加上组合索引的方式,那通过组合索引就不用回表,只扫描扫描就可以了,这也是比较快的,最后用普通的update,这种写法好处就是方便,容易写。下面来看一个简单的例子
1. 创建测试表,插入数据
PgSQL
SQL> create table test_a as select * from dba_objects;
Table created.
SQL> create table test_b as select * from dba_objects;
Table created.
SQL> insert into test_b select * from test_b;
91570 rows created.
SQL> /
183140 rows created.
SQL> /
366280 rows created.
SQL> /
732560 rows created.
SQL> commit;
Commit complete.
SQL> SELECT SUM(BYTES)/1024/1024 "SIZE(MB)" FROM DBA_SEGMENTS WHERE SEGMENT_NAME='TEST_B';
SIZE(MB)
----------
200
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
SQL>createtabletest_aasselect*fromdba_objects;
Tablecreated.
SQL>createtabletest_basselect*fromdba_objects;
Tablecreated.
SQL>insertintotest_bselect*fromtest_b;
91570rowscreated.
SQL>/
183140rowscreated.
SQL>/
366280rowscreated.
SQL>/
732560rowscreated.
SQL>commit;
Commitcomplete.
SQL>SELECTSUM(BYTES)/1024/1024"SIZE(MB)"FROMDBA_SEGMENTSWHERESEGMENT_NAME='TEST_B';
SIZE(MB)
----------
200
2. 在两个测试表上创建2个索引
PgSQL
SQL> create index idx_a on test_a(object_name,object_id);
Index created.
SQL> create index idx_b on test_b(object_id);
Index created.
1
2
3
4
5
6
7
SQL>createindexidx_aontest_a(object_name,object_id);
Indexcreated.
SQL>createindexidx_bontest_b(object_id);
Indexcreated.
3. 如果现在用最原始的update语句从test_a表更新test_b表,关联条件是object_id
PgSQL
SQL> update test_b b set b.object_name=(select a.object_name from test_a a where a.object_id=b.object_id);
1
SQL>updatetest_bbsetb.object_name=(selecta.object_namefromtest_aawherea.object_id=b.object_id);
我这里反正是过了很久都没有更新完成,毕竟测试库,没有那么大的buffer cache,这里的update相当于嵌套循环,test_b表有多少数据,那么子查询就要执行扫描多少次,对于一个几十万的表,扫描这么多次无疑是非常慢的。
4. 这里直接改成merge语句,设置多快读参数为128,开启并行
PgSQL
SQL> alter session set db_file_multiblock_read_count=128;
Session altered.
SQL> alter session enable parallel dml;
Session altered.
SQL> conn scott/tiger
Connected.
SQL> merge /*+ USE_HASH(C,H) FULL(C) */ into test_b c
2 using (select /*+INDEX(A) USE_HASH(A) */ a.object_name, a.object_id
3 from test_a a
4 where a.object_id in (select /*+ use_hash(b) index(b) */ object_id from test_b b)) h
5 on (c.object_id = h.object_id)
6 when matched then
7 update set c.object_name = h.object_name;
1465104 rows merged.
Elapsed: 00:00:12.03
SQL>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SQL>altersessionsetdb_file_multiblock_read_count=128;
Sessionaltered.
SQL>altersessionenableparalleldml;
Sessionaltered.
SQL>connscott/tiger
Connected.
SQL>merge/*+ USE_HASH(C,H) FULL(C) */intotest_bc
2using(select/*+INDEX(A) USE_HASH(A) */a.object_name,a.object_id
3fromtest_aa
4wherea.object_idin(select/*+ use_hash(b) index(b) */object_idfromtest_bb))h
5on(c.object_id=h.object_id)
6whenmatchedthen
7updatesetc.object_name=h.object_name;
1465104rowsmerged.
Elapsed:00:00:12.03
SQL>
可以看到使用merge语句更新了146W的数据,但是只花掉了10多秒的时间,而第一个update语句却迟迟没有更新成功。
简单看看这2中update方式的执行计划
PgSQL
SQL> explain plan for merge /*+ USE_HASH(C,H) FULL(C) */ into test_b c
using (select /*+INDEX(A) USE_HASH(A) */ a.object_name, a.object_id
from test_a a
2 3 4 where a.object_id in (select /*+ use_hash(b) index(b) */ object_id from test_b b)) h
5 on (c.object_id = h.object_id)
6 when matched then
7 update set c.object_name = h.object_name;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
Plan hash value: 1121481720
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 91569 | 8137K| | 1684 (1)| 00:00:01 |
| 1 | MERGE | TEST_B | | | | | |
| 2 | VIEW | | | | | | |
|* 3 | HASH JOIN | | 91569 | 12M| 3760K| 1684 (1)| 00:00:01 |
| 4 | INDEX FULL SCAN | IDX_A | 91569 | 2682K| | 523 (1)| 00:00:01 |
| 5 | TABLE ACCESS FULL| TEST_B | 91570 | 10M| | 428 (1)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("C"."OBJECT_ID"="A"."OBJECT_ID")
17 rows selected.
SQL> explain plan for update test_b b set b.object_name=(select a.object_name from test_a a where a.object_id=b.object_id);
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
Plan hash value: 703839442
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 91570 | 2682K| 47M (1)| 00:31:11 |
| 1 | UPDATE | TEST_B | | | | |
| 2 | TABLE ACCESS FULL| TEST_B | 91570 | 2682K| 428 (1)| 00:00:01 |
|* 3 | INDEX FULL SCAN | IDX_A | 1 | 30 | 523 (1)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("A"."OBJECT_ID"=:B1)
filter("A"."OBJECT_ID"=:B1)
16 rows selected.
使用merge into方式正确的使用的hash join连接,但普通的update语句,相当于嵌套循环,在id=3处看出来,当驱动表test_b返回一条数据的时候,被驱动表扫描一次,这里对于全表扫描会返回很多行的update来说,是非常不适合的。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
SQL>explainplanformerge/*+ USE_HASH(C,H) FULL(C) */intotest_bc
using(select/*+INDEX(A) USE_HASH(A) */a.object_name,a.object_id
fromtest_aa
234wherea.object_idin(select/*+ use_hash(b) index(b) */object_idfromtest_bb))h
5on(c.object_id=h.object_id)
6whenmatchedthen
7updatesetc.object_name=h.object_name;
Explained.
SQL>select*fromtable(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
Planhashvalue:1121481720
---------------------------------------------------------------------------------------
|Id|Operation|Name|Rows|Bytes|TempSpc|Cost(%CPU)|Time|
---------------------------------------------------------------------------------------
|0|MERGESTATEMENT||91569|8137K||1684(1)|00:00:01|
|1|MERGE|TEST_B||||||
|2|VIEW|||||||
|*3|HASHJOIN||91569|12M|3760K|1684(1)|00:00:01|
|4|INDEXFULLSCAN|IDX_A|91569|2682K||523(1)|00:00:01|
|5|TABLEACCESSFULL|TEST_B|91570|10M||428(1)|00:00:01|
---------------------------------------------------------------------------------------
PredicateInformation(identifiedbyoperationid):
---------------------------------------------------
3-access("C"."OBJECT_ID"="A"."OBJECT_ID")
17rowsselected.
SQL>explainplanforupdatetest_bbsetb.object_name=(selecta.object_namefromtest_aawherea.object_id=b.object_id);
Explained.
SQL>select*fromtable(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
Planhashvalue:703839442
-----------------------------------------------------------------------------
|Id|Operation|Name|Rows|Bytes|Cost(%CPU)|Time|
-----------------------------------------------------------------------------
|0|UPDATESTATEMENT||91570|2682K|47M(1)|00:31:11|
|1|UPDATE|TEST_B|||||
|2|TABLEACCESSFULL|TEST_B|91570|2682K|428(1)|00:00:01|
|*3|INDEXFULLSCAN|IDX_A|1|30|523(1)|00:00:01|
-----------------------------------------------------------------------------
PredicateInformation(identifiedbyoperationid):
---------------------------------------------------
3-access("A"."OBJECT_ID"=:B1)
filter("A"."OBJECT_ID"=:B1)
16rowsselected.
使用mergeinto方式正确的使用的hashjoin连接,但普通的update语句,相当于嵌套循环,在id=3处看出来,当驱动表test_b返回一条数据的时候,被驱动表扫描一次,这里对于全表扫描会返回很多行的update来说,是非常不适合的。
PgSQL
SQL> DECLARE
2 CURSOR CUR_B IS
3 SELECT a.object_id, a.object_name, b.ROWID ROW_ID
4 FROM test_a A, test_b B
5 WHERE A.object_id = B.object_id
6 ORDER BY B.ROWID;
7 V_COUNTER NUMBER;
8 BEGIN
9 V_COUNTER := 0;
10 FOR ROW_B IN CUR_B LOOP
11 UPDATE test_b b SET object_name = ROW_B.object_name WHERE ROWID = ROW_B.ROW_ID;
12 V_COUNTER := V_COUNTER + 1;
13 IF (V_COUNTER >= 10000) THEN
14 COMMIT;
15 dbms_output.put_line('Updated: ' ||V_COUNTER || ' lines.');
16 V_COUNTER := 0;
17 END IF;
18 END LOOP;
19 COMMIT;
20 END;
21 /
PL/SQL procedure successfully completed.
Elapsed: 00:01:09.35
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SQL>DECLARE
2CURSORCUR_BIS
3SELECTa.object_id,a.object_name,b.ROWIDROW_ID
4FROMtest_aA,test_bB
5WHEREA.object_id=B.object_id
6ORDERBYB.ROWID;
7V_COUNTERNUMBER;
8BEGIN
9V_COUNTER:=0;
10FORROW_BINCUR_BLOOP
11UPDATEtest_bbSETobject_name=ROW_B.object_nameWHEREROWID=ROW_B.ROW_ID;
12V_COUNTER:=V_COUNTER+1;
13IF(V_COUNTER>=10000)THEN
14COMMIT;
15dbms_output.put_line('Updated: '||V_COUNTER||' lines.');
16V_COUNTER:=0;
17END IF;
18END LOOP;
19COMMIT;
20END;
21/
PL/SQLproceduresuccessfullycompleted.
Elapsed:00:01:09.35
这里通过rowid来更新,前文已经说过,rowid是单块读的,没有merge快,而实验也正好证明了这个问题,merge语句要比通过rowid来更新要快很多,不过在通过rowid更新的时候要注意,我这里用了order by rowid,是在buffer cache不够大的情况下使用,性能比较好,保证被更新的表不被page out,当然如果buffer cache够大,就不会存在这种情况了,反而比不加order by rowid要快,因为少了排序,而且buffer cache也能容纳下这2个表。用rowid的好处就是可以批量提交,但不足的地方就是没有merge快,不能并行更新。
转载请注明: 版权所有,文章允许转载,但必须以链接方式注明源地址,否则追究法律责任!
最后编辑:2014-03-05作者:Jerry
一个积极向上的小青年,热衷于分享--Focus on DB,BI,ETL