背景:
系统运行在Solaris 8上,v8fp14 DB2 v8.1.0.128 : s061108,用户的语句原本运行时间不到一秒,执行了runstats后十分钟。
并且用户发现当使用optlevel=1的时候执行时间恢复正常,optlevel=5的时候则为10分钟。
语句:
DELETE
FROM CTWEBTRACK
WHERE CTWEBTRACK_ID IN
(SELECT CTWEBTRACK_ID
FROM ctwebtrack
WHERE timecreated < (CURRENT TIMESTAMP - ? DAYS)
FETCH FIRST 5000 ROWS ONLY)
runstats命令:
RUNSTATS ON TABLE "DB2RTZ1 "."CTWEBTRACK" WITH DISTRIBUTION ON COLUMNS (
CTWEBTRACK_ID NUM_FREQVALUES 10 NUM_QUANTILES 20,
IP_ADDRESS NUM_FREQVALUES 10 NUM_QUANTILES 20,
SESSION_ID NUM_FREQVALUES 10 NUM_QUANTILES 20,
SOURCE_CODE NUM_FREQVALUES 10 NUM_QUANTILES 20,
STORE_ID NUM_FREQVALUES 10 NUM_QUANTILES 20,
TIMECREATED NUM_FREQVALUES 10 NUM_QUANTILES 20)
;
表结构:
CREATE TABLE "DB2RTZ1 "."CTWEBTRACK" (
"CTWEBTRACK_ID" BIGINT NOT NULL ,
"STORE_ID" INTEGER NOT NULL ,
"SESSION_ID" VARCHAR(254) ,
"SOURCE_CODE" VARCHAR(9) ,
"IP_ADDRESS" VARCHAR(20) ,
"TIMECREATED" TIMESTAMP )
IN "USERSPACE1" ;
-- DDL Statements for primary key on Table "DB2RTZ1 "."CTWEBTRACK"
ALTER TABLE "DB2RTZ1 "."CTWEBTRACK"
ADD PRIMARY KEY
("CTWEBTRACK_ID");
-- DDL Statements for indexes on Table "DB2RTZ1 "."CTWEBTRACK"
CREATE INDEX "DB2RTZ1 "."I_CTWEBTRACK001" ON "DB2RTZ1 "."CTWEBTRACK"
("TIMECREATED" ASC,
"SOURCE_CODE" ASC,
"STORE_ID" ASC) ALLOW REVERSE SCANS;
db2set设置:
[e] DB2_VIEW_REOPT_VALUES=YES
[i] DB2_SKIPINSERTED=ON
[i] DB2_EVALUNCOMMITTED=ON
[i] DB2_SKIPDELETED=ON
[i] DB2BIDI=yes
[i] DB2ENVLIST=EXTSHM
[i] DB2COMM=tcpip
[i] DB2_PARALLEL_IO=*
[i] DB2AUTOSTART=yes
[g] DB2SYSTEM=cyclops
[g] DB2ADMINSERVER=dasusr1
========================================================
分析:
首先在optlevel=1和optlevel=5上各抓一个db2exfmt:
optlevel=1:
Original Statement:
------------------
DELETE
FROM CTWEBTRACK
WHERE CTWEBTRACK_ID IN
(SELECT CTWEBTRACK_ID
FROM ctwebtrack
WHERE timecreated < (CURRENT TIMESTAMP - ? DAYS)
FETCH FIRST 5000 ROWS ONLY)
Optimized Statement:
-------------------
DELETE
FROM DB2RTZ1.CTWEBTRACK AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM DB2RTZ1.CTWEBTRACK AS Q4
WHERE (Q4.CTWEBTRACK_ID <= $C1) AND (Q4.CTWEBTRACK_ID >= $C0) AND
Q4.CTWEBTRACK_ID = ANY
(SELECT DISTINCT Q2.CTWEBTRACK_ID, $C0, $C1
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))
ORDER BY Q2.CTWEBTRACK_ID))
Access Plan:
-----------
Total Cost: 219912
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
1721.2
DELETE
( 2)
219912
2224.68
/---+---
1721.2 4.16672e+07
IXSCAN TABLE: DB2RTZ1
( 3) CTWEBTRACK
176866
503.479
/----+----
5000 4.16672e+07
TBSCAN INDEX: SYSIBM
( 4) SQL050519131628670
9825.23
485.035
|
5000
SORT
( 5)
9823.85
485.035
|
5000
FETCH
( 6)
9812.51
485.035
/---+---
11487.1 4.16672e+07
IXSCAN TABLE: DB2RTZ1
( 7) CTWEBTRACK
435.54
110.561
|
4.16672e+07
INDEX: DB2RTZ1
I_CTWEBTRACK001
optlvl=5:
Original Statement:
------------------
DELETE
FROM CTWEBTRACK
WHERE CTWEBTRACK_ID IN
(SELECT CTWEBTRACK_ID
FROM ctwebtrack
WHERE timecreated < (CURRENT TIMESTAMP - ? DAYS)
FETCH FIRST 5000 ROWS ONLY)
Optimized Statement:
-------------------
DELETE
FROM DB2RTZ1.CTWEBTRACK AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM
(SELECT Q2.CTWEBTRACK_ID
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))) AS Q3,
DB2RTZ1.CTWEBTRACK AS Q4
WHERE (Q4.CTWEBTRACK_ID = Q3.$C0))
Access Plan:
-----------
Total Cost: 437517
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
5000
DELETE
( 2)
437517
189667
/---+--
5000 4.16672e+07
NLJOIN TABLE: DB2RTZ1
( 3) CTWEBTRACK
312470
184667
/----+----
5000 1
TBSCAN IXSCAN
( 4) ( 12)
9788.85 75.0824
2220.03 3
| |
5000 4.16672e+07
SORT INDEX: SYSIBM
( 5) SQL050519131628670
9788.85
2220.03
|
5000
TBSCAN
( 6)
9781.04
2220.03
|
5000
TEMP
( 7)
9768.07
2220.03
|
5000
FETCH
( 8)
9761.42
2220.03
/---+---
11487.1 4.16672e+07
RIDSCN TABLE: DB2RTZ1
( 9) CTWEBTRACK
455.613
110.561
|
11487.1
SORT
( 10)
455.612
110.561
|
11487.1
IXSCAN
( 11)
435.54
110.561
|
4.16672e+07
INDEX: DB2RTZ1
I_CTWEBTRACK001
从上面的两个输出大家可以看出什么?
首先是optlevel=5的cost estimation大概是optlevel=1的两倍。
看到这里也许很多人就开始不理解了:为什么optlevel5的estimation竟然大于optlevel1呢?
这个是由于同样的语句被rewrite成不同查询。
当opt=1时,我们有:
Optimized Statement:
-------------------
DELETE
FROM DB2RTZ1.CTWEBTRACK AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM DB2RTZ1.CTWEBTRACK AS Q4
WHERE (Q4.CTWEBTRACK_ID <= $C1) AND (Q4.CTWEBTRACK_ID >= $C0) AND
Q4.CTWEBTRACK_ID = ANY
(SELECT DISTINCT Q2.CTWEBTRACK_ID, $C0, $C1
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))
ORDER BY Q2.CTWEBTRACK_ID))
当opt=5时则有:
DELETE
FROM DB2RTZ1.CTWEBTRACK AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM
(SELECT Q2.CTWEBTRACK_ID
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))) AS Q3,
DB2RTZ1.CTWEBTRACK AS Q4
WHERE (Q4.CTWEBTRACK_ID = Q3.$C0))
看一看两者之间有什么区别,问题会不会出现在这里?
在opt1里面,原语句的IN被改写为
Q4.CTWEBTRACK_ID = ANY
(SELECT DISTINCT Q2.CTWEBTRACK_ID, $C0, $C1
FROM DB2RTZ1.CTWEBTRACK AS Q2
WHERE (Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))
ORDER BY Q2.CTWEBTRACK_ID)
而在opt5中则被写为JOIN。
这个不应该成为瓶颈。实际上在大部分情况下JOIN的效率远高于子查询,因为JOIN是set-based processing,而子查询则是row-based processing。
所以暂时先把不同的rewritten语句放到一边,来看一看bad plan,瓶颈可能出现在什么地方。
Access Plan:
-----------
Total Cost: 437517
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
5000
DELETE
( 2)
437517
189667
/---+--
5000 4.16672e+07
NLJOIN TABLE: DB2RTZ1
( 3) CTWEBTRACK
312470
184667
/----+----
5000 1
TBSCAN IXSCAN
( 4) ( 12)
9788.85 75.0824
2220.03 3
| |
5000 4.16672e+07
SORT INDEX: SYSIBM
( 5) SQL050519131628670
9788.85
2220.03
|
5000
TBSCAN
( 6)
9781.04
2220.03
|
5000
TEMP
( 7)
9768.07
2220.03
|
5000
FETCH
( 8)
9761.42
2220.03
/---+---
11487.1 4.16672e+07
RIDSCN TABLE: DB2RTZ1
( 9) CTWEBTRACK
455.613
110.561
|
11487.1
SORT
( 10)
455.612
110.561
|
11487.1
IXSCAN
( 11)
435.54
110.561
|
4.16672e+07
INDEX: DB2RTZ1
I_CTWEBTRACK001
回顾一下查询,其中用到了一个predicate
WHERE timecreated < (CURRENT TIMESTAMP - ? DAYS)
这个predicate很可疑,因为我们知道当使用parameter marker的时候,由于优化器对真实传入的数值没有概念,可能会导致不正确的estimation。
那么看一下IXSCAN(11):
11) IXSCAN: (Index Scan)
Cumulative Total Cost: 435.54
Cumulative CPU Cost: 2.02617e+07
Cumulative I/O Cost: 110.561
Cumulative Re-Total Cost: 21.909
Cumulative Re-CPU Cost: 1.97797e+07
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 75.079
Estimated Bufferpool Buffers: 111.47
Arguments:
---------
MAXPAGES: (Maximum pages for prefetch)
46
PREFETCH: (Type of Prefetch)
SEQUENTIAL
ROWLOCK : (Row Lock intent)
NONE
SCANDIR : (Scan Direction)
FORWARD
TABLOCK : (Table Lock intent)
INTENT NONE
Predicates:
----------
3) Stop Key Predicate
Relational Operator: Less Than ( Subquery Input Required: No
Filter Factor: 0.000275688
Predicate Text:
--------------
(Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))
Input Streams:
-------------
1) From Object DB2RTZ1.I_CTWEBTRACK001
Estimated number of rows: 4.16672e+07
Number of columns: 2
Subquery predicate ID: Not Applicable
Column Names:
------------
+Q2.TIMECREATED(A)+Q2.$RID$
Output Streams:
--------------
2) To Operator #10
Estimated number of rows: 11487.1
Number of columns: 1
Subquery predicate ID: Not Applicable
Column Names:
------------
+Q2.TIMECREATED(A)
其中的: Predicates:
----------
3) Stop Key Predicate
Relational Operator: Less Than ( Subquery Input Required: No
Filter Factor: 0.000275688
Predicate Text:
--------------
(Q2.TIMECREATED < -(CURRENT TIMESTAMP, :?, 3))
看起来很可疑,因为实际上应用程序中?的值为90,而通过统计数据发现:
UPDATE SYSSTAT.COLUMNS
SET COLCARD=41576849,
NUMNULLS=0,
SUB_COUNT=-1,
SUB_DELIM_LENGTH=-1,
HIGH2KEY='2008-04-02-00.02.55.592000',
LOW2KEY='2006-07-21-05.02.45.425000',
AVGCOLLEN=11
WHERE COLNAME = 'TIMECREATED' AND TABNAME = 'CTWEBTRACK' AND TABSCHEMA = 'DB2RTZ1 ';
其high2key与low2key之间有将近两年的差别,假设数据均匀分布,则选中的区间应该大约有 (1-90天/2年)*41576849。显而易见其filter factor = 0.000275688应该是underestimate了。
也就是说,当真正在运行的情况下,IXSCAN(11)得到的结果集肯定要远大于其预期的11487.1,所以上面的 SORT(10)很显然会成为最主要的瓶颈。
知道了瓶颈,自然也知道了为什么runstats之后性能下降了。因为在runstats之前,优化器并没有索引的统计数据,所以并不会使用当前的访问计划,而是做最简单的对表的IXSCAN,这样实际上避免了 SORT,以至于runstats之前的性能还是好的。
那么怎么解决呢?相信有朋友会说:如果在应用程序里面把?替换为真实值呢?
这个想法很好,但是实际上却并不会有明显改善。因为CURRENT TIMESTAMP在做query compile的时候同样是没有被evaluate出真实值得,所以就算用真实数据替代?也不会得到准确的estimation。
还有什么方法呢?删除这个索引的统计数据,让优化器使用IXSCAN,毫无疑问这个肯定可以工作,但是不知道有没有人想问:这个 SORT(10)到底是干什么的呢?为什么在IXSCAN以后还要来个SORT?
我们先来看一看这个SORT的具体信息:
10) SORT : (Sort)
Cumulative Total Cost: 455.612
Cumulative CPU Cost: 3.8383e+07
Cumulative I/O Cost: 110.561
Cumulative Re-Total Cost: 21.909
Cumulative Re-CPU Cost: 1.97797e+07
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 455.612
Estimated Bufferpool Buffers: 111.47
Arguments:
---------
DUPLWARN: (Duplicates Warning flag)
TRUE
NUMROWS : (Estimated number of rows)
11488
ROWWIDTH: (Estimated width of rows)
12
SORTKEY : (Sort Key column)
1: Q2.$RID$(A)
TEMPSIZE: (Temporary Table Page Size)
4096
UNIQUE : (Uniqueness required flag)
TRUE
Input Streams:
-------------
2) From Operator #11
Estimated number of rows: 11487.1
Number of columns: 1
Subquery predicate ID: Not Applicable
Column Names:
------------
+Q2.TIMECREATED(A)
Output Streams:
--------------
3) To Operator #9
Estimated number of rows: 11487.1
Number of columns: 1
Subquery predicate ID: Not Applicable
Column Names:
------------
+Q2.$RID$(A)
首先看到,其SORTKEY竟然是RID,而不是索引中的任何一列。这个到底是干什么用的?
谜底揭晓,这个SORT是LIST PREFETCH中的一部分。
大家应该早已理解什么是LIST PREFETCH,在这里不重复了。这里的SORT就是为了把按照TIMECREATED列排序的索引重新按照RID排序,然后使用LIST PREFETCH从磁盘中迅速取出数据。
由于预期的返回行数为一万左右,所以从estimate上来看,这里的SORT不会开销很大,然后LIST PREFETCH却可以节省我们很多的I/O开销。
看一看索引的统计数据:
UPDATE SYSSTAT.INDEXES
SET NLEAF=386196,
NLEVELS=4,
FIRSTKEYCARD=41576849,
FIRST2KEYCARD=41640039,
FIRST3KEYCARD=41646606,
FIRST4KEYCARD=-1,
FULLKEYCARD=41646606,
CLUSTERFACTOR=-1.000000,
CLUSTERRATIO=99,
SEQUENTIAL_PAGES=386152,
DENSITY=98,
AVERAGE_SEQUENCE_GAP=6.000000,
AVERAGE_SEQUENCE_FETCH_GAP=-1.000000,
AVERAGE_SEQUENCE_PAGES=18388.000000,
AVERAGE_SEQUENCE_FETCH_PAGES=-1.000000,
AVERAGE_RANDOM_PAGES=0.000000,
AVERAGE_RANDOM_FETCH_PAGES=-1.000000,
NUMRIDS=41758559,
NUMRIDS_DELETED=0,
NUM_EMPTY_LEAFS=863
WHERE INDNAME = 'I_CTWEBTRACK001' AND INDSCHEMA = 'DB2RTZ1 '
AND TABNAME = 'CTWEBTRACK' AND TABSCHEMA = 'DB2RTZ1 ';
呵呵,CLUSTER RATIO = 99%,说明绝大部分的页都是顺序的,这样就可以确认优化器是想使用list prefetch代替random access,希望能够提升性能。
如果大家对db2set变量熟悉的话应该可以想起来,有一个叫做DB2_MINIMIZE_LISTPREFETCH的变量。
从infocenter上我们看到:
DB2_MINIMIZE_LISTPREFETCH 3 All 3 Default=NO 3
Values: YES or NO
List prefetch is a special table access method that involves retrieving the qualifying RIDs from the index, sorting them by page number and then prefetching the data pages. Sometimes the optimizer does not have accurate information to determine if list prefetch is a good access method. This might occur when predicate selectivities contain parameter markers or host variables that prevent the optimizer from using catalog statistics to determine the selectivity.
This registry variable prevents the optimizer from considering list prefetch in such situations.
Note:
9 9
When either or both of the DB2 UDB SQL compiler variables DB2_MINIMIZE_LISTPREFETCH 9 and DB2_INLIST_TO_NLJN, are set to YES, they remain active even if REOPT(ONCE) is specified.
这个说的就是,当优化器在决定是否使用LIST PREFETCH的时候,如果这部分包含了parameter marker,则db2避免在这种情况下使用。
所以如果我们打开DB2_MINIMIZE_LISTPREFETCH =YES,也是一种解决问题的方法。
综上,这个问题有3个方法在不修改程序的基础上搞定:
1)使用optlevel=1
2)删除索引的统计信息
3)使用DB2_MINIMIZE_LISTPREFETCH =YES
不过如果用户能够修改程序的话,可以考虑把该语句修改为DELETE FROM (SELECT ....)的形式。
这个语法在v8 fp4中推出,如果用户使用:
DELETE FROM ( SELECT 1
FROM ctwebtrack
WHERE timecreated < (CURRENT TIMESTAMP - ? DAYS)
FETCH FIRST 5000 ROWS ONLY );
则可以避免不必要的部分。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/671324/viewspace-329732/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/671324/viewspace-329732/