DB2 优化 RUNSTATS：排除索引碎片故障

最新推荐文章于 2023-08-21 11:01:52 发布

pianzif

最新推荐文章于 2023-08-21 11:01:52 发布

阅读量4.7k

点赞数

分类专栏： DB2@BK 文章标签： db2 runstats

DB2@BK 专栏收录该内容

104 篇文章 6 订阅

订阅专栏

简介

索引碎片可能由正常的数据库活动导致，比如 INSERT 和 UPDATE。当索引变得高度碎片化时，会对 RUNSTATS 性能产生重大影响。您可学习识别何时出现了这种情形，并采取更正措施。

本文适用于运行 DB2 9.7 或更低版本的安装。在引入了提前读预取 (readahead prefetching) 的 DB2 10.1 中，索引碎片对 RUNSTATS 性能不会产生同样的影响。

索引扫描性能对 RUNSTATS 性能有何影响

在您提交以下命令时，DB2 会收集表和它的所有索引的统计数据：

RUNSTATS ON TABLE MY.TABLE1 AND INDEXES ALL

它首先扫描表来确定表统计数据，然后扫描表的所有索引，一次一个，以确定索引统计数据。

对于具有多个索引的大型表，索引扫描性能是 RUNSTATS 性能的一个重要影响因素。如果在扫描用户请求页面之前，将合适的索引页面预取到缓冲池中，索引扫描的运行速度将会很快。但是，如果扫描用户必须等待磁盘 I/O 将页面加载到缓冲池中，索引扫描的运行速度就会欠佳。其他数据库操作（比如查询）也将使用索引扫描。但是，RUNSTATS 会全面扫描表的所有索引，一次扫描一个，而查询可能仅扫描表的某个索引的一部分。当索引扫描性能很糟糕时，可在 RUNSTATS 中很明显地感觉到。

DB2 9.7 和更低的版本使用顺序检测来确定是否应该执行索引预取。在 RUNSTATS 按照索引的键的顺序处理索引页面时，如果数据库管理器检测到对索引页面的顺序访问，就会启动预取。当一个索引未碎片化并具有物理上连续的页面时，预取很有用。但是，如果索引的页面分散在整个表空间中，如果预取的大多数页面都不会被使用，那么预取页面可能很浪费资源。

索引碎片是由多种因素引起的。一个因素是一个表有多个索引。一个表的索引存储在单个索引对象中（分区表的未分区索引除外），因此一个索引的页面可与另一个索引的页面相混合。另一个因素是 INSERT 和 UPDATE 活动可能引起的索引页面拆分。

索引重组可将索引数据重新构建到未碎片化的物理上连续的页面中。这使顺序检测能够将页面预取到缓冲池中，以便 RUNSTATS 有需要时下一页已经可以使用。结果将会得到更快的 RUNSTATS 运行速度。

DB2 10.1 提供了新的预取功能，其中索引碎片不再对索引扫描性能具有重大的有害影响。这将在本文后面更详细地讨论。

一个包含碎片化的索引和未碎片化的索引的示例

为了演示碎片化的索引对 RUNSTATS 性能的影响，让我们创建索引碎片化的一种极端情形。清单 1 包含用来创建一个包含 1000 万行的表的命令。它首先创建 5 个索引，然后插入数据。使用这个 insert 方法，不同索引的页面将混合在一起，而且每个索引被严重碎片化。

清单 1. 创建碎片化的索引的脚本

-- run this CLP file with autocommit off (db2 +c -tvf FILE)

connect to db97;

-- create not logged initially table
drop table demo.t1;
CREATE TABLE demo.t1 (i1 int not null,
                      i2 int not null,
                      i3 int not null,
                      i4 int not null,
                      i5 int not null,
                      i6 int not null,
                      i7 int not null)
       not logged initially;

-- create indexes BEFORE inserting data 
create index demo.t1i1    on demo.t1 (i1);
create index demo.t1i2    on demo.t1 (i2);
create index demo.t1i3    on demo.t1 (i3);
create index demo.t1i62   on demo.t1 (i6,i2);
create index demo.t1i765  on demo.t1 (i7,i6,i5);

-- insert 10M rows 
-- note: the pages for the five indexes will be intermixed in the table space 
-- and each index will be fragmented
insert into demo.t1 with q(a) as (values 1 union all select a+1 from q where a<10000000)
  select -a,mod(a,1237),mod(-a,251),mod(a,353),mod(-a,100),mod(a,257),mod(-a,511) from q;
commit;

connect reset;

现在让我们看看一个 RUNSTATS 要花多长时间。对于碎片化的索引，清单 2 显示一次典型的 RUNSTATS 运行花费了超过 6 分钟的时间。

清单 2. 碎片化的索引的 RUNSTATS 执行时间

values (current timestamp, 'TEST1: start')
1                          2
-------------------------- -------------
2013-06-16-13.59.16.093219 TEST1: start
  1 record(s) selected.

runstats on table DEMO.T1 with distribution and sampled detailed indexes all
DB20000I  The RUNSTATS command completed successfully.

values (current timestamp, 'TEST1: stop')
1                          2
-------------------------- ------------
2013-06-16-14.05.25.235269 TEST1: stop
  1 record(s) selected.

接下来，我们使用 REORG 命令重组索引并再执行一次 RUNSTATS。清单 3 显示索引重组用了 1 分钟时间就完成了。

清单 3. REORG INDEXES 命令和执行时间

reorg indexes all for table demo.t1
DB20000I  The REORG command completed successfully.

real    0m58.54s
user    0m0.01s
sys     0m0.01s

清单 4 表明，对于重组后的未碎片化索引，RUNSTATS 现在不到 30 秒就能运行完。与最初的 6 分钟相比，速度调高超过了 12 倍。

清单 4. 执行索引重组后的 RUNSTATS 执行时间

values (current timestamp, 'TEST2: start')
1                          2
-------------------------- -------------
2013-06-17-23.00.35.432198 TEST2: start
  1 record(s) selected.

runstats on table DEMO.T1 with distribution and sampled detailed indexes all
DB20000I  The RUNSTATS command completed successfully.

values (current timestamp, 'TEST2: stop')
1                          2
-------------------------- ------------
2013-06-17-23.01.03.483116 TEST2: stop
  1 record(s) selected.

这是一个人为的示例，但潜在的影响却是真实的。在大型 DB2 安装中，DBA 报告 RUNSTATS 运行时间从 18 小时缩短到了 40 分钟。

回页首

使用统计日志查找具有较长 RUNSTATS 运行时间的表

您可能想知道如何检查您是否存在任何具有这个 RUNSTATS 性能问题的表。一种方法是识别具有长期运行的 RUNSTATS 的表。您需要关注这些表，因为时间是一种非常重要的资源。如果一个表受到了索引碎片的影响，但 RUNSTATS 运行时间不是很糟糕，或许是因为这个表相对较小，所以它不需要您的注意。

统计日志（在 DB2 9.5 中引入）包含实例中的统计操作的信息。这些日志存储在诊断数据目录路径中的 events 目录中 (db2dump/events)。清单 5 显示了碎片化的索引示例中针对 RUNSTATS 的 COLLECT 操作日志条目。从统计日志中，可通过查看 start 和 success 时间戳来识别运行时间较长的 RUNSTATS。

清单 5. 统计日志条目

2013-06-16-13.59.16.101750-240 E1597A582          LEVEL: Event
PID     : 2950020              TID  : 1544        PROC : db2sysc
INSTANCE: kwaiwong             NODE : 000         DB   : DB97
APPHDL  : 0-29                 APPID: *LOCAL.kwaiwong.130616175833
AUTHID  : KWAIWONG
EDUID   : 1544                 EDUNAME: db2agent (DB97)
FUNCTION: DB2 UDB, relation data serv, sqlrLocalRunstats, probe:10
COLLECT : TABLE AND INDEX STATS : AT "2013-06-16-13.59.16.096268" : BY "User" : start
OBJECT  : Object name with schema, 11 bytes
DEMO    .T1
IMPACT  : None

2013-06-16-14.05.25.231686-240 E2180A723          LEVEL: Event
PID     : 2950020              TID  : 1544        PROC : db2sysc
INSTANCE: kwaiwong             NODE : 000         DB   : DB97
APPHDL  : 0-29                 APPID: *LOCAL.kwaiwong.130616175833
AUTHID  : KWAIWONG
EDUID   : 1544                 EDUNAME: db2agent (DB97)
FUNCTION: DB2 UDB, relation data serv, sqlrLocalRunstats, probe:220
COLLECT : TABLE AND INDEX STATS : AT "2013-06-16-14.05.25.231595" : BY "User" : success
OBJECT  : Object name with schema, 11 bytes
DEMO    .T1
IMPACT  : None
DATA #1 : String, 109 bytes
RUNSTATS ON TABLE "DEMO"."T1" ON ALL COLUMNS WITH DISTRIBUTION ON ALL COLUMNS 
 AND SAMPLED DETAILED INDEXES ALL

查看统计日志中的信息的一种更简单方法是，使用 SYSPROC.PD_GET_DIAG_HIST 表函数。您可以使用这个表函数格式化统计日志的不同方面，从而帮助您查看和分析统计事件。清单 6 给出了一个示例。

清单 6. 使用 SYSPROC.PD_GET_DIAG_HIST

select timestamp(varchar(substr(first_eventqualifier,1,26),26)) as eventtime, 
substr(objname_qualifier,1,20) as objschema, substr(objname,1,10) as objname, 
substr(eventtype,1,8) as eventtype,
substr(second_eventqualifier,1,10) eventby, 
substr(eventstate,1,10) as eventstate 
from table(sysproc.PD_GET_DIAG_HIST('optstats','EX','NONE',null,cast(null as timestamp))) 
as sl order by objschema,objname,eventtime

EVENTTIME                  OBJSCHEMA            OBJNAME    EVENTTYPE EVENTBY    EVENTSTATE
-------------------------- -------------------- ---------- --------- ---------- ----------
2013-06-16-13.59.16.096268 DEMO                 T1         COLLECT   User       start
2013-06-16-14.05.25.231595 DEMO                 T1         COLLECT   User       success

  2 record(s) selected.

请记住，统计日志是一种滚动日志，可使用 DB2_OPTSTATS_LOG 注册表变量配置 DB2 包含日志的方式。

回页首

检查正在运行的 RUNSTATS

查看运行时间较长的 RUNSTATS 的另一种方法是检查正在运行的功能，看看它是否运行了较长时间。清单 7 显示了在 RUNSTATS 仍在执行时所提交的 LIST UTILITIES SHOW DETAIL 的输出。在此示例中，用户在 6 月 16 日 13:59 对表 DEMO.T1 调用了 RUNSTATS。

清单 7. LIST UTILITIES 输出

list utilities show detail 

ID                               = 1
Type                             = RUNSTATS
Database Name                    = DB97
Partition Number                 = 0
Description                      = DEMO.T1
Start Time                       = 06/16/2013 13:59:16.356724
State                            = Executing
Invocation Type                  = User
Throttling:
   Priority                      = Unthrottled

RUNSTATS 执行信息也可从 db2pd 命令获得，这将在本文后面讨论。

从统计数据中识别碎片化的索引

您可使用索引统计数据识别可能碎片化的索引。涉及的统计数据包括 SYSCAT.INDEXES.SEQUENTIAL_PAGES 和 SYSCAT.INDEXES.NLEAF。当一个索引非常连贯时，sequential_pages 将非常接近 nleaf。当一个索引已经碎片化时，sequential_pages 将比 nleaf 小得多。让我们看看从之前的示例中收集的统计数据。

清单 8 显示，在 REORG 之前，所有索引都有 0 个 sequential_pages，而在 REORG 之后，所有索引的 sequential_pages 非常接近 nleaf。除了表大小，还可以使用这些索引统计数据来识别具有您希望处理的碎片化的索引的表。

清单 8. SEQUENTIAL_PAGES 和 NLEAF 索引统计数据

select substr(indname,1,8) as idxname, stats_time, nleaf, sequential_pages
from syscat.indexes where tabschema='DEMO' and tabname='T1'

---
-- first, before REORG ...
---

IDXNAME  STATS_TIME                 NLEAF                SEQUENTIAL_PAGES
-------- -------------------------- -------------------- --------------------
T1I1     2013-06-16-14.05.25.080000                41841                    0
T1I2     2013-06-16-14.05.25.080000                25245                    0
T1I3     2013-06-16-14.05.25.080000                24894                    0
T1I62    2013-06-16-14.05.25.080000                31138                    0
T1I765   2013-06-16-14.05.25.080000                81178                    0

  5 record(s) selected.

---
-- next, after REORG ...
---

select substr(indname,1,8) as idxname, stats_time, nleaf, sequential_pages
from syscat.indexes where tabschema='DEMO' and tabname='T1'

IDXNAME  STATS_TIME                 NLEAF                SEQUENTIAL_PAGES
-------- -------------------------- -------------------- --------------------
T1I1     2013-06-17-23.01.03.320000                41667                41666
T1I2     2013-06-17-23.01.03.320000                22475                22474
T1I3     2013-06-17-23.01.03.320000                22473                22472
T1I62    2013-06-17-23.01.03.320000                22817                22816
T1I765   2013-06-17-23.01.03.320000                64103                64102

  5 record(s) selected.

回页首

确认较长的 RUNSTATS 运行时间源于索引统计数据的收集

识别具有碎片化的索引的表后，您可能希望确认较长的运行时间归咎于索引统计数据收集。db2pd monitor 命令的 -runstats 选项可为您提供帮助。它提供了 RUNSTATS 的不同阶段的持续时间，包括收集表统计数据的时间，以及收集最多 4 种索引的索引统计数据的时间。当一个表有超过 4 种索引时，可在 RUNSTATS 执行期间反复（如果有必要）运行 db2pd，以采集第一种索引的收集时间。在 RUNSTATS 完成后运行 db2pd 将会显示所处理的最后 4 种索引的收集时间。让我们再次查看之前的示例中的 db2pd 数据。

清单 9 显示了 RUNSTATS 完成后的 db2pd -runstats 输出。

清单 9. db2pd -runstats 输出

DB Partition 0 - Database DB97 - Active - Up 0 days 00:07:44 - Date 06/16/2013 14:05:55

Table Runstats Information:

Retrieval Time: 06/16/2013 14:05:55
TbspaceID: 2        TableID: 4
Schema: DEMO      TableName: T1
Status: Completed     Access: Allow write
Sampling: No          Sampling Rate: -
Start Time: 06/16/2013 13:59:16   End Time: 06/16/2013 13:59:25
Total Duration: 00:00:09
Cur Count: 0                      Max Count: 0

Index Runstats Information:

Retrieval Time: 06/16/2013 14:05:55
TbspaceID: 2        TableID: 4
Schema: DEMO      TableName: T1
Status: Completed     Access: Allow write
Start Time: 06/16/2013 13:59:25   End Time: 06/16/2013 14:05:25
Total Duration: 00:05:59
Prev Index Duration [1]: 00:00:20
Prev Index Duration [2]: 00:00:49
Prev Index Duration [3]: 00:00:11
Cur Index Start: 06/16/2013 14:02:31
Cur Index: 5            Max Index: 5            Index ID: 5
Cur Count: 0                      Max Count: 0

输出的第一部分 Table Runstats Information 显示了表统计数据收集的信息。仅用了 9 秒（13:59:16 到 13:59:25）来收集表统计数据。输出的第二部分 Index Runstats Information 表明索引 iid5 的统计数据收集用了 2 分 54 秒（14:02:31 到 14:05:25）。Prev Index Duration 区域表明 iid4、iid3、iid2 分别用了 20、49 和 11 秒。表和索引节都显示了 Completed 状态。

这个表有 5 种索引。在 RUNSTATS 正在执行时收集的 db2pd -runstats 输出可显示第一种索引的收集时间。清单 10 是在 RUNSTATS 处理 iid4 时收集的。

清单 10. 执行期间的 db2pd -runstats 输出

DB Partition 0 - Database DB97 - Active - Up 0 days 00:04:14 - Date 06/16/2013 14:02:25

Table Runstats Information:

Retrieval Time: 06/16/2013 14:02:25
TbspaceID: 2        TableID: 4
Schema: DEMO      TableName: T1
Status: Completed     Access: Allow write
Sampling: No          Sampling Rate: -
Start Time: 06/16/2013 13:59:16   End Time: 06/16/2013 13:59:25
Total Duration: 00:00:09
Cur Count: 0                      Max Count: 0

Index Runstats Information:

Retrieval Time: 06/16/2013 14:02:25
TbspaceID: 2        TableID: 4
Schema: DEMO      TableName: T1
Status: In Progress   Access: Allow write
Start Time: 06/16/2013 13:59:25   End Time: 06/16/2013 14:02:11
Total Duration: 00:02:45
Prev Index Duration [1]: 00:00:49
Prev Index Duration [2]: 00:00:11
Prev Index Duration [3]: 00:01:45
Cur Index Start: 06/16/2013 14:02:11
Cur Index: 4            Max Index: 5            Index ID: 4
Cur Count: 23150                  Max Count: 47500

索引节中的状态显示为 In progress。Prev Index Duration [3] 表明第一个索引 iid1 的收集时间为 1 分 45 秒。

确认 RUNSTATS 运行时间较长是因为缺少预取

可使用 DB2 数据库监视器来确认 RUNSTATS 运行时间较长是因为缺少预取。再次声明，我们将查看之前使用的示例。清单 11 显示了执行 REORG 前后的 RUNSTATS 的快照数据。

清单 11. 应用程序快照数据

get snapshot for application AGENTID $runstatsAppId

*** before REORG ...
Snapshot timestamp                         = 06/16/2013 14:05:44.884178
Buffer pool index logical reads            = 409356
Buffer pool index physical reads           = 204435
Total buffer pool read time (milliseconds) = 685220

*** after REORG ...
Snapshot timestamp                         = 06/17/2013 23:01:17.519335
Buffer pool index logical reads            = 175391
Buffer pool index physical reads           = 140
Total buffer pool read time (milliseconds) = 157

执行 REORG 之前，较高数量的索引物理读取表明缺少预取。伴随这个较高的物理读取数量的是较长的缓冲池读取时间，这直接导致了较长的 RUNSTATS 执行时间。

相比较而言，执行索引重组后的快照数据拥有改善的性能特征。更少的逻辑读取表明要处理的页面更少，更低的物理与逻辑比率暗示在 RUNSTATS 方索引页面时，这些页面已预取到缓冲池中。这两种因素造就了演示中 12 倍的改进。

回页首

使用 REORG 消除索引碎片

除了 RUNSTATS 性能之外，碎片化的索引还可能影响一般系统性能。因为索引扫描可能在查询处理期间执行，所以消除它们的碎片可能有助于改善查询和 RUNSTATS 的执行时间。

REORG INDEX 命令可用于重新构建索引和消除碎片。清单 3 提供了该命令的一个示例。

一定要考虑的是，对象重组会消耗资源，比如临时空间。

DB2 10.1 中的预取增强

DB2 10.1 包含一种新的预取类型，称为提前读 (readahead, RA) 预取。此特性改善了访问碎片化的对象的操作的性能，减少了执行 REORG 的需求。如果在具有 RA 预取功能的 DB2 10.1 上重复本文中的演示，您会观察到使用和不使用 DB2 10.1 的 RUNSTATS 运行时间非常接近，也与 DB2 9.7 中执行 REORG 后的 RUNSTATS 时间非常接近。

有关的更多信息，请参阅参考资料一节，获取 IBM DB2 Version 10.1 信息中心中通过更高效的数据和索引预取来改善查询性能的链接。