by hellen
一、优化前的sql 及存取方案图
1. 原Sql:
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY INNER_SET.ORDER_DATE DESC) AS ROW_NO, INNER_SET.* FROM
( select order_id,order_date
from
(select order_Id, order_Type_Id, status_Id, owner_Id, order_date,created_By from Order_Header where 1=1 and order_Type_Id = 'DRPR_ORDER' order by order_date desc ) ord
where 1=1 and
(1=2 or exists
(select order_id from Order_Attribute where attr_name='ORDER_CORPORATION_ID' and (1=2 or attr_Value = '10000' or attr_Value = '11000' ) and ord.order_id=order_id )
or exists (select order_id from Order_Attribute where attr_name='BUSINESS_MODE' and attr_Value='BM_DDCG' and ord.order_id=order_id ) )
and exists(select work_effort_id from Work_Effort where workflow_Activity_Id='purchase_approve' and current_Status_Id='WF_RUNNING' and ord.order_id = source_Reference_Id)
and exists(select oi.order_id from Order_Item oi ,product prdt where 1=1 and oi.product_id=prdt.product_id and prdt.product_Name like '%海尔洗衣机%' ) ORDER BY ord.ORDER_DATE DESC ) AS INNER_SET) AS OUTER_SET WHERE OUTER_SET.ROW_NO > 0 AND OUTER_SET.ROW_NO <= 15
2.原语句sql分析:
图1
二、优化过程:
分析1:
分析:图1 WorkEffort表:使用的是TBSCAN ,未利用索引,效率低,
尝试加索引 pk_effort (workflow_Activity_Id、current_Status_Id )
处理:sql不变,对WorkEffort加索引后,sql解释如下
对比:
Work_Effort索引状况 | 查找时间 | 总成本 | 效率 |
未加索引 | 122,451.53 | 240,639.38 | 低 |
加索引 | 120,614.60 | 238,177.41 | 改善 |
分析2: ORDER BY ord.ORDER_DATE DESC
有两个地方提到,显然重复,去掉第一个ORDER BY ord.ORDER_DATE DESC
去掉重复的order by | 总成本 | 效率 |
去掉前 | 238,177.41 |
|
去掉后 | 158,733.84 | 改善 |
分析3:
分析:本查询采用select * from order_header where …, where语句在基本表order_header基础上加了大量的判断.
处理:采取减少基本表 order_header的大小。Sql改为:
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY INNER_SET.ORDER_DATE DESC) AS ROW_NO, INNER_SET.* FROM
( select order_id,order_date
from
(select ord.order_Id, order_Type_Id, ord.status_Id, owner_Id, order_date,created_By
from Order_Header ord, Work_Effort effort
where ord.order_Type_Id = 'DRPR_ORDER' and
effort.workflow_Activity_Id='purchase_approve' and current_Status_Id='WF_RUNNING'
and ord.order_id = effort.source_Reference_Id
) tmp
where 1=1 and
(1=2 or exists (select order_id from Order_Attribute where attr_name='ORDER_CORPORATION_ID' and (1=2 or attr_Value = '10000'
or attr_Value = '11000' or attr_Value = '20000' or attr_Value = '30000' or attr_Value = '31000' or attr_Value = '50000' or
attr_Value = '60000' or attr_Value = '90000' ) and tmp.order_id=order_id )
or exists (select order_id from Order_Attribute where attr_name='BUSINESS_MODE' and attr_Value='BM_DDCG' and tmp.order_id=order_id ) )
and exists(select oi.order_id from Order_Item oi ,product prdt where 1=1 and oi.product_id=prdt.product_id and prdt.product_Name like '%海尔洗衣机%' )
ORDER BY tmp.ORDER_DATE DESC ) AS INNER_SET) AS OUTER_SET WHERE OUTER_SET.ROW_NO > 0 AND OUTER_SET.ROW_NO <= 15
对比:
Order_header 处理 | 总成本 | 效率 |
原sql | 158,733.84 | 低 |
现sql | 146,423.36 | 改善 |
分析4:
分析: 分析3的sql 对order_attribute 的 条件限制有重复;
处理: 修改sql
SELECT *
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY INNER_SET.ORDER_DATE
DESC) AS ROW_NO, INNER_SET.*
FROM (
select order_id,order_date
from
(select header.order_Id, order_Type_Id, header.status_Id,
owner_Id, order_date,created_By
from Order_Header header,Work_Effort effort
where 1=1 and header.order_Type_Id = 'DRPR_ORDER' and
workflow_Activity_Id='purchase_approve' and current_Status_Id='WF_RUNNING'
and header.order_id = effort.source_Reference_Id
) ord
where 1=1 and (1=2 or exists
(select order_id
from Order_Attribute
where ((attr_name='ORDER_CORPORATION_ID' and (1=2 or
attr_Value = '10000' or attr_Value = '11000' or
attr_Value = '20000' or attr_Value = '30000' or
attr_Value = '31000' or attr_Value = '50000' or
attr_Value = '60000' or attr_Value = '90000' ))
or ( attr_name='BUSINESS_MODE' and attr_Value='BM_DDCG'
)) and ord.order_id=order_id )) and exists(select
oi.order_id
from Order_Item oi ,product prdt
where 1=1 and oi.product_id=prdt.product_id and prdt.product_Name
like '%海尔%' )
ORDER BY ord.ORDER_DATE DESC ) AS INNER_SET) AS OUTER_SET
WHERE OUTER_SET.ROW_NO > 0 AND OUTER_SET.ROW_NO <= 15
对比:
Sql修改 | 总成本 | 效率 |
原sql | 146,423.36 | 低 |
现sql | 136,035.56 | 改善 |
分析5:
提高本机的内存后 分析4中的sql执行效率为: 26,783.24
分析 IXSCAN的时间,表order_item 索引查找时间11,778异常长,远高于其他表的查找时间。因此分析表order_item。表order_item在product_id上建有索引
select oi.order_id from Order_Item oi ,product prdt
where 1=1 and oi.product_id=prdt.product_id and prdt.product_Name
like '%海尔洗衣机%'
表中有13万条记录,product表有10万条记录,在product_id也建有索引
查看order_item表的统计信息。看出该表的统计信息未更新。
处理:
1 . 察看order_item 表的统计信息
db2 => reorgchk current statistics on table db2inst1.order_item
表统计信息:
F1: 100 * OVERFLOW / CARD < 5
F2: 100 * TSIZE / ((FPAGES-1) * (TABLEPAGESIZE-76)) > 70
F3: 100 * NPAGES / FPAGES > 80
CREATOR NAME CARD OV NP FP TSIZE F 1 F 2 F 3 REORG
--------------------------------------------------------------------------------
DB2INST1 ORDER_ITEM 138064 81 7116 7116 27888928 0 97 100 ---
--------------------------------------------------------------------------------
索引统计信息:
F4: CLUSTERRATIO 或正常化的 CLUSTERFACTOR > 80
F5: 100 * (KEYS * (ISIZE+8) + (CARD-KEYS) * 4) / (NLEAF * INDEXPAGESIZE) > 50
F6: (100-PCTFREE) * (INDEXPAGESIZE-96) / (ISIZE+12) ** (NLEVELS-2) * (INDEXPAGES
IZE-96) / (KEYS * (ISIZE+8) + (CARD-KEYS) * 4) < 100
CREATOR NAME CARD LEAF LVLS ISIZE KEYS F 4 F 5 F 6 REORG
--------------------------------------------------------------------------------
表:DB2INST1.ORDER_ITEM
DB2INST1 ORDER_ITEM_HDR 138064 369 3 10 48245 97 81 53 ---
DB2INST1 ORDER_ITEM_ORTYP 138064 192 2 6 1 100 70 0 ---
DB2INST1 ORDER_ITEM_PRODUCT 138064 201 3 12 2465 43 71 101 *-*
DB2INST1 ORDER_ITEM_STTS 138064 193 2 17 26 88 69 0 ---
DB2INST1 ORDER_ITEM_SYST 138064 192 2 6 1 100 70 0 ---
DB2INST1 PK_ORDER_ITEM 138064 783 3 15 138064 97 99 16 ---
--------------------------------------------------------------------------------
CLUSTERRATIO 或正常化的 CLUSTERFACTOR(F4)将指示索引需要
REORG,该索引与基本表不在相同的序列中。当在表中定义了多个索引时,一个或多个索引
可能被标记为需要 REORG。指定 REORG 顺序的最重要索引。
2. 重新组织索引:
reorg table db2inst1.order_item index db2inst1.ORDER_ITEM_PRODUCT use tempspace1
3. 进行统计 RUNSTATS ON TABLE db2inst1.order_item AND INDEXES ALL
4.重新运行 reorgchk
索引统计信息:
F4: CLUSTERRATIO 或正常化的 CLUSTERFACTOR > 80
F5: 100 * (KEYS * (ISIZE+8) + (CARD-KEYS) * 4) / (NLEAF * INDEXPAGESIZE) > 50
F6: (100-PCTFREE) * (INDEXPAGESIZE-96) / (ISIZE+12) ** (NLEVELS-2) * (INDEXPAGES
IZE-96) / (KEYS * (ISIZE+8) + (CARD-KEYS) * 4) < 100
CREATOR NAME CARD LEAF LVLS ISIZE KEYS F 4 F 5 F 6 REORG
--------------------------------------------------------------------------------
表:DB2INST1.ORDER_ITEM
DB2INST1 ORDER_ITEM_HDR 138086 366 3 10 48266 28 81 53 *--
DB2INST1 ORDER_ITEM_ORTYP 138086 193 2 6 1 100 69 0 ---
DB2INST1 ORDER_ITEM_PRODUCT 138086 202 3 12 2465 99 71 101 --*
DB2INST1 ORDER_ITEM_STTS 138086 194 2 17 26 83 69 0 ---
DB2INST1 ORDER_ITEM_SYST 138086 193 2 6 1 100 69 0 ---
DB2INST1 PK_ORDER_ITEM 138086 726 3 15 138086 26 106 16 *--
--------------------------------------------------------------------------------
CLUSTERRATIO 或正常化的 CLUSTERFACTOR(F4)将指示索引需要
REORG,该索引与基本表不在相同的序列中。当在表中定义了多个索引时,一个或多个索引
可能被标记为需要 REORG。指定 REORG 顺序的最重要索引。
5. REBIND 所有的数据库包,以便静态 SQL 可以利用最新的系统统计信息。
db2rbind spider81 -l logfile.out -u db2inst1 -p ibmdb2
结果:
对比:
| 总成本 | 效率 |
Reorg前 | 26,783.24 |
|
Reorg后 | 48,721.15 | 更差了! |
分析:
product 原来是ixscan,现在是 tbscan. Reorg可能改变了与之相关的表的信息。
而相关的Sql 是:
select oi.order_id from Order_Item oi ,product prdt
where 1=1 and oi.product_id=prdt.product_id and prdt.product_Name
like '%海尔洗衣机%'
处理: 在product表对字段 product_name 创建索引 PRODUCT_PRDNAME 。
再分析sql
分析: 索引用到了,但是 时间 33,794.18和tbscan的34,156 相比并没有显著改善,为什么呢
运行一下: reorgchk current statistics on table db2inst1.product
表统计信息:
F1: 100 * OVERFLOW / CARD < 5
F2: 100 * TSIZE / ((FPAGES-1) * (TABLEPAGESIZE-76)) > 70
F3: 100 * NPAGES / FPAGES > 80
CREATOR NAME CARD OV NP FP TSIZE F 1 F 2 F 3 REORG
--------------------------------------------------------------------------------
DB2INST1 PRODUCT 103168 2 8688 8688 33839104 0 96 100 ---
--------------------------------------------------------------------------------
索引统计信息:
F4: CLUSTERRATIO 或正常化的 CLUSTERFACTOR > 80
F5: 100 * (KEYS * (ISIZE+8) + (CARD-KEYS) * 4) / (NLEAF * INDEXPAGESIZE) > 50
F6: (100-PCTFREE) * (INDEXPAGESIZE-96) / (ISIZE+12) ** (NLEVELS-2) * (INDEXPAGES
IZE-96) / (KEYS * (ISIZE+8) + (CARD-KEYS) * 4) < 100
CREATOR NAME CARD LEAF LVLS ISIZE KEYS F 4 F 5 F 6 REORG
--------------------------------------------------------------------------------
表:DB2INST1.PRODUCT
DB2INST1 PK_PRODUCT - - - - - - - - ---
DB2INST1 PRODUCT_PRDNAME - - - - - - - - ---
--------------------------------------------------------------------------------
CLUSTERRATIO 或正常化的 CLUSTERFACTOR(F4)将指示索引需要
REORG,该索引与基本表不在相同的序列中。当在表中定义了多个索引时,一个或多个索引
可能被标记为需要 REORG。指定 REORG 顺序的最重要索引。
结果令人吃惊,为什么索引没内容呢
再运行 RUNSTATS ON TABLE db2inst1.product AND INDEXES ALL
表统计信息:
F1: 100 * OVERFLOW / CARD < 5
F2: 100 * TSIZE / ((FPAGES-1) * (TABLEPAGESIZE-76)) > 70
F3: 100 * NPAGES / FPAGES > 80
CREATOR NAME CARD OV NP FP TSIZE F 1 F 2 F 3 REORG
--------------------------------------------------------------------------------
DB2INST1 PRODUCT 103168 2 8688 8688 33839104 0 96 100 ---
--------------------------------------------------------------------------------
索引统计信息:
F4: CLUSTERRATIO 或正常化的 CLUSTERFACTOR > 80
F5: 100 * (KEYS * (ISIZE+8) + (CARD-KEYS) * 4) / (NLEAF * INDEXPAGESIZE) > 50
F6: (100-PCTFREE) * (INDEXPAGESIZE-96) / (ISIZE+12) ** (NLEVELS-2) * (INDEXPAGES
IZE-96) / (KEYS * (ISIZE+8) + (CARD-KEYS) * 4) < 100
CREATOR NAME CARD LEAF LVLS ISIZE KEYS F 4 F 5 F 6 REORG
--------------------------------------------------------------------------------
表:DB2INST1.PRODUCT
DB2INST1 PK_PRODUCT 103168 480 3 10 103168 95 94 35 ---
DB2INST1 PRODUCT_PRDNAME 103168 271 3 23 16107 53 76 48 *--
--------------------------------------------------------------------------------
CLUSTERRATIO 或正常化的 CLUSTERFACTOR(F4)将指示索引需要
REORG,该索引与基本表不在相同的序列中。当在表中定义了多个索引时,一个或多个索引
可能被标记为需要 REORG。指定 REORG 顺序的最重要索引。
运行sql
对product建索引 | 总成本 | 效率 |
索引前 | 48,721.15 |
|
索引后 | 22,207.1 |
|
也比分析5的26,783.24 有了提高。
分析6:
分析:将分析4的sql 放到程序中运行,发现 不加商品名称 和 加商品名称 查询时居然记录数是一样的! 仔细看sql ,原来 Order_Item 和 product关联时没和order_header关联。
处理: 修改sql如下: 红字为新加
SELECT *
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY INNER_SET.ORDER_DATE
DESC) AS ROW_NO, INNER_SET.*
FROM (
select order_id,order_date
from
(select header.order_Id, order_Type_Id, header.status_Id,
owner_Id, order_date,created_By
from Order_Header header,Work_Effort effort
where 1=1 and header.order_Type_Id = 'DRPR_ORDER' and
workflow_Activity_Id='purchase_approve' and current_Status_Id='WF_RUNNING'
and header.order_id = effort.source_Reference_Id
) ord
where 1=1 and (1=2 or exists
(select order_id
from Order_Attribute
where ((attr_name='ORDER_CORPORATION_ID' and (1=2 or
attr_Value = '10000' or attr_Value = '11000' or
attr_Value = '20000' or attr_Value = '30000' or
attr_Value = '31000' or attr_Value = '50000' or
attr_Value = '60000' or attr_Value = '90000' ))
or ( attr_name='BUSINESS_MODE' and attr_Value='BM_DDCG'
)) and ord.order_id=order_id )) and exists(select
oi.order_id
from Order_Item oi ,product prdt
where 1=1 and oi.product_id=prdt.product_id and prdt.product_Name
like '%海尔%' and oi.order_id=ord.order_id)
ORDER BY ord.ORDER_DATE DESC ) AS INNER_SET) AS OUTER_SET
WHERE OUTER_SET.ROW_NO > 0 AND OUTER_SET.ROW_NO <= 15
Sql存取图:
| 总成本 | 效率 |
改动前 | 22,207.1 |
|
改动后 | 6,819.9 |
|
再分析存取图: 显然针对 product表的productName所作的索引没用上,考虑去掉productname索引 ,在运行一下sql 总成本不变。 因此去掉 productName的索引。
最终 处理方式:
1. work_effort加索引: pk_effort
2. sql 为 分析6的sql
1. 尽量减少order by、exsist 字句的使用
2. 在两个表连接列上建索引 where a=b
3. 对于where (a=b or c=d) and e=f 情况, 在 and 后的e,f表的索引起作用
4. 对表作改动时,要进行必要的reorg 和runstats
contact me:
mousehellen@hotmail.com