SQL调优一则。关键字:索引使用效率 谓词筛选率

前两天遇到的SQL调优问题。

关键字:索引使用效率 谓词筛选率

[@more@]

问题描述:

SQL执行时间不能忍受,2小时以上未出结果。

环境:

SQL> select * from v$version;

BANNER

----------------------------------------------------------------

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

PL/SQL Release 10.2.0.4.0 - Production

CORE 10.2.0.4.0 Production

TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio

NLSRTL Version 10.2.0.4.0 – Production

SQL

select count(distinct(case

when c.sumdutypaid <> 0 then

c.claimno

else

null

end)),

count(distinct(case

when c.sumdutypaid = 0 then

c.claimno

else

null

end)),

sum(c.sumdutypaid)

FROM (SELECT claimno, sumdutypaid

FROM CCIC.PRPLCOMPENSATE

WHERE SUBSTR(FLAG, 4, 1) = '0'

AND underwriteflag in ('1', '3')

and (SUBSTR(flag, 6, 1) <> 'p' or SUBSTR(flag, 6, 1) is null)

AND compensatetype <> '3'

AND compensateno like 'CDD%') c,

(SELECT claimno

FROM ccic.prplclaimtoplatformsuccess

WHERE uploaddate between

to_date('2011-06-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and

to_date('2011-06-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS')

AND requesttype = '82'

AND comflag = 'J') s

WHERE s.claimno = c.claimno;

基础数据:

CCIC.PRPLCOMPENSATE数据量1000W+4G多。

ccic.prplclaimtoplatformsuccess较小,不计。

处理过程:

1.首先查执行计划

SQL> set autotrace traceonly

SQL> select * from dual;

Execution Plan

----------------------------------------------------------

ERROR:

ORA-00604: error occurred at recursive SQL level 2

ORA-16000: database open for read-only access

SP2-0612: Error generating AUTOTRACE EXPLAIN report

Statistics

----------------------------------------------------------

24 recursive calls

0 db block gets

8 consistent gets

4 physical reads

0 redo size

511 bytes sent via SQL*Net to client

492 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

报错。该库是将一个生产的DG库启动为READ_ONLY模式供查询使用。由于不能做写操作,set autotraceexplain plan for then select * from table(dbms_xplan.display)的方式都获取不到执行计划,因为这两种方法都是将执行计划写入一个表中,再从该表读出展示(个人分析应该是这个原因)。故使用sql_trace生成后台trace来获取执行计划,具体操作略。

以下是执行计划,第一个是完整sql,第二个是s子表,第三个是c子表,执行过程的统计信息(parse等)对于一、三无意义(因为时间太长没有执行完,手动中止了)。

*******************************************************************************

select count (distinct (case when c.sumdutypaid <>0 then c.claimno else null end)),

count (distinct (case when c.sumdutypaid = 0 then c.claimno else null end)),

sum(c.sumdutypaid)

FROM (SELECT claimno,sumdutypaid FROM CCIC.PRPLCOMPENSATE

WHERE SUBSTR(FLAG, 4, 1) = '0'

AND underwriteflag in ('1', '3')

and (SUBSTR(flag, 6, 1) <> 'p' or SUBSTR(flag, 6, 1) is null)

AND compensatetype <> '3'

AND compensateno like 'CDD%') C,

(SELECT claimno FROM ccic.prplclaimtoplatformsuccess

WHERE uploaddate between

to_date('2011-06-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and

to_date('2011-06-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS')

AND requesttype = '82'

AND comflag = 'J' )s

WHERE s.claimno = c.claimno

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 1 0.11 6.59 890 1168 0 0

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 3 0.11 6.59 890 1168 0 0

Misses in library cache during parse: 0

Optimizer mode: ALL_ROWS

Parsing user id: SYS

Rows Row Source Operation

------- ---------------------------------------------------

0 SORT GROUP BY (cr=0 pr=0 pw=0 time=152 us)

0 HASH JOIN (cr=0 pr=0 pw=0 time=40 us)

0 TABLE ACCESS BY INDEX ROWID PRPLCOMPENSATE (cr=0 pr=0 pw=0 time=32 us)

1218 INDEX RANGE SCAN PK_LCOMPENSATE (cr=15 pr=14 pw=0 time=38117 us)(object id 50498)

0 TABLE ACCESS FULL PRPLCLAIMTOPLATFORMSUCCESS (cr=0 pr=0 pw=0 time=0 us)

*******************************************************************************

SELECT count(1) FROM ccic.prplclaimtoplatformsuccess

WHERE uploaddate between

to_date('2011-06-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and

to_date('2011-06-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS')

AND requesttype = '82'

AND comflag = 'J'

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 1.86 15.61 35214 66794 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 1.86 15.61 35214 66794 0 1

Misses in library cache during parse: 1

Optimizer mode: ALL_ROWS

Parsing user id: SYS

Rows Row Source Operation

------- ---------------------------------------------------

1 SORT AGGREGATE (cr=66794 pr=35214 pw=0 time=15611750 us)

2876 TABLE ACCESS FULL PRPLCLAIMTOPLATFORMSUCCESS (cr=66794 pr=35214 pw=0 time=14404921 us)

*******************************************************************************

SELECT count(1) FROM CCIC.PRPLCOMPENSATE

WHERE SUBSTR(FLAG, 4, 1) = '0'

AND underwriteflag in ('1', '3')

and (SUBSTR(flag, 6, 1) <> 'p' or SUBSTR(flag, 6, 1) is null)

AND compensatetype <> '3'

AND compensateno like 'CDD%'

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 1 8.14 190.72 55941 358126 0 0

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 3 8.14 190.73 55941 358126 0 0

Misses in library cache during parse: 1

Optimizer mode: ALL_ROWS

Parsing user id: SYS

Rows Row Source Operation

------- ---------------------------------------------------

0 SORT AGGREGATE (cr=0 pr=0 pw=0 time=64 us)

0 TABLE ACCESS BY INDEX ROWID PRPLCOMPENSATE (cr=0 pr=0 pw=0 time=38 us)

362388 INDEX RANGE SCAN PK_LCOMPENSATE (cr=2896 pr=2864 pw=0 time=20315365 us)(object id 50498)

*******************************************************************************

2.初步判断:

·c子表返回结果集可能太大,执行较长时间没有执行完,这个子查询的消耗比较大。

·s子表返回2000多行,这个很小,也很快,不会是性能瓶颈。

·cs等值连接时,只要s作驱动表,无论NL还是HashJoin都是效率比较好的。不过目前是c作驱动表,而c又比较大,故效率很差。

调整s作驱动表,sql及执行计划如下:

*******************************************************************************

select/*+ ORDERED */ count (distinct (case when c.sumdutypaid <>0 then c.claimno else null end)),

count (distinct (case when c.sumdutypaid = 0 then c.claimno else null end)),

sum(c.sumdutypaid)

FROM (SELECT claimno FROM ccic.prplclaimtoplatformsuccess

WHERE uploaddate between

to_date('2011-06-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and

to_date('2011-06-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS')

AND requesttype = '82'

AND comflag = 'J' )s,

(SELECT claimno,sumdutypaid FROM CCIC.PRPLCOMPENSATE

WHERE SUBSTR(FLAG, 4, 1) = '0'

AND underwriteflag in ('1', '3')

and (SUBSTR(flag, 6, 1) <> 'p' or SUBSTR(flag, 6, 1) is null)

AND compensatetype <> '3'

AND compensateno like 'CDD%') c

WHERE s.claimno = c.claimno

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.11 0.40 27 149 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 1 0.18 2.33 3711 5807 0 0

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 3 0.29 2.73 3738 5956 0 0

Misses in library cache during parse: 1

Optimizer mode: ALL_ROWS

Parsing user id: SYS

Rows Row Source Operation

------- ---------------------------------------------------

0 SORT GROUP BY (cr=0 pr=0 pw=0 time=38 us)

0 HASH JOIN (cr=0 pr=0 pw=0 time=11 us)

0 TABLE ACCESS FULL PRPLCLAIMTOPLATFORMSUCCESS (cr=0 pr=0 pw=0 time=5 us)

0 TABLE ACCESS BY INDEX ROWID PRPLCOMPENSATE (cr=0 pr=0 pw=0 time=0 us)

0 INDEX RANGE SCAN PK_LCOMPENSATE (cr=0 pr=0 pw=0 time=0 us)(object id 50498)

*******************************************************************************

但是执行的仍然很慢。

3.进一步判断:

c子查询的效率低是主要因素。重点分析c子查询。

c子查询的数据量跑出,为90W+记录,数据量不算大,但是跑的时间很长。

c子查询中,对PK的索引进行index range scan,即使首先使用条件(compensateno like 'CDD%'),选出数据后再使用其它条件过滤,经查询,符合条件(compensateno like 'CDD%')的记录有将近1KW条,

SQL>SELECT count(1) FROM CCIC.PRPLCOMPENSATE WHERE compensateno like 'CDD%';

COUNT(1)

-------------

9763692

根据统计信息中该标的总记录数为1KW多条(SQL如下),那么使用全表扫描要优于使用索引(结果集占整个表记录的比例很高)

SQL>select TABLE_NAME,OWNER,LAST_ANALYZED,NUM_ROWS from dba_tables where OWNER='CCIC' and TABLE_NAME='PRPLCOMPENSATE';

但是这张表有4000MSQL如下),全表扫描的消耗也会很大

select 'PRPLCOMPENSATE' segment_name,

'CCIC' owner,

sum(bytes) / 1024 / 1024 || 'M' capacity

from dba_segments

where segment_name = 'PRPLCOMPENSATE'

and owner = 'CCIC';

4.最后的判断:

最根本的原因:对PK的使用不好(不是PK不好,而是本sqlPK的使用不好),对PK的使用条件(compensateno like 'CDD%')不能筛选掉大多数数据,c的结果应该是90w记录,而使用PK筛选得到900W记录。这与我们使用索引的初衷是不相符的(我们希望使用索引能够尽可能多的过滤掉数据,这样才够快)。

另外一个佐证,执行以下sql

SELECT count(1) FROM CCIC.PRPLCOMPENSATE WHERE compensateno like 'C8%';

形式跟SELECT count(1) FROM CCIC.PRPLCOMPENSATE WHERE compensateno like 'CDD%';一样,但是条件的筛选度高,速度1-2s

10:32:56 SQL>SELECT count(1) FROM CCIC.PRPLCOMPENSATE WHERE compensateno like 'C8%';

COUNT(1)

-------------

9763692

10:32:58 SQL>SELECT count(1) FROM CCIC.PRPLCOMPENSATE WHERE compensateno like 'C8%';

COUNT(1)

-------------

9763692

10:32:59 SQL>

5.优化建议:

因为最主要的原因是使用索引的条件筛选率低,所以解决方法也从这里入手。

1-提高使用PK的效率。进一步挖掘业务逻辑的关联性,从而改变对PK的使用。比如c子查询中还有条件SUBSTR(FLAG, 4, 1) = '0',那么这个条件在业务含义上是否能进一步精确对PK的使用?比如根据业务逻辑凡是满足SUBSTR(FLAG, 4, 1) = '0'的纪录,compensateno都是以CCDXXXXX开头,那么相应可以改写compensateno like 'CDD%' compensateno like 'CDDXXXXX%',这样可能对PK的使用更精确,能够过滤更多的数据。

2-谓词条件不变的情况下使用新的更高效的索引。如使用组合索引,跟据c子查询中的若干条件确定一个组合索引,使用该组合索引提高筛选率。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/21129591/viewspace-1052095/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/21129591/viewspace-1052095/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值