前言
SQL优化的终极境界是回归需求本质。理解需求,正是高效SQL的起点,而改写SQL往往是优化的关键手段。
本系列暂时不对执行计划做过多的分析,虽然它在SQL优化中至关重要。原因很简单:如果不能深入理解业务需求,即使精通执行计划分析,面对那些"九曲十八弯"的复杂SQL,我们依然会束手无策。
文中案例的SQL语句全部来自真实的生产系统,SQL语句大多非常繁琐,希望大家有空的时候能克服烦躁,耐心阅读完。我在审核这样的SQL时不但要看懂这么繁琐的语句,还要完成高效的等价改写来调优,这需要读懂开发人员代码编写的需求背后的真正含义。
低效语句往往在开发投入生产初期由于数据量少而不能被识别出来,等运行了一段时间数据量大涨后才暴露出来。根据经验,生产中和数据库相关的80%的故障都来自不良SQL,因此今天根据审核SQL的心得,总结为如下三部分,共计12个经典案例,希望对大家有用。
一.统一天下(合并代码,减少表扫描)
合理整合SQL逻辑,一次查询解决多个需求,让数据库少跑路。
案例1 注重COUNT(CASE WHEN)的合并魔力
看到这个SQL,你可能第一反应是"天啊,这么长!":
select distinct ne_state.peer_id peer_name,
to_char(ne_state.ne_state) peer_state,
(case
when ne_state.ne_state = 0 then
to_char(0)
else
(select distinct to_char(nvl(ne_active.active, 0))
from dcc_sys_log,
(select peer_id,
decode(action,
'active',
1,
'de-active',
0,
0) active,
max(log_time)
from dcc_sys_log
where action = 'active'
or action = 'de-active'
group by (peer_id, action)) ne_active
where dcc_sys_log.peer_id = ne_active.peer_id(+)
and dcc_sys_log.peer_id = ne_state.peer_id)
end) peer_active,
(case
when ne_state.ne_state = 0 then
to_char(0)
else
(to_char(nvl((select count(*)
from dcc_ne_log
where dcc_ne_log.result <> 1
and peer_id = ne_state.peer_id
and log_time between
trunc(sysdate) and sysdate
group by (peer_id)),
0)))
end) err_cnt,
(case
when ne_state.ne_state = 0 then
to_char(0)
else
(to_char(nvl((select count(*)
from dcc_ne_log in_dnl
where in_dnl.direction = 'recv'
and in_dnl.peer_id =
ne_state.peer_id
and log_time between
trunc(sysdate) and sysdate),
0)))
end) recv_cnt,
(case
when ne_state.ne_state = 0 then
to_char(0)
else
(to_char(nvl((select sum(length)
from dcc_ne_log in_dnl
where in_dnl.direction = 'recv'
and in_dnl.peer_id =
ne_state.peer_id
and log_time between
trunc(sysdate) and sysdate),
0)))
end) recv_byte,
(case
when ne_state.ne_state = 0 then
to_char(0)
else
(to_char(nvl((select count(*)
from dcc_ne_log in_dnl
where in_dnl.direction = 'send'
and in_dnl.peer_id =
ne_state.peer_id
and log_time between
trunc(sysdate) and sysdate),
0)))
end) send_cnt,
(case
when ne_state.ne_state = 0 then
to_char(0)
else
(to_char(nvl((select sum(length)
from dcc_ne_log in_dnl
where in_dnl.direction = 'send'
and in_dnl.peer_id =
ne_state.peer_id
and log_time between
trunc(sysdate) and sysdate),
0)))
end) send_byte
from dcc_ne_log,
(select distinct dsl1.peer_id peer_id,
nvl(ne_disconnect_info.ne_state, 1) ne_state
from dcc_sys_log dsl1,
(select distinct dnl.peer_id peer_id,
decode(action,
'disconnect',
0,
'connect',
0,
1) ne_state
from dcc_sys_log dsl, dcc_ne_log dnl
where dsl.peer_id = dnl.peer_id
and ((dsl.action = 'disconnect' and
dsl.cause = '关闭对端') or
(dsl.action = 'connect' and
dsl.cause = '连接主机失败'))
and dsl.log_time =
(select max(log_time)
from dcc_sys_log
where peer_id = dnl.peer_id
and log_type = '对端交互')) ne_disconnect_info
where dsl1.peer_id = ne_disconnect_info.peer_id(+)) ne_state
where ne_state.peer_id = dcc_ne_log.peer_id(+)
问题诊断:
- 一眼看去,代码冗长复杂
- 多次重复扫描同一张表(dcc_ne_log),仅条件不同
- 存在多个子查询和复杂嵌套
- 执行时间:超过300秒!
优化思路:寻找合并机会
分析发现:err_cnt、recv_cnt、recv_byte、send_cnt、send_byte这5个字段全部来自同一张表,只是过滤条件不同。这是典型的"可合并查询"场景!
魔法武器:COUNT(CASE WHEN)和SUM(CASE WHEN)
select peer_id
,COUNT(CASE WHEN RESULT <> 1 THEN 1 END) err_cnt
,COUNT(CASE WHEN direction = 'recv' THEN 1 END) recv_cnt
,SUM(CASE WHEN direction = 'recv' THEN length END) recv_byte
,COUNT(CASE WHEN direction = 'send' THEN 1 END) send_cnt
,SUM(CASE WHEN direction = 'send' THEN length END) send_byte
from dcc_ne_log
where log_time >=trunc(sysdate)
GROUP BY peer_id
进一步分析如下ne_disconnect_info结果集的写法:
select distinct dsl1.peer_id peer_id,
nvl(ne_disconnect_info.ne_state, 1) ne_state
from dcc_sys_log dsl1,
(select distinct dnl.peer_id peer_id,
decode(action,
'disconnect',
0,
'connect',
0,
1) ne_state
from dcc_sys_log dsl, dcc_ne_log dnl
where dsl.peer_id = dnl.peer_id
and ((dsl.action = 'disconnect' and
dsl.cause = '关闭对端') or
(dsl.action = 'connect' and
dsl.cause = '连接主机失败'))
and log_type = '对端交互'
and dsl.log_time =
(select max(log_time)
from dcc_sys_log
where peer_id = dnl.peer_id
and log_type = '对端交互')) ne_disconnect_info
where dsl1.peer_id = ne_disconnect_info.peer_id(+)
为了清晰的将结构展现出来,将上述两次改造的脚本用WITH子句分别封装为ne_state和dcc_ne_log_time,最终整体SQL语句改写完毕后,代码量大大减少的同时性能极大的提升了,表扫描次数从总计7次缩减为2次,执行完成时间从原先的300秒缩短为5秒。完整改造后的最终SQL代码优雅精致,具体如下:
with ne_state as
(SELECT a.peer_id,
CASE WHEN dnl.peer_id IS NOT NULL AND str IN ('disconnect关闭对端','connect连接主机失败') THEN '0' ELSE '1' END ne_state
FROM (SELECT peer_id,MIN(action||cause) KEEP(DENSE_RANK LAST ORDER BY log_time) str
FROM dcc_sys_log dsl
WHERE log_type = '对端交互'
GROUP BY peer_id
) a,(SELECT DISTINCT peer_id FROM dcc_ne_log) dnl
WHERE a.peer_id = dnl.peer_id(+)),
dcc_ne_log_time as (select peer_id
,COUNT(CASE WHEN RESULT <> 1 THEN 1 END) err_cnt
,COUNT(CASE WHEN direction = 'recv' THEN 1 END) recv_cnt
,SUM(CASE WHEN direction = 'recv' THEN length END) recv_byte
,COUNT(CASE WHEN direction = 'send' THEN 1 END) send_cnt
,SUM(CASE WHEN direction = 'send' THEN length END) send_byte
from dcc_ne_log
where log_time >=trunc(sysdate) ---- between trunc(sysdate) and sysdate
GROUP BY peer_id)
select distinct ne_state.peer_id peer_name,
to_char(ne_state.ne_state) peer_state,
(case
when ne_state.ne_state = 0 then
to_char(0)
else
NVL((select '1' from dcc_sys_log where peer_id = ne_state.peer_id and action = 'active' and rownum=1),'0')
end) peer_active,
decode(ne_state.ne_state,0,'0',nvl(dnlt.ERR_CNT,0)) ERR_CNT, ---注意NVL改造
decode(ne_state.ne_state,0,'0',nvl(dnlt.recv_cnt,0)) recv_cnt,
decode(ne_state.ne_state,0,'0',nvl(dnlt.recv_byte,0)) recv_byte,
decode(ne_state.ne_state,0,'0',nvl(dnlt.send_cnt,0)) send_cnt,
decode(ne_state.ne_state,0,'0',nvl(dnlt.send_byte,0)) send_byte
from ne_state ,dcc_ne_log_time dnlt
where ne_state.peer_id=dnlt.peer_id(+)
关键启示
本案例完美诠释了"统一天下"的优化理念。当代码逻辑散落在多个相似的子查询中时,我们可以通过巧妙的SQL技巧(如条件聚合)将它们合并,减少数据访问次数,实现性能的质的提升。
优化SQL前,不要急于分析执行计划细节,而是先退一步看全局:这段SQL到底想完成什么?有没有更简洁的表达方式?回归需求本质,往往能找到突破性的优化思路。
未完待续…
SQL等价改写优化案例精选第一部【一统天下】②分析函数的穿透之力