梁敬彬梁敬弘兄弟出品
往期回顾
SQL等价改写优化案例精选第一部【一统天下】① count(case when)的合并魔力
SQL等价改写优化案例精选第一部【一统天下】②分析函数的穿透之力
SQL等价改写优化案例精选第一部【一统天下】③ group by (case when)的聚合艺术
SQL等价改写优化案例精选第一部【一统天下】④ or 与 union all的优化交响
SQL等价改写优化案例精选第二部【精兵简政】① 莫名其妙的行列转换
在上一篇案例中,我们探讨了"莫名其妙"的行列转换如何导致SQL性能低下。本篇,我们将继续探讨"精兵简政"理念,聚焦于另一个常见的代码编写误区——无脑复制粘贴。这种做法不仅使代码冗长难维护,还会带来严重的性能问题。
案例6 代码需要如此的复制粘贴吗?
select distinct to_char(a.svcctx_id),
to_char(0),
to_char(nvl((select peer_id
from dcc_ne_config
where peer_name = a.peer),
0)),
to_char(a.priority),
to_char(nvl((select peer_id
from dcc_ne_config
where peer_name = b.peer),
0)),
to_char(b.priority),
to_char(nvl((select peer_id
from dcc_ne_config
where peer_name = c.peer),
0)),
to_char(c.priority)
from (select hopbyhop,
svcctx_id,
substr(cause,
instr(cause, 'Host = ') + 7,
instr(cause, 'Priority = ') - instr(cause, 'Host = ') - 11) peer,
substr(cause,
instr(cause, 'Priority = ') + 11,
instr(cause, 'reachable = ') -
instr(cause, 'Priority = ') - 13) priority
from dcc_sys_log
where cause like '%SC路由应答%'
and hopbyhop in (select distinct hopbyhop from dcc_sys_log)) a,
(select hopbyhop,
svcctx_id,
substr(cause,
instr(cause, 'Host = ') + 7,
instr(cause, 'Priority = ') - instr(cause, 'Host = ') - 11) peer,
substr(cause,
instr(cause, 'Priority = ') + 11,
instr(cause, 'reachable = ') -
instr(cause, 'Priority = ') - 13) priority
from dcc_sys_log
where cause like '%SC路由应答%'
and hopbyhop in (select distinct hopbyhop from dcc_sys_log)) b,
(select hopbyhop,
svcctx_id,
substr(cause,
instr(cause, 'Host = ') + 7,
instr(cause, 'Priority = ') - instr(cause, 'Host = ') - 11) peer,
substr(cause,
instr(cause, 'Priority = ') + 11,
instr(cause, 'reachable = ') -
instr(cause, 'Priority = ') - 13) priority
from dcc_sys_log
where cause like '%SC路由应答%'
and hopbyhop in (select distinct hopbyhop from dcc_sys_log)) c
where a.hopbyhop = b.hopbyhop
and a.hopbyhop = c.hopbyhop
and a.peer <> b.peer
and a.peer <> c.peer
and b.peer <> c.peer
and a.priority <> b.priority
and a.priority <> c.priority
and b.priority <> c.priority
一眼望去,这段SQL充满了复制粘贴的痕迹。同一个子查询被复制了三次,只是别名不同(a、b、c)。这种写法不仅使代码臃肿难读,更严重的是,它导致数据库需要多次扫描相同的表和执行相同的计算逻辑,极大地降低了查询性能。
这个SQL语句其实是为了实现行列转换,但是效率非常低,扫描表多次,而且最终结果也不正确。如何进行分析呢?
需求理解
首先了解该语句的需求是什么,经了解,需求如下:
DROP TABLE TEST;
CREATE TABLE TEST ( ID1 NUMBER,ID2 NUMBER,VALUE1 VARCHAR2(20),VALUE2 VARCHAR2(20));
INSERT INTO TEST VALUES (1,2,'A','B');
INSERT INTO TEST VALUES (1,2,'C','D');
INSERT INTO TEST VALUES (1,2,'E','F');
INSERT INTO TEST VALUES (1,2,'G','H');
INSERT INTO TEST VALUES (3,8,'I','J');
INSERT INTO TEST VALUES (3,8,'K','L');
INSERT INTO TEST VALUES (3,8,'M','N');
INSERT INTO TEST VALUES (8,9,'O','P');
INSERT INTO TEST VALUES (8,9,'Q','R');
INSERT INTO TEST VALUES (11,12,'S','T');
COMMIT;
SQL> SELECT * FROM TEST;
ID1 ID2 VALUE1 VALUE2
---------- ---------- -------------------- ------------------------------
1 2 A B
1 2 C D
1 2 E F
1 2 G H
3 8 I J
3 8 K L
3 8 M N
8 9 O P
8 9 Q R
11 12 S T
10 rows selected
要求为(行列转换,超过3个的只取三个,不足3个的用空格来补列)
ID1 ID2 VALUE1 VALUE2 VALUE3 VALUE4 VALUE5 VALUE6
---------- ---------- -------------------- ----------------------------------------------------------------------------------------------------------------------------
1 2 A B C D E F
3 8 I J K L M N
8 9 O P Q R NULL NULL
11 12 S T NULL NULL NULL NULL
问题分析
原SQL存在以下几个主要问题:
- 重复子查询:相同的子查询被复制三次,导致数据库必须重复执行相同的逻辑
- 自连接复杂性:使用三个相同子查询的自连接,并通过多个条件进行筛选,逻辑复杂且性能低下
- 字符串处理僵硬:使用固定偏移量进行字符串截取,缺乏灵活性
- IN子查询低效:hopbyhop in (select distinct hopbyhop from
dcc_sys_log)子句使得每个子查询都会扫描同一个表两次
优化方案
针对行列转换需求,我们可以使用更高效的方法,我们可以通过MAX+分析函数实现如下:
SELECT ID1,ID2
,MAX(DECODE(RN,1,VALUE1))
,MAX(DECODE(RN,1,VALUE2))
,MAX(DECODE(RN,2,VALUE1))
,MAX(DECODE(RN,2,VALUE2))
,MAX(DECODE(RN,3,VALUE1))
,MAX(DECODE(RN,3,VALUE2))
FROM (SELECT TEST.*, ROW_NUMBER() OVER(PARTITION BY ID1,ID2 ORDER BY VALUE1,VALUE2) RN FROM TEST) T
WHERE RN<=3
GROUP BY ID1,ID2;
这种方式利用分析函数和条件聚合实现行列转换,只需一次表扫描,代码简洁明了。对于原始SQL中的字符串处理,我们也可以采用更灵活的方式:
with data as (SELECT 'SC路由应答:Host = SR2@001.ChinaTelecom.com, Priority = 1, reachable = true' as str
,'Host = ' k1
,'Priority = ' k2
FROM DUAL)
,data2 AS (SELECT data.*,INSTR(str,k1) p1,INSTR(str,k2) p2 FROM data)
select SUBSTR(str,p1+LENGTH(k1),INSTR(str,',',p1+1)-p1-LENGTH(k1))
,SUBSTR(str,p2+LENGTH(k2),INSTR(str,',',p2+1)-p2-LENGTH(k2))
from data2;
这样通过计算相对位置来提取字符串,更加健壮可靠。
完整优化方案
将上述思路应用到原始SQL中,优化后的代码如下:
select distinct to_char(svcctx_id),
to_char(0),
to_char(nvl((select peer_id
from dcc_ne_config
where peer_name = PEER1),
0)),
to_char(priority1),
to_char(nvl((select peer_id
from dcc_ne_config
where peer_name =PEER2),
0)),
to_char(priority2),
to_char(nvl((select peer_id
from dcc_ne_config
where peer_name = PEER3),
0)),
to_char(priority3)
from
(with data as (select hopbyhop,
svcctx_id,
cause as str,
'Host = ' k1,
'Priority = ' k2
from dcc_sys_log where cause like '%SC路由应答%')
,data2 as (select data.*,instr(str,k1) p1, instr(str,k2) p2 from data)
,data3 as
(select hopbyhop,
svcctx_id,
SUBSTR(str,p1+LENGTH(k1),INSTR(str,',',p1+1)-p1-LENGTH(k1)) peer
,SUBSTR(str,p2+LENGTH(k2),INSTR(str,',',p2+1)-p2-LENGTH(k2)) PRIORITY
from data2)
SELECT hopbyhop,svcctx_id,
MAX(DECODE(RN,1,PEER)) PEER1
,MAX(DECODE(RN,1,PRIORITY)) PRIORITY1
,MAX(DECODE(RN,2,PEER)) PEER2
,MAX(DECODE(RN,2,PRIORITY)) PRIORITY2
,MAX(DECODE(RN,3,PEER)) PEER3
,MAX(DECODE(RN,3,PRIORITY)) PRIORITY3
FROM (SELECT data3.*, ROW_NUMBER() OVER(PARTITION BY hopbyhop,svcctx_id ORDER BY PEER,PRIORITY) RN FROM data3)
WHERE RN<=3
GROUP BY hopbyhop,svcctx_id) t2
优化后的SQL具有以下显著优势:
- 只扫描dcc_sys_log表一次,而不是原来的三次
- 使用CTE(WITH子句)清晰地组织代码逻辑,提高可读性
- 通过分析函数和条件聚合优雅地实现行列转换
- 采用相对位置计算进行字符串处理,提高代码健壮性
- 消除了复杂的自连接条件和多余的过滤条件
总结:告别"无脑复制粘贴"的习惯
本案例揭示了SQL开发中一个常见但危险的习惯——通过复制粘贴来处理看似类似的逻辑。这种方式看似方便,实则带来了性能负担和维护困难。在"精兵简政"的理念下,我们应当始终寻求更优雅、更高效的解决方案。
使用现代SQL特性如公共表表达式(CTE)、分析函数和条件聚合,能够大幅简化代码并提升性能。这不仅仅是技术上的改进,更是思维方式的转变——从机械地堆砌代码转向优雅地解决问题。
在日常SQL开发中,当你发现自己正在复制粘贴代码块时,请停下来思考:是否存在更简洁、更高效的方式来实现相同功能?这种思考习惯的养成,正是从"无脑复制粘贴"向"精兵简政"迈进的关键一步。
未完待续…
SQL等价改写优化案例精选第二部【精兵简政】③ 有困难先将需求最小化