对于有一定数据量的表,使用 IN 和Not IN往往效率很低
在项目中遇到这样一个问题:
原来表中有7000多条数据,日常查询、跑应用都很流畅,而当数据量增加到3w条左右时,运行速度慢到无法忍受。
经过简单排查,将问题定位在了一条使用 Not IN 进行查询的sql语句上:
SELECT * FROM tb_case WHERE SYMBOL!= 9 AND
CASENO NOT IN
(SELECT CASENO FROM tb_carout WHERE STATUS=25 AND SYMBOL!=9)
把这条语句拎出来单独执行,足足跑了一分多钟。。
该语句之所以效率低下,很大一部分原因是Not IN不走索引。
解决办法很简单——避免使用Not IN
1.使用Not Exists
SELECT COUNT(*) FROM tb_case a WHERE SYMBOL!= 9 and
NOT exists
(SELECT CASENO FROM tb_carout b WHERE a.caseno=b.caseno and STATUS=25 AND SYMBOL!=9)
2.使用Left Join
SELECT * FROM tb_case a
left join tb_carout b on a.caseno=b.caseno and b.STATUS=25 and b.SYMBOL!=9
where b.caseno is null and a.symbol!=9
提升效果很明显,执行时间不足0.1s
- IN和Exists的区别
简单来说,前者是非相关子查询,子查询先执行,且只执行一次,执行完毕后将值传递给外层查询;后者是相关子查询,将外层查询的一个元组传递给内层查询,然后执行内层查询,外层查询根据返回的结果集得到满足条件的记录,重复这个过程直到外层查询的所有元组都处理完毕。
从这个过程来看,非相关子查询比相关子查询效率高。
详细过程参考SQL相关子查询与非相关子查询
最后记录下多个表Left Join的写法
SELECT TB_CASE.CASENO AS CASENO,TB_CASE.LAWGROUP AS LAWGROUP,S1.ROLE_NAME AS LGNAME,TB_CASE.CARPARK AS CARPARK,
S2.ROLE_NAME AS CARPARKNAME,TB_CASE.LAWTIME AS LAWTIME,TB_CASE.CARTYPE AS CARTYPE,TB_CASE.CARNO AS CARNO,
TB_CASE.STATUS AS STATUS,TB_CASE.IS_CLONE AS IS_CLONE,TB_CASE.BCFS AS BCFS
FROM
TB_CASE,SYS_ROLE S1, SYS_ROLE S2
WHERE
((NOT(TB_CASE.CASENO IN (SELECT TB_CAROUT.CASENO FROM TB_CAROUT WHERE ((TB_CAROUT.STATUS = '25') AND (TB_CAROUT.SYMBOL <> 9)))))
AND (TB_CASE.LAWGROUP = S1.ROLE_ID) AND (TB_CASE.CARPARK = S2.ROLE_ID))
;
Left Join写法:
SELECT T1.CASENO AS CASENO,T1.LAWGROUP AS LAWGROUP,S1.ROLE_NAME AS LGNAME,T1.CARPARK AS CARPARK,S2.ROLE_NAME AS CARPARKNAME,
T1.LAWTIME AS LAWTIME,T1.CARTYPE AS CARTYPE,T1.CARNO AS CARNO,T1.STATUS AS STATUS,T1.IS_CLONE AS IS_CLONE,
T1.BCFS AS BCFS FROM TB_CASE T1
left join TB_CAROUT T2 on T1.CASENO=T2.CASENO and T2.STATUS='25' AND T2.SYMBOL!=9
left join SYS_ROLE S1 ON T1.LAWGROUP=S1.ROLE_ID
left join SYS_ROLE S2 ON T1.CARPARK=S2.ROLE_ID
where T2.CASENO IS NULL;