前一段时间,同学写sql碰到一个问题:两段sql连接看似一样,但是条数却不一样。原因是在左连接之后的where条件对数据进行了筛选,使得数据变少了。两段sql如下:
select count(1)
from (select *
from KXAPP.I_CASH_LOAN_WHITELIST_C2_V_NEW b
where b.BG_DT < '20180820'
and b.EN_DT >= '20180904') gh
left join (select *
from KXAPP.I_CASH_LOAN_WISH_SCORE
where busi_dt = '20180820') hg
on gh.loan_no = hg.loan_no
select count(1)
from KXAPP.I_CASH_LOAN_WHITELIST_C2_V_NEW b
left join KXAPP.I_CASH_LOAN_WISH_SCORE c
on b.loan_no = c.loan_no
where b.BG_DT < '20180820'
and b.EN_DT >= '20180904'
and busi_dt = '20180820') hg
乍一看,两段sql好像没有什么区别。第一个是先通过条件取子表,再链接;第二个是先进行链接,在通过条件进行筛选。统计后,发现上边的数据量要比下面的多。
其实这两个链接是有区别的。第一种链接方式,主表是gh;第二种链接方式,主表是b。第二种链接方式数据量少于第一种链接方式的原因是:链接后添加了where对链接结果进行了筛选。
下面通过一个例子说明:
表A结构如下:
表B结构如下:
select *
from (select 1 as a, 2 as b
from dual
union all
select 2 as a, 2 as b from dual) a
left join (select 3 as a, 2 as b
from dual
union all
select 2 as a, 2 as b from dual) b
on a.a = b.a
and b.a = 1;
select *
from (select 1 as a, 2 as b
from dual
union all
select 2 as a, 2 as b from dual) a
left join (select 3 as a, 2 as b
from dual
union all
select 2 as a, 2 as b from dual) b
on a.a = b.a
where b.a = 1;