在工作中我们经常需要加工表,可能join很多张表然后将最后的结果插入到一个新表里面,如果查询带where语句那么我们可以进行如下优化:
- 例如有如下伪代码,将查询的结果插入APP_NEW表中:
insert overwrite table APP_NEW
select
a.x1,
a.x2,
b.x3,
b.x4,
b.x5,
c.x6,
c.x7,
c.x8,
d.x9
from
a
left outer join b on a.x2 = b.xx1
left outer join b on a.x2 = c.x6
left outer join b on c.x6 = d.xx2
where
b.xx3 > 80
- 我们对它进行如下优化操作,先过滤where条件然后插入元表,没有的字段以NULL代替,然后再次join其他表:
--先进行where操作:
insert overwrite table APP_NEW
select
a.x1,
a.x2,
NULL,
NULL,
NULL,
c.x6,
c.x7,
c.x8,
NULL
from
a
left outer join b on a.x2 = c.x6
where
b.xx3 > 80
--然后再进行下面操作:
insert overwrite table APP_NEW
select
a1.x1,
a1.x2,
b.x3,
b.x4,
b.x5,
a1.x6,
a1.x7,
a1.x8,
d.x9
from
APP_NEW a1
left outer join b on a1.x2 = b.xx1
left outer join b on a1.x6 = d.xx2
注意:
有人可能会按照如下去优化,但是把筛选条件写在join on上面会有问题。(这里可以参考http://blog.csdn.net/qq_20641565/article/details/52950087)
insert overwrite table APP_NEW
select
a.x1,
a.x2,
b.x3,
b.x4,
b.x5,
c.x6,
c.x7,
c.x8,
d.x9
from
a
left outer join b on a.x2 = c.x6 and b.xx3 > 80
left outer join b on a.x2 = b.xx1
left outer join b on c.x6 = d.xx2
- 结论
主要原则就是先进行where筛选。