hive谓词下推

谓词下推

在不影响结果的情况下,尽量将过滤条件提前执行。谓词下推后,过滤条件在map端执行,减少了map端的输出,降低了数据在集群上传输的量,节约了集群的资源,也提升了任务的性能。

概念引入

Preserved Row table保留表在outer join中需要返回所有数据的表叫做保留表,即在left outer join中,左表是保留表;right outer join中右表则是保留表;在full outer join中左表和右表都要返回所有数据,则左右表都是保留表
Null Supplying table空表在outer join中对于没有匹配到的行需要用null来填充的表称为空表。在left outer join中,左表的数据全返回,对于左表在右表中无法匹配的数据的相应列用null表示,则此时右表是空表,相应的如果是right outer join的话,左表是空表。但是在full outer join中左表和右表都是Null Supplying table,因为左表和右表都会用null来填充无法匹配的数据。
During Join predicateJoin中的谓词Join中的谓词是指Join On语句中的谓词。如:a join b on a.type=1 那么a.type=1是Join中的谓词
After Join predicateJoin之后的谓词where语句中的谓词称之为Join之后的谓词

规则的逻辑描述如下:

在 join关联情况下,过滤条件无论在join中还是where中谓词下推都生效;
在full join关联情况下,过滤条件无论在join中还是where中谓词下推都不生效。

保留表的谓词写在join中不能下推
During Join predicates cannot be pushed past Preserved Row tables.
空表的谓词写在join之后不能下推
After Join predicates cannot be pushed past Null Supplying tables.

四种关联情况下谓词是否下推

Pushed or NotSQL
Pushedselect ename,dept_name from E join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
Pushedselect ename,dept_name from E join D on E.dept_id = D.dept_id where E.eid='HZ001';
Pushedselect ename,dept_name from E join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
Pushedselect ename,dept_name from E join D on E.dept_id = D.dept_id where D.dept_id='D001';
Not Pushedselect ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
Pushedselect ename,dept_name from E left outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
Pushedselect ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
Not Pushedselect ename,dept_name from E left outer join D on E.dept_id = D.dept_id where D.dept_id='D001';
Pushedselect ename,dept_name from E right outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
Not Pushedselect ename,dept_name from E right outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
Not Pushedselect ename,dept_name from E right outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
Pushedselect ename,dept_name from E right outer join D on E.dept_id = D.dept_id where D.dept_id='D001';
Not Pushedselect ename,dept_name from E full outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
Not Pushedselect ename,dept_name from E full outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
Not Pushedselect ename,dept_name from E full outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
Not Pushedselect ename,dept_name from E full outer join D on E.dept_id = D.dept_id where D.dept_id='D001';

 

 join(inner join)left outer joinright outer joinfull outer join
left tableright tableleft tableright tableleft tableright tableleft tableright table
joinPushedPushedNot PushedPushedPushedNot PushedNot PushedNot Pushed
wherePushedPushedPushedNot PushedNot PushedPushedNot PushedNot Pushed


注意!!如果在表达式中含有不确定函数,整个表达式的谓词将不会被下推,其中unix_timestamp()是未知的,在编译时无法得知,此外还有rand等不确定函数使谓词下推无效

select *
from a join b 
on a.id = b.id
where a.create_time = unix_timestamp();

参考 https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值