hive left outer join where 条件问题


select count(1) from s_ods_trade where part ='2012-10-31';
22076
select count(1) from s_ods_trade
104343

select count(1) from s_ods_trade_full where part ='2012-10-31';
11456
select count(1) from s_ods_trade_full
53049

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid and a.part='2012-10-31' and b.part='2012-10-31');
104343

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid and a.part=b.part and a.part='2012-10-31');
104343

SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid ) where a.part='2012-10-31' and b.part='2012-10-31';
11456
SELECT count(1) FROM s_ods_trade a left outer JOIN s_ods_trade_full b ON (a.dp_id = b.dp_id AND a.tid = b.tid and a.part=b.part) where a.part='2012-10-31';
22076

倒数第二个第三个sql很是不解。
最后一个sql效果相同语句,效率会高一些:
SELECT count(1) FROM (select * from s_ods_trade where part ='2012-10-31') a left outer JOIN (select * from s_ods_trade_full where part ='2012-10-31' ) b ON (a.dp_id = b.dp_id AND a.tid = b.tid); 

官网解释https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#:

Joins occur BEFORE WHERE CLAUSES. So, if you want to restrict the OUTPUT of a join, a requirement should be in the WHERE clause, otherwise it should be in the JOIN clause. A big point of confusion for this issue is partitioned tables:
SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key)
WHERE a.ds='2009-07-07' AND b.ds='2009-07-07'
will join a on b, producing a list of a.val and b.val. The WHERE clause, however, can also reference other columns of a and b that are in the output of the join, and then filter them out. However, whenever a row from the JOIN has found a key for a and no key for b, all of the columns of b will be NULL, including the ds column. This is to say, you will filter out all rows of join output for which there was no valid b.key, and thus you have outsmarted your LEFT OUTER requirement. In other words, the LEFT OUTER part of the join is irrelevant if you reference any column of b in the WHERE clause. Instead, when OUTER JOINing, use this syntax:
SELECT a.val, b.val FROM a LEFT OUTER JOIN b
ON (a.key=b.key AND b.ds='2009-07-07' AND a.ds='2009-07-07')
..the result is that the output of the join is pre-filtered, and you won't get post-filtering trouble for rows that have a valid a.key but no matching b.key. The same logic applies to RIGHT and FULL joins.


[img]http://dl.iteye.com/upload/attachment/0076/2244/1ede0eb4-2a2f-3c2d-96ef-73c2d64d86d1.jpg[/img]
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值