hql的join on操作只支持相等条件,比如:
select * from a join b on a.id=b.id;
但是不支持相等条件以外的情况,比如:
select * from a join b on a.id <> b.id;
select * from a join b on a.name like '%'+b.name+'%';
这是因为Hive很难把不等条件翻译成mapreduce job;
但是工作中我们需要实现不等条件,比如微博需要向用户推送私信,但白名单的用户除外,现在全部用户的uid在表weibouid表的alluid分区,白名单在baimingdan分区,现在用join on实现去除alluid中的白名单uid,两个分区的uid是去重的。
第一种方式:
select uid from
(select uid from weibouid where part='alluid')a
join
(select uid,count(uid) as c from weibouid where part='alluid' or part='baimingdan' group by uid having c=1)b on a.uid=b.uid;
第二种方式:
select a.uid from
(select a.uid as auid,b.uid as buid from
(select uid from weibouid where part='alluid')a
left join
(select uid from weibouid where part='baimingdan')b
on a.uid = b.uid)c where c.buid is null;
显然第二种方式效率更高,推荐第二种。如果有其他方式希望留言。。。