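Hive versions before 0.13 do not support subqueries in the WHERE clause, so the IN and EXISTS clauses common in SQL have to be rewritten. The rewrite is fairly simple. Consider the following SQL query: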
SELECT a.key, a.value
FROM a
WHERE a.key in
(SELECT b.key
FROM b);
It can be rewritten as:
SELECT a.key, a.value
FROM a LEFT OUTER JOIN b ON (a.key = b.key)
WHERE b.key IS NOT NULL;
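The filter in the join version deserves a quick check. In standard SQL, a comparison against NULL with `<>` evaluates to NULL rather than true, so only `IS NOT NULL` keeps the matched rows. The sketch below runs both filters in Python against an in-memory SQLite database, used here purely as a stand-in for Hive (an assumption; the sample rows are made up for illustration).

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (key INTEGER, value TEXT);
    CREATE TABLE b (key INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (3);
""")

# Reference result: the original IN subquery.
in_rows = conn.execute("""
    SELECT a.key, a.value FROM a
    WHERE a.key IN (SELECT b.key FROM b)
    ORDER BY a.key
""").fetchall()

# Join rewrite filtered with "<> NULL": the predicate is never true.
null_cmp_rows = conn.execute("""
    SELECT a.key, a.value
    FROM a LEFT OUTER JOIN b ON (a.key = b.key)
    WHERE b.key <> NULL
""").fetchall()

# Join rewrite filtered with "IS NOT NULL": keeps the matched rows.
is_not_null_rows = conn.execute("""
    SELECT a.key, a.value
    FROM a LEFT OUTER JOIN b ON (a.key = b.key)
    WHERE b.key IS NOT NULL
    ORDER BY a.key
""").fetchall()

print(in_rows)           # [(1, 'one'), (3, 'three')]
print(null_cmp_rows)     # []
print(is_not_null_rows)  # [(1, 'one'), (3, 'three')]
```

With unique keys in b, the `IS NOT NULL` version returns the same rows as the IN query, while the `<> NULL` version returns nothing.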
A more efficient implementation uses LEFT SEMI JOIN:
SELECT a.key, a.value
FROM a LEFT SEMI JOIN b on (a.key = b.key);
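LEFT SEMI JOIN is available in Hive 0.5.0 and later. Note that the right-hand table (b here) may only be referenced in the ON clause, not in the SELECT list or the WHERE clause. For an explanation of Hive's LEFT SEMI JOIN, see https://blog.csdn.net/happyrocking/article/details/79885071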
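Besides efficiency, the two rewrites differ when b contains duplicate keys: an outer join emits one row per match, whereas a semi join emits each row of a at most once, just like IN. The sketch below illustrates this in Python with SQLite as a stand-in. SQLite has no LEFT SEMI JOIN, so the semi-join semantics are emulated with an EXISTS subquery (an assumption made only for this demonstration; the sample rows are hypothetical).

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (key INTEGER, value TEXT);
    CREATE TABLE b (key INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (1), (3);  -- key 1 appears twice
""")

# Outer-join rewrite: one output row per matching row in b.
outer_join_rows = conn.execute("""
    SELECT a.key, a.value
    FROM a LEFT OUTER JOIN b ON (a.key = b.key)
    WHERE b.key IS NOT NULL
    ORDER BY a.key
""").fetchall()

# Semi-join semantics, emulated with EXISTS: each row of a at most once.
semi_join_rows = conn.execute("""
    SELECT a.key, a.value FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.key = a.key)
    ORDER BY a.key
""").fetchall()

print(outer_join_rows)  # [(1, 'one'), (1, 'one'), (3, 'three')]
print(semi_join_rows)   # [(1, 'one'), (3, 'three')]
```

If the outer-join form has to be used, deduplicating b.key first is one way to recover the IN semantics.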
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
NOT EXISTS example
select a, b
from table1 t1
where not exists (select 1
from table2 t2
where t1.a = t2.a
and t1.b = t2.b)
It can be rewritten as:
select t1.a, t1.b
from table1 t1
left join table2 t2
on (t1.a = t2.a and t1.b = t2.b)
where t2.a is null
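The equivalence of the two forms can be checked on a small data set. The sketch below uses Python with an in-memory SQLite database as a stand-in for Hive (an assumption; the rows are invented for illustration). It also shows why the output columns should come from t1: after the `t2.a is null` filter, every column of t2 is NULL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER, b INTEGER);
    CREATE TABLE table2 (a INTEGER, b INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table2 VALUES (1, 10), (2, 99);
""")

# Original form: rows of table1 with no (a, b) match in table2.
not_exists_rows = conn.execute("""
    SELECT a, b FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2
                      WHERE t1.a = t2.a AND t1.b = t2.b)
    ORDER BY a
""").fetchall()

# Anti-join rewrite: keep only the rows where the left join found no match.
anti_join_rows = conn.execute("""
    SELECT t1.a, t1.b
    FROM table1 t1
    LEFT JOIN table2 t2 ON (t1.a = t2.a AND t1.b = t2.b)
    WHERE t2.a IS NULL
    ORDER BY t1.a
""").fetchall()

# Selecting a t2 column instead yields NULL for every surviving row.
t2_column_rows = conn.execute("""
    SELECT t1.a, t2.b
    FROM table1 t1
    LEFT JOIN table2 t2 ON (t1.a = t2.a AND t1.b = t2.b)
    WHERE t2.a IS NULL
    ORDER BY t1.a
""").fetchall()

print(not_exists_rows)  # [(2, 20), (3, 30)]
print(anti_join_rows)   # [(2, 20), (3, 30)]
print(t2_column_rows)   # [(2, None), (3, None)]
```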