https://blog.csdn.net/qq_42246689/article/details/84702253
https://blog.csdn.net/zhangge360/article/details/84865567
In的实现:
Hive中的in的实现方式很多,简单说几种:
- 用left semi join实现
- 用left outer join+is not null实现
- Inner join实现
Not in的实现:
Left outer join+is null
举例说明:
有两个表如下:
skim表
userID | itemID | time |
001 | 342 | 2015-05-08 |
002 | 382 | 2015-05-09 |
002 | 458 | 2015-05-09 |
004 | 325 | 2015-05-09 |
userID | itemID | time |
001 | 342 | 2015-05-07 |
002 | 382 | 2015-05-08 |
003 | 458 | 2015-05-09 |
004 | 325 | 2015-05-09 |
IN实现:
如果要查询在skim表中并且也在buy表中的信息,需要用in查询,hive sql如下:
select skim.userId , skim.itemId from skim left outer join buy
on skim.userId = buy .userId and skim.itemId = buy .itemId where buy .userId is not null;
或
select skim.userId , skim.itemId from skim left semi join buy
on skim.userId = buy .userId and skim.itemId = buy .itemId;
或
select skim.userId , skim.itemId from skim join buy
on skim.userId = buy .userId and skim.itemId = buy .itemId;
结果如下:
userID | itemID |
001 | 342 |
002 | 382 |
003 | 458 |
NOT IN实现:
如果要查询在skim表中并且不在buy表中的信息,需要用not in查询,hive sql如下:
select skim.userId, skim.itemId from skim left outer join buy
on skim.userId=buy .userId and skim.itemId=buy .itemId where buy .userId is null;
结果如下:
userID | itemID |
004 | 468 |
Hive 不支持 where 子句中的子查询, SQL 常用的 exist in 子句需要改写。这一改写相对简单。考虑以下 SQL 查询语句:
-
SELECT a.key, a.value
-
FROM a
-
WHERE a.key in
-
(SELECT b.key
-
FROM B);
可以改写为
-
SELECT a.key, a.value
-
FROM a LEFT OUTER JOIN b ON (a.key = b.key)
-
WHERE b.key <> NULL;
一个更高效的实现是利用 left semi join 改写为:
-
SELECT a.key, a.val
-
FROM a LEFT SEMI JOIN b on (a.key = b.key);
left semi join 是 0.5.0 以上版本的特性。hive 的 left semi join 讲解https://blog.csdn.net/happyrocking/article/details/79885071
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
not exists 例子
-
select a, b
-
from table1 t1
-
where not exists (select 1
-
from table2 t2
-
where t1.a = t2.a
-
and t1.b = t2.b)
可以改为
-
select t1.a, t2.b
-
from table1 t1
-
left join table2 t2
-
on (t1.a = t2.a and t1.b = t2.b)
-
where t2.a is null